Why am I getting different performance speeds for the "same" decorator?

fionnualasolomon · March 2, 2021, 3:14pm

Hi,

I am using just the @jit decorator and getting the warning “Compilation is falling back to object mode WITH looplifting enabled etc”

I run it again and get a good run time of 0.037s.

I like the object mode with loop lifting enabled because my function has some python types that I need but also a loop that can be optimized.

So, to get rid of the warning, I decorated my function with @jit(forceobj=True, looplift=True). I was expecting the same time, but I get 0.167s. Why is the speed reduced? Is Numba not doing the same thing?

Any help would be greatly appreciated.

Thanks,

Fionnuala

esc · March 2, 2021, 5:27pm

When profiling jit-decorated functions, the first run is always significantly slower than subsequent runs. This is because Numba compiles the function on the first call, which can take some time. From the second call onward, only the already-compiled function executes:

In [1]: import numba as nb

In [2]: @nb.njit
   ...: def foo():
   ...:     acc = 0.0
   ...:     for i in range(1000000):
   ...:         acc += i
   ...:     return i
   ...:

In [3]: %time foo()
CPU times: user 135 ms, sys: 32.5 ms, total: 168 ms
Wall time: 211 ms
Out[3]: 999999

In [4]: %time foo()
CPU times: user 91 µs, sys: 1 µs, total: 92 µs
Wall time: 92.3 µs
Out[4]: 999999

fionnualasolomon · March 3, 2021, 9:17am

Hi,

Thank you for replying to my query.
Yes I was aware of the compilation time so I know it is not that.

import time

import matplotlib.pyplot as plt
import numpy as np
from numba import jit, njit

def kernel(zr, zi, cr, ci, lim, cutoff):
    count = 0
    while ((zr*zr + zi*zi) < (lim*lim)) and count < cutoff:
        zr, zi = zr * zr - zi * zi + cr, 2 * zr * zi + ci
        count += 1
    return count

kernel_njit = njit()(kernel)

def plot_mandel(mandel):
    plt.imshow(mandel)
    plt.axis('off')
    plt.show()
    
def compute_mandel_py(cr, ci, N, bound=1.0, lim=1000.0, cutoff=1e6):
    mandel = np.empty((N, N), dtype=int)
    grid_x = np.linspace(-bound, bound, N)
    t0 = time.time()
    for i, x in enumerate(grid_x):
        for j, y in enumerate(grid_x):
            mandel[i,j] = kernel(x, y, cr, ci, lim, cutoff)
    return mandel, time.time() - t0


def compute_mandel_njit(cr, ci, N, bound=1.0, lim=1000.0, cutoff=1e6):
    mandel = np.empty((N, N))
    grid_x = np.linspace(-bound, bound, N)
    t0 = time.time()
    for i, x in enumerate(grid_x):
        for j, y in enumerate(grid_x):
            mandel[i,j] = kernel_njit(x, y, cr, ci, lim, cutoff)
    return mandel, time.time() - t0

compute_mandel_njit_jit1 = jit()(compute_mandel_njit)
compute_mandel_njit_jit2 = jit(forceobj=True, looplift=True)(compute_mandel_njit)

def python_run():
    kwargs = dict(cr=0.285, ci=0.01,
              N=500,
              bound=1.0)
    print("Using pure Python")
    mandel_func = compute_mandel_py       
    mandel_set, runtime = mandel_func(**kwargs)
    print("Mandelbrot set generated in {} seconds".format(runtime))
    #plot_mandel(mandel_set)
    
    
def njit_run():
    kwargs = dict(cr=0.285, ci=0.01,
              N=500,
              bound=1.0)
    print("Using njitted kernel")
    mandel_func = compute_mandel_njit       
    mandel_set, runtime = mandel_func(**kwargs)
    print("Mandelbrot set generated in {} seconds".format(runtime))
    #plot_mandel(mandel_set)
    
def njit_jit_run1():
    kwargs = dict(cr=0.285, ci=0.01,
              N=500,
              bound=1.0)
    print("Using njitted kernel and jitted compute function")
    mandel_func = compute_mandel_njit_jit1       
    mandel_set, runtime = mandel_func(**kwargs)
    print("Mandelbrot set generated in {} seconds".format(runtime))
    #plot_mandel(mandel_set)
    
def njit_jit_run2():
    kwargs = dict(cr=0.285, ci=0.01,
              N=500,
              bound=1.0)
    print("Using njitted kernel and jitted compute function in object mode & looplift")
    mandel_func = compute_mandel_njit_jit2       
    mandel_set, runtime = mandel_func(**kwargs)
    print("Mandelbrot set generated in {} seconds".format(runtime))
    #plot_mandel(mandel_set)

And then running

njit_run()
njit_jit_run1()
njit_jit_run2()

at least twice (to account for compilation), I get these times:

Using njitted kernel
Mandelbrot set generated in 0.15392279624938965 seconds
Using njitted kernel and jitted compute function
Mandelbrot set generated in 0.028262853622436523 seconds
Using njitted kernel and jitted compute function in object mode & looplift
Mandelbrot set generated in 0.15626192092895508 seconds

What I don’t understand is the difference between these two:
compute_mandel_njit_jit1 = jit()(compute_mandel_njit)
compute_mandel_njit_jit2 = jit(forceobj=True, looplift=True)(compute_mandel_njit)

Why is the first (plain jit, no options set) so much faster when the warning says it is using object mode with loop lifting enabled? If that were true, both functions should give similar performance.

Is my question more clear now?

Thanks again for replying.

Fionnuala

Hannes · March 3, 2021, 11:36am

Hi @fionnualasolomon

I agree with your point that those timing differences seem a bit odd if one expects the functions to work the same - and interestingly enough I cannot reproduce this behaviour on my own system. For me all implementations run in about 150 ms.

I wonder if this has something to do with certain library versions or hardware - would you mind sharing your output of numba -s?

Here is mine for comparison

numba -s output

System info:

Time Stamp
Report started (local time) : 2021-03-03 12:29:11.250888
UTC start time : 2021-03-03 11:29:11.250892
Running time (s) : 2.829611

Hardware Information
Machine : x86_64
CPU Name : skylake
CPU Count : 8
Number of accessible CPUs : 8
List of accessible CPUs cores : 0 1 2 3 4 5 6 7
CFS Restrictions (CPUs worth of runtime) : None

CPU Features : 64bit adx aes avx avx2 bmi bmi2
clflushopt cmov cx16 cx8 f16c fma
fsgsbase fxsr invpcid lzcnt mmx
movbe pclmul popcnt prfchw rdrnd
rdseed rtm sahf sgx sse sse2 sse3
sse4.1 sse4.2 ssse3 xsave xsavec
xsaveopt xsaves

Memory Total (MB) : 15893
Memory Available (MB) : 9993

OS Information
Platform Name : Linux-5.4.100-1-MANJARO-x86_64-with-glibc2.10
Platform Release : 5.4.100-1-MANJARO
OS Name : Linux
OS Version : #1 SMP PREEMPT Tue Feb 23 15:31:04 UTC 2021
OS Specific Version : ?
Libc Version : glibc 2.33

Python Information
Python Compiler : GCC 7.3.0
Python Implementation : CPython
Python Version : 3.8.5
Python Locale : en_GB.UTF-8

LLVM Information
LLVM Version : 10.0.1

CUDA Information
CUDA Device Initialized : False
CUDA Driver Version : ?
CUDA Detect Output:
None
CUDA Libraries Test Output:
None

ROC information
ROC Available : False
ROC Toolchains : None
HSA Agents Count : 0
HSA Agents:
None
HSA Discrete GPUs Count : 0
HSA Discrete GPUs : None

SVML Information
SVML State, config.USING_SVML : True
SVML Library Loaded : True
llvmlite Using SVML Patched LLVM : True
SVML Operational : True

Threading Layer Information
TBB Threading Layer Available : True
+-->TBB imported successfully.
OpenMP Threading Layer Available : True
+-->Vendor: GNU
Workqueue Threading Layer Available : True
+-->Workqueue imported successfully.

Numba Environment Variable Information
None found.

Conda Information
Conda Build : 3.17.1
Conda Env : 4.9.2
Conda Platform : linux-64
Conda Python Version : 3.7.4.final.0
Conda Root Writable : True

Installed Packages
_anaconda_depends 2020.07 py38_0
_libgcc_mutex 0.1 main
_openmp_mutex 4.5 1_gnu
alabaster 0.7.12 py_0
anaconda custom py38_1
anaconda-client 1.7.2 py38_0
anaconda-project 0.8.4 py_0
appdirs 1.4.4 pyh9f0ad1d_0 conda-forge
argh 0.26.2 py38_0
argon2-cffi 20.1.0 py38h7b6447c_1
asn1crypto 1.4.0 py_0
astroid 2.4.2 py38_0
astropy 4.0.2 py38h7b6447c_0
async_generator 1.10 py_0
atomicwrites 1.4.0 py_0
attrs 20.3.0 pyhd3eb1b0_0
autopep8 1.5.4 py_0
babel 2.8.1 pyhd3eb1b0_0
backcall 0.2.0 py_0
backports 1.0 py_2
backports.shutil_get_terminal_size 1.0.0 py38_2
beautifulsoup4 4.9.3 pyhb0f4dca_0
bitarray 1.6.1 py38h27cfd23_0
bkcharts 0.2 py38_0
blas 1.0 mkl
bleach 3.2.1 py_0
blosc 1.20.1 hd408876_0
bokeh 2.2.3 py38_0
boto 2.49.0 py38_0
bottleneck 1.3.2 py38heb32a55_1
brotli-python 1.0.9 py38heb0550a_2
brotlipy 0.7.0 py38h7b6447c_1000
bzip2 1.0.8 h7b6447c_0
ca-certificates 2021.1.19 h06a4308_0
cairo 1.14.12 h8948797_3
certifi 2020.12.5 py38h06a4308_0
cffi 1.14.3 py38he30daa8_0
cfgv 3.2.0 py_0 conda-forge
chardet 3.0.4 py38_1003
click 7.1.2 py_0
cloudpickle 1.6.0 py_0
clyent 1.2.2 py38_1
colorama 0.4.4 py_0
contextlib2 0.6.0.post1 py_0
cryptography 3.1.1 py38h1ba5d50_0
curl 7.71.1 hbc83047_1
cycler 0.10.0 py38_0
cython 0.29.21 py38he6710b0_0
cytoolz 0.11.0 py38h7b6447c_0
dash 1.19.0 pyhd3eb1b0_0
dash-core-components 1.3.1 py_0
dash-html-components 1.0.1 py_0
dash-renderer 1.1.2 py_0
dash-table 4.4.1 py_0
dask 2.30.0 py_0
dask-core 2.30.0 py_0
dbus 1.13.18 hb2f20db_0
decorator 4.4.2 py_0
defusedxml 0.6.0 py_0
diff-match-patch 20200713 py_0
distlib 0.3.1 pyh9f0ad1d_0 conda-forge
distributed 2.30.1 py38h06a4308_0
docutils 0.16 py38_1
ebisim 0.1.0 dev_0
editdistance 0.5.3 py38h950e882_2 conda-forge
entrypoints 0.3 py38_0
et_xmlfile 1.0.1 py_1001
expat 2.2.10 he6710b0_2
fastcache 1.1.0 py38h7b6447c_0
filelock 3.0.12 py_0
flake8 3.8.4 py_0
flask 1.1.2 py_0
flask-compress 1.8.0 pyhd3eb1b0_0
fontconfig 2.13.0 h9420a91_0
freetype 2.10.4 h5ab3b9f_0
fribidi 1.0.10 h7b6447c_0
fsspec 0.8.3 py_0
future 0.18.2 py38_1
get_terminal_size 1.0.0 haa9412d_0
gevent 20.9.0 py38h7b6447c_0
glib 2.66.1 h92f7085_0
glob2 0.7 py_0
gmp 6.1.2 h6c8ec71_1
gmpy2 2.0.8 py38hd5f6e3b_3
graphite2 1.3.14 h23475e2_0
greenlet 0.4.17 py38h7b6447c_0
gst-plugins-base 1.14.0 hbbd80ab_1
gstreamer 1.14.0 hb31296c_0
h5py 2.10.0 py38h7918eee_0
harfbuzz 2.4.0 hca77d97_1
hdf5 1.10.4 hb1b8bf9_0
heapdict 1.0.1 py_0
html5lib 1.1 py_0
icc_rt 2020.2 intel_254 numba
icu 58.2 he6710b0_3
identify 1.5.13 pyh44b312d_0 conda-forge
idna 2.10 py_0
imageio 2.9.0 py_0
imagesize 1.2.0 py_0
importlib-metadata 2.0.0 py_1
importlib_metadata 2.0.0 1
iniconfig 1.1.1 py_0
intel-openmp 2020.2 254
intervaltree 3.1.0 py_0
ipykernel 5.3.4 py38h5ca1d4c_0
ipython 7.19.0 py38hb070fc8_0
ipython_genutils 0.2.0 py38_0
ipywidgets 7.5.1 py_1
isort 5.6.4 py_0
itsdangerous 1.1.0 py_0
jbig 2.1 hdba287a_0
jdcal 1.4.1 py_0
jedi 0.17.1 py38_0
jeepney 0.5.0 pyhd3eb1b0_0
jinja2 2.11.2 py_0
joblib 1.0.0 pyhd3eb1b0_0
jpeg 9b h024ee3a_2
json5 0.9.5 py_0
jsonschema 3.2.0 py_2
jupyter 1.0.0 py38_7
jupyter_client 6.1.7 py_0
jupyter_console 6.2.0 py_0
jupyter_core 4.6.3 py38_0
jupyterlab 2.2.6 py_0
jupyterlab_pygments 0.1.2 py_0
jupyterlab_server 1.2.0 py_0
keyring 21.4.0 py38_1
kiwisolver 1.3.0 py38h2531618_0
krb5 1.18.2 h173b8e3_0
lazy-object-proxy 1.4.3 py38h7b6447c_0
lcms2 2.11 h396b838_0
ld_impl_linux-64 2.33.1 h53a641e_7
libarchive 3.4.2 h62408e4_0
libcurl 7.71.1 h20c2e04_1
libedit 3.1.20191231 h14c3975_1
libffi 3.3 he6710b0_2
libgcc-ng 9.3.0 h9cbead1_17
libgfortran-ng 7.3.0 hdf63c60_0
libgomp 9.3.0 h9cbead1_17
liblief 0.10.1 he6710b0_0
libllvm10 10.0.1 hbcb73fb_5
libllvm9 9.0.1 h4a3c616_1
libpng 1.6.37 hbc83047_0
libsodium 1.0.18 h7b6447c_0
libspatialindex 1.9.3 he6710b0_0
libssh2 1.9.0 h1ba5d50_1
libstdcxx-ng 9.1.0 hdf63c60_0
libtiff 4.1.0 h2733197_1
libtool 2.4.6 h7b6447c_1005
libuuid 1.0.3 h1bed415_2
libxcb 1.14 h7b6447c_0
libxml2 2.9.10 hb55368b_3
libxslt 1.1.34 hc22bd24_0
llvmlite 0.36.0rc1 py38hf484d3e_0 numba
locket 0.2.0 py38_1
lxml 4.6.1 py38hefd8a0e_0
lz4-c 1.9.2 heb0550a_3
lzo 2.10 h7b6447c_2
markupsafe 1.1.1 py38h7b6447c_0
matplotlib 3.3.2 0
matplotlib-base 3.3.2 py38h817c723_0
mccabe 0.6.1 py38_1
mistune 0.8.4 py38h7b6447c_1000
mkl 2020.2 256
mkl-service 2.3.0 py38he904b0f_0
mkl_fft 1.2.0 py38h23d657b_0
mkl_random 1.1.1 py38h0573a6f_0
mock 4.0.2 py_0
more-itertools 8.6.0 pyhd3eb1b0_0
mpc 1.1.0 h10f8cd9_1
mpfr 4.0.2 hb69a4c5_1
mpmath 1.1.0 py38_0
msgpack-python 1.0.0 py38hfd86e86_1
multipledispatch 0.6.0 py38_0
mypy 0.800 pyhd3eb1b0_0
mypy_extensions 0.4.3 py38_0
nbclient 0.5.1 py_0
nbconvert 6.0.7 py38_0
nbformat 5.0.8 py_0
ncurses 6.2 he6710b0_1
nest-asyncio 1.4.2 pyhd3eb1b0_0
networkx 2.5 py_0
nltk 3.5 py_0
nodeenv 1.5.0 pyh9f0ad1d_0 conda-forge
nose 1.3.7 py38_2
notebook 6.1.4 py38_0
numba 0.53.0rc1 np1.11py3.8h04863e7_ga3d29a7f4_0 numba
numexpr 2.7.1 py38h423224d_0
numpy 1.19.2 py38h54aff64_0
numpy-base 1.19.2 py38hfa32c7d_0
numpydoc 1.1.0 pyhd3eb1b0_1
olefile 0.46 py_0
openpyxl 3.0.5 py_0
openssl 1.1.1i h27cfd23_0
packaging 20.4 py_0
pandas 1.1.3 py38he6710b0_0
pandoc 2.11 hb0f4dca_0
pandocfilters 1.4.3 py38h06a4308_1
pango 1.45.3 hd140c19_0
parso 0.7.0 py_0
partd 1.1.0 py_0
patchelf 0.12 he6710b0_0
path 15.0.0 py38_0
path.py 12.5.0 0
pathlib2 2.3.5 py38_0
pathtools 0.1.2 py_1
patsy 0.5.1 py38_0
pbr 5.5.1 py_0
pcre 8.44 he6710b0_0
pep8 1.7.1 py38_0
pexpect 4.8.0 py38_0
pickleshare 0.7.5 py38_1000
pillow 8.0.1 py38he98fc37_0
pip 20.2.4 py38h06a4308_0
pixman 0.40.0 h7b6447c_0
pkginfo 1.6.1 py38h06a4308_0
plotly 4.14.3 pyhd3eb1b0_0
pluggy 0.13.1 py38_0
ply 3.11 py38_0
pre-commit 2.10.1 py38h578d9bd_0 conda-forge
prometheus_client 0.8.0 py_0
prompt-toolkit 3.0.8 py_0
prompt_toolkit 3.0.8 0
psutil 5.7.2 py38h7b6447c_0
ptyprocess 0.6.0 py38_0
py 1.9.0 py_0
py-lief 0.10.1 py38h403a769_0
pycodestyle 2.6.0 py_0
pycosat 0.6.3 py38h7b6447c_1
pycparser 2.20 py_2
pycurl 7.43.0.6 py38h1ba5d50_0
pydocstyle 5.1.1 py_0
pyflakes 2.2.0 py_0
pygments 2.7.2 pyhd3eb1b0_0
pylint 2.6.0 py38_0
pyodbc 4.0.30 py38he6710b0_0
pyopenssl 19.1.0 py_1
pyparsing 2.4.7 py_0
pyqt 5.9.2 py38h05f1152_4
pyrsistent 0.17.3 py38h7b6447c_0
pysocks 1.7.1 py38_0
pytables 3.6.1 py38h9fd0a39_0
pytest 6.1.1 py38_0
python 3.8.5 h7579374_1
python-dateutil 2.8.1 py_0
python-jsonrpc-server 0.4.0 py_0
python-language-server 0.35.1 py_0
python-libarchive-c 2.9 py_0
python_abi 3.8 1_cp38 conda-forge
pytz 2020.1 py_0
pywavelets 1.1.1 py38h7b6447c_2
pyxdg 0.27 pyhd3eb1b0_0
pyyaml 5.3.1 py38h7b6447c_1
pyzmq 19.0.2 py38he6710b0_1
qdarkstyle 2.8.1 py_0
qt 5.9.7 h5867ecd_1
qtawesome 1.0.1 py_0
qtconsole 4.7.7 py_0
qtpy 1.9.0 py_0
readline 8.0 h7b6447c_0
regex 2020.10.15 py38h7b6447c_0
requests 2.24.0 py_0
retrying 1.3.3 py_2
ripgrep 12.1.1 0
rope 0.18.0 py_0
rtree 0.9.4 py38_1
ruamel_yaml 0.15.87 py38h7b6447c_1
scikit-image 0.17.2 py38hdf5156a_0
scikit-learn 0.23.2 py38h0573a6f_0
scipy 1.5.2 py38h0b6359f_0
seaborn 0.11.0 py_0
secretstorage 3.1.2 py38_0
send2trash 1.5.0 py38_0
setuptools 50.3.1 py38h06a4308_1
simplegeneric 0.8.1 py38_2
singledispatch 3.4.0.3 py_1001
sip 4.19.13 py38he6710b0_0
six 1.15.0 py38h06a4308_0
snappy 1.1.8 he6710b0_0
snowballstemmer 2.0.0 py_0
sortedcollections 1.2.1 py_0
sortedcontainers 2.2.2 py_0
soupsieve 2.0.1 py_0
sphinx 3.2.1 py_0
sphinx-autodoc-typehints 1.11.1 pypi_0 pypi
sphinx_rtd_theme 0.5.1 pyhd3deb0d_0 conda-forge
sphinxcontrib 1.0 py38_1
sphinxcontrib-apidoc 0.3.0 py_1 conda-forge
sphinxcontrib-applehelp 1.0.2 py_0
sphinxcontrib-devhelp 1.0.2 py_0
sphinxcontrib-htmlhelp 1.0.3 py_0
sphinxcontrib-jsmath 1.0.1 py_0
sphinxcontrib-qthelp 1.0.3 py_0
sphinxcontrib-serializinghtml 1.1.4 py_0
sphinxcontrib-websupport 1.2.4 py_0
spyder 4.1.5 py38_0
spyder-kernels 1.9.4 py38_0
sqlalchemy 1.3.20 py38h7b6447c_0
sqlite 3.33.0 h62c20be_0
statsmodels 0.12.0 py38h7b6447c_0
sympy 1.6.2 py38h06a4308_1
tbb 2020.3 hfd86e86_0
tblib 1.7.0 py_0
terminado 0.9.1 py38_0
testpath 0.4.4 py_0
threadpoolctl 2.1.0 pyh5ca1d4c_0
tifffile 2020.10.1 py38hdd07704_2
tk 8.6.10 hbc83047_0
toml 0.10.1 py_0
toolz 0.11.1 py_0
tornado 6.0.4 py38h7b6447c_1
tqdm 4.50.2 py_0
traitlets 5.0.5 py_0
typed-ast 1.4.2 py38h27cfd23_1
typing_extensions 3.7.4.3 py_0
ujson 4.0.1 py38he6710b0_0
unicodecsv 0.14.1 py38_0
unixodbc 2.3.9 h7b6447c_0
urllib3 1.25.11 py_0
virtualenv 20.4.2 py38h578d9bd_0 conda-forge
watchdog 0.10.3 py38_0
wcwidth 0.2.5 py_0
webencodings 0.5.1 py38_1
werkzeug 1.0.1 py_0
wheel 0.35.1 py_0
widgetsnbextension 3.5.1 py38_0
wrapt 1.11.2 py38h7b6447c_0
wurlitzer 2.0.1 py38_0
xlrd 1.2.0 py_0
xlsxwriter 1.3.7 py_0
xlwt 1.3.0 py38_0
xz 5.2.5 h7b6447c_0
yaml 0.2.5 h7b6447c_0
yapf 0.30.0 py_0
zeromq 4.3.3 he6710b0_3
zict 2.0.0 py_0
zipp 3.4.0 pyhd3eb1b0_0
zlib 1.2.11 h7b6447c_3
zope 1.0 py38_1
zope.event 4.5.0 py38_0
zope.interface 5.1.2 py38h7b6447c_0
zstd 1.4.5 h9ceee32_0

No errors reported.

Warning log
Warning (cuda): CUDA driver library cannot be found or no CUDA enabled devices are present.
Exception class: <class 'numba.cuda.cudadrv.error.CudaSupportError'>
Warning (roc): Error initialising ROC: No ROC toolchains found.
Warning (roc): No HSA Agents found, encountered exception when searching: Error at driver init:
NUMBA_HSA_DRIVER /opt/rocm/lib/libhsa-runtime64.so is not a valid file path. Note it must be a filepath of the .so/.dll/.dylib or the driver:

If requested, please copy and paste the information between
the dashed (----) lines, or from a given specific section as
appropriate.

=============================================================
IMPORTANT: Please ensure that you are happy with sharing the
contents of the information present, any information that you
wish to keep private you should remove before sharing.

fionnualasolomon · March 3, 2021, 12:31pm

Hi @Hannes,

Thank you for responding. I was using JupyterLab and thought that might be contributing, but even when I run everything as a single script from my terminal I get different timings.

My output from numba -s is:

System info:
--------------------------------------------------------------------------------
__Time Stamp__
Report started (local time)                   : 2021-03-03 12:26:03.814271
UTC start time                                : 2021-03-03 12:26:03.814279
Running time (s)                              : 1.144859

__Hardware Information__
Machine                                       : x86_64
CPU Name                                      : icelake-client
CPU Count                                     : 8
Number of accessible CPUs                     : ?
List of accessible CPUs cores                 : ?
CFS Restrictions (CPUs worth of runtime)      : None

CPU Features                                  : 64bit adx aes avx avx2
                                                avx512bitalg avx512bw avx512cd
                                                avx512dq avx512f avx512ifma
                                                avx512vbmi avx512vbmi2 avx512vl
                                                avx512vnni avx512vpopcntdq bmi
                                                bmi2 clflushopt cmov cx16 cx8 f16c
                                                fma fsgsbase fxsr gfni invpcid
                                                lzcnt mmx movbe pclmul popcnt
                                                prfchw rdpid rdrnd rdseed sahf sgx
                                                sha sse sse2 sse3 sse4.1 sse4.2
                                                ssse3 vaes vpclmulqdq xsave xsavec
                                                xsaveopt xsaves

Memory Total (MB)                             : 16384
Free Memory (MB)                              : 66

__OS Information__
Platform Name                                 : Darwin-19.6.0-x86_64-i386-64bit
Platform Release                              : 19.6.0
OS Name                                       : Darwin
OS Version                                    : Darwin Kernel Version 19.6.0: Thu Oct 29 22:56:45 PDT 2020; root:xnu-6153.141.2.2~1/RELEASE_X86_64
OS Specific Version                           : 10.15.7   x86_64
Libc Version                                  : ?

__Python Information__
Python Compiler                               : Clang 10.0.0 
Python Implementation                         : CPython
Python Version                                : 3.7.9
Python Locale                                 : en_IE.UTF-8

__LLVM Information__
LLVM Version                                  : 10.0.1

__CUDA Information__
CUDA Device Initialized                       : False
CUDA Driver Version                           : ?
CUDA Detect Output:
None
CUDA Librairies Test Output:
None

__ROC information__
ROC Available                                 : False
ROC Toolchains                                : None
HSA Agents Count                              : 0
HSA Agents:
None
HSA Discrete GPUs Count                       : 0
HSA Discrete GPUs                             : None

__SVML Information__
SVML State, config.USING_SVML                 : False
SVML Library Loaded                           : False
llvmlite Using SVML Patched LLVM              : True
SVML Operational                              : False

__Threading Layer Information__
TBB Threading Layer Available                 : False
+--> Disabled due to Unknown import problem.
OpenMP Threading Layer Available              : True
+-->Vendor: Intel
Workqueue Threading Layer Available           : True
+-->Workqueue imported successfully.

__Numba Environment Variable Information__
None found.

__Conda Information__
Conda Build                                   : not installed
Conda Env                                     : 4.9.2
Conda Platform                                : osx-64
Conda Python Version                          : 3.8.5.final.0
Conda Root Writable                           : True

__Installed Packages__
appnope                   0.1.0           py37hf985489_1002    conda-forge
argon2-cffi               20.1.0           py37h4b544eb_2    conda-forge
async_generator           1.10                       py_0    conda-forge
attrs                     20.3.0             pyhd3deb0d_0    conda-forge
backports                 1.0                        py_2    conda-forge
backports.functools_lru_cache 1.6.1                      py_0    conda-forge
blas                      1.0                         mkl    anaconda
bleach                    3.2.1              pyh9f0ad1d_0    conda-forge
brotlipy                  0.7.0           py37h395d20d_1001    conda-forge
ca-certificates           2020.10.14                    0    anaconda
certifi                   2020.6.20                py37_0    anaconda
cffi                      1.14.3           py37hed5b41f_0    anaconda
chardet                   3.0.4           py37h2987424_1008    conda-forge
cryptography              3.2.1            py37h3b7a55b_0    conda-forge
cycler                    0.10.0                     py_2    conda-forge
cython                    0.29.21          py37hb1e8313_0    anaconda
decorator                 4.4.2                      py_0    conda-forge
defusedxml                0.6.0                      py_0    conda-forge
entrypoints               0.3             pyhd8ed1ab_1003    conda-forge
freetype                  2.10.4               h3f75d11_0    conda-forge
idna                      2.10               pyh9f0ad1d_0    conda-forge
importlib-metadata        3.1.0              pyhd8ed1ab_0    conda-forge
importlib_metadata        3.1.0                hd8ed1ab_0    conda-forge
intel-openmp              2020.2                      258    anaconda
ipykernel                 5.3.4            py37he01cfaa_1    conda-forge
ipython                   5.8.0                    py37_1    conda-forge
ipython_genutils          0.2.0                      py_1    conda-forge
jinja2                    2.11.2             pyh9f0ad1d_0    conda-forge
json5                     0.9.5              pyh9f0ad1d_0    conda-forge
jsonschema                3.2.0                      py_2    conda-forge
jupyter_client            6.1.7                      py_0    conda-forge
jupyter_core              4.7.0            py37hf985489_0    conda-forge
jupyterlab                2.2.9                      py_0    conda-forge
jupyterlab_pygments       0.1.2              pyh9f0ad1d_0    conda-forge
jupyterlab_server         1.2.0                      py_0    conda-forge
kiwisolver                1.3.0            py37h23ab428_0  
libcxx                    10.0.0                        1  
libedit                   3.1.20191231         h1de35cc_1  
libffi                    3.3                  hb1e8313_2  
libgfortran               3.0.1                h93005f0_2    anaconda
libllvm10                 10.0.1               h76017ad_5  
libpng                    1.6.37               h7cec526_2    conda-forge
libsodium                 1.0.18               hbcb3906_1    conda-forge
llvm-openmp               10.0.0               h28b9765_0  
llvmlite                  0.34.0           py37h739e7dc_4  
markupsafe                1.1.1            py37h395d20d_2    conda-forge
matplotlib                2.2.5                h694c41f_3    conda-forge
matplotlib-base           2.2.5            py37h11da6c2_1    conda-forge
mistune                   0.8.4           py37h4b544eb_1002    conda-forge
mkl                       2019.4                      233    anaconda
mkl-service               2.3.0            py37hfbe908c_0    anaconda
mkl_fft                   1.2.0            py37hc64f4ea_0    anaconda
mkl_random                1.1.1            py37h959d312_0    anaconda
mpi                       1.0                       mpich    anaconda
mpi4py                    3.0.3            py37h77202c6_0    anaconda
mpich                     3.3.2                hc856adb_0    anaconda
nbclient                  0.5.1                      py_0    conda-forge
nbconvert                 6.0.7            py37hf985489_3    conda-forge
nbformat                  5.0.8                      py_0    conda-forge
ncurses                   6.2                  h0a44026_1  
nest-asyncio              1.4.3              pyhd8ed1ab_0    conda-forge
notebook                  6.1.5            py37hf985489_0    conda-forge
numba                     0.51.2           py37h959d312_1  
numexpr                   2.7.1            py37hce01a72_0    anaconda
numpy                     1.19.1           py37h3b9f5b6_0    anaconda
numpy-base                1.19.1           py37hcfb5961_0    anaconda
openssl                   1.1.1h               haf1e3a3_0    anaconda
packaging                 20.4               pyh9f0ad1d_0    conda-forge
pandoc                    2.11.2               hc929b4f_0    conda-forge
pandocfilters             1.4.2                      py_1    conda-forge
pexpect                   4.8.0              pyh9f0ad1d_2    conda-forge
pickleshare               0.7.5                   py_1003    conda-forge
pip                       20.2.4           py37hecd8cb5_0  
prometheus_client         0.9.0              pyhd3deb0d_0    conda-forge
prompt_toolkit            1.0.15                     py_1    conda-forge
ptyprocess                0.6.0                   py_1001    conda-forge
pycparser                 2.20                       py_2    anaconda
pygments                  2.7.2                      py_0    conda-forge
pyopenssl                 19.1.0                     py_1    conda-forge
pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge
pyrsistent                0.17.3           py37h4b544eb_1    conda-forge
pysocks                   1.7.1            py37h2987424_2    conda-forge
python                    3.7.9                h26836e1_0  
python-dateutil           2.8.1                      py_0    conda-forge
python_abi                3.7                     1_cp37m    conda-forge
pytz                      2020.4             pyhd8ed1ab_0    conda-forge
pyzmq                     19.0.2           py37hb1e8313_1  
readline                  8.0                  h1de35cc_0  
requests                  2.25.0             pyhd3deb0d_0    conda-forge
send2trash                1.5.0                      py_0    conda-forge
setuptools                50.3.1           py37hecd8cb5_1  
simplegeneric             0.8.1                      py_1    conda-forge
six                       1.15.0             pyh9f0ad1d_0    conda-forge
sqlite                    3.33.0               hffcf06c_0  
terminado                 0.9.1            py37hf985489_1    conda-forge
testpath                  0.4.4                      py_0    conda-forge
time                      1.8                  h01d97ff_0    conda-forge
tk                        8.6.10               hb0a8c7a_0  
tornado                   6.1              py37h4b544eb_0    conda-forge
traitlets                 5.0.5                      py_0    conda-forge
urllib3                   1.25.11                    py_0    conda-forge
wcwidth                   0.2.5              pyh9f0ad1d_2    conda-forge
webencodings              0.5.1                      py_1    conda-forge
wheel                     0.35.1             pyhd3eb1b0_0  
xz                        5.2.5                h1de35cc_0  
zeromq                    4.3.3                hb1e8313_3  
zipp                      3.4.0                      py_0    conda-forge
zlib                      1.2.11               h1de35cc_3  

No errors reported.


__Warning log__
Warning (cuda): CUDA driver library cannot be found or no CUDA enabled devices are present.
Exception class: <class 'numba.cuda.cudadrv.error.CudaSupportError'>
Warning (roc): Error initialising ROC: No ROC toolchains found.
Warning (roc): No HSA Agents found, encountered exception when searching: Error at driver init: 

HSA is not currently supported on this platform (darwin).
:
Warning (psutil): psutil cannot be imported. For more accuracy, consider installing it.
--------------------------------------------------------------------------------
If requested, please copy and paste the information between
the dashed (----) lines, or from a given specific section as
appropriate.

=============================================================
IMPORTANT: Please ensure that you are happy with sharing the
contents of the information present, any information that you
wish to keep private you should remove before sharing.
=============================================================

It is odd that you get the same timing for all of them, although that is what I would have expected.

Fionnuala

Hannes · March 3, 2021, 2:38pm

The most obvious difference I see is that you are using an older numba / llvmlite version.

I almost find your timing for the explicit object mode / looplift run suspiciously fast.
On my own machine I tried removing the timer from the jitted function, compiling it in nopython mode, and measuring the time outside the compiled function. That should be very close to optimal, since object mode should slow things down a little.
Even with that setup I cannot get the run time much below 150 ms. I don’t know why the explicit looplifting version would be almost 10 times faster on your machine.

Have you checked if that version actually returns the correct result? Maybe something is buggy and the loop is cut short or similar.

gmarkall · March 3, 2021, 3:04pm

I’m not sure that calling time.time() inside an njitted function can be relied on - optimizations might move code around inside the function so that the code being timed may not correlate with the same code from within the source.

If you move your timing outside the jitted function, do you get more consistent results? (Also, object mode + loop lifting might not be needed if the timing is moved outside the jitted function)
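A minimal sketch of timing from outside the function, as suggested above. The helper name time_call and the stand-in slow_sum are hypothetical (they are not from this thread or from Numba), and only the standard library is used. The untimed warm-up call is there so that, for a jitted function, compilation does not end up in the measurement.

```python
import time


def time_call(func, *args, warmup=1, repeat=5, **kwargs):
    """Call func(*args, **kwargs) and return (result, best_seconds).

    The clock is read by the caller, outside func, so nothing inside
    func needs to call time.time().
    """
    # Untimed warm-up: for a Numba-jitted function the first call
    # would otherwise include compilation.
    for _ in range(warmup):
        func(*args, **kwargs)

    best = float("inf")
    result = None
    for _ in range(repeat):
        t0 = time.perf_counter()
        result = func(*args, **kwargs)
        best = min(best, time.perf_counter() - t0)
    return result, best


def slow_sum(n):
    # Plain-Python stand-in for the function being benchmarked.
    acc = 0.0
    for i in range(n):
        acc += i
    return acc


result, seconds = time_call(slow_sum, 100_000)
print("result={}, best of 5 runs: {:.6f} s".format(result, seconds))
```

With the code from this thread, that would presumably mean dropping the t0 / time.time() lines from compute_mandel_njit, returning only mandel, and then calling something like time_call(compute_mandel_njit_jit1, cr=0.285, ci=0.01, N=500).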

fionnualasolomon · March 4, 2021, 9:28am

We have the same LLVM version, no? 10.0.1? And I have numba 0.51.2, which I had thought was the most up to date.

I also now removed the time.time from within the function and get these times.

Using njitted kernel and jitted compute function
CPU times: user 34.3 ms, sys: 694 µs, total: 35 ms
Wall time: 35.1 ms
Using njitted kernel and njitted compute function
CPU times: user 29 ms, sys: 283 µs, total: 29.3 ms
Wall time: 29.4 ms
Using njitted kernel and jitted compute function in object mode & looplift
CPU times: user 174 ms, sys: 1.33 ms, total: 175 ms
Wall time: 175 ms

These are consistent with what I expect: in the first one jit chooses nopython mode, in the second it is forced to use nopython mode (both have similar times), and the third is forced to use object mode and so is slower.

So I think, despite the warning saying it was running in object mode with looplifting, it was somehow running in nopython mode or at least getting the speed of nopython mode when the time.time was included in the function.

The Mandelbrot set images they all generated were correct too.

fionnualasolomon · March 4, 2021, 9:35am

Hi @gmarkall,

Yes, I did get much more consistent results after removing the time.time (see my other reply to @Hannes).

The time.time in the function was definitely confusing things

Thanks

gmarkall · March 4, 2021, 9:45am

Glad things seem more consistent now

The latest version at the moment is 0.52.0 - though, 0.53.0 should be out in a few days.

Hannes · March 4, 2021, 9:48am

Hi,

seems like Graham’s guess hit dead center - now everything looks right I’d say

Just FYI: numba is currently at version 0.52 with version 0.53RC2 recently released. And I was referring to the version of llvmlite - a small Python wrapper used by numba to interact with LLVM afaik.

EDIT: Graham, you beat me by like 2 minutes

fionnualasolomon · March 4, 2021, 10:25am

oh ok thanks for letting me know (@hannes too). I’m all up to date now and will watch for the new updates. Cheers for the help
