Hi guys!
I’m trying to run my numba code on our cluster, which is made up of several nodes managed by PBS.
I installed anaconda in user space and created an environment with all of the libraries I need. The shell is tcsh, so I configured the base environment by sourcing conda.csh from profile.d in my .tcshrc.
I run my script by typing:
setenv NUMBA_NUM_THREADS 12
conda activate myenv
python myscript.py
conda deactivate
When I run it on the cluster front end (for testing only), it works fine: 24 CPUs are used and the script terminates.
This is where the problems start!
When I submit the job via qsub, it seems to use all of the allocated resources without ever finishing, so I tried running the code directly on a free node of the cluster. I found that, regardless of the NUMBA_NUM_THREADS variable, all of the CPUs of the cluster node are saturated! Furthermore, the script never completes the job!
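To narrow this down, here is a minimal diagnostic sketch that could be run both on the front end and inside the PBS job. It is only an illustration: the helper name thread_report is hypothetical, and checking OMP_NUM_THREADS and MKL_NUM_THREADS is an assumption on my part (the environment uses an MKL-linked NumPy, which keeps its own thread pool), not something confirmed as the cause.

```python
import os

# Environment variables that commonly cap the number of worker threads.
# Only NUMBA_NUM_THREADS appears in the original post; the other two are
# assumptions added for illustration.
THREAD_VARS = ("NUMBA_NUM_THREADS", "OMP_NUM_THREADS", "MKL_NUM_THREADS")


def thread_report(environ=os.environ):
    """Collect settings that bound CPU use for the current Python process."""
    report = {name: environ.get(name) for name in THREAD_VARS}
    report["os.cpu_count"] = os.cpu_count()

    # sched_getaffinity is Linux-only; it reflects cpuset/taskset limits
    # that a batch system may impose on the job.
    if hasattr(os, "sched_getaffinity"):
        report["affinity_cpus"] = len(os.sched_getaffinity(0))
    else:
        report["affinity_cpus"] = None

    # Numba reads NUMBA_NUM_THREADS once, at import time, so this value
    # shows what Numba actually picked up from the environment.
    try:
        import numba
        report["numba.config.NUMBA_NUM_THREADS"] = numba.config.NUMBA_NUM_THREADS
    except ImportError:
        report["numba.config.NUMBA_NUM_THREADS"] = "numba not importable"

    return report


if __name__ == "__main__":
    for key, value in thread_report().items():
        print(f"{key}: {value}")
```

Comparing the output from the front end with the output from a compute node should show whether NUMBA_NUM_THREADS actually reaches the Python process in each case.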
What’s going on?
Thanks for your help.
Ivan
[EDIT]
On the front end:
(py39) frontend /home/user 105 % numba -s
System info:
--------------------------------------------------------------------------------
__Time Stamp__
Report started (local time) : 2021-07-03 13:42:37.380803
UTC start time : 2021-07-03 11:42:37.381489
Running time (s) : 7.433437
__Hardware Information__
Machine : x86_64
CPU Name : broadwell
CPU Count : 20
Number of accessible CPUs : 32
List of accessible CPUs cores : 0-191
CFS Restrictions (CPUs worth of runtime) : None
CPU Features : 64bit adx aes avx avx2 bmi bmi2
cmov cx16 cx8 f16c fma fsgsbase
fxsr invpcid lzcnt mmx movbe
pclmul popcnt prfchw rdrnd rdseed
rtm sahf sse sse2 sse3 sse4.1
sse4.2 ssse3 xsave xsaveopt
Memory Total (MB) : 257541
Memory Available (MB) : 243705
__OS Information__
Platform Name : Linux-3.10.0-957.el7.x86_64-x86_64-with-glibc2.17
Platform Release : 3.10.0-957.el7.x86_64
OS Name : Linux
OS Version : #1 SMP Thu Oct 4 20:48:51 UTC 2018
OS Specific Version : ?
Libc Version : glibc 2.17
__Python Information__
Python Compiler : GCC 7.5.0
Python Implementation : CPython
Python Version : 3.9.5
Python Locale : en_US.UTF-8
__LLVM Information__
LLVM Version : 10.0.1
__CUDA Information__
CUDA Device Initialized : False
CUDA Driver Version : ?
CUDA Detect Output:
None
CUDA Libraries Test Output:
None
__ROC information__
ROC Available : False
ROC Toolchains : None
HSA Agents Count : 0
HSA Agents:
None
HSA Discrete GPUs Count : 0
HSA Discrete GPUs : None
__SVML Information__
SVML State, config.USING_SVML : True
SVML Library Loaded : True
llvmlite Using SVML Patched LLVM : True
SVML Operational : True
__Threading Layer Information__
TBB Threading Layer Available : True
+-->TBB imported successfully.
OpenMP Threading Layer Available : True
+-->Vendor: GNU
Workqueue Threading Layer Available : True
+-->Workqueue imported successfully.
__Numba Environment Variable Information__
None found.
__Conda Information__
Conda Build : 3.21.4
Conda Env : 4.10.3
Conda Platform : linux-64
Conda Python Version : 3.8.10.final.0
Conda Root Writable : True
__Installed Packages__
_libgcc_mutex 0.1 main
_openmp_mutex 4.5 1_gnu
blas 1.0 mkl
ca-certificates 2021.5.25 h06a4308_1
certifi 2021.5.30 py39h06a4308_0
cudatoolkit 11.0.221 h6bb024c_0
cycler 0.10.0 py39h06a4308_0
dbus 1.13.18 hb2f20db_0
expat 2.4.1 h2531618_2
fontconfig 2.13.1 h6c09931_0
freetype 2.10.4 h5ab3b9f_0
glib 2.68.2 h36276a3_0
gst-plugins-base 1.14.0 h8213a91_2
gstreamer 1.14.0 h28cd5cc_2
icu 58.2 he6710b0_3
intel-openmp 2021.2.0 h06a4308_610
jpeg 9b h024ee3a_2
kiwisolver 1.3.1 py39h2531618_0
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.35.1 h7274673_9
libffi 3.3 he6710b0_2
libgcc-ng 9.3.0 h5101ec6_17
libgfortran-ng 7.5.0 ha8ba4b0_17
libgfortran4 7.5.0 ha8ba4b0_17
libgomp 9.3.0 h5101ec6_17
libllvm10 10.0.1 hbcb73fb_5
libpng 1.6.37 hbc83047_0
libstdcxx-ng 9.3.0 hd4cf53a_17
libtiff 4.2.0 h85742a9_0
libuuid 1.0.3 h1bed415_2
libwebp-base 1.2.0 h27cfd23_0
libxcb 1.14 h7b6447c_0
libxml2 2.9.12 h03d6c58_0
llvmlite 0.36.0 py39h612dafd_4
lz4-c 1.9.3 h2531618_0
matplotlib 3.3.4 py39h06a4308_0
matplotlib-base 3.3.4 py39h62a2d02_0
mkl 2021.2.0 h06a4308_296
mkl-service 2.3.0 py39h27cfd23_1
mkl_fft 1.3.0 py39h42c9631_2
mkl_random 1.2.1 py39ha9443f7_2
ncurses 6.2 he6710b0_1
numba 0.53.1 py39ha9443f7_0
numpy 1.20.2 py39h2d18471_0
numpy-base 1.20.2 py39hfae3a4d_0
olefile 0.46 py_0
openssl 1.1.1k h27cfd23_0
pcre 8.45 h295c915_0
pillow 8.2.0 py39he98fc37_0
pip 21.1.3 py39h06a4308_0
pyparsing 2.4.7 pyhd3eb1b0_0
pyqt 5.9.2 py39h2531618_6
python 3.9.5 h12debd9_4
python-dateutil 2.8.1 pyhd3eb1b0_0
qt 5.9.7 h5867ecd_1
readline 8.1 h27cfd23_0
scipy 1.6.2 py39had2a1c9_1
setuptools 52.0.0 py39h06a4308_0
sip 4.19.13 py39h2531618_0
six 1.16.0 pyhd3eb1b0_0
sqlite 3.36.0 hc218d9a_0
tbb 2020.3 hfd86e86_0
tk 8.6.10 hbc83047_0
tornado 6.1 py39h27cfd23_0
tzdata 2021a h52ac0ba_0
wheel 0.36.2 pyhd3eb1b0_0
xz 5.2.5 h7b6447c_0
zlib 1.2.11 h7b6447c_3
zstd 1.4.9 haebb681_0
No errors reported.
__Warning log__
Warning (cuda): CUDA driver library cannot be found or no CUDA enabled devices are present.
Exception class: <class 'numba.cuda.cudadrv.error.CudaSupportError'>
Warning (roc): Error initialising ROC: No ROC toolchains found.
Warning (roc): No HSA Agents found, encountered exception when searching: Error at driver init:
NUMBA_HSA_DRIVER /opt/rocm/lib/libhsa-runtime64.so is not a valid file path. Note it must be a filepath of the .so/.dll/.dylib or the driver:
Warning (psutil): psutil cannot be imported. For more accuracy, consider installing it.
--------------------------------------------------------------------------------
If requested, please copy and paste the information between
the dashed (----) lines, or from a given specific section as
appropriate.
=============================================================
IMPORTANT: Please ensure that you are happy with sharing the
contents of the information present, any information that you
wish to keep private you should remove before sharing.
=============================================================
On the cluster node:
(py39) [user@cn013 ~]$ numba -s
System info:
--------------------------------------------------------------------------------
__Time Stamp__
Report started (local time) : 2021-07-03 13:43:15.883785
UTC start time : 2021-07-03 11:43:15.884534
Running time (s) : 4.449655
__Hardware Information__
Machine : x86_64
CPU Name : broadwell
CPU Count : 36
Number of accessible CPUs : 32
List of accessible CPUs cores : 0-191
CFS Restrictions (CPUs worth of runtime) : None
CPU Features : 64bit adx aes avx avx2 bmi bmi2
cmov cx16 cx8 f16c fma fsgsbase
fxsr invpcid lzcnt mmx movbe
pclmul popcnt prfchw rdrnd rdseed
rtm sahf sse sse2 sse3 sse4.1
sse4.2 ssse3 xsave xsaveopt
Memory Total (MB) : 257845
Memory Available (MB) : 253311
__OS Information__
Platform Name : Linux-3.10.0-957.el7.x86_64-x86_64-with-glibc2.17
Platform Release : 3.10.0-957.el7.x86_64
OS Name : Linux
OS Version : #1 SMP Thu Oct 4 20:48:51 UTC 2018
OS Specific Version : ?
Libc Version : glibc 2.17
__Python Information__
Python Compiler : GCC 7.5.0
Python Implementation : CPython
Python Version : 3.9.5
Python Locale : en_US.UTF-8
__LLVM Information__
LLVM Version : 10.0.1
__CUDA Information__
CUDA Device Initialized : False
CUDA Driver Version : ?
CUDA Detect Output:
None
CUDA Libraries Test Output:
None
__ROC information__
ROC Available : False
ROC Toolchains : None
HSA Agents Count : 0
HSA Agents:
None
HSA Discrete GPUs Count : 0
HSA Discrete GPUs : None
__SVML Information__
SVML State, config.USING_SVML : False
SVML Library Loaded : False
llvmlite Using SVML Patched LLVM : True
SVML Operational : False
__Threading Layer Information__
TBB Threading Layer Available : True
+-->TBB imported successfully.
OpenMP Threading Layer Available : True
+-->Vendor: GNU
Workqueue Threading Layer Available : True
+-->Workqueue imported successfully.
__Numba Environment Variable Information__
None found.
__Conda Information__
Conda Build : 3.21.4
Conda Env : 4.10.3
Conda Platform : linux-64
Conda Python Version : 3.8.10.final.0
Conda Root Writable : True
__Installed Packages__
_libgcc_mutex 0.1 main
_openmp_mutex 4.5 1_gnu
blas 1.0 mkl
ca-certificates 2021.5.25 h06a4308_1
certifi 2021.5.30 py39h06a4308_0
cudatoolkit 11.0.221 h6bb024c_0
cycler 0.10.0 py39h06a4308_0
dbus 1.13.18 hb2f20db_0
expat 2.4.1 h2531618_2
fontconfig 2.13.1 h6c09931_0
freetype 2.10.4 h5ab3b9f_0
glib 2.68.2 h36276a3_0
gst-plugins-base 1.14.0 h8213a91_2
gstreamer 1.14.0 h28cd5cc_2
icu 58.2 he6710b0_3
intel-openmp 2021.2.0 h06a4308_610
jpeg 9b h024ee3a_2
kiwisolver 1.3.1 py39h2531618_0
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.35.1 h7274673_9
libffi 3.3 he6710b0_2
libgcc-ng 9.3.0 h5101ec6_17
libgfortran-ng 7.5.0 ha8ba4b0_17
libgfortran4 7.5.0 ha8ba4b0_17
libgomp 9.3.0 h5101ec6_17
libllvm10 10.0.1 hbcb73fb_5
libpng 1.6.37 hbc83047_0
libstdcxx-ng 9.3.0 hd4cf53a_17
libtiff 4.2.0 h85742a9_0
libuuid 1.0.3 h1bed415_2
libwebp-base 1.2.0 h27cfd23_0
libxcb 1.14 h7b6447c_0
libxml2 2.9.12 h03d6c58_0
llvmlite 0.36.0 py39h612dafd_4
lz4-c 1.9.3 h2531618_0
matplotlib 3.3.4 py39h06a4308_0
matplotlib-base 3.3.4 py39h62a2d02_0
mkl 2021.2.0 h06a4308_296
mkl-service 2.3.0 py39h27cfd23_1
mkl_fft 1.3.0 py39h42c9631_2
mkl_random 1.2.1 py39ha9443f7_2
ncurses 6.2 he6710b0_1
numba 0.53.1 py39ha9443f7_0
numpy 1.20.2 py39h2d18471_0
numpy-base 1.20.2 py39hfae3a4d_0
olefile 0.46 py_0
openssl 1.1.1k h27cfd23_0
pcre 8.45 h295c915_0
pillow 8.2.0 py39he98fc37_0
pip 21.1.3 py39h06a4308_0
pyparsing 2.4.7 pyhd3eb1b0_0
pyqt 5.9.2 py39h2531618_6
python 3.9.5 h12debd9_4
python-dateutil 2.8.1 pyhd3eb1b0_0
qt 5.9.7 h5867ecd_1
readline 8.1 h27cfd23_0
scipy 1.6.2 py39had2a1c9_1
setuptools 52.0.0 py39h06a4308_0
sip 4.19.13 py39h2531618_0
six 1.16.0 pyhd3eb1b0_0
sqlite 3.36.0 hc218d9a_0
tbb 2020.3 hfd86e86_0
tk 8.6.10 hbc83047_0
tornado 6.1 py39h27cfd23_0
tzdata 2021a h52ac0ba_0
wheel 0.36.2 pyhd3eb1b0_0
xz 5.2.5 h7b6447c_0
zlib 1.2.11 h7b6447c_3
zstd 1.4.9 haebb681_0
No errors reported.
__Warning log__
Warning (cuda): CUDA driver library cannot be found or no CUDA enabled devices are present.
Exception class: <class 'numba.cuda.cudadrv.error.CudaSupportError'>
Warning (roc): Error initialising ROC: No ROC toolchains found.
Warning (roc): No HSA Agents found, encountered exception when searching: Error at driver init:
NUMBA_HSA_DRIVER /opt/rocm/lib/libhsa-runtime64.so is not a valid file path. Note it must be a filepath of the .so/.dll/.dylib or the driver:
Warning (psutil): psutil cannot be imported. For more accuracy, consider installing it.
--------------------------------------------------------------------------------