I’m currently using Numba version 0.60.0 on a supercomputer running the CentOS operating system. Each node has 56 cores.
I want Numba to utilize more cores for parallel computing with prange. So, I set the environment variable using the following command: export NUMBA_NUM_THREADS=224
After submitting the program, I received the following warning: TBB Warning: The number of workers is currently limited to 55. The request for 223 workers is ignored. Further requests for more workers will be silently ignored until the limit changes.
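For reference, here is a minimal sketch of how the effective limit can be inspected and lowered from Python (only the standard numba API is used; the value 28 is just an example, not something from my job):

```python
import numba

# NUMBA_NUM_THREADS as picked up when Numba was imported
print(numba.config.NUMBA_NUM_THREADS)

# number of threads the parallel backend will actually use; it can be
# lowered at runtime with set_num_threads, but never raised above
# NUMBA_NUM_THREADS
print(numba.get_num_threads())
numba.set_num_threads(28)   # e.g. restrict to half of a 56-core node
print(numba.get_num_threads())
```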
I suspect it’s related to the threading layer, so I used print("Threading layer chosen: %s" % threading_layer()) to check which threading layer had been selected.
However, I got the following error: ValueError: Threading layer is not initialized.
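If I understand the documentation correctly, threading_layer() can only report something after the threading layer has actually started, which happens the first time a parallel=True function executes. A minimal sketch of how such a check could look (warm_up is just a placeholder kernel, not part of my real program):

```python
from numba import njit, prange, threading_layer

# A trivial parallel function; executing it once forces Numba to
# select and initialize the threading layer.
@njit(parallel=True)
def warm_up(n):
    acc = 0.0
    for i in prange(n):
        acc += 0.5 * i
    return acc

warm_up(1000)

print("Threading layer chosen: %s" % threading_layer())
```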
May I ask if I can use Numba for parallel computing across nodes?
Numba is single-node. It sounds like you have 4 nodes and were hoping Numba would use all 4 if you gave it their combined core count, but Numba is strictly single-node, so that won’t work.

For multi-node work, if you want to keep Numba within the node, you can do mpi4py across the nodes and Numba within the node (a rough sketch of that split is below). Another option is mpi4py across the nodes and PyOMP within the node. PyOMP is a Numba extension that implements a fairly full OpenMP experience in Numba, so you get the most common HPC programming model: MPI across nodes and OpenMP within the node. If you don’t want two separate paradigms between and within nodes, you can use something like Dask or Ramba (which I also worked on and isn’t maintained anymore, but you could still try it).
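A rough sketch of the mpi4py-across-nodes, prange-within-node split (the kernel, problem size, and reduction here are placeholders, not your code):

```python
# Launch with one MPI rank per node, e.g.:
#   mpirun -n 4 python hybrid.py
import numpy as np
from mpi4py import MPI
from numba import njit, prange

@njit(parallel=True)
def do_chunk(x):
    # within-node parallelism: prange spreads this loop over the
    # node's cores (up to NUMBA_NUM_THREADS)
    out = np.empty_like(x)
    for i in prange(x.shape[0]):
        out[i] = 2.0 * x[i]
    return out

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# split a 1-D problem evenly across ranks (i.e. across nodes)
n_total = 1_000_000
counts = [n_total // size + (1 if r < n_total % size else 0) for r in range(size)]
start = sum(counts[:rank])
local = np.arange(start, start + counts[rank], dtype=np.float64)

local_out = do_chunk(local)

# combine per-node partial results on rank 0
total = comm.reduce(local_out.sum(), op=MPI.SUM, root=0)
if rank == 0:
    print("global sum:", total)
```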
My task involves a main loop in which each step depends on the results of the previous step. Within each step of this main loop there is a sub-loop, and that sub-loop can be parallelized.
To parallelize the sub-loop at each step, I’ve used numba.prange; the overall structure is roughly what the sketch below shows. Given that each step depends on the previous one, I think mpi4py might not be a suitable choice.
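Roughly, the structure looks like this (the update rule, sizes, and names are placeholders, not my actual computation):

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def sub_step(state):
    # sub-loop: the elements are independent within one step,
    # so prange can parallelize it safely
    new_state = np.empty_like(state)
    for i in prange(state.shape[0]):
        new_state[i] = 0.5 * (state[i] + state[i] ** 2)  # placeholder update
    return new_state

def run(n_steps, n_elems):
    state = np.ones(n_elems)
    # main loop: step k+1 needs the result of step k, so it stays sequential
    for _ in range(n_steps):
        state = sub_step(state)
    return state

print(run(100, 10_000)[:5])
```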