Does Numba support MPI and/or openMP parallelization?

Dear all, Hi!
This is my first post in this mailing list.

I have an MPI-based cluster of processors for which I have been using the mpi4py module.
Now that I am aware of Numba, I would like to know whether Numba supports MPI parallelization. I would also like to know whether Numba supports OpenMP (shared-memory) parallelization.
If the answer to both questions is yes, Numba may give me a very interesting way of writing parallel Python code for both parallel architectures.

I thank you in advance for your answer to this topic.

All the best,
Moshe Goldstein

Hi @goldmosh,

On MPI: not out of the box. Numba can call C functions through ctypes, so you could call MPI functions that way if you wanted to, but it would probably be a lot of work. You might be interested in trying out Dask and its dask.distributed backend; it works well with Numba.
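To illustrate the ctypes route mentioned above, here is a minimal sketch that calls the C math library's `cos` from a Numba-compiled loop. It assumes a POSIX system where `ctypes.util.find_library("m")` locates libm; `sum_cos` is a hypothetical name. Binding MPI functions would follow the same pattern, but handle types such as `MPI_Comm` differ between MPI implementations, which is where the extra work lies.

```python
import ctypes
import ctypes.util

# Load the C math library (assumption: POSIX system with libm available).
libm = ctypes.CDLL(ctypes.util.find_library("m"))
c_cos = libm.cos
# Numba needs argtypes/restype to type a ctypes function in nopython mode.
c_cos.argtypes = [ctypes.c_double]
c_cos.restype = ctypes.c_double

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs as plain Python without Numba.
    def njit(f):
        return f


@njit
def sum_cos(n):
    # Hypothetical example: each iteration calls the C function directly.
    total = 0.0
    for i in range(n):
        total += c_cos(i * 0.001)
    return total
```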

On OpenMP: yes. See "Automatic parallelization with @jit" in the Numba documentation (version 0.52.0); you can even elect to use OpenMP as the threading layer.
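A minimal sketch of that automatic parallelization: `parallel=True` plus `prange` tells Numba that the outer loop's iterations are independent and can be spread across threads. The function name `row_norms` is hypothetical, and the fallback branch only exists so the sketch runs serially where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Fallback so the sketch still runs (serially) without Numba installed.
    prange = range

    def njit(*args, **kwargs):
        # Supports both @njit and @njit(parallel=True) usage.
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda f: f


@njit(parallel=True)
def row_norms(a):
    n = a.shape[0]
    out = np.empty(n)
    # prange marks the loop as parallel; Numba splits its iterations
    # across the threads of the active threading layer.
    for i in prange(n):
        s = 0.0
        for j in range(a.shape[1]):
            s += a[i, j] * a[i, j]
        out[i] = np.sqrt(s)
    return out
```

The threading layer can be chosen before the first parallel call, for example by setting the environment variable `NUMBA_THREADING_LAYER=omp`; after a parallel function has run, `numba.threading_layer()` returns the name of the layer actually in use.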

Hope this helps?

Hi Moshe,

In some previous work at Intel Labs, we built an analytics system called HPAT that used MPI to distribute Pandas and NumPy workloads through Numba to a cluster. That code is still available on the Intel Labs GitHub, but the work has since been taken over by a product team and has become an Intel product called SDC (Scalable Dataframe Compiler). SDC is currently focused on single-node execution but may in the future work on clusters, potentially using MPI. However, the HPAT work is a proof point that you can layer MPI and Numba together.

In terms of OpenMP, Numba's current OpenMP support is limited and hidden behind the Numba threading layer interface. Thus, it is only accessible through Numba's own parallelization features, such as parallel=True, vectorize and guvectorize. However, again at Intel Labs, we are developing a prototype that will allow you to use most of the OpenMP syntax directly in Numba, as you would in C or Fortran. We hope to present a preview of that work at the upcoming SciPy 2021. If you have a license for the Intel C Compiler, you could even become a tester of the prototype. An Intel compiler license is required right now because the prototype uses parts of the C compiler to help provide OpenMP support.
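As one example of reaching the threading layer indirectly, `numba.vectorize` with `target="parallel"` builds a NumPy ufunc whose element-wise work is split across threads. This is a minimal sketch; `hypot_ufunc` is a hypothetical name, and the fallback uses `np.vectorize` (serial, no compilation) only so it runs without Numba.

```python
import math

import numpy as np

try:
    from numba import vectorize
except ImportError:
    # Fallback: serial NumPy vectorization, ignoring signatures and target.
    def vectorize(signatures, target="cpu"):
        return np.vectorize


# Explicit signature is required for target="parallel" in Numba.
@vectorize(["float64(float64, float64)"], target="parallel")
def hypot_ufunc(x, y):
    # Scalar kernel; Numba applies it element-wise across threads.
    return math.sqrt(x * x + y * y)
```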

Todd A. Anderson
Intel Labs

Stuartchibald and DrTodd13, Hi.

Thank you both for your replies. They certainly help me, and I will investigate both of your suggestions.
Thanks.
Moshe

Stuart,
Thank you very much. I will try both.
Moshe

DrTodd13, hi.
Thank you for your reply. I will ask the system administrators at my college (Jerusalem College of Technology) if they have a license for the Intel C Compiler. What about the Intel Fortran compiler?
Moshe

@goldmosh Bodo provides MPI parallelization for data analytics codes in Pandas/Numpy so may be useful for you. What are your applications like? Are they data analytics type that could be written in Pandas/Numpy? Or are they things like multi-dimensional physical simulations in FDM, etc. (which is not Bodo’s target today).

Perhaps numba-mpi (available on PyPI) might be of interest as well.
HTH,
Sylwester

Hi!
Thank you for the information. I will download numba-mpi and try it. The application for which I wanted to try Numba is protein structure prediction, in particular my own algorithm, called DEEPSAM, which I wanted to rewrite in Numba and compare with the current implementation. I am also a member of a research lab called FLEXCOMP (at flexcomp.jct.ac.il), where we developed an embedded language for parallel programming called EFL. I wanted to write a new version of EFL's pre-compiler that generates parallel code for Numba; we already have versions for Python multiprocessing and for mpi4py.
Yours,
Moshe Goldstein

Thanks for the reply, Moshe. numba-mpi currently handles only some basic send/recv operations. Please report any needed additions on the numba-mpi issue tracker on GitHub (atmos-cloud-sim-uj/numba-mpi); we’ll be happy to work on adding more wrappers. It should be straightforward, given that the technicalities are solved and exemplified. Best, Sylwester
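A point-to-point exchange of the kind described above might look like the following minimal sketch. The calls `rank()`, `size()`, `send(data, dest, tag)` and `recv(data, source, tag)` are assumed from the numba-mpi README and may differ between releases; `ping` is a hypothetical name. The stand-in class only exists so the sketch runs as a single process without numba-mpi. With numba-mpi installed, it would be launched with something like `mpiexec -n 2 python script.py`.

```python
import numpy as np

try:
    import numba
    import numba_mpi as mpi
    njit = numba.njit
except ImportError:
    # Single-process stand-in (assumption) so the sketch runs without
    # numba / numba-mpi; send and recv are never reached with one rank.
    class mpi:
        @staticmethod
        def rank():
            return 0

        @staticmethod
        def size():
            return 1

        @staticmethod
        def send(data, dest, tag):
            raise RuntimeError("send needs numba-mpi and more than one rank")

        @staticmethod
        def recv(data, source, tag):
            raise RuntimeError("recv needs numba-mpi and more than one rank")

    def njit(f):
        return f


@njit
def ping(n):
    buf = np.zeros(n)
    if mpi.rank() == 0:
        # Rank 0 fills the buffer and, if a partner exists, sends it.
        for i in range(n):
            buf[i] = 1.0 * i
        if mpi.size() > 1:
            mpi.send(buf, 1, 11)  # destination rank 1, tag 11
    elif mpi.rank() == 1:
        mpi.recv(buf, 0, 11)  # source rank 0, tag 11
    return buf
```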

We have been working on an OpenMP Numba release without the Intel compiler requirement. It is out on GitHub, but right now we have a problem with a couple of the repos being private. This week I’m going to create conda builds so that it can be easily installed through conda. Keep an eye out for an announcement soon about where you can get it!

The prototype is now available for installation with “conda install llvmlite numba -c drtodd13 -c conda-forge --override-channels”. There are a few core things, such as reduction clauses, that aren’t working at the moment, but work on them is in progress and should be done soon. Please give me feedback, @goldmosh.

Dear DrTodd13, Hi.

Thank you very much.

I will try it soon and I will let you know about my experience with it.

Moshe Goldstein

You can use that link to learn the syntax you will need to use the prototype. There are a few examples at github.com/Python-for-HPC/pyomp/examples that you can look at as well.

Reductions are working now.
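A reduction in PyOMP might look like this minimal sketch, modeled on the pi example in the PyOMP README. The import path `numba.openmp` and the name `openmp_context` are taken from that README and may change between prototype releases. Without PyOMP, the fallback ignores the directive and runs the loop serially in plain Python.

```python
try:
    from numba import njit
    from numba.openmp import openmp_context as openmp
except ImportError:
    from contextlib import nullcontext

    # Fallback: no PyOMP, so run as plain Python.
    def njit(f):
        return f

    def openmp(directive):
        # The directive string is ignored; the block runs serially.
        return nullcontext()


@njit
def calc_pi(num_steps):
    step = 1.0 / num_steps
    red_sum = 0.0
    # reduction(+:red_sum): each thread accumulates a private copy that
    # OpenMP combines with + when the loop ends.
    with openmp("parallel for reduction(+:red_sum) schedule(static)"):
        for j in range(num_steps):
            x = (j + 0.5) * step  # midpoint of interval j
            red_sum += 4.0 / (1.0 + x * x)
    return step * red_sum
```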

Thank you very much for the update.

I appreciate that very much.

Can you comment on PyOMP’s status? It seems that activity on the GitHub repository Python-for-HPC/PyOMP (OpenMP for Python in Numba) ceased 2 months ago.

Development is still ongoing (although slow for the holidays). We are mainly working from a fork of that repository at the moment and we plan to do periodic updates of the repo you mentioned. However, given the complexity of building PyOMP, we suggest that all users of PyOMP get it from the conda install command that I previously gave. If you want to be more involved in the development of PyOMP then let me know. Given that parallel for, SPMD style and tasking are working, our next target (pun intended) is the target directive to allow GPU offload. Development of target offload capabilities is the current ongoing PyOMP work.

Thank you for responding. I’m at the research stage of comparing compiled extensions for Python as a way to implement performance-critical pieces of code. PyOMP caught my eye as a good candidate, since I have some familiarity with OpenMP in C++. I just wanted to make sure the project is still alive before I do some testing. I’m also curious how well it performs compared to Data Parallel Python from oneAPI.

We don’t have target working yet so no performance numbers for GPU comparison. Are you also looking at regular Numba parallel=True? Can you say more about your criteria? CPU or GPU? One node or distributed? We also have a new distributed array package that uses parallel=True on multinode.