2020 Roadmap Discussion and Planning

We said on the call last week that we would use Discourse to formulate a new project roadmap. I know I at least would really appreciate clarity on what would be meaningful to work on for the overall project vision. How will we go about this? Could we use this thread to brainstorm ideas and continue the conversation from the call?

Hi. On last week’s call there was a lot of discussion of improving usability (better error messages, communicating how Numba should be used), which I think is very good, but there was very little discussion of compilation performance. I’m not sure how big an issue this is for others, but I have frequently seen compilation take longer than running the actual function. Numba’s caching also still has several issues, especially when jitted functions call other jitted functions (missing Environment errors), along with the other limitations listed here: http://numba.pydata.org/numba-doc/latest/developer/caching.html#caching-limitations
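
For concreteness, the caching in question is the opt-in, per-function kind (a minimal example):

from numba import njit

# Opt in to on-disk caching: the compiled machine code is saved alongside
# the source file and reused on later runs, so the compile cost is paid
# once per machine rather than once per process.
@njit(cache=True)
def total(arr):
    s = 0.0
    for x in arr:
        s += x
    return s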

So I’m wondering where making caching more robust is on the roadmap.

Thanks

I think we were going to come up with a list of user profiles, to help the discussion on who needs what.

I can get the ball rolling on this front using some profiles that were mentioned in the call.

  • Basic user: is proficient, but not advanced, in Python. Needs to accelerate a few functions. Might not know a compiled language, so many Numba concepts (types, overloading) might be new. Very likely to be working in a Jupyter notebook. E.g. a data scientist, or a scientist in a computational field.

  • Intermediate user: is very good at Python, and most likely knows other, compiled languages. Typically works on a large number of functions within an application, library or framework. Is able to work with the high-level Numba extensions (e.g. overload), but not the low-level ones.

  • Advanced user: is very good at Python and another low-level language (C/C++). Understands low-level Numba constructs like intrinsics and the type system. Works on an application, library or framework, sometimes using Numba compiler machinery directly without necessarily using njit.

From there we can discuss what their needs are, for example:

  • Basic user:

    • simple-to-understand, out-of-the-box acceleration of functions (compilation and parallelization; see the sketch after this list).
    • clear error messages.
    • documentation (including plenty of examples) to grow their knowledge of Numba and of writing high-performance code (what’s possible and what isn’t, using the right data structure in the right place, how to write fast loops).
    • large coverage of NumPy, SciPy and Pandas functions.
    • a forum to interact with more experienced users.
  • Intermediate user:

    • Ability to use a large subset of common Python patterns, and patterns from compiled languages, to generate efficient code.
    • Ability to customize behaviour (via types, njit parameters, etc.).
    • For large applications, compilation time and caching might be an issue; for demanding applications, the efficiency of the lowered code.
    • For library builders looking to speed up their internal code: performance, ease of distribution, and Numba not creating a bad user experience. Some type of jitted objects (either the current jitclass or something new).
    • For library builders looking to create “Numba native” libraries: ease of creating new types.
    • For library builders integrating with existing numerical libraries: the ability to call arbitrary Cython/C/C++/Fortran code.
    • For framework builders: the ability to manipulate input user functions and classes, and compilation time.
    • A forum to interact with other users (at their level and above).
  • Advanced user (I’m not one, so the following is just a guess):

    • Customization of compiler passes.
    • Ability to re-use some parts of Numba (the bytecode-to-IR or IR-to-LLVM machinery).
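
To illustrate the first basic-user point above, the out-of-the-box experience looks roughly like this (a sketch; the function and sizes are made up):

import numpy as np
from numba import njit, prange

# Decorate a plain NumPy-style function and get compilation plus
# parallelization without changing the body.
@njit(parallel=True)
def mean_pairwise_distance(points):
    n = points.shape[0]
    total = 0.0
    for i in prange(n):  # outer loop runs in parallel; total is a reduction
        for j in range(n):
            d = 0.0
            for k in range(points.shape[1]):
                d += (points[i, k] - points[j, k]) ** 2
            total += np.sqrt(d)
    return total / (n * n)

points = np.random.rand(500, 3)
print(mean_pairwise_distance(points))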

The core devs have been discussing the roadmap in our weekly meetings. We identified a few issues that currently make it quite hard to define a concrete roadmap, specifically:

  • The current core team is small. There are 5 core developers, and they also have other roles in their respective organizations.
  • Maintaining Numba is time-consuming. Numba supports many platforms, and maintaining the infrastructure to support Numba development consumes a significant amount of the core developers’ time.
  • Many enhancements are open research, so it is hard to estimate time to completion.

Growing the number of Numba developers and maintainers seems necessary to keep up with the demand on Numba. We will be taking action to encourage both community and industry contributors to take part in the maintainer role. A lot of this is still in the planning phase; I will provide updates as we have more information.


This would be nice.

I’m currently looking for a way to write algorithms in Numba that tightly interface with PyTorch and PyBullet without having to pop back up into Python using an objmode context manager (which often slows things down so much that it cancels out the speedup from the Numba-compiled code). Since PyTorch and PyBullet both offer C++ frontends, if Numba could interface directly with C++, that would seem to solve the issue.
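
For context, the pattern in question looks roughly like this (the PyTorch call is purely illustrative):

import numpy as np
import torch
from numba import njit, objmode

@njit
def policy_step(state):
    # Each objmode block pops back into the interpreter: the arguments are
    # boxed into Python objects, the PyTorch frontend runs, and the result
    # is unboxed again -- that round trip is the overhead in question.
    with objmode(action='float64[:]'):
        action = torch.tanh(torch.from_numpy(state)).numpy()
    return action

print(policy_step(np.zeros(4)))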

If there is a better way to approach this issue, I’d be interested in any suggestions.

I could see this feature being useful to many others as well. In my experience, having to pop back up into Python just to interface with other libraries that essentially wrap C++ code is the biggest bottleneck I face when trying to use Numba to speed up existing code, and it often results in the Numba version being slower than the original because of the added overhead of the context switching.

Reduce objmode and unpickling overhead by sklam · Pull Request #5922 · numba/numba · GitHub will hopefully solve some of the slowness with objmode.


@sklam, nice. Looking forward to trying it out.

@elliotwaite @luk-f-a The ability to call “arbitrary” native library functions is an area that I am investigating as well. At this point it is mostly at a very high-level design/requirements-gathering stage. It would be very useful to me if you could share some real-world use cases demonstrating the requirement.

@diptorupd, this would be fantastic, thanks for looking into it.

A simple example comes from an optimization library I had to use last year, https://github.com/whuang08/ROPTLIB. It’s written in C++ and I wanted to integrate it into my Python code. I had to write a wrapper in Cython, and even then it was not available in jitted code: Cython functions can be called from jitted code (a sketch of that path follows the example below), but not Cython extension classes. The dream case would be to be able to point Numba at the C++ library’s source code and instantiate classes from that library:

from numba import njit

@njit
def foo():
    myobj = MyCppClass()  # hypothetical: a class defined in the C++ library
    return myobj.run_calculation()
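
As an aside on the “Cython functions can be called” part: that path works today via numba.extending.get_cython_function_address plus ctypes, e.g. against scipy.special (a sketch of the pattern from the Numba docs):

import ctypes
from numba import njit
from numba.extending import get_cython_function_address

# scipy.special exposes its Cython-level functions through __pyx_capi__;
# grab the address of j0 (Bessel function of the first kind, order 0) and
# wrap it as a ctypes function pointer that nopython code can call directly.
addr = get_cython_function_address("scipy.special.cython_special", "j0")
functype = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)
cy_j0 = functype(addr)

@njit
def j0_sum(xs):
    s = 0.0
    for x in xs:
        s += cy_j0(x)
    return s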

Maybe some of the work done for cppyy (https://cppyy.readthedocs.io/en/latest/) could be useful? Since cling compiles to LLVM IR, maybe there’s a way to get Numba to use C++ objects. At least for my use case, the objects would not need to leave the jitted “world”.
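
To make the cppyy reference concrete, this is roughly what cppyy already provides at the Python level today, with no Numba involved:

import cppyy

# cling JIT-compiles the C++ source on the fly and exposes the class to Python.
cppyy.cppdef("""
struct Accumulator {
    double total = 0.0;
    void add(double x) { total += x; }
};
""")

acc = cppyy.gbl.Accumulator()
acc.add(1.5)
acc.add(2.5)
print(acc.total)  # 4.0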

A more complex example comes from quantitative finance. There’s a very well-known library called QuantLib (https://www.quantlib.org/download.shtml). The use case is similar: “how to instantiate C++ objects inside jitted code”. The impact is, of course, larger because that library has thousands of users.

I guess an important distinction is whether a library is available as source code or only as an already-compiled binary. I don’t know which use case is easier or harder to support. In the cases I know of, I always had access to the source code.

Happy to refine the use cases and bounce ideas if it helps.

Luk

I wasn’t aware of cppyy before; it looks interesting. Another library that I know of is dragonffi (https://github.com/aguinet/dragonffi), but that’s C only.

What do you expect to happen if you remove the @njit decorator? The issue we want to avoid is making the code Numba-specific so that it fails in normal Python mode. That is why my take has been that calling a native library needs to go via a Python FFI interface, but Numba should somehow infer, or be informed by the library writer, that it can compile the Python FFI call directly into a native library call, avoiding the FFI cruft where possible.
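
For plain C functions, Numba already behaves this way with ctypes/cffi: the same code runs through the FFI in normal Python, while the jitted version is lowered to a direct native call. A minimal sketch using libm’s cos (the library lookup assumes a Unix-like system):

import ctypes
import ctypes.util
from numba import njit

# Declare the C signature so both ctypes and Numba know how to call it.
libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

def use_cos(x):
    return libm.cos(x)  # works identically with or without Numba

jitted = njit(use_cos)
print(use_cos(0.0), jitted(0.0))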

Of course the devil is in the details. For example, how do you handle boxing/unboxing of types?

You make a great point; I hadn’t considered what would happen without njit. Maybe fall back to cppyy? Or to an equivalent library that provides Python-object-level access (not a small task, of course).

@diptorupd, my use case is that I want to interface with PyTorch and PyBullet, which both have separate Python and C++ frontends. Ideally I would want both libraries to add Numba support so that I could interface with them as easily as I can with NumPy, but until then, I was thinking that if I could interface directly with their C++ libraries, that might be a workable way to avoid the slowdown of going through their slower Python frontends.

The types of things I would want to do with these libraries are similar to the example given by @luk-f-a; a sketch of how they would typically be used together follows the loop outline below.

A concrete, real-world example would be training a PyTorch neural network using the PyBullet physics simulator as the environment, which would have a tight loop that looks something like this:

  1. Get the state data of the physics engine.
  2. Pass that state data into the neural network and get back the forces that should be applied during the next step of the physics engine.
  3. Pass the forces data to the physics engine and advance the physics engine one step forward in time.
  4. Repeat.
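
In plain (unjitted) Python, that loop looks roughly like the sketch below; the robot model, joint handling and network sizes are all placeholder choices:

import pybullet
import pybullet_data
import torch

# Placeholder environment: any URDF with controllable joints would do.
pybullet.connect(pybullet.DIRECT)
pybullet.setAdditionalSearchPath(pybullet_data.getDataPath())
robot = pybullet.loadURDF("r2d2.urdf")
joints = list(range(pybullet.getNumJoints(robot)))

# Tiny policy network: a (position, velocity) pair per joint in,
# one torque per joint out.
policy = torch.nn.Sequential(
    torch.nn.Linear(2 * len(joints), 32),
    torch.nn.Tanh(),
    torch.nn.Linear(32, len(joints)),
)

for step in range(1000):
    # 1. Get the state data out of the physics engine.
    state = [v for j in joints for v in pybullet.getJointState(robot, j)[:2]]
    # 2. Run the network to get the forces for the next step.
    with torch.no_grad():
        forces = policy(torch.tensor(state)).tolist()
    # 3. Apply the forces and advance the simulation one step.
    for j, f in zip(joints, forces):
        pybullet.setJointMotorControl2(robot, j, pybullet.TORQUE_CONTROL, force=f)
    pybullet.stepSimulation()

Every iteration crosses the Python/C++ boundary several times, which is exactly the overhead I would like to avoid.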

The cppyy fallback idea seems like an elegant solution. I would also be okay with the C++ interface part of the code being Numba-specific: only working when the @njit decorator is present, and raising an error otherwise.

To bring a balanced view, I’ll attempt to be my own devil’s advocate and explore alternatives to the “dream use case” of being able to directly use C++ classes.

When I used the optimization library, I didn’t know about objmode. If a Cython wrapper exists (or when using cppyy), then long-running tasks (1 second or more) can be used quite comfortably with Numba’s objmode, and even more comfortably if this PR were merged (https://github.com/numba/numba/pull/3640). That’s because optimization libraries already include their own loops, so it’s unusual to use them for very short tasks.

In the case of QuantLib, which provides individual objects that the user must put in their own loop, the overhead of objmode will be more of a problem. One alternative would be to write the innermost loop in Cython and then call the Cython function/object from objmode. An improvement on that would be to allow Numba to talk to Cython/SWIG objects without passing through Python (in Cython, those are cdef and cpdef methods).

Another example I forgot to mention before, and one that is more universally useful, is the new NumPy random number generator. Its C implementation is based on structs, which are accessible using CFFI (https://numpy.org/doc/stable/reference/random/extending.html#cffi) but cannot be used in Numba.
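
To make the gap concrete: the function pointers the bit generators export can already be driven from jitted code (a rough sketch modelled on the NumPy docs example, assuming the cffi package is installed); it is the struct-level access that stays out of reach:

import numpy as np
from numba import njit
from numpy.random import PCG64

bit_gen = PCG64()
# Function pointer and state address exported through the cffi interface.
next_double = bit_gen.cffi.next_double
state_addr = bit_gen.cffi.state_address

@njit
def uniforms(n):
    out = np.empty(n)
    for i in range(n):
        out[i] = next_double(state_addr)
    return out

print(uniforms(5))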