How to build complex data structures with a working JIT cache?

Hi dear friends!

Simple functions in Numba work excellently and very fast — as long as everything boils down to processing NumPy arrays.

But I’m developing a schedule optimizer for production, and I need more complex data structures: at least (1) an array of NumPy arrays of different lengths, and (2) many named NumPy arrays representing graph structures.

As soon as I try to use a JIT class, the built-in kernel caching stops working, even if I just pass the JIT class as a parameter to a function. There are no warnings, but timing shows that it compiles every time. I tried Numba 0.56.4 and Numba 0.60.
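A minimal sketch of what I mean (names are just for illustration):

import numpy as np
from numba import njit, float64
from numba.experimental import jitclass

@jitclass([("values", float64[:])])
class Bag:
    def __init__(self, values):
        self.values = values

@njit(cache=True)  # cache=True seems to be silently ignored once a jitclass is involved
def total(bag):
    return bag.values.sum()

total(Bag(np.ones(10)))  # in my timings, this compiles again in every fresh process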

I also tried using typed lists. For some reason, JIT caching also stops working. Moreover, typed lists somehow work very slowly at runtime. For instance, even adding an element to the end is slower than fully recreating a NumPy array, and accessing elements in a typed list is slower than for reflected lists.
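Roughly what I tried with typed lists, as a ragged "array of arrays":

import numpy as np
from numba import njit
from numba.typed import List

@njit(cache=True)
def total_len(ragged):
    n = 0
    for a in ragged:
        n += a.shape[0]
    return n

ragged = List()  # a typed list can hold float64 arrays of different lengths
ragged.append(np.zeros(3))
ragged.append(np.zeros(7))
total_len(ragged)  # works, but again the cache silently does nothing in my timings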

Please advise: how can I build more complex data structures without losing JIT caching?

For a large project, the absence of caching makes Numba practically unusable.

Jitclass cannot be cached, use structref instead.

nelson2005, thank you!

I haven’t tried structrefs yet.

What is the simplest structure that still works and is cached?
Maybe passing some tuples or reflected lists and then extracting from them?

I don’t understand how to write complex programs. Some kind of structures are needed.

How do other people write? I’ve heard about structref, but it seems quite complicated, judging by the examples. Not for everyday use.

Another link with some helpful patterns for using structrefs:

I would feel guilty if I didn’t mention that this question comes up quite a lot, and has been coming up a lot for several years now, with no indication from the devs that they consider improving on jitclass/structref to be a priority. My usual response when folks have this question is something like: “You can use structref… BUT writing and maintaining structrefs is a pain in the butt. It is a lot of boilerplate just to define a class/struct-like thing.” There are some tricks to get around all of this boilerplate, which I share in the link above.
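For reference, even a minimal structref along the lines of the official docs example looks roughly like this (a sketch with jit-side attribute access only; Python-side properties are extra boilerplate on top):

from numba import njit
from numba.core import types
from numba.experimental import structref

@structref.register
class PointType(types.StructRef):
    pass

class Point(structref.StructRefProxy):
    def __new__(cls, x, y):
        return structref.StructRefProxy.__new__(cls, x, y)

# Wires up the constructor, boxing/unboxing, and the listed fields
structref.define_proxy(Point, PointType, ["x", "y"])

@njit
def norm(p):
    return (p.x ** 2 + p.y ** 2) ** 0.5

norm(Point(3.0, 4.0))  # 5.0

And that is before you add Python-side properties, methods, mutation helpers, etc.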

That being said, I would be remiss if I didn’t mention that I’ve recently started porting a lot of numba code to C++. There are several factors to that decision: long compile times, portability, startup times, ability to control memory management, etc… But a really big one is just how unproductive I was wrangling structrefs all day. One of my main motivations for using numba was that I didn’t want to subject grad students with limited programming experience to reading my C/C++ code, but frankly, once I started doing complicated things in numba, the code became way more complicated than C++, since you quite often need to invent non-standard solutions to do many things that are standard in C/C++ (inheritance, casting, weak pointers, etc…). I’ve been having a much better time lately writing C++ and integrating it with nanobind. So far, there have been far fewer performance gotchas related to passing stuff between the compiled C++ and Python than what I’ve encountered using numba.


DannyWeitekamp, thank you so much! I actually saw that ticket and how you respond there.

I was just hoping there was some easy solution to avoid structref ><

I saw that you linked to your tool, but it seemed like it’s from a large project with its own caching, and I thought it might not be so easy to extract it from there.

To be honest, my initial thought was to use C++ for the core of the genetic algorithm and do the wrapper and everything else in Python.

But then I suddenly found Numba and it seemed like… well as if the solution is right there. Speed comparable to compiled C code!

And I also thought about how easy it would be for users to install it. Pip install and off you go. Works on all possible platforms!

With C++ you still need a build; you have to somehow produce binaries for different platforms.

(ChatGPT suggests there’s also something called Cython, but I haven’t tried it. In any case, you’d definitely need to install a compiler for it unless you’re on Linux, where one comes out of the box.)

@hexagonal Portability with numba is kind of a double-edged sword. Numba certainly has the advantages you mention; however, it just does not scale particularly well right now (fine for small projects, not so much for big ones). As your project grows, the amount of time users need to wait for numba to compile on first execution grows (not a terrible drawback), but so too do the startup times (annoying when it gets to >10 seconds). Maybe that will change when the devs finish their current Ahead-of-Time (AOT) compilation toolset PIXIE.

One trick you can use for C/C++ is to use GitHub workflows as a Continuous Integration (CI) tool to make pre-built wheels for you. My understanding is that GitHub workflows are free for public repos up to fairly large workloads (I certainly haven’t paid anything for it yet).

Cython is an odd duck, it is basically C, but in Python-ish syntax. I’ve never written with it, but read a good amount of it, and it always annoyed me that it is a third language that isn’t quite Python and not quite C or C++ either. Also, a lot of the things mentioned on this list of drawbacks feel like deal breakers to me (messy memory safety, yuck!). Numba was always more attractive because, for many problems (usually programs that are mostly numerical numpy heavy stuff), it often works out of the box with pure Python. Too bad it’s such a pain to use with classes, or I would have stuck with it.

Danny, thanks for the information!

This time to startup — do you mean that even if JIT has already compiled, and even if the cache works, there is still some startup time accumulating in a large project?

I’ve noticed that there is often hidden time when the JIT doesn’t cache properly; it doesn’t write anything or give any messages but simply compiles silently each time. I only discovered it by timing measurements.

Wow, GitHub can do the builds itself. Cool, thanks!

Regarding Cython, it seems you’ve talked me out of using it. Indeed, I hadn’t thought about it: there won’t be any bounds checks for arrays. With Numba, you can always disable the decorator. I use my own decorator and can run the code as regular Python, where array bounds are checked; then I switch it back on, and it runs fast but without checks (even for writes beyond the end of an array).
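My decorator is roughly this shape (a simplified sketch; the NOJIT environment variable is just my own convention):

import os
from numba import njit

def maybe_jit(*args, **kwargs):
    # NOJIT=1 runs everything as plain Python, with Python's own bounds checks
    if os.environ.get("NOJIT") == "1":
        if args and callable(args[0]):  # used as bare @maybe_jit
            return args[0]
        return lambda f: f              # used as @maybe_jit(...)
    return njit(*args, **kwargs)

@maybe_jit(cache=True)
def fill(a, value):
    for i in range(a.shape[0]):
        a[i] = value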

This time to startup — do you mean that even if JIT has already compiled, and even if the cache works, there is still some startup time accumulating in a large project?

In my experience, yes. It isn’t a ton, but it is noticeable. If your project gets big enough, it can amount to several seconds of lag on each run. Essentially, every cached end-point (every individually compiled @jit-decorated function with cache=True that is called from Python; jits called from jits don’t count) has a cache file with its compiled code. Checking types, hashing the signature and file contents (because of dependent globals), and then reading that cache from disk takes a noticeable amount of time.
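To illustrate what counts as an end-point (as I understand it):

from numba import njit

@njit(cache=True)   # called from Python: pays type checks, hashing and a disk read each run
def entry_point(a):
    return helper(a) + 1

@njit               # only ever called from jitted code, so it gets baked into
def helper(a):      # entry_point's cached code rather than adding its own lookup
    return a.sum()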

I’ve noticed that there is often hidden time when the JIT doesn’t cache properly; it doesn’t write anything or give any messages but simply compiles silently each time. I only discovered it by timing measurements.

If you change a file then every jitted function in that file will probably need to be recompiled. Numba is very conservative about when it recompiles because of edge cases where a jitted function depends on a global variable.
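For example, a jitted function can capture a module-level global, which gets frozen into the compiled code:

from numba import njit

SCALE = 2.0  # frozen in at compile time

@njit(cache=True)
def scaled(x):
    return x * SCALE

Roughly speaking, the cache entry is tied to the source file, so editing the file (e.g. changing SCALE) invalidates it and forces a recompile.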

Regarding Cython, it seems you’ve talked me out of using it. Indeed, I hadn’t thought about it: there won’t be any bounds checks for arrays…

I can’t say from experience that Cython is bad, but my impression from other people’s reviews is that it can be limiting and can produce all sorts of new problems, like the fact that it tends to interleave Python and C when it cannot fully compile things, and consequently doesn’t always give you a full speedup. From some accounts it also doesn’t offer as much protection as you might hope from the kinds of mistakes you can make in C or C++, including memory safety issues (I suspect those come into play when you use the extended syntax beyond just Python). Reading past the end of an array is really the tip of the iceberg in that department; a lot of memory safety issues have to do with malloc’ing and free’ing things at appropriate times. If those are indeed issues in Cython, then perhaps it would be easier to use C/C++, Rust, C#, Java or other relatively fast options for which tools already exist (e.g. valgrind) for finding hard-to-spot issues like memory safety that don’t occur in Python. Put another way, Cython feels like it is at best easing (but not completely eliminating) an issue that takes maybe a week or two to overcome (learning C/C++ syntax), while not helping much with what makes writing safe and efficient C/C++ hard. If you really need the fastest implementation possible, C/C++ won’t hold you back, but there is a learning curve.

I have to say numba seems to get pretty darn close to C/C++ for number-crunching stuff. It’s got a way to go with objects/classes, however. And it would be pretty hard to do all of the same things you can do with objects in C/C++ (like stack-allocating them, using move semantics, etc…) with objects in numba, since it uses Python syntax.

Numba can do bounds checking; I typically enable it because I feel like the minor performance penalty is worth the safety.
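For instance:

import numpy as np
from numba import njit

@njit(boundscheck=True)  # raises IndexError instead of silently reading out of bounds
def last_plus_one(a):
    return a[a.shape[0]]  # off-by-one on purpose

last_plus_one(np.arange(3.0))  # IndexError with boundscheck=True; undefined behaviour without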

I agree that a more streamlined version of structref is absolutely necessary to make Numba usable in professional medium- to large-scale projects. However, after modifying your metaprogramming example for generating general structrefs and adding Python class wrappers that are linked to structref objects, I can be reasonably efficient on large-scale research projects if I am using VS Code and GitHub Copilot to automate the boilerplate code. The Python wrapper class allows code completion to work in VS Code. I would not want my graduate students to be distracted by all of this boilerplate, and handing off this type of hackish code to a development team for scale-up is a no-go. I do think non-computer-science students can get bogged down by C++ too, and Python in general can really accelerate the development of cleaner solutions IMHO. Julia seems like an attractive alternative to Python, Numba and C++ for scientific computing, given that it has stable data structures, performance that in many (but not all) cases is better than Numba, and Python-like syntax and interactivity. But it is not easy to migrate entirely away from the Python ecosystem. Have you considered Julia instead of C++ for your projects? If yes, what moved you toward C++?

+1 to the idea that Python is great for quick, clean solutions. That’s the biggest reason why I’ve stuck with making Python extensions instead of switching to another language outright. In this regard, Numba has been nice for striking a balance between thinking in Python/numpy syntax and eking out impressive speed.

My choice to move on to C++ was twofold: 1) I moved to a new lab and they had recently ported a bunch of Python code to a C++ extension using nanobind, so some of it is just alignment with the skillset of my new lab. 2) While I’d love to scratch the itch of dipping into my shortlist of languages that people rave about (e.g. Julia, Rust, Lisp, to name a few), the reality is that while I was writing in numba I was always putting mental pins in a bunch of places, noting ways that I had estimated the code could be made much faster, if only I could use C/C++ features. As it turns out, 90% of those potential optimizations were either nitty-gritty memory management things (mostly getting rid of unnecessary mallocs by stack allocating or pool allocating in key places) or streamlining the way that objects are passed between Python and the compiled code (in this regard nanobind is faster by 10x or more than numba, and there are very simple things numba could be doing to catch up to this). In short, I knew I could do what I wanted to do in C++, and it was not clear if another language would support all the features I needed, or if doing so would require rearchitecting my code in unexpected ways. I might have considered a different decision if I was starting a new project, but I’m at the phase where the algorithms are figured out (numba was helpful with that), and the only thing left to do is make sure they are portable and fast enough that all my little mental pins won’t keep me awake at night. The tools available in C++ felt like the right fit for that purpose.

Thank you for sharing your thought process. I utilize C++ as well but haven’t used nanobind yet. I will definitely give it a go.

I think the boilerplate problem associated with structrefs could be resolved (or significantly mitigated) for many use cases if the community built a pip-installable Python module that refines and augments your metaprogramming example. The user could pip-install this module and create a Numba-compatible data structure with a reasonable amount of boilerplate. This could work as follows:

from structref_maker import make_structref
from structref_maker import srtypes

MyStructref = make_structref("MyStructref", ["param1", srtypes.float, "param2", srtypes.int])
my_structref = MyStructref(100.0, 2)
print(my_structref.param1, my_structref.param2)

100.0 2

You could build data structures made of other data structures (composition) by accessing a type attribute: my_structref_type = my_structref.type (see the sketch below). One aspect that makes me reluctant to support such an effort is that structrefs are “experimental” and this doesn’t seem likely to change any time soon. Do you have any feedback on this idea?
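For instance (hypothetical, using the same made-up API as above):

Inner = make_structref("Inner", ["x", srtypes.float])
inner = Inner(1.0)
Outer = make_structref("Outer", ["inner", inner.type, "count", srtypes.int])
outer = Outer(inner, 3)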

For the time being a user-built extension seems like a good idea. My only note on what you’ve written above is that the type specification should be a list of tuples or a dictionary for specifying attribute-name + type pairs.

It would also be nice if that extension could help relax the strict structref/jitclass convention of needing to provide every attribute as parameters to the constructor. Requiring every attribute by default makes sense as a convention to avoid uninitialized values, but it’s a bit non-Pythonic. Having a default way to customize the constructor seems like an important feature… Although users should be warned about segfaults that might ensue if they leave values undefined in their custom constructor.
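Something like this, perhaps (a purely hypothetical extension of the API sketched above):

MyStructref = make_structref(
    "MyStructref",
    [("param1", srtypes.float), ("param2", srtypes.int)],  # tuple pairs, as suggested
    defaults={"param2": 0},  # hypothetical keyword; attributes left undefined risk segfaults
)
my_structref = MyStructref(100.0)  # param2 falls back to 0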

My gut feeling is that the ideal official solution from the devs would avoid metaprogramming if possible. Writing to a file or invoking the interpreter over a string to handle type definitions opens up lots of caching complexity to keep the generated code up to date with the real code. I’m not sure if you’ve encountered the same thing, but I sometimes have to clear the cache when using my metaprogramming approach to force a recompile. I suspect that a more direct approach would shorten compile times as well.

If I had free rein, and infinite time to rearchitect structref/jitclass, I would consider changing a lot. For instance, it’s kind of annoying that the proxy Python class and the numba type are not quite the same thing (I think unifying the two would require changes to numba’s type system… which are possibly impractical). Structref/jitclass also leave a lot to be desired in terms of boxing/unboxing speed by storing meminfos in a regular Python attribute of the proxy objects (which involves invoking getattr on the object’s dict) instead of using a regular old pointer added to the end of each PyObject instance (which is what numpy and nanobind do). Basically every container type, including typed.List and typed.Dict (but not np.ndarray), has this issue as well.

Yes, the type specification would be a list of tuple name-type pairs, pre-defined for easier access. I have indeed encountered the need to clear the cache during active development. The way I manage this is that the structref cache is cleared at startup if a class attribute clear_structref_cache is True. I typically have this set to True if I am modifying data structures.
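Roughly like this (simplified):

import pathlib

clear_structref_cache = True  # flipped on while I am modifying data structures

if clear_structref_cache:
    for pycache in pathlib.Path(".").rglob("__pycache__"):
        for f in list(pycache.glob("*.nbc")) + list(pycache.glob("*.nbi")):
            f.unlink()  # .nbc/.nbi are Numba's on-disk cache files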

I agree with your comment that the ideal solution would avoid metaprogramming. A fully integrated and optimized mutable data structure that works in jitted functions would significantly elevate the utility of Numba. But your metaprogramming solution is clever and allows complex data structures to be used in jitted functions. The use case I am focused on involves simulations that run over hours to days, so the startup cost for structrefs (4-5 seconds once cached vs 50-70 seconds without caching) is not important. It would be useful to get some clarity about the roadmap for structrefs. Will they be upgraded? Will they move from an experimental feature to a standard feature? This would allow the community to make a more informed call on investing time in a Python package for better managing structrefs.

I have been experimenting quite a bit with Julia as a Python/Numba replacement. Julia is fast, flexible like Python, has optimized data structures with little boilerplate, has distributed and shared-memory parallel loops, and can be used to build complex applications once you get a good feel for the language. Unfortunately, Julia uses 1-based arrays, which makes porting over large Python/Numba and C++ projects with multi-dimensional arrays painful. Another pain point is that PyJulia can significantly impact the performance of Numba, making it difficult to transition parts of your Python/Numba application to Julia.

Although not ideal from a clean-code perspective, Numba plus the metaprogramming approach to structrefs has worked well for my scientific computing use case. But for new projects I think I might go a different route unless a standard, clean and optimized data structure is developed for Numba.