Static fixed-size arrays in other arrays

Dear community, I’m considering migrating our code from C++ to Numba, and I’ve found a lot of questions concerning fixed-size arrays for stack storage. But stack storage is not the only case where a fixed size is useful.

Consider that you have a class that contains a few member variables, one of which represents a physical vector. This vector always has 3 components, and while we could store it as three floats x, y, and z, it is more convenient to treat it as a single variable. Now, we have a very, very large array of this kind of object, and we’re doing lots of calculations on all objects. If the vector in the objects is dynamically allocated (of type std::vector), there will be a pointer dereference for every object in the array, and caching will probably be poor. If, on the other hand, we use std::array, the x, y and z-components will appear linearly in memory, in the order we access them when iterating through the large array. At least in my code this tends to make a big difference to performance.

If I’m going to use Numba, I’d like to make a similar object with @jitclass, but I need to make sure that when I put several objects in a Numpy array, the x, y, and z-components appear linearly in memory. Any hints for how to do this?

Kind regards.

hi @sigvaldm
I don’t claim to be an expert but I’ve done similar work and I can mention my thought process. Maybe others will chime in and I’ll learn something along the way.

The first thing you should know is that while C++ is primarily object-oriented (although non-OO style is possible), Numba is primarily multiple-dispatch based. OO patterns are possible with @jitclass and structref. Both are experimental, which means that you will find rough edges, especially in complicated use cases (likely to appear if you have a complex C++ codebase in place).

In my main project at work I have a situation similar to yours (large array of objects, which contain a few member variables). I designed it from the beginning thinking about Numba, and the most efficient structure I came up with was a structured array (https://numpy.org/doc/stable/user/basics.rec.html). That will put the entirety of your data (not just the 3-component vector) in linear memory (in the heap). AFAIK, this is equivalent to a C-style array of struct.
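For illustration, here’s a minimal sketch of such a structured array with an embedded 3-component vector (the field names are made up for the example, not from my actual project):

import numpy as np

# Hypothetical record layout: one scalar field plus an embedded 3-vector.
# The (3,) sub-array shape stores the vector inline in each record, so the
# whole array is one contiguous block of memory (array-of-structs layout).
particle_dtype = np.dtype([('mass', np.float64), ('vel', np.float64, (3,))])

particles = np.zeros(1_000_000, dtype=particle_dtype)
particles['vel'][:, 0] = 1.0    # set every x-component in one vectorized step
print(particle_dtype.itemsize)  # 32 bytes per record: 4 float64s, no pointers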

Doing that, however, will make it harder to keep some OO syntax patterns in your code. Things like individual object methods, e.g. object_array[2].obj_method(), are possible but not straightforward to implement. It would be easier to write my_method(object_array) to operate on all elements, or my_method(object_array[2]) to operate on one element at a time. In Numba, it is easier to implement overloaded functions than object methods.
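As a hedged sketch of that function style (reusing the made-up record layout from above; I believe recent Numba versions support record arrays with nested array fields, but do verify on your version):

import numpy as np
from numba import njit

particle_dtype = np.dtype([('mass', np.float64), ('vel', np.float64, (3,))])

# A free function instead of a method: it receives the record array and
# loops over it in compiled code.
@njit
def kinetic_energy(parts):
    total = 0.0
    for i in range(parts.shape[0]):
        v = parts[i]['vel']
        total += 0.5 * parts[i]['mass'] * (v[0]**2 + v[1]**2 + v[2]**2)
    return total

parts = np.ones(10, dtype=particle_dtype)
print(kinetic_energy(parts))       # operate on all elements
print(kinetic_energy(parts[2:3]))  # or on a one-element view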

If you want to stay closer to your current design, you should look into both @jitclass and structref. Each has advantages and disadvantages. In any case, and going to your main question, to keep your 3-element vector in linear memory, you should make that element a numpy array. For example

from numba.experimental import jitclass
from numba.typed import List
from numba.types import float64
import numpy as np

@jitclass({'my_vect': float64[::1]})   # C-contiguous 1D float64 array
class MyObject:
    def __init__(self, x, y, z):
        # The three components are contiguous inside this small array, but
        # the array itself is heap-allocated and referenced by a pointer.
        self.my_vect = np.empty(3, np.float64)
        self.my_vect[0] = x
        self.my_vect[1] = y
        self.my_vect[2] = z

obj1 = MyObject(1.0, 2.0, 3.0)
obj2 = MyObject(4.0, 5.0, 6.0)
list_objects = List([obj1, obj2])      # typed list of jitclass instances

The above assumes linear memory is the most important property for you. If you want linear memory and fixed size, you can use a Tuple, but in that case you’ll also get immutability, which might or might not work for you.
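A sketch of that variant (again hedged; note that with a tuple, a component cannot be changed in place):

from numba.experimental import jitclass
from numba.types import UniTuple, float64

@jitclass({'my_vect': UniTuple(float64, 3)})  # fixed-size, immutable
class MyObjectTuple:
    def __init__(self, x, y, z):
        self.my_vect = (x, y, z)

obj = MyObjectTuple(1.0, 2.0, 3.0)
print(obj.my_vect[1])  # 2.0; to "modify" a component, rebuild the whole tuple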

Some of the main constraints of this latter approach are around parallelization. AFAIK (and I’m curious to hear from a core dev about this point), it’s not possible to get automatic parallelization (prange and parallel=True) with a list of jitclasses. Also, jitclasses are not picklable, and therefore you cannot use multiprocessing either.

I hope this helps with your decision.

cheers,
Luk

Thank you, that’s very helpful. I had a brief look at structured arrays and structref, but I’ll need to look at those a bit more closely. The way you defined the class is exactly what I assumed would be sub-optimal, but maybe I’m mistaken. Let me explain why I thought so, and you can correct me if I’m wrong.

Numpy arrays may be large, and you can change their size (although with a certain performance penalty). I’ve therefore always assumed they are implemented like dynamically allocated arrays in C or C++. Dynamically allocated arrays may be anywhere on the heap, and are (at a low level) represented by a pointer to the respective address on the heap. Thus, if I have a large array of objects containing a small dynamically allocated array, the large array will store the objects linearly, but the objects will not contain the small array, only a pointer to wherever the small array is stored in memory. Thus, for each object, I would need to look up that address, and get the three elements of the small array. The three elements in the small array will necessarily be linear, but the three elements of one object will not come immediately after the three elements of the foregoing object.

Did you consider this in your above example? I don’t know much about Numba (or the internals of Numpy, for that matter); I just did not expect contiguity in this case. Perhaps I’ll do a quick benchmark: one case with a numpy array, and one with three floats.
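If I do, it would be something along these lines (a minimal sketch with made-up names; a real benchmark would need proper warm-up and repeated runs):

import time
import numpy as np
from numba import njit
from numba.experimental import jitclass
from numba.typed import List
from numba.types import float64

@jitclass({'x': float64, 'y': float64, 'z': float64})
class VecScalars:                  # three plain float fields
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

@jitclass({'v': float64[::1]})
class VecArray:                    # one heap-allocated numpy array
    def __init__(self, x, y, z):
        self.v = np.empty(3, np.float64)
        self.v[0] = x
        self.v[1] = y
        self.v[2] = z

@njit
def total_scalars(objs):
    s = 0.0
    for o in objs:
        s += o.x + o.y + o.z
    return s

@njit
def total_array(objs):
    s = 0.0
    for o in objs:
        s += o.v[0] + o.v[1] + o.v[2]
    return s

n = 100_000  # constructing jitclass instances from Python is itself slow
scalars = List([VecScalars(float(i), 0.0, 0.0) for i in range(n)])
arrays = List([VecArray(float(i), 0.0, 0.0) for i in range(n)])

total_scalars(scalars); total_array(arrays)   # warm-up: trigger compilation

t0 = time.perf_counter(); total_scalars(scalars)
t1 = time.perf_counter(); total_array(arrays)
t2 = time.perf_counter()
print('scalars:', t1 - t0, 'array:', t2 - t1)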

Again, thank you for your response :slight_smile:

We’re going into territory where I don’t have all the answers for you; here’s what I know and what I don’t know:

If you really need to optimize at the level where a single pointer lookup is something you’re trying to avoid, then it seems that you want all the data in linear memory. I read your reply as saying that you want all the objects in the object array (let’s call it the outer array) to be in linear memory, with the inner array (the x, y, z one) embedded in the object data, i.e. not as a pointer to another memory location, but contiguous with the object it is part of.

The only structure I am aware of that will provide both outer and inner arrays in linear memory is the structured array (which can have internal, nested arrays, btw). Working with structured arrays will require larger changes to your code than working with jitclasses. The jitclass sample code I provided is most likely not in linear memory; I provided that example to show what OO code would look like.

Outside that, I don’t know what your options are. Maybe on Monday when one of the core devs is online, they can help you with this. What I know is:

  • regarding the outer array: both tuples and Lists will keep value types (integers, floats) in linear memory. I don’t know what the memory model is for memory-managed objects. I doubt that jitclass instances can be put in linear memory, but maybe they can; their pointers will be, but not their contents. Numba arrays are numpy arrays, so they are not comparable to C++ arrays: they are limited to numpy types and cannot hold arbitrary user-defined types.

  • regarding the inner array: tuples, Lists and arrays will hold floats in linear memory. Whether they are themselves placed in linear memory depends, I think, not on them but on the container. For example, if you put the array inside a jitclass, it is the jitclass memory model that determines whether the contents will be held in linear memory with the rest of the object. Most likely, any array that can grow cannot be put in linear memory, for obvious reasons. But even an immutable container like a tuple might or might not be put in linear memory with the rest of the jitclass; it depends on the jitclass. (For the structured-array case, a concrete layout check is sketched right after this list.)
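For the structured-array case, you can verify the layout directly by inspecting the dtype and the strides of a field view (field names made up for the example):

import numpy as np

dt = np.dtype([('mass', np.float64), ('vec', np.float64, (3,))])
arr = np.zeros(4, dtype=dt)

print(dt.itemsize)         # 32: each record is four float64s back to back
print(arr['vec'].strides)  # (32, 8): each vector is embedded in its record;
                           # components are adjacent, records one itemsize apart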

If you are trying to optimize your code to such an extent that one memory indirection is a problem, it’s possible that you won’t be able to keep an OO syntax/structure, i.e. no “methods” that “belong” to instances of classes, with containers holding those instances. Depending on the answers to the open questions (below), your only option might be to work in the multiple-dispatch (not OO) style: overloaded functions with different implementations for different input types.

Open questions:

  • If a jitclass has two properties: a float and a tuple of 3 floats, will the 4 floats be allocated contiguously?
  • If yes, will a List or a Tuple of jitclass instances hold their data or only pointers to the data?
  • If a StructRef has two properties: a float and a tuple of 3 floats, will the 4 floats be allocated contiguously?
  • If yes, will a List or a Tuple of StructRef instances hold their data or only pointers to the data?

Some extra remarks:

  • I would recommend that you build some examples and test the speed on your own. Sometimes even experts can be surprised; there’s no way to avoid actually benchmarking potential solutions.
  • I don’t think that talking about the heap is relevant in this context, because you said that your outer array is very large, which means that everything has to be allocated on the heap anyway (the stack would be too small to hold all that data). Since everything will be on the heap anyway, the only difference between approaches is whether the data is held contiguously or indirected via pointers.

I hope the above helps more than it confuses :slight_smile: As I said, we’re approaching a level of detail where I don’t know everything.

Thank you. Yes, everything will have to be linear on the heap. We’re talking millions or perhaps billions of the inner objects, with repetitive operations, so having indirections on the inner objects is a real concern. We use supercomputers for this, and up till now we’ve used C or C++, but I’m curious to see if we can get acceptable results with much simpler-to-maintain Python code somehow. This will of course take some experimentation. I’ve had a look at the structured Numpy arrays you mentioned, and so far they look the most promising. We don’t use methods for those anyway.

Have you considered the difference between arrays of structures and structures of arrays? That’s something to look into if you are optimizing operations on large arrays of objects.

Sure! In my experience an array of structs achieves similar performance if you’re careful. The structs need to be small, and only contain simple data types stored linearly in memory (i.e. no indirection). Then it’s essentially the same in memory as if you have structs of arrays. I have tested this for my application, and it works well, but I suppose there could be cases where this does not hold. For instance, if you can’t fix the size of containers, which would lead to indirection, or if you end up storing many variables that you don’t use for many of the computations. Then I suppose you could end up with too large strides between the members you actually use. For my application I believe it’s fine, and the abstraction gets somewhat prettier this way. But thanks for the consideration, and (soon) merry Christmas, by the way :slight_smile:
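PS: for anyone comparing the two layouts in NumPy, a minimal sketch (field names made up; which is faster depends entirely on your access pattern):

import numpy as np

n = 1_000_000

# Array of structs: one record per particle, fields interleaved in memory.
aos = np.zeros(n, dtype=[('vel', np.float64, (3,)), ('mass', np.float64)])

# Struct of arrays: one contiguous array per field.
soa_vel = np.zeros((n, 3))
soa_mass = np.zeros(n)

# The same computation expressed in both layouts.
ke_aos = 0.5 * aos['mass'] * (aos['vel'] ** 2).sum(axis=1)
ke_soa = 0.5 * soa_mass * (soa_vel ** 2).sum(axis=1)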