Array of jitclass with low memory usage

I have some data that is awkwardly structured. In Fortran, I can easily contain this data with a 2-D array of derived types (i.e. structs in C).

program main
  use iso_fortran_env, only: dp => real64
  implicit none

  type :: ContainerType
    real(dp), allocatable :: arr1(:)
    real(dp), allocatable :: arr2(:)
  end type

  type(ContainerType), allocatable :: dat(:,:)

  ! `dat` can easily contain all my data

end program

My goal is to create a similar data structure in Numba. My current solution uses nested lists of jitclasses:

import numba as nb
from numba import types

spec = [
    ('arr1', types.double[:]),
    ('arr2', types.double[:]),
]
@nb.experimental.jitclass(spec)
class Container_0():
    def __init__(self, arr1, arr2):
        self.arr1 = arr1
        self.arr2 = arr2

spec = [
    ("c0", nb.types.ListType(Container_0.class_type.instance_type)),
]
@nb.experimental.jitclass(spec)
class Container_1():
    def __init__(self, c0):
        self.c0 = c0
        
spec = [
    ("c1", nb.types.ListType(Container_1.class_type.instance_type)),
]
@nb.experimental.jitclass(spec)
class Container_2():
    def __init__(self, c1):
        self.c1 = c1

While this approach seems to work in Numba, it takes up massive amounts of memory. For example, if I store 60,000 double-precision numbers in this data structure, it takes ~1 GB of memory. The raw data is only 480 KB (60,000 × 8 bytes), so in Fortran or C it should take at most ~1 MB including overhead.

MY QUESTION: Is there an alternative way to store my data in Numba which doesn't use so much memory?

Hi!

Have you considered structured arrays? Structured arrays — NumPy v1.24 Manual

luk

I have not considered structured arrays. But I just looked at the documentation, and it seems like they cannot hold data in the way I describe: I need an array of objects whose members are variable-length arrays, which structured arrays do not support.
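
For illustration, a minimal sketch of the limitation (the dtype below is hypothetical):

import numpy as np

# Structured dtypes can hold fixed-size subarrays...
fixed = np.dtype([('arr1', np.float64, (3,)), ('arr2', np.float64, (3,))])
dat = np.zeros((100, 100), dtype=fixed)  # fine: every arr1/arr2 has length 3

# ...but there is no way to give dat[i, j]['arr1'] a different
# length for each element, which is what my data needs.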

A C example would help make the problem more concrete for me.

Below is an example in C. Container ** is effectively a 2-D array of Container structs, which in the example is 100 by 100. Each element of this 2-D array is a struct which itself holds two separate arrays (arr1 and arr2). In the example I always allocate arrays of length 3, but in principle arr1 and arr2 could be any length. In my real application, arr1 and arr2 would have different lengths for each element of the 2-D array.

Of course, I don't free the memory in the example. But it's just an example.

The script stores 60,000 doubles using about 800 KB of memory.

#include <stdlib.h>
#include <stdio.h>

typedef struct Container
{
  double *arr1;
  int arr1_n;
  double *arr2;
  int arr2_n;
} Container;

int main()
{ 
  int n = 100;
  int m = 100;
  int total_memory = 0;

  /* 2-D array of structs: n row pointers, then n rows of m structs */
  Container **c = (Container **) malloc(sizeof(Container*)*n);
  total_memory += sizeof(void*)*n;
  for (int i = 0; i < n; i++){
    c[i] = (Container *) malloc(sizeof(Container)*m);
    total_memory += sizeof(Container)*m;
    for (int j = 0; j < m; j++){
      c[i][j].arr1 = (double *) malloc(sizeof(double)*3);
      total_memory += sizeof(double)*3;
      c[i][j].arr1_n = 3;

      c[i][j].arr2 = (double *) malloc(sizeof(double)*3);
      total_memory += sizeof(double)*3;
      c[i][j].arr2_n = 3;
    }
  }

  printf("double precision numbers stored: %i\n", n*m*6);
  printf("total memory: %i bytes\n",total_memory);
  return 0;
}

Thanks, that's quite helpful.

How are you measuring the memory usage in Python?

You used length 3; is that an expected 'average' value? Having some idea about that could inform solutions.

A runnable, minimal Python reproducer would be a useful starting point. I'd think a Python solution with memory usage reasonably comparable to the C example may well be possible.

I'm using the memory_profiler package to measure memory usage in Python. I've also used the "Activity Monitor" application that comes with Macs to see how much more memory a Python process takes when I create an object.

You used length 3; is that an expected 'average' value? Having some idea about that could inform solutions.

The length is random, anywhere from 0 up to maybe 10.

Below is a minimal example. Copy-paste it into a .py file and run it with python -m memory_profiler profile_script.py. The object c2 is ~215 MB on my computer.

I've created the equivalent script in C, which takes 32 MB to store the same amount of information. In both programs about 2,500,000 doubles are stored, which should take about 20 MB. In C, it takes an additional 12 MB of memory to define the structure of the data (i.e. the pointers and sizes of the arrays). In Numba, it is taking 215 − 20 = 195 MB to define the structure of the data, a factor of 195/12 ≈ 16 more than in C.

import numpy as np
from numba import types
import numba as nb
from memory_profiler import profile

spec = [
    ('arr1', types.double[:]),
    ('arr2', types.double[:]),
]
@nb.experimental.jitclass(spec)
class Container_0():
    def __init__(self, arr1, arr2):
        self.arr1 = arr1
        self.arr2 = arr2

spec = [
    ("c0", nb.types.ListType(Container_0.class_type.instance_type)),
]
@nb.experimental.jitclass(spec)
class Container_1():
    def __init__(self, c0):
        self.c0 = c0
        
spec = [
    ("c1", nb.types.ListType(Container_1.class_type.instance_type)),
]
@nb.experimental.jitclass(spec)
class Container_2():
    def __init__(self, c1):
        self.c1 = c1
        
@nb.njit()
def make_Container_2():
    np.random.seed(0)
    n = 1000
    m = 500
    N = np.random.randint(0,6,size=(n,m))
    
    list_2 = nb.typed.List()
    for i in range(n):
        list_1 = nb.typed.List()
        for j in range(m):
            arr1 = np.random.uniform(0.0, 1.0, size=N[i,j])
            arr2 = np.random.uniform(0.0, 1.0, size=N[i,j])
            
            c0 = Container_0(arr1, arr2)
            list_1.append(c0)
        c1 = Container_1(list_1)
        list_2.append(c1)
    c2 = Container_2(list_2)
    
    return c2

@profile
def main():
    c2 = make_Container_2()

if __name__ == "__main__":
    make_Container_2() # compile stuff
    main()

Thanks for a great starting point!


If the inner arrays can have different lengths, then a structured array does not work, but maybe an "awkward" array can help you: Awkward Array documentation — Awkward Array 2.0.6 documentation

It's an array-like container of arrays, so it might work for your case.
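
For example, something like this might express your ragged records (a sketch, assuming Awkward Array 2.x):

import awkward as ak

# ragged records: each element holds arr1/arr2 of arbitrary length
dat = ak.Array([
    [{"arr1": [0.1, 0.2], "arr2": [0.3]},
     {"arr1": [], "arr2": [0.4, 0.5, 0.6]}],
    [{"arr1": [0.7], "arr2": []},
     {"arr1": [0.8, 0.9], "arr2": [1.0]}],
])
print(dat[0][1]["arr2"])  # [0.4, 0.5, 0.6]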

Luk


Would you consider a solution where all the arr1, arr2 values are stored in a single ndarray and the 2-D control structure is simply indexes into that storage array? For example, the control structure could be something like below, with every record pointing into the single storage array.

import numpy as np

index_type = np.dtype([
    ('arr1_begin_index', np.uint64),
    ('arr1_end_index', np.uint64),
    ('arr2_begin_index', np.uint64),
    ('arr2_end_index', np.uint64)
])
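
A rough usage sketch, continuing from the dtype above (the numbers are illustrative): all values live in one flat storage array, and each record just slices into it.

n, m = 2, 2
idx = np.zeros((n, m), dtype=index_type)
storage = np.arange(12, dtype=np.float64)  # toy flat storage

# say element (0, 1) owns storage[3:5] as arr1 and storage[5:9] as arr2
idx[0, 1] = (3, 5, 5, 9)

rec = idx[0, 1]
arr1 = storage[rec['arr1_begin_index']:rec['arr1_end_index']]  # a view, no copy
arr2 = storage[rec['arr2_begin_index']:rec['arr2_end_index']]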

That would work in practice. But it would be a little clumsy, I guess.

It gives me an idea. It might take much less memory to store the 2-D array of objects in a single flat list instead of a list of lists (see the sketch below). I'm going to try this tomorrow at some point.
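
The bookkeeping for the flat list would be something like this (a sketch; flat_index is a made-up helper):

import numba as nb

# map a 2-D (i, j) index onto one flat typed list of length n*m
@nb.njit
def flat_index(i, j, m):
    return i * m + j

# hypothetical usage: c0 = flat_list[flat_index(i, j, m)]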

An awkward/ragged-array-like solution, as luk and nelson suggest, would definitely be more clumsy. Although if your aim is low memory and speed, I doubt you would find a better solution; I also wouldn't be surprised if this had a better runtime than the C or Fortran options. It's definitely going to be faster and less memory-intensive to allocate a handful of big arrays than 10,000 individual ones. Keep in mind that there is a little more memory overhead per array in Numba than in C/Fortran: NumPy arrays have extra members like a shape, and they also have a refcount so they can get cleaned up by the Numba Runtime. With that in mind, these extra things don't seem big enough to account for the 1 MB → 1 GB jump you're seeing… so I'm a little bit curious about the results of your idea to use a single list.

I tried a single list of jitclasses instead of a list of lists. Roughly the same memory is used either way. I've done some experimenting with C, Fortran and Numba.

The following struct in C is 24 bytes (sizeof(Container) returns 24):

typedef struct Container_t
{
  int n;        /* 4 bytes + 4 bytes padding before the first pointer */
  double *arr1; /* 8 bytes */
  double *arr2; /* 8 bytes => 24 bytes total on a typical 64-bit ABI */
} Container;

The following, somewhat equivalent derived type in Fortran is 128 bytes (allocatable components carry large array descriptors):

type :: Container
  real(dp), allocatable :: arr1(:)
  real(dp), allocatable :: arr2(:)
end type

The following, somewhat equivalent Python jitclass is about 400 bytes:

import numba as nb
from numba import types

@nb.experimental.jitclass
class Container_0:
    arr1 : types.double[:]
    arr2 : types.double[:]
    def __init__(self, arr1, arr2):
        self.arr1 = arr1
        self.arr2 = arr2

So it looks like C uses a factor of ~17 less memory than Numba, and Fortran a factor of ~3 less, using the array-of-structs technique.

The solution proposed by @nelson2005 will be the most memory-efficient in my case by far. It should only take four 4-byte integers (= 16 bytes) to store the indexes for one "struct" shown above. Therefore this solution should be even more efficient than an array of structs in C.

Edit:

It is possible to write a lower-memory struct in Fortran that is equivalent to the C struct in memory and usage:

! requires: use, intrinsic :: iso_c_binding, only: c_ptr
type :: Container_1
  integer :: n
  type(c_ptr) :: arr1
  type(c_ptr) :: arr2
end type

I'd think a single 4-byte int for the base offset plus two one-byte integers for the arr1, arr2 lengths should be sufficient, for a total of 6 bytes of indexes per struct.
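
A quick check of that layout (a sketch; the field names are made up):

import numpy as np

# unaligned (packed) structured dtypes have no padding by default
packed = np.dtype([('base', np.uint32), ('len1', np.uint8), ('len2', np.uint8)])
print(packed.itemsize)  # 6 bytes per struct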

I think the interface needn't be worse than the C interface… something like below should be pretty straightforward to use:

cls.set(i, j, arr1, arr2)
arr1, arr2 = cls.get(i, j)
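
A minimal sketch of what such a jitclass could look like (everything here is hypothetical: one flat storage array, per-cell offset/length tables, and cells written once in order):

import numpy as np
import numba as nb
from numba import types

@nb.experimental.jitclass([
    ('storage', types.double[:]),
    ('arr1_off', types.int64[:, :]),
    ('arr1_len', types.int64[:, :]),
    ('arr2_off', types.int64[:, :]),
    ('arr2_len', types.int64[:, :]),
    ('cursor', types.int64),
])
class RaggedContainer:
    def __init__(self, n, m, capacity):
        # one big allocation instead of n*m*2 small ones
        self.storage = np.empty(capacity, dtype=np.float64)
        self.arr1_off = np.zeros((n, m), dtype=np.int64)
        self.arr1_len = np.zeros((n, m), dtype=np.int64)
        self.arr2_off = np.zeros((n, m), dtype=np.int64)
        self.arr2_len = np.zeros((n, m), dtype=np.int64)
        self.cursor = 0

    def set(self, i, j, arr1, arr2):
        # record where each array lands, then copy it into the flat storage
        self.arr1_off[i, j] = self.cursor
        self.arr1_len[i, j] = arr1.size
        self.storage[self.cursor:self.cursor + arr1.size] = arr1
        self.cursor += arr1.size
        self.arr2_off[i, j] = self.cursor
        self.arr2_len[i, j] = arr2.size
        self.storage[self.cursor:self.cursor + arr2.size] = arr2
        self.cursor += arr2.size

    def get(self, i, j):
        # zero-copy views into the flat storage
        o1 = self.arr1_off[i, j]
        o2 = self.arr2_off[i, j]
        return (self.storage[o1:o1 + self.arr1_len[i, j]],
                self.storage[o2:o2 + self.arr2_len[i, j]])

Usage would be e.g. c = RaggedContainer(100, 100, 60_000); c.set(0, 0, np.ones(3), np.zeros(5)); a1, a2 = c.get(0, 0).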

I may have missed something in the above, but thought I'd check… my question is: is the memory use being attributed correctly?

Depending on which version of Numba is in use, at some point between importing Numba and there being an instance of a jitclass (or a typed.List), the library bindings to LLVM have to be loaded and initialised (these are not small). Then a lot of code has to be compiled to make the jitclass actually work (think accessors, constructors, destructors, boxing/unboxing code etc.), along with things like the Numba runtime, all of which adds to memory use. Only once all this is done is a jitclass/typed.List at a point where it can be instantiated. Perhaps it would be worth seeing what the memory cost of creating a one-element list with a single jitclass instance is, to make sure that the measured cost isn't also including Numba/LLVM/compiled objects etc.?
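
That check could look something like this (a sketch built from the reproducer above; compile first, then measure):

import numpy as np
import numba as nb
from numba import types
from memory_profiler import profile

@nb.experimental.jitclass([('arr1', types.double[:]), ('arr2', types.double[:])])
class Container_0:
    def __init__(self, arr1, arr2):
        self.arr1 = arr1
        self.arr2 = arr2

@nb.njit
def make_one():
    # one typed list holding a single jitclass instance
    lst = nb.typed.List()
    lst.append(Container_0(np.zeros(1), np.zeros(1)))
    return lst

@profile
def main():
    lst = make_one()

if __name__ == "__main__":
    make_one()  # compile first, so JIT/LLVM costs are excluded from main()
    main()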

As @DannyWeitekamp noted above, there is some additional information associated with the Numba equivalents of many data structures, but the difference is usually something like a couple of void* members or similar, so I wouldn't expect it to cause the extreme differences noted above.

Hope this helps?

I think I'm measuring the memory correctly.

  • A Numba typed list of int64 seems to use 8 bytes per element.
  • A Numba typed list of a jitclass containing only a single int64 seems to take about 64 bytes per element.
  • A Numba typed list of a jitclass containing a zero-length 1-D array of doubles seems to take about 200 bytes per element.

So there is a little extra memory baggage with jitclass, but it's not as terrible as I was originally thinking.
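
For reference, the per-element numbers above come from measurements along these lines (a sketch; run with python -m memory_profiler overhead_check.py):

import numpy as np
import numba as nb
from numba import types
from memory_profiler import profile

@nb.experimental.jitclass([('arr1', types.double[:])])
class OneArray:
    def __init__(self, arr1):
        self.arr1 = arr1

@nb.njit
def build(n):
    lst = nb.typed.List()
    for _ in range(n):
        lst.append(OneArray(np.empty(0)))  # zero-length 1-D array of doubles
    return lst

@profile
def main():
    n = 1_000_000
    lst = build(n)  # memory increment on this line / n = bytes per element

if __name__ == "__main__":
    build(1)  # compile first so JIT overhead is excluded
    main()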