CUDA compiled argument signature

I’ve been playing with compiling functions with the numba.cuda.compile_ptx function and examining the PTX output. I’m interested in using it to compile functions that I will link to a larger PTX code base.
To that end, I need to understand the arguments that get passed to the generated function. I’ve managed to figure out most of them. For example, if I compile a device function that takes a 2-D array of float64s and returns a float64, the resulting function takes 10 arguments:

  • Pointer to the returned value
  • ???
  • ???
  • size
  • itemsize
  • array data
  • shape[0]
  • shape[1]
  • strides[0]
  • strides[1]

My question is, what are the other two arguments? I can’t seem to figure it out.
-Ed

The last 9 arguments are all components of Numba’s arystruct_t, which is used to represent a NumPy array in JITted code. It is defined in numba/_arraystruct.h:

typedef struct {
    void     *meminfo;  /* see _nrt_python.c and nrt.h in numba/core/runtime */
    PyObject *parent;
    npy_intp nitems;
    npy_intp itemsize;
    void *data;

    npy_intp shape_and_strides[];
} arystruct_t;
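To connect the struct to the parameter list in the question, here is a minimal sketch of how the fields line up once the struct is flattened into scalar parameters. The helper names `flattened_array_params` and `device_function_params` are made up for illustration and are not part of Numba; the ordering simply follows the struct definition above, with the flexible `shape_and_strides` member assumed to expand to `ndim` shape entries followed by `ndim` stride entries.

```python
def flattened_array_params(ndim):
    """Names of the scalar parameters that one ndim-dimensional array expands to.

    Hypothetical helper: it mirrors the field order of arystruct_t shown above.
    """
    fixed = ["meminfo", "parent", "nitems", "itemsize", "data"]
    shape = [f"shape[{i}]" for i in range(ndim)]
    strides = [f"strides[{i}]" for i in range(ndim)]
    return fixed + shape + strides


def device_function_params(ndim):
    """Parameter list for a device function taking one array argument.

    The pointer to the returned value comes first, as observed in the question.
    """
    return ["pointer to returned value"] + flattened_array_params(ndim)


# Print the position and name of each parameter for a 2-D array argument.
for position, name in enumerate(device_function_params(2)):
    print(position, name)
```

For a 2-D array this gives 10 entries, with `meminfo` and `parent` in the two positions marked ??? in the question.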

The two arguments you can’t figure out are a pointer to the MemInfo object, which is used for reference counting (not relevant on the GPU), and a pointer to the Python object that owns the array data (also not used on the GPU). When the CUDA target launches a kernel with an array parameter, it just passes null pointers for both. See numba/dispatcher.py at 720b357320d99eceed149be5f2a7ae20ec67642c · numba/numba · GitHub:

            meminfo = ctypes.c_void_p(0)
            parent = ctypes.c_void_p(0)
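If you end up calling the compiled function from your own launcher, you would need to build the same flattened values yourself. The following is a rough sketch of that idea using only ctypes, not Numba's actual launch code: `array_launch_args` and `c_contiguous_strides` are hypothetical helpers, the array is assumed to be C-contiguous, and a host buffer stands in for what would really be a device pointer.

```python
import ctypes


def c_contiguous_strides(shape, itemsize):
    """Byte strides of a C-contiguous array (hypothetical helper)."""
    strides = []
    step = itemsize
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


def array_launch_args(data_ptr, shape, itemsize):
    """Flatten one array into scalar arguments, in arystruct_t field order."""
    nitems = 1
    for extent in shape:
        nitems *= extent
    args = [
        ctypes.c_void_p(0),         # meminfo: null, no reference counting on the GPU
        ctypes.c_void_p(0),         # parent: null, no owning Python object on the GPU
        ctypes.c_ssize_t(nitems),   # nitems
        ctypes.c_ssize_t(itemsize), # itemsize
        ctypes.c_void_p(data_ptr),  # data
    ]
    args += [ctypes.c_ssize_t(extent) for extent in shape]
    args += [ctypes.c_ssize_t(s) for s in c_contiguous_strides(shape, itemsize)]
    return args


# Assumption: a host buffer is used here only so the sketch runs without a GPU.
host_buffer = (ctypes.c_double * 12)()
args = array_launch_args(
    ctypes.addressof(host_buffer), (3, 4), ctypes.sizeof(ctypes.c_double)
)
print([a.value for a in args])
```

The two leading null pointers print as `None`, since that is how ctypes reports the value of a null `c_void_p`.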

Related to your use case, there is this PR: CUDA: Facilitate and document passing arrays / pointers to foreign functions by gmarkall · Pull Request #8167 · numba/numba · GitHub. It allows you to call other device functions passing pointers instead of entire array structures, which will be handy if you have C++ device functions that expect a pointer. The docs generated from the PR are here: Calling foreign functions from Python kernels — Numba 0.56.0.dev0+1018.g435d7ee66.dirty-py3.7-linux-x86_64.egg documentation.

Your use case makes me wonder if we should look at doing something like that PR, but for functions compiled by Numba, so that you don’t have to pass in so many parameters. What do you think about this idea? In an ideal scenario, I wonder how many of these parameters we could drop.
