Avoid multiple copies of large numpy array in closure?

Hi Stan and Siu,

Good morning!

I am an author of mcvine, a Monte Carlo neutron ray tracing simulation package. We have been using numba to accelerate mcvine: mcvine.acc. We exchanged emails a couple years ago. Thanks for the help back then!

numba has been working quite well for us. Thank you again for the great work on numba. We recently encountered a problem that I think is related to variables in a closure. The numba docs say that numba may or may not copy global variables. I think in our case the array is copied, which requires a lot of memory and eventually causes the code to fail. The code in question looks like this:

def makeS(S_QxQy, Qx_min, Qx_max, Qy_min, Qy_max):
    @cuda.jit(device=True)
    def S(threadindex, rng_states, neutron):
        # assume neutron velocity is mostly along z
        v = neutron[3:6]

Here S_QxQy is a NumPy float array, while Qx_min, Qx_max, Qy_min, and Qy_max are floating-point numbers.
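For context, the factory pattern above captures S_QxQy in a closure: the array becomes part of the inner function's environment rather than an argument. A minimal pure-Python sketch of that capture (CUDA decorators and the real function bodies stripped; names here are only illustrative):

```python
import numpy as np

def makeS(S_QxQy, Qx_min, Qx_max):
    # The inner function "closes over" S_QxQy; the array is carried in
    # the function object's closure, not passed at call time.
    def S(ix, iy):
        return S_QxQy[ix, iy]
    return S

S = makeS(np.arange(4.0).reshape(2, 2), 0.0, 1.0)
print(S(1, 1))           # 3.0
print(S.__closure__)     # the captured array lives here
```

When such a closure is compiled for the GPU, the captured array is no longer a lightweight reference; it has to be materialized somewhere in device memory, which is where the trouble below begins.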

  • If the shape of the array is (10, 10) the code works perfectly fine.
  • If the shape of the array is (100, 100) the code fails with this error:
    • File uses too much global constant data (0x13880 bytes, 0x10000 max)
  • If the shape of the array is (1100, 1475), which is the typical use case, the code does not fail immediately; instead it can be seen allocating a lot of memory until it crashes
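As a sanity check (my own arithmetic, not taken from the thread), the 0x13880 bytes in the error message is exactly the size of a (100, 100) float64 array, and the 0x10000 limit is the 64 KiB of CUDA constant memory:

```python
# Size of the (100, 100) float64 array that gets baked into the kernel:
n_bytes = 100 * 100 * 8   # 100x100 elements, 8 bytes per float64
print(hex(n_bytes))       # 0x13880 -- the size in the error message

# The constant-data cap reported by the toolchain:
print(hex(64 * 1024))     # 0x10000 -- i.e. 64 KiB of constant memory
```

The (1100, 1475) case would need 1100 * 1475 * 8 bytes (about 12.4 MiB), far beyond the constant-memory budget, which is consistent with the compile blowing up long before producing PTX.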

I wonder if there is a way to make sure the array is not copied. I tried calling cuda.to_device on S_QxQy before passing it into the closure, thinking that might tell numba to create a reference instead of a copy, but it failed with the error

‘DeviceNDArray’ object has no attribute ‘tobytes’
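A plausible reading of that error (my assumption; I have not traced the Numba internals): freezing an array into kernel constant data serializes the host buffer, e.g. via tobytes(), which host ndarrays provide but a device array does not, since its data lives on the GPU. A small illustration with a stand-in class:

```python
import numpy as np

host = np.zeros(4)
data = host.tobytes()     # host ndarrays can be serialized to raw bytes
print(len(data))          # 32 bytes: 4 float64 elements

class FakeDeviceArray:    # stand-in for a device array: data is on the GPU,
    pass                  # so no host-side tobytes() is available

print(hasattr(FakeDeviceArray(), "tobytes"))  # False -> AttributeError above
```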

Please advise how to make sure the array is not copied multiple times. Your help will be much appreciated.

Best,

Jiao

Thanks for the report - in all cases Numba is creating a const copy of the array in the kernel, with different results:

  • For (10, 10), the data is small enough that this succeeds.
  • For (100, 100), Numba manages to produce PTX incorporating the constant, but the constant is too large, resulting in the error message you reported.
  • For (1100, 1475), Numba is probably getting tied up producing an extremely long string, and not finishing in a reasonable amount of time. ptxas would produce a similar error message if this finished.

I think always creating const copies of global and closure variables for kernels (as documented in Deviations from Python Semantics — Numba 0.57.1+0.g04e81073b.dirty documentation) is a wrong policy decision, and Numba should be changed to always create references instead. I’ve created an issue to describe this along with a small proof-of-concept patch: CUDA kernels should not make const copies of global and closure variables · Issue #9084 · numba/numba · GitHub

I haven’t yet thought of a good workaround for your use case, but will give it some more thought.
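One restructuring that avoids closure capture altogether is to make the array an explicit parameter of S, so nothing large is baked into the compiled function; with numba.cuda the array would then be passed down from the kernel as a device-array argument. A plain-Python sketch of the idea (hypothetical names, no CUDA here):

```python
import numpy as np

def makeS(Qx_min, Qx_max):
    # S takes the table as an explicit parameter instead of capturing it,
    # so no copy of the data is frozen into the compiled function.
    def S(S_QxQy, ix, iy):
        return S_QxQy[ix, iy]
    return S

S = makeS(0.0, 1.0)
table = np.arange(6.0).reshape(2, 3)
print(S(table, 1, 2))   # 5.0 -- the array is supplied at call time
```

Whether this fits mcvine.acc's call structure (where S is invoked deep inside device code) is a separate question; it may require threading the array argument through the intermediate device functions.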

Thanks a lot @gmarkall !! For now I was able to apply your revision to my local numba installation and it seems to be working. I hope the revision can be further researched and adopted. Thanks again!