I am an author of mcvine, a Monte Carlo neutron ray tracing simulation package. We have been using numba to accelerate mcvine: mcvine.acc. We exchanged emails a couple years ago. Thanks for the help back then!
numba has been working quite well for us. Thank you again for the great work on numba. We recently encountered a problem that I think is related to closure variables. The numba docs say that numba may or may not copy global variables. I think in our case the array is copied, which requires a lot of memory and eventually causes the code to fail. The code in question looks like this:
def makeS(S_QxQy, Qx_min, Qx_max, Qy_min, Qy_max):
    @cuda.jit(device=True)
    def S(threadindex, rng_states, neutron):
        # assume neutron velocity is mostly along z
        v = neutron[3:6]
        …
Here S_QxQy is a numpy float array, while Qx_min, Qx_max, Qy_min, Qy_max are floating point numbers.
If the shape of the array is (10, 10) the code works perfectly fine.
If the shape of the array is (100, 100) the code fails with this error:
File uses too much global constant data (0x13880 bytes, 0x10000 max)
If the shape of the array is (1100, 1475), which is the typical use case, the code does not fail immediately; instead it keeps allocating memory until it crashes.
I wonder if there is a way to make sure the array is not copied? I tried using cuda.to_device on S_QxQy before passing it into the closure, thinking that might tell numba to create references instead of copies, but it failed with the error
'DeviceNDArray' object has no attribute 'tobytes'
Please advise how to make sure the array is not copied multiple times. Your help will be much appreciated.
Thanks for the report. In all three cases Numba is embedding a constant copy of the array in the kernel, with different results:
For (10, 10), the data is small enough that this succeeds.
For (100, 100), Numba manages to produce PTX incorporating the constant, but the constant is too large, resulting in the error message you reported.
For (1100, 1475), Numba is probably getting tied up producing an extremely long string, and not finishing in a reasonable amount of time. ptxas would produce a similar error message if this finished.
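The sizes in the three cases can be checked with a little arithmetic. This is only a sketch: it assumes the array is float64 (8 bytes per element; the original post only says "numpy float array"), takes the 0x10000-byte limit from the error message above, and uses a hypothetical helper name, const_bytes.

```python
# Assumption: S_QxQy is float64, i.e. 8 bytes per element.
ITEMSIZE = 8
# Limit quoted in the error message: 0x10000 bytes (64 KiB).
CONST_LIMIT = 0x10000


def const_bytes(shape, itemsize=ITEMSIZE):
    """Bytes needed to embed an array of the given shape as constant data."""
    n = 1
    for dim in shape:
        n *= dim
    return n * itemsize


# The three shapes discussed in this thread.
for shape in [(10, 10), (100, 100), (1100, 1475)]:
    size = const_bytes(shape)
    verdict = "fits" if size <= CONST_LIMIT else "too large"
    print(shape, size, hex(size), verdict)
```

Under the float64 assumption the (100, 100) case comes to 80000 bytes, which is the 0x13880 quoted in the error message, and the (1100, 1475) case is roughly 13 MB of data that would have to be written out as text in the PTX.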
Thanks a lot @gmarkall !! For now I was able to apply your revision to my local numba installation and it seems to be working. I hope the revision can be further researched and adopted. Thanks again!
@yqxd There is now a resolution to this issue - it requires copying the arrays to the device, but then any device arrays referenced as globals or closure variables are captured by reference rather than by value. There is some new documentation explaining this behaviour in the Device Array Capture documentation.