Avoid multiple copies of large numpy array in closure?

Hi Stan and Siu,

Good morning!

I am an author of mcvine, a Monte Carlo neutron ray tracing simulation package. We have been using numba to accelerate mcvine: mcvine.acc. We exchanged emails a couple years ago. Thanks for the help back then!

numba has been working quite well for us. Thank you again for the great work on numba. We recently encountered a problem that I think is related to variables in a closure. The numba docs say that numba may or may not copy global variables. I think in our case the array is copied, which requires a lot of memory and eventually causes the code to fail. The code in question looks like this:

def makeS(S_QxQy, Qx_min, Qx_max, Qy_min, Qy_max):
    @cuda.jit(device=True)
    def S(threadindex, rng_states, neutron):
        # assume neutron velocity is mostly along z
        v = neutron[3:6]

Here S_QxQy is a NumPy float array, while Qx_min, Qx_max, Qy_min, and Qy_max are floating-point numbers.
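For context, the factory pattern above captures S_QxQy in a closure: the array becomes part of the inner function's environment rather than an argument. A minimal pure-Python sketch of that capture (CUDA decorators and the real function bodies stripped; names here are only illustrative):

```python
import numpy as np

def makeS(S_QxQy, Qx_min, Qx_max):
    # The inner function "closes over" S_QxQy; the array is carried in
    # the function object's closure, not passed at call time.
    def S(ix, iy):
        return S_QxQy[ix, iy]
    return S

S = makeS(np.arange(4.0).reshape(2, 2), 0.0, 1.0)
print(S(1, 1))           # 3.0
print(S.__closure__)     # the captured array lives here
```

When such a closure is compiled for the GPU, the captured array is no longer a lightweight reference; it has to be materialized somewhere in device memory, which is where the trouble below begins.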

  • If the shape of the array is (10, 10) the code works perfectly fine.
  • If the shape of the array is (100, 100) the code fails with this error:
    • File uses too much global constant data (0x13880 bytes, 0x10000 max)
  • If the shape of the array is (1100, 1475), which is the typical use case, the code does not fail immediately; instead it can be seen allocating a lot of memory until it crashes
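As a sanity check (my own arithmetic, not taken from the thread), the 0x13880 bytes in the error message is exactly the size of a (100, 100) float64 array, and the 0x10000 limit is the 64 KiB of CUDA constant memory:

```python
# Size of the (100, 100) float64 array that gets baked into the kernel:
n_bytes = 100 * 100 * 8   # 100x100 elements, 8 bytes per float64
print(hex(n_bytes))       # 0x13880 -- the size in the error message

# The constant-data cap reported by the toolchain:
print(hex(64 * 1024))     # 0x10000 -- i.e. 64 KiB of constant memory
```

The (1100, 1475) case would need 1100 * 1475 * 8 bytes (about 12.4 MiB), far beyond the constant-memory budget, which is consistent with the compile blowing up long before producing PTX.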

I wonder if there is a way to make sure the array is not copied. I tried calling cuda.to_device on S_QxQy before passing it into the closure, thinking that might tell numba to create a reference instead of a copy, but it failed with the error

‘DeviceNDArray’ object has no attribute ‘tobytes’
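A plausible reading of that error (my assumption; I have not traced the Numba internals): freezing an array into kernel constant data serializes the host buffer, e.g. via tobytes(), which host ndarrays provide but a device array does not, since its data lives on the GPU. A small illustration with a stand-in class:

```python
import numpy as np

host = np.zeros(4)
data = host.tobytes()     # host ndarrays can be serialized to raw bytes
print(len(data))          # 32 bytes: 4 float64 elements

class FakeDeviceArray:    # stand-in for a device array: data is on the GPU,
    pass                  # so no host-side tobytes() is available

print(hasattr(FakeDeviceArray(), "tobytes"))  # False -> AttributeError above
```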

Please advise how to make sure the array is not copied multiple times. Your help will be much appreciated.

Best,

Jiao

Thanks for the report - in all cases Numba is creating a const copy of the array in the kernel, with different results:

  • For (10, 10), the data is small enough that this succeeds.
  • For (100, 100), Numba manages to produce PTX incorporating the constant, but the constant is too large, resulting in the error message you reported.
  • For (1100, 1475), Numba is probably getting tied up producing an extremely long string, and not finishing in a reasonable amount of time. ptxas would produce a similar error message if this finished.

I think always creating const copies of global and closure variables for kernels (as documented in Deviations from Python Semantics — Numba 0.57.1+0.g04e81073b.dirty documentation) is a wrong policy decision, and Numba should be changed to always create references instead. I’ve created an issue to describe this along with a small proof-of-concept patch: CUDA kernels should not make const copies of global and closure variables · Issue #9084 · numba/numba · GitHub

I haven’t yet thought of a good workaround for your use case, but will give it some more thought.
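One restructuring that avoids closure capture altogether is to make the array an explicit parameter of S, so nothing large is baked into the compiled function; with numba.cuda the array would then be passed down from the kernel as a device-array argument. A plain-Python sketch of the idea (hypothetical names, no CUDA here):

```python
import numpy as np

def makeS(Qx_min, Qx_max):
    # S takes the table as an explicit parameter instead of capturing it,
    # so no copy of the data is frozen into the compiled function.
    def S(S_QxQy, ix, iy):
        return S_QxQy[ix, iy]
    return S

S = makeS(0.0, 1.0)
table = np.arange(6.0).reshape(2, 3)
print(S(table, 1, 2))   # 5.0 -- the array is supplied at call time
```

Whether this fits mcvine.acc's call structure (where S is invoked deep inside device code) is a separate question; it may require threading the array argument through the intermediate device functions.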

Thanks a lot @gmarkall !! For now I was able to apply your revision to my local numba installation and it seems to be working. I hope the revision can be further researched and adopted. Thanks again!