Windows 10
Numba 0.56.4
I've observed significant memory and disk usage that appears to be related to name mangling. We see mangled type strings of ~500 kB each, and they seem to appear at least ten times in the LLVM code, so trivial functions with complex arguments produce .nbc cache files on the order of 5-10 MB.
These large mangled type strings are also used as keys in the Environment._memo store, and it seems likely that the LLVM code is kept in memory as well, which would lead to excessive memory use.
Is there any way to reduce this impact? I’ve considered monkey-patching the mangling function, with some strategy like keeping the prefix and namespace and replacing the remainder with a hash of the whole string. Does that seem reasonable, or do obvious dragons live down that path?
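To make the idea concrete, here is a rough sketch of the hashing strategy I have in mind. It is plain Python and does not touch Numba at all: `shorten_mangled_name`, `keep` and `digest_chars` are names I made up for illustration, and where (or whether) something like this could safely be hooked into the mangler is exactly what I'm unsure about.

```python
import hashlib


def shorten_mangled_name(mangled: str, keep: int = 64, digest_chars: int = 32) -> str:
    """Hypothetical helper, not part of Numba.

    Keeps the first `keep` characters of a mangled name (enough to retain
    the prefix and namespace in typical cases) and replaces the rest with
    a truncated SHA-256 hex digest of the *whole* original string.
    """
    suffix_len = digest_chars + 2  # 2 accounts for the "_h" separator
    if len(mangled) <= keep + suffix_len:
        # Already short enough; leave it untouched.
        return mangled
    digest = hashlib.sha256(mangled.encode("utf-8")).hexdigest()[:digest_chars]
    # Hex digits and underscores are safe characters for a symbol name.
    return f"{mangled[:keep]}_h{digest}"


# Synthetic stand-in for a ~500 kB mangled name.
long_name = "_ZN8__main__11calculationB2v1" + "x" * 500_000
short_name = shorten_mangled_name(long_name)
```

The result is deterministic, so the same signature always maps to the same short name, but it is no longer a valid Itanium-mangled name and cannot be demangled. Anything that parses or demangles these names would presumably break, and a truncated digest leaves a (very small) chance of collisions.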
A minimal sample program that reproduces the problem is below.
from numba import njit, float64, int16, int64, uint32
from numba.core import types
from numba.core.environment import Environment
from numba.core.itanium_mangler import mangle_type_or_value
from numba.experimental import structref


@structref.register
class MyStructTypeClass(types.StructRef):
    pass


MyStructType = MyStructTypeClass([('my_field', int64)])


@structref.register
class S1TypeClass(types.StructRef):
    pass


S1Type = S1TypeClass([('x1', int16), ('x2', MyStructType), ('x3', float64)])


# notice below uint32 mangled as `j` https://github.com/numba/numba/blob/main/numba/core/itanium_mangler.py#L54
@njit(float64(uint32, S1Type))
def calculation(new_term, my_s1):
    return my_s1.x1 + my_s1.x2.my_field + my_s1.x3 * new_term


sig_substr = "".join([mangle_type_or_value(arg) for arg in calculation.nopython_signatures[0].args])
assert sig_substr == "jN5numba67S1TypeClass_28_28_27x1_27_2c_20int16_29_2c_20_28_27x2_27_2c_20numba96MyStructTypeClass_28_28_27my_field_27_2c_20int64_29_2c_29_29_2c_20_28_27x3_27_2c_20float64_29_29E"  # noqa: E501

if __name__ == '__main__':
    calculation_llvm = next(iter(calculation.inspect_llvm().values()))
    ct = calculation_llvm.count(sig_substr)
    assert ct >= 10, ct  # ten or more copies of the mangled signature in the LLVM code
    print(list(Environment._memo.keys()))