Ok, I’ll explain how it works in my case:
PyTorch code runs 16 samplers, each containing 128 environments. Each env is a normal Python class with all calculations moved to the Numba part, and all envs are persistent (if you are familiar with gym envs, it is the same pattern: PyTorch produces actions, steps the envs, observations and rewards come back, and after some number of steps the envs reset and the cycle repeats). All envs live inside 16 forked processes; on Windows, spawn is obviously used instead. But fork in my case gives a large speed gain, which is why I was forced to use Ubuntu for training: on Windows I get 20k fps, but on Ubuntu my throughput is 50k.
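To make the pattern concrete, here is a minimal single-process sketch of what one sampler/env loop looks like (all names, shapes, and counts are made up for illustration, this is not my real code):

```python
import numpy as np

class Env:
    """Toy persistent env; in the real code the step math lives in @njit functions."""
    def __init__(self, obs_size=8):
        self.obs_size = obs_size
        self.t = 0

    def reset(self):
        self.t = 0
        return np.zeros(self.obs_size, dtype=np.float32)

    def step(self, action):
        # In the real setup this body would be a call into Numba-compiled code.
        self.t += 1
        obs = np.full(self.obs_size, action, dtype=np.float32)
        reward = float(action)
        done = self.t >= 64  # hypothetical episode length
        return obs, reward, done

# One sampler owns many persistent envs and loops forever inside its worker process.
envs = [Env() for _ in range(4)]          # 128 per sampler in the real setup
obs = [env.reset() for env in envs]
for _ in range(100):                      # one worker-loop iteration = one step
    actions = np.random.rand(len(envs)).astype(np.float32)  # would come from PyTorch
    for i, env in enumerate(envs):
        obs[i], reward, done = env.step(actions[i])
        if done:
            obs[i] = env.reset()          # env resets and the pattern repeats
```

The point is that the env objects themselves are created once and reused forever; only actions, observations, and rewards flow in and out each iteration.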
I’m not sure whether the functions are being recompiled regularly, since all envs are persistent and run in a multiprocessing worker loop.
I also checked that the problem is located inside the Numba code: memory stops growing if I remove all the @njit decorators.
So maybe you’re right that the core of the issue is the combination of forking and Numba on Ubuntu. I don’t think it is a fragmentation issue, since changing the malloc environment variables, or even switching the allocator to jemalloc, does not help.
Could you please explain more about the immortal nature of LLVM functions and how to check for it? Maybe I’m using Numba in the wrong way.
PS: I pass large np.float32 NumPy arrays into the Numba code and process them, so maybe copies of the Python-side and C++-side arrays somehow stay in memory where the GC doesn’t see them.