Memory grows all the time

I use Numba in a multiprocessing environment with PyTorch for a reinforcement learning task. Numba pre-processes my observations before PyTorch does the SGD work. Numba’s rtsys shows no memory leaks: all allocated objects are released inside Numba. But something eats 1 GB of my memory every 10 minutes, so I can’t run training for more than an hour on my 32 GB machine with Ubuntu 20.04.

Without njit, the pure Python code works fine, with no memory growth at all over 24 hours. I tried jemalloc with python3 on my Ubuntu machine, but I did not see any improvement. As far as I remember, the same Numba code works fine on a Windows machine: memory drops and stays clear for the next processing steps, so GC works on Windows but not on Ubuntu. Numba gives a 2x speedup, which is a huge gain for my research.

Please help me, I’ve been stuck on this problem for 2 weeks.

Hi Frankie,

Can you please post a reproducible example? It is difficult to help if the community cannot examine and test the code you are running.

BR.

It’s not easy: there is a lot of additional PyTorch code and libraries. I get a number of NumbaIRAssumptionWarning messages with strange variable names, like

NumbaIRAssumptionWarning: variable '$phi572.2.290' is not in scope.

This warning came from an internal pedantic check.

But the following code:

    from numba.core.runtime import rtsys

    s = rtsys.get_allocation_stats()
    if s.free != s.alloc:
        print(s)

shows no memory leaks inside Numba. Also, on Windows I don’t have leaks at all and GC works as expected, so this seems to be an Ubuntu-specific issue. Yesterday I thought it might be related to memory fragmentation, but

echo 3 > /proc/sys/vm/drop_caches
echo 1 > /proc/sys/vm/compact_memory

had no effect on freeing the unused memory.
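
One way to narrow this down is to log each worker's resident memory next to the NRT counters at a fixed interval, so that growth can be attributed to a specific process. A minimal sketch, assuming Linux (it reads `/proc/self/statm`); the helper names `rss_bytes` and `memory_snapshot` are made up for illustration, and Numba is imported optionally so the snippet also runs without it:

```python
import os

try:
    from numba.core.runtime import rtsys  # same object used in the check above
except ImportError:
    rtsys = None  # Numba not installed: only RSS is reported


def rss_bytes():
    """Return the resident set size of this process in bytes, or None."""
    try:
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
    except (OSError, ValueError, IndexError):
        return None  # not Linux, or /proc is not available
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def memory_snapshot():
    """Collect RSS and, when available, the NRT allocation counters."""
    snap = {"pid": os.getpid(), "rss": rss_bytes()}
    if rtsys is not None:
        try:
            stats = rtsys.get_allocation_stats()
        except RuntimeError:
            # Some Numba releases keep these counters disabled by default.
            return snap
        snap["nrt_alloc"] = stats.alloc
        snap["nrt_free"] = stats.free
    return snap


print(memory_snapshot())
```

If RSS keeps climbing while `nrt_alloc` and `nrt_free` stay equal, the growth is coming from somewhere other than NRT-managed objects.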

It’s not easy: there is a lot of additional code based on PyTorch.
I have several warnings from Numba like:

/home/andrey/miniconda3/envs/rlpyt/lib/python3.9/site-packages/numba/core/ssa.py:272: NumbaIRAssumptionWarning: variable 'j.70' is not in scope.

This warning came from an internal pedantic check. Please report the warning

but no memory leaks inside Numba according to

    from numba.core.runtime import rtsys

    s = rtsys.get_allocation_stats()
    if s.free != s.alloc:
        print(s)

I don’t understand what NumbaIRAssumptionWarning means, so maybe this is the core of the problem.

I thought it might be related to a memory fragmentation issue on Ubuntu, since the same code works fine on Windows (GC works as expected), but the following commands do not help

echo 3 > /proc/sys/vm/drop_caches
echo 1 > /proc/sys/vm/compact_memory

I also tried malloc-specific environment variables like

MALLOC_MMAP_THRESHOLD_=1024;
MALLOC_MMAP_MAX_=16777216

but without success.

@frankie4fingers
It’s going to be tricky to diagnose this without some sort of reproducer, but I’ll make some guesses/comments.

I don’t think the Numba IR warning is related. I’m also fairly sure the runtime isn’t leaking if stats for alloc == free.

This could be related to the “immortal” nature of functions compiled with LLVM’s MCJIT, i.e. they can’t be freed. Are you compiling a lot of functions or many versions of functions, or compiling regularly throughout the lifetime of the program? However, were this the case, I’d expect Windows to have this problem too.

As Windows does not show this problem, is it a spawn vs. fork thing perhaps? Do you use multiprocessing somewhere? If so, could you perhaps try changing fork to spawn, as described in the multiprocessing page (“Process-based parallelism”) of the Python 3.10.0 documentation?
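
For reference, the start method can be chosen per context without changing the global default. A minimal sketch using only the standard library; `worker` and `run_workers` are hypothetical stand-ins for the sampler processes, not code from this thread:

```python
import multiprocessing as mp


def worker(n):
    """Placeholder for a sampler loop; the real work would step the envs."""
    total = 0
    for i in range(n):
        total += i * i


def run_workers(start_method, n_procs=2):
    """Start n_procs workers with the given start method; return exit codes."""
    # get_context leaves the global default start method untouched.
    ctx = mp.get_context(start_method)
    procs = [ctx.Process(target=worker, args=(10_000,)) for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return [p.exitcode for p in procs]


if __name__ == "__main__":
    # "fork" is only offered on Unix; "spawn" is available on all platforms.
    print(mp.get_all_start_methods())
    print(run_workers("spawn"))
```

With spawn, each child starts a fresh interpreter and re-imports the module, so nothing compiled in the parent is inherited. Running the same workload once with `"fork"` and once with `"spawn"` on the Linux machine would show whether the memory growth follows the start method.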

If the issue is fragmentation, there’s quite a lot of discussion about this, and about how to measure or spot it, in this Dask issue: “Memory leak with Numpy arrays and the threaded scheduler” (Issue #3530, dask/dask on GitHub).
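
A common experiment on glibc-based Linux systems is to ask the allocator to return free heap pages to the OS with `malloc_trim` and watch whether resident memory drops. A minimal sketch via `ctypes`, assuming glibc is loadable as `libc.so.6`; `trim_heap` is a hypothetical helper name, and it returns None where the call is not available:

```python
import ctypes


def trim_heap():
    """Call glibc's malloc_trim(0); return its result, or None if unavailable."""
    try:
        libc = ctypes.CDLL("libc.so.6")
        malloc_trim = libc.malloc_trim
    except (OSError, AttributeError):
        return None  # not glibc (e.g. Windows, macOS, musl)
    malloc_trim.argtypes = [ctypes.c_size_t]
    malloc_trim.restype = ctypes.c_int
    # 1 means some memory was released to the system, 0 means none was.
    return malloc_trim(0)


print(trim_heap())
```

If RSS falls noticeably after calling this inside a worker, the growth is probably free-but-unreturned heap rather than live objects. This only says something about glibc's malloc; with jemalloc preloaded, the call does not touch jemalloc's arenas.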

I’d also double check that the environments and package versions are absolutely identical across Windows and Linux, so that the comparison is as close to like-for-like as possible. Also, where did you get Numba from for your env?

Hope this helps?

Ok, I’ll explain how it works in my case:

The PyTorch code runs 16 samplers, each containing 128 environments. Each env is a normal Python class with all calculation moved to the Numba part. So all envs are persistent (if you are familiar with gym envs, it is the same pattern: PyTorch produces actions, the envs take a step, observations and rewards are returned, and after some steps the envs reset and repeat the same work pattern). All envs live inside 16 forked processes, which on Windows obviously use spawn instead. But fork in my case provides a 100% speed gain, which is why I’m forced to use Ubuntu for training. On Windows I get 20k fps, but on Ubuntu my throughput is 50k.

I’m not sure that functions are being compiled regularly, since all envs are persistent and run in a loop inside the multiprocessing workers.

I also checked that the problem is located in the Numba code: it runs fine without memory growth if I remove all @njit decorators.

So maybe you’re right and the cause is the combination of forking and Numba on Ubuntu. I don’t think it is a fragmentation issue, since changing the malloc environment variables or even switching malloc to jemalloc does not help.

Could you please explain more about the immortal nature of LLVM’s functions and how to check for it? Maybe I’m using Numba in the wrong way.

PS: I send large NumPy np.float32 arrays to the Numba code and process them, so maybe copies of the Python and C++ arrays somehow stay in memory and the GC doesn’t see them.

@frankie4fingers this does sound like a fork vs. spawn issue.

When Numba compiles functions, it does so with LLVM. Numba’s LLVM usage has an internal cache which contains everything it has ever compiled. This cache cannot be cleared and stays live for the lifetime of the process, so if you compile lots of functions throughout the lifetime of a process, it can obviously grow.
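
A quick way to see whether this is happening is to count the compiled specializations of each jitted function over time; a Numba dispatcher lists them in its `signatures` attribute. A minimal sketch, where `count_specializations` and `preprocess` are hypothetical names and the Numba part is skipped when Numba or NumPy is not installed:

```python
def count_specializations(funcs):
    """Map each function name to the number of signatures compiled so far."""
    return {name: len(getattr(f, "signatures", ())) for name, f in funcs.items()}


try:
    import numpy as np
    from numba import njit
except ImportError:
    njit = None  # Numba or NumPy missing: skip the demo below

if njit is not None:

    @njit
    def preprocess(x):
        return x * 2

    preprocess(np.zeros(4, dtype=np.float32))  # compiles a float32 version
    preprocess(np.zeros(4, dtype=np.float64))  # compiles a second version
    print(count_specializations({"preprocess": preprocess}))
```

Logging these counts from inside each worker every few minutes would help here. If they keep climbing during training, new versions are still being compiled, for example because argument dtypes or array layouts vary between calls. If they stay flat, the compilation cache is unlikely to explain the growth.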