I am new to Numba. I am using PyTorch and Numba together, and I want to release the resources in the CUDA context after training one model so that I can continue training another model in the same script. I am currently using the
cuda.current_context().reset() API. However, there is a part of the CUDA memory that this API does not clear. As I train more and more models in the script, the amount of memory I cannot reclaim grows, and it eventually leads to an OOM error.
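For clarity, here is a minimal sketch of the per-model cleanup I do between training runs. build_model and the training step are placeholders for my own code, and the gc.collect() / torch.cuda.empty_cache() calls are the usual PyTorch cleanup steps, shown for completeness:

```python
# Hypothetical sketch of training several models in one script with
# per-model cleanup; build_model/train are placeholders, not real APIs.
import gc

def train_many(model_builders):
    # Imported inside the function because both require a CUDA-capable setup.
    import torch
    from numba import cuda

    for build_model in model_builders:
        model = build_model()
        # ... train(model) ...
        del model                       # drop Python references to the model
        gc.collect()                    # collect any lingering CUDA tensors
        torch.cuda.empty_cache()        # release PyTorch's cached blocks
        cuda.current_context().reset()  # reset Numba's CUDA context
```

Even with all of these calls, some CUDA memory remains allocated after each iteration.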
I am pretty sure that I delete the old models and datasets before training the next model. It seems that
cuda.close() will clear that part of the memory, but this API destroys the CUDA context, and I cannot continue to use torch.distributed APIs afterwards.
I am wondering why
cuda.current_context().reset() cannot clean up all the memory in the context. From the docs, I understand that this API should release all resources in the current context. Is there any way to clear the context without destroying it "for real"?
I would really appreciate any help! Thank you!