Hi, I am working on higher-genus Riemann theta functions and I need to squeeze the last possible cycles
out of the CPU or GPU on my laptop. I even hope to use CPU and GPU together. I am using DeepSeek/Claude heavily for coding.
I noticed that Numba 0.61.2 is built against LLVM 15, not the latest LLVM 20.
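
A minimal sketch of how the LLVM version can be read from an installed environment, assuming llvmlite (the LLVM binding that Numba depends on) is importable. The helper name `llvm_version` is made up for illustration; `llvmlite.binding.llvm_version_info` is the attribute that holds the version tuple.

```python
def llvm_version():
    """Return the LLVM version llvmlite was built against, or None."""
    try:
        import llvmlite.binding as llvm
    except ImportError:
        # llvmlite (and therefore Numba) is not installed here.
        return None
    # Tuple of the form (major, minor, patch).
    return llvm.llvm_version_info


if __name__ == "__main__":
    print("LLVM version seen by llvmlite:", llvm_version())
```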
For the CPU:
I tried a naive approach: “hack” Numba at the LLVM IR level and build against clang-20,
but it turned out to be more complicated than expected, because there are some libraries that have to be linked.
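
To make the CPU case concrete, here is a minimal sketch of the kind of kernel in question: a truncated Riemann theta sum, theta(z | Omega) = sum over n in Z^g of exp(2*pi*i*(n^T Omega n / 2 + n^T z)), cut off to the box |n_j| <= N. The function name `theta_truncated` and the box truncation are assumptions made for illustration, not a tuned production kernel. If Numba is missing, the decorator falls back to plain Python.

```python
import math
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs without Numba (much slower).
    def njit(*args, **kwargs):
        if args and callable(args[0]) and not kwargs:
            return args[0]
        return lambda f: f


@njit
def theta_truncated(z, omega, N):
    """Sum exp(2*pi*i*(0.5*n^T omega n + n^T z)) over the box |n_j| <= N."""
    g = z.shape[0]
    side = 2 * N + 1
    n = np.empty(g, dtype=np.int64)
    total = 0.0 + 0.0j
    for k in range(side ** g):
        # Decode the flat index k into a lattice point n in [-N, N]^g.
        r = k
        for j in range(g):
            n[j] = r % side - N
            r //= side
        quad = 0.0 + 0.0j
        lin = 0.0 + 0.0j
        for a in range(g):
            lin += n[a] * z[a]
            for b in range(g):
                quad += n[a] * omega[a, b] * n[b]
        total += np.exp(2j * math.pi * (0.5 * quad + lin))
    return total


if __name__ == "__main__":
    # Genus 1 with Omega = i: the sum reduces to sum_n exp(-pi * n^2).
    z = np.zeros(1, dtype=np.complex128)
    omega = np.array([[1j]], dtype=np.complex128)
    print(theta_truncated(z, omega, 6))
```

When Numba is installed, `theta_truncated.inspect_llvm()` and `theta_truncated.inspect_asm()` return the generated LLVM IR and assembly for each compiled signature. That is one way to see what the bundled LLVM produces before attempting a rebuild against a newer one.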
For the GPU:
It seems I am hitting a PTX optimization problem … standard GPU optimization:
a better balance between memory (constant/shared/global) and GPU load. That is OK,
I accept this. Maybe a better Numba/CUDA kernel with some inline PTX for the critical
regions would help.
For both cases, it seems that I need more than what is in the manual. I am wondering
whether you would be generous enough to share some internal technical documentation with the rest of us,
since Numba is open source now.
Do you have an internal comparative study of Numba against newer LLVM versions?
Thank you very much for any help.
Kh.