Supporting several targets in library code

As part of a NumFOCUS Small Development Grant awarded to poliastro, @s-m-e has been doing a lot of exploration around accelerating orbit propagation for many orbits and many epochs.

One of the things we noticed is that, even though “in general it is recommended not to pass any explicit signature to @jit” (unsure if this still holds 6 years later?), it’s unclear whether the type inference of the parallel and cuda targets is on par with the cpu target.

If that’s the case, then as a library that wants to support several targets at the same time, we have to either

  • Specify the types in the signature, which is not recommended for the cpu target, or
  • Not specify the types, which might cause problems for the parallel and cuda targets.

There’s another possibility: that our assessment is wrong, all targets have equal type inference capabilities, and we can just let numba do the work.
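To make the two options concrete, here is a toy sketch with `@vectorize` (the function and names are made up for illustration, not poliastro’s actual code):

```python
import numpy as np
from numba import vectorize

# Option 1: explicit signatures. The same source can be compiled for the
# cpu, parallel or cuda targets, but the dtypes are fixed up front.
@vectorize(["float64(float64, float64)"], target="cpu")
def periapsis_explicit(a, ecc):
    return a * (1.0 - ecc)

# Option 2: no signature (lazy compilation). numba infers the types on the
# first call, which keeps the function generic.
@vectorize
def periapsis_lazy(a, ecc):
    return a * (1.0 - ecc)

a = np.linspace(7000.0, 42000.0, 1000)
ecc = np.full_like(a, 0.01)
print(periapsis_explicit(a, ecc)[:3])
print(periapsis_lazy(a, ecc)[:3])
```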

What do you folks advise us to do?

The targets have equal type inference capabilities, but the impact of the typing on the targets differs. CUDA kernels are extremely sensitive to register usage, so typing everything as 64-bit types has little impact on the CPU variant of a function, but a big impact on the CUDA variant - the performance improvements that I understand @s-m-e has observed from providing signatures are likely down to this.
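As a rough illustration (a toy kernel, not anything from poliastro), here is the same body compiled with float32 versus float64 signatures:

```python
import numpy as np
from numba import cuda

# Toy kernel compiled twice with explicit signatures. The float32 variant
# works in narrower registers, which on CUDA can mean lower register
# pressure and better occupancy; on the CPU the same choice matters far less.
@cuda.jit("void(float32[:], float32[:], float32[:])")
def add_f32(out, x, y):
    i = cuda.grid(1)
    if i < out.size:
        out[i] = x[i] + y[i]

@cuda.jit("void(float64[:], float64[:], float64[:])")
def add_f64(out, x, y):
    i = cuda.grid(1)
    if i < out.size:
        out[i] = x[i] + y[i]

x = np.ones(1024, dtype=np.float32)
out = np.zeros_like(x)
add_f32[4, 256](out, x, x)   # requires a CUDA-capable GPU
```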

The general advice is not to provide signatures - however, I think your use case is a more advanced one for which I’d deviate from the standard advice, and instead I’d suggest explicitly providing signatures.

Providing signatures for the CPU target isn’t a problem in itself. We generally advise against it because beginners get it wrong a lot of the time and, unlike in your particular situation, it is unnecessary in many cases. There does come a point at which it makes sense to provide signatures, and in my opinion you have reached it with this poliastro work.


Also, congratulations on being awarded the NumFOCUS Small Development Grant! :tada:


Thanks a lot for the detailed answer @gmarkall! My main concern has always been losing generality by using float64 instead of letting numba figure out the types, getting errors if someone accidentally passes an integer, and so on. But these are minor things that can be compensated for by other means.

It’s partially performance, yes, but also failures of the compile process itself. If I do not provide signatures for the parallel and cuda targets, the JIT will spit out various fun tracebacks, in some cases not even related to types at all. The “fix”, from experience, is to provide types, and suddenly the JIT is happy again. I ran into this before, on a smaller scale, but big time in poliastro. I can provide a few examples if I find the time. I distinctly remember three tracebacks I was seeing frequently. My impression was that, thanks to the (automatic) parallelization that happens in those cases for both the CPU and the GPU, the code generation differs enough to make type inference fail. At least that is how I pictured it.
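For reference, this is roughly the shape of the “fix” for the parallel target (a made-up toy function, not the actual poliastro code): give the explicitly typed signature to the decorator so the parallel compilation path does not have to infer the argument types itself.

```python
import numpy as np
from numba import njit, prange

# Toy stand-in for a propagation loop: with the explicit signature the
# parallel target receives the argument types up front.
@njit("void(float64[:], float64[:], float64[:])", parallel=True)
def combine(out, a, b):
    for i in prange(a.shape[0]):
        out[i] = a[i] * b[i]

a = np.linspace(0.0, 1.0, 1_000)
b = np.linspace(1.0, 2.0, 1_000)
out = np.empty_like(a)
combine(out, a, b)
```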

Were any of these for the @guvectorize decorator? That still requires signatures for the CUDA target (I don’t know if it’s also needed for the parallel one).
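For completeness, a minimal `@guvectorize` sketch with an explicit signature list and layout string, which are mandatory for the cuda target (toy function and names, purely illustrative):

```python
import numpy as np
from numba import guvectorize

# Toy gufunc: both the type signature and the "(n),(n)->(n)" layout are
# required when targeting cuda; the lazy, signature-less form is not
# available there.
@guvectorize(["void(float64[:], float64[:], float64[:])"], "(n),(n)->(n)",
             target="cuda")
def elementwise_sum(x, y, out):
    for i in range(x.shape[0]):
        out[i] = x[i] + y[i]

x = np.linspace(0.0, 1.0, 8)
print(elementwise_sum(x, x))   # requires a CUDA-capable GPU
```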