Intel SDC, Pandas and contributions to Numba?

Hello, I’m a Numba user, a co-founder of the PyDataLondon meetup and conference series and a semi-regular conference speaker. Earlier in the year I spoke at a couple of conferences on high performance Python (EuroPython, Remote Pizza Python, PyData Amsterdam), and at one of them Intel also spoke and introduced Intel SDC (Scalable Dataframe Compiler). I had some questions for them and wasn’t clear on their answers.
In chat they noted that they had extended Numba to work efficiently on strings and on datetimes (IIRC, my memory may be faulty) and to work with Pandas. As best I can see, such changes aren’t in Numba, so I’m confused.
Does anyone know about the relationship of Intel SDC to Numba, whether improvements are shared back to Numba (and to other projects), and whether SDC does indeed extend Numba to work with Pandas? The SDC documentation isn’t brilliant at this stage.
No worries if nobody here has a clear answer, I figured this might be a sensible first place to check on the topic. Cheers, Ian (UK)

The Intel SDC team has been working with the Numba team and has been upstreaming many enhancements, including but not limited to string features. SDC’s work on supporting the Pandas API has also sponsored many improvements to Numba around extending the compiler with new datatypes and compiler passes.

Not all of SDC’s features are shared back to Numba and, IMO, that is a good thing. SDC is pushing the limit of what a compiler can do in auto-parallelization of Pandas operations, but Numba needs to be more cautious to ensure stability for its users. The separation also helps ease our maintenance burden, as Numba is becoming large and complicated.

Also, the complexity of Pandas support (which requires reimplementing many algorithms in a compiler-friendly way) means that it would not be a good idea to upstream Pandas support into Numba itself. Instead, that feature set should live as an extension to Numba, which is how SDC is implemented. This work has pushed the Numba team to increase focus on extensibility by external code bases, which also enables tools like numba-scipy, Awkward Array, and others.

Incidentally, the NumPy support in Numba probably ought to be moved to an extension (in principle, though there is much entanglement currently due to the age of this part of the code), as should the GPU targets. There are historical reasons not to do that now, but conceptually it should be possible.

@sseibert @sklam apologies for being slow to acknowledge your replies, and many thanks to you both (I’m a new first-time father and a baby makes…the world much slower!). I’ll keep an eye on Intel SDC and I look forward to trying it. I’m glad there’s good cooperation back to Numba. Thanks both! Ian.