Pandas dataframes with numba

tulbureandreit · October 18, 2022, 6:00pm

Hi,

Sorry if this is a double post.

Has anybody managed to speed up DataFrame computations with Numba?

I have a forecasting job that uses DataFrames and takes about 10 hours to run for the most intensive customer (one with more than 1000 products).

I have a for loop for model selection (evaluating 5 models using cross-validation on historical data) which takes up 95% of this time. Can anybody tell me how I could use Numba or Cython to speed things up? I think caching the functions would be the game changer here, as the for loop repeats the same computation over and over again.

Any help is welcome and would be golden for me right now!

Thanks,
Andrei

cuda_enthusiast · October 19, 2022, 3:15am

Hi @tulbureandreit, please look into cuDF from NVIDIA RAPIDS. cuDF is an [almost] drop-in replacement for the pandas DataFrame that runs on the GPU. We’ve seen massive performance improvements with cuDF.

cuda_enthusiast · October 19, 2022, 3:17am

Here is the link for your convenience: https://rapids.ai/
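
On the original Numba question: Numba does not compile pandas objects in nopython mode, so the usual pattern is to pull the numeric columns out as NumPy arrays and jit only the hot loop. Below is a rough sketch of that pattern, not the actual forecasting code from the thread. The column names, the toy data, and the `moving_average_cv_mae` function (a rolling-origin cross-validation of a naive moving-average forecast) are all made up for illustration, and the `ImportError` fallback only exists so the snippet still runs, without any speedup, where Numba is not installed.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit
except ImportError:
    # Fallback (assumption for this sketch): a no-op decorator, so the code
    # still runs as plain Python when Numba is missing.
    def njit(func):
        return func


@njit
def moving_average_cv_mae(values, window, n_folds):
    """Hypothetical scorer: predict each of the last `n_folds` points as the
    mean of the `window` points before it and return the mean absolute error."""
    n = values.shape[0]
    total = 0.0
    for k in range(n_folds):
        t = n - n_folds + k          # index of the held-out point
        acc = 0.0
        for j in range(t - window, t):
            acc += values[j]
        total += abs(values[t] - acc / window)
    return total / n_folds


# Toy data standing in for the real per-product history.
df = pd.DataFrame({
    "product": ["a"] * 6 + ["b"] * 6,
    "sales": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
})

scores = {}
for product, group in df.groupby("product"):
    # Hand Numba a plain float64 array, not the DataFrame or Series itself.
    values = group["sales"].to_numpy(dtype=np.float64)
    scores[product] = moving_average_cv_mae(values, 3, 2)

print(scores)
```

One note on caching: passing `cache=True` to `njit` stores the compiled machine code on disk so later runs skip compilation. It does not memoize results, so if the loop really repeats identical computations, the results would need to be stored separately (for example in a dict keyed by model and fold).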
