I am iterating over a large list with more than 1 million elements inside a jitted function. I want to make use of the 128 CPU cores that my system has – how can I parallelise this loop to achieve greater efficiency? Currently it looks like only one core is being used to iterate through this list. Would using prange do this? Or would I have to create my own multiprocessing Pool?
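prange is a good fit because it manages the threads for you. Note that it only runs in parallel when the function is compiled with parallel=True; otherwise it behaves like a plain range. Things that you need to consider are:
Is the loop body data parallel, i.e. is each iteration independent of the others? If yes, prange will be easy. If not, you might need to partition the work manually.
Beware of race conditions, because prange will not automatically lock your containers. Since you mention using a list, make sure multiple threads never mutate it at the same time (for example, by appending to it inside the loop).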