How do I use Numba to run Trading Back Testing?

I programmed a trading algorithm to get me 50 Lambos. I have also acquired years of market data for back testing the algorithm. The algorithm has a few basic parameters that I modify, back test, and then get some basic statistics on how the algorithm performed historically. This allows me to fine-tune the parameters for max Lambos. :smiley:

The problem is that testing all these combinations in my single-threaded Python app takes FOREVER. I want to get my video card to do it in parallel to speed things up significantly.

So, I’ve got the static, read-only historical market data broken down into 4x 1-D float arrays (open, low, high, close). There is also a single 1-D int32 array with a unix timestamp. The indices of these 5 arrays correspond to the same historical data point. To me, this should live in shared memory.

Then I have the parameters: a start and end time to look up in the above arrays, as well as a few other float values that I need to make decisions while traversing the market data (the 4 arrays above) between the start and end times.

As output, I have several statistics as floats (could be put into a float array if needed) that are captured to help shed light on how a particular parameter combination performed. The number of these statistics is fixed for any iteration and has no relation to the size of any input array.
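
Roughly, this is the shape of what I'm describing (names and sizes here are just for illustration):

import numpy as np

n = 1_000_000                               # number of historical bars (example)
open_ = np.empty(n, dtype=np.float64)       # 4x parallel price arrays
low   = np.empty(n, dtype=np.float64)
high  = np.empty(n, dtype=np.float64)
close = np.empty(n, dtype=np.float64)
ts    = np.empty(n, dtype=np.int32)         # unix timestamps, same indexing

# per-combination parameters: a start/end window plus a few decision floats
start_idx, end_idx = 0, n
param_a, param_b = 0.5, 0.1                 # placeholder values

# fixed number of output statistics, unrelated to the input length
stats = np.zeros(8, dtype=np.float64)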

I took a look at @guvectorize, but it seems like the output array size needs to be related to one of the input sizes. I also tried iterating through the historical data arrays, but it didn’t seem to like not starting at the beginning.

Am I looking in the right place? @guvectorize? If not, how can I get my back tester running on a GPU core?

I am new to all this (but am an experienced developer) so I would appreciate any pointers anyone has.

@guvectorize might just be too restrictive for your use case, especially if you want to look into GPU shared memory to optimize things. I would suggest writing a @cuda.jit kernel for full freedom.
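
For instance, one pattern is a kernel with one thread per parameter combination, each thread scanning its own window of the shared price arrays and writing a fixed row of statistics. Untested sketch; all names and the toy decision rule are made up:

import numpy as np
from numba import cuda

@cuda.jit
def backtest_kernel(open_, high, low, close, ts, starts, ends, params, stats):
    i = cuda.grid(1)                        # one thread per parameter combination
    if i >= params.shape[0]:
        return
    threshold = params[i, 0]                # example tunable parameter
    trades = 0.0
    pnl = 0.0
    for j in range(starts[i], ends[i]):     # walk this combination's data window
        move = close[j] - open_[j]
        if abs(move) > threshold:           # toy decision rule, placeholder only
            trades += 1.0
            pnl += move
    stats[i, 0] = trades                    # fixed-size per-combination outputs
    stats[i, 1] = pnl

# host side: copy the read-only market data once, launch one thread per combo
n_bars, n_combos = 100_000, 1024
d_open  = cuda.to_device(np.random.rand(n_bars))
d_high  = cuda.to_device(np.random.rand(n_bars))
d_low   = cuda.to_device(np.random.rand(n_bars))
d_close = cuda.to_device(np.random.rand(n_bars))
d_ts    = cuda.to_device(np.arange(n_bars, dtype=np.int32))
starts  = cuda.to_device(np.zeros(n_combos, dtype=np.int64))
ends    = cuda.to_device(np.full(n_combos, n_bars, dtype=np.int64))
params  = cuda.to_device(np.random.rand(n_combos, 3))
stats   = cuda.device_array((n_combos, 2))

threads = 128
blocks = (n_combos + threads - 1) // threads
backtest_kernel[blocks, threads](d_open, d_high, d_low, d_close, d_ts,
                                 starts, ends, params, stats)
results = stats.copy_to_host()              # one row of statistics per combination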

I’d also suggest using cupy and cudf as high-level APIs for GPU arrays and GPU dataframes, respectively. You can write CUDA kernels in Numba that work with data stored in these libraries.
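
For example, an array created with cupy can be passed straight into a Numba kernel (it exposes __cuda_array_interface__), so the data never leaves the GPU. A rough sketch:

import cupy as cp
from numba import cuda

@cuda.jit
def scale(arr, factor):
    i = cuda.grid(1)
    if i < arr.shape[0]:
        arr[i] *= factor

close = cp.random.rand(1_000_000)            # GPU array built with cupy
threads = 256
blocks = (close.shape[0] + threads - 1) // threads
scale[blocks, threads](close, 2.0)           # cupy array handed straight to the kernel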


Thanks @sklam! This is perfect. I had a feeling that guvectorize wasn’t where I needed to be. I’ll check out your suggestions! Thanks again!

I’m writing a bit late, but I wanted to share my experience.

The main computation loop is implemented using Numba, which, compared to plain Python, offers a speed improvement of 100-1000 times. Here’s a snippet of what the code looks like:

import numba as nb
from numba import objmode
from numba.types import int64, float64, unicode_type, NPDatetime

@nb.njit(int64(nb_record_type[:], unicode_type, float64, float64, NPDatetime('ns')))
def RunTrack(...):
    for i in range(N):
        ...
        with objmode():   # temporarily drop back into object (Python) mode
            ...

The key data structure in use, nb_record_type[:], is a shared array (gigabytes) consisting of structures such as:

record_type = np.dtype([
    ("var1", np.float64), ("wo_spy", np.float64), ...,
    ("b_tRngRto", np.float32), ("e_tRngRto", np.float32),
    ("b_tRngRto100", np.float32), ("e_tRngRto100", np.float32),
    ("oneetf_coded", np.uint64), ("dt", 'datetime64[ns]'),
], align=True)
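
For completeness, the nb_record_type used in the RunTrack signature is just the Numba version of this dtype (sketch, reusing the record_type definition above):

import numpy as np
import numba as nb

nb_record_type = nb.from_dtype(record_type)         # Numba record type, used as nb_record_type[:]
records = np.zeros(100_000_000, dtype=record_type)  # the gigabyte-scale structured array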

All of this runs on a cluster managed by Ray, and inside each 32-processor node the nb_record_type[:] array is used by all processes without memory copying.
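
The zero-copy sharing works roughly like this (sketch; it reuses the record_type above, and the chunking scheme is made up): Ray's object store keeps the NumPy structured array in shared memory, so workers on the same node can read it without copying.

import numpy as np
import ray

ray.init()

records = np.zeros(10_000_000, dtype=record_type)   # the big structured array
records_ref = ray.put(records)                      # stored once in the node's object store

@ray.remote
def track_window(recs, start, stop):
    # recs arrives as a read-only NumPy view on this node, no copy;
    # in the real setup it would be handed to the @njit RunTrack
    return float(recs["var1"][start:stop].sum())

futures = [track_window.remote(records_ref, i, i + 1_000_000)
           for i in range(0, 10_000_000, 1_000_000)]
results = ray.get(futures)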

If not for Numba (i.e., if I had to stick to Python), I would have had to rent not 100 nodes, but 10,000 nodes, which would be financially untenable.

A small trick: modern Numba allows using with objmode() to run Python code from inside the compiled loop, which I use to conveniently load additional data for the computation (via pandas, json, databases, numpy, the filesystem) in chunks as I progress through the loop.
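
As a concrete (simplified) example of that pattern, with a made-up loader and file names:

import pandas as pd
import numba as nb
from numba import objmode

def load_chunk(i):
    # plain-Python loader; path and column name are placeholders
    return pd.read_parquet("chunk_%d.parquet" % i)["close"].to_numpy()

@nb.njit
def process_all(n_chunks):
    total = 0.0
    for i in range(n_chunks):
        with objmode(chunk='float64[:]'):   # back in object mode just for the I/O
            chunk = load_chunk(i)
        total += chunk.sum()
    return total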