I programmed a trading algorithm to get me 50 Lambos. I've also acquired years of market data for backtesting it. The algorithm has a few basic parameters that I tweak and then backtest, getting some basic statistics on how it would have performed historically. This lets me fine-tune the parameters for max Lambos.
The problem is that testing all these combos in my single-threaded Python app takes FOREVER. I want to get my video card to run them in parallel and speed things up significantly.
So, I've got the static, read-only historical market data broken down into 4x 1-D float arrays (open, low, high, close). There is also a single 1-D int32 array with Unix timestamps. The same index in each of these 5 arrays refers to the same historical data point. To me, this data belongs in some kind of shared memory that all the parallel workers can read.
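To make that concrete, here's a simplified sketch of the layout I mean (array names and sizes are just illustrative, and I'm assuming Numba's `cuda.to_device` is the right way to push the read-only data to the card once):

```python
import numpy as np
from numba import cuda

n = 1_000_000  # number of historical data points (placeholder)

# 4x 1-D float arrays plus a 1-D int32 Unix-timestamp array;
# index i in each array refers to the same historical data point
open_ = np.zeros(n, dtype=np.float32)
low = np.zeros(n, dtype=np.float32)
high = np.zeros(n, dtype=np.float32)
close = np.zeros(n, dtype=np.float32)
ts = np.zeros(n, dtype=np.int32)

# Copied to the card once, then read by every parameter combination
d_open = cuda.to_device(open_)
d_low = cuda.to_device(low)
d_high = cuda.to_device(high)
d_close = cuda.to_device(close)
d_ts = cuda.to_device(ts)
```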
Then I have the parameters: a start and end time to look up in the above arrays, plus a few other float values that I need for making decisions while traversing the market data (the 4 arrays above) between the start and end times.
As output, I capture several statistics as floats (they could be packed into a single float array if needed) that shed light on how a particular parameter combination performed. The number of these statistics is fixed for every iteration and has no relation to the size of any input array.
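Conceptually, one parameter combination boils down to a single call like this in my current single-threaded code (the stat names and the "strategy" below are placeholders, not my real logic; assume the start/end times have already been converted to array indices via the timestamp array):

```python
def backtest_one(open_, low, high, close, start_idx, end_idx, param_a, param_b):
    """Run one parameter combination over bars [start_idx, end_idx)."""
    pnl = 0.0
    num_trades = 0.0
    max_range = 0.0
    for i in range(start_idx, end_idx):
        # placeholder decision logic -- the real thing uses param_a/param_b too
        bar_range = high[i] - low[i]
        if bar_range > param_a:
            pnl += (close[i] - open_[i]) * param_b
            num_trades += 1.0
        if bar_range > max_range:
            max_range = bar_range
    # fixed number of stats, unrelated to the length of the input arrays
    return pnl, num_trades, max_range
```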
I took a look at @guvectorize, but it seems like the output array's size has to be tied to the size of one of the inputs. I also tried iterating through the historical data arrays, but it didn't seem to like starting anywhere other than the beginning.
Am I looking in the right place with @guvectorize? If not, how can I get my backtester running on GPU cores?
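Or is the right direction to skip @guvectorize entirely and write a raw @cuda.jit kernel, with one thread per parameter combination writing a fixed-size row of stats? A rough sketch of what I'm imagining (reusing the device arrays from the first snippet, same placeholder logic as above):

```python
import numpy as np
from numba import cuda

@cuda.jit
def backtest_kernel(open_, low, high, close, start_idx, end_idx, params, stats):
    i = cuda.grid(1)                  # one thread = one parameter combination
    if i >= params.shape[0]:
        return
    param_a = params[i, 0]
    param_b = params[i, 1]
    pnl = 0.0
    num_trades = 0.0
    max_range = 0.0
    for j in range(start_idx[i], end_idx[i]):
        bar_range = high[j] - low[j]
        if bar_range > param_a:
            pnl += (close[j] - open_[j]) * param_b
            num_trades += 1.0
        if bar_range > max_range:
            max_range = bar_range
    # fixed-size stats row for this combination
    stats[i, 0] = pnl
    stats[i, 1] = num_trades
    stats[i, 2] = max_range

# host side: one row of params and one row of stats per combination (placeholders)
num_combos = 10_000
params = np.zeros((num_combos, 2), dtype=np.float32)   # param_a, param_b per combo
starts = np.zeros(num_combos, dtype=np.int64)           # start index per combo
ends = np.full(num_combos, n, dtype=np.int64)            # end index per combo

d_params = cuda.to_device(params)
d_start = cuda.to_device(starts)
d_end = cuda.to_device(ends)
d_stats = cuda.device_array((num_combos, 3), dtype=np.float32)

threads = 128
blocks = (num_combos + threads - 1) // threads
backtest_kernel[blocks, threads](d_open, d_low, d_high, d_close,
                                 d_start, d_end, d_params, d_stats)
stats = d_stats.copy_to_host()
```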
I am new to all of this (but am an experienced developer), so I would appreciate any pointers anyone has.