Convert array with numbers to array of strings with format

I am working on producing a data export that I think would benefit a lot from parallelization, but I am unable to figure out how to achieve this. The goal is simple: given four arrays of x, y, height, and time values, output an array of strings built from the data in the arrays (see the example below).

This can then be written to file.

This is a working numpy example:

import numpy as np

x_arr = np.arange(10)
y_arr = np.arange(10)
h_arr = np.arange(10)
ts_arr = np.arange(10)

def to_points(x, y, h, ts):
    return f'<Point timeStamp="{ts}">{x} {y} {h}</Point>\n'

points = np.vectorize(to_points)(x_arr, y_arr, h_arr, ts_arr)

print(points)

Very grateful for any tips =)

I have some doubts that numba would make this any faster, or enough faster that it would be worth the effort of writing a numba-based solution. In my experience numba tends to speed up purely numerical calculations considerably, but as soon as you start working with strings there isn’t any certainty of big performance gains over native Python. String operations are often hard to optimize because, at the end of the day, you’re pretty much always stuck heap-allocating memory for every intermediate string you create. That adds unavoidable overhead and can lead to poor memory access patterns in many cases. There are definitely tricky things you could do with fixed-size buffers and so on, but at that point you might as well just write a C extension. Regardless, the cost of writing the final product to the file is probably going to dominate the total time.
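If the file write is what dominates, one thing that might be worth trying before reaching for numba is to skip the intermediate array of strings and stream the formatted lines straight to the output. Below is a minimal sketch of that idea in plain Python. The helper name write_points is hypothetical, an in-memory io.StringIO stands in for the real file, and I have not benchmarked this against the np.vectorize version, so treat it as something to measure rather than a guaranteed win.

```python
import io
import numpy as np

def write_points(out, x_arr, y_arr, h_arr, ts_arr):
    """Write one <Point> line per element to a file-like object (hypothetical helper)."""
    # .tolist() converts each array to plain Python numbers once, up front;
    # this is typically cheaper than formatting NumPy scalars one at a time.
    rows = zip(x_arr.tolist(), y_arr.tolist(), h_arr.tolist(), ts_arr.tolist())
    # writelines consumes the generator lazily, so no list or array of
    # strings is materialized before writing.
    out.writelines(
        f'<Point timeStamp="{ts}">{x} {y} {h}</Point>\n' for x, y, h, ts in rows
    )

x_arr = np.arange(10)
y_arr = np.arange(10)
h_arr = np.arange(10)
ts_arr = np.arange(10)

buf = io.StringIO()  # stand-in for open("points.xml", "w")
write_points(buf, x_arr, y_arr, h_arr, ts_arr)
print(buf.getvalue(), end="")
```

For a real export you would pass an open file object instead of buf. Note that np.vectorize is essentially a Python-level loop as well, so it is a convenience rather than a speedup.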
