Accelerate loops that use ctypes, c_char_p, numpy str_?

I wrote a Python library that wraps a C shared library with over 600 functions using pure ctypes and NumPy. The functions are typically simple (say, given a string, return a float), and for some functions I allow users to pass in lists or NumPy arrays, which I then loop through in Python, calling the function repeatedly and shepherding data to and from ctypes. But some users want to call certain functions millions of times, so I want to use Numba to JIT those wrapper functions and speed things up, without rewriting everything in Cython or SWIG.
Here is a pseudo-code example of some of the Python code I want to accelerate with Numba:

fs = []
f = ctypes.c_double()
for s in strings:  # strings in this case is a numpy.str_ array, but it could be a list of strings
    libsomething.str2float(s.encode("utf-8"), ctypes.byref(f))
    fs.append(f.value)
return numpy.array(fs)

What would be the best way to get Numba to work with these strings? c_char_p is not supported in Numba (numba/numba#3207), so I can't just use the jit decorators as-is. Would it be possible to get around this issue by enforcing that all lists/tuples/etc. become NumPy arrays and using the numpy.str_ datatype?

Any ideas on whether that could work, or proofs of concept, would be appreciated.
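For reference, one way to force everything into a bytes array that C code could read directly might be `np.char.encode` (a minimal sketch; the sample strings are just illustrative):

```python
import numpy as np

# A unicode ('U') array re-encoded into a fixed-width bytes ('S') array.
# NumPy stores 'U' arrays as UTF-32, so the encode step is still needed
# somewhere -- this just moves it out of the per-call loop.
strings = np.array(["1.5", "2.25"])         # dtype like '<U4'
encoded = np.char.encode(strings, "utf-8")  # dtype like 'S4', NUL-padded
```

Each row of `encoded` is then a contiguous, fixed-width byte string, which is the layout the pointer tricks discussed below rely on.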

I’d love to see a general purpose solution to this problem… It seems like one of those things that should be simple but isn’t (I’m sure for good reasons)

Numba will need to implement str.encode() for real by porting https://github.com/python/cpython/blob/8a64ceaf9856e7570cad6f5d628cce789834e019/Objects/stringlib/codecs.h#L262

If the strings were already encoded correctly in the NumPy array, could it work without str.encode()?

There may be a hack to do that. I guess you can have a NumPy array of bytes of the correctly encoded strings. Then you can just pass a pointer to the C library by doing pointer arithmetic from the base pointer; i.e. numpy_array.ctypes.data, or just something like numpy_array[item_index:].ctypes.data.

Note: I think NumPy uses UTF-32 internally (depending on compilation options) if you use its unicode char type.
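The pointer-arithmetic idea above can be sketched in plain ctypes (no Numba yet). Here libc's `atof` stands in for the wrapped C function, since it has the same shape as `str2float` (char* in, double out); `libsomething` from the original post would slot in the same way:

```python
import ctypes
import ctypes.util

import numpy as np

# Stand-in for the wrapped shared library: libc's atof takes a char*
# and returns a double, much like the str2float example above.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.atof.restype = ctypes.c_double
libc.atof.argtypes = [ctypes.c_char_p]

strings = ["1.5", "2.25", "3.0"]
# Pre-encode into a fixed-width, NUL-padded bytes array so each row is
# a contiguous C string (keep at least one spare byte for the NUL).
encoded = np.array([s.encode("utf-8") for s in strings], dtype="S16")

out = np.empty(len(encoded), dtype=np.float64)
base = encoded.ctypes.data          # base pointer of the buffer
itemsize = encoded.dtype.itemsize   # fixed width of each row
for i in range(len(encoded)):
    # Pointer arithmetic from the base pointer; no per-item encode().
    out[i] = libc.atof(ctypes.c_char_p(base + i * itemsize))
```

This loop is still pure Python, but because the data is already laid out as raw bytes, the per-call work reduces to address arithmetic, which is the part a JIT could in principle take over.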

I’ll give that a try :slight_smile: soonish

-Andrew Annex

This may actually be partially working: by casting the numpy array of strings to a bytes dtype I was able to get it to return the correct result, but I got a lot of Numba compilation issues. It seems that Numba does not understand `ctypes.c_double()` or `ctypes.byref`, even though from the docs it seems that Numba does support ctypes? Currently I am only using `@jit(nopython=False)`.

I am also trying a similar trick by creating an empty numpy array in the jit'd function, but I am running into surprising issues with something as simple as `res = np.empty(times.shape, dtype=np.float)`.
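For the `np.empty` issue specifically: `np.float` was only an alias for the Python builtin `float` (it was deprecated and later removed from NumPy), and Numba's type inference wants a concrete NumPy dtype, so `np.float64` should work where `np.float` fails. A minimal sketch, with a placeholder `times` array:

```python
import numpy as np

times = np.array([1.0, 2.0, 3.0])  # placeholder input

# np.float was just the builtin float; use an explicit NumPy dtype
# so Numba's type inference has something concrete to work with.
res = np.empty(times.shape, dtype=np.float64)
```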

specific warning:

Compilation is falling back to object mode WITH looplifting enabled because Function "nbstr2et" failed type inference due to: Unknown attribute 'c_double' of type Module(<module 'ctypes' from '/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ctypes/__init__.py'>)

File "", line 3:
def nbstr2et(times):
    et = ctypes.c_double()