Mismatch specification error

I have the following function

@njit(nb.types.ListType(nb.f4[:,:])(typeTriangle,
                                    nb.f4[:,:],
                                    nb.f4[:,:]))
def calculate_barycentric_weights_for_all_pts(t, X, Y):
    
    p0, p1, p2 = t.vertices
    area = 0.5 * (p0.x * (p1.y - p2.y) + p1.x * (p2.y - p0.y) + p2.x * (p0.y - p1.y))
    w1 = 0.5 * (X * (p1.y - p2.y) + p1.x * (p2.y - Y) + p2.x * (Y - p1.y)) / area
    w2 = 0.5 * (p0.x * (Y - p2.y) + X * (p2.y - p0.y) + p2.x * (p0.y - Y)) / area
    w3 = 0.5 * (p0.x * (p1.y - Y) + p1.x * (Y - p0.y) + X * (p0.y - p1.y)) / area

    return [w1, w2, w3] 

but when I execute it, I get a very long error; the last section is the most interesting part:

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No conversion from list(array(float64, 2d, C))<iv=None> to ListType[array(float32, 2d, A)] for '$554return_value.119', defined at None

File "..\..\..\..\Users\xxxAppData\Local\Temp\ipykernel_14124\78179440.py", line 12:
<source missing, REPL/exec in use?>

During: typing of assignment at C:\Users\xxx\AppData\Local\Temp\ipykernel_14124\78179440.py (12)

File "..\..\..\..\Users\xxx\AppData\Local\Temp\ipykernel_14124\78179440.py", line 12:
<source missing, REPL/exec in use?>

Why does it say that it is trying to convert from a list of float64 arrays when I specified the arrays to be float32??

Hey @MLLO ,
your signature says the function returns a typed list, but the function actually returns a reflected list.
Define an empty typed list and append the results for w1-w3 to it, or convert your reflected list into a typed list before you return it.

Thank you @Oyibo !

I must admit that I had not come across a ‘reflected’ list before (though I have come across introspection). This is probably not the right forum for this, but I still feel I need to ask: why is it a reflected list, if I am not really asking anything about the objects inside the function, just returning a list with the results?

thanks again.

Hey @MLLO ,
Numba supports two kinds of lists: the reflected list (a plain Python list) and the typed list (a Numba list). Your function signature says that you want to return a typed list, but instead you return a reflected list.
https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-reflection-for-list-and-set-types

import numba as nb

myReflectedList = ["apple", "orange", "banana"]
myTypedList = nb.typed.List(myReflectedList)

Hi @Oyibo

I tried what you suggested and substituted,

return [w1, w2, w3] 

with the following,

return nb.typed.List([w1,w2,w3])

However, I still get a similar error

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No conversion from ListType[array(float64, 2d, C)] to ListType[array(float32, 2d, A)] for '$612return_value.124', defined at None

For some reason numba does not like

w1 = 0.5 * (X * (p1.y - p2.y) + p1.x * (p2.y - Y) + p2.x * (Y - p1.y)) / area

this line. The p.x’s and p.y’s are all np.float32, and so are X and Y. I am wondering whether NumPy is promoting the results to np.float64 and that is what is blowing things up. Interestingly, if I do this

@njit(nb.types.ListType(nb.f4[:,:])(typeTriangle, nb.f4[:,:], nb.f4[:,:]))
def dummy(t, X, Y):
    p0, p1, p2 = t.vertices
    area = 0.5 * (p0.x * (p1.y - p2.y) + p1.x * (p2.y - p0.y) + p2.x * (p0.y - p1.y))
    w1 = 0.5 * (X * (p1.y - p2.y) + p1.x * (p2.y - Y) + p2.x * (Y - p1.y)) / area
    return nb.typed.List([X, w1.astype(np.float32)])

So if I return one of the parameters, I do not get any errors. If instead I remove X, I do get an error!
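A quick way to see the kind of promotion I suspect, in plain NumPy (the variable names are mine):

```python
import numpy as np

a32 = np.ones((2, 2), dtype=np.float32)
b64 = np.full((2, 2), 0.5)       # default dtype is float64
prod = b64 * a32
print(prod.dtype)                # float64: mixed dtypes promote upward
fixed = prod.astype(np.float32)  # cast back to what the signature expects
print(fixed.dtype)               # float32
```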

Hey @MLLO,

Numba offers two compilation modes: “lazy” and “eager”. In lazy mode, you use the jit or njit decorators without a signature, and Numba infers the types for you at the first call. This approach is much easier, with its own advantages and disadvantages.
In eager mode, on the other hand, you must ensure that your function’s signature is absolutely correct, because Numba compiles the function as soon as the decorator is applied and will raise errors if there are any issues. You have to be pedantic.
Unfortunately, creating the right signature for more complex types is not always well-documented, and you might have to dive into the source code for guidance. Two helpful tools are the numba.typeof function and the signatures attribute of the compiled function.
Most of the time, I prefer to start with lazy compilation to let Numba infer the correct signatures and then reverse engineer the signature based on what I observe. This way, you can fine-tune the function’s type annotations until it works as expected in eager compilation mode. It might take some trial and error which can be frustrating sometimes.
Identifying your problems is quite difficult without a complete code context. Potential problems might arise from input data types not aligning with your Numba function signature, implicit type conversions, or an incorrect function signature.

Here is an example which should help you figure it out. I replaced your triangle class with a namedtuple for simplicity.
Happy weekend!

from collections import namedtuple
import numpy as np
import numba as nb
import numba.types as nbt

# I replaced your triangle class with a namedtuple for simplicity
Point = namedtuple('Point', ('x', 'y'))
Triangle = namedtuple('Triangle', ('p0', 'p1', 'p2'))
PointType = nbt.NamedUniTuple(nbt.f4, 2, Point)
TriangleType = nbt.NamedUniTuple(PointType, 3, Triangle)
# Check your array layout <Array>.flags
# It is probably C_CONTIGUOUS
Arr2DFloat32C = nb.f4[:, ::1]
ResultList = nb.types.ListType(Arr2DFloat32C)
# Compare your generated signature with the signature from lazy compilation
Signature = ResultList(TriangleType, Arr2DFloat32C, Arr2DFloat32C)


@nb.njit(Signature)
def calculate_barycentric_weights_for_all_pts(t, X, Y):
    p0, p1, p2 = t
    # Generate an empty typed list
    res = nb.typed.List.empty_list(Arr2DFloat32C)
    area = 0.5 * (p0.x * (p1.y - p2.y) + p1.x * (p2.y - p0.y) + p2.x * (p0.y - p1.y))
    # prevent implicit conversion to float64 using astype(np.float32)
    # append the results to your list
    res.append((0.5 * (X * (p1.y - p2.y) + p1.x * (p2.y - Y) + p2.x * (Y - p1.y)) / area).astype(np.float32))
    res.append((0.5 * (p0.x * (Y - p2.y) + X * (p2.y - p0.y) + p2.x * (p0.y - Y)) / area).astype(np.float32))
    res.append((0.5 * (p0.x * (p1.y - Y) + p1.x * (Y - p0.y) + X * (p0.y - p1.y)) / area).astype(np.float32))
    return res

X = np.ones((2, 2), dtype=np.float32)
Y = np.ones((2, 2), dtype=np.float32)
ZERO = np.float32(0)
ONE = np.float32(1)
p0 = Point(ZERO, ZERO)
p1 = Point(ONE, ZERO)
p2 = Point(ZERO, ONE)
triangle = Triangle(p0, p1, p2)
print(calculate_barycentric_weights_for_all_pts(triangle, X, Y))
print(calculate_barycentric_weights_for_all_pts.signatures)
# [[[-1. -1.] [-1. -1.]], [[1. 1.] [1. 1.]], [[1. 1.] [1. 1.]], ...]
# [(Triangle(Point(float32 x 2) x 3), Array(float32, 2, 'C', False, aligned=True), Array(float32, 2, 'C', False, aligned=True))]

Thanks @Oyibo! Thanks for the insights as always. I will spend time chewing on what you mentioned and try to work out what is going on. I hope to get this done and if so I will report back.

Hey @MLLO ,
there might be another tool which could be helpful in your case.
<function_name>.inspect_types()
This method allows you to inspect the types of all variables and temporary values inferred by Numba.
It is a bit of an overkill on information but it is still readable.
In your case we can observe, e.g., that there is a float64 constant in the formula: const(float, 0.5) :: float64. This might be one of the reasons you have to cast the dtype to float32.

    res.append((0.5 * (X * (p1.y - p2.y) + p1.x * (p2.y - Y) + p2.x * (Y - p1.y)) / area).astype(np.float32))

    # --- LINE 38 --- 
    #   $156load_method.74 = getattr(value=res, attr=append)  :: BoundFunction((<class 'numba.core.types.containers.ListType'>, 'append') for ListType[array(float32, 2d, C)])
    #   $const158.75 = const(float, 0.5)  :: float64

https://numba.readthedocs.io/en/stable/user/troubleshoot.html#the-compiled-code-is-too-slow

@Oyibo Gotcha!! Many thanks

If you make sure the constant is float32 like nb.float32(0.5) the conversion is not necessary anymore.

@nb.njit(Signature)
def calculate_barycentric_weights_for_all_pts(t, X, Y):
    p0, p1, p2 = t
    # Generate an empty typed list
    res = nb.typed.List.empty_list(Arr2DFloat32C)
    mult = nb.float32(0.5)
    area = mult * (p0.x * (p1.y - p2.y) + p1.x * (p2.y - p0.y) + p2.x * (p0.y - p1.y))
    # append the results to your list
    res.append(mult * (X * (p1.y - p2.y) + p1.x * (p2.y - Y) + p2.x * (Y - p1.y)) / area)
    res.append(mult * (p0.x * (Y - p2.y) + X * (p2.y - p0.y) + p2.x * (p0.y - Y)) / area)
    res.append(mult * (p0.x * (p1.y - Y) + p1.x * (Y - p0.y) + X * (p0.y - p1.y)) / area)
    return res

@Oyibo Using the .signatures attribute I was able to determine the source of my problem (it was in the definition of my point class). I think I have managed to resolve it by adjusting the types of the variables associated with this class (I believe it was creating a conversion problem).

Hey @MLLO ,

yesterday, when I checked the inspect_types method on your function, it became pretty obvious that many object-access operations are happening in the background during the computations.
I was curious what the impact on performance would be.
I made a simple test setup where just the sum of all coordinates is calculated.
I compared the element-wise access speed of a 2D array vs. Numba namedtuple triangles vs. Python namedtuple triangles.
Raw coordinate access via the 2D array is 100-300 times faster than either the Numba or the Python namedtuples.
The Numba triangles are faster than the Python triangles by a factor of 3.
Have you made some benchmark checks? Is this in the range of your expectations?

import timeit
from collections import namedtuple
import numpy as np
import numba as nb
import numba.types as nbt

# I replaced your triangle class with a namedtuple for simplicity
Point = namedtuple('Point', ('x', 'y'))
Triangle = namedtuple('Triangle', ('p0', 'p1', 'p2'))
PointType = nbt.NamedUniTuple(nbt.f4, 2, Point)
TriangleType = nbt.NamedUniTuple(PointType, 3, Triangle)

def sum_of_triangles_py(triangles):
    result = 0.0
    for i in range(len(triangles)):
        result += (triangles[i].p0.x + triangles[i].p0.y
                   + triangles[i].p1.x + triangles[i].p1.y
                   + triangles[i].p2.x + triangles[i].p2.y)
    return result

sum_of_triangles_nb = nb.njit(sum_of_triangles_py)
# sum_of_triangles_nb.signatures

@nb.njit
def sum_of_coordinates(coordinates):
    result = 0.0
    for i in range(len(coordinates)):
        result += (coordinates[i, 0] + coordinates[i, 1]
                   + coordinates[i, 2] + coordinates[i, 3]
                   + coordinates[i, 4] + coordinates[i, 5])
    return result

def get_data(n: int):
    coordinates = np.random.default_rng(seed=123).normal(0, 1, (n, 6)).astype(np.float32)
    triangles_nb = nb.typed.List.empty_list(TriangleType)
    triangles_py = []
    for i in range(n):
        p0 = Point(coordinates[i, 0], coordinates[i, 1])
        p1 = Point(coordinates[i, 2], coordinates[i, 3])
        p2 = Point(coordinates[i, 4], coordinates[i, 5])
        triangles_nb.append(Triangle(p0, p1, p2))
        triangles_py.append(Triangle(p0, p1, p2))
    return coordinates, triangles_nb, triangles_py

# =============================================================================
# warm up
# =============================================================================
n = 10
coordinates, triangles_nb, triangles_py = get_data(n)
sum_of_coordinates(coordinates)
sum_of_triangles_py(triangles_py)
sum_of_triangles_nb(triangles_nb)

# =============================================================================
# speed check for element-wise access
# =============================================================================
n = 100_000
coordinates, triangles_nb, triangles_py = get_data(n)
print(f'Compare element-wise access times for {n} triangles vs ND-array:')
time_coordinates = np.mean(timeit.repeat(stmt='sum_of_coordinates(coordinates)',
                                         repeat=10, number=100, globals=globals()))
print(f'coordinates: {time_coordinates}')
time_triangles_py = np.mean(timeit.repeat(stmt='sum_of_triangles_py(triangles_py)',
                                          repeat=10, number=100, globals=globals()))
print(f'triangles_py:   {time_triangles_py}')
time_triangles_nb = np.mean(timeit.repeat(stmt='sum_of_triangles_nb(triangles_nb)',
                                          repeat=10, number=100, globals=globals()))
print(f'triangles_nb:   {time_triangles_nb}')
mult_nb_arr = round(time_triangles_nb/time_coordinates, 1)
mult_nb_py = round(time_triangles_py/time_triangles_nb, 1)
print(f'coordinates  vs triangles_nb faster by factor: {mult_nb_arr}')
print(f'triangles_nb vs triangles_py faster by factor: {mult_nb_py}')
# Compare element-wise access times for 100000 triangles vs ND-array:
# coordinates: 0.022830185398925097
# triangles_py:   6.99924858030281
# triangles_nb:   2.3839610328926937
# coordinates  vs triangles_nb faster by factor: 104.4
# triangles_nb vs triangles_py faster by factor: 2.9

If memory serves, accessing a typed.List is much slower than it ought to be at the moment. The issue is that the get-item implementation is compiled separately from C code and then accessed as a prebuilt library, so it isn’t being inlined properly. At least that’s the conclusion I came to poking around the source after doing similar benchmarks. Every access causes an opaque function call, which is slowish and limits the kinds of optimizations the compiler can safely do in the surrounding loop. By contrast, the numpy arrays have been painstakingly implemented by directly writing LLVM intrinsics, so they are very fast. I suspect that some of this may change, albeit maybe not right away, in the coming reorganization of numba around the proposed PIXIE format (planned for the end of the year).

Another concern is that if your triangle type is a jitclass or structref then there will be some overhead associated with dereferencing into them. Because they are dynamically allocated objects, they could live anywhere in memory, and thus run more risk of cache misses than a numpy array which has one big contiguously allocated block. There is also overhead associated with refcounting them, although that should really get optimized out, but maybe that isn’t happening properly with the opaque function call for get-item.

In the near term numpy arrays are going to be by far the fastest option. You can look up how to make record arrays in the docs if you prefer for each item to have named members like my_points[0].x
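For example, a quick sketch with a plain structured array (the dtype and names here are illustrative):

```python
import numpy as np

# one contiguous block of memory, but fields are accessed by name
point_dtype = np.dtype([('x', np.float32), ('y', np.float32)])
my_points = np.zeros(4, dtype=point_dtype)
my_points['x'] = np.arange(4)          # vectorized assignment to one field

print(my_points[0]['x'])               # item access by field name
# viewing as a recarray gives attribute-style access like my_points[0].x
rec = my_points.view(np.recarray)
print(rec[0].x)
```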


Hey @DannyWeitekamp,
great explanation.
To summarize, as soon as you leave array-land with its highly optimized low-level code you lose most of the edge Numpy and Numba provides.

Just a quick note: What @DannyWeitekamp mentioned implies that you can gain some performance back by limiting the indexing frequency both in the list and in the named tuple itself. The following should be noticeably faster than the original version:

def sum_of_triangles_py(triangles):
    result = 0.0
    for triangle in triangles:
        p0 = triangle.p0
        p1 = triangle.p1
        p2 = triangle.p2
        result += (p0.x + p0.y + p1.x + p1.y + p2.x + p2.y)
    return result

Edit: Just ran it, and it is exactly as fast as the array version for me. I haven’t verified this, but I assume __iter__ is implemented explicitly and thus more efficiently than frequent __getitem__.


Hey @sschaer ,
That is a beautiful observation. It makes a huge difference.

# Compare element-wise access times for 100000 triangles vs ND-array:
# coordinates: 0.022762795005110092
# triangles_py:   6.338772386100027
# triangles_nb:   0.02295054620190058
# coordinates  vs triangles_nb faster by factor: 1.0
# triangles_nb vs triangles_py faster by factor: 276.2

Hehe @sschaer, that’s awesome. Too bad it’s that nitpicky at the moment. But at least there is a way, and it looks like from @Oyibo’s benchmark that going directly through __iter__ alleviates the second-order concerns as well (refcounting and data contiguity); there’s no noticeable difference.

@DannyWeitekamp @Oyibo @sschaer Thank you all!

My code was all written in OO style. It is a computational geometry algorithm that generates a 2.5D triangulation of a terrain (the terrain is uploaded as a heightmap, a GeoTIFF image, if that makes sense to you).

The algorithm worked well but was painfully slow, which is why I decided to try numba to speed it up. So far I have not achieved this goal, and I have been slowly modifying the code to adapt it to numba. The reason I mention all of this is that I have not yet been able to run any benchmarks, as I have yet to convert all the code.

After generating numba versions of the basic classes, i.e. point, edge, triangle, I am now trying to get the rest done as @njit functions (initially I had a single class called triangulator).

Thought I should provide some background.

Thank you for all your comments, which I will keep in mind. I will continue working on this and report back to you.

So @DannyWeitekamp, if I understand you correctly, you are suggesting that instead of using numba classes to store objects such as triangles, I use structured numpy arrays because they are more optimized. Is that it?

Thanks

Considering the benchmark above, you might be able to get away with using OO-style classes as you have been. However, in my experience numba’s current strengths lie in optimizing numerical code. The devs have done a great job of filling in OO-style features, but that side of things still has some quirks, like the issue above with typed lists. It is certainly improving, slowly but surely. As it stands, however, you’re going to have an easier time getting the fastest possible version of your algorithm if you stick to numpy-based data structures (including structured arrays), which it seems like you might be able to do since your problem is inherently numerical.

For instance, one consideration you might not have benchmarked yet is how you get all of your data into jit-friendly data-structures. There is often non-negligible overhead associated with this. The nice thing about numpy arrays is that you can allocate them all at once, and then fill them, which tends to be computationally faster than allocating individual objects one at a time, especially if that filling in happens within compiled sections of code.

Ideally if you could get to the point of having your outermost triangulator function take the heightmap as a numpy array, and return the final output as numpy arrays, then there would be very little overhead associated with python calling the compiled code. The data-structures you use internally matter less, but numpy arrays are almost always going to be the most computationally efficient choice if you can use them, and have generally fewer quirks than other options (like namedtuples, jitclasses, and structrefs). The tradeoff is that your code will probably end up looking more functional, like calc_dist(point) instead of point.calc_dist(), since you won’t be able to define methods on structured arrays.

The Numba types for numpy structured / record arrays are obtained with numba.from_dtype.