Why is jitclass initialization so slow?

I am trying to define a Numba object with jitclass to speed up my code.
Here is the definition of the object:

from numba import  float64
from numba.experimental import jitclass

import numpy as np
from shapely.geometry import Polygon as ShapelyPolygon
from shapely.geometry import Point
import time

import os

@jitclass([('vertices', float64[:, :])])
class NumbaPolygon:
    def __init__(self, vertices):
        self.vertices = vertices

    def contains_point(self, point):
        """To check if the point is inside the polygon"""
        x, y = point
        inside = False
        x_vertices = self.vertices[:, 0]
        y_vertices = self.vertices[:, 1]
        n = self.vertices.shape[0]
        p1x, p1y = x_vertices[0], y_vertices[0]
        for i in range(n + 1):
            p2x, p2y = x_vertices[i % n], y_vertices[i % n]
            if y > min(p1y, p2y):
                if y <= max(p1y, p2y):
                    if x <= max(p1x, p2x):
                        if p1y != p2y:
                            xinters = (y - p1y) * (p2x - p1x) / (p2y - p1y) + p1x
                        if p1x == p2x or x <= xinters:
                            inside = not inside
            p1x, p1y = p2x, p2y
        return inside

Then I use the following code to benchmark the first compilation of the object:

vertices = np.array([[0, 0], [10, 0], [10, 10], [0, 10]], dtype=np.float64)
point = np.array([0.5,0.5], dtype=np.float64)

t1 = time.perf_counter_ns()
numba_poly = NumbaPolygon(vertices)
numba_poly.contains_point(point)
consuming_time = time.perf_counter_ns() - t1
print(f'JITCLASS method takes {consuming_time/1e6:.6f} ms')        

t1 = time.perf_counter_ns()
poly = ShapelyPolygon(vertices)
poly.contains(Point(*point))
consuming_time = time.perf_counter_ns() - t1
print(f'Shapely method takes {consuming_time/1e6:.6f} ms')      

Here are the results:

JITCLASS method takes 436.783740 ms
Shapely method takes 0.288628 ms

Although the second try does take less time, the saving hardly compensates for the time the first run needs.

Second Try:
JITCLASS method takes 0.064460 ms
Shapely method takes 0.511835 ms

Hence, as the title says, I am looking for a way to speed up the initialization of the object.
Thanks in advance!

The long burn-in time that you measure on the first iteration is probably not the object initialisation but the code compilation. Numba is a just-in-time compiler, so compilation is triggered the first time you call a function with argument types it has not yet been compiled for. After that, the compiled function is ready for immediate use when called again.

If you repeat the experiment many times over, you should see that all subsequent runs are much faster, even if you reinitialise the NumbaPolygon object over and over. I don’t want to install shapely right now, so I cannot put up a proper benchmark from my side, and you should of course still validate what I am claiming :)
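One way to check this on your own machine is to time the first build-and-call separately from later ones inside the same process. The following is a minimal sketch that reuses the NumbaPolygon class from the question; `time_ms` and `build_and_query` are hypothetical helper names introduced here, and the ImportError fallback is only an assumption so that the script still runs (uncompiled) where Numba is not installed.

```python
import time
import numpy as np

try:
    from numba import float64
    from numba.experimental import jitclass
    spec = [("vertices", float64[:, :])]
except ImportError:
    # Assumption: without Numba, fall back to a plain (uncompiled) Python class.
    spec = []

    def jitclass(spec):
        return lambda cls: cls


@jitclass(spec)
class NumbaPolygon:
    def __init__(self, vertices):
        self.vertices = vertices

    def contains_point(self, point):
        # Same ray-casting test as in the question.
        x, y = point[0], point[1]
        inside = False
        n = self.vertices.shape[0]
        p1x, p1y = self.vertices[0, 0], self.vertices[0, 1]
        for i in range(n + 1):
            p2x, p2y = self.vertices[i % n, 0], self.vertices[i % n, 1]
            if y > min(p1y, p2y):
                if y <= max(p1y, p2y):
                    if x <= max(p1x, p2x):
                        if p1y != p2y:
                            xinters = (y - p1y) * (p2x - p1x) / (p2y - p1y) + p1x
                        if p1x == p2x or x <= xinters:
                            inside = not inside
            p1x, p1y = p2x, p2y
        return inside


def time_ms(fn):
    """Hypothetical helper: run fn once and return (result, elapsed ms)."""
    t0 = time.perf_counter_ns()
    result = fn()
    return result, (time.perf_counter_ns() - t0) / 1e6


vertices = np.array([[0, 0], [10, 0], [10, 10], [0, 10]], dtype=np.float64)
point = np.array([0.5, 0.5], dtype=np.float64)


def build_and_query():
    # Re-create the object every time, as in the original benchmark.
    return NumbaPolygon(vertices).contains_point(point)


# The first measurement includes JIT compilation; the later ones do not.
first_result, first_ms = time_ms(build_and_query)
later = [time_ms(build_and_query) for _ in range(5)]
later_ms = [ms for _, ms in later]

print(f"first build + call:  {first_ms:.3f} ms")
print(f"later build + calls: min {min(later_ms):.3f} ms, max {max(later_ms):.3f} ms")
```

If the later timings stay small even though the object is rebuilt each time, the cost you saw is compilation, not construction.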

Best of luck!

Unfortunately, functions with jitclasses as arguments do not work with cache=True, because the type specification is built on the fly in a way that is inconsistent between calls. If you use them, you’ll need to wait for your functions to recompile every time. Not fun. Structrefs don’t have this limitation, but leave a lot to be desired in terms of usability and documentation. Others have found this thread helpful for making the transition here. There are some other good conversations about structrefs here on Discourse if you search for structref. Namedtuples may also cache, if I recall, but are less readily extensible.
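For the namedtuple route, a minimal sketch could look like the one below. The `Polygon` namedtuple and the `polygon_contains` function are hypothetical names, and the loop is a compact even-odd crossing test rather than the exact loop from the question. cache=True is left off so that the sketch also runs from an interactive session; whether the on-disk cache is actually reused for namedtuple arguments is something you would need to verify yourself. The ImportError fallback is only an assumption so that the script runs (uncompiled) without Numba.

```python
from collections import namedtuple
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: without Numba, run the function as plain Python.
    def njit(fn):
        return fn

# Hypothetical lightweight container used instead of a jitclass.
Polygon = namedtuple("Polygon", ["vertices"])


@njit
def polygon_contains(poly, x, y):
    """Even-odd crossing test for the point (x, y) against poly.vertices."""
    v = poly.vertices
    n = v.shape[0]
    inside = False
    j = n - 1
    for i in range(n):
        xi, yi = v[i, 0], v[i, 1]
        xj, yj = v[j, 0], v[j, 1]
        # Does the edge (j -> i) straddle the horizontal line through y?
        if (yi > y) != (yj > y):
            # x-coordinate where the edge crosses that line.
            if x < (xj - xi) * (y - yi) / (yj - yi) + xi:
                inside = not inside
        j = i
    return inside


square = Polygon(np.array([[0, 0], [10, 0], [10, 10], [0, 10]], dtype=np.float64))
print(polygon_contains(square, 0.5, 0.5))
print(polygon_contains(square, 20.0, 5.0))
```

The trade-off is that a namedtuple carries data only, so the methods of the jitclass become free functions that take the tuple as their first argument.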

@Hannes
You’re right. Sorry, my wording was not precise.

My question is why code compilation takes so much time for jitclass.

To the best of my understanding, with njit, Numba needs some time to infer the data types of the arguments, so providing signatures should save Numba a lot of time on the first compilation.

By analogy with njit, the specification should help jitclass compile the code quickly, but it doesn’t.
That’s the point that confuses me.
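One way to test the assumption about signatures is to time decoration and the first call separately. As far as the Numba documentation describes it, passing a signature to njit makes the function compile at decoration time (eager compilation) instead of on the first call, so the cost is moved rather than removed. The sketch below uses a plain function version of contains_point; the variable names `decorate_ms` and `first_call_ms` are hypothetical, and the ImportError fallback is only an assumption so that the script runs (uncompiled) without Numba.

```python
import time
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: without Numba, njit(signature) returns a no-op decorator.
    def njit(*args, **kwargs):
        return lambda fn: fn


def contains_point(vertices, point):
    """Ray-casting test, same logic as the method in the question."""
    x, y = point[0], point[1]
    inside = False
    n = vertices.shape[0]
    p1x, p1y = vertices[0, 0], vertices[0, 1]
    for i in range(n + 1):
        p2x, p2y = vertices[i % n, 0], vertices[i % n, 1]
        if y > min(p1y, p2y):
            if y <= max(p1y, p2y):
                if x <= max(p1x, p2x):
                    if p1y != p2y:
                        xinters = (y - p1y) * (p2x - p1x) / (p2y - p1y) + p1x
                    if p1x == p2x or x <= xinters:
                        inside = not inside
        p1x, p1y = p2x, p2y
    return inside


vertices = np.array([[0, 0], [10, 0], [10, 10], [0, 10]], dtype=np.float64)
point = np.array([0.5, 0.5], dtype=np.float64)

# With an explicit signature, compilation happens here, at decoration time.
t0 = time.perf_counter_ns()
eager = njit("boolean(float64[:, :], float64[:])")(contains_point)
decorate_ms = (time.perf_counter_ns() - t0) / 1e6

# The first call then only runs the already compiled code.
t0 = time.perf_counter_ns()
result = eager(vertices, point)
first_call_ms = (time.perf_counter_ns() - t0) / 1e6

print(f"decoration: {decorate_ms:.3f} ms, first call: {first_call_ms:.3f} ms")
```

If decoration takes most of the time and the first call is fast, the signature has only changed when the compilation happens, not how long it takes.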

@DannyWeitekamp

Hi Danny,

Thanks for replying.
I am a little confused about why cache is mentioned here.
I don’t use it in the demo code.

Could you explain that in more detail?
Thanks in advance

Apologies, I got a little bit ahead of myself. The usual trick for dealing with long upfront compile times is to use @njit(cache=True). For instance, your function could be written without jitclass like so:


from numba import  njit
import numpy as np
import time

@njit(cache=True)
def contains_point(vertices, point):
    """To check if the point is inside the polygon specified by vertices"""
    x, y = point[0], point[1]
    inside = False
    x_vertices = vertices[:, 0]
    y_vertices = vertices[:, 1]
    n = vertices.shape[0]
    p1x, p1y = x_vertices[0], y_vertices[0]
    for i in range(n + 1):
        p2x, p2y = x_vertices[i % n], y_vertices[i % n]
        if y > min(p1y, p2y):
            if y <= max(p1y, p2y):
                if x <= max(p1x, p2x):
                    if p1y != p2y:
                        xinters = (y - p1y) * (p2x - p1x) / (p2y - p1y) + p1x
                    if p1x == p2x or x <= xinters:
                        inside = not inside
        p1x, p1y = p2x, p2y
    return inside

vertices = np.array([[0, 0], [10, 0], [10, 10], [0, 10]], dtype=np.float64)
point = np.array([0.5,0.5], dtype=np.float64)

t1 = time.perf_counter_ns()
contains_point(vertices, point)
consuming_time = time.perf_counter_ns() - t1
print(f'njit function takes {consuming_time/1e6:.6f} ms')   

The first time you run code with njit(cache=True), it will compile, which takes some time. For instance, on my machine:

njit function takes 658.465777 ms

But on the second run the time will be greatly reduced. Numba still needs to initialize itself and load its cached version of the function, but the function itself does not need to recompile the second time you run the script.

njit function takes 183.041554 ms

As I mentioned above, this trick does not work with @jitclass. In general, cache=True does not work with jitclass methods or with functions that take jitclasses. So if you want to use cache=True to reduce the upfront cost of recompiling your functions, you might consider just using numpy arrays without bothering with the Polygon class wrapper.