Optimal way to speed up class method

Hi, I have a question regarding how I can use numba optimally in the following situation. Assume I have a class defined as follows:

class MyClass:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def mymethod(self, x):
        # *do some complicated computations which involve self.a and self.b*
        ...

To use numba, it seems to me I can either put a @jitclass on the above code (and give a specification for the types of each field), or do something like:

@jit
def aux(x, a, b):
    # *do some complicated computations which involve a and b*
    ...
    
class MyClass:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def mymethod(self, x):
        return aux(x, self.a, self.b)
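For reference, the @jitclass version of the first option might look something like this (a minimal sketch: the float64 types and the (self.a + self.b) * x body are just illustrative stand-ins):

from numba import float64
from numba.experimental import jitclass

@jitclass([("a", float64), ("b", float64)])
class MyClass:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def mymethod(self, x):
        # stand-in for the complicated computations involving self.a and self.b
        return (self.a + self.b) * x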

To get an idea of how these two approaches perform, I chose values for a and b (say a=0.1 and b=1), created an instance of MyClass with them, and compared the two. I was also curious how they would perform compared to doing:

@jit
def myfunction(x):
    # *do the complicated computations involving a and b,
    #  replacing a by 0.1 and b by 1 everywhere*
    ...

And in both cases I get a significant loss of performance compared to this last piece of code. So my question is: is there any way, each time I create a new instance of MyClass, to compile a function tailored to the values given for a and b, so that calling myobject.mymethod(x) is as fast as the last piece of code?

Thanks in advance for your help!

Actually, I just ran the experiment again and it seems the second approach (with the auxiliary function) is really close to being as fast as the “hard-coded” function. Sorry for not noticing this before…

One way to do this might be along these lines:

from numba import jit
import numpy as np

def generate(a, b):
    @jit(nopython=True)
    def aux(x):
        #*do some complicated computations which involve a and b*
        return (a + b) * x # trivial example expression
    return aux

class MyClass(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def _gen_method(self):
        # Don't actually need a method to call generate, it's just useful later
        # for showing what happened
        return generate(self.a, self.b)

    def mymethod(self, x):
        return self._gen_method()(x)

inst = MyClass(0.1, 1.0)

print(inst.mymethod(2))

Looking at the machine code for a compiled generated function, fn:

fn = inst._gen_method()
print(fn(2))
print(fn.inspect_asm(fn.signatures[0]))

Numba produced this for the aux function:

        .text
        .file   "<string>"
        .section        .rodata.cst8,"aM",@progbits,8
        .p2align        3
.LCPI0_0:
        .quad   4607632778762754458
        .text
        .globl  _ZN8__main__8generate12$3clocals$3e7aux$242Ex
        .p2align        4, 0x90
        .type   _ZN8__main__8generate12$3clocals$3e7aux$242Ex,@function
_ZN8__main__8generate12$3clocals$3e7aux$242Ex:
        cvtsi2sd        %rdx, %xmm0
        movabsq $.LCPI0_0, %rax
        mulsd   (%rax), %xmm0
        movsd   %xmm0, (%rdi)
        xorl    %eax, %eax
        retq

.LCPI0_0 holds the multiplication factor, stored as the quad word 4607632778762754458; reinterpreting those bits as a float64 shows it’s the value 1.1:

In [7]: np.array([4607632778762754458], dtype=np.uint64).view(np.float64)
Out[7]: array([1.1])

The constants 0.1 and 1.0 were captured (closure variables, like globals, are treated as compile-time constants by Numba), propagated into the Numba IR and then into LLVM, which folded them into the single constant 1.1 (0.1 + 1.0) used as the (a + b) in (a + b) * x.
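A quick way to see the capture in action is to generate two specialisations and note that each one carries its own folded constant (printed values are approximate):

fn_a = generate(0.1, 1.0)   # (a + b) folded to 1.1 at compile time
fn_b = generate(0.1, 2.0)   # a separate dispatcher, with (a + b) folded to 2.1

print(fn_a(2))       # ~2.2
print(fn_b(2))       # ~4.2
print(fn_a is fn_b)  # False: two independently compiled functions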

Other similar tricks are possible!

It’s also worth noting that this is quite expensive in compile time, as it generates a value-based specialisation for literally every value/call, so it might be worth considering caching. There are also other options for compile-time value-based specialisation, depending on what your use case is.
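For example, one possible caching approach (my illustration, not built-in Numba machinery) is to memoise the generator with functools.lru_cache, so repeated (a, b) pairs reuse an already-compiled function instead of triggering a fresh compile:

from functools import lru_cache

from numba import jit

@lru_cache(maxsize=None)
def generate(a, b):
    @jit(nopython=True)
    def aux(x):
        # stand-in for the real computation involving a and b
        return (a + b) * x
    return aux

# The first generate(0.1, 1.0) builds one dispatcher (compiled on its
# first call); later generate calls with the same (a, b) pair return
# that same dispatcher immediately.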

Hope this helps get you started!

Thanks a lot, this is definitely going to be helpful!

If what you’re saying is that a new function will be compiled each time I create a new instance of MyClass, that is exactly what I am looking for: I am likely to create only 4-5 instances, but to call mymethod on each of them at least 100,000 times 🙂

Thanks again for replying so quickly!

So what I ended up doing is:

from numba import jit
import numpy as np

def generate(a, b):
    @jit(nopython=True)
    def aux(x):
        #*do some complicated computations which involve a and b*
        return (a + b) * x # trivial example expression
    return aux

class MyClass(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b
        self.aux = generate(a, b)

    def mymethod(self, x):
        return self.aux(x)

and this is consistently only about 1% slower than the hard-coded way, which is exactly what I wanted.
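For context, the kind of comparison I mean looks roughly like this (a sketch, not my exact benchmarking code; the warm-up call keeps compilation out of the timing):

import timeit

inst = MyClass(0.1, 1.0)
inst.mymethod(2.0)  # warm-up: triggers compilation outside the timed region

t = timeit.timeit(lambda: inst.mymethod(2.0), number=100_000)
print(f"{t:.3f} s for 100,000 calls")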

However, exactly adapting what you suggested was quite slow; I am not really sure whether I messed up adapting your idea to my code or something really is not working as it should. Maybe each time mymethod is called on a new x, something really is being compiled…

In any case, thanks a lot for your answer 🙂

Agreed, I should indeed have put the specialisation into __init__, given that the specialisations are solely a function of variables available at class instantiation.

“Maybe each time mymethod is called on a new x, something really is being compiled…”

That’s exactly what’s happening; I wrote something too generalised! Given that the per-class specialisation of the compiled code depends solely on values at initialisation time, redesigning it as you did prevents this from happening.
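To make the failure mode concrete, here is a tiny check using my first sketch’s MyClass (the one whose mymethod calls _gen_method):

inst = MyClass(0.1, 1.0)

f1 = inst._gen_method()
f2 = inst._gen_method()
print(f1 is f2)  # False: every call builds a brand-new dispatcher,
                 # so every mymethod(x) call paid a full compilation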

What you have above is probably about as good as it’ll get, but you can always add cache=True to the decorator to save compiling new functions each time you run the code (presuming the values are the same). I think it’d also be fine to put the generator into __init__ or to make it a @classmethod if desirable, e.g.

from numba import jit
import numpy as np

class MyClass(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b
        self._method = self.generate(self.a, self.b)

    @classmethod
    def generate(cls, a, b):
        @jit(nopython=True, cache=True)
        def aux(x):
            #*do some complicated computations which involve a and b*
            return (a + b) * x # trivial example expression
        return aux

    def mymethod(self, x):
        return self._method(x)

inst = MyClass(0.1, 1.0)
print(inst.mymethod(2))

inst2 = MyClass(0.1, 2.0)
print(inst2.mymethod(2))

Anyway, glad you got something working! 🙂

Also, there are some general performance tips in the Numba documentation that might be worth a read.