Passing a first-class jitted function is very slow

Hi!

I get a 10x slowdown if I call a jitted function through another jitted function (passing it as an argument) instead of calling it directly. How is this possible?

>>> @nb.jit(nopython=True, nogil=True)
... def _sum_tile_ground_impl(x: float,
...                           y: float,
...                           height: np.ndarray,
...                           normal: np.ndarray,
...                           heightmap_py_1) -> None:
...     heightmap_py_1(x, y, height, normal)
...

>>> %timeit _sum_tile_ground_impl(0.0, 0.0, np.array(0.0), np.zeros(3), heightmaps[0])
15.4 µs ± 75.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

>>> %timeit heightmaps[0](0.0, 0.0, np.array(0.0), np.zeros(3))
1.39 µs ± 7.68 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

This is a simplified example of what I want to do. The actual implementation of _sum_tile_ground_impl is as follows, and it runs significantly slower than native Python:

    @nb.jit(nopython=True, nogil=True)
    def _sum_tile_ground_impl(x: float,
                              y: float,
                              height: np.ndarray,
                              normal: np.ndarray,
                              height_buffer: np.ndarray,
                              normal_buffer: np.ndarray) -> None:
        heightmaps_py[0](x, y, height, normal)
        for heightmap_py in heightmaps_py[1:]:
            heightmap_py(x, y, height_buffer, normal_buffer)
            height += height_buffer
            normal += normal_buffer
        normal /= math.sqrt(normal[0] ** 2 + normal[1] ** 2 + normal[2] ** 2)

where heightmaps_py is a list of jitted functions, all having the same signature and available in the enclosing (nonlocal) scope.
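For reference, a pure-Python baseline of the same accumulation might look like the sketch below. The two heightmap functions (`flat_ground`, `slope_along_x`) are hypothetical stand-ins, since the real ones are not shown here. The sketch also assumes `height` is a one-element array and `normal` a three-component array, with all three components entering the norm.

```python
import math

import numpy as np


def flat_ground(x, y, height, normal):
    # Hypothetical heightmap: flat ground at z = 0, normal pointing up.
    height[0] = 0.0
    normal[:] = (0.0, 0.0, 1.0)


def slope_along_x(x, y, height, normal):
    # Hypothetical heightmap: plane z = 0.5 * x, unnormalized normal (-0.5, 0, 1).
    height[0] = 0.5 * x
    normal[:] = (-0.5, 0.0, 1.0)


def sum_tile_ground(heightmaps, x, y, height, normal):
    """Sum heights and normals over all heightmaps, then renormalize the normal."""
    height_buffer = np.zeros_like(height)
    normal_buffer = np.zeros_like(normal)
    # The first heightmap writes straight into the output arrays.
    heightmaps[0](x, y, height, normal)
    # The remaining ones write into scratch buffers that are accumulated.
    for heightmap in heightmaps[1:]:
        heightmap(x, y, height_buffer, normal_buffer)
        height += height_buffer
        normal += normal_buffer
    # Scale the summed normal back to unit length, in place.
    normal /= math.sqrt(normal[0] ** 2 + normal[1] ** 2 + normal[2] ** 2)


height = np.zeros(1)
normal = np.zeros(3)
sum_tile_ground([flat_ground, slope_along_x], 2.0, 0.0, height, normal)
print(height, normal)
```

Timing this version against the jitted one on the same inputs would give the "native Python" number to compare with.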

Is there any way you could submit a complete, runnable script so that someone can reproduce your findings?