Nested numba functions slow down execution and cannot be inlined

I just ran that example on the branch mentioned here, which applies only the global optimization, but it seems to have only a minor impact:

%timeit fct_1_nested(s1, s2)
%timeit fct_1(s1, s2)

# Default optimizations in numba
2.52 ms ± 11.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
116 µs ± 376 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

# merged-compile branch
2.32 ms ± 18.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
120 µs ± 925 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

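The merged compilation makes the nested version a little faster (2.52 ms down to 2.32 ms), but fct_1_nested is still roughly 20x slower than fct_1, so it doesn't look like it is the main reason for the performance difference.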
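To put numbers on that, here is a small sketch that only redoes the arithmetic on the timings pasted above. The dictionary layout and the `slowdown` helper are just names I made up for illustration, and the values are the reported means converted to microseconds.

```python
# Mean timings copied from the %timeit output above, in microseconds.
# The dictionary layout and key names are illustrative, not part of numba.
timings_us = {
    "default": {"fct_1_nested": 2520.0, "fct_1": 116.0},
    "merged-compile": {"fct_1_nested": 2320.0, "fct_1": 120.0},
}


def slowdown(build):
    """Return how many times slower the nested version is for one build."""
    t = timings_us[build]
    return t["fct_1_nested"] / t["fct_1"]


for build in timings_us:
    print(f"{build}: fct_1_nested is {slowdown(build):.1f}x slower than fct_1")

# Relative improvement of the nested version when switching branches.
nested_gain = 1 - (
    timings_us["merged-compile"]["fct_1_nested"]
    / timings_us["default"]["fct_1_nested"]
)
print(f"nested version gains {nested_gain:.1%} from merged compilation")
```

So the gap only shrinks from about 21.7x to about 19.3x, which is why the merged compilation alone doesn't seem to explain it.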