[ANN] Profila, a line profiler for Numba (initial release)

Hi everyone,

Since there are currently no tools for profiling Numba code, I have created one. It’s not as nice as the profiling you could get if Numba eventually supports it natively, but it does seem to work in my initial tests.

Here’s an example:

$ pip install profila
$ python -m profila annotate -- scripts_for_tests/simple.py
# Total samples: 328 (54.9% non-Numba samples, 1.8% bad samples)

## File `/home/itamarst/devel/profila/scripts_for_tests/simple.py`
Lines 10 to 15:

  0.3% |     for i in range(len(timeseries)):
       |         # This should be the most expensive line:
 38.7% |         result[i] = (7 + timeseries[i] / 9 + (timeseries[i] ** 2) / 7) / 5
       |     for i in range(len(result)):
       |         # This should be cheaper:
  4.3% |         result[i] -= 1
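Judging only from the numbers in this output (an inference, since the post doesn't spell out the format), the per-line percentages look like fractions of *all* 328 samples: 0.3 + 38.7 + 4.3 = 43.3, which matches 100 - 54.9 - 1.8 = 43.3, the share left after removing non-Numba and bad samples. The sketch below uses a hypothetical helper, `rescale` (not part of Profila), to turn those percentages into approximate sample counts and into shares of the Numba-only time.

```python
def rescale(total_samples, non_numba_pct, bad_pct, line_pcts):
    """Convert per-line percentages into sample counts and Numba-only shares.

    Assumption: each line percentage is relative to *all* samples, which is
    consistent with the numbers in the Profila output above.
    """
    # Share of samples that landed in Numba-compiled code.
    numba_pct = 100.0 - non_numba_pct - bad_pct
    rows = []
    for line, pct in line_pcts.items():
        samples = round(total_samples * pct / 100)  # approximate sample count
        share_of_numba = 100 * pct / numba_pct      # % of Numba-only time
        rows.append((line, samples, round(share_of_numba, 1)))
    return numba_pct, rows


# Numbers copied from the Profila output above.
numba_pct, rows = rescale(
    total_samples=328,
    non_numba_pct=54.9,
    bad_pct=1.8,
    line_pcts={
        "for i in range(len(timeseries)):": 0.3,
        "result[i] = (7 + ...) / 5": 38.7,
        "result[i] -= 1": 4.3,
    },
)

print(f"Numba share of all samples: {numba_pct:.1f}%")
for line, samples, share in rows:
    print(f"{samples:4d} samples, {share:5.1f}% of Numba time: {line}")
```

Read this way, the "most expensive line" accounts for roughly nine tenths of the time actually spent inside Numba code, even though it shows as 38.7% of all samples.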

So far I’ve only tested it on toy examples, so I’d love to get feedback from real-world usage. At the moment I’ve also only tried it on Linux, but there’s a decent chance it’ll work on macOS (you’ll need to brew install gdb first), and it will likely work under WSL2 on Windows.

For more details, see the GitHub repository: pythonspeed/profila (“A profiler for Numba”).


That looks cool, thank you for sharing!

Since the profiler looks very interesting, I tried it on a toy script I wrote for another question.

Here is the script (to be fair, the details don’t matter that much):

import numpy as np
from numba import njit, prange, get_num_threads
import time

@njit
def f(i, rs):
  r = np.sum(rs)
  t = i * r
  for j in range(int(1e5)):
    t = (t * r) % (i+1)
  return 1.1, t**2


@njit(parallel=False)
def iu_loop(rss, a):

  for i in prange(len(rss)):

    temp = f(i, rss)

    a[0] *= temp[0]
    a[1] += temp[1]

  return a

a = np.array([1., 0.])
rss = np.ones((1,1))
_ = iu_loop(rss, a)

a = np.array([1., 0.])
rss = np.ones((1000,1000))

print("threads=", get_num_threads())

t_start = time.time()
result = iu_loop(rss, a)
t_end = time.time()
run_time = t_end - t_start
print(f"computed result in: {round(run_time, 3)}s")
print(result)

and this is the output I get from running Profila:

Total samples: 789 (22.2% non-Numba samples, 3.9% bad samples)

(…) /lib/python3.10/site-packages/numba/np/arraymath.py (lines 168 to 169):

  1.0% |         for v in np.nditer(arr):
 24.7% |             c += v.item()

(…) /auxiliary.py (lines 9 to 10):

  0.6% |   for j in range(int(1e5)):
 47.5% |     t = (t * r) % (i+1)

This works very nicely (it was quick to set up and run).

The one problem I see is that the profiler reports about 25% of the time spent in a function in arraymath.py, which suggests I might want to look into it further; however, nothing tells me which line of my original code leads there.
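A back-of-the-envelope count of the work in the script gives a plausible guess (an inference from the code, not something the profiler reports). `iu_loop` calls `f` once per row of `rss`, and every call runs `np.sum(rs)` over the full 1000×1000 array, so the script sums 10^9 elements in total, against 10^8 iterations of the modulo loop. That `np.sum` call is the only NumPy reduction in the script, and the `for v in np.nditer(arr): c += v.item()` lines appear to be Numba's implementation of array summation, so it is the likely origin of those samples.

```python
# Shapes and loop sizes taken from the script above.
n_rows, n_cols = 1000, 1000  # rss = np.ones((1000, 1000))
inner_iters = int(1e5)       # for j in range(int(1e5)) inside f

calls_to_f = n_rows                              # iu_loop calls f once per row of rss
elements_summed = calls_to_f * n_rows * n_cols   # np.sum(rs) reads every element on each call
modulo_steps = calls_to_f * inner_iters          # one (t * r) % (i + 1) per inner iteration

print(f"elements summed by np.sum: {elements_summed:,}")
print(f"modulo loop iterations:    {modulo_steps:,}")
```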

Is there already a way to make the profiler output include this information? If not, it sounds like a key feature to implement in the future.
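Until call-site information is available, one manual workaround (a sketch, not a Profila feature) is to make the suspected library work show up in your own file. Assuming the arraymath.py samples come from `r = np.sum(rs)` in `f`, the hypothetical helper `manual_sum` below replaces that call with an explicit loop, so a line profiler should then attribute the summing time to lines in your own script rather than to numba/np/arraymath.py. This assumes Numba is installed; the `ImportError` fallback exists only so the sketch still runs (uncompiled) without it.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback only so this sketch runs without Numba; nothing is compiled then.
    def njit(func):
        return func


@njit
def manual_sum(rs):
    # Hypothetical helper: an explicit loop over a 2D array, so a line
    # profiler should report this time on these lines instead of inside
    # numba/np/arraymath.py.
    total = 0.0
    for row in range(rs.shape[0]):
        for col in range(rs.shape[1]):
            total += rs[row, col]
    return total


@njit
def f(i, rs):
    r = manual_sum(rs)  # was: r = np.sum(rs)
    t = i * r
    for j in range(int(1e5)):
        t = (t * r) % (i + 1)
    return 1.1, t**2
```

If the samples move onto the lines of `manual_sum` after this change, that confirms the call site, and the original `np.sum` call can be restored afterwards.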