[ANN] Profila, a line profiler for Numba (initial release)

itamarst · January 30, 2024, 2:16pm

Hi everyone,

Since there are currently no solutions for profiling Numba code, I have created one. It’s not as nice as the profiling you could get if Numba eventually supports it natively, but it does seem to work in my initial tests.

Here’s an example:

$ pip install profila
$ python -m profila annotate -- scripts_for_tests/simple.py
# Total samples: 328 (54.9% non-Numba samples, 1.8% bad samples)

## File `/home/itamarst/devel/profila/scripts_for_tests/simple.py`
Lines 10 to 15:

  0.3% |     for i in range(len(timeseries)):
       |         # This should be the most expensive line:
 38.7% |         result[i] = (7 + timeseries[i] / 9 + (timeseries[i] ** 2) / 7) / 5
       |     for i in range(len(result)):
       |         # This should be cheaper:
  4.3% |         result[i] -= 1

So far I’ve only tested it on toy examples, so I’d love to get feedback from real-world usage. At the moment I’ve also only tried it on Linux, but there’s a decent chance it’ll work on macOS (you’ll need to brew install gdb first), and it will likely work on WSL2 on Windows.

For more details, see the GitHub repository pythonspeed/profila ("A profiler for Numba").
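
For readers who want to try the command on their own code, here is a minimal sketch of a script shaped like the excerpt in the annotated output. Only the two loops come from that output; the function name `simple`, the input array, and the fallback decorator are assumptions made for illustration, and the real `scripts_for_tests/simple.py` may well look different.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: a no-op fallback so the sketch still runs without Numba
    # (in that case there is no compiled code for the profiler to sample).
    def njit(func):
        return func


@njit
def simple(timeseries):
    # Hypothetical wrapper; only the two loops below appear in the output.
    result = np.empty_like(timeseries)
    for i in range(len(timeseries)):
        # This should be the most expensive line:
        result[i] = (7 + timeseries[i] / 9 + (timeseries[i] ** 2) / 7) / 5
    for i in range(len(result)):
        # This should be cheaper:
        result[i] -= 1
    return result


# Assumed input; a real profiling run would want a much larger array
# so that the sampler collects a useful number of samples.
data = np.linspace(0.0, 1.0, 100_000)
result = simple(data)
```

Saved under a file name of your choice, a script like this would be passed to `python -m profila annotate --` in the same way as in the example.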

esc · January 31, 2024, 1:21pm

That looks cool, thank you for sharing!

Wlos · February 24, 2024, 2:09pm

The profiler looks very interesting, so I tried it on a toy script I wrote for another question.

This is the script (to be fair, the details don’t matter that much):

import numpy as np
from numba import njit, prange, get_num_threads
import time

@njit
def f(i, rs):
  r = np.sum(rs)
  t = i * r
  for j in range(int(1e5)):
    t = (t * r) % (i+1)
  return 1.1, t**2


@njit(parallel=False)
def iu_loop(rss, a):

  for i in prange(len(rss)):

    temp = f(i, rss)

    a[0] *= temp[0]
    a[1] += temp[1]

  return a

a = np.array([1., 0.])
rss = np.ones((1,1))
_ = iu_loop(rss, a)

a = np.array([1., 0.])
rss = np.ones((1000,1000))

print("threads=", get_num_threads())

t_start = time.time()
result = iu_loop(rss, a)
t_end = time.time()
run_time = t_end - t_start
print(f"computed result in: {round(run_time, 3)}s")
print(result)

This is the output I get from running Profila:

Total samples: 789 (22.2% non-Numba samples, 3.9% bad samples)

(…) /lib/python3.10/site-packages/numba/np/arraymath.py (lines 168 to 169):

  1.0% |         for v in np.nditer(arr):
 24.7% |             c += v.item()

(…) /auxiliary.py (lines 9 to 10):

  0.6% |   for j in range(int(1e5)):
 47.5% |     t = (t * r) % (i+1)

This works very nicely (it was quick to set up and run).

The one problem I see is that the profiler reports about 25% of the time being spent in a function in arraymath.py, which suggests I should look into it further; however, nothing in the output indicates which line of my own code leads there.

Is there already a way to make the output of the profiler more verbose with such information? If not, this sounds like a key aspect to implement in the future.
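
In the meantime, reading the script itself gives a plausible answer: the only array reduction in it is `np.sum(rs)` inside `f`, so that call is the likely route into arraymath.py. This is an inference from the script, not something the profiler output confirms. Since `rss` does not change inside the loop, the sum could be computed once in the caller. A minimal sketch of that idea, where `f_hoisted` and `iu_loop_hoisted` are hypothetical names and the inner loop is shortened to keep the example quick:

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: a no-op fallback so the sketch runs without Numba.
    def njit(func):
        return func


@njit
def f_hoisted(i, r):
    # Same arithmetic as f, but r (the array sum) is passed in precomputed.
    t = i * r
    for j in range(1000):  # shortened from int(1e5) for this sketch
        t = (t * r) % (i + 1)
    return 1.1, t ** 2


@njit
def iu_loop_hoisted(rss, a):
    r = np.sum(rss)  # computed once rather than once per iteration
    for i in range(len(rss)):
        temp = f_hoisted(i, r)
        a[0] *= temp[0]
        a[1] += temp[1]
    return a


rss = np.ones((10, 10))
result = iu_loop_hoisted(rss, np.array([1.0, 0.0]))
```

If the arraymath.py lines disappear from the profile after a change like this, that would support the guess that `np.sum` was the caller.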
