Loop-vectorize debug

I see comments suggesting adding this to understand how loops are being handled by Numba, including in Numba’s own FAQ (https://numba.pydata.org/numba-doc/latest/user/faq.html):

from llvmlite import binding as llvm
llvm.set_option('','--debug-only=loop-vectorize')

I find that this code changes nothing: there is no additional debug output from LLVM indicating whether a loop is being vectorized (e.g. with SSE instructions) or, if not, why not.

I’ve tried this on Windows, Linux and Mac, all using Anaconda distributions of Python 3.8 and with the conda, conda-forge and numba channels.

Any ideas? Does it work for you? Is there a hidden extra step?


Hi @rhjmoore,

xref: loop-vectorize · Issue #7358 · numba/numba · GitHub

This works as intended locally. I’ve been using this debug output for years and don’t recall it ever not working, so your case needs investigating:

$ cat issue7358.py
from numba import njit
import numpy as np
from llvmlite import binding as llvm
llvm.set_option('','--debug-only=loop-vectorize')

@njit
def test(a):
    for i in range(a.shape[0]):
        a[i] = a[i] * 2

print(test(np.asarray([1.0, 2.0, 3.0])))

$ python issue7358.py

LV: Checking a loop in "_ZN8__main__8test$241E5ArrayIdLi1E1C7mutable7alignedE" from test
LV: Loop hints: force=? width=0 unroll=0
LV: Found a loop: B18
LV: Found an induction variable.
LV: Found FP op with unsafe algebra.
LV: We can vectorize this loop!
LV: Found trip count: 0
LV: The Smallest and Widest types: 64 / 64 bits.
LV: The Widest register safe to use is: 128 bits.
LV: Found uniform instruction:   %exitcond.not = icmp eq i64 %.127, %arg.a.5.0
LV: Found uniform instruction:   %.180 = getelementptr double, double* %arg.a.4, i64 %.56.07
LV: Found uniform instruction:   %.56.07 = phi i64 [ %.127, %B18 ], [ 0, %B18.preheader ]
LV: Found uniform instruction:   %.127 = add nuw nsw i64 %.56.07, 1
LV: Found scalar instruction:   %.56.07 = phi i64 [ %.127, %B18 ], [ 0, %B18.preheader ]
LV: Found scalar instruction:   %.127 = add nuw nsw i64 %.56.07, 1
LV: Scalarizing:  %.180 = getelementptr double, double* %arg.a.4, i64 %.56.07
LV: Scalarizing:  %.181 = load double, double* %.180, align 8
LV: Scalarizing:  %.183 = fmul double %.181, 2.000000e+00
LV: Scalarizing:  store double %.183, double* %.180, align 8
LV: Scalarizing:  %.180 = getelementptr double, double* %arg.a.4, i64 %.56.07
digraph VPlan {
graph [labelloc=t, fontsize=30; label="Vectorization Plan\nInitial VPlan for VF=\{1\},UF\>=1"]
node [shape=rect, fontname=Courier, fontsize=30]
edge [fontname=Courier, fontsize=30]
compound=true
  N0 [label =
    "B18:\n" +
      "WIDEN-INDUCTION %.56.07 = phi %.127, 0\l" +
      "CLONE %.180 = getelementptr %arg.a.4, %.56.07\l" +
      "CLONE %.181 = load %.180\l" +
      "CLONE %.183 = fmul %.181, 2.000000e+00\l" +
      "CLONE store %.183, %.180\l"
  ]
}
digraph VPlan {
graph [labelloc=t, fontsize=30; label="Vectorization Plan\nInitial VPlan for VF=\{2\},UF\>=1"]
node [shape=rect, fontname=Courier, fontsize=30]
edge [fontname=Courier, fontsize=30]
compound=true
  N0 [label =
    "B18:\n" +
      "WIDEN-INDUCTION %.56.07 = phi %.127, 0\l" +
      "CLONE %.180 = getelementptr %arg.a.4, %.56.07\l" +
      "WIDEN %.181 = load %.180, ir<%.180>\l" +
      "WIDEN\l""  %.183 = fmul %.181, 2.000000e+00\l" +
      "WIDEN store %.183, %.180, ir<%.180>\l"
  ]
}
LV: Found an estimated cost of 0 for VF 1 For instruction:   %.56.07 = phi i64 [ %.127, %B18 ], [ 0, %B18.preheader ]
LV: Found an estimated cost of 1 for VF 1 For instruction:   %.127 = add nuw nsw i64 %.56.07, 1
LV: Found an estimated cost of 0 for VF 1 For instruction:   %.180 = getelementptr double, double* %arg.a.4, i64 %.56.07
LV: Found an estimated cost of 1 for VF 1 For instruction:   %.181 = load double, double* %.180, align 8
LV: Found an estimated cost of 2 for VF 1 For instruction:   %.183 = fmul double %.181, 2.000000e+00
LV: Found an estimated cost of 1 for VF 1 For instruction:   store double %.183, double* %.180, align 8
LV: Found an estimated cost of 1 for VF 1 For instruction:   %exitcond.not = icmp eq i64 %.127, %arg.a.5.0
LV: Found an estimated cost of 0 for VF 1 For instruction:   br i1 %exitcond.not, label %B38.loopexit, label %B18
LV: Scalar loop costs: 6.
LV: Found an estimated cost of 0 for VF 2 For instruction:   %.56.07 = phi i64 [ %.127, %B18 ], [ 0, %B18.preheader ]
LV: Found an estimated cost of 1 for VF 2 For instruction:   %.127 = add nuw nsw i64 %.56.07, 1
LV: Found an estimated cost of 0 for VF 2 For instruction:   %.180 = getelementptr double, double* %arg.a.4, i64 %.56.07
LV: Found an estimated cost of 1 for VF 2 For instruction:   %.181 = load double, double* %.180, align 8
LV: Found an estimated cost of 2 for VF 2 For instruction:   %.183 = fmul double %.181, 2.000000e+00
LV: Found an estimated cost of 1 for VF 2 For instruction:   store double %.183, double* %.180, align 8
LV: Found an estimated cost of 1 for VF 2 For instruction:   %exitcond.not = icmp eq i64 %.127, %arg.a.5.0
LV: Found an estimated cost of 0 for VF 2 For instruction:   br i1 %exitcond.not, label %B38.loopexit, label %B18
LV: Vector loop of width 2 costs: 3.
LV: Selecting VF: 2.
LV(REG): Calculating max register usage:
LV(REG): At #0 Interval # 0
LV(REG): At #1 Interval # 1
LV(REG): At #2 Interval # 2
LV(REG): At #3 Interval # 2
LV(REG): At #4 Interval # 3
LV(REG): At #6 Interval # 1
LV(REG): VF = 2
LV(REG): Found max usage: 2 item
LV(REG): RegisterClass: Generic::ScalarRC, 2 registers
LV(REG): RegisterClass: Generic::VectorRC, 1 registers
LV(REG): Found invariant usage: 0 item
LV: The target has 16 registers of Generic::ScalarRC register class
LV: The target has 16 registers of Generic::VectorRC register class
LV: Loop cost is 6
LV: Interleaving to reduce branch cost.
LV: Found a vectorizable loop (2) in test
LV: Interleave Count is 2
Setting best plan to VF=2, UF=2
LV: Interleaving disabled by the pass manager
None
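For longer logs it may help to condense the output to one record per loop. The following is a minimal sketch, not part of Numba or llvmlite: `summarise_lv_log` is a hypothetical helper, and it assumes the `LV:` line formats shown above, which can differ between LLVM versions. The `sample` text is a few lines copied from the log above.

```python
import re


def summarise_lv_log(log_text):
    """Split loop-vectorize debug output into one record per analysed loop."""
    loops = []
    current = None
    for line in log_text.splitlines():
        line = line.strip()
        m = re.match(r'LV: Checking a loop in "(?P<func>[^"]+)" from (?P<origin>.+)', line)
        if m:
            # A new "Checking a loop" line starts a new record.
            current = {"function": m["func"], "origin": m["origin"],
                       "costs": {}, "selected_vf": None}
            loops.append(current)
            continue
        if current is None:
            continue
        m = re.match(r"LV: Scalar loop costs: (\d+)", line)
        if m:
            current["costs"][1] = int(m[1])  # scalar loop is VF 1
            continue
        m = re.match(r"LV: Vector loop of width (\d+) costs: (\d+)", line)
        if m:
            current["costs"][int(m[1])] = int(m[2])
            continue
        m = re.match(r"LV: Selecting VF: (\d+)", line)
        if m:
            current["selected_vf"] = int(m[1])
    return loops


sample = """\
LV: Checking a loop in "_ZN8__main__8test$241E5ArrayIdLi1E1C7mutable7alignedE" from test
LV: Scalar loop costs: 6.
LV: Vector loop of width 2 costs: 3.
LV: Selecting VF: 2.
"""

for loop in summarise_lv_log(sample):
    print(loop["origin"], loop["costs"], loop["selected_vf"])
```

A selected VF of 1 means the loop was left scalar; comparing the per-width costs shows why.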

How are you running the sample code (command line/notebook/other) and on what architecture?

Hi @stuartarchibald,

Thanks for picking this up. I first ran this in a notebook; then, suspecting Jupyter might be making it fail, I saved the code to a Python file and executed it at the command line as you show above.

This is true on Windows, Linux and Mac for me, all with Anaconda installations. Do you have a recommended alternative route for getting this to work? Should I have full LLVM / Clang installed separately or something?

One more note: I found a reference on the internet somewhere (but can’t currently locate the link) to one other person having this problem. They had numba v0.28 installed, and subsequently moving to a developer build of v0.29 fixed it.
I remember there was a brief discussion of whether this might be because of how the package / wheel was built in conda-forge, but it was never conclusive.

Update: I’ve found the link:

at Apr 17 2019 19:00

I can see you, @stuartarchibald, in the conversation!

Hi @rhjmoore

I think I’ve got to the bottom of this: it comes down to how LLVM was built. This is the important bit:

Numba from the numba channel is built with assertions on:

whereas the build from the conda-forge feedstock does not do this:

As a result, you will need to use an llvmlite that is linked against an LLVM which is built with assertions. One such llvmlite/numba combination is available from the Numba channel, e.g.:

conda create -n my_numba_env_with_assertions -c numba numba

I guess the reason I’ve never seen it fail is because I have LLVM built with assertions (for debugging!). Having just tried builds from conda-forge and the Anaconda distribution I can reproduce the lack of output against these packages.
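To tell which kind of build a given environment has, one option is to run the sample in a fresh interpreter and look for `LV:` lines. This is only a rough sketch: `PROBE`, `has_lv_lines` and `probe` are made-up names, not part of Numba or llvmlite, and it assumes the debug text lands on the child process's stdout or stderr.

```python
import subprocess
import sys
import textwrap

# Same idea as the sample script in this thread; the function name is arbitrary.
PROBE = textwrap.dedent("""
    from llvmlite import binding as llvm
    llvm.set_option('', '--debug-only=loop-vectorize')
    from numba import njit
    import numpy as np

    @njit
    def double_in_place(a):
        for i in range(a.shape[0]):
            a[i] = a[i] * 2

    double_in_place(np.asarray([1.0, 2.0, 3.0]))
""")


def has_lv_lines(text):
    """Return True if any line looks like loop-vectorize debug output."""
    return any(line.startswith("LV:") for line in text.splitlines())


def probe(python_exe=sys.executable, script=PROBE):
    """Run `script` in a separate interpreter and check both output streams."""
    result = subprocess.run(
        [python_exe, "-c", script], capture_output=True, text=True
    )
    return has_lv_lines(result.stdout + "\n" + result.stderr)
```

Calling `probe()` inside the environment under test should return `True` only when the linked LLVM emits the debug output; it also returns `False` if the script fails for an unrelated reason, such as numba not being installed.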

I think the Numba docs could do with a fix to note this.

Hope this helps.

Thanks, that makes sense. I’ll try and create a new environment with the versions of numba, llvm etc. and confirm whether this fixes it for me.

For the documentation update, do you want to re-open my original issue to link a patch against, or shall I open a new issue?

Great, thanks for testing it.

I’ve reopened the original issue with a note about what the problem is, a patch to the documentation would be welcomed. Thanks!

Confirmed, that works.

Great, thanks for confirming, glad this is resolved.

@stuartarchibald
Sorry, I have one more question on the output of this now that I have it working:
Is there a way to link each set of output to an individual loop in the function when it contains multiple loops?

Yes, but it’s a bit involved…

Setting debug=True in the @jit decorators means that DWARF info is emitted and this links the Python source to the generated LLVM through DILocation entries.
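Following those DILocation entries by hand gets tedious, so here is a rough sketch of automating the lookup within a single IR dump. It is an illustration only: `lines_per_block` is a hypothetical helper, `sample_ir` is a hand-written fragment rather than real Numba output, and it assumes the usual textual form `!N = !DILocation(line: ..., ...)` and unquoted basic-block labels.

```python
import re
from collections import defaultdict

DILOCATION = re.compile(r"^(!\d+) = !DILocation\(line: (\d+)", re.MULTILINE)
BLOCK_LABEL = re.compile(r"^([\w.$-]+):")
DBG_REF = re.compile(r"!dbg (!\d+)")


def lines_per_block(llvm_ir):
    """Map each basic-block label to the source lines its instructions cite."""
    locations = {m[1]: int(m[2]) for m in DILOCATION.finditer(llvm_ir)}
    blocks = defaultdict(set)
    label = None
    for line in llvm_ir.splitlines():
        m = BLOCK_LABEL.match(line)
        if m:
            label = m[1]
            continue
        if line.startswith("}"):
            label = None  # end of the function body
            continue
        if label is None or "@llvm.dbg." in line:
            continue  # skip non-code lines and debug-value intrinsics
        for ref in DBG_REF.findall(line):
            if ref in locations:
                blocks[label].add(locations[ref])
    return {name: sorted(lines) for name, lines in blocks.items()}


# Hand-written fragment for illustration; real input would come from
# something like foo.inspect_llvm(foo.signatures[0]).
sample_ir = """\
define i32 @f(i64 %n) !dbg !4 {
entry:
  br label %B62, !dbg !19
B62:
  %x = add i64 %n, 2, !dbg !24
  call void @llvm.dbg.value(metadata i64 0, metadata !7, metadata !DIExpression()), !dbg !9
  br label %B62, !dbg !23
}
!9 = !DILocation(line: 0, scope: !4)
!19 = !DILocation(line: 12, column: 1, scope: !4)
!23 = !DILocation(line: 15, column: 1, scope: !4)
!24 = !DILocation(line: 16, column: 1, scope: !4)
"""

print(lines_per_block(sample_ir))
```

The sketch keys on block labels (the names reported by `LV: Found a loop: ...`) rather than on `!dbg` numbers, because metadata numbering in the LV log may not match a later IR dump.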

Example:

from llvmlite import binding as llvm
llvm.set_option('','--debug-only=loop-vectorize')

from numba import njit
import numpy as np

@njit(fastmath=True, debug=True)
def foo(n):
    acc = 0.0
    acc2 = 0.0

    for i in range(n):
        acc += np.sqrt(i) * np.sin(i)

    for i in range(n - 1):
        acc -= np.sqrt(i + 2) * np.cos(i)

    return acc + acc2

foo(1)

print(foo.inspect_llvm(foo.signatures[0]))

This produces two sets of output.

  1. the loop vectorize debug info (LV debug).
LV: Checking a loop in "_ZN8__main__7foo$241Ex" from <elided>.py:12:1
LV: Loop hints: force=? width=0 unroll=0
LV: Found a loop: for.end
LV: Found an induction variable.
LV: We can vectorize this loop!
LV: Found trip count: 0
LV: The Smallest and Widest types: 64 / 64 bits.
LV: The Widest register safe to use is: 128 bits.
LV: Found uniform instruction:   %exitcond33.not = icmp eq i64 %.136, %arg.n, !dbg !19
LV: Scalarizing:  %.201.le = sitofp i64 %.57.028 to double, !dbg !20
LV: Scalarizing:  %.244 = fmul fast double %.202.le, %.230.le, !dbg !20
LV: Scalarizing:  %.254 = fadd fast double %.244, %acc.3.030, !dbg !20
LV: Scalarizing:  %.202.le = tail call fast double @sqrt(double %.201.le), !dbg !20
digraph VPlan {
graph [labelloc=t, fontsize=30; label="Vectorization Plan\nInitial VPlan for VF=\{1\},UF\>=1"]
node [shape=rect, fontname=Courier, fontsize=30]
edge [fontname=Courier, fontsize=30]
compound=true
  N0 [label =
    "for.end:\n" +
      "WIDEN-PHI %acc.3.030 = phi %.254, 0.000000e+00\l" +
      "WIDEN-INDUCTION %.57.028 = phi %.136, 0\l" +
      "CLONE %.201.le = sitofp %.57.028\l" +
      "WIDEN-CALL %.202.le = call %.201.le, @sqrt\l" +
      "WIDEN-CALL %.230.le = call %.201.le, @llvm.sin.f64\l" +
      "CLONE %.244 = fmul %.202.le, %.230.le\l" +
      "CLONE %.254 = fadd %.244, %acc.3.030\l"
  ]
}
digraph VPlan {
graph [labelloc=t, fontsize=30; label="Vectorization Plan\nInitial VPlan for VF=\{2\},UF\>=1"]
node [shape=rect, fontname=Courier, fontsize=30]
edge [fontname=Courier, fontsize=30]
compound=true
  N0 [label =
    "for.end:\n" +
      "WIDEN-PHI %acc.3.030 = phi %.254, 0.000000e+00\l" +
      "WIDEN-INDUCTION %.57.028 = phi %.136, 0\l" +
      "WIDEN\l""  %.201.le = sitofp %.57.028\l" +
      "REPLICATE %.202.le = call %.201.le, @sqrt\l" +
      "WIDEN-CALL %.230.le = call %.201.le, @llvm.sin.f64\l" +
      "WIDEN\l""  %.244 = fmul %.202.le, %.230.le\l" +
      "WIDEN\l""  %.254 = fadd %.244, %acc.3.030\l"
  ]
}
LV: Found an estimated cost of 0 for VF 1 For instruction:   %acc.3.030 = phi double [ %.254, %for.end ], [ 0.000000e+00, %for.end.preheader ]
LV: Found an estimated cost of 0 for VF 1 For instruction:   %.57.028 = phi i64 [ %.136, %for.end ], [ 0, %for.end.preheader ]
LV: Found an estimated cost of 1 for VF 1 For instruction:   %.136 = add nuw nsw i64 %.57.028, 1, !dbg !19
LV: Found an estimated cost of 1 for VF 1 For instruction:   %.201.le = sitofp i64 %.57.028 to double, !dbg !20
LV: Found an estimated cost of 10 for VF 1 For instruction:   %.202.le = tail call fast double @sqrt(double %.201.le), !dbg !20
LV: Found an estimated cost of 10 for VF 1 For instruction:   %.230.le = tail call fast double @llvm.sin.f64(double %.201.le), !dbg !20
LV: Found an estimated cost of 2 for VF 1 For instruction:   %.244 = fmul fast double %.202.le, %.230.le, !dbg !20
LV: Found an estimated cost of 2 for VF 1 For instruction:   %.254 = fadd fast double %.244, %acc.3.030, !dbg !20
LV: Found an estimated cost of 1 for VF 1 For instruction:   %exitcond33.not = icmp eq i64 %.136, %arg.n, !dbg !19
LV: Found an estimated cost of 0 for VF 1 For instruction:   br i1 %exitcond33.not, label %B48, label %for.end, !dbg !19
LV: Scalar loop costs: 27.
LV: Found an estimated cost of 0 for VF 2 For instruction:   %acc.3.030 = phi double [ %.254, %for.end ], [ 0.000000e+00, %for.end.preheader ]
LV: Found an estimated cost of 0 for VF 2 For instruction:   %.57.028 = phi i64 [ %.136, %for.end ], [ 0, %for.end.preheader ]
LV: Found an estimated cost of 1 for VF 2 For instruction:   %.136 = add nuw nsw i64 %.57.028, 1, !dbg !19
LV: Found an estimated cost of 20 for VF 2 For instruction:   %.201.le = sitofp i64 %.57.028 to double, !dbg !20
LV: Found an estimated cost of 22 for VF 2 For instruction:   %.202.le = tail call fast double @sqrt(double %.201.le), !dbg !20
LV: Found an estimated cost of 22 for VF 2 For instruction:   %.230.le = tail call fast double @llvm.sin.f64(double %.201.le), !dbg !20
LV: Found an estimated cost of 2 for VF 2 For instruction:   %.244 = fmul fast double %.202.le, %.230.le, !dbg !20
LV: Found an estimated cost of 2 for VF 2 For instruction:   %.254 = fadd fast double %.244, %acc.3.030, !dbg !20
LV: Found an estimated cost of 1 for VF 2 For instruction:   %exitcond33.not = icmp eq i64 %.136, %arg.n, !dbg !19
LV: Found an estimated cost of 0 for VF 2 For instruction:   br i1 %exitcond33.not, label %B48, label %for.end, !dbg !19
LV: Vector loop of width 2 costs: 35.
LV: Selecting VF: 1.
LV(REG): Calculating max register usage:
LV(REG): At #0 Interval # 0
LV(REG): At #1 Interval # 1
LV(REG): At #2 Interval # 2
LV(REG): At #3 Interval # 3
LV(REG): At #4 Interval # 3
LV(REG): At #5 Interval # 4
LV(REG): At #6 Interval # 4
LV(REG): At #7 Interval # 3
LV(REG): At #8 Interval # 2
LV(REG): VF = 1
LV(REG): Found max usage: 1 item
LV(REG): RegisterClass: Generic::ScalarRC, 4 registers
LV(REG): Found invariant usage: 0 item
LV: The target has 16 registers of Generic::ScalarRC register class
LV: Loop cost is 27
LV: Not Interleaving.
LV: Vectorization is possible but not beneficial.
LV: Interleaving is not beneficial.

LV: Checking a loop in "_ZN8__main__7foo$241Ex" from <elided>.py:15:1
LV: Loop hints: force=? width=0 unroll=0
LV: Found a loop: B62
LV: Found an induction variable.
LV: We can vectorize this loop!
LV: Found trip count: 0
LV: The Smallest and Widest types: 64 / 64 bits.
LV: The Widest register safe to use is: 128 bits.
LV: Found uniform instruction:   %exitcond.not = icmp eq i64 %.411, %spec.select23, !dbg !21
LV: Scalarizing:  %.473 = add nuw nsw i64 %.334.025, 2, !dbg !22
LV: Scalarizing:  %.487.le = sitofp i64 %.473 to double, !dbg !22
LV: Scalarizing:  %.517.le = sitofp i64 %.334.025 to double, !dbg !22
LV: Scalarizing:  %.532 = fmul fast double %.488.le, %.518.le, !dbg !22
LV: Scalarizing:  %.542 = fsub fast double %acc.4.027, %.532, !dbg !22
LV: Scalarizing:  %.488.le = tail call fast double @sqrt(double %.487.le), !dbg !22
digraph VPlan {
graph [labelloc=t, fontsize=30; label="Vectorization Plan\nInitial VPlan for VF=\{1\},UF\>=1"]
node [shape=rect, fontname=Courier, fontsize=30]
edge [fontname=Courier, fontsize=30]
compound=true
  N0 [label =
    "B62:\n" +
      "WIDEN-PHI %acc.4.027 = phi %.542, %.254.lcssa\l" +
      "WIDEN-INDUCTION %.334.025 = phi %.411, 0\l" +
      "CLONE %.473 = add %.334.025, 2\l" +
      "CLONE %.487.le = sitofp %.473\l" +
      "WIDEN-CALL %.488.le = call %.487.le, @sqrt\l" +
      "CLONE %.517.le = sitofp %.334.025\l" +
      "WIDEN-CALL %.518.le = call %.517.le, @llvm.cos.f64\l" +
      "CLONE %.532 = fmul %.488.le, %.518.le\l" +
      "CLONE %.542 = fsub %acc.4.027, %.532\l"
  ]
}
digraph VPlan {
graph [labelloc=t, fontsize=30; label="Vectorization Plan\nInitial VPlan for VF=\{2\},UF\>=1"]
node [shape=rect, fontname=Courier, fontsize=30]
edge [fontname=Courier, fontsize=30]
compound=true
  N0 [label =
    "B62:\n" +
      "WIDEN-PHI %acc.4.027 = phi %.542, %.254.lcssa\l" +
      "WIDEN-INDUCTION %.334.025 = phi %.411, 0\l" +
      "WIDEN\l""  %.473 = add %.334.025, 2\l" +
      "WIDEN\l""  %.487.le = sitofp %.473\l" +
      "REPLICATE %.488.le = call %.487.le, @sqrt\l" +
      "WIDEN\l""  %.517.le = sitofp %.334.025\l" +
      "WIDEN-CALL %.518.le = call %.517.le, @llvm.cos.f64\l" +
      "WIDEN\l""  %.532 = fmul %.488.le, %.518.le\l" +
      "WIDEN\l""  %.542 = fsub %acc.4.027, %.532\l"
  ]
}
LV: Found an estimated cost of 0 for VF 1 For instruction:   %acc.4.027 = phi double [ %.542, %B62 ], [ %.254.lcssa, %B62.preheader ]
LV: Found an estimated cost of 0 for VF 1 For instruction:   %.334.025 = phi i64 [ %.411, %B62 ], [ 0, %B62.preheader ]
LV: Found an estimated cost of 1 for VF 1 For instruction:   %.411 = add nuw nsw i64 %.334.025, 1, !dbg !21
LV: Found an estimated cost of 1 for VF 1 For instruction:   %.473 = add nuw nsw i64 %.334.025, 2, !dbg !22
LV: Found an estimated cost of 1 for VF 1 For instruction:   %.487.le = sitofp i64 %.473 to double, !dbg !22
LV: Found an estimated cost of 10 for VF 1 For instruction:   %.488.le = tail call fast double @sqrt(double %.487.le), !dbg !22
LV: Found an estimated cost of 1 for VF 1 For instruction:   %.517.le = sitofp i64 %.334.025 to double, !dbg !22
LV: Found an estimated cost of 10 for VF 1 For instruction:   %.518.le = tail call fast double @llvm.cos.f64(double %.517.le), !dbg !22
LV: Found an estimated cost of 2 for VF 1 For instruction:   %.532 = fmul fast double %.488.le, %.518.le, !dbg !22
LV: Found an estimated cost of 2 for VF 1 For instruction:   %.542 = fsub fast double %acc.4.027, %.532, !dbg !22
LV: Found an estimated cost of 1 for VF 1 For instruction:   %exitcond.not = icmp eq i64 %.411, %spec.select23, !dbg !21
LV: Found an estimated cost of 0 for VF 1 For instruction:   br i1 %exitcond.not, label %B94.loopexit, label %B62, !dbg !21
LV: Scalar loop costs: 29.
LV: Found an estimated cost of 0 for VF 2 For instruction:   %acc.4.027 = phi double [ %.542, %B62 ], [ %.254.lcssa, %B62.preheader ]
LV: Found an estimated cost of 0 for VF 2 For instruction:   %.334.025 = phi i64 [ %.411, %B62 ], [ 0, %B62.preheader ]
LV: Found an estimated cost of 1 for VF 2 For instruction:   %.411 = add nuw nsw i64 %.334.025, 1, !dbg !21
LV: Found an estimated cost of 1 for VF 2 For instruction:   %.473 = add nuw nsw i64 %.334.025, 2, !dbg !22
LV: Found an estimated cost of 20 for VF 2 For instruction:   %.487.le = sitofp i64 %.473 to double, !dbg !22
LV: Found an estimated cost of 22 for VF 2 For instruction:   %.488.le = tail call fast double @sqrt(double %.487.le), !dbg !22
LV: Found an estimated cost of 20 for VF 2 For instruction:   %.517.le = sitofp i64 %.334.025 to double, !dbg !22
LV: Found an estimated cost of 22 for VF 2 For instruction:   %.518.le = tail call fast double @llvm.cos.f64(double %.517.le), !dbg !22
LV: Found an estimated cost of 2 for VF 2 For instruction:   %.532 = fmul fast double %.488.le, %.518.le, !dbg !22
LV: Found an estimated cost of 2 for VF 2 For instruction:   %.542 = fsub fast double %acc.4.027, %.532, !dbg !22
LV: Found an estimated cost of 1 for VF 2 For instruction:   %exitcond.not = icmp eq i64 %.411, %spec.select23, !dbg !21
LV: Found an estimated cost of 0 for VF 2 For instruction:   br i1 %exitcond.not, label %B94.loopexit, label %B62, !dbg !21
LV: Vector loop of width 2 costs: 45.
LV: Selecting VF: 1.
LV(REG): Calculating max register usage:
LV(REG): At #0 Interval # 0
LV(REG): At #1 Interval # 1
LV(REG): At #2 Interval # 2
LV(REG): At #3 Interval # 3
LV(REG): At #4 Interval # 4
LV(REG): At #5 Interval # 4
LV(REG): At #6 Interval # 4
LV(REG): At #7 Interval # 4
LV(REG): At #8 Interval # 4
LV(REG): At #9 Interval # 3
LV(REG): At #10 Interval # 2
LV(REG): VF = 1
LV(REG): Found max usage: 1 item
LV(REG): RegisterClass: Generic::ScalarRC, 4 registers
LV(REG): Found invariant usage: 1 item
LV(REG): RegisterClass: Generic::ScalarRC, 2 registers
LV: The target has 16 registers of Generic::ScalarRC register class
LV: Loop cost is 29
LV: Not Interleaving.
LV: Vectorization is possible but not beneficial.
LV: Interleaving is not beneficial.

  2. the LLVM IR from the module that Numba generated.
; ModuleID = 'foo'
source_filename = "<string>"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@"_ZN08NumbaEnv8__main__7foo$241Ex" = common local_unnamed_addr global i8* null
@.const.foo = internal constant [4 x i8] c"foo\00"
@PyExc_RuntimeError = external global i8
@".const.missing Environment: _ZN08NumbaEnv8__main__7foo$241Ex" = internal constant [54 x i8] c"missing Environment: _ZN08NumbaEnv8__main__7foo$241Ex\00"

; Function Attrs: nofree noinline nounwind
define i32 @"_ZN8__main__7foo$241Ex"(double* noalias nocapture %retptr, { i8*, i32, i8* }** noalias nocapture readnone %excinfo, i64 %arg.n) local_unnamed_addr #0 !dbg !4 {
entry:
  call void @llvm.dbg.value(metadata i64 0, metadata !7, metadata !DIExpression()), !dbg !9
  call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !10, metadata !DIExpression()), !dbg !9
  call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !12, metadata !DIExpression()), !dbg !9
  call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !13, metadata !DIExpression()), !dbg !9
  call void @llvm.dbg.value(metadata i64 0, metadata !14, metadata !DIExpression()), !dbg !9
  call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !15, metadata !DIExpression()), !dbg !9
  call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !16, metadata !DIExpression()), !dbg !9
  call void @llvm.dbg.value(metadata i64 0, metadata !17, metadata !DIExpression()), !dbg !9
  call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !18, metadata !DIExpression()), !dbg !9
  call void @llvm.dbg.value(metadata i64 %arg.n, metadata !7, metadata !DIExpression()), !dbg !9
  %.71.inv = icmp sgt i64 %arg.n, 0
  call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !12, metadata !DIExpression()), !dbg !9
  call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !15, metadata !DIExpression()), !dbg !9
  call void @llvm.dbg.value(metadata i64 0, metadata !14, metadata !DIExpression()), !dbg !9
  br i1 %.71.inv, label %for.end.preheader, label %B94, !dbg !19

for.end.preheader:                                ; preds = %entry
  %0 = add i64 %arg.n, -1, !dbg !19
  %xtraiter39 = and i64 %arg.n, 3, !dbg !19
  %1 = icmp ult i64 %0, 3, !dbg !19
  br i1 %1, label %B48.unr-lcssa, label %for.end.preheader.new, !dbg !19

for.end.preheader.new:                            ; preds = %for.end.preheader
  %unroll_iter42 = and i64 %arg.n, -4, !dbg !19
  br label %for.end, !dbg !19

B48.unr-lcssa:                                    ; preds = %for.end, %for.end.preheader
  %.254.lcssa.ph = phi double [ undef, %for.end.preheader ], [ %.254.3, %for.end ]
  %acc.3.030.unr = phi double [ 0.000000e+00, %for.end.preheader ], [ %.254.3, %for.end ]
  %.57.028.unr = phi i64 [ 0, %for.end.preheader ], [ %.136.3, %for.end ]
  %lcmp.mod40.not = icmp eq i64 %xtraiter39, 0, !dbg !19
  br i1 %lcmp.mod40.not, label %B48, label %for.end.epil.preheader, !dbg !19

for.end.epil.preheader:                           ; preds = %B48.unr-lcssa
  br label %for.end.epil, !dbg !19

for.end.epil:                                     ; preds = %for.end.epil.preheader, %for.end.epil
  %acc.3.030.epil = phi double [ %.254.epil, %for.end.epil ], [ %acc.3.030.unr, %for.end.epil.preheader ]
  %.57.028.epil = phi i64 [ %.136.epil, %for.end.epil ], [ %.57.028.unr, %for.end.epil.preheader ]
  %epil.iter = phi i64 [ %epil.iter.sub, %for.end.epil ], [ %xtraiter39, %for.end.epil.preheader ]
  call void @llvm.dbg.value(metadata double %acc.3.030.epil, metadata !12, metadata !DIExpression()), !dbg !9
  %.136.epil = add nuw nsw i64 %.57.028.epil, 1, !dbg !19
  %.201.le.epil = sitofp i64 %.57.028.epil to double, !dbg !20
  %.202.le.epil = tail call fast double @sqrt(double %.201.le.epil), !dbg !20
  %.230.le.epil = tail call fast double @llvm.sin.f64(double %.201.le.epil), !dbg !20
  call void @llvm.dbg.value(metadata i64 0, metadata !14, metadata !DIExpression()), !dbg !9
  %.244.epil = fmul fast double %.202.le.epil, %.230.le.epil, !dbg !20
  %.254.epil = fadd fast double %.244.epil, %acc.3.030.epil, !dbg !20
  call void @llvm.dbg.value(metadata double %.254.epil, metadata !12, metadata !DIExpression()), !dbg !9
  call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !15, metadata !DIExpression()), !dbg !9
  %epil.iter.sub = add i64 %epil.iter, -1, !dbg !19
  %epil.iter.cmp.not = icmp eq i64 %epil.iter.sub, 0, !dbg !19
  br i1 %epil.iter.cmp.not, label %B48, label %for.end.epil, !dbg !19, !llvm.loop !21

B48:                                              ; preds = %for.end.epil, %B48.unr-lcssa
  %.254.lcssa = phi double [ %.254.lcssa.ph, %B48.unr-lcssa ], [ %.254.epil, %for.end.epil ], !dbg !20
  call void @llvm.dbg.value(metadata double %.254.lcssa, metadata !16, metadata !DIExpression()), !dbg !9
  call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !12, metadata !DIExpression()), !dbg !9
  call void @llvm.dbg.value(metadata i64 0, metadata !7, metadata !DIExpression()), !dbg !9
  %.348 = icmp slt i64 %arg.n, 2, !dbg !23
  %.294 = add nsw i64 %arg.n, -1, !dbg !23
  %spec.select23 = select i1 %.348, i64 0, i64 %.294
  call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !18, metadata !DIExpression()), !dbg !9
  call void @llvm.dbg.value(metadata i64 0, metadata !17, metadata !DIExpression()), !dbg !9
  %.39824 = icmp sgt i64 %spec.select23, 0, !dbg !23
  br i1 %.39824, label %B62.preheader, label %B94, !dbg !23

B62.preheader:                                    ; preds = %B48
  %2 = icmp eq i64 %spec.select23, 1, !dbg !23
  br i1 %2, label %B94.loopexit.unr-lcssa, label %B62.preheader.new, !dbg !23

B62.preheader.new:                                ; preds = %B62.preheader
  %unroll_iter = and i64 %spec.select23, -2, !dbg !23
  br label %B62, !dbg !23

B62:                                              ; preds = %B62, %B62.preheader.new
  %acc.4.027 = phi double [ %.254.lcssa, %B62.preheader.new ], [ %.542.1, %B62 ]
  %.334.025 = phi i64 [ 0, %B62.preheader.new ], [ %.411.1, %B62 ]
  call void @llvm.dbg.value(metadata double %acc.4.027, metadata !16, metadata !DIExpression()), !dbg !9
  call void @llvm.dbg.value(metadata i64 0, metadata !17, metadata !DIExpression()), !dbg !9
  %3 = add i64 %.334.025, 1, !dbg !24
  %4 = add i64 %.334.025, 2, !dbg !24
  %.487.le = sitofp i64 %4 to double, !dbg !24
  %.488.le = tail call fast double @sqrt(double %.487.le), !dbg !24
  %.517.le = sitofp i64 %.334.025 to double, !dbg !24
  %.518.le = tail call fast double @llvm.cos.f64(double %.517.le), !dbg !24
  %.532 = fmul fast double %.488.le, %.518.le, !dbg !24
  call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !18, metadata !DIExpression()), !dbg !9
  call void @llvm.dbg.value(metadata double undef, metadata !16, metadata !DIExpression()), !dbg !9
  %.411.1 = add nuw nsw i64 %.334.025, 2, !dbg !23
  %5 = add i64 %.334.025, 3, !dbg !24
  %.487.le.1 = sitofp i64 %5 to double, !dbg !24
  %.488.le.1 = tail call fast double @sqrt(double %.487.le.1), !dbg !24
  %.517.le.1 = sitofp i64 %3 to double, !dbg !24
  %.518.le.1 = tail call fast double @llvm.cos.f64(double %.517.le.1), !dbg !24
  %.532.1 = fmul fast double %.488.le.1, %.518.le.1, !dbg !24
  %6 = fadd fast double %.532, %.532.1, !dbg !24
  %.542.1 = fsub fast double %acc.4.027, %6, !dbg !24
  call void @llvm.dbg.value(metadata double %.542.1, metadata !16, metadata !DIExpression()), !dbg !9
  %niter.ncmp.1 = icmp eq i64 %unroll_iter, %.411.1, !dbg !23
  br i1 %niter.ncmp.1, label %B94.loopexit.unr-lcssa, label %B62, !dbg !23

B94.loopexit.unr-lcssa:                           ; preds = %B62, %B62.preheader
  %.542.lcssa.ph = phi double [ undef, %B62.preheader ], [ %.542.1, %B62 ]
  %acc.4.027.unr = phi double [ %.254.lcssa, %B62.preheader ], [ %.542.1, %B62 ]
  %.334.025.unr = phi i64 [ 0, %B62.preheader ], [ %.411.1, %B62 ]
  %7 = and i64 %spec.select23, 1, !dbg !23
  %lcmp.mod.not = icmp eq i64 %7, 0, !dbg !23
  br i1 %lcmp.mod.not, label %B94, label %B94.loopexit.epilog-lcssa, !dbg !23

B94.loopexit.epilog-lcssa:                        ; preds = %B94.loopexit.unr-lcssa
  call void @llvm.dbg.value(metadata double %acc.4.027.unr, metadata !16, metadata !DIExpression()), !dbg !9
  call void @llvm.dbg.value(metadata i64 0, metadata !17, metadata !DIExpression()), !dbg !9
  call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !18, metadata !DIExpression()), !dbg !9
  call void @llvm.dbg.value(metadata double undef, metadata !16, metadata !DIExpression()), !dbg !9
  %.473.epil = add nuw nsw i64 %.334.025.unr, 2, !dbg !24
  %.487.le.epil = sitofp i64 %.473.epil to double, !dbg !24
  %.488.le.epil = tail call fast double @sqrt(double %.487.le.epil), !dbg !24
  %.517.le.epil = sitofp i64 %.334.025.unr to double, !dbg !24
  %.518.le.epil = tail call fast double @llvm.cos.f64(double %.517.le.epil), !dbg !24
  %.532.epil = fmul fast double %.488.le.epil, %.518.le.epil, !dbg !24
  %.542.epil = fsub fast double %acc.4.027.unr, %.532.epil, !dbg !24
  call void @llvm.dbg.value(metadata double %.542.epil, metadata !16, metadata !DIExpression()), !dbg !9
  br label %B94, !dbg !25

B94:                                              ; preds = %B94.loopexit.epilog-lcssa, %B94.loopexit.unr-lcssa, %entry, %B48
  %acc.4.0.lcssa = phi double [ %.254.lcssa, %B48 ], [ 0.000000e+00, %entry ], [ %.542.lcssa.ph, %B94.loopexit.unr-lcssa ], [ %.542.epil, %B94.loopexit.epilog-lcssa ], !dbg !9
  call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !13, metadata !DIExpression()), !dbg !9
  call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !16, metadata !DIExpression()), !dbg !9
  store double %acc.4.0.lcssa, double* %retptr, align 8, !dbg !25
  ret i32 0, !dbg !25

for.end:                                          ; preds = %for.end, %for.end.preheader.new
  %acc.3.030 = phi double [ 0.000000e+00, %for.end.preheader.new ], [ %.254.3, %for.end ]
  %.57.028 = phi i64 [ 0, %for.end.preheader.new ], [ %11, %for.end ]
  call void @llvm.dbg.value(metadata double %acc.3.030, metadata !12, metadata !DIExpression()), !dbg !9
  %.201.le = sitofp i64 %.57.028 to double, !dbg !20
  %.202.le = tail call fast double @sqrt(double %.201.le), !dbg !20
  %.230.le = tail call fast double @llvm.sin.f64(double %.201.le), !dbg !20
  call void @llvm.dbg.value(metadata i64 0, metadata !14, metadata !DIExpression()), !dbg !9
  %.244 = fmul fast double %.202.le, %.230.le, !dbg !20
  %.254 = fadd fast double %.244, %acc.3.030, !dbg !20
  call void @llvm.dbg.value(metadata double %.254, metadata !12, metadata !DIExpression()), !dbg !9
  call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !15, metadata !DIExpression()), !dbg !9
  %8 = add i64 %.57.028, 1, !dbg !20
  %.201.le.1 = sitofp i64 %8 to double, !dbg !20
  %.202.le.1 = tail call fast double @sqrt(double %.201.le.1), !dbg !20
  %.230.le.1 = tail call fast double @llvm.sin.f64(double %.201.le.1), !dbg !20
  %.244.1 = fmul fast double %.202.le.1, %.230.le.1, !dbg !20
  %.254.1 = fadd fast double %.244.1, %.254, !dbg !20
  call void @llvm.dbg.value(metadata double %.254.1, metadata !12, metadata !DIExpression()), !dbg !9
  %9 = add i64 %8, 1, !dbg !20
  %.201.le.2 = sitofp i64 %9 to double, !dbg !20
  %.202.le.2 = tail call fast double @sqrt(double %.201.le.2), !dbg !20
  %.230.le.2 = tail call fast double @llvm.sin.f64(double %.201.le.2), !dbg !20
  %.244.2 = fmul fast double %.202.le.2, %.230.le.2, !dbg !20
  %.254.2 = fadd fast double %.244.2, %.254.1, !dbg !20
  call void @llvm.dbg.value(metadata double %.254.2, metadata !12, metadata !DIExpression()), !dbg !9
  %.136.3 = add nuw nsw i64 %.57.028, 4, !dbg !19
  %10 = add i64 %9, 1, !dbg !20
  %.201.le.3 = sitofp i64 %10 to double, !dbg !20
  %.202.le.3 = tail call fast double @sqrt(double %.201.le.3), !dbg !20
  %.230.le.3 = tail call fast double @llvm.sin.f64(double %.201.le.3), !dbg !20
  %.244.3 = fmul fast double %.202.le.3, %.230.le.3, !dbg !20
  %.254.3 = fadd fast double %.244.3, %.254.2, !dbg !20
  call void @llvm.dbg.value(metadata double %.254.3, metadata !12, metadata !DIExpression()), !dbg !9
  %niter43.ncmp.3 = icmp eq i64 %unroll_iter42, %.136.3, !dbg !19
  %11 = add i64 %10, 1, !dbg !19
  br i1 %niter43.ncmp.3, label %B48.unr-lcssa, label %for.end, !dbg !19
}

; Function Attrs: nounwind readnone speculatable willreturn
declare void @llvm.dbg.value(metadata, metadata, metadata) #1

; Function Attrs: nofree nounwind readonly
declare double @sqrt(double) local_unnamed_addr #2

; Function Attrs: nounwind readnone speculatable willreturn
declare double @llvm.sin.f64(double) #1

; Function Attrs: nounwind readnone speculatable willreturn
declare double @llvm.cos.f64(double) #1

define i8* @"_ZN7cpython8__main__7foo$241Ex"(i8* nocapture readnone %py_closure, i8* %py_args, i8* nocapture readnone %py_kws) local_unnamed_addr {
entry:
  %.5 = alloca i8*, align 8
  %.6 = call i32 (i8*, i8*, i64, i64, ...) @PyArg_UnpackTuple(i8* %py_args, i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.const.foo, i64 0, i64 0), i64 1, i64 1, i8** nonnull %.5)
  %.7 = icmp eq i32 %.6, 0
  %.36 = alloca double, align 8
  store double 0.000000e+00, double* %.36, align 8
  br i1 %.7, label %entry.if, label %entry.endif, !prof !26

entry.if:                                         ; preds = %entry.endif.endif.endif, %entry
  ret i8* null

entry.endif:                                      ; preds = %entry
  %.11 = load i8*, i8** @"_ZN08NumbaEnv8__main__7foo$241Ex", align 8
  %.16 = icmp eq i8* %.11, null
  br i1 %.16, label %entry.endif.if, label %entry.endif.endif, !prof !26

entry.endif.if:                                   ; preds = %entry.endif
  call void @PyErr_SetString(i8* nonnull @PyExc_RuntimeError, i8* getelementptr inbounds ([54 x i8], [54 x i8]* @".const.missing Environment: _ZN08NumbaEnv8__main__7foo$241Ex", i64 0, i64 0))
  ret i8* null

entry.endif.endif:                                ; preds = %entry.endif
  %.20 = load i8*, i8** %.5, align 8
  %.23 = call i8* @PyNumber_Long(i8* %.20)
  %.24.not = icmp eq i8* %.23, null
  br i1 %.24.not, label %entry.endif.endif.endif, label %entry.endif.endif.if, !prof !26

entry.endif.endif.if:                             ; preds = %entry.endif.endif
  %.26 = call i64 @PyLong_AsLongLong(i8* nonnull %.23)
  call void @Py_DecRef(i8* nonnull %.23)
  br label %entry.endif.endif.endif

entry.endif.endif.endif:                          ; preds = %entry.endif.endif, %entry.endif.endif.if
  %.21.0 = phi i64 [ %.26, %entry.endif.endif.if ], [ 0, %entry.endif.endif ]
  %.31 = call i8* @PyErr_Occurred()
  %.32.not = icmp eq i8* %.31, null
  br i1 %.32.not, label %entry.endif.endif.endif.endif, label %entry.if, !prof !27

entry.endif.endif.endif.endif:                    ; preds = %entry.endif.endif.endif
  store double 0.000000e+00, double* %.36, align 8
  %.40 = call i32 @"_ZN8__main__7foo$241Ex"(double* nonnull %.36, { i8*, i32, i8* }** undef, i64 %.21.0) #5
  %.50 = load double, double* %.36, align 8
  %.55 = call i8* @PyFloat_FromDouble(double %.50)
  ret i8* %.55
}

declare i32 @PyArg_UnpackTuple(i8*, i8*, i64, i64, ...) local_unnamed_addr

declare void @PyErr_SetString(i8*, i8*) local_unnamed_addr

declare i8* @PyNumber_Long(i8*) local_unnamed_addr

declare i64 @PyLong_AsLongLong(i8*) local_unnamed_addr

declare void @Py_DecRef(i8*) local_unnamed_addr

declare i8* @PyErr_Occurred() local_unnamed_addr

declare i8* @PyFloat_FromDouble(double) local_unnamed_addr

; Function Attrs: nofree nounwind
define double @"cfunc._ZN8__main__7foo$241Ex"(i64 %.1) local_unnamed_addr #3 {
entry:
  %.3 = alloca double, align 8
  store double 0.000000e+00, double* %.3, align 8
  %.7 = call i32 @"_ZN8__main__7foo$241Ex"(double* nonnull %.3, { i8*, i32, i8* }** undef, i64 %.1) #5
  %.17 = load double, double* %.3, align 8
  ret double %.17
}

; Function Attrs: nounwind
declare void @llvm.stackprotector(i8*, i8**) #4

attributes #0 = { nofree noinline nounwind }
attributes #1 = { nounwind readnone speculatable willreturn }
attributes #2 = { nofree nounwind readonly }
attributes #3 = { nofree nounwind }
attributes #4 = { nounwind }
attributes #5 = { noinline }

!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!2, !3}

!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !1, producer: "Numba", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug)
!1 = !DIFile(filename: "<elided>.py", directory: "<elided>")
!2 = !{i32 2, !"Dwarf Version", i32 4}
!3 = !{i32 2, !"Debug Info Version", i32 3}
!4 = distinct !DISubprogram(name: "foo", linkageName: "_ZN8__main__7foo$241Ex", scope: !1, file: !1, line: 7, type: !5, scopeLine: 7, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0)
!5 = !DISubroutineType(types: !6)
!6 = !{}
!7 = !DILocalVariable(name: "n", scope: !4, file: !1, line: 7, type: !8)
!8 = !DIBasicType(name: "i64", size: 64, encoding: DW_ATE_unsigned)
!9 = !DILocation(line: 0, scope: !4)
!10 = !DILocalVariable(name: "acc", scope: !4, file: !1, line: 9, type: !11)
!11 = !DIBasicType(name: "double", size: 64, encoding: DW_ATE_float)
!12 = !DILocalVariable(name: "acc$3", scope: !4, file: !1, line: 9, type: !11)
!13 = !DILocalVariable(name: "acc2", scope: !4, file: !1, line: 10, type: !11)
!14 = !DILocalVariable(name: "i", scope: !4, file: !1, line: 12, type: !8)
!15 = !DILocalVariable(name: "acc$1", scope: !4, file: !1, line: 13, type: !11)
!16 = !DILocalVariable(name: "acc$4", scope: !4, file: !1, line: 12, type: !11)
!17 = !DILocalVariable(name: "i$1", scope: !4, file: !1, line: 15, type: !8)
!18 = !DILocalVariable(name: "acc$2", scope: !4, file: !1, line: 16, type: !11)
!19 = !DILocation(line: 12, column: 1, scope: !4)
!20 = !DILocation(line: 13, column: 1, scope: !4)
!21 = distinct !{!21, !22}
!22 = !{!"llvm.loop.unroll.disable"}
!23 = !DILocation(line: 15, column: 1, scope: !4)
!24 = !DILocation(line: 16, column: 1, scope: !4)
!25 = !DILocation(line: 18, column: 1, scope: !4)
!26 = !{!"branch_weights", i32 1, i32 99}
!27 = !{!"branch_weights", i32 99, i32 1}

In the above, the LV debug output has e.g. LV: Found a loop: for.end. If you look in the LLVM IR there are labels like for.end.preheader: and for.end.epil:; these are derived from the original loop label for.end. If you then look at the metadata markers on the instructions in the associated blocks you’ll see !19, !20 and !21. Referring to the metadata section at the bottom of the LLVM IR, !19 and !20 are DILocations and show that this loop is from Python source lines 12 and 13 (!21 is loop metadata carrying llvm.loop.unroll.disable, not a source location).

!19 = !DILocation(line: 12, column: 1, scope: !4)
!20 = !DILocation(line: 13, column: 1, scope: !4)

which correspond to:

    for i in range(n):
        acc += np.sqrt(i) * np.sin(i)

Presenting this sort of thing in an easy-to-use manner is something that we hope to get working one day!
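
In the meantime, the manual lookup described above can be roughly automated with a few lines of Python. This is only a sketch: source_lines_for_loop is a hypothetical helper (it is not part of Numba or llvmlite), it uses plain regular expressions on the IR text rather than a real IR parser, and sample_ir is a hand-abbreviated excerpt modelled on the dump above (the for.end.preheader branch line is made up for illustration).

```python
import re

# Metadata definitions such as:
#   !19 = !DILocation(line: 12, column: 1, scope: !4)
_DILOCATION = re.compile(r"^!(\d+) = !DILocation\(line: (\d+)", re.MULTILINE)
# A basic-block label at the start of a line, e.g. "for.end:"
_LABEL = re.compile(r"^([A-Za-z0-9_.$-]+):")
# A "!dbg !N" attachment on an instruction
_DBG = re.compile(r"!dbg !(\d+)")


def source_lines_for_loop(llvm_ir, loop_label):
    """Hypothetical helper: collect the source lines referenced by !dbg
    markers in every block named loop_label or loop_label.<suffix>."""
    locations = {int(i): int(line) for i, line in _DILOCATION.findall(llvm_ir)}
    lines = set()
    in_loop_block = False
    for text in llvm_ir.splitlines():
        label = _LABEL.match(text)
        if label:
            name = label.group(1)
            # for.end.preheader, for.end.epil, ... derive from for.end
            in_loop_block = name == loop_label or name.startswith(loop_label + ".")
            continue
        if text.startswith("}"):
            # End of the function body
            in_loop_block = False
            continue
        if in_loop_block:
            for dbg_id in _DBG.findall(text):
                line = locations.get(int(dbg_id))
                if line:  # line 0 carries no specific source line
                    lines.add(line)
    return sorted(lines)


# Abbreviated, partly made-up excerpt in the shape of the dump above.
sample_ir = '''\
for.end.preheader:
  br label %for.end, !dbg !19

for.end:
  %.201.le = sitofp i64 %.57.028 to double, !dbg !20
  call void @llvm.dbg.value(metadata i64 0, metadata !14, metadata !DIExpression()), !dbg !9
  br i1 %niter43.ncmp.3, label %B48.unr-lcssa, label %for.end, !dbg !19

B48:
  ret i32 0, !dbg !25
}

!9 = !DILocation(line: 0, scope: !4)
!19 = !DILocation(line: 12, column: 1, scope: !4)
!20 = !DILocation(line: 13, column: 1, scope: !4)
!25 = !DILocation(line: 18, column: 1, scope: !4)
'''

print(source_lines_for_loop(sample_ir, "for.end"))  # [12, 13]
```

On a real function the same helper should, I believe, work on the string returned by foo.inspect_llvm(foo.signatures[0]), provided the function was compiled with debug info so that the !dbg markers are present. Matching on the label prefix is a heuristic: it relies on LLVM naming the derived blocks after the original loop label, as it does in the dump above.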

Hope this helps?!