Yes, but it’s a bit involved…
Setting debug=True in the @jit/@njit decorator makes Numba emit DWARF debug info, which links the Python source to the generated LLVM IR through DILocation entries.
Example:
from llvmlite import binding as llvm
# Ask LLVM to print the loop vectorizer's debug output.
llvm.set_option('', '--debug-only=loop-vectorize')

from numba import njit
import numpy as np

@njit(fastmath=True, debug=True)
def foo(n):
    acc = 0.0
    acc2 = 0.0
    for i in range(n):
        acc += np.sqrt(i) * np.sin(i)
    for i in range(n - 1):
        acc -= np.sqrt(i + 2) * np.cos(i)
    return acc + acc2

# Call once so the function is compiled for an integer argument.
foo(1)
print(foo.inspect_llvm(foo.signatures[0]))
This produces two sets of output:
- the loop vectorize debug info (LV debug).
LV: Checking a loop in "_ZN8__main__7foo$241Ex" from <elided>.py:12:1
LV: Loop hints: force=? width=0 unroll=0
LV: Found a loop: for.end
LV: Found an induction variable.
LV: We can vectorize this loop!
LV: Found trip count: 0
LV: The Smallest and Widest types: 64 / 64 bits.
LV: The Widest register safe to use is: 128 bits.
LV: Found uniform instruction: %exitcond33.not = icmp eq i64 %.136, %arg.n, !dbg !19
LV: Scalarizing: %.201.le = sitofp i64 %.57.028 to double, !dbg !20
LV: Scalarizing: %.244 = fmul fast double %.202.le, %.230.le, !dbg !20
LV: Scalarizing: %.254 = fadd fast double %.244, %acc.3.030, !dbg !20
LV: Scalarizing: %.202.le = tail call fast double @sqrt(double %.201.le), !dbg !20
digraph VPlan {
graph [labelloc=t, fontsize=30; label="Vectorization Plan\nInitial VPlan for VF=\{1\},UF\>=1"]
node [shape=rect, fontname=Courier, fontsize=30]
edge [fontname=Courier, fontsize=30]
compound=true
N0 [label =
"for.end:\n" +
"WIDEN-PHI %acc.3.030 = phi %.254, 0.000000e+00\l" +
"WIDEN-INDUCTION %.57.028 = phi %.136, 0\l" +
"CLONE %.201.le = sitofp %.57.028\l" +
"WIDEN-CALL %.202.le = call %.201.le, @sqrt\l" +
"WIDEN-CALL %.230.le = call %.201.le, @llvm.sin.f64\l" +
"CLONE %.244 = fmul %.202.le, %.230.le\l" +
"CLONE %.254 = fadd %.244, %acc.3.030\l"
]
}
digraph VPlan {
graph [labelloc=t, fontsize=30; label="Vectorization Plan\nInitial VPlan for VF=\{2\},UF\>=1"]
node [shape=rect, fontname=Courier, fontsize=30]
edge [fontname=Courier, fontsize=30]
compound=true
N0 [label =
"for.end:\n" +
"WIDEN-PHI %acc.3.030 = phi %.254, 0.000000e+00\l" +
"WIDEN-INDUCTION %.57.028 = phi %.136, 0\l" +
"WIDEN\l"" %.201.le = sitofp %.57.028\l" +
"REPLICATE %.202.le = call %.201.le, @sqrt\l" +
"WIDEN-CALL %.230.le = call %.201.le, @llvm.sin.f64\l" +
"WIDEN\l"" %.244 = fmul %.202.le, %.230.le\l" +
"WIDEN\l"" %.254 = fadd %.244, %acc.3.030\l"
]
}
LV: Found an estimated cost of 0 for VF 1 For instruction: %acc.3.030 = phi double [ %.254, %for.end ], [ 0.000000e+00, %for.end.preheader ]
LV: Found an estimated cost of 0 for VF 1 For instruction: %.57.028 = phi i64 [ %.136, %for.end ], [ 0, %for.end.preheader ]
LV: Found an estimated cost of 1 for VF 1 For instruction: %.136 = add nuw nsw i64 %.57.028, 1, !dbg !19
LV: Found an estimated cost of 1 for VF 1 For instruction: %.201.le = sitofp i64 %.57.028 to double, !dbg !20
LV: Found an estimated cost of 10 for VF 1 For instruction: %.202.le = tail call fast double @sqrt(double %.201.le), !dbg !20
LV: Found an estimated cost of 10 for VF 1 For instruction: %.230.le = tail call fast double @llvm.sin.f64(double %.201.le), !dbg !20
LV: Found an estimated cost of 2 for VF 1 For instruction: %.244 = fmul fast double %.202.le, %.230.le, !dbg !20
LV: Found an estimated cost of 2 for VF 1 For instruction: %.254 = fadd fast double %.244, %acc.3.030, !dbg !20
LV: Found an estimated cost of 1 for VF 1 For instruction: %exitcond33.not = icmp eq i64 %.136, %arg.n, !dbg !19
LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %exitcond33.not, label %B48, label %for.end, !dbg !19
LV: Scalar loop costs: 27.
LV: Found an estimated cost of 0 for VF 2 For instruction: %acc.3.030 = phi double [ %.254, %for.end ], [ 0.000000e+00, %for.end.preheader ]
LV: Found an estimated cost of 0 for VF 2 For instruction: %.57.028 = phi i64 [ %.136, %for.end ], [ 0, %for.end.preheader ]
LV: Found an estimated cost of 1 for VF 2 For instruction: %.136 = add nuw nsw i64 %.57.028, 1, !dbg !19
LV: Found an estimated cost of 20 for VF 2 For instruction: %.201.le = sitofp i64 %.57.028 to double, !dbg !20
LV: Found an estimated cost of 22 for VF 2 For instruction: %.202.le = tail call fast double @sqrt(double %.201.le), !dbg !20
LV: Found an estimated cost of 22 for VF 2 For instruction: %.230.le = tail call fast double @llvm.sin.f64(double %.201.le), !dbg !20
LV: Found an estimated cost of 2 for VF 2 For instruction: %.244 = fmul fast double %.202.le, %.230.le, !dbg !20
LV: Found an estimated cost of 2 for VF 2 For instruction: %.254 = fadd fast double %.244, %acc.3.030, !dbg !20
LV: Found an estimated cost of 1 for VF 2 For instruction: %exitcond33.not = icmp eq i64 %.136, %arg.n, !dbg !19
LV: Found an estimated cost of 0 for VF 2 For instruction: br i1 %exitcond33.not, label %B48, label %for.end, !dbg !19
LV: Vector loop of width 2 costs: 35.
LV: Selecting VF: 1.
LV(REG): Calculating max register usage:
LV(REG): At #0 Interval # 0
LV(REG): At #1 Interval # 1
LV(REG): At #2 Interval # 2
LV(REG): At #3 Interval # 3
LV(REG): At #4 Interval # 3
LV(REG): At #5 Interval # 4
LV(REG): At #6 Interval # 4
LV(REG): At #7 Interval # 3
LV(REG): At #8 Interval # 2
LV(REG): VF = 1
LV(REG): Found max usage: 1 item
LV(REG): RegisterClass: Generic::ScalarRC, 4 registers
LV(REG): Found invariant usage: 0 item
LV: The target has 16 registers of Generic::ScalarRC register class
LV: Loop cost is 27
LV: Not Interleaving.
LV: Vectorization is possible but not beneficial.
LV: Interleaving is not beneficial.
LV: Checking a loop in "_ZN8__main__7foo$241Ex" from <elided>.py:15:1
LV: Loop hints: force=? width=0 unroll=0
LV: Found a loop: B62
LV: Found an induction variable.
LV: We can vectorize this loop!
LV: Found trip count: 0
LV: The Smallest and Widest types: 64 / 64 bits.
LV: The Widest register safe to use is: 128 bits.
LV: Found uniform instruction: %exitcond.not = icmp eq i64 %.411, %spec.select23, !dbg !21
LV: Scalarizing: %.473 = add nuw nsw i64 %.334.025, 2, !dbg !22
LV: Scalarizing: %.487.le = sitofp i64 %.473 to double, !dbg !22
LV: Scalarizing: %.517.le = sitofp i64 %.334.025 to double, !dbg !22
LV: Scalarizing: %.532 = fmul fast double %.488.le, %.518.le, !dbg !22
LV: Scalarizing: %.542 = fsub fast double %acc.4.027, %.532, !dbg !22
LV: Scalarizing: %.488.le = tail call fast double @sqrt(double %.487.le), !dbg !22
digraph VPlan {
graph [labelloc=t, fontsize=30; label="Vectorization Plan\nInitial VPlan for VF=\{1\},UF\>=1"]
node [shape=rect, fontname=Courier, fontsize=30]
edge [fontname=Courier, fontsize=30]
compound=true
N0 [label =
"B62:\n" +
"WIDEN-PHI %acc.4.027 = phi %.542, %.254.lcssa\l" +
"WIDEN-INDUCTION %.334.025 = phi %.411, 0\l" +
"CLONE %.473 = add %.334.025, 2\l" +
"CLONE %.487.le = sitofp %.473\l" +
"WIDEN-CALL %.488.le = call %.487.le, @sqrt\l" +
"CLONE %.517.le = sitofp %.334.025\l" +
"WIDEN-CALL %.518.le = call %.517.le, @llvm.cos.f64\l" +
"CLONE %.532 = fmul %.488.le, %.518.le\l" +
"CLONE %.542 = fsub %acc.4.027, %.532\l"
]
}
digraph VPlan {
graph [labelloc=t, fontsize=30; label="Vectorization Plan\nInitial VPlan for VF=\{2\},UF\>=1"]
node [shape=rect, fontname=Courier, fontsize=30]
edge [fontname=Courier, fontsize=30]
compound=true
N0 [label =
"B62:\n" +
"WIDEN-PHI %acc.4.027 = phi %.542, %.254.lcssa\l" +
"WIDEN-INDUCTION %.334.025 = phi %.411, 0\l" +
"WIDEN\l"" %.473 = add %.334.025, 2\l" +
"WIDEN\l"" %.487.le = sitofp %.473\l" +
"REPLICATE %.488.le = call %.487.le, @sqrt\l" +
"WIDEN\l"" %.517.le = sitofp %.334.025\l" +
"WIDEN-CALL %.518.le = call %.517.le, @llvm.cos.f64\l" +
"WIDEN\l"" %.532 = fmul %.488.le, %.518.le\l" +
"WIDEN\l"" %.542 = fsub %acc.4.027, %.532\l"
]
}
LV: Found an estimated cost of 0 for VF 1 For instruction: %acc.4.027 = phi double [ %.542, %B62 ], [ %.254.lcssa, %B62.preheader ]
LV: Found an estimated cost of 0 for VF 1 For instruction: %.334.025 = phi i64 [ %.411, %B62 ], [ 0, %B62.preheader ]
LV: Found an estimated cost of 1 for VF 1 For instruction: %.411 = add nuw nsw i64 %.334.025, 1, !dbg !21
LV: Found an estimated cost of 1 for VF 1 For instruction: %.473 = add nuw nsw i64 %.334.025, 2, !dbg !22
LV: Found an estimated cost of 1 for VF 1 For instruction: %.487.le = sitofp i64 %.473 to double, !dbg !22
LV: Found an estimated cost of 10 for VF 1 For instruction: %.488.le = tail call fast double @sqrt(double %.487.le), !dbg !22
LV: Found an estimated cost of 1 for VF 1 For instruction: %.517.le = sitofp i64 %.334.025 to double, !dbg !22
LV: Found an estimated cost of 10 for VF 1 For instruction: %.518.le = tail call fast double @llvm.cos.f64(double %.517.le), !dbg !22
LV: Found an estimated cost of 2 for VF 1 For instruction: %.532 = fmul fast double %.488.le, %.518.le, !dbg !22
LV: Found an estimated cost of 2 for VF 1 For instruction: %.542 = fsub fast double %acc.4.027, %.532, !dbg !22
LV: Found an estimated cost of 1 for VF 1 For instruction: %exitcond.not = icmp eq i64 %.411, %spec.select23, !dbg !21
LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %exitcond.not, label %B94.loopexit, label %B62, !dbg !21
LV: Scalar loop costs: 29.
LV: Found an estimated cost of 0 for VF 2 For instruction: %acc.4.027 = phi double [ %.542, %B62 ], [ %.254.lcssa, %B62.preheader ]
LV: Found an estimated cost of 0 for VF 2 For instruction: %.334.025 = phi i64 [ %.411, %B62 ], [ 0, %B62.preheader ]
LV: Found an estimated cost of 1 for VF 2 For instruction: %.411 = add nuw nsw i64 %.334.025, 1, !dbg !21
LV: Found an estimated cost of 1 for VF 2 For instruction: %.473 = add nuw nsw i64 %.334.025, 2, !dbg !22
LV: Found an estimated cost of 20 for VF 2 For instruction: %.487.le = sitofp i64 %.473 to double, !dbg !22
LV: Found an estimated cost of 22 for VF 2 For instruction: %.488.le = tail call fast double @sqrt(double %.487.le), !dbg !22
LV: Found an estimated cost of 20 for VF 2 For instruction: %.517.le = sitofp i64 %.334.025 to double, !dbg !22
LV: Found an estimated cost of 22 for VF 2 For instruction: %.518.le = tail call fast double @llvm.cos.f64(double %.517.le), !dbg !22
LV: Found an estimated cost of 2 for VF 2 For instruction: %.532 = fmul fast double %.488.le, %.518.le, !dbg !22
LV: Found an estimated cost of 2 for VF 2 For instruction: %.542 = fsub fast double %acc.4.027, %.532, !dbg !22
LV: Found an estimated cost of 1 for VF 2 For instruction: %exitcond.not = icmp eq i64 %.411, %spec.select23, !dbg !21
LV: Found an estimated cost of 0 for VF 2 For instruction: br i1 %exitcond.not, label %B94.loopexit, label %B62, !dbg !21
LV: Vector loop of width 2 costs: 45.
LV: Selecting VF: 1.
LV(REG): Calculating max register usage:
LV(REG): At #0 Interval # 0
LV(REG): At #1 Interval # 1
LV(REG): At #2 Interval # 2
LV(REG): At #3 Interval # 3
LV(REG): At #4 Interval # 4
LV(REG): At #5 Interval # 4
LV(REG): At #6 Interval # 4
LV(REG): At #7 Interval # 4
LV(REG): At #8 Interval # 4
LV(REG): At #9 Interval # 3
LV(REG): At #10 Interval # 2
LV(REG): VF = 1
LV(REG): Found max usage: 1 item
LV(REG): RegisterClass: Generic::ScalarRC, 4 registers
LV(REG): Found invariant usage: 1 item
LV(REG): RegisterClass: Generic::ScalarRC, 2 registers
LV: The target has 16 registers of Generic::ScalarRC register class
LV: Loop cost is 29
LV: Not Interleaving.
LV: Vectorization is possible but not beneficial.
LV: Interleaving is not beneficial.
- the LLVM IR from the module that Numba generated.
; ModuleID = 'foo'
source_filename = "<string>"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
@"_ZN08NumbaEnv8__main__7foo$241Ex" = common local_unnamed_addr global i8* null
@.const.foo = internal constant [4 x i8] c"foo\00"
@PyExc_RuntimeError = external global i8
@".const.missing Environment: _ZN08NumbaEnv8__main__7foo$241Ex" = internal constant [54 x i8] c"missing Environment: _ZN08NumbaEnv8__main__7foo$241Ex\00"
; Function Attrs: nofree noinline nounwind
define i32 @"_ZN8__main__7foo$241Ex"(double* noalias nocapture %retptr, { i8*, i32, i8* }** noalias nocapture readnone %excinfo, i64 %arg.n) local_unnamed_addr #0 !dbg !4 {
entry:
call void @llvm.dbg.value(metadata i64 0, metadata !7, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !10, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !12, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !13, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata i64 0, metadata !14, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !15, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !16, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata i64 0, metadata !17, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !18, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata i64 %arg.n, metadata !7, metadata !DIExpression()), !dbg !9
%.71.inv = icmp sgt i64 %arg.n, 0
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !12, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !15, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata i64 0, metadata !14, metadata !DIExpression()), !dbg !9
br i1 %.71.inv, label %for.end.preheader, label %B94, !dbg !19
for.end.preheader: ; preds = %entry
%0 = add i64 %arg.n, -1, !dbg !19
%xtraiter39 = and i64 %arg.n, 3, !dbg !19
%1 = icmp ult i64 %0, 3, !dbg !19
br i1 %1, label %B48.unr-lcssa, label %for.end.preheader.new, !dbg !19
for.end.preheader.new: ; preds = %for.end.preheader
%unroll_iter42 = and i64 %arg.n, -4, !dbg !19
br label %for.end, !dbg !19
B48.unr-lcssa: ; preds = %for.end, %for.end.preheader
%.254.lcssa.ph = phi double [ undef, %for.end.preheader ], [ %.254.3, %for.end ]
%acc.3.030.unr = phi double [ 0.000000e+00, %for.end.preheader ], [ %.254.3, %for.end ]
%.57.028.unr = phi i64 [ 0, %for.end.preheader ], [ %.136.3, %for.end ]
%lcmp.mod40.not = icmp eq i64 %xtraiter39, 0, !dbg !19
br i1 %lcmp.mod40.not, label %B48, label %for.end.epil.preheader, !dbg !19
for.end.epil.preheader: ; preds = %B48.unr-lcssa
br label %for.end.epil, !dbg !19
for.end.epil: ; preds = %for.end.epil.preheader, %for.end.epil
%acc.3.030.epil = phi double [ %.254.epil, %for.end.epil ], [ %acc.3.030.unr, %for.end.epil.preheader ]
%.57.028.epil = phi i64 [ %.136.epil, %for.end.epil ], [ %.57.028.unr, %for.end.epil.preheader ]
%epil.iter = phi i64 [ %epil.iter.sub, %for.end.epil ], [ %xtraiter39, %for.end.epil.preheader ]
call void @llvm.dbg.value(metadata double %acc.3.030.epil, metadata !12, metadata !DIExpression()), !dbg !9
%.136.epil = add nuw nsw i64 %.57.028.epil, 1, !dbg !19
%.201.le.epil = sitofp i64 %.57.028.epil to double, !dbg !20
%.202.le.epil = tail call fast double @sqrt(double %.201.le.epil), !dbg !20
%.230.le.epil = tail call fast double @llvm.sin.f64(double %.201.le.epil), !dbg !20
call void @llvm.dbg.value(metadata i64 0, metadata !14, metadata !DIExpression()), !dbg !9
%.244.epil = fmul fast double %.202.le.epil, %.230.le.epil, !dbg !20
%.254.epil = fadd fast double %.244.epil, %acc.3.030.epil, !dbg !20
call void @llvm.dbg.value(metadata double %.254.epil, metadata !12, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !15, metadata !DIExpression()), !dbg !9
%epil.iter.sub = add i64 %epil.iter, -1, !dbg !19
%epil.iter.cmp.not = icmp eq i64 %epil.iter.sub, 0, !dbg !19
br i1 %epil.iter.cmp.not, label %B48, label %for.end.epil, !dbg !19, !llvm.loop !21
B48: ; preds = %for.end.epil, %B48.unr-lcssa
%.254.lcssa = phi double [ %.254.lcssa.ph, %B48.unr-lcssa ], [ %.254.epil, %for.end.epil ], !dbg !20
call void @llvm.dbg.value(metadata double %.254.lcssa, metadata !16, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !12, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata i64 0, metadata !7, metadata !DIExpression()), !dbg !9
%.348 = icmp slt i64 %arg.n, 2, !dbg !23
%.294 = add nsw i64 %arg.n, -1, !dbg !23
%spec.select23 = select i1 %.348, i64 0, i64 %.294
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !18, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata i64 0, metadata !17, metadata !DIExpression()), !dbg !9
%.39824 = icmp sgt i64 %spec.select23, 0, !dbg !23
br i1 %.39824, label %B62.preheader, label %B94, !dbg !23
B62.preheader: ; preds = %B48
%2 = icmp eq i64 %spec.select23, 1, !dbg !23
br i1 %2, label %B94.loopexit.unr-lcssa, label %B62.preheader.new, !dbg !23
B62.preheader.new: ; preds = %B62.preheader
%unroll_iter = and i64 %spec.select23, -2, !dbg !23
br label %B62, !dbg !23
B62: ; preds = %B62, %B62.preheader.new
%acc.4.027 = phi double [ %.254.lcssa, %B62.preheader.new ], [ %.542.1, %B62 ]
%.334.025 = phi i64 [ 0, %B62.preheader.new ], [ %.411.1, %B62 ]
call void @llvm.dbg.value(metadata double %acc.4.027, metadata !16, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata i64 0, metadata !17, metadata !DIExpression()), !dbg !9
%3 = add i64 %.334.025, 1, !dbg !24
%4 = add i64 %.334.025, 2, !dbg !24
%.487.le = sitofp i64 %4 to double, !dbg !24
%.488.le = tail call fast double @sqrt(double %.487.le), !dbg !24
%.517.le = sitofp i64 %.334.025 to double, !dbg !24
%.518.le = tail call fast double @llvm.cos.f64(double %.517.le), !dbg !24
%.532 = fmul fast double %.488.le, %.518.le, !dbg !24
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !18, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata double undef, metadata !16, metadata !DIExpression()), !dbg !9
%.411.1 = add nuw nsw i64 %.334.025, 2, !dbg !23
%5 = add i64 %.334.025, 3, !dbg !24
%.487.le.1 = sitofp i64 %5 to double, !dbg !24
%.488.le.1 = tail call fast double @sqrt(double %.487.le.1), !dbg !24
%.517.le.1 = sitofp i64 %3 to double, !dbg !24
%.518.le.1 = tail call fast double @llvm.cos.f64(double %.517.le.1), !dbg !24
%.532.1 = fmul fast double %.488.le.1, %.518.le.1, !dbg !24
%6 = fadd fast double %.532, %.532.1, !dbg !24
%.542.1 = fsub fast double %acc.4.027, %6, !dbg !24
call void @llvm.dbg.value(metadata double %.542.1, metadata !16, metadata !DIExpression()), !dbg !9
%niter.ncmp.1 = icmp eq i64 %unroll_iter, %.411.1, !dbg !23
br i1 %niter.ncmp.1, label %B94.loopexit.unr-lcssa, label %B62, !dbg !23
B94.loopexit.unr-lcssa: ; preds = %B62, %B62.preheader
%.542.lcssa.ph = phi double [ undef, %B62.preheader ], [ %.542.1, %B62 ]
%acc.4.027.unr = phi double [ %.254.lcssa, %B62.preheader ], [ %.542.1, %B62 ]
%.334.025.unr = phi i64 [ 0, %B62.preheader ], [ %.411.1, %B62 ]
%7 = and i64 %spec.select23, 1, !dbg !23
%lcmp.mod.not = icmp eq i64 %7, 0, !dbg !23
br i1 %lcmp.mod.not, label %B94, label %B94.loopexit.epilog-lcssa, !dbg !23
B94.loopexit.epilog-lcssa: ; preds = %B94.loopexit.unr-lcssa
call void @llvm.dbg.value(metadata double %acc.4.027.unr, metadata !16, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata i64 0, metadata !17, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !18, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata double undef, metadata !16, metadata !DIExpression()), !dbg !9
%.473.epil = add nuw nsw i64 %.334.025.unr, 2, !dbg !24
%.487.le.epil = sitofp i64 %.473.epil to double, !dbg !24
%.488.le.epil = tail call fast double @sqrt(double %.487.le.epil), !dbg !24
%.517.le.epil = sitofp i64 %.334.025.unr to double, !dbg !24
%.518.le.epil = tail call fast double @llvm.cos.f64(double %.517.le.epil), !dbg !24
%.532.epil = fmul fast double %.488.le.epil, %.518.le.epil, !dbg !24
%.542.epil = fsub fast double %acc.4.027.unr, %.532.epil, !dbg !24
call void @llvm.dbg.value(metadata double %.542.epil, metadata !16, metadata !DIExpression()), !dbg !9
br label %B94, !dbg !25
B94: ; preds = %B94.loopexit.epilog-lcssa, %B94.loopexit.unr-lcssa, %entry, %B48
%acc.4.0.lcssa = phi double [ %.254.lcssa, %B48 ], [ 0.000000e+00, %entry ], [ %.542.lcssa.ph, %B94.loopexit.unr-lcssa ], [ %.542.epil, %B94.loopexit.epilog-lcssa ], !dbg !9
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !13, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !16, metadata !DIExpression()), !dbg !9
store double %acc.4.0.lcssa, double* %retptr, align 8, !dbg !25
ret i32 0, !dbg !25
for.end: ; preds = %for.end, %for.end.preheader.new
%acc.3.030 = phi double [ 0.000000e+00, %for.end.preheader.new ], [ %.254.3, %for.end ]
%.57.028 = phi i64 [ 0, %for.end.preheader.new ], [ %11, %for.end ]
call void @llvm.dbg.value(metadata double %acc.3.030, metadata !12, metadata !DIExpression()), !dbg !9
%.201.le = sitofp i64 %.57.028 to double, !dbg !20
%.202.le = tail call fast double @sqrt(double %.201.le), !dbg !20
%.230.le = tail call fast double @llvm.sin.f64(double %.201.le), !dbg !20
call void @llvm.dbg.value(metadata i64 0, metadata !14, metadata !DIExpression()), !dbg !9
%.244 = fmul fast double %.202.le, %.230.le, !dbg !20
%.254 = fadd fast double %.244, %acc.3.030, !dbg !20
call void @llvm.dbg.value(metadata double %.254, metadata !12, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !15, metadata !DIExpression()), !dbg !9
%8 = add i64 %.57.028, 1, !dbg !20
%.201.le.1 = sitofp i64 %8 to double, !dbg !20
%.202.le.1 = tail call fast double @sqrt(double %.201.le.1), !dbg !20
%.230.le.1 = tail call fast double @llvm.sin.f64(double %.201.le.1), !dbg !20
%.244.1 = fmul fast double %.202.le.1, %.230.le.1, !dbg !20
%.254.1 = fadd fast double %.244.1, %.254, !dbg !20
call void @llvm.dbg.value(metadata double %.254.1, metadata !12, metadata !DIExpression()), !dbg !9
%9 = add i64 %8, 1, !dbg !20
%.201.le.2 = sitofp i64 %9 to double, !dbg !20
%.202.le.2 = tail call fast double @sqrt(double %.201.le.2), !dbg !20
%.230.le.2 = tail call fast double @llvm.sin.f64(double %.201.le.2), !dbg !20
%.244.2 = fmul fast double %.202.le.2, %.230.le.2, !dbg !20
%.254.2 = fadd fast double %.244.2, %.254.1, !dbg !20
call void @llvm.dbg.value(metadata double %.254.2, metadata !12, metadata !DIExpression()), !dbg !9
%.136.3 = add nuw nsw i64 %.57.028, 4, !dbg !19
%10 = add i64 %9, 1, !dbg !20
%.201.le.3 = sitofp i64 %10 to double, !dbg !20
%.202.le.3 = tail call fast double @sqrt(double %.201.le.3), !dbg !20
%.230.le.3 = tail call fast double @llvm.sin.f64(double %.201.le.3), !dbg !20
%.244.3 = fmul fast double %.202.le.3, %.230.le.3, !dbg !20
%.254.3 = fadd fast double %.244.3, %.254.2, !dbg !20
call void @llvm.dbg.value(metadata double %.254.3, metadata !12, metadata !DIExpression()), !dbg !9
%niter43.ncmp.3 = icmp eq i64 %unroll_iter42, %.136.3, !dbg !19
%11 = add i64 %10, 1, !dbg !19
br i1 %niter43.ncmp.3, label %B48.unr-lcssa, label %for.end, !dbg !19
}
; Function Attrs: nounwind readnone speculatable willreturn
declare void @llvm.dbg.value(metadata, metadata, metadata) #1
; Function Attrs: nofree nounwind readonly
declare double @sqrt(double) local_unnamed_addr #2
; Function Attrs: nounwind readnone speculatable willreturn
declare double @llvm.sin.f64(double) #1
; Function Attrs: nounwind readnone speculatable willreturn
declare double @llvm.cos.f64(double) #1
define i8* @"_ZN7cpython8__main__7foo$241Ex"(i8* nocapture readnone %py_closure, i8* %py_args, i8* nocapture readnone %py_kws) local_unnamed_addr {
entry:
%.5 = alloca i8*, align 8
%.6 = call i32 (i8*, i8*, i64, i64, ...) @PyArg_UnpackTuple(i8* %py_args, i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.const.foo, i64 0, i64 0), i64 1, i64 1, i8** nonnull %.5)
%.7 = icmp eq i32 %.6, 0
%.36 = alloca double, align 8
store double 0.000000e+00, double* %.36, align 8
br i1 %.7, label %entry.if, label %entry.endif, !prof !26
entry.if: ; preds = %entry.endif.endif.endif, %entry
ret i8* null
entry.endif: ; preds = %entry
%.11 = load i8*, i8** @"_ZN08NumbaEnv8__main__7foo$241Ex", align 8
%.16 = icmp eq i8* %.11, null
br i1 %.16, label %entry.endif.if, label %entry.endif.endif, !prof !26
entry.endif.if: ; preds = %entry.endif
call void @PyErr_SetString(i8* nonnull @PyExc_RuntimeError, i8* getelementptr inbounds ([54 x i8], [54 x i8]* @".const.missing Environment: _ZN08NumbaEnv8__main__7foo$241Ex", i64 0, i64 0))
ret i8* null
entry.endif.endif: ; preds = %entry.endif
%.20 = load i8*, i8** %.5, align 8
%.23 = call i8* @PyNumber_Long(i8* %.20)
%.24.not = icmp eq i8* %.23, null
br i1 %.24.not, label %entry.endif.endif.endif, label %entry.endif.endif.if, !prof !26
entry.endif.endif.if: ; preds = %entry.endif.endif
%.26 = call i64 @PyLong_AsLongLong(i8* nonnull %.23)
call void @Py_DecRef(i8* nonnull %.23)
br label %entry.endif.endif.endif
entry.endif.endif.endif: ; preds = %entry.endif.endif, %entry.endif.endif.if
%.21.0 = phi i64 [ %.26, %entry.endif.endif.if ], [ 0, %entry.endif.endif ]
%.31 = call i8* @PyErr_Occurred()
%.32.not = icmp eq i8* %.31, null
br i1 %.32.not, label %entry.endif.endif.endif.endif, label %entry.if, !prof !27
entry.endif.endif.endif.endif: ; preds = %entry.endif.endif.endif
store double 0.000000e+00, double* %.36, align 8
%.40 = call i32 @"_ZN8__main__7foo$241Ex"(double* nonnull %.36, { i8*, i32, i8* }** undef, i64 %.21.0) #5
%.50 = load double, double* %.36, align 8
%.55 = call i8* @PyFloat_FromDouble(double %.50)
ret i8* %.55
}
declare i32 @PyArg_UnpackTuple(i8*, i8*, i64, i64, ...) local_unnamed_addr
declare void @PyErr_SetString(i8*, i8*) local_unnamed_addr
declare i8* @PyNumber_Long(i8*) local_unnamed_addr
declare i64 @PyLong_AsLongLong(i8*) local_unnamed_addr
declare void @Py_DecRef(i8*) local_unnamed_addr
declare i8* @PyErr_Occurred() local_unnamed_addr
declare i8* @PyFloat_FromDouble(double) local_unnamed_addr
; Function Attrs: nofree nounwind
define double @"cfunc._ZN8__main__7foo$241Ex"(i64 %.1) local_unnamed_addr #3 {
entry:
%.3 = alloca double, align 8
store double 0.000000e+00, double* %.3, align 8
%.7 = call i32 @"_ZN8__main__7foo$241Ex"(double* nonnull %.3, { i8*, i32, i8* }** undef, i64 %.1) #5
%.17 = load double, double* %.3, align 8
ret double %.17
}
; Function Attrs: nounwind
declare void @llvm.stackprotector(i8*, i8**) #4
attributes #0 = { nofree noinline nounwind }
attributes #1 = { nounwind readnone speculatable willreturn }
attributes #2 = { nofree nounwind readonly }
attributes #3 = { nofree nounwind }
attributes #4 = { nounwind }
attributes #5 = { noinline }
!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!2, !3}
!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !1, producer: "Numba", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug)
!1 = !DIFile(filename: "<elided>.py", directory: "<elided>")
!2 = !{i32 2, !"Dwarf Version", i32 4}
!3 = !{i32 2, !"Debug Info Version", i32 3}
!4 = distinct !DISubprogram(name: "foo", linkageName: "_ZN8__main__7foo$241Ex", scope: !1, file: !1, line: 7, type: !5, scopeLine: 7, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0)
!5 = !DISubroutineType(types: !6)
!6 = !{}
!7 = !DILocalVariable(name: "n", scope: !4, file: !1, line: 7, type: !8)
!8 = !DIBasicType(name: "i64", size: 64, encoding: DW_ATE_unsigned)
!9 = !DILocation(line: 0, scope: !4)
!10 = !DILocalVariable(name: "acc", scope: !4, file: !1, line: 9, type: !11)
!11 = !DIBasicType(name: "double", size: 64, encoding: DW_ATE_float)
!12 = !DILocalVariable(name: "acc$3", scope: !4, file: !1, line: 9, type: !11)
!13 = !DILocalVariable(name: "acc2", scope: !4, file: !1, line: 10, type: !11)
!14 = !DILocalVariable(name: "i", scope: !4, file: !1, line: 12, type: !8)
!15 = !DILocalVariable(name: "acc$1", scope: !4, file: !1, line: 13, type: !11)
!16 = !DILocalVariable(name: "acc$4", scope: !4, file: !1, line: 12, type: !11)
!17 = !DILocalVariable(name: "i$1", scope: !4, file: !1, line: 15, type: !8)
!18 = !DILocalVariable(name: "acc$2", scope: !4, file: !1, line: 16, type: !11)
!19 = !DILocation(line: 12, column: 1, scope: !4)
!20 = !DILocation(line: 13, column: 1, scope: !4)
!21 = distinct !{!21, !22}
!22 = !{!"llvm.loop.unroll.disable"}
!23 = !DILocation(line: 15, column: 1, scope: !4)
!24 = !DILocation(line: 16, column: 1, scope: !4)
!25 = !DILocation(line: 18, column: 1, scope: !4)
!26 = !{!"branch_weights", i32 1, i32 99}
!27 = !{!"branch_weights", i32 99, i32 1}
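The LV debug stream is long, so here is a rough sketch (not a Numba or LLVM feature) of condensing it to one record per loop. It assumes the log text has already been captured somehow, e.g. by redirecting the process's stderr to a file. The function summarise_lv_log and the regular expressions are hypothetical helpers written against the exact message wording shown above, which may differ between LLVM versions.

```python
import re

# A few lines copied from the LV debug output above. In practice this would
# be the full text captured from stderr (how it is captured is up to you).
LV_LOG = """\
LV: Checking a loop in "_ZN8__main__7foo$241Ex" from <elided>.py:12:1
LV: Found a loop: for.end
LV: Scalar loop costs: 27.
LV: Vector loop of width 2 costs: 35.
LV: Selecting VF: 1.
LV: Vectorization is possible but not beneficial.
LV: Checking a loop in "_ZN8__main__7foo$241Ex" from <elided>.py:15:1
LV: Found a loop: B62
LV: Scalar loop costs: 29.
LV: Vector loop of width 2 costs: 45.
LV: Selecting VF: 1.
LV: Vectorization is possible but not beneficial.
"""

# Patterns match the message wording seen in the output above (an assumption
# about the LLVM version in use).
CHECKING = re.compile(r'LV: Checking a loop in "[^"]+" from .+:(?P<line>\d+):\d+$')
FOUND = re.compile(r"LV: Found a loop: (?P<label>\S+)")
SCALAR = re.compile(r"LV: Scalar loop costs: (?P<cost>\d+)\.")
VECTOR = re.compile(r"LV: Vector loop of width (?P<vf>\d+) costs: (?P<cost>\d+)\.")
SELECT = re.compile(r"LV: Selecting VF: (?P<vf>\d+)\.")


def summarise_lv_log(text):
    """Return one dict per loop that the vectorizer reported on."""
    loops = []
    for raw in text.splitlines():
        line = raw.strip()
        m = CHECKING.match(line)
        if m:
            # A "Checking a loop" line starts the report for a new loop.
            loops.append({
                "line": int(m["line"]),   # Python source line of the loop
                "label": None,            # LLVM basic block label
                "scalar_cost": None,
                "vector_costs": {},       # vector width -> estimated cost
                "selected_vf": None,
            })
            continue
        if not loops:
            continue  # ignore anything before the first loop report
        cur = loops[-1]
        if (m := FOUND.match(line)):
            cur["label"] = m["label"]
        elif (m := SCALAR.match(line)):
            cur["scalar_cost"] = int(m["cost"])
        elif (m := VECTOR.match(line)):
            cur["vector_costs"][int(m["vf"])] = int(m["cost"])
        elif (m := SELECT.match(line)):
            cur["selected_vf"] = int(m["vf"])
    return loops


for loop in summarise_lv_log(LV_LOG):
    print(loop)
```

A selected VF of 1 means the loop was left scalar, which matches the "possible but not beneficial" verdict for both loops in the output above.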
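In the LV debug output above there are lines such as LV: Found a loop: for.end. In the LLVM IR there are labels like for.end.preheader: and for.end.epil:, which are derived from the original loop label for.end. The instructions in those blocks carry the markers !dbg !19 and !dbg !20, and the back-edge branch of the epilogue loop also carries !llvm.loop !21. In the metadata section at the bottom of the LLVM IR, !21 is loop metadata (it disables further unrolling) rather than a source location, whereas !19 and !20 are DILocations and show that this loop comes from Python source lines 12 and 13. Note that metadata numbers are assigned when the IR is printed, so the !dbg numbers in the LV output need not match the final IR: the second loop shows !21 and !22 in the LV output but !23 and !24 in the final IR.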
!19 = !DILocation(line: 12, column: 1, scope: !4)
!20 = !DILocation(line: 13, column: 1, scope: !4)
which correspond to:
for i in range(n):
    acc += np.sqrt(i) * np.sin(i)
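The manual lookup can be roughly automated by scanning the IR text. The following is only a sketch under assumptions: it works on the printed IR (e.g. the string returned by inspect_llvm), source_lines_by_block is a hypothetical helper, and block labels are keyed by name only, so identically named blocks in different functions (such as entry) would be merged.

```python
import re

# A small excerpt of the IR above, with indentation added for readability.
IR_SNIPPET = '''\
for.end.epil:
  call void @llvm.dbg.value(metadata i64 0, metadata !14, metadata !DIExpression()), !dbg !9
  %.136.epil = add nuw nsw i64 %.57.028.epil, 1, !dbg !19
  %.201.le.epil = sitofp i64 %.57.028.epil to double, !dbg !20
  br i1 %epil.iter.cmp.not, label %B48, label %for.end.epil, !dbg !19, !llvm.loop !21
B48:
  %.348 = icmp slt i64 %arg.n, 2, !dbg !23
!9 = !DILocation(line: 0, scope: !4)
!19 = !DILocation(line: 12, column: 1, scope: !4)
!20 = !DILocation(line: 13, column: 1, scope: !4)
!21 = distinct !{!21, !22}
!23 = !DILocation(line: 15, column: 1, scope: !4)
'''

DILOCATION = re.compile(r"^!(\d+) = !DILocation\(line: (\d+)", re.MULTILINE)
BLOCK_LABEL = re.compile(r"^([A-Za-z0-9_.$-]+):")
DBG_REF = re.compile(r"!dbg !(\d+)")


def source_lines_by_block(ir_text):
    """Map each basic block label to the sorted source lines it references."""
    # Metadata id -> source line, taken from the DILocation entries.
    locations = {int(i): int(line) for i, line in DILOCATION.findall(ir_text)}
    blocks = {}
    current = None
    for text_line in ir_text.splitlines():
        label = BLOCK_LABEL.match(text_line)
        if label:
            current = label.group(1)
            blocks.setdefault(current, set())
            continue
        if current is None:
            continue
        for ref in DBG_REF.findall(text_line):
            src = locations.get(int(ref))
            # Skip ids that are not DILocations and skip line 0, which the IR
            # above uses on the llvm.dbg.value calls rather than on code
            # attributable to a specific source line.
            if src:
                blocks[current].add(src)
    return {name: sorted(lines) for name, lines in blocks.items()}


print(source_lines_by_block(IR_SNIPPET))
```

Combined with the loop label from the LV output (for.end here), the result gives the Python source lines for the blocks derived from that loop.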
Presenting this sort of information in an easy-to-use manner is something that we hope to get working one day!
Hope this helps!