Yes, but it’s a bit involved…
Setting `debug=True` in the `@jit` decorators means that DWARF debug info is emitted, and this links the Python source to the generated LLVM IR through `DILocation` entries.
Example:
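```python
from llvmlite import binding as llvm

# Print the loop vectorizer's debug output. -debug-only is only available in
# LLVM builds with debug output enabled (e.g. assertion-enabled builds), so on
# a stock release llvmlite it may be rejected or simply print nothing.
llvm.set_option('', '--debug-only=loop-vectorize')

from numba import njit
import numpy as np

# debug=True emits DWARF info (DILocation metadata) tying the generated
# LLVM instructions back to lines of this Python source.
@njit(fastmath=True, debug=True)
def foo(n):
    acc = 0.0
    acc2 = 0.0
    for i in range(n):
        acc += np.sqrt(i) * np.sin(i)
    for i in range(n - 1):
        acc -= np.sqrt(i + 2) * np.cos(i)
    return acc + acc2

foo(1)  # compile for an int64 argument
print(foo.inspect_llvm(foo.signatures[0]))  # LLVM IR for that signature
```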
This produces two sets of output:
- the loop vectorizer debug output (LV debug):
LV: Checking a loop in "_ZN8__main__7foo$241Ex" from <elided>.py:12:1
LV: Loop hints: force=? width=0 unroll=0
LV: Found a loop: for.end
LV: Found an induction variable.
LV: We can vectorize this loop!
LV: Found trip count: 0
LV: The Smallest and Widest types: 64 / 64 bits.
LV: The Widest register safe to use is: 128 bits.
LV: Found uniform instruction: %exitcond33.not = icmp eq i64 %.136, %arg.n, !dbg !19
LV: Scalarizing: %.201.le = sitofp i64 %.57.028 to double, !dbg !20
LV: Scalarizing: %.244 = fmul fast double %.202.le, %.230.le, !dbg !20
LV: Scalarizing: %.254 = fadd fast double %.244, %acc.3.030, !dbg !20
LV: Scalarizing: %.202.le = tail call fast double @sqrt(double %.201.le), !dbg !20
digraph VPlan {
graph [labelloc=t, fontsize=30; label="Vectorization Plan\nInitial VPlan for VF=\{1\},UF\>=1"]
node [shape=rect, fontname=Courier, fontsize=30]
edge [fontname=Courier, fontsize=30]
compound=true
N0 [label =
"for.end:\n" +
"WIDEN-PHI %acc.3.030 = phi %.254, 0.000000e+00\l" +
"WIDEN-INDUCTION %.57.028 = phi %.136, 0\l" +
"CLONE %.201.le = sitofp %.57.028\l" +
"WIDEN-CALL %.202.le = call %.201.le, @sqrt\l" +
"WIDEN-CALL %.230.le = call %.201.le, @llvm.sin.f64\l" +
"CLONE %.244 = fmul %.202.le, %.230.le\l" +
"CLONE %.254 = fadd %.244, %acc.3.030\l"
]
}
digraph VPlan {
graph [labelloc=t, fontsize=30; label="Vectorization Plan\nInitial VPlan for VF=\{2\},UF\>=1"]
node [shape=rect, fontname=Courier, fontsize=30]
edge [fontname=Courier, fontsize=30]
compound=true
N0 [label =
"for.end:\n" +
"WIDEN-PHI %acc.3.030 = phi %.254, 0.000000e+00\l" +
"WIDEN-INDUCTION %.57.028 = phi %.136, 0\l" +
"WIDEN\l"" %.201.le = sitofp %.57.028\l" +
"REPLICATE %.202.le = call %.201.le, @sqrt\l" +
"WIDEN-CALL %.230.le = call %.201.le, @llvm.sin.f64\l" +
"WIDEN\l"" %.244 = fmul %.202.le, %.230.le\l" +
"WIDEN\l"" %.254 = fadd %.244, %acc.3.030\l"
]
}
LV: Found an estimated cost of 0 for VF 1 For instruction: %acc.3.030 = phi double [ %.254, %for.end ], [ 0.000000e+00, %for.end.preheader ]
LV: Found an estimated cost of 0 for VF 1 For instruction: %.57.028 = phi i64 [ %.136, %for.end ], [ 0, %for.end.preheader ]
LV: Found an estimated cost of 1 for VF 1 For instruction: %.136 = add nuw nsw i64 %.57.028, 1, !dbg !19
LV: Found an estimated cost of 1 for VF 1 For instruction: %.201.le = sitofp i64 %.57.028 to double, !dbg !20
LV: Found an estimated cost of 10 for VF 1 For instruction: %.202.le = tail call fast double @sqrt(double %.201.le), !dbg !20
LV: Found an estimated cost of 10 for VF 1 For instruction: %.230.le = tail call fast double @llvm.sin.f64(double %.201.le), !dbg !20
LV: Found an estimated cost of 2 for VF 1 For instruction: %.244 = fmul fast double %.202.le, %.230.le, !dbg !20
LV: Found an estimated cost of 2 for VF 1 For instruction: %.254 = fadd fast double %.244, %acc.3.030, !dbg !20
LV: Found an estimated cost of 1 for VF 1 For instruction: %exitcond33.not = icmp eq i64 %.136, %arg.n, !dbg !19
LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %exitcond33.not, label %B48, label %for.end, !dbg !19
LV: Scalar loop costs: 27.
LV: Found an estimated cost of 0 for VF 2 For instruction: %acc.3.030 = phi double [ %.254, %for.end ], [ 0.000000e+00, %for.end.preheader ]
LV: Found an estimated cost of 0 for VF 2 For instruction: %.57.028 = phi i64 [ %.136, %for.end ], [ 0, %for.end.preheader ]
LV: Found an estimated cost of 1 for VF 2 For instruction: %.136 = add nuw nsw i64 %.57.028, 1, !dbg !19
LV: Found an estimated cost of 20 for VF 2 For instruction: %.201.le = sitofp i64 %.57.028 to double, !dbg !20
LV: Found an estimated cost of 22 for VF 2 For instruction: %.202.le = tail call fast double @sqrt(double %.201.le), !dbg !20
LV: Found an estimated cost of 22 for VF 2 For instruction: %.230.le = tail call fast double @llvm.sin.f64(double %.201.le), !dbg !20
LV: Found an estimated cost of 2 for VF 2 For instruction: %.244 = fmul fast double %.202.le, %.230.le, !dbg !20
LV: Found an estimated cost of 2 for VF 2 For instruction: %.254 = fadd fast double %.244, %acc.3.030, !dbg !20
LV: Found an estimated cost of 1 for VF 2 For instruction: %exitcond33.not = icmp eq i64 %.136, %arg.n, !dbg !19
LV: Found an estimated cost of 0 for VF 2 For instruction: br i1 %exitcond33.not, label %B48, label %for.end, !dbg !19
LV: Vector loop of width 2 costs: 35.
LV: Selecting VF: 1.
LV(REG): Calculating max register usage:
LV(REG): At #0 Interval # 0
LV(REG): At #1 Interval # 1
LV(REG): At #2 Interval # 2
LV(REG): At #3 Interval # 3
LV(REG): At #4 Interval # 3
LV(REG): At #5 Interval # 4
LV(REG): At #6 Interval # 4
LV(REG): At #7 Interval # 3
LV(REG): At #8 Interval # 2
LV(REG): VF = 1
LV(REG): Found max usage: 1 item
LV(REG): RegisterClass: Generic::ScalarRC, 4 registers
LV(REG): Found invariant usage: 0 item
LV: The target has 16 registers of Generic::ScalarRC register class
LV: Loop cost is 27
LV: Not Interleaving.
LV: Vectorization is possible but not beneficial.
LV: Interleaving is not beneficial.
LV: Checking a loop in "_ZN8__main__7foo$241Ex" from <elided>.py:15:1
LV: Loop hints: force=? width=0 unroll=0
LV: Found a loop: B62
LV: Found an induction variable.
LV: We can vectorize this loop!
LV: Found trip count: 0
LV: The Smallest and Widest types: 64 / 64 bits.
LV: The Widest register safe to use is: 128 bits.
LV: Found uniform instruction: %exitcond.not = icmp eq i64 %.411, %spec.select23, !dbg !21
LV: Scalarizing: %.473 = add nuw nsw i64 %.334.025, 2, !dbg !22
LV: Scalarizing: %.487.le = sitofp i64 %.473 to double, !dbg !22
LV: Scalarizing: %.517.le = sitofp i64 %.334.025 to double, !dbg !22
LV: Scalarizing: %.532 = fmul fast double %.488.le, %.518.le, !dbg !22
LV: Scalarizing: %.542 = fsub fast double %acc.4.027, %.532, !dbg !22
LV: Scalarizing: %.488.le = tail call fast double @sqrt(double %.487.le), !dbg !22
digraph VPlan {
graph [labelloc=t, fontsize=30; label="Vectorization Plan\nInitial VPlan for VF=\{1\},UF\>=1"]
node [shape=rect, fontname=Courier, fontsize=30]
edge [fontname=Courier, fontsize=30]
compound=true
N0 [label =
"B62:\n" +
"WIDEN-PHI %acc.4.027 = phi %.542, %.254.lcssa\l" +
"WIDEN-INDUCTION %.334.025 = phi %.411, 0\l" +
"CLONE %.473 = add %.334.025, 2\l" +
"CLONE %.487.le = sitofp %.473\l" +
"WIDEN-CALL %.488.le = call %.487.le, @sqrt\l" +
"CLONE %.517.le = sitofp %.334.025\l" +
"WIDEN-CALL %.518.le = call %.517.le, @llvm.cos.f64\l" +
"CLONE %.532 = fmul %.488.le, %.518.le\l" +
"CLONE %.542 = fsub %acc.4.027, %.532\l"
]
}
digraph VPlan {
graph [labelloc=t, fontsize=30; label="Vectorization Plan\nInitial VPlan for VF=\{2\},UF\>=1"]
node [shape=rect, fontname=Courier, fontsize=30]
edge [fontname=Courier, fontsize=30]
compound=true
N0 [label =
"B62:\n" +
"WIDEN-PHI %acc.4.027 = phi %.542, %.254.lcssa\l" +
"WIDEN-INDUCTION %.334.025 = phi %.411, 0\l" +
"WIDEN\l"" %.473 = add %.334.025, 2\l" +
"WIDEN\l"" %.487.le = sitofp %.473\l" +
"REPLICATE %.488.le = call %.487.le, @sqrt\l" +
"WIDEN\l"" %.517.le = sitofp %.334.025\l" +
"WIDEN-CALL %.518.le = call %.517.le, @llvm.cos.f64\l" +
"WIDEN\l"" %.532 = fmul %.488.le, %.518.le\l" +
"WIDEN\l"" %.542 = fsub %acc.4.027, %.532\l"
]
}
LV: Found an estimated cost of 0 for VF 1 For instruction: %acc.4.027 = phi double [ %.542, %B62 ], [ %.254.lcssa, %B62.preheader ]
LV: Found an estimated cost of 0 for VF 1 For instruction: %.334.025 = phi i64 [ %.411, %B62 ], [ 0, %B62.preheader ]
LV: Found an estimated cost of 1 for VF 1 For instruction: %.411 = add nuw nsw i64 %.334.025, 1, !dbg !21
LV: Found an estimated cost of 1 for VF 1 For instruction: %.473 = add nuw nsw i64 %.334.025, 2, !dbg !22
LV: Found an estimated cost of 1 for VF 1 For instruction: %.487.le = sitofp i64 %.473 to double, !dbg !22
LV: Found an estimated cost of 10 for VF 1 For instruction: %.488.le = tail call fast double @sqrt(double %.487.le), !dbg !22
LV: Found an estimated cost of 1 for VF 1 For instruction: %.517.le = sitofp i64 %.334.025 to double, !dbg !22
LV: Found an estimated cost of 10 for VF 1 For instruction: %.518.le = tail call fast double @llvm.cos.f64(double %.517.le), !dbg !22
LV: Found an estimated cost of 2 for VF 1 For instruction: %.532 = fmul fast double %.488.le, %.518.le, !dbg !22
LV: Found an estimated cost of 2 for VF 1 For instruction: %.542 = fsub fast double %acc.4.027, %.532, !dbg !22
LV: Found an estimated cost of 1 for VF 1 For instruction: %exitcond.not = icmp eq i64 %.411, %spec.select23, !dbg !21
LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %exitcond.not, label %B94.loopexit, label %B62, !dbg !21
LV: Scalar loop costs: 29.
LV: Found an estimated cost of 0 for VF 2 For instruction: %acc.4.027 = phi double [ %.542, %B62 ], [ %.254.lcssa, %B62.preheader ]
LV: Found an estimated cost of 0 for VF 2 For instruction: %.334.025 = phi i64 [ %.411, %B62 ], [ 0, %B62.preheader ]
LV: Found an estimated cost of 1 for VF 2 For instruction: %.411 = add nuw nsw i64 %.334.025, 1, !dbg !21
LV: Found an estimated cost of 1 for VF 2 For instruction: %.473 = add nuw nsw i64 %.334.025, 2, !dbg !22
LV: Found an estimated cost of 20 for VF 2 For instruction: %.487.le = sitofp i64 %.473 to double, !dbg !22
LV: Found an estimated cost of 22 for VF 2 For instruction: %.488.le = tail call fast double @sqrt(double %.487.le), !dbg !22
LV: Found an estimated cost of 20 for VF 2 For instruction: %.517.le = sitofp i64 %.334.025 to double, !dbg !22
LV: Found an estimated cost of 22 for VF 2 For instruction: %.518.le = tail call fast double @llvm.cos.f64(double %.517.le), !dbg !22
LV: Found an estimated cost of 2 for VF 2 For instruction: %.532 = fmul fast double %.488.le, %.518.le, !dbg !22
LV: Found an estimated cost of 2 for VF 2 For instruction: %.542 = fsub fast double %acc.4.027, %.532, !dbg !22
LV: Found an estimated cost of 1 for VF 2 For instruction: %exitcond.not = icmp eq i64 %.411, %spec.select23, !dbg !21
LV: Found an estimated cost of 0 for VF 2 For instruction: br i1 %exitcond.not, label %B94.loopexit, label %B62, !dbg !21
LV: Vector loop of width 2 costs: 45.
LV: Selecting VF: 1.
LV(REG): Calculating max register usage:
LV(REG): At #0 Interval # 0
LV(REG): At #1 Interval # 1
LV(REG): At #2 Interval # 2
LV(REG): At #3 Interval # 3
LV(REG): At #4 Interval # 4
LV(REG): At #5 Interval # 4
LV(REG): At #6 Interval # 4
LV(REG): At #7 Interval # 4
LV(REG): At #8 Interval # 4
LV(REG): At #9 Interval # 3
LV(REG): At #10 Interval # 2
LV(REG): VF = 1
LV(REG): Found max usage: 1 item
LV(REG): RegisterClass: Generic::ScalarRC, 4 registers
LV(REG): Found invariant usage: 1 item
LV(REG): RegisterClass: Generic::ScalarRC, 2 registers
LV: The target has 16 registers of Generic::ScalarRC register class
LV: Loop cost is 29
LV: Not Interleaving.
LV: Vectorization is possible but not beneficial.
LV: Interleaving is not beneficial.
- the LLVM IR from the module that Numba generated.
; ModuleID = 'foo'
source_filename = "<string>"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
@"_ZN08NumbaEnv8__main__7foo$241Ex" = common local_unnamed_addr global i8* null
@.const.foo = internal constant [4 x i8] c"foo\00"
@PyExc_RuntimeError = external global i8
@".const.missing Environment: _ZN08NumbaEnv8__main__7foo$241Ex" = internal constant [54 x i8] c"missing Environment: _ZN08NumbaEnv8__main__7foo$241Ex\00"
; Function Attrs: nofree noinline nounwind
define i32 @"_ZN8__main__7foo$241Ex"(double* noalias nocapture %retptr, { i8*, i32, i8* }** noalias nocapture readnone %excinfo, i64 %arg.n) local_unnamed_addr #0 !dbg !4 {
entry:
call void @llvm.dbg.value(metadata i64 0, metadata !7, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !10, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !12, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !13, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata i64 0, metadata !14, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !15, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !16, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata i64 0, metadata !17, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !18, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata i64 %arg.n, metadata !7, metadata !DIExpression()), !dbg !9
%.71.inv = icmp sgt i64 %arg.n, 0
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !12, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !15, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata i64 0, metadata !14, metadata !DIExpression()), !dbg !9
br i1 %.71.inv, label %for.end.preheader, label %B94, !dbg !19
for.end.preheader: ; preds = %entry
%0 = add i64 %arg.n, -1, !dbg !19
%xtraiter39 = and i64 %arg.n, 3, !dbg !19
%1 = icmp ult i64 %0, 3, !dbg !19
br i1 %1, label %B48.unr-lcssa, label %for.end.preheader.new, !dbg !19
for.end.preheader.new: ; preds = %for.end.preheader
%unroll_iter42 = and i64 %arg.n, -4, !dbg !19
br label %for.end, !dbg !19
B48.unr-lcssa: ; preds = %for.end, %for.end.preheader
%.254.lcssa.ph = phi double [ undef, %for.end.preheader ], [ %.254.3, %for.end ]
%acc.3.030.unr = phi double [ 0.000000e+00, %for.end.preheader ], [ %.254.3, %for.end ]
%.57.028.unr = phi i64 [ 0, %for.end.preheader ], [ %.136.3, %for.end ]
%lcmp.mod40.not = icmp eq i64 %xtraiter39, 0, !dbg !19
br i1 %lcmp.mod40.not, label %B48, label %for.end.epil.preheader, !dbg !19
for.end.epil.preheader: ; preds = %B48.unr-lcssa
br label %for.end.epil, !dbg !19
for.end.epil: ; preds = %for.end.epil.preheader, %for.end.epil
%acc.3.030.epil = phi double [ %.254.epil, %for.end.epil ], [ %acc.3.030.unr, %for.end.epil.preheader ]
%.57.028.epil = phi i64 [ %.136.epil, %for.end.epil ], [ %.57.028.unr, %for.end.epil.preheader ]
%epil.iter = phi i64 [ %epil.iter.sub, %for.end.epil ], [ %xtraiter39, %for.end.epil.preheader ]
call void @llvm.dbg.value(metadata double %acc.3.030.epil, metadata !12, metadata !DIExpression()), !dbg !9
%.136.epil = add nuw nsw i64 %.57.028.epil, 1, !dbg !19
%.201.le.epil = sitofp i64 %.57.028.epil to double, !dbg !20
%.202.le.epil = tail call fast double @sqrt(double %.201.le.epil), !dbg !20
%.230.le.epil = tail call fast double @llvm.sin.f64(double %.201.le.epil), !dbg !20
call void @llvm.dbg.value(metadata i64 0, metadata !14, metadata !DIExpression()), !dbg !9
%.244.epil = fmul fast double %.202.le.epil, %.230.le.epil, !dbg !20
%.254.epil = fadd fast double %.244.epil, %acc.3.030.epil, !dbg !20
call void @llvm.dbg.value(metadata double %.254.epil, metadata !12, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !15, metadata !DIExpression()), !dbg !9
%epil.iter.sub = add i64 %epil.iter, -1, !dbg !19
%epil.iter.cmp.not = icmp eq i64 %epil.iter.sub, 0, !dbg !19
br i1 %epil.iter.cmp.not, label %B48, label %for.end.epil, !dbg !19, !llvm.loop !21
B48: ; preds = %for.end.epil, %B48.unr-lcssa
%.254.lcssa = phi double [ %.254.lcssa.ph, %B48.unr-lcssa ], [ %.254.epil, %for.end.epil ], !dbg !20
call void @llvm.dbg.value(metadata double %.254.lcssa, metadata !16, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !12, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata i64 0, metadata !7, metadata !DIExpression()), !dbg !9
%.348 = icmp slt i64 %arg.n, 2, !dbg !23
%.294 = add nsw i64 %arg.n, -1, !dbg !23
%spec.select23 = select i1 %.348, i64 0, i64 %.294
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !18, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata i64 0, metadata !17, metadata !DIExpression()), !dbg !9
%.39824 = icmp sgt i64 %spec.select23, 0, !dbg !23
br i1 %.39824, label %B62.preheader, label %B94, !dbg !23
B62.preheader: ; preds = %B48
%2 = icmp eq i64 %spec.select23, 1, !dbg !23
br i1 %2, label %B94.loopexit.unr-lcssa, label %B62.preheader.new, !dbg !23
B62.preheader.new: ; preds = %B62.preheader
%unroll_iter = and i64 %spec.select23, -2, !dbg !23
br label %B62, !dbg !23
B62: ; preds = %B62, %B62.preheader.new
%acc.4.027 = phi double [ %.254.lcssa, %B62.preheader.new ], [ %.542.1, %B62 ]
%.334.025 = phi i64 [ 0, %B62.preheader.new ], [ %.411.1, %B62 ]
call void @llvm.dbg.value(metadata double %acc.4.027, metadata !16, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata i64 0, metadata !17, metadata !DIExpression()), !dbg !9
%3 = add i64 %.334.025, 1, !dbg !24
%4 = add i64 %.334.025, 2, !dbg !24
%.487.le = sitofp i64 %4 to double, !dbg !24
%.488.le = tail call fast double @sqrt(double %.487.le), !dbg !24
%.517.le = sitofp i64 %.334.025 to double, !dbg !24
%.518.le = tail call fast double @llvm.cos.f64(double %.517.le), !dbg !24
%.532 = fmul fast double %.488.le, %.518.le, !dbg !24
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !18, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata double undef, metadata !16, metadata !DIExpression()), !dbg !9
%.411.1 = add nuw nsw i64 %.334.025, 2, !dbg !23
%5 = add i64 %.334.025, 3, !dbg !24
%.487.le.1 = sitofp i64 %5 to double, !dbg !24
%.488.le.1 = tail call fast double @sqrt(double %.487.le.1), !dbg !24
%.517.le.1 = sitofp i64 %3 to double, !dbg !24
%.518.le.1 = tail call fast double @llvm.cos.f64(double %.517.le.1), !dbg !24
%.532.1 = fmul fast double %.488.le.1, %.518.le.1, !dbg !24
%6 = fadd fast double %.532, %.532.1, !dbg !24
%.542.1 = fsub fast double %acc.4.027, %6, !dbg !24
call void @llvm.dbg.value(metadata double %.542.1, metadata !16, metadata !DIExpression()), !dbg !9
%niter.ncmp.1 = icmp eq i64 %unroll_iter, %.411.1, !dbg !23
br i1 %niter.ncmp.1, label %B94.loopexit.unr-lcssa, label %B62, !dbg !23
B94.loopexit.unr-lcssa: ; preds = %B62, %B62.preheader
%.542.lcssa.ph = phi double [ undef, %B62.preheader ], [ %.542.1, %B62 ]
%acc.4.027.unr = phi double [ %.254.lcssa, %B62.preheader ], [ %.542.1, %B62 ]
%.334.025.unr = phi i64 [ 0, %B62.preheader ], [ %.411.1, %B62 ]
%7 = and i64 %spec.select23, 1, !dbg !23
%lcmp.mod.not = icmp eq i64 %7, 0, !dbg !23
br i1 %lcmp.mod.not, label %B94, label %B94.loopexit.epilog-lcssa, !dbg !23
B94.loopexit.epilog-lcssa: ; preds = %B94.loopexit.unr-lcssa
call void @llvm.dbg.value(metadata double %acc.4.027.unr, metadata !16, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata i64 0, metadata !17, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !18, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata double undef, metadata !16, metadata !DIExpression()), !dbg !9
%.473.epil = add nuw nsw i64 %.334.025.unr, 2, !dbg !24
%.487.le.epil = sitofp i64 %.473.epil to double, !dbg !24
%.488.le.epil = tail call fast double @sqrt(double %.487.le.epil), !dbg !24
%.517.le.epil = sitofp i64 %.334.025.unr to double, !dbg !24
%.518.le.epil = tail call fast double @llvm.cos.f64(double %.517.le.epil), !dbg !24
%.532.epil = fmul fast double %.488.le.epil, %.518.le.epil, !dbg !24
%.542.epil = fsub fast double %acc.4.027.unr, %.532.epil, !dbg !24
call void @llvm.dbg.value(metadata double %.542.epil, metadata !16, metadata !DIExpression()), !dbg !9
br label %B94, !dbg !25
B94: ; preds = %B94.loopexit.epilog-lcssa, %B94.loopexit.unr-lcssa, %entry, %B48
%acc.4.0.lcssa = phi double [ %.254.lcssa, %B48 ], [ 0.000000e+00, %entry ], [ %.542.lcssa.ph, %B94.loopexit.unr-lcssa ], [ %.542.epil, %B94.loopexit.epilog-lcssa ], !dbg !9
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !13, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !16, metadata !DIExpression()), !dbg !9
store double %acc.4.0.lcssa, double* %retptr, align 8, !dbg !25
ret i32 0, !dbg !25
for.end: ; preds = %for.end, %for.end.preheader.new
%acc.3.030 = phi double [ 0.000000e+00, %for.end.preheader.new ], [ %.254.3, %for.end ]
%.57.028 = phi i64 [ 0, %for.end.preheader.new ], [ %11, %for.end ]
call void @llvm.dbg.value(metadata double %acc.3.030, metadata !12, metadata !DIExpression()), !dbg !9
%.201.le = sitofp i64 %.57.028 to double, !dbg !20
%.202.le = tail call fast double @sqrt(double %.201.le), !dbg !20
%.230.le = tail call fast double @llvm.sin.f64(double %.201.le), !dbg !20
call void @llvm.dbg.value(metadata i64 0, metadata !14, metadata !DIExpression()), !dbg !9
%.244 = fmul fast double %.202.le, %.230.le, !dbg !20
%.254 = fadd fast double %.244, %acc.3.030, !dbg !20
call void @llvm.dbg.value(metadata double %.254, metadata !12, metadata !DIExpression()), !dbg !9
call void @llvm.dbg.value(metadata double 0.000000e+00, metadata !15, metadata !DIExpression()), !dbg !9
%8 = add i64 %.57.028, 1, !dbg !20
%.201.le.1 = sitofp i64 %8 to double, !dbg !20
%.202.le.1 = tail call fast double @sqrt(double %.201.le.1), !dbg !20
%.230.le.1 = tail call fast double @llvm.sin.f64(double %.201.le.1), !dbg !20
%.244.1 = fmul fast double %.202.le.1, %.230.le.1, !dbg !20
%.254.1 = fadd fast double %.244.1, %.254, !dbg !20
call void @llvm.dbg.value(metadata double %.254.1, metadata !12, metadata !DIExpression()), !dbg !9
%9 = add i64 %8, 1, !dbg !20
%.201.le.2 = sitofp i64 %9 to double, !dbg !20
%.202.le.2 = tail call fast double @sqrt(double %.201.le.2), !dbg !20
%.230.le.2 = tail call fast double @llvm.sin.f64(double %.201.le.2), !dbg !20
%.244.2 = fmul fast double %.202.le.2, %.230.le.2, !dbg !20
%.254.2 = fadd fast double %.244.2, %.254.1, !dbg !20
call void @llvm.dbg.value(metadata double %.254.2, metadata !12, metadata !DIExpression()), !dbg !9
%.136.3 = add nuw nsw i64 %.57.028, 4, !dbg !19
%10 = add i64 %9, 1, !dbg !20
%.201.le.3 = sitofp i64 %10 to double, !dbg !20
%.202.le.3 = tail call fast double @sqrt(double %.201.le.3), !dbg !20
%.230.le.3 = tail call fast double @llvm.sin.f64(double %.201.le.3), !dbg !20
%.244.3 = fmul fast double %.202.le.3, %.230.le.3, !dbg !20
%.254.3 = fadd fast double %.244.3, %.254.2, !dbg !20
call void @llvm.dbg.value(metadata double %.254.3, metadata !12, metadata !DIExpression()), !dbg !9
%niter43.ncmp.3 = icmp eq i64 %unroll_iter42, %.136.3, !dbg !19
%11 = add i64 %10, 1, !dbg !19
br i1 %niter43.ncmp.3, label %B48.unr-lcssa, label %for.end, !dbg !19
}
; Function Attrs: nounwind readnone speculatable willreturn
declare void @llvm.dbg.value(metadata, metadata, metadata) #1
; Function Attrs: nofree nounwind readonly
declare double @sqrt(double) local_unnamed_addr #2
; Function Attrs: nounwind readnone speculatable willreturn
declare double @llvm.sin.f64(double) #1
; Function Attrs: nounwind readnone speculatable willreturn
declare double @llvm.cos.f64(double) #1
define i8* @"_ZN7cpython8__main__7foo$241Ex"(i8* nocapture readnone %py_closure, i8* %py_args, i8* nocapture readnone %py_kws) local_unnamed_addr {
entry:
%.5 = alloca i8*, align 8
%.6 = call i32 (i8*, i8*, i64, i64, ...) @PyArg_UnpackTuple(i8* %py_args, i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.const.foo, i64 0, i64 0), i64 1, i64 1, i8** nonnull %.5)
%.7 = icmp eq i32 %.6, 0
%.36 = alloca double, align 8
store double 0.000000e+00, double* %.36, align 8
br i1 %.7, label %entry.if, label %entry.endif, !prof !26
entry.if: ; preds = %entry.endif.endif.endif, %entry
ret i8* null
entry.endif: ; preds = %entry
%.11 = load i8*, i8** @"_ZN08NumbaEnv8__main__7foo$241Ex", align 8
%.16 = icmp eq i8* %.11, null
br i1 %.16, label %entry.endif.if, label %entry.endif.endif, !prof !26
entry.endif.if: ; preds = %entry.endif
call void @PyErr_SetString(i8* nonnull @PyExc_RuntimeError, i8* getelementptr inbounds ([54 x i8], [54 x i8]* @".const.missing Environment: _ZN08NumbaEnv8__main__7foo$241Ex", i64 0, i64 0))
ret i8* null
entry.endif.endif: ; preds = %entry.endif
%.20 = load i8*, i8** %.5, align 8
%.23 = call i8* @PyNumber_Long(i8* %.20)
%.24.not = icmp eq i8* %.23, null
br i1 %.24.not, label %entry.endif.endif.endif, label %entry.endif.endif.if, !prof !26
entry.endif.endif.if: ; preds = %entry.endif.endif
%.26 = call i64 @PyLong_AsLongLong(i8* nonnull %.23)
call void @Py_DecRef(i8* nonnull %.23)
br label %entry.endif.endif.endif
entry.endif.endif.endif: ; preds = %entry.endif.endif, %entry.endif.endif.if
%.21.0 = phi i64 [ %.26, %entry.endif.endif.if ], [ 0, %entry.endif.endif ]
%.31 = call i8* @PyErr_Occurred()
%.32.not = icmp eq i8* %.31, null
br i1 %.32.not, label %entry.endif.endif.endif.endif, label %entry.if, !prof !27
entry.endif.endif.endif.endif: ; preds = %entry.endif.endif.endif
store double 0.000000e+00, double* %.36, align 8
%.40 = call i32 @"_ZN8__main__7foo$241Ex"(double* nonnull %.36, { i8*, i32, i8* }** undef, i64 %.21.0) #5
%.50 = load double, double* %.36, align 8
%.55 = call i8* @PyFloat_FromDouble(double %.50)
ret i8* %.55
}
declare i32 @PyArg_UnpackTuple(i8*, i8*, i64, i64, ...) local_unnamed_addr
declare void @PyErr_SetString(i8*, i8*) local_unnamed_addr
declare i8* @PyNumber_Long(i8*) local_unnamed_addr
declare i64 @PyLong_AsLongLong(i8*) local_unnamed_addr
declare void @Py_DecRef(i8*) local_unnamed_addr
declare i8* @PyErr_Occurred() local_unnamed_addr
declare i8* @PyFloat_FromDouble(double) local_unnamed_addr
; Function Attrs: nofree nounwind
define double @"cfunc._ZN8__main__7foo$241Ex"(i64 %.1) local_unnamed_addr #3 {
entry:
%.3 = alloca double, align 8
store double 0.000000e+00, double* %.3, align 8
%.7 = call i32 @"_ZN8__main__7foo$241Ex"(double* nonnull %.3, { i8*, i32, i8* }** undef, i64 %.1) #5
%.17 = load double, double* %.3, align 8
ret double %.17
}
; Function Attrs: nounwind
declare void @llvm.stackprotector(i8*, i8**) #4
attributes #0 = { nofree noinline nounwind }
attributes #1 = { nounwind readnone speculatable willreturn }
attributes #2 = { nofree nounwind readonly }
attributes #3 = { nofree nounwind }
attributes #4 = { nounwind }
attributes #5 = { noinline }
!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!2, !3}
!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !1, producer: "Numba", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug)
!1 = !DIFile(filename: "<elided>.py", directory: "<elided>")
!2 = !{i32 2, !"Dwarf Version", i32 4}
!3 = !{i32 2, !"Debug Info Version", i32 3}
!4 = distinct !DISubprogram(name: "foo", linkageName: "_ZN8__main__7foo$241Ex", scope: !1, file: !1, line: 7, type: !5, scopeLine: 7, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0)
!5 = !DISubroutineType(types: !6)
!6 = !{}
!7 = !DILocalVariable(name: "n", scope: !4, file: !1, line: 7, type: !8)
!8 = !DIBasicType(name: "i64", size: 64, encoding: DW_ATE_unsigned)
!9 = !DILocation(line: 0, scope: !4)
!10 = !DILocalVariable(name: "acc", scope: !4, file: !1, line: 9, type: !11)
!11 = !DIBasicType(name: "double", size: 64, encoding: DW_ATE_float)
!12 = !DILocalVariable(name: "acc$3", scope: !4, file: !1, line: 9, type: !11)
!13 = !DILocalVariable(name: "acc2", scope: !4, file: !1, line: 10, type: !11)
!14 = !DILocalVariable(name: "i", scope: !4, file: !1, line: 12, type: !8)
!15 = !DILocalVariable(name: "acc$1", scope: !4, file: !1, line: 13, type: !11)
!16 = !DILocalVariable(name: "acc$4", scope: !4, file: !1, line: 12, type: !11)
!17 = !DILocalVariable(name: "i$1", scope: !4, file: !1, line: 15, type: !8)
!18 = !DILocalVariable(name: "acc$2", scope: !4, file: !1, line: 16, type: !11)
!19 = !DILocation(line: 12, column: 1, scope: !4)
!20 = !DILocation(line: 13, column: 1, scope: !4)
!21 = distinct !{!21, !22}
!22 = !{!"llvm.loop.unroll.disable"}
!23 = !DILocation(line: 15, column: 1, scope: !4)
!24 = !DILocation(line: 16, column: 1, scope: !4)
!25 = !DILocation(line: 18, column: 1, scope: !4)
!26 = !{!"branch_weights", i32 1, i32 99}
!27 = !{!"branch_weights", i32 99, i32 1}
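The per-instruction cost lines in the LV debug output also show why neither loop was vectorized. For the first loop the VF 1 estimates sum to 27 (matching `LV: Scalar loop costs: 27.`), while the VF 2 estimates sum to 70. The reported `Vector loop of width 2 costs: 35` appears to be that total divided by the width, i.e. a cost per scalar iteration, so 35 > 27 and VF 1 is selected. The same reading fits the second loop (29 scalar vs 91 / 2 = 45.5, printed as 45). Most of the VF 2 total comes from the `sitofp` (20) and the `sqrt`/`sin` calls (22 each). The sketch below just redoes that arithmetic from the log text; `per_lane_costs` is a hypothetical helper (not part of LLVM, llvmlite or Numba), and the divide-by-VF step is inferred from these numbers rather than taken from LLVM's documentation.

```python
import re
from collections import defaultdict

# Matches e.g. "LV: Found an estimated cost of 22 for VF 2 For instruction: ..."
COST_RE = re.compile(r"LV: Found an estimated cost of (\d+) for VF (\d+) For instruction:")


def per_lane_costs(log_text):
    """Hypothetical helper: sum the LV cost estimates per VF, divided by VF.

    Assumption (inferred from this log, not from LLVM docs): the reported
    "Vector loop of width N costs" figure is the summed cost divided by N.
    Pass the lines for one loop only; costs are not split per loop.
    """
    totals = defaultdict(int)
    for match in COST_RE.finditer(log_text):
        cost, vf = int(match.group(1)), int(match.group(2))
        totals[vf] += cost
    return {vf: total / vf for vf, total in totals.items()}


# Cost lines for the first loop ("for.end"), copied from the LV debug output
# above with the instruction text trimmed.
FIRST_LOOP_LOG = """\
LV: Found an estimated cost of 0 for VF 1 For instruction: %acc.3.030 = phi ...
LV: Found an estimated cost of 0 for VF 1 For instruction: %.57.028 = phi ...
LV: Found an estimated cost of 1 for VF 1 For instruction: %.136 = add ...
LV: Found an estimated cost of 1 for VF 1 For instruction: %.201.le = sitofp ...
LV: Found an estimated cost of 10 for VF 1 For instruction: %.202.le = ... @sqrt ...
LV: Found an estimated cost of 10 for VF 1 For instruction: %.230.le = ... @llvm.sin.f64 ...
LV: Found an estimated cost of 2 for VF 1 For instruction: %.244 = fmul ...
LV: Found an estimated cost of 2 for VF 1 For instruction: %.254 = fadd ...
LV: Found an estimated cost of 1 for VF 1 For instruction: %exitcond33.not = icmp ...
LV: Found an estimated cost of 0 for VF 1 For instruction: br ...
LV: Found an estimated cost of 0 for VF 2 For instruction: %acc.3.030 = phi ...
LV: Found an estimated cost of 0 for VF 2 For instruction: %.57.028 = phi ...
LV: Found an estimated cost of 1 for VF 2 For instruction: %.136 = add ...
LV: Found an estimated cost of 20 for VF 2 For instruction: %.201.le = sitofp ...
LV: Found an estimated cost of 22 for VF 2 For instruction: %.202.le = ... @sqrt ...
LV: Found an estimated cost of 22 for VF 2 For instruction: %.230.le = ... @llvm.sin.f64 ...
LV: Found an estimated cost of 2 for VF 2 For instruction: %.244 = fmul ...
LV: Found an estimated cost of 2 for VF 2 For instruction: %.254 = fadd ...
LV: Found an estimated cost of 1 for VF 2 For instruction: %exitcond33.not = icmp ...
LV: Found an estimated cost of 0 for VF 2 For instruction: br ...
"""

costs = per_lane_costs(FIRST_LOOP_LOG)
print(costs)  # {1: 27.0, 2: 35.0}: VF 2 costs more per iteration, so VF 1 wins
```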
In the above, the LV debug output contains, e.g., `LV: Found a loop: for.end`. If you look in the LLVM IR there are labels like `for.end.preheader:` and `for.end.epil:`; these are derived from the original loop label `for.end`. If you then look at the `!dbg` markers on the instructions in the associated blocks you'll see `!19` and `!20` (the `!21` on the branch at the end of `for.end.epil` is attached as `!llvm.loop !21`, i.e. loop metadata pointing at `!22` = `llvm.loop.unroll.disable`, not a source location). Referring to the metadata section at the bottom of the LLVM IR, `!19` and `!20` are `DILocation`s and show that this loop is from Python source lines 12 and 13:

```llvm
!19 = !DILocation(line: 12, column: 1, scope: !4)
!20 = !DILocation(line: 13, column: 1, scope: !4)
```

which correspond to:

```python
for i in range(n):
    acc += np.sqrt(i) * np.sin(i)
```
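The lookup described above can be scripted. The following is a minimal sketch, not a Numba or llvmlite API: `source_lines_by_block` is a hypothetical helper that scans the textual IR, builds a table of `!N = !DILocation(line: ...)` entries, and then reports, for every block whose label starts with a given loop label (e.g. `for.end`), which Python source lines its `!dbg` attachments point at. It assumes the textual layout shown above (one block label per line, metadata at the bottom) and skips line 0, which marks locations with no specific source line. Ids are resolved against the same module's metadata on purpose: the numbering in the LV log differs from the final module (e.g. the second loop body carries `!dbg !22` in the log but `!dbg !24` in the IR).

```python
import re
from collections import defaultdict

# "!20 = !DILocation(line: 13, column: 1, scope: !4)" -> ("!20", "13")
DILOC_RE = re.compile(r"^(![0-9]+) = !DILocation\(line: ([0-9]+)")
# A basic-block label at the start of a line, e.g. "for.end.epil:"
LABEL_RE = re.compile(r"^([A-Za-z0-9_.$-]+):")
# The "!dbg !N" source-location attachment on an instruction
DBG_RE = re.compile(r"!dbg (![0-9]+)")


def source_lines_by_block(ir_text, loop_label):
    """Hypothetical helper: map blocks whose label starts with loop_label
    to the sorted Python source lines referenced by their !dbg markers."""
    lines = [text.strip() for text in ir_text.splitlines()]

    # Pass 1: metadata id -> source line, for every DILocation in the module.
    loc_line = {}
    for text in lines:
        match = DILOC_RE.match(text)
        if match:
            loc_line[match.group(1)] = int(match.group(2))

    # Pass 2: walk the blocks and resolve each !dbg id to a source line.
    found = defaultdict(set)
    block = None
    for text in lines:
        if text == "}":  # end of a function body
            block = None
            continue
        match = LABEL_RE.match(text)
        if match:
            block = match.group(1)
            continue
        if block is None or not block.startswith(loop_label):
            continue
        for ref in DBG_RE.findall(text):
            line = loc_line.get(ref, 0)
            if line:  # line 0 marks "no specific source line"; skip it
                found[block].add(line)
    return {name: sorted(nums) for name, nums in found.items()}


# A few lines copied from the IR above, enough to show the mapping.
IR_SNIPPET = """\
for.end.preheader:                                ; preds = %entry
  %0 = add i64 %arg.n, -1, !dbg !19
for.end.epil:                                     ; preds = %for.end.epil.preheader, %for.end.epil
  call void @llvm.dbg.value(metadata double %acc.3.030.epil, metadata !12, metadata !DIExpression()), !dbg !9
  %.201.le.epil = sitofp i64 %.57.028.epil to double, !dbg !20
  br i1 %epil.iter.cmp.not, label %B48, label %for.end.epil, !dbg !19, !llvm.loop !21
B62:                                              ; preds = %B62, %B62.preheader.new
  %.517.le = sitofp i64 %.334.025 to double, !dbg !24
}
!9 = !DILocation(line: 0, scope: !4)
!19 = !DILocation(line: 12, column: 1, scope: !4)
!20 = !DILocation(line: 13, column: 1, scope: !4)
!24 = !DILocation(line: 16, column: 1, scope: !4)
"""

print(source_lines_by_block(IR_SNIPPET, "for.end"))
# {'for.end.preheader': [12], 'for.end.epil': [12, 13]}
```

With the example above, passing `foo.inspect_llvm(foo.signatures[0])` and `"for.end"` should list the `for.end*` blocks against lines 12 and 13; a regex over the text is used only because it needs nothing beyond the printed IR.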
Presenting this sort of thing in an easy-to-use manner is something that we hope to get working one day!
Hope this helps?!