Interestingly, the LLVM assembly code (after optimization) is not all that different between fct_1 and fct_1_redundant.
Only two blocks are significantly different.
For fct_1:
B77: ; preds = %lookup.body.if, %lookup.body.1.if, %for.body.if, %for.body.if.1, %for.body.if.2, %for.body.1.if, %for.body.1.if.1, %for.body.1.if.2, %lookup.end.2.endif
%.95 = icmp ugt i64 %.40.0128, 1
br i1 %.95, label %B10.if, label %B76
B10.if: ; preds = %entry, %B77
%.40.0128 = phi i64 [ 1000, %entry ], [ %.104, %B77 ]
%.104 = add nsw i64 %.40.0128, -1
%sunkaddr181 = getelementptr i8, i8* %arg.set_1.0, i64 24
%1 = bitcast i8* %sunkaddr181 to { i64, i64, i64, i64, i8, { i64, i64 } }**
%.6.i94 = load { i64, i64, i64, i64, i8, { i64, i64 } }*, { i64, i64, i64, i64, i8, { i64, i64 } }** %1, align 8
%.149 = getelementptr inbounds { i64, i64, i64, i64, i8, { i64, i64 } }, { i64, i64, i64, i64, i8, { i64, i64 } }* %.6.i94, i64 0, i32 5
%.176 = getelementptr inbounds { i64, i64, i64, i64, i8, { i64, i64 } }, { i64, i64, i64, i64, i8, { i64, i64 } }* %.6.i94, i64 0, i32 2
%.177 = load i64, i64* %.176, align 8
%.181 = and i64 %.177, 2
%.190 = getelementptr { i64, i64 }, { i64, i64 }* %.149, i64 %.181, i32 0
%.191 = load i64, i64* %.190, align 8
switch i64 %.191, label %for.body.endif.endif [
  i64 2, label %for.body.if
  i64 -1, label %B30
]
vs. for fct_1_redundant:
B101: ; preds = %lookup.body.if, %lookup.body.1.if, %for.body.if, %for.body.if.1, %for.body.if.2, %for.body.1.if, %for.body.1.if.1, %for.body.1.if.2, %lookup.end.2.endif
%fct_3_set_1.sroa.0.1 = phi i8* [ null, %lookup.end.2.endif ], [ %arg.set_1.0, %for.body.1.if.2 ], [ %arg.set_1.0, %for.body.1.if.1 ], [ %arg.set_1.0, %for.body.1.if ], [ %arg.set_1.0, %for.body.if.2 ], [ %arg.set_1.0, %for.body.if.1 ], [ %arg.set_1.0, %for.body.if ], [ %arg.set_1.0, %lookup.body.1.if ], [ %arg.set_1.0, %lookup.body.if ]
%fct_3_set_2.sroa.0.1 = phi i8* [ null, %lookup.end.2.endif ], [ %arg.set_2.0, %for.body.1.if.2 ], [ %arg.set_2.0, %for.body.1.if.1 ], [ %arg.set_2.0, %for.body.1.if ], [ %arg.set_2.0, %for.body.if.2 ], [ %arg.set_2.0, %for.body.if.1 ], [ %arg.set_2.0, %for.body.if ], [ %arg.set_2.0, %lookup.body.1.if ], [ %arg.set_2.0, %lookup.body.if ]
tail call void @NRT_decref(i8* %fct_3_set_2.sroa.0.1)
tail call void @NRT_decref(i8* %fct_3_set_1.sroa.0.1)
%.95 = icmp ugt i64 %.40.0128, 1
br i1 %.95, label %B10.if, label %B100
B10.if: ; preds = %entry, %B101
%.40.0128 = phi i64 [ 1000, %entry ], [ %.104, %B101 ]
%.104 = add nsw i64 %.40.0128, -1
tail call void @NRT_incref(i8* %arg.set_1.0)
tail call void @NRT_incref(i8* %arg.set_2.0)
%sunkaddr181 = getelementptr i8, i8* %arg.set_1.0, i64 24
%1 = bitcast i8* %sunkaddr181 to { i64, i64, i64, i64, i8, { i64, i64 } }**
%.6.i94 = load { i64, i64, i64, i64, i8, { i64, i64 } }*, { i64, i64, i64, i64, i8, { i64, i64 } }** %1, align 8
%.165 = getelementptr inbounds { i64, i64, i64, i64, i8, { i64, i64 } }, { i64, i64, i64, i64, i8, { i64, i64 } }* %.6.i94, i64 0, i32 5
%.192 = getelementptr inbounds { i64, i64, i64, i64, i8, { i64, i64 } }, { i64, i64, i64, i64, i8, { i64, i64 } }* %.6.i94, i64 0, i32 2
%.193 = load i64, i64* %.192, align 8
%.197 = and i64 %.193, 2
%.206 = getelementptr { i64, i64 }, { i64, i64 }* %.165, i64 %.197, i32 0
%.207 = load i64, i64* %.206, align 8
switch i64 %.207, label %for.body.endif.endif [
  i64 2, label %for.body.if
  i64 -1, label %B54
]
The LLVM assembly code for fct_1_redundant contains two additional NRT_incref and two additional NRT_decref calls. Furthermore, it contains the phi expressions for %fct_3_set_1.sroa.0.1 and %fct_3_set_2.sroa.0.1. Interestingly, neither %fct_3_set_1.sroa.0.1 nor %fct_3_set_2.sroa.0.1 is used anywhere else in the LLVM assembly code except by the NRT_decref calls directly afterwards, and there is no matching NRT_incref for either of them. There might be something weird going on here.
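One quick way to make the imbalance visible is to count the NRT refcount calls in the IR text itself. The sketch below is a minimal illustration (not part of the original analysis); the `ir` string is a hypothetical stand-in for the real module dump, which in practice could be obtained from a Numba dispatcher via `inspect_llvm()`:

```python
import re

# Hypothetical stand-in for the optimized IR dump of fct_1_redundant's
# loop body; only the refcounting lines matter for this count.
ir = """
tail call void @NRT_incref(i8* %arg.set_1.0)
tail call void @NRT_incref(i8* %arg.set_2.0)
tail call void @NRT_decref(i8* %fct_3_set_2.sroa.0.1)
tail call void @NRT_decref(i8* %fct_3_set_1.sroa.0.1)
"""

def count_refcount_calls(llvm_ir: str) -> dict:
    """Count calls to NRT_incref/NRT_decref in a textual LLVM module."""
    return {
        name: len(re.findall(rf"call void @{name}\(", llvm_ir))
        for name in ("NRT_incref", "NRT_decref")
    }

print(count_refcount_calls(ir))  # → {'NRT_incref': 2, 'NRT_decref': 2}
```

The counts balance globally here, but as noted above, the decrefs act on the phi results rather than on the values the increfs touched, which is exactly the suspicious part.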