I’m not completely sure either, @pauljurczak.
Your version first creates an integer constant (42, typed as int64) and converts it to float32 with np.float32 before the multiplication. While this conversion doesn’t affect the actual computation (which is done in float32, as the fmul instruction in the LLVM IR shows), it does add a bit of overhead during compilation and clutters the IR.
import numpy as np
import numba as nb

@nb.njit(["f4(f4)"], fastmath=True)
def f(a):
    mult = np.float32(42)
    return mult * a

# f.inspect_types()
# f.inspect_llvm()
Here are the inferred types for f():
...
# $const34.3.1 = const(int, 42) :: Literal[int](42)
# mult = call $14load_attr.1($const34.3.1, func=$14load_attr.1, args=[Var($const34.3.1, 3285166149.py:6)], kws=(), vararg=None, varkwarg=None, target=None) :: (int64,) -> float32
# del $const34.3.1
# del $14load_attr.1
mult = np.float32(42)
# --- LINE 7 ---
# $binop_mul50.7 = mult * a :: float32
# del mult
# del a
# $54return_value.8 = cast(value=$binop_mul50.7) :: float32
# del $binop_mul50.7
# return $54return_value.8
return mult * a
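If you want to double-check that the multiplication really happens in float32, a minimal sketch is to dump the LLVM IR and filter for the fmul instruction (this assumes f has already been compiled for the "f4(f4)" signature, which the explicit signature in the decorator guarantees):

# Print only the fmul lines from the generated LLVM IR of f()
for sig, llvm_ir in f.inspect_llvm().items():
    print(sig)
    for line in llvm_ir.splitlines():
        if "fmul" in line:
            print(line)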
If you want to avoid that implicit cast and get a cleaner IR, you can initialize the constant as a float32 directly via the locals dictionary:
@nb.njit(["f4(f4)"], locals={"mult": nb.f4}, fastmath=True)
def g(a):
    mult = 42
    return mult*a

# g.inspect_types()
# g.inspect_llvm()
Here are the inferred types for g():
def g(a):
# --- LINE 13 ---
# mult = const(int, 42) :: float32
mult = 42
# --- LINE 14 ---
# $binop_mul12.3 = mult * a :: float32
# del mult
# del a
# $16return_value.4 = cast(value=$binop_mul12.3) :: float32
# del $binop_mul12.3
# return $16return_value.4
return mult*a
The result is a cleaner LLVM IR and slightly faster compile times, but at runtime both versions should perform the same, since the multiplication itself is done in float32 either way.
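If you want to convince yourself of that last point, a rough timing sketch like the one below should show near-identical numbers for f and g (the absolute values mostly measure Python call/dispatch overhead on a scalar argument, so don’t read too much into them):

import timeit

x = np.float32(3.0)
f(x); g(x)  # both are already compiled thanks to the explicit "f4(f4)" signatures

print("f:", timeit.timeit(lambda: f(x), number=1_000_000))
print("g:", timeit.timeit(lambda: g(x), number=1_000_000))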