I have a function which will be shared between GPU and CPU (njit) runs.
It contains a math function rcbrt (which is essentially x ** -(1 / 3)).
When running on the GPU, we currently implement “impl_rcbrt” and use @lower_builtin to point it to __nv_rcbrt.
My question is: how can I skip this whole implementation for the CPU run? I am fine with directly calling a function that returns x ** -(1 / 3). Alternatively, is there a way to do a similar implementation for the CPU run? (Since rcbrt is not in the C++ built-in math library, I can’t simply change “__nv_rcbrt” to “rcbrt”. For functions like cbrt, that would work.)
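For reference, here is a minimal sketch of the kind of CPU fallback I have in mind. The names rcbrt_cpu and shared_body are hypothetical, made up only for this illustration, and the sketch assumes x > 0 (for negative inputs x ** -(1 / 3) would not match what a real reciprocal cube root returns). It does not show how to switch between this and the GPU lowering, which is the part I am unsure about.

```python
try:
    from numba import njit  # assumption: Numba is installed for the CPU run
except ImportError:
    # Stand-in decorator so the sketch still runs as plain Python without Numba.
    def njit(func):
        return func


@njit
def rcbrt_cpu(x):
    # Hypothetical CPU fallback: reciprocal cube root, valid for x > 0 only.
    return x ** (-1.0 / 3.0)


@njit
def shared_body(x):
    # Hypothetical stand-in for the function shared between CPU and GPU runs.
    return 2.0 * rcbrt_cpu(x)
```

On the CPU side, shared_body(8.0) should then give approximately 1.0, since the reciprocal cube root of 8 is 0.5.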
Thanks in advance.