An old issue made it possible to define a guvectorized function with no output in its signature. This works on v0.55.1 with target='cpu' and target='parallel'.
However, with target='cuda', it seems to be hardcoded that there is exactly one output variable.
Is this the intended behavior of guvectorize for CUDA kernels?