Support for CUDA gufuncs with no output

An older issue made it possible to define a guvectorize function with no output in its signature. This works on v0.55.1 with target='cpu' and target='parallel'.

However, with target='cuda', it seems to be hardcoded that there is exactly one output variable.

Is this the intended behavior of guvectorize for CUDA kernels?