How to use numpy.sum on uint32 -> uint32?

I have a 4D array of uint32 and I want to sum over axis 0, i.e. return a 3D array with the first axis gone.
I also want to keep the result as uint32, but I get “No definition for lowering”.

I can sum and return a (default) int64 array, but trying to accumulate to a uint32 fails.

The docs imply I should be able to use sum with dtype, but in practice it fails.
I’m running Numba 0.54.0 (on Fedora 34).

Code and further notes below.

import numpy as np
from numba import njit

testarr = np.zeros((4,60,50,3), dtype='uint32')


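@njit
def test1(inarr):
    # Works: sums over axis 0 and returns the default (int64) result.
    return np.sum(inarr, axis=0)

@njit
def test2(inarr):
    # Fails for me: asking for a uint32 accumulator gives the lowering error.
    return np.sum(inarr, axis=0, dtype='uint32')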

res = test1(testarr)
print(res.dtype)

res2 = test2(testarr)
print(res2.dtype)
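
For reference, this is the result I am after from test2. A minimal check in plain NumPy (outside njit), where the array of ones called ref is just my stand-in for real data with the same shape as testarr:

```python
import numpy as np

# Same shape and dtype as testarr, but filled with ones so the sum is visible.
ref = np.ones((4, 60, 50, 3), dtype=np.uint32)

# Plain NumPy (no njit): accumulate along axis 0 directly into uint32.
expected = np.sum(ref, axis=0, dtype=np.uint32)

print(expected.shape, expected.dtype)  # (60, 50, 3) uint32
```

Every element of expected is 4 here, since four ones are added along the first axis.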

The larger problem is that I am building fine-grained histograms of large images of RGB data. I want to parallelize building the histogram, so I am slicing the array into (say) 6 big slices and creating an array of 6 histograms. That way I can use prange to calculate the slices in parallel, and finally sum the slice histograms into a final result. I want to keep the resulting histogram as uint32 to reduce memory size and bandwidth requirements.
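
To make the structure concrete, here is a rough sketch of that pipeline in plain NumPy, without njit or prange. The names slice_histogram, N_SLICES and BINS, the 32 bins per channel, the uint8 input and the random test image are all assumptions for illustration, not my real code:

```python
import numpy as np

N_SLICES = 6   # hypothetical number of slices
BINS = 32      # hypothetical number of bins per colour channel


def slice_histogram(rgb, bins=BINS):
    """3D colour histogram of one uint8 RGB slice, returned as uint32."""
    # Map each 0..255 channel value to a bin index 0..bins-1.
    idx = rgb.reshape(-1, 3).astype(np.int64) * bins // 256
    # Combine the three per-channel bin indices into one flat index.
    flat = (idx[:, 0] * bins + idx[:, 1]) * bins + idx[:, 2]
    counts = np.bincount(flat, minlength=bins ** 3)
    return counts.reshape(bins, bins, bins).astype(np.uint32)


# Random stand-in for a real image (assumed to be 8-bit RGB).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(120, 100, 3), dtype=np.uint8)

# One histogram per slice; this loop is the part I want to run under prange.
parts = np.zeros((N_SLICES, BINS, BINS, BINS), dtype=np.uint32)
for i, chunk in enumerate(np.array_split(image, N_SLICES, axis=0)):
    parts[i] = slice_histogram(chunk)

# Final reduction over the slice axis, kept as uint32.
hist = np.sum(parts, axis=0, dtype=np.uint32)
print(hist.shape, hist.dtype, int(hist.sum()))  # (32, 32, 32) uint32 12000
```

The last step is the same axis-0 sum with a uint32 accumulator as in test2, which is the call I cannot get to compile under njit.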