Running multiple instances of the same numba kernel in parallel

Hi,
I have a Numba kernel that I launch like this:

my_kernel[blockspergrid,threadsperblock](data_d)

Is it possible to execute multiple instances of ‘my_kernel’ in parallel, each with a different blockspergrid, threadsperblock, and data_d? Or is there an alternative way to achieve this kind of parallelism?
Can it be done using Numba CUDA streams?

Also, suppose I have a function, ‘my_task’, that launches multiple Numba kernels. Can I run multiple instances of ‘my_task’ in parallel using Python threads? Do Python threads work with streams?
For example:

from threading import Thread
from numba import cuda

# my_kernel1..3, bgrids, tblocks, data1 and data2 are defined elsewhere.

def my_task(stream, data):
    data_d = cuda.to_device(data)
    my_kernel1[bgrids, tblocks, stream](data_d)
    stream.synchronize()
    my_kernel2[bgrids, tblocks, stream](data_d)
    stream.synchronize()
    my_kernel3[bgrids, tblocks, stream](data_d)
    stream.synchronize()
    return data_d.copy_to_host()

# One stream per thread
stream1 = cuda.stream()
stream2 = cuda.stream()
t1 = Thread(target=my_task, args=(stream1, data1))
t2 = Thread(target=my_task, args=(stream2, data2))
t1.start()
t2.start()
t1.join()
t2.join()

Will the above code work?

Thanks,