CUDA - Nvprof error?

I’m very new to nvprof and I’m trying to profile a kernel (more background in my earlier topic, “GPU function apparently blocking due to data size/complexity”).

I get a warning, but that doesn’t seem to be the main issue.
The very first thing I ask the GPU to do:

spans_matrix = cuda.to_device(aux_spans_matrix)

apparently results in the error shown below. Can someone tell me if I’m doing something wrong?

==6056== Warning: Child processes are not profiled. Use option --profile-child-processes to profile them.

Traceback (most recent call last):
  File "", line 958, in <module>
    spans_matrix = cuda.to_device(aux_spans_matrix)
  File "D:\...\Segmentation\venv\lib\site-packages\numba\cuda\cudadrv\", line 223, in _require_cuda_context
    with _runtime.ensure_context():
  File "C:\...\AppData\Local\Programs\Python\Python38\lib\", line 113, in __enter__
    return next(self.gen)
  File "D:\...\Segmentation\venv\lib\site-packages\numba\cuda\cudadrv\", line 121, in ensure_context
    with driver.get_active_context():
  File "D:\...\Segmentation\venv\lib\site-packages\numba\cuda\cudadrv\", line 393, in __enter__
  File "D:\...\Segmentation\venv\lib\site-packages\numba\cuda\cudadrv\", line 280, in __getattr__
  File "D:\...\Segmentation\venv\lib\site-packages\numba\cuda\cudadrv\", line 237, in initialize
  File "D:\...\Segmentation\venv\lib\site-packages\numba\cuda\cudadrv\", line 299, in safe_cuda_api_call
    retcode = libfn(*args)
OSError: exception: access violation writing 0x0000000000000024
======== Warning: No CUDA application was profiled, exiting
======== Error: Application returned non-zero code 1

From the stack trace, it looks like the code produced an error while trying to initialize CUDA. You might want to try running nvprof as administrator if you weren’t already.

Thanks for the reply.
I’ve tried running as admin both from the command line and through the IDE’s terminal. The result was exactly the same, though I’ve noticed it might be necessary to allow access to the GPU performance counters for all users, as mentioned here:

That being said, it did work when the `--profile-child-processes` argument was added, and it looks like that was the only issue.
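For anyone hitting the same wall: nvprof launches the Python interpreter as a child process, so without that flag nothing gets profiled. A sketch of the invocation (`my_script.py` is a hypothetical stand-in for the actual script):

```shell
# Profile the Python child process that nvprof spawns.
nvprof --profile-child-processes python my_script.py
```

Without `--profile-child-processes`, nvprof only watches the parent process and exits with “No CUDA application was profiled”, matching the warning in the output above.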