I’m very new to nvprof and I’m trying to profile a kernel (more background is in the thread "GPU function apparently blocking due to data size/complexity").
I get a warning, but that doesn’t seem to be the main issue.
The very first time my script asks the GPU to do anything:
spans_matrix = cuda.to_device(aux_spans_matrix)
apparently results in the error seen below. Can someone tell me if I’m doing something wrong?
==6056== Warning: Child processes are not profiled. Use option --profile-child-processes to profile them.
Traceback (most recent call last):
  File "Segmentation.py", line 958, in <module>
    spans_matrix = cuda.to_device(aux_spans_matrix)
  File "D:\...\Segmentation\venv\lib\site-packages\numba\cuda\cudadrv\devices.py", line 223, in _require_cuda_context
    with _runtime.ensure_context():
  File "C:\...\AppData\Local\Programs\Python\Python38\lib\contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "D:\...\Segmentation\venv\lib\site-packages\numba\cuda\cudadrv\devices.py", line 121, in ensure_context
    with driver.get_active_context():
  File "D:\...\Segmentation\venv\lib\site-packages\numba\cuda\cudadrv\driver.py", line 393, in __enter__
    driver.cuCtxGetCurrent(byref(hctx))
  File "D:\...\Segmentation\venv\lib\site-packages\numba\cuda\cudadrv\driver.py", line 280, in __getattr__
    self.initialize()
  File "D:\...\Segmentation\venv\lib\site-packages\numba\cuda\cudadrv\driver.py", line 237, in initialize
    self.cuInit(0)
  File "D:\...\Segmentation\venv\lib\site-packages\numba\cuda\cudadrv\driver.py", line 299, in safe_cuda_api_call
    retcode = libfn(*args)
OSError: exception: access violation writing 0x0000000000000024
======== Warning: No CUDA application was profiled, exiting
======== Error: Application returned non-zero code 1
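
Since the traceback ends inside cuInit(0), before any of my own data is touched, one way to narrow this down might be a stripped-down script that does nothing except the first host-to-device copy. The sketch below is only an illustration: the function name try_device_copy and the small placeholder array are made up, and it assumes aux_spans_matrix is a NumPy array, which the traceback alone doesn't confirm.

```python
def try_device_copy(rows=3, cols=4):
    """Copy a small placeholder array to the GPU and back; return a status string.

    Hypothetical helper, not part of Segmentation.py. Imports are lazy so the
    script still runs (and reports why) on a machine without NumPy/Numba.
    """
    try:
        import numpy as np
        from numba import cuda
    except ImportError:
        return "numpy/numba not installed"

    # Note: this check already initialises the CUDA driver, so under nvprof
    # the access violation may surface here instead of at to_device().
    if not cuda.is_available():
        return "no CUDA device available"

    # Placeholder data standing in for aux_spans_matrix (assumed NumPy array).
    host_array = np.arange(rows * cols, dtype=np.float32).reshape(rows, cols)

    device_array = cuda.to_device(host_array)   # same call as line 958
    round_trip = device_array.copy_to_host()    # bring the data back to verify

    return "ok" if np.array_equal(round_trip, host_array) else "mismatch"


if __name__ == "__main__":
    print(try_device_copy())
```

If a script like this (saved under any name, say repro.py) runs fine with plain python but crashes the same way when launched through nvprof, that would suggest the problem lies in how nvprof interacts with driver initialisation rather than in Segmentation.py or the size of the data.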