Numba CUDA in Docker

I have a data pipeline running in Docker, and I want to use a script that relies on Numba's CUDA JIT.

The base image is FROM ubuntu:20.04 and I don't use a conda environment in the container. The Python version is 3.8, installed in the Dockerfile with:
RUN apt-get update && apt-get install -y --no-install-recommends build-essential python3.8 python3-pip python3-setuptools python3-dev bc
To install Numba I added RUN pip3 install numba==0.56.4. The image builds perfectly well, but when I use the CUDA JIT functionality inside the container, I get the error below (essentially it asks me to install cudatoolkit and recommends conda install).

I tried a couple of things, to no avail: a) searched for a pip-installable cudatoolkit (no luck), b) switched the base image to FROM nvidia/cuda:12.3.1-devel-ubuntu20.04 (ran into some issues building the image).
Can anyone suggest how to add Numba's CUDA capability to my Docker image?

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/numba/cuda/cudadrv/nvvm.py", line 126, in __new__
    inst.driver = open_cudalib('nvvm')
  File "/usr/local/lib/python3.8/dist-packages/numba/cuda/cudadrv/libs.py", line 60, in open_cudalib
    return ctypes.CDLL(path)
  File "/usr/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libnvvm.so: cannot open shared object file: No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/mnt/local-raid10/workspace/jtrivedi/read_analysis/test_run_13/read-analysis/HIFSEQ/script_folder/numbatagger/runner.py", line 69, in run
    core_run(lig_path, rt_path, r1_path, r2_path, r1_out_path, r2_out_path, processnum, copy_to_cuda, error_lig, error_rt)
  File "/mnt/local-raid10/workspace/jtrivedi/read_analysis/test_run_13/read-analysis/HIFSEQ/script_folder/numbatagger/runner.py", line 54, in core_run
    ligation_aligner[blockspergrid, threadsperblock](lig_aligner_result, seq_bin_embed, lig_bin_embed, lig_bin_name, lig_original_len_bin, error_lig, error_rt)
  File "/usr/local/lib/python3.8/dist-packages/numba/cuda/dispatcher.py", line 491, in __call__
    return self.dispatcher.call(args, self.griddim, self.blockdim,
  File "/usr/local/lib/python3.8/dist-packages/numba/cuda/dispatcher.py", line 625, in call
    kernel = _dispatcher.Dispatcher._cuda_call(self, *args)
  File "/usr/local/lib/python3.8/dist-packages/numba/cuda/dispatcher.py", line 633, in _compile_for_args
    return self.compile(tuple(argtypes))
  File "/usr/local/lib/python3.8/dist-packages/numba/cuda/dispatcher.py", line 794, in compile
    kernel = _Kernel(self.py_func, argtypes, **self.targetoptions)
  File "/usr/local/lib/python3.8/dist-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/numba/cuda/dispatcher.py", line 75, in __init__
    cres = compile_cuda(self.py_func, types.void, self.argtypes,
  File "/usr/local/lib/python3.8/dist-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/numba/cuda/compiler.py", line 212, in compile_cuda
    cres = compiler.compile_extra(typingctx=typingctx,
  File "/usr/local/lib/python3.8/dist-packages/numba/core/compiler.py", line 716, in compile_extra
    return pipeline.compile_extra(func)
  File "/usr/local/lib/python3.8/dist-packages/numba/core/compiler.py", line 452, in compile_extra
    return self._compile_bytecode()
  File "/usr/local/lib/python3.8/dist-packages/numba/core/compiler.py", line 520, in _compile_bytecode
    return self._compile_core()
  File "/usr/local/lib/python3.8/dist-packages/numba/core/compiler.py", line 499, in _compile_core
    raise e
  File "/usr/local/lib/python3.8/dist-packages/numba/core/compiler.py", line 486, in _compile_core
    pm.run(self.state)
  File "/usr/local/lib/python3.8/dist-packages/numba/core/compiler_machinery.py", line 368, in run
    raise patched_exception
  File "/usr/local/lib/python3.8/dist-packages/numba/core/compiler_machinery.py", line 356, in run
    self._runPass(idx, pass_inst, state)
  File "/usr/local/lib/python3.8/dist-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/numba/core/compiler_machinery.py", line 311, in _runPass
    mutated |= check(pss.run_pass, internal_state)
  File "/usr/local/lib/python3.8/dist-packages/numba/core/compiler_machinery.py", line 273, in check
    mangled = func(compiler_state)
  File "/usr/local/lib/python3.8/dist-packages/numba/cuda/compiler.py", line 123, in run_pass
    if NVVM().is_nvvm70:
  File "/usr/local/lib/python3.8/dist-packages/numba/cuda/cudadrv/nvvm.py", line 131, in __new__
    raise NvvmSupportError(errmsg % e)
numba.cuda.cudadrv.error.NvvmSupportError: libNVVM cannot be found. Do `conda install cudatoolkit`:
libnvvm.so: cannot open shared object file: No such file or directory

Using the NVIDIA image you mention (nvidia/cuda:12.3.1-devel-ubuntu20.04) would be the preferable route. What issues did you run into?
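
For reference, the shape I had in mind is roughly the following. This is an untested sketch; note that, as far as I know, Numba 0.56.4 supports CUDA 11.x rather than CUDA 12, so an 11.8 devel image is probably the safer pairing:

FROM nvidia/cuda:11.8.0-devel-ubuntu20.04
# Same tooling as before; the -devel image already ships libnvvm.so under /usr/local/cuda
RUN apt-get update && apt-get install -y --no-install-recommends build-essential python3.8 python3-pip python3-setuptools python3-dev bc
RUN pip3 install numba==0.56.4

The -devel variant matters here: the -runtime images do not include libnvvm. At run time you still need the host driver exposed to the container, i.e. docker run --gpus all with the NVIDIA Container Toolkit installed on the host.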

Although I haven't tried it rigorously yet, the immediate problem was that the rest of my Dockerfile failed to build once I swapped the base image from vanilla ubuntu:20.04 to nvidia/cuda:12.3.1-devel-ubuntu20.04. I can try to debug and make the rest of the build work with the new base image, but I was hoping for a way that leaves the rest of my setup untouched (i.e. I just install Numba and the CUDA toolkit without changing any other piece of the build). Is there a way? If not, I will revert to the base-image change and retry it more rigorously soon.
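
One route I still want to try, to keep the ubuntu:20.04 base: NVIDIA publishes CUDA toolkit components as pip wheels, and libnvvm ships in the nvcc component. I haven't verified this against Numba 0.56.4, and both the wheel names and the CUDA_HOME path below are assumptions about the wheel layout, so treat it as a sketch:

RUN pip3 install numba==0.56.4 nvidia-cuda-nvcc-cu11 nvidia-cuda-runtime-cu11
# Numba resolves libnvvm via $CUDA_HOME/nvvm/lib64 (and libdevice via $CUDA_HOME/nvvm/libdevice);
# the nvcc wheel is assumed to unpack under site-packages/nvidia/cuda_nvcc
ENV CUDA_HOME=/usr/local/lib/python3.8/dist-packages/nvidia/cuda_nvcc

Either way, running numba -s inside the container should report whether libnvvm and libdevice were found before I go back to the kernel itself.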