import os
import cv2
import time
import numpy as np
import numba as nb
from numba import njit
from google.colab import drive
drive.mount('/content/gdrive')
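# No explicit signature is given, so Numba infers the types on the first call.
@njit(parallel=True, cache=True)
def normalize_mat(depth_src):
    # Rescale the depth map linearly to the range [0, 1].
    depth_min = depth_src.min()
    depth_max = depth_src.max()
    depth = (depth_src - depth_min) / (depth_max - depth_min)
    return depth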
def generate_stereo(depth_dir, depth_prefix, filename):
    print("=== Start processing:", filename, "===")
    depth_src = cv2.imread(os.path.join(depth_dir, depth_prefix + filename + ".jpg"))
    # Convert colour images to a single grayscale channel.
    if len(depth_src.shape) == 3:
        depth_src = cv2.cvtColor(depth_src, cv2.COLOR_BGR2GRAY)
    depth = normalize_mat(depth_src)
    # JPEG output needs 8-bit data, so cast to uint8 rather than int.
    depth = np.round(depth * 255).astype(np.uint8)
    cv2.imwrite(os.path.join(depth_dir, "normalized_depth_" + filename + ".jpg"), depth)
def file_processing_im(depth_dir, depth_prefix):
    for f in os.listdir(depth_dir):
        filename = f.split(".")[0]
        generate_stereo(depth_dir, depth_prefix, filename)
def main():
    start_time = time.time()
    depth_dir = 'gdrive/MyDrive/depth/'
    depth_prefix = 'Depth_'
    file_processing_im(depth_dir, depth_prefix)
    print(time.time() - start_time, "seconds for base generation")

if __name__ == "__main__":
    main()
If you want to convert this code to use CUDA, it's not clear to me that Numba is the right tool for the job. I'm not familiar with OpenCV, but I understand it already has some CUDA functionality. Is that usable for your use case? I wonder if a combination of that and CuPy (to replace the functionality in `normalize_mat()`) would be more appropriate.
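As a minimal sketch of the CuPy route (not a tested drop-in replacement), the normalization could look roughly like the following. It assumes CuPy is installed and a CUDA device is visible, and it falls back to NumPy otherwise so it can still be tried on a CPU-only machine. The helper `to_uint8_on_host` and the `xp` alias are names made up for this illustration.

```python
import numpy as np

try:
    # CuPy is optional here: fall back to NumPy if it is missing or no GPU is found.
    import cupy as cp
    HAVE_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:
    cp = None
    HAVE_GPU = False

# `xp` is whichever array module is in use; CuPy mirrors the NumPy calls used below.
xp = cp if HAVE_GPU else np


def normalize_mat(depth_src):
    """Rescale the input to [0, 1]; runs on the GPU when CuPy is active."""
    depth = xp.asarray(depth_src, dtype=xp.float64)  # host-to-device copy with CuPy
    depth_min = depth.min()
    depth_max = depth.max()
    return (depth - depth_min) / (depth_max - depth_min)


def to_uint8_on_host(depth):
    """Hypothetical helper: scale to 0..255, round, and return a NumPy array."""
    scaled = xp.rint(depth * 255).astype(xp.uint8)
    return cp.asnumpy(scaled) if HAVE_GPU else scaled
```

The result of `to_uint8_on_host(normalize_mat(gray))` is an ordinary NumPy array, so it can be passed to `cv2.imwrite()` as before. For small images the host-to-device transfer may well cost more than the normalization itself.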
If you really want to replace the functionality with CUDA-jitted kernels with Numba, then I think the main approach you'll need will be:
- Replace the `njit` decorators with `cuda.jit`.
- In `normalize_mat`, you need to implement the max and min calculations and normalization using scalar operations indexed by thread ID. You'll also need to change it so the output array is passed in, because you won't be able to create the `depth` array in the function.
- In `generate_stereo`, you'll need to replace the call to `cv2.cvtColor()` with a kernel you write yourself that provides the same functionality. You'll also need to replace the call to `np.round()` with an implementation that operates on scalars.
- You'll need to move the data loading out of `generate_stereo()`, and allocate space on the device and transfer data to it before calling `generate_stereo()`.
Thank you so much…will try and update