Please help, I am new to Numba CUDA programming. My binary search program is not showing any speedup when I increase the number of threads and blocks.

Here is my code:

```
# GPU-based binary search kernel: counts the occurrences of a search item in the database.
from numba import cuda

@cuda.jit
def binary_search_count(dsrcDB, srcitem, dsrange, threadCount):  # def line lost from the post; name and argument order are assumed
    tid = cuda.grid(1)  # global thread id
    if tid >= threadCount:
        return  # extra threads in the grid have no chunk to search
    last = first = -1

    # Each thread searches its own contiguous chunk of the database
    low = tid * len(dsrcDB) // threadCount
    high = ((tid + 1) * len(dsrcDB) // threadCount) - 1

    while low <= high:
        # Calculate mid to divide the search domain
        mid = low + (high - low) // 2

        # if key is found, record it and keep searching left for the first occurrence
        if srcitem == dsrcDB[mid][1]:
            first = mid
            high = mid - 1
        # if key is less than the mid element, discard right half
        elif srcitem < dsrcDB[mid][1]:
            high = mid - 1
        # if key is more than the mid element, discard left half
        else:
            low = mid + 1
    # End of first while loop

    if first != -1:
        # Reinitialize low and high
        low = first
        high = ((tid + 1) * len(dsrcDB) // threadCount) - 1

        while low <= high:
            # Calculate mid to divide the search domain
            mid = low + (high - low) // 2

            # if key is found, record it and keep searching right for the last occurrence
            if srcitem == dsrcDB[mid][1]:
                last = mid
                low = mid + 1
            # if key is less than the mid element, discard right half
            elif srcitem < dsrcDB[mid][1]:
                high = mid - 1
            # if key is more than the mid element, discard left half
            else:
                low = mid + 1
        # End of last while loop

    if first != -1 and last != -1:
        dsrange[tid] = last - first + 1
```
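To make the partitioning easier to check, here is a minimal CPU-only sketch of the same idea in plain Python. It assumes the key column (`dsrcDB[mid][1]` in the kernel) is sorted; `keys`, `chunk_bounds` and `partitioned_count` are hypothetical names used only for illustration, and `bisect` stands in for the two hand-written loops.

```python
from bisect import bisect_left, bisect_right

def chunk_bounds(tid, n, thread_count):
    """Inclusive [low, high] index range that thread `tid` would search."""
    low = tid * n // thread_count
    high = (tid + 1) * n // thread_count - 1
    return low, high

def partitioned_count(keys, item, thread_count):
    """Sum the per-chunk match counts, like summing dsrange on the host."""
    total = 0
    for tid in range(thread_count):
        low, high = chunk_bounds(tid, len(keys), thread_count)
        # bisect works on the half-open range [low, high + 1)
        first = bisect_left(keys, item, low, high + 1)
        last = bisect_right(keys, item, low, high + 1)
        total += last - first
    return total

# Hypothetical sorted key column, not data from my database
keys = [1, 3, 3, 3, 5, 7, 7, 9, 9, 9, 9, 12]
print(partitioned_count(keys, 9, 4))
```

The per-chunk counts should add up to the same total for any thread count, which gives a simple way to compare the GPU result against the CPU.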

#Driver's Code

Note:
My database "dsrcDB" is huge, almost 255 MB.
I am logging the time taken by GPU and CPU both.
E.g. To search 55 items,
Time taken by GPU (1 Grid, 1 Block, 8 threads) : 0.04008 sec
Time taken by CPU : 0.25186 sec

Now the problem is that when I increase the number of threads and blocks to 16, 32, 64, 128, 256, 512, 1024, 2048, I do not see any visible change in the time taken by the GPU.
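For context, here is a rough count of how many loop iterations one thread performs under this partitioning. It is only a sketch: `N_ROWS` is a hypothetical row count (I have only given the size as 255 MB), `steps_per_thread` is an illustrative helper, and the bound covers one `while` loop, so the kernel's two loops roughly double it.

```python
def steps_per_thread(n_rows, thread_count):
    """Worst-case iterations of one binary search over a thread's chunk."""
    chunk = -(-n_rows // thread_count)  # ceiling division: largest chunk size
    # A range of m elements is halved each iteration, so at most
    # floor(log2(m)) + 1 iterations, which equals m.bit_length() for m >= 1.
    return chunk.bit_length()

N_ROWS = 16_000_000  # hypothetical; the real row count is not stated in the post
for threads in (8, 64, 512, 2048):
    print(threads, steps_per_thread(N_ROWS, threads))
```

With this assumed row count, going from 8 to 2048 threads only lowers the bound from 21 to 13 iterations per loop, so the per-thread work shrinks very little. This counts loop iterations only and says nothing about kernel launch or memory transfer time.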

GPU Machine details:
Device 1: "GeForce GTX 1080 Ti"
CUDA Driver Version / Runtime Version 11.0 / 11.0
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 11178 MBytes (11721506816 bytes)
(28) Multiprocessors, (128) CUDA Cores/MP: 3584 CUDA Cores

MemoryInfo(free=10626727936, total=11721506816)
numba version: 0.50.1
NumPy version: 1.18.5
llvmlite version: 0.33.0+1.g022ab0f

Any help appreciated. Thank you!


Consider looking at this guide for Markdown to improve the readability of your post and perhaps attract more attention.
