Implementation idea for parallel Dict

TheTesla · September 21, 2023, 12:42pm

Hello,

I have an idea for a parallel Dict implementation that does without all that LLVM mutex/lock stuff. I believe reading from a Dict can be done in parallel without locking; only write operations need locks.

The idea is to use additional FIFOs. The Dict is split into n subdicts, and there are n dict-writer threads, each handling one subdict. Each subdict covers its own key range. Reading the Dict means first selecting the subdict with subdict_list[hash(key)%n] and then reading from that subdict. There is nothing new here. Write operations, however, need to lock the subdict, because two threads may write to the same subdict at the same time.
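For comparison, the conventional sharded layout just described (n subdicts selected by hash(key) % n, writes locked per subdict) can be sketched in plain Python. This is a minimal illustration, not code from the thread: the class `ShardedDict` and the helper `writer` are hypothetical, and the lock-free reads rely on CPython's GIL making a single dict lookup atomic.

```python
import threading


class ShardedDict:
    """Hypothetical sharded dict: n subdicts, each owning its own key range."""

    def __init__(self, n):
        self.n = n
        self.subdicts = [{} for _ in range(n)]
        # One lock per subdict, so writers to different shards never contend.
        self.locks = [threading.Lock() for _ in range(n)]

    def _shard(self, key):
        # Step 1: select the subdict responsible for this key.
        return hash(key) % self.n

    def __getitem__(self, key):
        # Step 2: read from the selected subdict; no lock is taken here.
        return self.subdicts[self._shard(key)][key]

    def __setitem__(self, key, value):
        i = self._shard(key)
        # Two threads may hit the same subdict, so writes lock it.
        with self.locks[i]:
            self.subdicts[i][key] = value


def writer(d, t):
    # Thread t writes 100 distinct keys.
    for j in range(100):
        d[t * 100 + j] = float(j)


d = ShardedDict(4)
threads = [threading.Thread(target=writer, args=(d, t)) for t in range(4)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(d[123])  # 23.0
```

The per-subdict locks are exactly what the FIFO scheme below tries to get rid of.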

The new idea is to add one FIFO system per program thread. Writing to the Dict only writes to the calling thread's own FIFO system. The n worker threads watch the m FIFO systems; whenever a key-value pair for its key range shows up, a worker thread takes it and transfers it into its own subdict.

One FIFO system consists of n FIFOs, so there are n-by-m FIFOs in total: each worker thread has one corresponding FIFO per program thread.

Exactly one thread writes to each FIFO or subdict (and exactly one worker reads each FIFO), so no locking is needed. Yes, reading the Dict may return an old value if it has not yet been transferred from the FIFO to the subdict, but parallel writing should be fast.

Do you think this could work?
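The n-by-m FIFO layout can be sketched in plain Python to check the bookkeeping. This is an illustrative sketch under assumptions, not the author's code: the names `SPSCRing`, `ParDict`, `put`, `drain` and `get` are hypothetical, and it runs single-threaded so the routing is easy to follow. The property the idea depends on is that each ring has exactly one writer (program thread p, advancing `wr`) and exactly one reader (worker w, advancing `rd`).

```python
class SPSCRing:
    """Hypothetical fixed-capacity single-producer/single-consumer FIFO."""

    def __init__(self, cap):
        self.buf = [None] * cap
        self.cap = cap
        self.rd = 0  # advanced only by the consumer (worker thread)
        self.wr = 0  # advanced only by the producer (program thread)

    def push(self, item):
        if self.wr - self.rd == self.cap:
            return False  # full: the caller has to retry later
        self.buf[self.wr % self.cap] = item
        # Publish only after the slot is filled (a real lock-free version
        # needs release/acquire ordering on wr here).
        self.wr += 1
        return True

    def pop(self):
        if self.rd == self.wr:
            return None  # empty
        item = self.buf[self.rd % self.cap]
        self.rd += 1  # frees the slot for the producer
        return item


class ParDict:
    """n subdicts (one per worker) fed by m FIFO systems (one per program thread)."""

    def __init__(self, n, m, cap=1024):
        self.n = n
        self.subdicts = [{} for _ in range(n)]
        # fifos[w][p]: written only by program thread p, read only by worker w.
        self.fifos = [[SPSCRing(cap) for _ in range(m)] for _ in range(n)]

    def put(self, p, key, value):
        # Program thread p routes the pair to the worker that owns the key.
        return self.fifos[hash(key) % self.n][p].push((key, value))

    def drain(self, w):
        # Worker w moves everything from its m FIFOs into its own subdict.
        for ring in self.fifos[w]:
            item = ring.pop()
            while item is not None:
                k, v = item
                self.subdicts[w][k] = v
                item = ring.pop()

    def get(self, key):
        # Reads go straight to the owning subdict and may miss pending writes.
        return self.subdicts[hash(key) % self.n].get(key)


pd = ParDict(n=2, m=3)
pd.put(0, 10, 1.0)
pd.put(2, 11, 2.0)
print(pd.get(10))  # None: the pair is still in a FIFO
for w in range(2):
    pd.drain(w)
print(pd.get(10), pd.get(11))  # 1.0 2.0
```

The stale `None` in the first read is the "old value" trade-off mentioned above.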

DannyWeitekamp · September 21, 2023, 10:12pm

I don’t know if what you’re thinking would work, but I’ll +1 this as something I would personally find useful. Any thread-safe implementation that allowed some concurrency in dict reads/writes would help a lot with some of my work. I’ve looked for something like this in the past and found academic papers on the topic (although I don’t remember much). It seems like a well-trodden topic if you search around, since lots of hash-table-heavy algorithms would benefit from this kind of thing (NoSQL databases, for instance). If you make a Numba-ready implementation, do share.

TheTesla · October 3, 2023, 9:02pm

I have written a short mockup showing the idea:

github.com/TheTesla/py-par-dict/blob/master/pardict.py

#!/usr/bin/env python3


from threading import Thread
from queue import Queue
from time import sleep

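dn = 2  # number of subdicts (one dict-writer thread each)
sn = 2  # number of source (program) threads

# q_mat[d][s]: FIFO from source thread s to dict-writer thread d
q_mat = [[Queue() for s in range(sn)] for d in range(dn)]

# o_vec[d]: the subdict owned by dict-writer thread d
o_vec = [{} for d in range(dn)]

def get(k):
    # Read directly from the subdict that owns k (may be stale).
    return o_vec[hash(k)%len(o_vec)][k]

def set_d(k, v):
    # Write k -> v directly into the subdict that owns k.
    o_vec[hash(k)%dn][k] = v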

This file has been truncated.

This shows how it should work in principle. My problem now is that I lack the knowledge to make it Numba-ready.

TheTesla · November 26, 2023, 1:42pm

(post deleted by author)

TheTesla · November 26, 2023, 3:57pm

It works now!

Here it is:

github.com/TheTesla/py-par-dict/blob/776269d79a9ce4e44bfa065c211d25847b778057/testthreadid.py

from numba import njit, prange
from numba.typed import List, Dict
import numba as nb
import numpy as np

@njit(parallel=True)
def par(x):
    print(nb.get_num_threads())
    print(nb.get_thread_id())


    # initialize parDict
    n = nb.get_num_threads()
    fifo_cap = 1024
    fifos_k = np.zeros((n,n,fifo_cap),dtype=np.int64)
    fifos_v = np.zeros((n,n,fifo_cap),dtype=np.float64)
    fifo_idx = np.zeros((n,n,2),dtype=np.int64)

    # This variant crashes:
    # d = [{0:0.}] * n

This file has been truncated.

Now we need to find out whether there are remaining problems, e.g. FIFO overflow, which would need a blocking write. We also need the read operation, which is trivial, and a native interface so that we can use it like a normal Numba dict.
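One possible way to handle FIFO overflow is to make the write block until the worker has freed a slot. The sketch below shows that policy in plain Python with one producer and one consumer thread and a deliberately tiny ring; it is an assumption-labelled illustration, not what the repository does. `Ring`, `try_push`, `push_blocking`, `try_pop`, `producer` and `consumer` are hypothetical names, and `time.sleep(0)` is used only to yield to the other thread while spinning.

```python
import threading
import time


class Ring:
    """Hypothetical SPSC ring buffer whose writer waits when it is full."""

    def __init__(self, cap):
        self.buf = [None] * cap
        self.cap = cap
        self.rd = 0  # only the consumer advances this
        self.wr = 0  # only the producer advances this

    def try_push(self, item):
        if self.wr - self.rd == self.cap:
            return False
        self.buf[self.wr % self.cap] = item
        self.wr += 1
        return True

    def push_blocking(self, item):
        # Overflow policy: spin until the consumer frees a slot.
        while not self.try_push(item):
            time.sleep(0)

    def try_pop(self):
        if self.rd == self.wr:
            return None
        item = self.buf[self.rd % self.cap]
        self.rd += 1
        return item


def producer(ring, count):
    for i in range(count):
        ring.push_blocking((i, float(i)))


def consumer(ring, count, out):
    # Transfer pairs into the subdict until every expected item has arrived.
    while len(out) < count:
        item = ring.try_pop()
        if item is None:
            time.sleep(0)
        else:
            k, v = item
            out[k] = v


ring = Ring(cap=4)  # far smaller than the number of writes, so it overflows
subdict = {}
t_prod = threading.Thread(target=producer, args=(ring, 1000))
t_cons = threading.Thread(target=consumer, args=(ring, 1000, subdict))
t_prod.start()
t_cons.start()
t_prod.join()
t_cons.join()
print(len(subdict), subdict[999])  # 1000 999.0
```

Spinning keeps the write path lock-free, at the cost of burning CPU while the owning worker catches up.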

TheTesla · November 27, 2023, 6:19am

There is an important issue: some items are missing from the target dicts. This may be due to the rescheduling of operations in the parallel context, but I don’t know exactly.

TheTesla · March 17, 2024, 9:55pm

This works now. I added a deferred synchronization step that processes all remaining data from the FIFOs:

github.com/TheTesla/py-par-dict/blob/712270ec4d79dbd0bb96b0c28d0f59637d3dcfe7/testthreadid.py

from numba import njit, prange
from numba.typed import List, Dict
import numba as nb
import numpy as np
import time

@njit(parallel=True)
def par(x):
    print(nb.get_num_threads())
    print(nb.get_thread_id())


    # initialize parDict
    n = nb.get_num_threads()
    fifo_cap = 2**20
    fifos_k = np.zeros((n,n,fifo_cap),dtype=np.int64)
    fifos_v = np.zeros((n,n,fifo_cap),dtype=np.float64)
    fifo_idx = np.zeros((n,n,2),dtype=np.int64)

This file has been truncated.
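A plausible explanation for the missing items, and why a final synchronization pass fixes it, can be shown with plain Python threads. This is a sketch under assumptions, not the repository's code: `collections.deque` stands in for the lock-free SPSC ring, and the names `drain`, `producer`, `worker` and `done` are hypothetical. A worker can finish its last drain, then a producer appends one more pair, and then the worker sees the stop flag and exits, leaving that pair stranded in a FIFO. Draining every FIFO once more after all threads have stopped picks it up.

```python
import threading
from collections import deque

N_WORKERS = 2      # n subdicts / worker threads
N_PRODUCERS = 3    # m program threads
PER_PRODUCER = 500

# fifos[w][p]: a deque stands in for the lock-free SPSC ring of the real design.
fifos = [[deque() for _ in range(N_PRODUCERS)] for _ in range(N_WORKERS)]
subdicts = [{} for _ in range(N_WORKERS)]
done = threading.Event()


def drain(w):
    # Move every pending pair from worker w's FIFOs into its subdict.
    for q in fifos[w]:
        while q:
            k, v = q.popleft()
            subdicts[w][k] = v


def producer(p):
    for j in range(PER_PRODUCER):
        key = p * 10_000 + j
        fifos[key % N_WORKERS][p].append((key, float(j)))


def worker(w):
    # May exit right after its last drain while a late write is still queued.
    while not done.is_set():
        drain(w)


workers = [threading.Thread(target=worker, args=(w,)) for w in range(N_WORKERS)]
producers = [threading.Thread(target=producer, args=(p,)) for p in range(N_PRODUCERS)]
for t in workers + producers:
    t.start()
for t in producers:
    t.join()
done.set()
for t in workers:
    t.join()

# Deferred synchronization: once nobody writes any more, drain everything.
for w in range(N_WORKERS):
    drain(w)

print(sum(len(d) for d in subdicts))  # 1500
```

Without the final loop the total can come out below 1500 on some runs, which matches the symptom described in the previous post.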

The next step is to create a class for the dict.

TheTesla · April 1, 2024, 5:22pm

Now I have made an object-based version. The parallel dictionary object is represented by a state tuple, which you access through the four par_dict_*() functions defined here:

github.com/TheTesla/py-par-dict/blob/master/pardictimpl.py

#!/usr/bin/env python3
import numpy as np
import numba as nb
from numba import njit, prange

@njit(parallel=True)
def new_par_dict(key_type, val_type, nothrds=4, fifo_size=1024):
    keys = np.zeros((nothrds,nothrds,fifo_size),dtype=key_type)
    vals = np.zeros((nothrds,nothrds,fifo_size),dtype=val_type)
    cmds = np.zeros((nothrds,nothrds,fifo_size),dtype=np.int64)
    rd_idx = np.zeros((nothrds,nothrds),dtype=np.int64)
    wr_idx = np.zeros((nothrds,nothrds),dtype=np.int64)
    dicts = [nb.typed.Dict.empty(key_type=key_type, value_type=val_type) \
            for i in range(nothrds)]
    return (keys,vals,cmds,rd_idx,wr_idx,fifo_size,nothrds,dicts)

@njit
def par_dict_setitem(state, key, val, thrd_id=None):
    if thrd_id is None:
        thrd_id = nb.get_thread_id()

This file has been truncated.
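To illustrate how a state tuple like the one returned by new_par_dict can be indexed, here is a pure-NumPy sketch with the same (writer-thread, worker-thread, slot) array layout. It is not the code in pardictimpl.py: it drops the `cmds` array, assumes non-negative integer keys with `key % n` as the owner, uses plain Python dicts instead of `numba.typed.Dict` so it runs without Numba, and the functions `new_state`, `sketch_setitem`, `sketch_process` and `sketch_getitem` are hypothetical.

```python
import numpy as np


def new_state(n, cap):
    """Hypothetical array-backed state tuple for n threads."""
    keys = np.zeros((n, n, cap), dtype=np.int64)    # keys[w, p, slot]
    vals = np.zeros((n, n, cap), dtype=np.float64)  # vals[w, p, slot]
    rd_idx = np.zeros((n, n), dtype=np.int64)       # advanced by worker w only
    wr_idx = np.zeros((n, n), dtype=np.int64)       # advanced by writer p only
    dicts = [{} for _ in range(n)]
    return (keys, vals, rd_idx, wr_idx, cap, n, dicts)


def sketch_setitem(state, key, val, thrd_id):
    keys, vals, rd_idx, wr_idx, cap, n, dicts = state
    w = key % n  # owning worker (integer keys assumed)
    wr = wr_idx[w, thrd_id]
    if wr - rd_idx[w, thrd_id] == cap:
        return False  # FIFO (w, thrd_id) is full
    keys[w, thrd_id, wr % cap] = key
    vals[w, thrd_id, wr % cap] = val
    wr_idx[w, thrd_id] = wr + 1  # publish after the slot is filled
    return True


def sketch_process(state, w):
    # Worker w drains the FIFOs from every writer thread into its dict.
    keys, vals, rd_idx, wr_idx, cap, n, dicts = state
    for p in range(n):
        while rd_idx[w, p] < wr_idx[w, p]:
            slot = rd_idx[w, p] % cap
            dicts[w][int(keys[w, p, slot])] = float(vals[w, p, slot])
            rd_idx[w, p] += 1


def sketch_getitem(state, key):
    dicts, n = state[6], state[5]
    return dicts[key % n][key]


state = new_state(n=4, cap=8)
for i in range(10):
    sketch_setitem(state, i, i * 0.5, thrd_id=i % 4)
for w in range(4):
    sketch_process(state, w)
print(sketch_getitem(state, 7))  # 3.5
```

Keeping everything in fixed-shape arrays plus a list of dicts is what makes this layout a natural fit for jitted code, where arbitrary Python objects are not available.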
