Dynamic, heterogeneous dictionaries with tuple keys and non-constant tuple slicing

I originally posted this as a reply to this thread due to overlap but have been advised to create a new thread to avoid mixing issues.

I am currently trying to njit an algorithm for processing Delaunay triangulations, with the following features:

  1. The inputs are an array X and a homogeneous list simplices of tuples of integers. The shape of X and the length of each tuple are arbitrary, but what is always true is that each tuple in simplices has length X.shape[1] + 1. Call X.shape[1] the “dimension”, dim. The tuples may be assumed to be sorted.
  2. In pure Python, I would have the algorithm work with and return a dictionary filtration whose keys are all the integers from 1 to dim (included). Key i maps to another dictionary whose keys are all the sub-tuples of length i + 1 of the tuples in simplices, and whose values are floats. Example: filtration = {1: {(0, 1): 0., (0, 2): 0., (1, 2): 0.}, 2: {(0, 1, 2): 0.}} for dim equal to 2. Notice that filtration is inhomogeneous in values (and homogeneous in keys), whereas each filtration[i] is homogeneous (in both keys and values).
  3. In the inner workings of the algorithm, and again assuming pure Python, each filtration[i] is constructed iteratively from filtration[i + 1]. In particular, one would want to write loops as follows:
# sigma is a tuple in filtration[i + 1].keys(), so len(sigma) == i + 2
for k in range(len(sigma)):
    tau = sigma[:k] + sigma[k + 1:]
    # And do stuff with tau

Now I have at least two problems trying to njit this algorithm. One comes from point 2, because I would like to have an inhomogeneous dictionary like filtration. The second comes from point 3, because numba is not happy with non-constant slicing. This seems to be just the same as @ziofil’s situation, so I wonder if the recommendation is still to use tuple_setitem. As @ziofil already asked, I also wonder whether unify tuple creation intrinsics by ARF1 · Pull Request #5169 · numba/numba · GitHub is related.
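To isolate the second problem, this is (as far as I understand it) the minimal pattern that numba rejects, since the length, and hence the type, of sigma[:i] cannot be determined at compile time:

from numba import njit


@njit
def drop(sigma, i):
    # Fails to type: tuple slices need compile-time-constant bounds
    return sigma[:i] + sigma[i + 1:]


drop((0, 1, 2), 1)  # raises a TypingError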

Here is dummy reproducing code (I was not able to upload a .py file here) that should contain all the salient issues. I have marked all the “danger bits” for the purposes of njitting with a # ! comment.

import numpy as np
from scipy import spatial


def f(X, simplices):
    """
    Parameters
    ===========
    X : N x D array
        Array of N Euclidean vectors in D dimensions

    simplices : n x (D + 1) array
        np.int32 (or np.int64??) array of indices, such as returned by
        scipy.spatial.Delaunay
    """
    D = simplices.shape[1] - 1  # Top dimension
    filtration = {D: {}, D - 1: {}}  # !

    # Special iteration for highest dimensional simplices, but it's almost the
    # same as one of the iterations of the for loop in dim, below (for dim = D)
    for sigma in simplices:
        sigma_tup = tuple(sigma)  # !
        filtration[D][sigma_tup] = np.sum(np.abs(X[sigma_tup, :]))  # !
        for i in range(D + 1):
            tau = sigma_tup[:i] + sigma_tup[i + 1:]  # !
            if np.random.random() > 0.5:
                filtration[D - 1][tau] = filtration[D][sigma_tup]
            else:
                filtration[D - 1][tau] = np.nan

    for dim in range(D - 1, 0, -1):
        filtration[dim - 1] = {}  # !
        for sigma in filtration[dim]:
            if np.isnan(filtration[dim][sigma]):
                filtration[dim][sigma] = np.sum(np.abs(X[sigma, :]))  # !
            for i in range(dim + 1):
                tau = sigma[:i] + sigma[i + 1:]  # !
                if np.random.random() > 0.5:
                    filtration[dim - 1][tau] = filtration[dim][sigma]
                else:
                    filtration[dim - 1][tau] = np.nan

    return filtration


X = np.random.random((100, 2))

simplices = np.sort(spatial.Delaunay(X).simplices, axis=1)

f(X, simplices)

Thanks in advance for your help! Happy to provide more context if anything is unclear.

Forgot to mention: some of the suggestions by @stuartarchibald in the related post are relevant here. In particular, drop_elements should solve problem 3 above. I was also trying to use the TupleKeyDict class defined there to solve my other problem: the idea would be to have filtration be an int -> TupleKeyDict dictionary, so as to get around the heterogeneity problem. But I have not been able to make something as simple as this work:

from numba import njit, types
from numba.typed import Dict
# TupleKeyDict as defined by @stuartarchibald in the related post


@njit
def foo():
    d = {}
    for i in range(2):
        # Create a dictionary to wrap
        wrapped_dictionary = Dict.empty(types.intp, types.float64)
        # wrap it
        d[i] = TupleKeyDict(wrapped_dictionary)
        d[i][(1, 2)] = 4.
    return d


foo()

Hi @ulupo, my first question would be: how fast do you need this to run? Using dictionaries will never get you the fastest speed.
I think that with some ingenuity this dictionary-based version could be jitted, but you have to be aware that the speed will be at least 10x slower than if you re-write it to use arrays.

Luk

Hi @luk-f-a, and thanks for the reply! Sure, speed is important. And I have been able to verify (by writing the code out explicitly in an njittable fashion for the special case D=2) that I can expect one or more orders of magnitude improvement in runtime (for the real code, not the dummy version). I cannot easily think of a way of getting rid of dictionaries as they are perfectly adapted to the description of the algorithm, but I would be very curious to hear alternatives!

I feel I have made some progress by using the TupleKeyDict StructRef and the drop_elements function defined by @stuartarchibald in the related post. My code now looks like

import numpy as np
from scipy import spatial

from numba import njit, types
from numba.typed import Dict, List

# TupleKeyDict and drop_elements as defined by @stuartarchibald in the related post


@njit
def f(X, simplices):
    """
    Parameters
    ===========
    X : N x D array
        Array of N Euclidean vectors in D dimensions

    simplices : typed List of (D + 1)-tuples
        np.int32 (or np.int64??) vertex indices, such as returned by
        scipy.spatial.Delaunay and converted to a typed List of tuples below
    """
    D = X.shape[1]  # Top dimension
    filtration = {D: TupleKeyDict(Dict.empty(types.intp, types.float64)),
                  D - 1: TupleKeyDict(Dict.empty(types.intp, types.float64))}

    # Special iteration for highest dimensional simplices, but it's almost the
    # same as one of the iterations of the for loop in dim, below (for dim = D)
    for sigma in simplices:
        filtration[D][sigma] = np.sum(np.abs(X[np.asarray(sigma)]))
        for x in drop_elements(sigma):
            tau = x[1]
            if np.random.random() > 0.5:
                filtration[D - 1][tau] = filtration[D][sigma]
            else:
                filtration[D - 1][tau] = np.nan

    for dim in range(D - 1, 0, -1):
        filtration[dim - 1] = TupleKeyDict(Dict.empty(types.intp, types.float64))
        for sigma in filtration[dim]:
            if np.isnan(filtration[dim][sigma]):
                filtration[dim][sigma] = np.sum(np.abs(X[np.asarray(sigma)]))
            for x in drop_elements(sigma):
                tau = x[1]
                if np.random.random() > 0.5:
                    filtration[dim - 1][tau] = filtration[dim][sigma]
                else:
                    filtration[dim - 1][tau] = np.nan

    return filtration


X = np.random.random((100, 2))

simplices = List([tuple(simplex) for simplex in np.sort(spatial.Delaunay(X).simplices, axis=1)])

f(X, simplices)

Currently, this does not compile because looping over the keys of a TupleKeyDict and membership checks (in) are not supported. Not only do I need to loop over the keys, I need to loop over them as tuples and not as hashed integers.

Another question to @stuartarchibald: is the idea used in the overloaded setitem method of TupleKeyDict safe against collisions? It is not obvious to me that two different tuples could not hash to the same integer and overwrite an existing key-value pair instead of adding a new one.

Depending on how sparsely you are populating that dictionary, you could use an array or a sparse array. After all, a dictionary that maps (i, j) -> value behaves like a 2D array, a dictionary that maps (i, j, k) -> value behaves like a 3D array, and so on. It would not necessarily be easy to re-write your function in that style, but if you still need more speed even after managing to jit the dictionary version, that’s one option.
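To make the analogy concrete (the values here are just made up):

import numpy as np

# The same sparse data, once as a tuple-keyed dict and once as a dense 2D array
d = {(0, 1): 0.5, (2, 3): 1.25}

A = np.zeros((4, 4))
for (i, j), v in d.items():
    A[i, j] = v
# A[0, 1] == d[(0, 1)] and A[2, 3] == d[(2, 3)]; every other entry is an implicit 0.0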

Are sparse data structures supported by numba at present? I could not find anything in the docs.

Pydata sparse has a COO array that works in numba, although it’s a barebones implementation with only basic functionality (sparse/numba_extension.py at master · pydata/sparse · GitHub).
Are your factors sparse?

Thanks for the pointers @luk-f-a. I think I need to provide more context. The real algorithm, of which I am showing a dummy version, takes as input a point cloud X in some Euclidean space and a triangulation of that point cloud. Now if X lives in D dimensions then each “triangle” in the triangulation is a D-dimensional simplex, i.e. a set of D + 1 indices of points in X. Thus simplices in my code is a stacking of such (D + 1)-long integer vectors. If we imagined populating a sparse (D + 1)-dimensional array (with some nonzero values) only at the positions indicated by simplices, the resulting array would generically be extremely sparse.

For what it’s worth, I would not have expected a reimplementation of the algorithm using sparse structures but without numba to really be faster. Aren’t Python dictionaries quite efficient at querying whether a key is stored and accessing the corresponding value? This happens a lot in the algorithm described above and I fear that with COO arrays it would only be worse.

Yes, in pure Python dictionaries are the best choice for this type of sparse storage: the code is clear and they are the fastest choice in many cases. In jitted code, arrays would be faster if you could find a way to map your sparse set to a dense vector. That’s not easy, of course, and for some problems it’s not possible. It really depends on how you need to access them. It seems that you need to retrieve them under very specific conditions (you create a tuple with one missing element), so it may or may not be possible to achieve.

Even retaining the use of dictionaries for the inner storage, I would consider using a list instead of a dictionary for the outer layer. If I’m reading the code right, filtration will always have D + 1 elements, and these elements are indexed by the integers 0 to D. That will work well with a list, and the access time to the i-th element of a list is faster than to the i-th element of a dictionary.
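For example, something along these lines (just a sketch; it assumes every inner dictionary has the same numba type, i.e. the same key tuple length and value type, which is what the next suggestion would give you):

from numba import types
from numba.typed import Dict, List

# One inner dict per dimension, stored in a typed List indexed by dimension
key_type = types.UniTuple(types.int64, 3)             # e.g. D + 1 == 3
inner_type = types.DictType(key_type, types.float64)

filtration = List.empty_list(inner_type)
for _ in range(3):                                    # dimensions 0 .. D
    filtration.append(Dict.empty(key_type, types.float64))

filtration[2][(0, 1, 2)] = 0.0                        # index by dimension, then by simplex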

Another thing to consider, which might work in your case, is that you might be able to avoid shortening the tuples (which causes the type problem in numba) by keeping the same length but adding a fake value in the position you were planning to remove. For example, if your tuples only take positive integers, then instead of tau = sigma[:i] + sigma[i + 1:] you could do tau = sigma[:i] + (0,) + sigma[i + 1:]. Then all tuples would have the same length, which makes some problems easier.
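For concreteness, a rough sketch of how that could look in jitted code, using the tuple_setitem intrinsic you mentioned, so that no tuple slicing is needed at all (the sentinel value is up to you; it just has to be distinguishable from a real vertex index):

from numba import njit
from numba.cpython.unsafe.tuple import tuple_setitem


@njit
def mask_vertex(sigma, i):
    # Overwrite position i with a sentinel: the tuple length, and hence its
    # numba type, never changes
    return tuple_setitem(sigma, i, 0)


print(mask_vertex((3, 7, 11), 1))  # (3, 0, 11)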

Luk

Thanks for this suggestion. I have now implemented it! Unfortunately, it is comparable to or slightly slower than the pure Python version. Maybe it’s the overhead of the tuple concatenations needed to keep the length constant. Memory-wise it is also more inconvenient. Note that I was able to code an ad-hoc version for D=2, with one dictionary per dimension (thus avoiding heterogeneous dictionaries or lists), and gained a factor of more than 5 in performance (for the real code).

Interesting, I’ll keep in mind for the future that joining tuples is expensive. Sorry about wasting time on that.

Did the list idea work?

How large can D get with your real data?

Please forget what I said above! Making all tuples fixed-size really worked: I indeed recover the factor-of-5 improvement. In the rush I had forgotten to actually njit the code. Fantastic, thanks for your help. I really appreciate the time and care taken, and apologies for making you waste some along the way.

Ideally, one would be easier on memory and avoid this trick somehow. I am thinking that maybe recursion is the way forward: since the algorithm is naturally recursive, I should be able to only ever deal with two dictionaries at a time, an input one and an output one.

D won’t get larger than 10 realistically, and overwhelmingly it would be either 2 or 3 (for geometric problems).

Thanks again! I will let you know how I get on with recursion.

@luk-f-a @stuartarchibald somewhat unbelievably, I seem to have managed with an iterative approach! Exactly why it works is still a bit mysterious to me. Below is dummy working code:

import numpy as np
from numba import from_dtype
from numba import njit
from numba import types
from numba.cpython.unsafe.tuple import tuple_setitem
from numba.typed import Dict
from scipy import spatial


@njit
def _drop_elements(tup: tuple):
    """Stuart's helper function"""
    for x in range(len(tup)):
        empty = tup[:-1]  # Not empty, but the right size and will be mutated
        idx = 0
        for i in range(len(tup)):
            if i != x:
                empty = tuple_setitem(empty, idx, tup[i])
                idx += 1
        yield tup[x], empty


@njit
def f_recursive(X, filtration_current):
    """
    Parameters
    ===========
    X : N x D array
        Array of N Euclidean vectors in D dimensions

    filtration_current : dictionary for the current iteration
    """
    filtration_lower = {}

    if len(next(iter(filtration_current))) == X.shape[1] + 1:
        # Special version for highest dimensional simplices, but it's almost the
        # same as one of the iterations of the for loop in dim, below (for dim = D)
        for sigma in filtration_current:
            filtration_current[sigma] = np.sum(np.abs(X[np.asarray(sigma)]))
            for x in _drop_elements(sigma):
                tau = x[1]
                if np.random.random() > 0.5:
                    filtration_lower[tau] = filtration_current[sigma]
                else:
                    filtration_lower[tau] = np.nan

    else:
        for sigma in filtration_current:
            if np.isnan(filtration_current[sigma]):
                filtration_current[sigma] = np.sum(np.abs(X[np.asarray(sigma)]))
            for x in _drop_elements(sigma):
                tau = x[1]
                if np.random.random() > 0.5:
                    filtration_lower[tau] = filtration_current[sigma]
                else:
                    filtration_lower[tau] = np.nan

    return filtration_lower


# CODE USAGE

X = np.random.random((100, 2))

simplices = np.sort(spatial.Delaunay(X).simplices, axis=1)
int_dtype = from_dtype(simplices.dtype)


X = X.astype(np.float64)
D = X.shape[1]  # Top dimension

# Need to create dummy NaN dictionaries for Numba's type inference to
# work.
filtration_current = Dict.empty(types.UniTuple(int_dtype, D + 1),
                                types.float64)
for simplex in np.sort(simplices):
    filtration_current[tuple(simplex)] = np.nan

simplices = []
for dim in range(D, 0, -1):
    filtration_lower = f_recursive(X, filtration_current)
    simplices.extend((
        (sigma, filtration_current[sigma])
        for sigma in filtration_current
    ))
    filtration_current = filtration_lower

In benchmarking, I notice that while I do indeed get awesome performance from being able to njit, there is a strange performance hit coming from the generator expression in the simplices.extend call (it would be the same with a list comprehension), which has to loop over the keys of the typed Dict output by f_recursive from the interpreter. It seems to me that a vanilla Python dict would be faster there. Any suggestions?

Furthermore, I get warnings about apparent downcasting of the dict key tuples from int64 to int32. This is a bit mysterious to me, as I thought I was being quite diligent in specifying the dtype of the tuples to match the dtype of simplices (which happens to be int32).

Thanks again for all the help! I would have probably given up much sooner without your support.

It seems that the last part (the dim loop with the simplices.extend) is not being jitted. Shouldn’t it be?

Thanks for the quick reply! The last part would produce a heterogeneous Python list so I guessed it won’t njit. At the moment the API of my project requires such lists as outputs. Between 30% and 50% of the whole execution time (in the real program, depending on data size) is spent doing this silly unpacking!

Oh, I see.

Could you use the version where all tuples have the full length (and therefore all the dictionaries are of the same type and can be put in the same list)?

Right. Yes, I could. My main reason for moving to this iterative implementation was wanting to avoid the memory overheads. You see, if e.g. D=3 there will be a lot more simplices in dimension 2 than in dimension 3, so the fixed length approach becomes a memory burden. Of course maybe in practice that won’t matter (I don’t really know that yet), but I was also motivated by the desire to look for a more readable solution with fewer (though well-motivated) “hacks”.

But let me test that solution for speed versus this one anyway.

@luk-f-a indeed this improves speed, as I can unpack inside the jitted function and return a list. I have not yet tried to push towards regimes where memory might become an issue.
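For the record, the unpacking step now looks roughly like this (just a sketch, the names are illustrative); the typed List it returns can be converted to a plain Python list on the interpreter side if the API needs one:

from numba import njit
from numba.typed import List


@njit
def unpack(filtration_current):
    # Walk the typed dict in nopython mode and collect (simplex, value) pairs,
    # instead of looping over its keys from the interpreter
    out = List()
    for sigma in filtration_current:
        out.append((sigma, filtration_current[sigma]))
    return out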