JIT-ing set_partitions from more-itertools

Hi guys!
I’m trying to use my own modified version of set_partitions (from more-itertools) with numba.

@jit(nopython=True)
def numba_set_partitions_helper(n, k):
    if k == 1:
        yield np.zeros(n, dtype = np.int_)
    elif n == k:
        yield np.arange(n, dtype = np.int_)
    else:
        for p in numba_set_partitions_helper(n - 1, k - 1):
            yield np.append(0, p + 1)
        for p in numba_set_partitions_helper(n - 1, k):
            for i in range(max(p) + 1):
                yield np.append(i, p)

The problem is that when I compile I obtain the following error.

TypingError: Invalid use of getiter with parameters (none)

During: typing of intrinsic-call at /Users/user/Desktop/test.py (12)

What’s going on? Somebody can help me?
Thank you.

Ivan

I found out that (NBEP 6: Typing Recursion — Numba 0+untagged.4124.gd4460fe.dirty documentation)

Recursion support in numba is currently limited to self-recursion with explicit type annotation for the function.

So I added the signature to the decorations line.

@jit('int64[:](int64, int64)', nopython=True)

NOw I obtain onother error.

TypingError: No conversion from none to array(int64, 1d, A) for '$32return_value.9', defined at None

File "test.py", line 8:
def numba_set_partitions_helper(n, k):
    <source elided>
    if k == 1:
        yield np.zeros(n, dtype = np.int_)
        ^

During: typing of assignment at /Users/krono/Desktop/test.py (8)

Where is the conversion from none to array?

Ivan

I find out that return is needed in otder to observe the signature; so I adjusted the function as reported below.

@jit('int64[:](int64, int64)', nopython=True)
def numba_set_partitions_helper(n, k):
    if k == 1:
        yield np.zeros(n, dtype = np.int_)
    elif n == k:
        yield np.arange(n, dtype = np.int_)
    else:
        for p in numba_set_partitions_helper(n - 1, k - 1):
            yield np.append(0, p + 1)
        for p in numba_set_partitions_helper(n - 1, k):
            for i in range(k):
                yield np.append(i, p)
    return np.array([0], dtype=np.int_)

The problem now is that I obtain a segmentation fault.

Fatal Python error: Segmentation fault

I leaved an issue on the github page.
How can I avoid to report the return line?

Ivan

Seems like you’re returning two different types here. Your signature is correct for the return statement but incorrect for the yield statements which in fact are returning generators of int64[:]. Numba dispatchers must always return a single type for a particular set of input types (unlike normal python functions). In my experience generators support in numba isn’t great (not cacheable lots of weird segfaulty edge cases) and I’m not 100% what the signature convention is for them. Typed Lists on the other hand, for instance with type ListType(int64[:]), work like a charm. Perhaps there is a way you could rewrite this by appending to a list, or even better (in terms of efficiency) inserting into a preallocated 2D array (i.e. int64[:,:]).

Actually I don’t need the return statement!
I put it there in order to respect the signature.
How can I modify the signature in order to provide only the generator of int64[:]?
Thanks so much for response!

Ivan

Hey @krono86 ,

I see that you are trying to use Numba for a recursive generator function, but it seems that Numba encounters difficulties when deriving machine code from a dynamic generator function. Numba’s strength lies in cases where data types can be inferred statically.

@DannyWeitekamp’s suggestion to refactor the code or consider an alternative algorithm, such as an iterative solution for set partitions, is a practical approach that should work. One option is to replace the generator with a typed list. Below are some basic examples using typed lists.

from numba import njit, int64
from numba.typed import List


@njit
def func_recursive_nb(n):
    codes = List.empty_list(int64)
    if n == 0:
        codes = [0]
        return codes
    else:
        codes = func_recursive_nb(n - 1)
        new_codes = []
        for code in codes[::-1]:
            new_codes.append(2**(n-1) + code)
        return codes + new_codes

@njit
def func_iterative_nb(n):
    codes = List.empty_list(int64)
    codes = [0]
    for i in range(n):
        new_codes = []
        for code in codes[::-1]:
            new_codes.append(2**i + code)
        codes += new_codes
    return codes

@njit
def func_generator_iterative_nb(n):
    codes = List.empty_list(int64)
    codes = [0]
    yield 0
    for i in range(n):
        for code in codes[::-1]:
            yield 2**i + code
        codes += [2**i + code for code in codes[::-1]]

@njit
def func_generator_recursive_nb(n):
    """Remark: Result is different by design; Code will fail in nopython mode."""
    if n == 0:
        yield 0
    else:
        yield from func_recursive_nb(n - 1)
        for code in func_recursive_nb(n - 1):
            yield 2**(n-1) + code


func_recursive = func_recursive_nb.py_func
func_iterative = func_iterative_nb.py_func
func_generator_iterative = func_generator_iterative_nb.py_func
func_generator_recursive = func_generator_recursive_nb.py_func

print('recursive:')
print(func_recursive(3))
print(func_recursive_nb(3))
print('iterative:')
print(func_iterative(3))
print(func_iterative_nb(3))
print('generator (iterative):')
print(list(func_generator_iterative(3)))
print(list(func_generator_iterative_nb(3)))
print('generator (recursive):')
print(list(func_generator_recursive(3)))
print(list(func_generator_recursive_nb(3)))

# recursive:
# [0, 1, 3, 2, 6, 7, 5, 4]
# [0, 1, 3, 2, 6, 7, 5, 4]
# iterative:
# [0, 1, 3, 2, 6, 7, 5, 4]
# [0, 1, 3, 2, 6, 7, 5, 4]
# generator (iterative):
# [0, 1, 3, 2, 6, 7, 5, 4]
# [0, 1, 3, 2, 6, 7, 5, 4]
# generator (recursive):
# [0, 1, 3, 2, 4, 5, 7, 6]
# UnsupportedError: Use of unsupported opcode (GET_YIELD_FROM_ITER) found

Thanks for reponse!
Can you show me how to get an iterative version of my generator function?
Thanks a lot!

Ivan

Hey @krono86 ,

You could do something like this:

import numpy as np
from numba import njit


@njit(['(int64, int64)'])
def set_partitions_helper(n, k):
    if k == 1:
        return [np.zeros(n, dtype=np.int_)]
    elif n == k:
        return [np.arange(n, dtype=np.int_)]
    else:
        partitions = []
        for p in set_partitions_helper(n - 1, k - 1):
            partitions.append(np.append(0, p + 1).astype(np.int_))
        for p in set_partitions_helper(n - 1, k):
            for i in range(max(p) + 1):
                partitions.append(np.append(i, p).astype(np.int_))
        return partitions

n = 4
k = 3
res = np.vstack(set_partitions_helper(n, k))
print(res)
print(f'length: {len(res)}')

# [[0 1 2 2]
#  [0 1 1 2]
#  [0 2 1 2]
#  [0 0 1 2]
#  [1 0 1 2]
#  [2 0 1 2]]
# length: 6

Please check, if this is working for you. If it does, please give some feedback about the the performance against the original code.
I am not optimistic. I assume appending to a list is just pretty slow.
A faster approach could be to Compute the Stirling number of the second kind and define a 2D-array of int64 instead of a list. Then filling the array step by step.
A better approach could also be using an efficient algorithm like:
Knuth, Donald Ervin. The Art of Computer Programming, volume 4, fascicle 3; generating all combinations and partitions, sec. 7.2.1.5, generating all set partitions, algorithm E, p. 127, ans. 14.
Maybe there is a library written in C which can be utilized.

This is a function. I need a generator in order to avoid memory issues.
I think the only way to obtain a generator function that can be compiled with numba I need to switch to an iterative implementation…

Hey @krono86 ,
If there was a way to use a recursive generator in Numba that would make your task much easier.
I can not help you in that regards.
If it has to be a generator and you need a workaround then you might need another algorithm where data types can be inferred easier. Unfortunately, the algorithms are more complex than recursive ones because you have to pass the complete object state through the iterations.
Here is an implementation (not the original code) for an iterative algorithm described by Michael Orlov in “Efficient Generation of Set Partitions”.
Perhaps you are able to convert it into a jitclass.

class SetPartitions:
    """
    Class for generating and manipulating set partitions.

    Efficient Generation of Set Partitions by Michael Orlov.
    https://www.informatik.uni-ulm.de/ni/Lehre/WS03/DMM/Software/partitions.pdf

    Attributes:
        n (int): The total number of elements in the set.
        k (list): The current partitioning of the set, where each element
            represents the partition of an item.
        m (list): The largest partition number among the items before each
            element in the set.
        p (int, optional): The desired size of the partitions. If None, all
            possible partitions are generated.
        size (int): The size of the current partition.

    Methods:
        next_partition(): Moves to the next partition in the set partitioning.
        previous_partition(): Moves to the previous partition in the set partitioning.
        each_partition(): Iterates over all partitions in the set partitioning.
        count(): Calculates the total number of partitions.
        partition_size(): Calculates the size of the current partition.
        reinitialize(): Resets the partitioning to its initial state.
        pinitialize(): Initializes the partitioning based on the value of p.
        sterling_second(n, k): Calculates the Sterling number of the second kind.

    Example:
        from numpy import vstack
        n = 4
        k = 3
        set_parts = SetPartitions(n, k)
        res = vstack(list(set_parts.each_partition()))
        print(res)
        print(f'length: {len(res)}')
        # [[0 0 1 2]
        #  [0 1 0 2]
        #  [0 1 1 2]
        #  [0 1 2 0]
        #  [0 1 2 1]
        #  [0 1 2 2]]
        # length: 6
    """

    def __init__(self, n: int, p: int = None) -> None:
        """
        Initialize the SetPartitions object.

        Args:
            n (int): The total number of elements in the set.
            p (int, optional): The desired size of the partitions. If None, all
                possible partitions are generated.
        """
        self.n = n
        self.k = [0] * n
        self.m = [0] * n
        self.p = p
        self.pinitialize()
        self.size = self.partition_size()

    def next_partition(self):
        """
        Generate the next partition in the set partitioning.

        Returns:
            List[int] or None: The next partition as a list of integers
                representing the partition of each item.
                If there are no more partitions, returns None.
        """
        if self.p is None:
            for i in range(self.n - 1, 0, -1):
                if self.k[i] <= self.m[i - 1]:
                    self.k[i] += 1
                    self.m[i] = max(self.m[i], self.k[i])
                    j = i + 1
                    while j <= (self.n - 1):
                        self.k[j] = self.k[0]
                        self.m[j] = self.m[i]
                        j += 1
                    self.size = self.partition_size()
                    return self.k.copy()
        else:
            p = self.partition_size()
            for i in range(self.n - 1, 0, -1):
                if self.k[i] < (p - 1) and self.k[i] <= self.m[i - 1]:
                    self.k[i] += 1
                    self.m[i] = max(self.m[i], self.k[i])
                    j = i + 1
                    while j <= (self.n - (p - self.m[i])):
                        self.k[j] = 0
                        self.m[j] = self.m[i]
                        j += 1
                    j = self.n - (p - self.m[i]) + 1
                    while j <= (self.n - 1):
                        self.k[j] = self.m[j] = p - (self.n - j)
                        j += 1
                    self.size = self.partition_size()
                    return self.k.copy()

        self.size = None
        return

    def previous_partition(self):
        """
        Generate the previous partition in the set partitioning.

        Returns:
            List[int] or None: The previous partition as a list of integers
                representing the partition of each item.
                If there are no previous partitions, returns None.
        """
        if self.p is None:
            for i in range(self.n - 1, 0, -1):
                if self.k[i] > self.k[0]:
                    self.k[i] -= 1
                    self.m[i] = self.m[i - 1]
                    j = i + 1
                    while j <= (self.n - 1):
                        self.k[j] = self.m[j] = self.m[i] + j - i
                        j += 1
                    self.size = self.partition_size()
                    return self.k.copy()
        else:
            p = self.partition_size()
            for i in range(self.n - 1, 0, -1):
                if self.k[i] > 0 and (p - self.m[i - 1]) <= (self.n - i):
                    self.k[i] -= 1
                    self.m[i] = self.m[i - 1]
                    j = i + 1
                    while j <= (i + (p - self.m[i]) - 1):
                        self.k[j] = self.m[j] = self.m[i] + j - i
                        j += 1
                    j = i + (p - self.m[i])
                    while j <= (self.n - 1):
                        self.k[j] = self.m[j] = p - 1
                        j += 1
                    self.size = self.partition_size()
                    return self.k.copy()

        self.size = None
        return

    def each_partition(self):
        """
        Iterate over all partitions in the set partitioning.

        Yields:
            List[int]: A partition as a list of integers representing the
                partition of each item.

        Raises:
            ValueError: If the method is called without a block.

        Notes:
            The method yields each partition one at a time. It starts with the
                current partition and
            proceeds to generate the next partitions until all possible
                partitions have been yielded.

            Usage:
                for partition in set_partitions.each_partition():
                    # Do something with the partition
                    ...
        """
        # if not callable:
        #     raise ValueError("Missing block")
        self.reinitialize()
        yield self.k.copy()
        for _ in range(self.count() - 1):
            partition = self.next_partition()
            yield partition
        self.reinitialize()

    def count(self):
        """
        Calculate the total number of partitions.

        Returns:
            int: The total number of partitions.

        Notes:
            The `count` method calculates the total number of partitions that
            can be generated based on the current values of `n` and `p`. The
            result depends on whether `p` is specified or not. If `p` is not
            specified, it sums up the number of Sterling numbers of the second
            kind for each value of `k` from 1 to `n`. If `p` is specified, it
            directly returns the Sterling number of the second kind for `n`
            and `p`.
        """
        if self.p is None:
            sum_val = 0
            for k in range(1, self.n + 1):
                sum_val += self.sterling_second(self.n, k)
            return sum_val
        else:
            return self.sterling_second(self.n, self.p)

    def sterling_second(self, n: int, k: int) -> int:
        """
        Compute the Stirling number of the second kind.

        The Stirling numbers of the second kind count the ways to partition a
        set into non-empty subsets.

        Args:
            n: The total number of elements.
            k: The number of non-empty subsets.

        Returns:
            The Stirling number of the second kind (S(n, k)).
        """
        if n == 0 or k == 0 or k > n:
            return 0
        elif k == 1 or n == k:
            return 1
        else:
            return k * self.sterling_second(n-1, k) + self.sterling_second(n-1, k-1)

    def partition_size(self):
        """Calculate the size of the current partition."""
        return self.m[self.n - 1] - self.m[0] + 1

    def reinitialize(self):
        """
        Reinitialize the state of the set partitions.

        Resets the `k` and `m` arrays to their initial values based on the
        current values of `n` and `p`. Updates the `size` attribute to reflect
        the new partition size.
        """
        self.k = [0] * self.n
        self.m = [0] * self.n
        self.pinitialize()
        self.size = self.partition_size()

    def pinitialize(self):
        """
        Initialize the `k` and `m` arrays based on the value of `p`.

        Sets the elements in the `k` and `m` arrays based on the value of `p`
        and the size of the set partitions.
        The initial values are determined to ensure correct generation of
        partitions when `p` is specified.
        """
        if self.p is not None:
            for i in range(0, self.n - self.p):
                self.k[i] = self.m[i] = 0
            i = self.n - self.p + 1
            while i <= (self.n - 1):
                self.k[i] = self.m[i] = i - (self.n - self.p)
                i += 1

For more efficiency the stirling 2nd order kind table should be pre-computed and used as lookup table.
If SetPartition with fixed partition size is being used as subroutine for multiple partition sizes then the stirling table could be passed to initialize the SetPartition object to avoid repetitive calculations of the stirling numbers.

import numpy as np


class SetPartitions:
    """
    Class for generating and manipulating set partitions.

    Efficient Generation of Set Partitions by Michael Orlov.
    https://www.informatik.uni-ulm.de/ni/Lehre/WS03/DMM/Software/partitions.pdf

    Attributes:
        n (int): The total number of elements in the set.
        k (list): The current partitioning of the set, where each element
            represents the partition of an item.
        m (list): The largest partition number among the items before each
            element in the set.
        p (int, optional): The desired size of the partitions. If None, all
            possible partitions are generated.
        size (int): The size of the current partition.
        stirling_second (np.ndarray[int], optional): Stirling 2nd order kind
            table. If None cuompute table by n, p.

    Methods:
        next_partition(): Moves to the next partition in the set partitioning.
        previous_partition(): Moves to the previous partition in the set partitioning.
        each_partition(): Iterates over all partitions in the set partitioning.
        count(): Calculates the total number of partitions.
        partition_size(): Calculates the size of the current partition.
        reinitialize(): Resets the partitioning to its initial state.
        pinitialize(): Initializes the partitioning based on the value of p.
        compute_stirling2(n, k): Calculates the Sterling number of the second
            kind table.

    Example:
        from numpy import vstack
        n = 4
        k = 3
        set_parts = SetPartitions(n, k)
        res = vstack(list(set_parts.each_partition()))
        print(res)
        print(f'length: {len(res)}')
        # [[0 0 1 2]
        #  [0 1 0 2]
        #  [0 1 1 2]
        #  [0 1 2 0]
        #  [0 1 2 1]
        #  [0 1 2 2]]
        # length: 6
    """

    def __init__(self,
                 n: int,
                 p: int = None,
                 sterling_second: np.ndarray = None) -> None:
        """
        Initialize the SetPartitions object.

        Args:
            n (int): The total number of elements in the set.
            p (int, optional): The desired size of the partitions. If None, all
                possible partitions are generated.
            stirling_second (np.ndarray[int], optional): Stirling 2nd order kind
                table. If None cuompute table by n, p.
        """
        self.n = n
        self.k = [0] * n
        self.m = [0] * n
        self.p = p
        self.sterling_second = (self.compute_stirling2(n, p)
                                if sterling_second is None
                                else sterling_second)
        self.pinitialize()
        self.size = self.partition_size()

    def next_partition(self):
        """
        Generate the next partition in the set partitioning.

        Returns:
            List[int] or None: The next partition as a list of integers
                representing the partition of each item.
                If there are no more partitions, returns None.
        """
        if self.p is None:
            for i in range(self.n - 1, 0, -1):
                if self.k[i] <= self.m[i - 1]:
                    self.k[i] += 1
                    self.m[i] = max(self.m[i], self.k[i])
                    j = i + 1
                    while j <= (self.n - 1):
                        self.k[j] = self.k[0]
                        self.m[j] = self.m[i]
                        j += 1
                    self.size = self.partition_size()
                    return self.k.copy()
        else:
            p = self.partition_size()
            for i in range(self.n - 1, 0, -1):
                if self.k[i] < (p - 1) and self.k[i] <= self.m[i - 1]:
                    self.k[i] += 1
                    self.m[i] = max(self.m[i], self.k[i])
                    j = i + 1
                    while j <= (self.n - (p - self.m[i])):
                        self.k[j] = 0
                        self.m[j] = self.m[i]
                        j += 1
                    j = self.n - (p - self.m[i]) + 1
                    while j <= (self.n - 1):
                        self.k[j] = self.m[j] = p - (self.n - j)
                        j += 1
                    self.size = self.partition_size()
                    return self.k.copy()

        self.size = None
        return

    def previous_partition(self):
        """
        Generate the previous partition in the set partitioning.

        Returns:
            List[int] or None: The previous partition as a list of integers
                representing the partition of each item.
                If there are no previous partitions, returns None.
        """
        if self.p is None:
            for i in range(self.n - 1, 0, -1):
                if self.k[i] > self.k[0]:
                    self.k[i] -= 1
                    self.m[i] = self.m[i - 1]
                    j = i + 1
                    while j <= (self.n - 1):
                        self.k[j] = self.m[j] = self.m[i] + j - i
                        j += 1
                    self.size = self.partition_size()
                    return self.k.copy()
        else:
            p = self.partition_size()
            for i in range(self.n - 1, 0, -1):
                if self.k[i] > 0 and (p - self.m[i - 1]) <= (self.n - i):
                    self.k[i] -= 1
                    self.m[i] = self.m[i - 1]
                    j = i + 1
                    while j <= (i + (p - self.m[i]) - 1):
                        self.k[j] = self.m[j] = self.m[i] + j - i
                        j += 1
                    j = i + (p - self.m[i])
                    while j <= (self.n - 1):
                        self.k[j] = self.m[j] = p - 1
                        j += 1
                    self.size = self.partition_size()
                    return self.k.copy()

        self.size = None
        return

    def each_partition(self):
        """
        Iterate over all partitions in the set partitioning.

        Yields:
            List[int]: A partition as a list of integers representing the
                partition of each item.

        Raises:
            ValueError: If the method is called without a block.

        Notes:
            The method yields each partition one at a time. It starts with the
                current partition and
            proceeds to generate the next partitions until all possible
                partitions have been yielded.

            Usage:
                for partition in set_partitions.each_partition():
                    # Do something with the partition
                    ...
        """
        # if not callable:
        #     raise ValueError("Missing block")
        self.reinitialize()
        yield self.k.copy()
        for _ in range(self.count() - 1):
            partition = self.next_partition()
            yield partition
        self.reinitialize()

    def count(self):
        """
        Calculate the total number of partitions.

        Returns:
            int: The total number of partitions.

        Notes:
            The `count` method calculates the total number of partitions that
            can be generated based on the current values of `n` and `p`. The
            result depends on whether `p` is specified or not. If `p` is not
            specified, it sums up the number of Sterling numbers of the second
            kind for each value of `k` from 1 to `n`. If `p` is specified, it
            directly returns the Sterling number of the second kind for `n`
            and `p`.
        """
        if self.p is None:
            sum_val = 0
            for k in range(1, self.n + 1):
                sum_val += self.sterling_second[self.n, k]
            return sum_val
        else:
            return self.sterling_second[self.n, self.p]

    def compute_stirling2(self, n: int, k: int = None) -> np.ndarray:
        """
        Compute table with the Stirling numbers of the second kind.

        The Stirling numbers of the second kind count the ways to partition a set
        into non-empty subsets.

        Args:
            n: The maximum value of n for which to compute the Stirling numbers.

        Returns:
            The Stirling numbers of the second kind as a NumPy array.

        Example:
            import pandas as pd
            n = 8
            k = n
            df = pd.DataFrame(
                compute_stirling2(n, k),
                columns=[f'k={i}' for i in range(0, k+1)],
                index=pd.Index([f'n={i}' for i in range(0, n+1)]))
            print(df)
        """
        k = k or n
        s = np.zeros((n+1, k+1), dtype=np.int64)
        s[0, 0] = 1
        s[1:, 1] = 1
        for i in range(2, n+1):
            for c in range(2, min(k, i)+1):
                if i == c:
                    s[i, c] = 1
                else:
                    s[i, c] = c * s[i-1, c] + s[i-1, c-1]
        return s

    def partition_size(self):
        """Calculate the size of the current partition."""
        return self.m[self.n - 1] - self.m[0] + 1

    def reinitialize(self):
        """
        Reinitialize the state of the set partitions.

        Resets the `k` and `m` arrays to their initial values based on the
        current values of `n` and `p`. Updates the `size` attribute to reflect
        the new partition size.
        """
        self.k = [0] * self.n
        self.m = [0] * self.n
        self.pinitialize()
        self.size = self.partition_size()

    def pinitialize(self):
        """
        Initialize the `k` and `m` arrays based on the value of `p`.

        Sets the elements in the `k` and `m` arrays based on the value of `p`
        and the size of the set partitions.
        The initial values are determined to ensure correct generation of
        partitions when `p` is specified.
        """
        if self.p is not None:
            for i in range(0, self.n - self.p):
                self.k[i] = self.m[i] = 0
            i = self.n - self.p + 1
            while i <= (self.n - 1):
                self.k[i] = self.m[i] = i - (self.n - self.p)
                i += 1