Help with error message

Just discovered this discourse group.

Figured I would cross-post my SO question which contains the code and stack trace. https://stackoverflow.com/questions/64469950/numba-numpy-understanding-error-message

A little more context:
My dataset is a pandas DataFrame with a single column. Each entry in that column is an array of values of irregular length. I need to run union operations across the elements of every array, as shown in the example code. I’m going to pass the .values attribute of that column into my process_values function.

a) Is this a good use case for numba? I’m trying to improve on the pandas .apply method.
b) Can you help me decipher the error message?

Thanks so much,

Jeff

Hi @jrjames83, I have replied on SO. Let me know if it’s clear how to solve it using lists.

Luk

Nice! Taking a look @luk-f-a. Appreciate the quick reply.

from numba import jit
from numba.typed import List

indices = np.arange(8806806, dtype=np.int64)
sizes = np.ones(8806806, dtype=np.int64)
connected_components = 8806806

@jit(npython=True)
def root(p: int) -> int:
    while p != indices[p]:
        indices[p] = indices[indices[p]]
        p = indices[p]
    return p

@jit(npython=True)
def connected( p: int, q: int) -> bool: 
    return root(p) == root(q)

@jit(npython=True)
def union( p: int, q: int) -> None:
    root1 = root(p)
    root2 = root(q)
    if root1 == root2:
        return

    if (sizes[root1] < sizes[root2]):
        indices[root1] = root2
        sizes[root2] += sizes[root1]
    else:
        indices[root2] = root1
        sizes[root1] += sizes[root2]

    connected_components -= 1
    
@jit(nopython=True)
def process_values(arr):
    for row in arr:
        for first, second in zip(arr, arr[1:]):
            union(first, second)
            
          

mylist = List()
for x in [[8018361, 4645960], [8018361, 4645960]]:
    new_list = List()
    new_list.extend(x)
    mylist.append(new_list)


process_values(mylist)

mylist evidently gets created, but now:

---------------------------------------------------------------------------
TypingError                               Traceback (most recent call last)
<ipython-input-64-6e08ff92ded5> in <module>
     48 
     49 
---> 50 process_values(mylist)
     51 
     52 

/opt/conda/lib/python3.7/site-packages/numba/core/dispatcher.py in _compile_for_args(self, *args, **kws)
    399                 e.patch_message(msg)
    400 
--> 401             error_rewrite(e, 'typing')
    402         except errors.UnsupportedError as e:
    403             # Something unsupported is present in the user code, add help info

/opt/conda/lib/python3.7/site-packages/numba/core/dispatcher.py in error_rewrite(e, issue_type)
    342                 raise e
    343             else:
--> 344                 reraise(type(e), e, None)
    345 
    346         argtypes = []

/opt/conda/lib/python3.7/site-packages/numba/core/utils.py in reraise(tp, value, tb)
     78         value = tp()
     79     if value.__traceback__ is not tb:
---> 80         raise value.with_traceback(tb)
     81     raise value
     82 

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Internal error at <numba.core.typeinfer.CallConstraint object at 0x7f2127283ed0>.
"<class 'numba.core.cpu.CPUTargetOptions'> does not support option: 'npython'"
[1] During: resolving callee type: type(CPUDispatcher(<function union at 0x7f21272b2320>))
[2] During: typing of call at <ipython-input-64-6e08ff92ded5> (39)

Enable logging at debug level for details.

File "<ipython-input-64-6e08ff92ded5>", line 39:
def process_values(arr):
    <source elided>
        for first, second in zip(arr, arr[1:]):
            union(first, second)
            ^

What are the patterns then for getting my original array of arrays into this typed list format?

You just had a typo: @jit(npython=True) should be @jit(nopython=True). The error message says as much: "does not support option: 'npython'".
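For reference, here is a minimal sketch of how the posted snippet might look with the decorator fixed. It also changes two things that would probably cause trouble once the typo is gone. These are my assumptions about the intent, not something confirmed in the thread. First, `connected_components -= 1` assigns to a module-level name inside a function, so this version returns a merge count instead. Second, the inner loop zipped `arr` with itself rather than `row`, so this version pairs neighbours within each row. The arrays are passed as arguments instead of being read as globals. The `to_typed_rows` helper and the plain-Python fallback for when Numba is not installed are illustrative additions.

```python
import numpy as np

try:
    from numba import njit            # shorthand for jit(nopython=True)
    from numba.typed import List
except ImportError:
    # Fallback (an assumption for portability): run as plain Python without Numba.
    def njit(func):
        return func
    List = list


@njit
def root(indices, p):
    # Follow parent links to the root, halving the path along the way.
    while p != indices[p]:
        indices[p] = indices[indices[p]]
        p = indices[p]
    return p


@njit
def union(indices, sizes, p, q):
    # Merge the sets containing p and q; return True if a merge happened.
    root1 = root(indices, p)
    root2 = root(indices, q)
    if root1 == root2:
        return False
    if sizes[root1] < sizes[root2]:
        indices[root1] = root2
        sizes[root2] += sizes[root1]
    else:
        indices[root2] = root1
        sizes[root1] += sizes[root2]
    return True


@njit
def process_values(rows, indices, sizes):
    # Union each pair of neighbours inside every row; count the merges.
    merges = 0
    for row in rows:
        for i in range(len(row) - 1):
            if union(indices, sizes, row[i], row[i + 1]):
                merges += 1
    return merges


def to_typed_rows(rows):
    # Hypothetical helper: build a list of lists that Numba can type.
    outer = List()
    for row in rows:
        inner = List()
        for value in row:
            inner.append(value)
        outer.append(inner)
    return outer


n = 10  # small stand-in for the 8806806 elements in the original post
indices = np.arange(n, dtype=np.int64)
sizes = np.ones(n, dtype=np.int64)
rows = to_typed_rows([[1, 2], [1, 2], [3, 4, 5]])
merges = process_values(rows, indices, sizes)
connected_components = n - merges
```

Deriving the component count from the returned number of merges avoids mutating a global from compiled code.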

Thanks @luk-f-a - good catch. I’ll report back with any updates and what the code ends up looking like.
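On the earlier question about getting an array of arrays into a format Numba can work with: besides building a typed List row by row, one alternative is to flatten the ragged column into a single values array plus an offsets array. Plain NumPy arrays like these can be passed to a jitted function directly. This is only a sketch of that idea, not something suggested in the thread, and `flatten_ragged` is a hypothetical helper name.

```python
import numpy as np


def flatten_ragged(rows):
    # Row i ends up in values[offsets[i]:offsets[i + 1]].
    lengths = np.array([len(r) for r in rows], dtype=np.int64)
    offsets = np.zeros(len(rows) + 1, dtype=np.int64)
    np.cumsum(lengths, out=offsets[1:])
    if len(rows) > 0:
        values = np.concatenate([np.asarray(r, dtype=np.int64) for r in rows])
    else:
        values = np.empty(0, dtype=np.int64)
    return values, offsets


# Example using two of the arrays shown in this thread.
column = [
    np.array([8018361, 4645960]),
    np.array([5352737, 71466, 3590473, 5352738, 2712260]),
]
values, offsets = flatten_ragged(column)

# Recover the second row from the flat representation.
second_row = values[offsets[1]:offsets[2]]
```

Inside a compiled function, the neighbours within row i would then be values[j] and values[j + 1] for j from offsets[i] up to offsets[i + 1] - 2.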

@jrjames83 Bodo supports accelerating Pandas apply functions directly (usually with large speedups). For example, you can pass a Series whose elements are arrays straight into a Bodo-compiled function:

import numpy as np
import pandas as pd
import bodo

    
@bodo.jit
def process_values(arr):
    print(arr)

process_values(
    pd.Series(
        [
            np.array([8018361, 4645960]),
            np.array([1137555, 7763897]),
            np.array([7532943, 2248813]),
            np.array([5352737, 71466, 3590473, 5352738, 2712260]),
        ],
        dtype='object',
    )
)

Could you post your original Pandas code with apply so I can suggest how to use Bodo?