Just discovered this Discourse group.
I figured I would cross-post my SO question, which contains the code and stack trace: https://stackoverflow.com/questions/64469950/numba-numpy-understanding-error-message
A little more context:
My dataset is a pandas DataFrame with a single column. Each cell in that column is an array of values, and the arrays have irregular lengths. For every array, I need to union its elements (pairwise, consecutive elements), as shown in the example code. I plan to pass the column's .values attribute into my process_values function.
a) Is this a good use case for Numba? I'm trying to improve on the performance of pandas' .apply method.
b) Can you help me decipher the error message?
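To make the intended computation concrete, here is a minimal pure-Python sketch (no Numba). It assumes each row should union every consecutive pair of values in that row, which is what the example code's zip(arr, arr[1:]) pattern suggests, and that the values are integer node ids below a known bound n. The column name "values" and the helpers find and union_consecutive are hypothetical. It also shows that .values on such a column is an object-dtype array, and Numba's nopython mode cannot type object arrays directly.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: one column whose cells are
# integer arrays of irregular length.
df = pd.DataFrame({"values": [np.array([3, 1]),
                              np.array([4, 5, 1]),
                              np.array([7, 8])]})

arr = df["values"].values
print(arr.dtype)  # object: each element is a separate NumPy array


def find(parent, p):
    # Walk up to the root, halving the path along the way.
    while p != parent[p]:
        parent[p] = parent[parent[p]]
        p = parent[p]
    return p


def union_consecutive(rows, n):
    # Weighted quick-union over node ids 0..n-1; returns the component count.
    parent = list(range(n))
    size = [1] * n
    components = n
    for row in rows:
        for a, b in zip(row, row[1:]):
            ra, rb = find(parent, a), find(parent, b)
            if ra == rb:
                continue
            if size[ra] < size[rb]:
                ra, rb = rb, ra
            # Attach the smaller tree under the larger one.
            parent[rb] = ra
            size[ra] += size[rb]
            components -= 1
    return components, parent


n_components, parent = union_consecutive(arr, 10)
print(n_components)
```

With these three rows, nodes 1, 3, 4 and 5 end up in one set, 7 and 8 in another, and the remaining four ids stay singletons, so six components remain.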
Thanks so much,
Jeff
Hi @jrjames83, I have replied on SO. Let me know whether it's clear how to solve it using lists.
Luk
Nice! Taking a look, @luk-f-a. I appreciate the quick reply.
import numpy as np
from numba import jit
from numba.typed import List

indices = np.arange(8806806, dtype=np.int64)
sizes = np.ones(8806806, dtype=np.int64)
connected_components = 8806806

@jit(npython=True)
def root(p: int) -> int:
    while p != indices[p]:
        indices[p] = indices[indices[p]]
        p = indices[p]
    return p

@jit(npython=True)
def connected(p: int, q: int) -> bool:
    return root(p) == root(q)

@jit(npython=True)
def union(p: int, q: int) -> None:
    root1 = root(p)
    root2 = root(q)
    if root1 == root2:
        return
    if (sizes[root1] < sizes[root2]):
        indices[root1] = root2
        sizes[root2] += sizes[root1]
    else:
        indices[root2] = root1
        sizes[root1] += sizes[root2]
    connected_components -= 1

@jit(nopython=True)
def process_values(arr):
    for row in arr:
        for first, second in zip(arr, arr[1:]):
            union(first, second)

mylist = List()
for x in [[8018361, 4645960], [8018361, 4645960]]:
    new_list = List()
    new_list.extend(x)
    mylist.append(new_list)

process_values(mylist)
mylist evidently gets created, but now I get this error:
---------------------------------------------------------------------------
TypingError Traceback (most recent call last)
<ipython-input-64-6e08ff92ded5> in <module>
48
49
---> 50 process_values(mylist)
51
52
/opt/conda/lib/python3.7/site-packages/numba/core/dispatcher.py in _compile_for_args(self, *args, **kws)
399 e.patch_message(msg)
400
--> 401 error_rewrite(e, 'typing')
402 except errors.UnsupportedError as e:
403 # Something unsupported is present in the user code, add help info
/opt/conda/lib/python3.7/site-packages/numba/core/dispatcher.py in error_rewrite(e, issue_type)
342 raise e
343 else:
--> 344 reraise(type(e), e, None)
345
346 argtypes = []
/opt/conda/lib/python3.7/site-packages/numba/core/utils.py in reraise(tp, value, tb)
78 value = tp()
79 if value.__traceback__ is not tb:
---> 80 raise value.with_traceback(tb)
81 raise value
82
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Internal error at <numba.core.typeinfer.CallConstraint object at 0x7f2127283ed0>.
"<class 'numba.core.cpu.CPUTargetOptions'> does not support option: 'npython'"
[1] During: resolving callee type: type(CPUDispatcher(<function union at 0x7f21272b2320>))
[2] During: typing of call at <ipython-input-64-6e08ff92ded5> (39)
Enable logging at debug level for details.
File "<ipython-input-64-6e08ff92ded5>", line 39:
def process_values(arr):
<source elided>
for first, second in zip(arr, arr[1:]):
union(first, second)
^
What are the recommended patterns, then, for converting my original array of arrays into this typed-list format?
You just had a typo: @jit(npython=True) should be @jit(nopython=True).
Thanks, @luk-f-a, good catch! I'll report back with any updates and with what the code ends up looking like.
@jrjames83 Bodo supports accelerating Pandas apply functions directly, often with large speedups. For example, you can pass a Series whose elements are arrays straight to a Bodo-jitted function:
import numpy as np
import pandas as pd
import bodo

@bodo.jit
def process_values(arr):
    print(arr)

process_values(
    pd.Series(
        [np.array([8018361, 4645960]),
         np.array([1137555, 7763897]),
         np.array([7532943, 2248813]),
         np.array([5352737, 71466, 3590473, 5352738, 2712260])],
        dtype='object'))
Could you post your original Pandas code with apply so I can suggest how to use Bodo?