How to Append list of type String Int and Float using Numba

I am using Numba to speed up the loop below. Without Numba it takes 135 s to execute; with Numba it takes 0.30 s, which is very fast.

In the loop below, I compare each value of the cosine-similarity array with a threshold of 0.85. If the value is above the threshold, I append the data to a list, which the function returns at the end.

The data inserted into the list looks like this:

['Source ID', 'Source TEXT', 'Similar ID', 'Similar TEXT', 'Score']
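For comparison (this is not from the thread), the threshold comparison described above can also be done without an explicit loop by masking the strict upper triangle of the similarity matrix with NumPy, then building the mixed-type rows in plain Python, where lists may hold any types. This is a minimal sketch assuming `cos_sim` is a square NumPy array and that `ids` and `texts` are aligned with its rows; the function name `similar_rows` and the sample data are hypothetical.

```python
import numpy as np


def similar_rows(ids, texts, cos_sim, threshold=0.85):
    """Return [source_id, source_text, similar_id, similar_text, score] rows
    for every pair (i, j) with i < j whose similarity exceeds threshold."""
    # Boolean mask of entries above the threshold, restricted to the strict
    # upper triangle (k=1) so each pair appears once and the diagonal is skipped.
    mask = np.triu(cos_sim > threshold, k=1)
    row_idx, col_idx = np.nonzero(mask)
    # Assemble the mixed-type rows in plain Python (no Numba involved).
    return [
        [ids[i], texts[i], ids[j], texts[j], float(cos_sim[i, j])]
        for i, j in zip(row_idx, col_idx)
    ]


# Hypothetical sample data: three sentences and their similarity matrix.
cos_sim = np.array([[1.0, 0.9, 0.1],
                    [0.9, 1.0, 0.86],
                    [0.1, 0.86, 1.0]])
ids = np.array([101, 102, 103])
texts = np.array(["reset password", "password reset", "forgot my password"], dtype=object)
print(similar_rows(ids, texts, cos_sim))
```

This visits the same pairs as a loop over `i` and `index > i`. For an 8000 x 8000 matrix the boolean mask (and the copy `np.triu` makes) takes roughly 64 MB each, which is usually acceptable.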

import numba
import numpy as np
from numba.typed import List

# preprocessing, model_url, cosine_similarity, df and dataresult are not shown here
idd = df['ID'].to_numpy()
txt = df['TEXT'].to_numpy()

Column = 'TEXT'
df = preprocessing(dataresult, Column)  # remove special characters from the 'TEXT' column
message_embeddings = model_url(np.array(df['DescriptionNew']))  # pass the text to the Universal Sentence Encoder model to create sentence embeddings
cos_sim = cosine_similarity(message_embeddings)  # len(cos_sim) > 8000

# The function below finds duplicates among rows.
@numba.jit(nopython=True)
def similarity(nid, txxt, cos_sim, threshold):
    numba_list = List()
    for i in range(cos_sim.shape[0]):
        for index in range(i, cos_sim.shape[1]):
            # `and` short-circuits, so the diagonal (i == index) is skipped cheaply
            if i != index and cos_sim[i, index] > threshold:
                numba_list.append([nid[i], nid[index], cos_sim[i][index]])  # this works
                # numba_list.append([txxt[i], txxt[index]])  # this also works on its own
                # numba_list.append([nid[i], txxt[i], nid[index], txxt[index], cos_sim[i][index]])  # I want this to work
    return numba_list

print(similarity(idd, txt, cos_sim, 0.85))

In the code above, I can append either the numeric columns or the text columns, but not both at once. I want all the columns, numbers and text together, to be inserted into numba_list.

I am getting the error below:


/usr/local/lib/python3.7/dist-packages/numba/core/dispatcher.py in error_rewrite(e, issue_type)
    359                 raise e
    360             else:
--> 361                 raise e.with_traceback(None)
    362 
    363         argtypes = []

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Poison type used in arguments; got Poison<LiteralList((int64, [unichr x 12], int64, [unichr x 12], float32))>
During: resolving callee type: BoundFunction((<class 'numba.core.types.containers.ListType'>, 'append') for ListType[undefined])
During: typing of call at <ipython-input-179-6ee851edb6b1> (14)


File "<ipython-input-179-6ee851edb6b1>", line 14:
def zero(nid, txxt, cos_sim, threshold):
    <source elided>
        # print(i+1)
        numba_list.append([nid[i], txxt[i], nid[index], txxt[index], cos_sim[i][index]])
        ^

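Thank you for asking about this. All lists supported by Numba (the reflected list and the typed list) are type-homogeneous: every element of a given list must have the same type, so one list can hold text or numbers, but not a mix of both. In the error above, the list literal [nid[i], txxt[i], nid[index], txxt[index], cos_sim[i][index]] mixes int64, unicode character sequences and float32, so Numba cannot give it a single element type and marks it as a "Poison" type. This is quite different from a Python list, which allows arbitrary mixing of Python types. Hope this helps.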
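One common way to work within this restriction is to let Numba return only homogeneous numeric data (row indices and scores) and to attach the IDs and text afterwards in plain Python. The sketch below is an illustration under that assumption, not an official Numba recipe: the names `similar_pairs` and `build_rows` are hypothetical, and the `ImportError` fallback exists only so the sketch still runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fallback only so this sketch runs without Numba
    def njit(func):
        return func


@njit
def similar_pairs(cos_sim, threshold):
    """Return (src, dst, score) arrays for pairs i < j with cos_sim[i, j] > threshold.

    Each output is a homogeneous numeric array, which Numba can type."""
    n_rows, n_cols = cos_sim.shape
    # First pass: count matches so the output arrays can be sized exactly.
    count = 0
    for i in range(n_rows):
        for j in range(i + 1, n_cols):
            if cos_sim[i, j] > threshold:
                count += 1
    src = np.empty(count, np.int64)
    dst = np.empty(count, np.int64)
    score = np.empty(count, np.float64)
    # Second pass: fill the arrays in the same order.
    k = 0
    for i in range(n_rows):
        for j in range(i + 1, n_cols):
            if cos_sim[i, j] > threshold:
                src[k] = i
                dst[k] = j
                score[k] = cos_sim[i, j]
                k += 1
    return src, dst, score


def build_rows(ids, texts, cos_sim, threshold=0.85):
    """Combine the numeric output with IDs and text outside nopython mode."""
    src, dst, score = similar_pairs(cos_sim, threshold)
    return [
        [ids[i], texts[i], ids[j], texts[j], float(s)]
        for i, j, s in zip(src, dst, score)
    ]
```

With the question's variables this would be called as build_rows(idd, txt, cos_sim, 0.85). Starting the inner loop at i + 1 replaces the separate i != index check. Returning a typed List of (int, int, float) tuples would also satisfy the homogeneity rule, because every element then shares one tuple type; arrays are used here because they are simple to size and index.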