Performance issue with typed dicts and lists

@kartiksubbarao it looks like you are trying to accelerate finding the unique values of a Pandas Series? Bodo uses Numba to accelerate Pandas and may support your use case automatically. I added Bodo to your code to demonstrate. Also, having the timers outside the calls measures compilation time as well, so put the timers inside the functions:

import random
import string
import pandas as pd
import numba
from time import time
import bodo

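# Pure-Python baseline: track names already seen in a regular dict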
def seen_name(names):
    start = time()
    seen = {}
    for i in range(len(names)):
        if names[i] not in seen: seen[names[i]] = True
    print(f'\nseen_name => {time() - start:.4} seconds\n')

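# numba.typed.Dict requires explicit key/value types up front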
ntypes = numba.core.types
ndict = numba.typed.Dict
@numba.njit
def seen_name_numba(names):
    start = time()
    seen = ndict.empty(key_type=ntypes.unicode_type,
                       value_type=ntypes.boolean)
    for i in range(len(names)):
        if names[i] not in seen: seen[names[i]] = True
    print(f'\nseen_name_numba => {time() - start} seconds')

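# Bodo compiles the Pandas call itself, so Series.nunique can be used directly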
@bodo.jit
def seen_name_bodo(names):
    start = time()
    res = names.nunique()
    print(f'\nseen_name_bodo Series => {time() - start} seconds')
    return res

allnames = []
# Generate a long list of random 10-letter strings
for i in range(1000000):
    allnames.append(''.join(
        random.choices(string.ascii_letters, k=10)))
# The real-world scenario stores the data in a dataframe
df = pd.DataFrame(allnames, columns=['name'])

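# Time each approach on the same data (timers are inside the functions)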
seen_name(df.name.tolist())
seen_name_numba(df.name.tolist())
seen_name_numba(numba.typed.List(df.name))
seen_name_bodo(df.name)

Here are the results on my MacBook Pro (2019, 2.3 GHz Intel Core i9):

seen_name => 0.2311 seconds
seen_name_numba => 0.270217 seconds
seen_name_numba => 0.344879 seconds
seen_name_bodo Series => 0.561972 seconds
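
Note that with the timers inside the functions, compilation time is already excluded from the numbers above. If you prefer to keep an external timer around the calls, you can warm up the jitted functions on a small input first so compilation happens before the measured call. A minimal sketch, reusing the functions and df defined above:

small = df.name.head(10)

# Warm-up calls compile each specialization once on a tiny input
seen_name_numba(small.tolist())            # reflected-list version
seen_name_numba(numba.typed.List(small))   # typed-list version
seen_name_bodo(small)                      # Bodo version

# Subsequent calls on the full data no longer pay compilation cost
seen_name_numba(df.name.tolist())
seen_name_numba(numba.typed.List(df.name))
seen_name_bodo(df.name)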

Using a regular Series in Bodo seems to add some overhead (which we need to investigate), but with Bodo you can parallelize your code and scale linearly to more cores and larger data.
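
For example, Bodo programs run as MPI processes, so the same code can be launched on multiple cores with mpiexec. Below is a rough, hypothetical sketch; the distributed flag and the get_rank/get_size helpers are assumptions about the Bodo API that may vary by version, and the script would be launched with something like mpiexec -n 4 python bench.py:

import random
import string
import pandas as pd
import bodo

# Marking the argument as distributed tells Bodo it is already partitioned
# across ranks instead of being replicated on every process (assumed API).
@bodo.jit(distributed=['names'])
def count_unique(names):
    return names.nunique()

# Each MPI rank builds its own chunk of the data
chunk_size = 1000000 // bodo.get_size()
chunk = pd.Series([''.join(random.choices(string.ascii_letters, k=10))
                   for _ in range(chunk_size)])

res = count_unique(chunk)
if bodo.get_rank() == 0:
    print('unique names:', res)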