Njit function gives dimension mismatch if the return is condition dependent

Here is some example code that reproduces the issue.

If I create an if-statement and the return of the function depends on that condition, Numba raises the following error:

@njit(parallel=True)
def test(model):
d = {}
if model == 'foo':
d[2]=1
x=2

    return x,d

test('sar')

AssertionError: Failed in nopython mode pipeline (step: convert to parfors)
Dimension mismatch for (Var($14.8, :9), Var($14.7, :9))

But when the return of the function is independent of the condition:

@njit(parallel=True)
def test(model):
d = {}
if model == 'foo':
d[2]=1
x=2

return x,d

There is no error.

Hey,

Are you using the latest version of Numba? I do get a warning that parallel=True isn’t able to do anything with this function, but that’s fine. Other than that it seems to work correctly for me. Perhaps you meant something else, though, since the formatting of the code in your post is a little off.

Using Numba 0.54.1.

from numba import njit

@njit(parallel=True)
def test(model):
    d = {}
    if model == "foo":
        d[2] = 1
    x = 2

    return x,d

Not entering the if-branch results in:

test("sar")
(2, DictType[int64,int64]<iv=None>({}))

And entering it:

test("foo")
(2, DictType[int64,int64]<iv=None>({2: 1}))

It’s always worth trying with parallel=False to see if that works. Having it enabled definitely adds some restrictions.

I was assuming that your x variable is declared outside the if-statement, since it’s always returned.

However, and this is a bit puzzling to me, if I do include it within the if-statement Numba runs fine whereas Python rightfully complains about it being undefined…

from numba import njit

def test_py(model):
    d = {}
    
    if model == "foo":
        d[2] = 1
        x = 2
        
    return x,d

test_nb = njit(parallel=True)(test_py)

test_py("sar")
test_nb("sar")

Why does Numba return x=2 in this case? Printing x before the if makes it crash because it’s undefined, so it’s not using some rogue global variable. Adding an else: x=1 clause correctly makes it return x=1.

Adding a print inside the if-statement also shows that execution never enters that branch, since nothing is printed from it:

def test_py(model):

    print("before:", model)
    
    if model == "foo":
        x = 2
        print("within-if:", model, x)
    else:
        print("within-else:", model, x)
        
    print("after:", model, x)

    return x

test_nb = njit(parallel=False)(test_py)
test_nb("sar")

Returns 2 and prints:

before: sar
within-else: sar 2
after: sar 2

Am I missing something?

Regards,
Rutger

@Rutger Many thanks for taking a look at this. Unfortunately the “bit puzzling” thing you are seeing is a known bug: numba/numba issue #7338, “Incorrectly typed as literal when variable maybe undefined”.
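
For comparison, the plain-Python behaviour mentioned above can be checked without Numba at all. This is only a minimal sketch; the function name `maybe_undefined` is made up for illustration and is not part of the original example.

```python
def maybe_undefined(model):
    # x is only bound when the branch is taken
    if model == "foo":
        x = 2
    # If the branch was skipped, x was never assigned and Python raises here
    return x


print(maybe_undefined("foo"))  # 2

try:
    maybe_undefined("sar")
except UnboundLocalError as exc:
    # The exact message differs between Python versions
    print("UnboundLocalError:", exc)
```

Under the bug described above, the compiled function returns 2 for the second call instead of raising.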

Sorry for the confusing code; it wasn’t formatted. The problem in my post was that the return in the first example was inside the if-statement, and that version wasn’t working (first block below). The second block, with the return outside the if, works.

@njit(parallel=True)
def test(model):    
    d = {}
    if model == 'foo':
        d[2]=1
        x=2


        return x,d

@njit(parallel=True)
def test(model):    
    d = {}
    if model == 'foo':
        d[2]=1
        x=2


    return x,d

@krenusz Thanks for clarifying the issue, it’s now clear to me what you meant initially, and I can also reproduce the error you’re seeing (in 0.54.1). As mentioned earlier, in this case setting parallel=False resolves the issue, and since this function already doesn’t benefit from parallel=True it wouldn’t matter for this specific case (as shown by the NumbaPerformanceWarning). But I’m assuming it’s a toy example.

I’m not familiar with the inner workings of Numba, but my guess would be that Numba struggles with the different return types, and for a parallel-case those should probably be known upfront (by the compiler), in order to nicely collect the results after (potential) parallelization. Note that when omitting an explicit return, Python (and Numba) will still always return a None value. So when you omit it, you can still interpret the function as having return None at the end.

So with your example, depending on entering the if-statement or not, the result is sometimes a tuple of (int, dict) and other times just None. That’s probably a bit too difficult for Numba to reconcile in the parallel-case.
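
The implicit None is easy to see in plain Python, with no Numba involved. The sketch below uses a hypothetical function name, `branch_return`, and mirrors the failing example with the return inside the if-statement.

```python
def branch_return(model):
    d = {}
    if model == "foo":
        d[2] = 1
        x = 2
        return x, d
    # No explicit return on this path: Python returns None implicitly


print(branch_return("foo"))  # (2, {2: 1})
print(branch_return("sar"))  # None
```

So the two paths produce a tuple and None respectively, which is the pair of return types the compiler would have to reconcile.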

A workaround in this case could be to simply return a tuple of (None, None) for the else-case. That already resolves the error you’re seeing, even though the types within the tuple are still different. In some cases you could perhaps even match the expected types, returning the empty dict with some integer that represents your “None”. That could theoretically be even easier for Numba/LLVM in terms of optimization, but I’m not sure if that really matters here.

So for example:

@njit(parallel=True)
def test(model):    
    d = {}
    if model == 'foo':
        d[2]=1
        x=2
        return x,d
    else:
        return None, None

Or:

@njit(parallel=True)
def test(model):    
    d = {}
    if model == 'foo':
        d[2]=1
        x=2
        return x,d
    else:
        return -999, d

@stuartarchibald Thanks for explaining the issue, that makes sense now.

Regards,
Rutger

Thank you very much for the explanation, it’s clear now.

In my use case the condition-independent return is totally suitable, so I don’t need a workaround. I created the topic because I had been struggling with this issue for a while, and when I recognised the cause I wanted to create a thread immediately. Sadly, there was no info regarding this scenario.

Yours Truly,
Bence

I think the issue in the parallel case is an instance of, or related to, the bug reported in numba/numba issue #7681, “Bug of mismatch shape with parallel mode”. A fix for it is being worked on.