SystemError potentially from Typed Dict

Patol75 · October 21, 2021, 3:52am

Hi everyone,

I am facing a Python issue that I think is related to Numba and I would be very grateful to have the opinion of someone more familiar with Numba source code. Unfortunately, I do not have a minimal reproducer, but I will try my best to give as many details as I can in the following.

I am using a fluid dynamics framework, mostly written in FORTRAN, that accepts Python snippets in the simulation's input file to define parts of the simulation. I am making use of Numba AOT compilation to speed up some of the Python functions these snippets rely on. Unfortunately, I am running into the following (scary) error while attempting to run the simulation:

SystemError: ../Objects/tupleobject.c:159: bad argument to internal function

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<string>", line 2, in val
  File "<frozen importlib._bootstrap>", line 1044, in _handle_fromlist
SystemError: <built-in function isinstance> returned a result with an error set

After some time, I have understood that this error originates from an import statement within one of the Python snippets. The import accesses a regular Python file (say, constants.py) in the same directory as the input file where some constant values for the simulation are written/calculated. Going through this file line by line, I have further discovered that the error above is linked to another import statement, within constants.py this time. This import again accesses another Python file in the same directory (say, data.py) that contains some data the simulation requires. Going line by line through data.py, I have found that the following lines lead to the error:

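from numpy import array  # assumed: array here is NumPy's array constructor
from numba import float64
from numba.types import unicode_type
from numba.typed import Dict

# Typed Dict mapping str keys to 2-D float64 arrays
foo = Dict.empty(key_type=unicode_type, value_type=float64[:, :])
# Inserting this entry is what triggers the SystemError
foo["bar"] = array([[0.456, -0.123, -0.678], [-0.256, 0.856, 0.541]])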

In particular, the very last line is problematic. Commenting it out (in other words, leaving foo empty) avoids the error. Moreover, removing foo from data.py and defining it directly in the functions that require it also appears to work fine.

However, foo is not the only Typed Dict I am using within this simulation, and another one later on yields the exact same error. The problem with the latter is that it is created and destroyed at every simulation step, meaning that its content is not constant and, hence, I cannot just hard-code it where it would not raise the error. Leaving it empty raises a legitimate KeyError, as some code tries to access the non-existent content of the Typed Dict. I have also tried to define the dictionary using curly braces instead of the Dict.empty constructor. This time, I do not get the error but a segmentation fault instead, which I believe is linked to the inferred type of the dictionary data not matching the signature provided for the functions compiled AOT.

Finally, the last piece of evidence I have, running the simulation on different systems does not always yield the error, despite all systems using the same major versions of Python and same versions of Numba and NumPy.

On a side note, I have found this Python bug which might be relevant.

I am very much puzzled at what is going on here, and I would definitely appreciate some guidance. I am also happy to provide more information but, as I mentioned earlier, I have not succeeded in reproducing the error outside of the fluid dynamics framework.

Thank you for any help.

stuartarchibald · October 21, 2021, 10:27am

Hi @Patol75,

Thanks for providing plenty of information, these sorts of problems are challenging! Without a reproducer I’ll just have to guess…

The thing that is of most concern is this:

Finally, the last piece of evidence I have, running the simulation on different systems does not always yield the error, despite all systems using the same major versions of Python and same versions of Numba and NumPy.

Does this mean that the problem isn’t always present? If it is intermittent, this often points to memory corruption (something in your program has written to a valid memory region and corrupted it). This sort of problem can often stem from out-of-bounds array access. If you switch on -fbounds-check (this is for gfortran, other compilers have similar flags), this should help catch out-of-bounds accesses made in the Fortran code. If you also set the environment variable NUMBA_BOUNDSCHECK=1 (docs: "Environment variables" in the Numba documentation), this will force bounds checking in Numba code.
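When the Python side can be launched on its own, one way to make sure NUMBA_BOUNDSCHECK is in place is to set it in the environment of the child process. A minimal sketch, assuming the variable has to be present before Numba is imported; the function name run_with_boundscheck is made up for illustration:

```python
import os
import subprocess
import sys


def run_with_boundscheck(args):
    """Run a child Python process with NUMBA_BOUNDSCHECK=1 in its environment."""
    # Copy the current environment and add the Numba setting on top of it.
    env = dict(os.environ, NUMBA_BOUNDSCHECK="1")
    return subprocess.run(
        [sys.executable, *args],
        env=env,
        capture_output=True,
        text=True,
    )
```

For example, run_with_boundscheck(["my_script.py"]) would run a (hypothetical) my_script.py with the setting applied. For a simulation started from a shell, exporting the variable before launching it has the same effect.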

If your code has no identifiable out of bounds accesses then type mismatches may be involved, at which point a reproducer or at least code surrounding the caller/callee would help a lot.

Hope this helps?

Patol75 · October 22, 2021, 6:31am

@stuartarchibald Thanks a lot for your time and reply! I think you indirectly pointed me in the right direction. Through checking for -fbounds-check in the Makefile of the fluid dynamics framework, I have realised that multiple versions of Python were mentioned in FCFLAGS, CFLAGS and CXXFLAGS, except on the computer where the error was not showing. I have also noticed that, on this specific computer, the compilation was linking to VTK 6.3, whilst other computers were linking to VTK 7.1 and VTK 8.2. Therefore, I updated VTK 6.3 to VTK 7.1 and noticed that mentions of Python 3.8 (default on Ubuntu 20.04) were now present in the Makefile, even though I had provided PYTHON_VERSION=3.9 to the configure script. Building the framework nevertheless, I ran into the error when running the model.

In an attempt to fix what looks like a Python version conflict, I edited the Makefile after running configure and replaced all instances of Python 3.8 with the equivalent folders/files for Python 3.9. The framework compiled, and the model ran. After that, I compiled VTK 8.2 from source, specifying the Python version to be 3.9, and built the fluid dynamics framework against this updated version of VTK. Again, the model ran.

Applying the same idea in the Makefile to the system running Rocky Linux 8.4 worked; however, both modifying the Makefile and building VTK from source failed on the system with Ubuntu 18.04. To be honest, the latter has been the weirdest from the start, as the model runs in serial but not in parallel. I guess I will just update that system to a more recent Ubuntu and things will magically work.

What I am very curious about now is how this whole story affects specifically Numba Typed Dicts, even though I envisage it is far from a simple question.

stuartarchibald · November 3, 2021, 9:33am

@Patol75 no problem. I cannot completely tell from the above what happened, but it looks like there might have been a mix-up of Python versions between build and runtime in the environment. If you have a reproducer for something that runs in serial but not in parallel, and it also reproduces just in Numba and with bounds checking on, then please do report it on the issue tracker.

I suspect the issue with typed dicts is either something consistently corrupting the dictionary's internal structure due to a leak OR something due to mismatched Python versions between build and run time. I’d encourage the use of bounds checking in both Fortran and Numba just to see if it’s an obvious leak.
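A quick way to look for a build/runtime mismatch is to have the embedded interpreter report what it actually is and compare that with the paths in the Makefile. The following is a rough sketch using only the standard library; describe_interpreter and check_version are hypothetical names, and "3.9" would be whatever version the framework was configured with.

```python
import sys
import sysconfig


def describe_interpreter():
    """Collect details identifying the Python interpreter that is running."""
    return {
        "version": "{}.{}".format(*sys.version_info[:2]),
        "executable": sys.executable,
        # Header directory a build against this interpreter would use.
        "include_dir": sysconfig.get_paths()["include"],
        # Library directory; may be None on some platforms.
        "libdir": sysconfig.get_config_var("LIBDIR"),
    }


def check_version(expected):
    """Raise RuntimeError if the running major.minor version is not `expected`."""
    info = describe_interpreter()
    if info["version"] != expected:
        raise RuntimeError(
            f"expected Python {expected}, running {info['version']} "
            f"({info['executable']})"
        )
    return info
```

Calling check_version("3.9") from one of the snippets in the input file would then fail loudly if the framework ended up embedding a different interpreter than the one it was configured for.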

Anyway, I’m glad you found something that’s working ok now!
