> That’s a happy situation, when the compiler can optimize away the overhead of passing the entire structure by value. Most of the time, that is not the case.
Since Numba usually inlines the callees in a `@jit` function, and structs are immutable and passed by reference, I tend to find that the overhead is optimized away in the cases I’ve looked at. I’ll agree that this isn’t true for compilers and languages in general, but I’d be keen to see cases where LLVM fails to optimize these pass-by-value calls in Numba-generated IR - if you can share specific examples, they would be good to investigate.
> Numba does not have a clear memory management strategy - or it does, but it is not documented anywhere.
My understanding is that there’s a strategy, but it is still subject to change as necessary before 1.0, and it does require further documentation at some point - for example, `StructRef` has only just been added recently. I’m not a core developer, so my opinion here should be taken with a grain of salt, but my view is that the strategy will become more concrete and clearly documented on the route towards 1.0.
> I have to take a look at the code generated by LLVM to see what is going on…
I too spend a lot of time looking at the LLVM IR. I mostly look at the optimized IR because it’s simpler to read, but some debugging and understanding needs a look at the unoptimized IR, or at the generated assembly.
> Other than that, the two types of data structures proposed here should have been one and the same, with a jit flag to indicate whether you want to pass a copy or not, and maybe one more flag for deep copies.
I’m struggling to picture how this could be implemented without introducing inconsistencies - anything decorated with `@jit` should execute with the same semantics as the undecorated function, so having a flag in there that changes those semantics seems counter to this.
In implementing a Numba extension, the choice of `StructModel` vs. `StructRef`, etc., should be made based on what it takes to replicate the plain Python semantics for the type being implemented.
> Besides this, the name StructRef is not very intuitive. From what I have seen, StructRef behaves more like a type template from which you can instantiate new data types, and the “ref” aspect of it indicates that it is being passed by reference.
Numba types are in general templates from which you can instantiate new types - for example, in `numba/core/types/__init__.py`:
```python
int8 = Integer('int8')
int16 = Integer('int16')
int32 = Integer('int32')
int64 = Integer('int64')

float32 = Float('float32')
float64 = Float('float64')
```
The `Integer` type is used to instantiate various other data types (`int8`, `int16`, etc.) - you could instantiate other-sized integers if you want to - similarly for `Float` and the others in that file.
> And by “memory management strategy” I am talking about what happens to your primitives/objects when you cross the bridge back and forth from Python to native.
Some description of boxing and unboxing (conversion of Python objects to native values and vice versa) is in Low-level extension API — Numba 0.50.1 documentation, and it forms part of the Interval example: Example: an interval type — Numba 0.50.1 documentation
> I don’t even know if Numba uses any kind of automatic memory management, like a garbage collector or automatic reference counting. I am not saying that it should, but it is not indicated anywhere.
There is a reference counting implementation, referred to as the “Numba Runtime” (NRT): Notes on Numba Runtime — Numba 0.50.1 documentation
> Another one: mutability vs. immutability - how is it being handled? Like Python does it, like C++ does it, or some other way?
I’m not quite clear on what “like Python does it” means vs. “like C++ does it”, but Numba aims to match the semantics of the undecorated function it compiles, so it should probably be thought of as being handled the way Python handles it.
> Even more, NumPy: we know that standard NumPy doesn’t handle slicing like Python does. How does Numba do it?
For slicing NumPy arrays, the slicing in a Numba-compiled function should match how NumPy does it - for slicing other things, it should match how Python does it. Any deviations from this are likely to be considered bugs.
I think when approaching Numba internals and extensions, it’s important to keep in mind that Numba is a project with quite a lot of complexity and change and only a small core team working on it; borne out of this is a limited set of documentation for many aspects of how it works, especially for recently added, experimental, and in-progress features. I’m doing what I personally can to improve documentation and provide help / explanations as things progress, but it will necessarily take some time before everything is settled, stable, and complete with comprehensive documentation and examples.
If you have more questions about how things work, I’d be happy to try to continue answering them (and likely discover the specifics of various answers myself at the same time). If you’re able to make any contributions to the documentation that clarify things you’ve found unclear as you discover the answers, I think that would be very much appreciated, and it would help move Numba towards the state you’d like to see it in.