Fully JIT'ed DuckDB

Dear Numba users,

I have started a proof-of-concept project, NumbDuck (a follow-up to NumbOx, discussed previously).

The original motivation was to see whether, and how, the DuckDB database can be used entirely in a JIT’ed context. (The inspiration comes from the numbsql project, which investigated similar questions for the Python-native sqlite.)

NumbDuck so far has a couple of demos in its test suite. Please feel free to take a look, contribute, expand, and use.

cool, thanks for sharing!

That’s pretty cool… I didn’t realize that arm and AMD platforms had different struct passing methodologies!

Right, here is the story.

When importing duckdb_fetch_chunk into NumbDuck, its duckdb_result parameter, a structure of six members of 8 bytes each, raises a subtle question: how should it be typed in the numba context?

From numba’s perspective, a natural candidate for the type of this argument is UniTuple(intp, 6), which is lowered to the [int64 x 6] array type.

This assumes that either:

  1. The six-field duckdb_result structure is passed by value, as declared in the source.
  2. If the DuckDB backend was (implicitly) compiled to accept a pointer to duckdb_result instead, then the code compiled by numba + llvmlite consistently supplies a pointer to the argument as well.

It appears that neither of these two options is satisfied on an arm64 machine. There, structure parameters larger than two words are implicitly ‘pointerized’.
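To make the size argument concrete, here is a minimal sketch using Python's ctypes. The class name SixWordResult and its field names are placeholders, not DuckDB's actual declarations; only the shape (six 8-byte members) is taken from the description above, and the threshold assumes 8-byte machine words.

```python
import ctypes

WORD_SIZE = 8                 # bytes per machine word on a 64-bit target (assumption)
TWO_WORDS = 2 * WORD_SIZE     # the "two-word" threshold mentioned above


class SixWordResult(ctypes.Structure):
    """Stand-in with the same shape as duckdb_result: six 8-byte members."""
    # Field names are placeholders, not the names used in the DuckDB header.
    _fields_ = [(f"word{i}", ctypes.c_uint64) for i in range(6)]


def exceeds_two_words(ctype) -> bool:
    """True if the aggregate is larger than two machine words."""
    return ctypes.sizeof(ctype) > TWO_WORDS


size = ctypes.sizeof(SixWordResult)
print(size, exceeds_two_words(SixWordResult))
```

The structure occupies 48 bytes, well above the 16-byte threshold, so it falls into the category that gets ‘pointerized’ on arm64.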

At the same time, numba duly passes array parameters by value; this holds regardless of the machine.

However, on an AMD64 machine the six-word duckdb_result parameter is still passed by value.

(These observations are easy to reproduce by examining the LLVM IR emitted on arm64 and AMD64 machines, independently of DuckDB or numba.)

If we were using the DuckDB C API directly from C code, as in this demo, a counterpart of option 2 above would hold, since the caller and the callee would be compiled consistently.

Reproducing the aforementioned demo in NumbDuck therefore required a more nuanced approach. The type of duckdb_result was declared to be platform-dependent, and the corresponding utility to prepare the duckdb_result object of the correct type – either tuple or StructRef (with pointer to its payload) – was provided.
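The platform-dependent choice could be sketched roughly as follows. This is not NumbDuck's actual utility: the function name and the returned labels are hypothetical, and the mapping only encodes the observations reported in this thread.

```python
import platform

# Machine names as typically reported by platform.machine(); the grouping
# below encodes only the observations from this thread.
POINTER_MACHINES = {"arm64", "aarch64"}
BY_VALUE_MACHINES = {"x86_64", "amd64"}


def result_passing_mode(machine=None):
    """Return 'pointer' or 'by_value' for a machine name (hypothetical helper)."""
    name = (machine or platform.machine()).lower()
    if name in POINTER_MACHINES:
        return "pointer"    # e.g. a StructRef exposing a pointer to its payload
    if name in BY_VALUE_MACHINES:
        return "by_value"   # e.g. a UniTuple(intp, 6)
    raise NotImplementedError(f"untested machine: {name}")
```

Raising on unknown machines, rather than guessing a default, keeps a wrong calling convention from turning into a silent segfault.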

Update:

After upgrading to numba~=0.62.1 and llvmlite~=0.45.1, the aforementioned platform-dependent workaround stopped working on the AMD64 architecture. Specifically, passing the six-int64 aggregate duckdb_result (odd duck) worked neither by value nor by pointer: both segfaulted. On arm64 it still worked (when passed by pointer, as before).

As a first step in solving this, I explicitly repackaged the [int64 x 6] array as a LiteralStructType and passed that to duckdb_fetch_chunk. This no longer segfaulted (per se) but failed to collect the correct pointer.

However, copying that struct to the stack and passing a pointer to the stack-allocated memory worked. (I suppose this is a kind of ‘ptr byval(…)’ in LLVM parlance.) A natural next step was to skip the LiteralStruct gymnastics altogether.

Moreover, this approach removes the need to condition the logic on the platform.
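The idea of handing the callee a pointer to a caller-owned copy can be illustrated with plain ctypes. This is only an analogy, not the Numba/llvmlite code used in NumbDuck: the Result class, the callback, and call_via_copy are all hypothetical names, and a ctypes callback stands in for the native callee.

```python
import ctypes


class Result(ctypes.Structure):
    """Six 8-byte words, the same shape as the aggregate discussed above."""
    # Field names are placeholders, not DuckDB's.
    _fields_ = [(f"word{i}", ctypes.c_uint64) for i in range(6)]


# Callee prototype, roughly: uint64_t callee(Result *r)
CALLEE = ctypes.CFUNCTYPE(ctypes.c_uint64, ctypes.POINTER(Result))


@CALLEE
def read_last_word(ptr):
    # The callee only ever sees a pointer to the caller's copy.
    return ptr.contents.word5


def call_via_copy(words):
    """Copy the six words into caller-owned memory and pass its address."""
    local = Result(*words)                        # caller-owned copy
    return read_last_word(ctypes.byref(local))    # pass a pointer, not the value


print(call_via_copy([1, 2, 3, 4, 5, 6]))
```

Because the caller always materializes the copy and passes its address, the same call shape is used on every platform.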
