Performance of typed.List outside of jit functions

There may also be a few further optimizations that account for some of the gap between my entry_point trick (10x slower) and NumPy (2x slower). I was hoping the new AOT (ahead-of-time compilation) work would make these easier. At the moment, List()'s methods are implemented by calling into a pre-compiled C library that provides most of its functionality, so a method call runs roughly like this:

Python -> multiple dispatch -> call end-point -> call opaque C function

The inefficiency is that this path contains several opaque function calls that are not currently inlined. They could be, if the compilation pipeline compiled the C library with clang and inlined the resulting LLVM IR instead of calling out to it as an external library.
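To make the cost of that layered call path concrete, here is a minimal pure-Python model of it. It does not use Numba at all: every name below (`_c_list_append`, `_append_endpoint`, `_dispatch_append`, `LayeredList`, `InlinedList`, `fill`) is hypothetical, and a plain Python function stands in for the opaque C function. `LayeredList.append` walks every layer for each element, while `InlinedList.append` does the same work with the layers collapsed, which is roughly what inlining the C library would buy.

```python
import timeit


# Hypothetical stand-in for the opaque, pre-compiled C function.
# A plain Python function is used here; it is NOT Numba's real helper.
def _c_list_append(storage, item):
    storage.append(item)
    return 0  # C-style status code


# Hypothetical "end-point" that wraps the opaque call and checks its status.
def _append_endpoint(storage, item):
    status = _c_list_append(storage, item)
    if status != 0:
        raise RuntimeError("append failed")


# Hypothetical dispatcher: picks an implementation from the argument type.
# Unsupported types raise KeyError.
_DISPATCH_TABLE = {int: _append_endpoint, float: _append_endpoint}


def _dispatch_append(storage, item):
    impl = _DISPATCH_TABLE[type(item)]
    return impl(storage, item)


class LayeredList:
    """append() goes Python -> dispatch -> end-point -> 'C' function."""

    def __init__(self):
        self._storage = []

    def append(self, item):
        _dispatch_append(self._storage, item)

    def __len__(self):
        return len(self._storage)


class InlinedList:
    """Same behavior with the intermediate calls collapsed (as if inlined)."""

    def __init__(self):
        self._storage = []

    def append(self, item):
        self._storage.append(item)

    def __len__(self):
        return len(self._storage)


def fill(lst, n):
    # Append 0..n-1 one element at a time, paying the per-call overhead n times.
    for i in range(n):
        lst.append(i)
    return lst


if __name__ == "__main__":
    n = 100_000
    layered = timeit.timeit(lambda: fill(LayeredList(), n), number=3)
    inlined = timeit.timeit(lambda: fill(InlinedList(), n), number=3)
    print(f"layered: {layered:.3f}s  inlined: {inlined:.3f}s")
```

Run as a script, it prints timings for both variants. The absolute numbers are machine-dependent and say nothing about Numba's actual overhead; the point is only that each extra non-inlined layer is paid once per element.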

In addition, I expect the unboxing phase performs some refcount increments that could be skipped.