Speedup variance in the website notebooks, and clarifying numba vs. numpy built-in optimizations

Hi,

This is just a friendly question. I’ve downloaded and run the notebooks which are advertised from the project’s main landing page (sorry, but I can’t include a link on a first group post). Where the notebooks’ text blocks claim a 4x or 6x speedup, I only see about a 2x speedup on my i7-8700 CPU. In terms of SIMD, this CPU supports AVX/AVX2 but not AVX-512. My OS is Ubuntu 20.04; I’m not sure how best to report the related toolchain details, e.g. the OpenBLAS version numpy is linked against, which may be relevant. No GPU involved.
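
In case it helps, here is the kind of environment report I can produce. This is a minimal sketch assuming that numpy’s `show_config()` and the `__cpu_features__` dict (which I believe is available in recent numpy versions) are the right things to look at:

```python
# Sketch of an environment report; assumes a recent numpy (>= ~1.20,
# where __cpu_features__ was introduced) and that numba is installed.
import numpy as np
import numba

print("numpy:", np.__version__)
print("numba:", numba.__version__)

# BLAS/LAPACK build info, e.g. which OpenBLAS numpy was linked against
np.show_config()

# SIMD features numpy detected at runtime (AVX2, AVX512F, ...)
from numpy.core._multiarray_umath import __cpu_features__
print({k: v for k, v in __cpu_features__.items() if v})
```

I believe `numba -s` on the command line prints similar system information, if that is the preferred way to report it here.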

I’m wondering whether this is just a product of the numpy team gradually catching up with the numba team on optimizations over time, or whether other people get the same speedups the notebooks report (obviously this is all machine- and low-level-software-stack-specific).

As a specific point: where these notebooks claim that numba does type specialization, I wonder whether numpy also does something comparable by now. It is sometimes hard to discern which optimizations overlap between the two libraries without a lot of experimentation. For example, I assume numpy uses SIMD instructions (it is advertised as such on an official page I can’t include here), and I’d guess it also attempts cache locality, either directly or via the underlying BLAS. A sketch of the kind of comparison I mean follows below.
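
To make that concrete, here is a minimal head-to-head sketch of the sort of measurement I’m doing. The sum-of-squares kernel is my own stand-in, not the notebooks’ actual workload, and the 10M-element size is arbitrary:

```python
# Hedged micro-benchmark sketch: numpy (BLAS dot) vs. a numba-compiled loop.
import timeit

import numpy as np
from numba import njit

@njit(fastmath=True)
def sum_sq(a):
    # Plain loop; numba type-specializes and (I assume) vectorizes this.
    s = 0.0
    for i in range(a.shape[0]):
        s += a[i] * a[i]
    return s

a = np.random.rand(10_000_000)
sum_sq(a)  # warm-up call so JIT compilation isn't timed

t_np = timeit.timeit(lambda: np.dot(a, a), number=20)  # BLAS path
t_nb = timeit.timeit(lambda: sum_sq(a), number=20)     # numba path
print(f"numpy {t_np:.3f}s  numba {t_nb:.3f}s  ratio {t_np / t_nb:.2f}x")
```

On my machine the ratio from this kind of test is closer to the ~2x I mentioned than to the notebooks’ figures, which is what prompted the question.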

Thanks for any helpful response,

Cheers,
Matan