Concurrent np.dot (matrix multiplications) crash

Hi guys,

I’m trying to parallelize some calculations that involve matrix multiplication, so I’m calling numpy.dot inside a prange loop. However, the code crashes with what looks like a segfault.

The code works perfectly in pure Python, as well as when compiled with parallel=False. It only segfaults with parallel=True. I suspect this is due to calling some BLAS function that is not thread safe, even though that sounds odd, since nowadays all BLAS implementations should be thread safe. The error usually happens when many threads are involved (30+, on a 46-core machine).

This is some output from numba -s that could be useful:

__SVML Information__
SVML State, config.USING_SVML                 : True
SVML Library Loaded                           : True
llvmlite Using SVML Patched LLVM              : True
SVML Operational                              : True

__Threading Layer Information__
TBB Threading Layer Available                 : True
+-->TBB imported successfully.
OpenMP Threading Layer Available              : True
+-->Vendor: MS
Workqueue Threading Layer Available           : True
+-->Workqueue imported successfully.

Moreover, when it crashes, a BLAS error message is printed, but since several threads print it concurrently I cannot read it:

BBBLLBLLABABASLSAASS BBL: SLAL   AA:P: S:S  : SP: r rP   Po:rP:oPgrrg oo rrorPggoaagmPrmgrrr ra raoaimoiamgsm gmr  r siiai ssmaT si  e  mTsTiTT eser reTe rimmreriTsernm mmarTiitimimeeindnnnrnaaaittm.taaeinete datd.dtB.e.n ede  eBd.cBa.Bae  tuBeceseeBaddcec ce.a.uayacusuu  esasBseB ee ueyo sceycoaeyu oaouyuuu o ssy tuu eeotr  t  urittryy ierriootediru eeduidd  i te
  ttdtootter o at  iloaredaol lid l oale tlaclodtoolalc o cltoat aaoectoaltc ae lleatt alo toetloctel oocao ototcooooa at
  otmtemmm eae aaam n tnnnatytoyyyno oo   yomo mmm  e meeemmmmammmeaoanooomnrnyrrroyyy yyyr   mymme e  r mrreemrroeeergggogimiriooeiyoyonrgo n nsyinrsrs. ose.e.

Any idea how to avoid this crash? The matrices are not small, so the multiplication definitely benefits from BLAS libraries.

Hi @stavoltafunzia,

That error message looks like a jumbled up version of:

BLAS : Program is Terminated. Because you tried to allocate too many memory regions.

I think this comes from OpenBLAS; perhaps raise it on their forum/issue tracker?

Hope this helps?

Hi, thanks, the error message really helps.
Apparently this is a known issue: https://github.com/xianyi/OpenBLAS/issues/539 and https://github.com/xianyi/OpenBLAS/issues/1882.

In practice, it means I must not use more threads than the number that was defined when OpenBLAS was built.
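
In case it helps someone else, here is a minimal sketch of that workaround. It caps the number of Numba threads through the NUMBA_NUM_THREADS environment variable, which Numba reads when it is imported. The limit of 32 is only a placeholder I made up: the real value is whatever your OpenBLAS binary was built with, and I have not checked how to query it. The helper name cap_numba_threads is also just for illustration.

```python
import os

# Hypothetical value: the real limit is whatever thread count your OpenBLAS
# binary was built with, which is not stated anywhere in this thread.
ASSUMED_OPENBLAS_BUILD_LIMIT = 32


def cap_numba_threads(build_limit=ASSUMED_OPENBLAS_BUILD_LIMIT, cpu_count=None):
    """Set NUMBA_NUM_THREADS to at most build_limit and return the value used."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    # Never go below one thread, never above the cores or the assumed limit.
    n_threads = max(1, min(build_limit, cpu_count))
    # This must run before `import numba`, since the variable is read at import.
    os.environ["NUMBA_NUM_THREADS"] = str(n_threads)
    return n_threads


n_threads = cap_numba_threads()
print(f"NUMBA_NUM_THREADS set to {n_threads}")
```

If Numba is already imported, numba.set_num_threads(n) should also let you lower the number of threads used by prange at runtime, as long as n does not exceed the number Numba started with.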

No problem, glad you’ve got it sorted out!