Hi guys,
I’m trying to parallelize some calculations that involve matrix multiplication, so I’m using numpy.dot inside a prange loop. However, the code crashes with what looks like a segfault.
The code works perfectly in pure Python, as well as when compiled with parallel=False; it only segfaults with parallel=True. I suspect this is due to calling a BLAS function that is not thread safe, although that sounds odd, since BLAS implementations nowadays are supposed to be thread safe. The error usually happens when many threads are involved (30+, on a 46-core machine).
This is some output from numba -s that could be useful:
__SVML Information__
SVML State, config.USING_SVML : True
SVML Library Loaded : True
llvmlite Using SVML Patched LLVM : True
SVML Operational : True
__Threading Layer Information__
TBB Threading Layer Available : True
+-->TBB imported successfully.
OpenMP Threading Layer Available : True
+-->Vendor: MS
Workqueue Threading Layer Available : True
+-->Workqueue imported successfully.
Moreover, when it crashes a BLAS error message is printed, but since several threads print it concurrently the output is interleaved and I cannot read it:
BBBLLBLLABABASLSAASS BBL: SLAL AA:P: S:S : SP: r rP Po:rP:oPgrrg oo rrorPggoaagmPrmgrrr ra raoaimoiamgsm gmr r siiai ssmaT si e mTsTiTT eser reTe rimmreriTsernm mmarTiitimimeeindnnnrnaaaittm.taaeinete datd.dtB.e.n ede eBd.cBa.Bae tuBeceseeBaddcec ce.a.uayacusuu esasBseB ee ueyo sceycoaeyu oaouyuuu o ssy tuu eeotr t urittryy ierriootediru eeduidd i te
ttdtootter o at iloaredaol lid l oale tlaclodtoolalc o cltoat aaoectoaltc ae lleatt alo toetloctel oocao ototcooooa at
otmtemmm eae aaam n tnnnatytoyyyno oo yomo mmm e meeemmmmammmeaoanooomnrnyrrroyyy yyyr mymme e r mrreemrroeeergggogimiriooeiyoyonrgo n nsyinrsrs. ose.e.
Any idea how to avoid this crash? The matrices are not small, so the multiplication definitely benefits from the BLAS libraries.