Concurrent np.dot (matrix multiplications) crash

stavoltafunzia · October 9, 2020, 1:20pm

Hi guys,

I’m trying to parallelize some calculations that involve matrix multiplication, so I’m using numpy.dot inside a prange loop. However, the code crashes with what looks like a segfault.

The code works perfectly in pure Python, as well as when compiled with parallel=False; it segfaults only with parallel=True. I suspect this is due to calling a BLAS function that is not thread safe, although that seems odd since nowadays all BLAS implementations should be thread safe. The error usually happens when many threads are involved (30+, on a 46-core machine).

This is some output from numba -s that could be useful:

__SVML Information__
SVML State, config.USING_SVML                 : True
SVML Library Loaded                           : True
llvmlite Using SVML Patched LLVM              : True
SVML Operational                              : True

__Threading Layer Information__
TBB Threading Layer Available                 : True
+-->TBB imported successfully.
OpenMP Threading Layer Available              : True
+-->Vendor: MS
Workqueue Threading Layer Available           : True
+-->Workqueue imported successfully.

Moreover, when it crashes a BLAS error message is printed, but since several threads print it concurrently I cannot read it:

BBBLLBLLABABASLSAASS BBL: SLAL   AA:P: S:S  : SP: r rP   Po:rP:oPgrrg oo rrorPggoaagmPrmgrrr ra raoaimoiamgsm gmr  r siiai ssmaT si  e  mTsTiTT eser reTe rimmreriTsernm mmarTiitimimeeindnnnrnaaaittm.taaeinete datd.dtB.e.n ede  eBd.cBa.Bae  tuBeceseeBaddcec ce.a.uayacusuu  esasBseB ee ueyo sceycoaeyu oaouyuuu o ssy tuu eeotr  t  urittryy ierriootediru eeduidd  i te
  ttdtootter o at  iloaredaol lid l oale tlaclodtoolalc o cltoat aaoectoaltc ae lleatt alo toetloctel oocao ototcooooa at
  otmtemmm eae aaam n tnnnatytoyyyno oo   yomo mmm  e meeemmmmammmeaoanooomnrnyrrroyyy yyyr   mymme e  r mrreemrroeeergggogimiriooeiyoyonrgo n nsyinrsrs. ose.e.

Any idea how to avoid this crash? The matrices are not small, so the multiplication definitely benefits from BLAS libraries.

stuartarchibald · October 12, 2020, 9:54pm

Hi @stavoltafunzia,

That error message looks like a jumbled up version of:

BLAS : Program is Terminated. Because you tried to allocate too many memory regions.

I think this comes from OpenBLAS; perhaps raise it on their forum/issue tracker?

Hope this helps?

stavoltafunzia · October 13, 2020, 8:41am

Hi, thanks, the error message really helps.
Apparently this is a known issue: https://github.com/xianyi/OpenBLAS/issues/539 and https://github.com/xianyi/OpenBLAS/issues/1882.

In practice, it means I must not use more threads than the number that was defined at build time.

stuartarchibald · October 13, 2020, 9:22am

No problem, glad you’ve got it sorted out!
