[petsc-dev] PETSc "make test" has more failures with --with-openmp=1

Eric Chamberland Eric.Chamberland at giref.ulaval.ca
Wed Mar 3 14:19:27 CST 2021


Just to add something I didn't know before: I got an answer from Intel
and tried the mkl_link_tool script shipped with MKL (in the oneAPI package):

$MKLROOT/bin/intel64/mkl_link_tool -libs --compiler=gnu_c --arch=intel64 \
    --linking=dynamic --parallel=yes --interface=lp64 \
    --threading-library=gomp --mpi=openmpi --cluster_library=scalapack

     Intel(R) oneAPI Math Kernel Library (oneMKL) Link Tool v6.1


Linking line:
  -L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -Wl,--no-as-needed -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lmkl_blacs_openmpi_lp64 -lgomp -lpthread -lm -ldl

I don't know whether it would be useful for the PETSc configuration
scripts to try using this, but it exists!
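If a configuration script did want to consume mkl_link_tool, the first step would be turning its output into library directories and library names. The sketch below is a hypothetical helper (`parse_mkl_link_line` is not part of PETSc or MKL); it assumes only the output format shown above, a "Linking line:" header followed by link flags that may be wrapped over several lines.

```python
import shlex
from typing import Dict, List


def parse_mkl_link_line(tool_output: str) -> Dict[str, List[str]]:
    """Split the flags after 'Linking line:' into -L dirs, -l libs and other flags.

    Hypothetical helper; assumes the output layout shown in this message.
    """
    _, sep, rest = tool_output.partition("Linking line:")
    if not sep:
        raise ValueError("no 'Linking line:' section in mkl_link_tool output")
    # shlex.split treats spaces and newlines alike, so wrapped lines are fine;
    # ${MKLROOT} is kept verbatim (no shell expansion happens here).
    tokens = shlex.split(rest)
    return {
        "libdirs": [t[2:] for t in tokens if t.startswith("-L")],
        "libs": [t[2:] for t in tokens if t.startswith("-l")],
        "other": [t for t in tokens if not t.startswith(("-L", "-l"))],
    }
```

The text could come from `subprocess.run([...], capture_output=True, text=True).stdout` with the arguments of the command above; the resulting `libs` list preserves the link order the tool recommends.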



On 2021-03-03 11:13 a.m., Barry Smith wrote:
>>> 2) I can reproduce the src/mat/tests/ex242.c error (which explicitly
>>> uses ScaLAPACK; none of the above PCs use it explicitly, except
>>> PCBDDC/PCHPDDM when using MUMPS on “big” problems, where root nodes
>>> are factorized with ScaLAPACK, see -mat_mumps_icntl_13)
>>> 3) I’m seeing that, both on your machine and mine, PETSc BuildSystem
>>> insists on linking libmkl_blacs_intelmpi_lp64.so even though we
>>> explicitly supply libmkl_blacs_openmpi_lp64.so.
>>> This, for example, yields a wrong Makefile.inc for MUMPS:
>>> $ cat arch-linux2-c-opt-ompi/externalpackages/MUMPS_5.3.5/Makefile.inc | grep blacs
>>> SCALAP  = […] -lmkl_blacs_openmpi_lp64
>>> LIBBLAS = […] -lmkl_blacs_intelmpi_lp64 -lgomp -ldl -lpthread -lm […]
>>> Despite what Barry says, I think PETSc is partially to blame as well
>>> (why use libmkl_blacs_intelmpi_lp64.so when BuildSystem is
>>> capable of detecting that we are using OpenMPI?).
>>> I’ll try to fix this to see if it solves 2).
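A quick way to spot the inconsistency described in point 3 is to scan a generated Makefile.inc for MKL BLACS libraries and flag files that mention more than one MPI flavour. This is a hypothetical diagnostic sketch, not part of PETSc; it assumes simple one-line `VAR = value` assignments (no backslash continuations) and recognizes only the four flavours that appear in the BuildSystem candidate lists quoted in this thread.

```python
import re
from typing import Dict, Set

# The four MKL BLACS flavours BuildSystem tries (intelmpi, mpich, sgimpt, openmpi).
BLACS_RE = re.compile(r"mkl_blacs_(intelmpi|openmpi|mpich|sgimpt)")


def blacs_by_variable(makefile_text: str) -> Dict[str, Set[str]]:
    """Map each 'VAR = value' line to the BLACS flavours named in its value."""
    found: Dict[str, Set[str]] = {}
    for line in makefile_text.splitlines():
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment line
        flavours = set(BLACS_RE.findall(value))
        if flavours:
            found.setdefault(name.strip(), set()).update(flavours)
    return found


def mixed_blacs(makefile_text: str) -> bool:
    """True if more than one BLACS flavour appears anywhere in the file."""
    all_flavours: Set[str] = set()
    for flavours in blacs_by_variable(makefile_text).values():
        all_flavours |= flavours
    return len(all_flavours) > 1
```

On the MUMPS Makefile.inc quoted above, `blacs_by_variable` would report `openmpi` for SCALAP and `intelmpi` for LIBBLAS, so `mixed_blacs` returns True.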
>> Okay, that's a very nice finding!!! I hope it will be easy to fix!
> The knowledge is there, but the information may not be trivially
> available where it is needed to make the right decisions. Parts of the
> BLAS/LAPACK checks use the "check everything" approach. For example:
> # Look for Multi-Threaded MKL for MKL_C/Pardiso
> useCPardiso=0
> usePardiso=0
> if self.argDB['with-mkl_cpardiso'] or 'with-mkl_cpardiso-dir' in self.argDB or 'with-mkl_cpardiso-lib' in self.argDB:
>   useCPardiso=1
>   mkl_blacs_64=[['mkl_blacs_intelmpi'+ILP64+''],['mkl_blacs_mpich'+ILP64+''],['mkl_blacs_sgimpt'+ILP64+''],['mkl_blacs_openmpi'+ILP64+'']]
>   mkl_blacs_32=[['mkl_blacs_intelmpi'],['mkl_blacs_mpich'],['mkl_blacs_sgimpt'],['mkl_blacs_openmpi']]
> elif self.argDB['with-mkl_pardiso'] or 'with-mkl_pardiso-dir' in self.argDB or 'with-mkl_pardiso-lib' in self.argDB:
>   usePardiso=1
>   mkl_blacs_64=[[]]
>   mkl_blacs_32=[[]]
> if useCPardiso or usePardiso:
>   self.logPrintBox('BLASLAPACK: Looking for Multithreaded MKL for C/Pardiso')
>   for libdir in [os.path.join('lib','64'),os.path.join('lib','ia64'),os.path.join('lib','em64t'),os.path.join('lib','intel64'),'lib','64','ia64','em64t','intel64',
>                  os.path.join('lib','32'),os.path.join('lib','ia32'),'32','ia32','']:
>     if not os.path.exists(os.path.join(dir,libdir)):
>       self.logPrint('MKL Path not found.. skipping: '+os.path.join(dir,libdir))
>     else:
>       self.log.write('Files and directories in that directory:\n'+str(os.listdir(os.path.join(dir,libdir)))+'\n')
>       #  iomp5 is provided by the Intel compilers on MacOS. Run source /opt/intel/bin/compilervars.sh intel64 to have it added to
>       #  then locate libimp5.dylib in the LIBRARY_PATH and copy it to os.path.join(dir,libdir)
>       for i in mkl_blacs_64:
>         yield ('User specified MKL-C/Pardiso Intel-Linux64', None, [os.path.join(dir,libdir,'libmkl_intel'+ILP64+'.a'),'mkl_core','mkl_intel_thread']+i+['iomp5','dl','pthread'],known,'yes')
>         yield ('User specified MKL-C/Pardiso GNU-Linux64', None, [os.path.join(dir,libdir,'libmkl_intel'+ILP64+'.a'),'mkl_core','mkl_gnu_thread']+i+['gomp','dl','pthread'],known,'yes')
>         yield ('User specified MKL-Pardiso Intel-Windows64', None, [os.path.join(dir,libdir,'mkl_core.lib'),'mkl_intel'+ILP64+'.lib','mkl_intel_thread.lib']+i+['libiomp5md.lib'],known,'yes')
>       for i in mkl_blacs_32:
>         yield ('User specified MKL-C/Pardiso Intel-Linux32', None, [os.path.join(dir,libdir,'libmkl_intel.a'),'mkl_core','mkl_intel_thread']+i+['iomp5','dl','pthread'],'32','yes')
>         yield ('User specified MKL-C/Pardiso GNU-Linux32', None, [os.path.join(dir,libdir,'libmkl_intel.a'),'mkl_core','mkl_gnu_thread']+i+['gomp','dl','pthread'],'32','yes')
>         yield ('User specified MKL-Pardiso Intel-Windows32', None, [os.path.join(dir,libdir,'mkl_core.lib'),'mkl_intel_c.lib','mkl_intel_thread.lib']+i+['libiomp5md.lib'],'32','yes')
>   return
> The assumption is that the link will fail unless the correct libraries
> are in the list. But apparently this is not the case: it returns the
> first combination that links, but that combination does not run, which
> is why it appears to be producing "silly" results.
> If you set the right MPI and threading libraries at these locations,
> instead of trying all of them, it might resolve the problems.
> if self.openmp.found:
>   ITHREAD='intel_thread'
>   ITHREADGNU='gnu_thread'
>   ompthread = 'yes'
> else:
>   ITHREAD='sequential'
>   ITHREADGNU='sequential'
>   ompthread = 'no'
> mkl_blacs_64=[['mkl_blacs_intelmpi'+ILP64+''],['mkl_blacs_mpich'+ILP64+''],['mkl_blacs_sgimpt'+ILP64+''],['mkl_blacs_openmpi'+ILP64+'']]
> mkl_blacs_32=[['mkl_blacs_intelmpi'],['mkl_blacs_mpich'],['mkl_blacs_sgimpt'],['mkl_blacs_openmpi']]
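A minimal sketch of the two ideas above, under stated assumptions: (1) a candidate should be accepted only if it runs, not merely links, and (2) when the MPI implementation and the OpenMP setting are already known, the BLACS and threading choices can be fixed instead of enumerated. The helper names (`mkl_blacs_candidates`, `mkl_threading_libs`, `first_working`) and the `mpi_flavour` string are hypothetical, not BuildSystem API; `links` and `runs` stand in for whatever compile-and-execute checks the configure system provides.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

# BLACS flavours in the order the quoted BuildSystem code tries them.
BLACS_BY_MPI = {
    "intelmpi": "mkl_blacs_intelmpi",
    "mpich": "mkl_blacs_mpich",
    "sgimpt": "mkl_blacs_sgimpt",
    "openmpi": "mkl_blacs_openmpi",
}


def mkl_blacs_candidates(mpi_flavour: Optional[str], suffix: str = "_lp64") -> List[List[str]]:
    """BLACS candidates in the same [[lib], ...] shape as mkl_blacs_64.

    A known MPI yields a single candidate; otherwise every flavour is
    tried, as the current code does. Use suffix="" for the mkl_blacs_32 names.
    """
    if mpi_flavour in BLACS_BY_MPI:
        return [[BLACS_BY_MPI[mpi_flavour] + suffix]]
    return [[name + suffix] for name in BLACS_BY_MPI.values()]


def mkl_threading_libs(openmp_found: bool, gnu_compiler: bool) -> List[str]:
    """MKL threading layer plus the OpenMP runtime it needs (mirrors ITHREAD/ITHREADGNU)."""
    if not openmp_found:
        return ["mkl_sequential"]
    if gnu_compiler:
        return ["mkl_gnu_thread", "gomp"]
    return ["mkl_intel_thread", "iomp5"]


Candidate = Tuple[str, Sequence[str]]  # (description, libraries)


def first_working(candidates: Iterable[Candidate],
                  links: Callable[[Sequence[str]], bool],
                  runs: Callable[[Sequence[str]], bool]) -> Optional[Candidate]:
    """Return the first candidate that both links and runs, or None.

    With runs always True this reduces to "first that links", which can pick
    a wrong BLACS library that still links, as described above.
    """
    for desc, libs in candidates:
        if links(libs) and runs(libs):
            return (desc, libs)
    return None
```

With Open MPI detected and OpenMP enabled under GCC, the first two helpers give mkl_blacs_openmpi_lp64 plus mkl_gnu_thread/gomp, the same choices mkl_link_tool printed at the top of this thread. A run check is not always possible (for example when cross-compiling), which is one reason to prefer narrowing the list from what is already known.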
>> On Mar 3, 2021, at 8:22 AM, Eric Chamberland <Eric.Chamberland at giref.ulaval.ca> wrote:
>> Hi Pierre,
>> On 2021-03-03 2:42 a.m., Pierre Jolivet wrote:
>>>> If it turns out that there is a problem combining MKL + OpenMP that
>>>> depends on the link configuration, for example, would it be a good
>>>> idea to have this (--with-openmp=1) tested in the pipelines
>>>> (with external packages, of course)?
>>> As Barry said, there is not much (if any) OpenMP in PETSc.
>>> There are, however, some workers with MKL (+ Intel compilers)
>>> turned on, but I don’t think we test MKL + GNU compilers (which I
>>> feel is a very niche combination, hence not really worth
>>> testing, IMHO).
>> Ouch, this is almost exactly my personal working configuration, and
>> that of most of our users too... and it worked well until I activated
>> OpenMP...
>> We had good reasons to work with g++ or clang++ instead of the Intel
>> compilers:
>> - Using an Intel compiler requires a paid license (I haven't looked
>> at oneAPI licensing yet; it may have changed?)
>> - iceccd does not support Intel compilers (so recompilation is slow)
>> - MKL was distributed freely, so it can be used with any compiler
>> That doesn't mean we don't want to use the Intel compiler; we may
>> just want to make a specific delivery with it while continuing to
>> develop with g++ or clang++ (my personal choice).
>> But I understand it is less straightforward to combine gcc and MKL
>> than to use the native Intel toolchain...
> I agree that the MKL + GNU compilers combination is commonly used and
> should be tested and maintained in PETSc.
>>>> Do the people who maintain all these libs read petsc-dev? ;)
>>> I don’t think they are, but don’t worry, we do forward the 
>>> appropriate messages to them :)
>> :)
>>> About yesterday’s failures…
>>> 1) I cannot reproduce any of the PCHYPRE/PCBDDC/PCHPDDM errors 
>>> (sorry I didn’t bother putting the SuperLU_DIST tarball on my cluster)
>> Hmmm, maybe my environment variables play a role in this?
>> For comparison purposes, we explicitly set:
>> export MKL_NUM_THREADS=1
>> but it would be surprising if that helped reproduce a problem: it
>> usually stabilizes results...
>> Thanks,
>> Eric
>>> Thanks,
>>> Pierre
>>> http://joliv.et/irene-rome-configure.log 
>>> $ /usr/bin/gmake -f gmakefile test test-fail=1
>>> Using MAKEFLAGS: test-fail=1
>>>         TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_baij.counts
>>>  ok snes_tutorials-ex12_quad_hpddm_reuse_baij
>>>  ok diff-snes_tutorials-ex12_quad_hpddm_reuse_baij
>>>         TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex50_tut_2.counts
>>>  ok ksp_ksp_tutorials-ex50_tut_2 # SKIP PETSC_HAVE_SUPERLU_DIST requirement not met
>>>         TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_hypre.counts
>>>  ok snes_tutorials-ex56_hypre
>>>  ok diff-snes_tutorials-ex56_hypre
>>>         TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_elas.counts
>>>  ok snes_tutorials-ex17_3d_q3_trig_elas
>>>  ok diff-snes_tutorials-ex17_3d_q3_trig_elas
>>>         TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij.counts
>>>  ok snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
>>>  ok diff-snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
>>>         TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_tri_parmetis_hpddm_baij.counts
>>>  ok snes_tutorials-ex12_tri_parmetis_hpddm_baij
>>>  ok diff-snes_tutorials-ex12_tri_parmetis_hpddm_baij
>>>         TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_tut_3.counts
>>>  ok snes_tutorials-ex19_tut_3
>>>  ok diff-snes_tutorials-ex19_tut_3
>>>         TEST arch-linux2-c-opt-ompi/tests/counts/mat_tests-ex242_3.counts
>>> not ok mat_tests-ex242_3 # Error code: 137
>>> #[1]PETSC ERROR: ------------------------------------------------------------------------
>>> #[1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>>> #[1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>> #[1]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>> #[1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>> #[1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>>> #[1]PETSC ERROR: to get more information on the crash.
>>> #[1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>> #[1]PETSC ERROR: Signal received
>>> #[1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>> #[1]PETSC ERROR: Petsc Development GIT revision: v3.14.4-733-g7ab9467ef9  GIT Date: 2021-03-02 16:15:11 +0000
>>> #[2]PETSC ERROR: ------------------------------------------------------------------------
>>> #[2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>>> #[2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>> #[2]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>> #[2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>> #[2]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>>> #[2]PETSC ERROR: to get more information on the crash.
>>> #[2]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>> #[2]PETSC ERROR: Signal received
>>> #[2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>> #[2]PETSC ERROR: Petsc Development GIT revision: v3.14.4-733-g7ab9467ef9  GIT Date: 2021-03-02 16:15:11 +0000
>>> #[2]PETSC ERROR: /ccc/work/cont003/rndm/rndm/petsc/arch-linux2-c-opt-ompi/tests/mat/tests/runex242_3/../ex242 on a arch-linux2-c-opt-ompi named irene4047 by jolivetp Wed Mar  3 08:21:20 2021
>>> #[2]PETSC ERROR: Configure options --download-hpddm 
>>> --download-hpddm-commit=origin/main --download-hypre 
>>> --download-metis --download-mumps --download-parmetis 
>>> --download-ptscotch --download-slepc 
>>> --download-slepc-commit=origin/main --download-tetgen 
>>> --known-mpi-c-double-complex --known-mpi-int64_t 
>>> --known-mpi-long-double --with-avx512-kernels=1 
>>> --with-blaslapack-dir=/ccc/products/mkl- 
>>> --with-cc=mpicc --with-cxx=mpicxx --with-debugging=0 
>>> --with-fc=mpifort --with-fortran-bindings=0 --with-make-np=40 
>>> --with-mkl_cpardiso-dir=/ccc/products/mkl- 
>>> --with-mkl_cpardiso=1 
>>> --with-mkl_pardiso-dir=/ccc/products/mkl- 
>>> --with-mkl_pardiso=1 --with-mpiexec=ccc_mprun --with-openmp=1 
>>> --with-packages-download-dir=/ccc/cont003/home/enseeiht/jolivetp/Dude/externalpackages/ 
>>> --with-scalapack-include=/ccc/products/mkl- 
>>> --with-scalapack-lib="[/ccc/products/mkl-,/ccc/products/mkl-]" 
>>> --with-scalar-type=real --with-x=0 COPTFLAGS="-O3 -fp-model fast 
>>> -mavx2" CXXOPTFLAGS="-O3 -fp-model fast -mavx2" FOPTFLAGS="-O3 
>>> -fp-model fast -mavx2" PETSC_ARCH=arch-linux2-c-opt-ompi
>>> #[2]PETSC ERROR: #1 User provided function() line 0 in  unknown file
>>> #[2]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash.
>>> #[1]PETSC ERROR: /ccc/work/cont003/rndm/rndm/petsc/arch-linux2-c-opt-ompi/tests/mat/tests/runex242_3/../ex242 on a arch-linux2-c-opt-ompi named irene4047 by jolivetp Wed Mar  3 08:21:20 2021
>>> #[1]PETSC ERROR: Configure options --download-hpddm 
>>> --download-hpddm-commit=origin/main --download-hypre 
>>> --download-metis --download-mumps --download-parmetis 
>>> --download-ptscotch --download-slepc 
>>> --download-slepc-commit=origin/main --download-tetgen 
>>> --known-mpi-c-double-complex --known-mpi-int64_t 
>>> --known-mpi-long-double --with-avx512-kernels=1 
>>> --with-blaslapack-dir=/ccc/products/mkl- 
>>> --with-cc=mpicc --with-cxx=mpicxx --with-debugging=0 
>>> --with-fc=mpifort --with-fortran-bindings=0 --with-make-np=40 
>>> --with-mkl_cpardiso-dir=/ccc/products/mkl- 
>>> --with-mkl_cpardiso=1 
>>> --with-mkl_pardiso-dir=/ccc/products/mkl- 
>>> --with-mkl_pardiso=1 --with-mpiexec=ccc_mprun --with-openmp=1 
>>> --with-packages-download-dir=/ccc/cont003/home/enseeiht/jolivetp/Dude/externalpackages/ 
>>> --with-scalapack-include=/ccc/products/mkl- 
>>> --with-scalapack-lib="[/ccc/products/mkl-,/ccc/products/mkl-]" 
>>> --with-scalar-type=real --with-x=0 COPTFLAGS="-O3 -fp-model fast 
>>> -mavx2" CXXOPTFLAGS="-O3 -fp-model fast -mavx2" FOPTFLAGS="-O3 
>>> -fp-model fast -mavx2" PETSC_ARCH=arch-linux2-c-opt-ompi
>>> #[1]PETSC ERROR: #1 User provided function() line 0 in  unknown file
>>> #[1]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash.
>>> #--------------------------------------------------------------------------
>>> #MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
>>> #with errorcode 50176059.
>>> #
>>> #NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>> #You may or may not see output from other processes, depending on
>>> #exactly when Open MPI kills them.
>>> #--------------------------------------------------------------------------
>>> #--------------------------------------------------------------------------
>>> #MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
>>> #with errorcode 50176059.
>>> #
>>> #NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>> #You may or may not see output from other processes, depending on
>>> #exactly when Open MPI kills them.
>>> #--------------------------------------------------------------------------
>>> #srun: Job step aborted: Waiting up to 302 seconds for job step to finish.
>>> #slurmstepd-irene4047: error: *** STEP 1374176.36 ON irene4047 CANCELLED AT 2021-03-03T08:21:20 ***
>>> #srun: error: irene4047: task 0: Killed
>>> #srun: error: irene4047: tasks 1-2: Exited with exit code 16
>>>  ok mat_tests-ex242_3 # SKIP Command failed so no diff
>>>         TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_vlap.counts
>>>  ok snes_tutorials-ex17_3d_q3_trig_vlap
>>>  ok diff-snes_tutorials-ex17_3d_q3_trig_vlap
>>>         TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre.counts
>>>  ok snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
>>>  ok diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
>>>         TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex49_hypre_nullspace.counts
>>>  ok ksp_ksp_tutorials-ex49_hypre_nullspace
>>>  ok diff-ksp_ksp_tutorials-ex49_hypre_nullspace
>>>         TEST arch-linux2-c-opt-ompi/tests/counts/ts_tutorials-ex18_p1p1_xper_ref.counts
>>>  ok ts_tutorials-ex18_p1p1_xper_ref
>>>  ok diff-ts_tutorials-ex18_p1p1_xper_ref
>>>         TEST arch-linux2-c-opt-ompi/tests/counts/ts_tutorials-ex18_p1p1_xyper_ref.counts
>>>  ok ts_tutorials-ex18_p1p1_xyper_ref
>>>  ok diff-ts_tutorials-ex18_p1p1_xyper_ref
>>>         TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre.counts
>>>  ok snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
>>>  ok diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
>>>         TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex64_1.counts
>>>  ok ksp_ksp_tutorials-ex64_1 # SKIP PETSC_HAVE_SUPERLU_DIST requirement not met
>>>> On 3 Mar 2021, at 6:21 AM, Eric Chamberland <Eric.Chamberland at giref.ulaval.ca> wrote:
>>>> Just started a discussion on the side:
>>>> https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Intel-MKL-Link-Line-Advisor-as-external-tool/m-p/1260895#M30974
>>>> Eric
>>>> On 2021-03-02 3:50 p.m., Pierre Jolivet wrote:
>>>>> Hello Eric,
>>>>> src/mat/tests/ex237.c is a recent test with some code paths that
>>>>> should be disabled for “old” MKL versions. It’s tricky to check
>>>>> directly in the source (we do check in BuildSystem) because there
>>>>> is no such thing as PETSC_PKG_MKL_VERSION_LT, but I guess we can
>>>>> change "if defined(PETSC_HAVE_MKL)" to "if defined(PETSC_HAVE_MKL) &&
>>>>> defined(PETSC_HAVE_MKL_SPARSE_OPTIMIZE)". I’ll make an MR; thanks
>>>>> for reporting this.
>>>>> For the other issues, I suspect this is a problem with gomp +
>>>>> mkl_gnu_thread, but this is pure speculation… sorry.
>>>>> I’ll try to reproduce some of these problems if nobody gives you
>>>>> a more meaningful answer.
>>>>> Thanks,
>>>>> Pierre
>>>>>> On 2 Mar 2021, at 9:14 PM, Eric Chamberland <Eric.Chamberland at giref.ulaval.ca> wrote:
>>>>>> Hi,
>>>>>> It all started when I wanted to test PETSc/CUDA compatibility for
>>>>>> our code.
>>>>>> I had to activate --with-openmp to configure successfully with
>>>>>> --with-cuda=1.
>>>>>> I then saw that PETSC_HAVE_OPENMP is used at least in MUMPS (and
>>>>>> some other places).
>>>>>> So, I configured and tested PETSc with OpenMP activated, without
>>>>>> CUDA.
>>>>>> The first thing I saw was that our code's CI pipelines now fail
>>>>>> many tests.
>>>>>> After looking deeper, it seems that PETSc itself fails many tests
>>>>>> when I activate OpenMP!
>>>>>> Here are all the configurations I have results for, with and
>>>>>> without OpenMP activated in PETSc:
>>>>>> ==============================================================================
>>>>>> ==============================================================================
>>>>>> For petsc/master + OpenMPI 4.0.4 + MKL 2019.4.243:
>>>>>> With OpenMP=1
>>>>>> https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.03.02.02h00m02s_make_test.log
>>>>>> https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.03.02.02h00m02s_configure.log
>>>>>> # -------------
>>>>>> #   Summary
>>>>>> # -------------
>>>>>> # FAILED snes_tutorials-ex12_quad_hpddm_reuse_baij diff-ksp_ksp_tests-ex33_superlu_dist_2 diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0 diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1 diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0 diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1 diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0 diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1 diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0 diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1 ksp_ksp_tutorials-ex50_tut_2 diff-ksp_ksp_tests-ex33_superlu_dist diff-snes_tutorials-ex56_hypre snes_tutorials-ex17_3d_q3_trig_elas snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij ksp_ksp_tutorials-ex5_superlu_dist_3 ksp_ksp_tutorials-ex5f_superlu_dist snes_tutorials-ex12_tri_parmetis_hpddm_baij diff-snes_tutorials-ex19_tut_3 mat_tests-ex242_3 snes_tutorials-ex17_3d_q3_trig_vlap ksp_ksp_tutorials-ex5f_superlu_dist_3 snes_tutorials-ex19_superlu_dist diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre diff-ksp_ksp_tutorials-ex49_hypre_nullspace ts_tutorials-ex18_p1p1_xper_ref ts_tutorials-ex18_p1p1_xyper_ref snes_tutorials-ex19_superlu_dist_2 ksp_ksp_tutorials-ex5_superlu_dist_2 diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre ksp_ksp_tutorials-ex64_1 ksp_ksp_tutorials-ex5_superlu_dist ksp_ksp_tutorials-ex5f_superlu_dist_2
>>>>>> # success 8275/10003 tests (82.7%)
>>>>>> # failed 33/10003 tests (0.3%)
>>>>>> With OpenMP=0
>>>>>> https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.02.26.02h00m16s_make_test.log
>>>>>> https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.02.26.02h00m16s_configure.log
>>>>>> # -------------
>>>>>> #   Summary
>>>>>> # -------------
>>>>>> # FAILED tao_constrained_tutorials-tomographyADMM_6 snes_tutorials-ex17_3d_q3_trig_elas mat_tests-ex242_3 snes_tutorials-ex17_3d_q3_trig_vlap tao_leastsquares_tutorials-tomography_1 tao_constrained_tutorials-tomographyADMM_5
>>>>>> # success 8262/9983 tests (82.8%)
>>>>>> # failed 6/9983 tests (0.1%)
>>>>>> ==============================================================================
>>>>>> ==============================================================================
>>>>>> For OpenMPI 3.1.x/master:
>>>>>> With OpenMP=1:
>>>>>> https://giref.ulaval.ca/~cmpgiref/ompi_3.x/2021.03.01.22h00m01s_make_test.log
>>>>>> https://giref.ulaval.ca/~cmpgiref/ompi_3.x/2021.03.01.22h00m01s_configure.log
>>>>>> # -------------
>>>>>> #   Summary
>>>>>> # -------------
>>>>>> # FAILED mat_tests-ex242_3 mat_tests-ex242_2 diff-mat_tests-ex219f_1 diff-dm_tutorials-ex11f90_1 ksp_ksp_tutorials-ex5_superlu_dist_3 diff-ksp_ksp_tutorials-ex49_hypre_nullspace ksp_ksp_tutorials-ex5f_superlu_dist_3 snes_tutorials-ex17_3d_q3_trig_vlap diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre diff-snes_tutorials-ex19_tut_3 diff-snes_tutorials-ex56_hypre diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre tao_leastsquares_tutorials-tomography_1 tao_constrained_tutorials-tomographyADMM_4 tao_constrained_tutorials-tomographyADMM_6 diff-tao_constrained_tutorials-toyf_1
>>>>>> # success 8142/9765 tests (83.4%)
>>>>>> # failed 16/9765 tests (0.2%)
>>>>>> With OpenMP=0:
>>>>>> https://giref.ulaval.ca/~cmpgiref/ompi_3.x/2021.02.28.22h00m02s_make_test.log
>>>>>> https://giref.ulaval.ca/~cmpgiref/ompi_3.x/2021.02.28.22h00m02s_configure.log
>>>>>> # -------------
>>>>>> #   Summary
>>>>>> # -------------
>>>>>> # FAILED mat_tests-ex242_3 mat_tests-ex242_2 diff-mat_tests-ex219f_1 diff-dm_tutorials-ex11f90_1 ksp_ksp_tutorials-ex56_2 snes_tutorials-ex17_3d_q3_trig_vlap tao_leastsquares_tutorials-tomography_1 tao_constrained_tutorials-tomographyADMM_4 diff-tao_constrained_tutorials-toyf_1
>>>>>> # success 8151/9767 tests (83.5%)
>>>>>> # failed 9/9767 tests (0.1%)
>>>>>> ==============================================================================
>>>>>> ==============================================================================
>>>>>> For OpenMPI 4.0.x/master:
>>>>>> With OpenMP=1:
>>>>>> https://giref.ulaval.ca/~cmpgiref/ompi_4.x/2021.03.01.20h00m01s_make_test.log
>>>>>> https://giref.ulaval.ca/~cmpgiref/ompi_4.x/2021.03.01.20h00m01s_configure.log
>>>>>> # FAILED snes_tutorials-ex17_3d_q3_trig_elas snes_tutorials-ex19_hypre ksp_ksp_tutorials-ex56_2 tao_leastsquares_tutorials-tomography_1 tao_constrained_tutorials-tomographyADMM_5 mat_tests-ex242_3 ksp_ksp_tutorials-ex55_hypre ksp_ksp_tutorials-ex5_superlu_dist_2 tao_constrained_tutorials-tomographyADMM_6 snes_tutorials-ex56_hypre snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre ksp_ksp_tutorials-ex5f_superlu_dist_3 ksp_ksp_tutorials-ex34_hyprestruct diff-ksp_ksp_tutorials-ex49_hypre_nullspace snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre ksp_ksp_tutorials-ex5f_superlu_dist ksp_ksp_tutorials-ex5f_superlu_dist_2 ksp_ksp_tutorials-ex5_superlu_dist snes_tutorials-ex19_tut_3 snes_tutorials-ex19_superlu_dist ksp_ksp_tutorials-ex50_tut_2 snes_tutorials-ex17_3d_q3_trig_vlap ksp_ksp_tutorials-ex5_superlu_dist_3 snes_tutorials-ex19_superlu_dist_2 tao_constrained_tutorials-tomographyADMM_4 ts_tutorials-ex26_2
>>>>>> # success 8125/9753 tests (83.3%)
>>>>>> # failed 26/9753 tests (0.3%)
>>>>>> With OpenMP=0
>>>>>> https://giref.ulaval.ca/~cmpgiref/ompi_4.x/2021.02.28.20h00m04s_make_test.log
>>>>>> https://giref.ulaval.ca/~cmpgiref/ompi_4.x/2021.02.28.20h00m04s_configure.log
>>>>>> # FAILED mat_tests-ex242_3
>>>>>> # success 8174/9777 tests (83.6%)
>>>>>> # failed 1/9777 tests (0.0%)
>>>>>> ==============================================================================
>>>>>> ==============================================================================
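To separate the failures that appear only with --with-openmp=1 from those that already occur without it (mat_tests-ex242_3, for instance, fails in every configuration above), one can diff the two FAILED lists. The sketch below is a hypothetical helper, not part of the PETSc test harness; it assumes the one-line "# FAILED name name ..." summary format shown above and tolerates the archive's leading ">" quote markers.

```python
from typing import Set, Tuple


def parse_failed(summary_line: str) -> Set[str]:
    """Turn a '# FAILED a b c' summary line from 'make test' into a set of test names."""
    # Strip quote markers, '#' and whitespace from the left.
    text = summary_line.lstrip("> #\t").rstrip()
    if not text.startswith("FAILED"):
        raise ValueError("expected a '# FAILED ...' summary line")
    return set(text.split()[1:])


def compare_runs(with_openmp: str, without_openmp: str) -> Tuple[Set[str], Set[str]]:
    """Return (failures seen only with OpenMP, failures seen only without it)."""
    on, off = parse_failed(with_openmp), parse_failed(without_openmp)
    return on - off, off - on
```

Applied to the OpenMPI 4.0.x results, every failure except mat_tests-ex242_3 lands in the "only with OpenMP" set, while for the MKL 2019 run the tao tomography tests land in the second set because they fail only without OpenMP.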
>>>>>> Is that known and normal?
>>>>>> In all cases I am using MKL, and I suspect the problem may come
>>>>>> from there... :/
>>>>>> I also saw a second problem: "make test" fails to compile PETSc
>>>>>> examples with older versions of MKL (that's less important for
>>>>>> me, since I just upgraded to oneAPI to avoid it, but you may want
>>>>>> to know):
>>>>>> https://giref.ulaval.ca/~cmpgiref/dernier_ompi/2021.03.02.02h16m01s_make_test.log
>>>>>> https://giref.ulaval.ca/~cmpgiref/dernier_ompi/2021.03.02.02h16m01s_configure.log
>>>>>> Thanks,
>>>>>> Eric
>>>>>> -- 
>>>>>> Eric Chamberland, ing., M. Ing
>>>>>> Professionnel de recherche
>>>>>> GIREF/Université Laval
>>>>>> (418) 656-2131 poste 41 22 42
Eric Chamberland, ing., M. Ing
Professionnel de recherche
GIREF/Université Laval
(418) 656-2131 poste 41 22 42
