[petsc-dev] Petsc "make test" have more failures for --with-openmp=1
Eric Chamberland
Eric.Chamberland at giref.ulaval.ca
Wed Mar 3 14:19:27 CST 2021
Hi,
Just to add something I didn't know before: I got an answer from Intel
and tried the mkl_link_tool script shipped with MKL (in the oneAPI package):
$MKLROOT/bin/intel64/mkl_link_tool -libs --compiler=gnu_c --arch=intel64
--linking=dynamic --parallel=yes --interface=lp64
--threading-library=gomp --mpi=openmpi --cluster_library=scalapack
Intel(R) oneAPI Math Kernel Library (oneMKL) Link Tool v6.1
==========================================================
Output
======
Linking line:
-L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -Wl,--no-as-needed
-lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lmkl_blacs_openmpi_lp64
-lgomp -lpthread -lm -ldl
I don't know whether it would be interesting for the PETSc configuration
scripts to try to use this or not, but it exists!
(see
https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Intel-MKL-Link-Line-Advisor-as-external-tool/m-p/1260895#M30974
)
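If PETSc ever wanted to automate this, here is a minimal, untested sketch of how a
configure script could shell out to mkl_link_tool and recover the -L/-l flags; the
parsing below is purely my own assumption, not something BuildSystem does today:

import os
import shlex
import subprocess

def mkl_link_flags(mpi='openmpi', threading='gomp', interface='lp64'):
    # Hypothetical helper: ask mkl_link_tool for the link line.  Assumes MKLROOT is
    # set and the tool lives in $MKLROOT/bin/intel64, as in the command above.
    tool = os.path.join(os.environ['MKLROOT'], 'bin', 'intel64', 'mkl_link_tool')
    out = subprocess.run([tool, '-libs', '--compiler=gnu_c', '--arch=intel64',
                          '--linking=dynamic', '--parallel=yes',
                          '--interface=' + interface,
                          '--threading-library=' + threading,
                          '--mpi=' + mpi, '--cluster_library=scalapack'],
                         capture_output=True, text=True, check=True).stdout
    # Keep only the -L/-l tokens (this drops e.g. -Wl,--no-as-needed for simplicity).
    return [tok for tok in shlex.split(out) if tok.startswith(('-L', '-l'))]

print(' '.join(mkl_link_flags()))

Running that with MKLROOT set should print roughly the linking line shown above.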
Eric
On 2021-03-03 11:13 a.m., Barry Smith wrote:
>
>>> 2) I can reproduce the src/mat/tests/ex242.c error (which explicitly
>>> uses ScaLAPACK, none of the above PC uses it explicitly, except
>>> PCBDDC/PCHPDDM when using MUMPS on “big” problems where root nodes
>>> are factorized using ScaLAPACK, see -mat_mumps_icntl_13)
>>> 3) I'm seeing that, both on your machine and mine, PETSc BuildSystem
>>> insists on linking libmkl_blacs_intelmpi_lp64.so even though we
>>> explicitly supply libmkl_blacs_openmpi_lp64.so.
>>> This, for example, yields a wrong Makefile.inc for MUMPS:
>>> $ cat arch-linux2-c-opt-ompi/externalpackages/MUMPS_5.3.5/Makefile.inc|grep blacs
>>> SCALAP = […] -lmkl_blacs_openmpi_lp64
>>> LIBBLAS = […] -lmkl_blacs_intelmpi_lp64 -lgomp -ldl -lpthread -lm […]
>>>
>>> Despite what Barry says, I think PETSc is partially to blame as well
>>> (why use libmkl_blacs_intelmpi_lp64.so when BuildSystem is capable of
>>> detecting that we are using OpenMPI?).
>>> I’ll try to fix this to see if it solves 2).
>>
>> Okay, that's a very nice finding!!! Hope it will be "fixable" easily!
>>
>>
> The knowledge is there but the information may not be trivially
> available to make the right decisions. Parts of the BLAS/LAPACK checks
> use the "check everything" approach. For example
>
> # Look for Multi-Threaded MKL for MKL_C/Pardiso
> useCPardiso=0
> usePardiso=0
> if self.argDB['with-mkl_cpardiso'] or 'with-mkl_cpardiso-dir' in self.argDB or 'with-mkl_cpardiso-lib' in self.argDB:
>   useCPardiso=1
>   mkl_blacs_64=[['mkl_blacs_intelmpi'+ILP64+''],['mkl_blacs_mpich'+ILP64+''],['mkl_blacs_sgimpt'+ILP64+''],['mkl_blacs_openmpi'+ILP64+'']]
>   mkl_blacs_32=[['mkl_blacs_intelmpi'],['mkl_blacs_mpich'],['mkl_blacs_sgimpt'],['mkl_blacs_openmpi']]
> elif self.argDB['with-mkl_pardiso'] or 'with-mkl_pardiso-dir' in self.argDB or 'with-mkl_pardiso-lib' in self.argDB:
>   usePardiso=1
>   mkl_blacs_64=[[]]
>   mkl_blacs_32=[[]]
> if useCPardiso or usePardiso:
>   self.logPrintBox('BLASLAPACK: Looking for Multithreaded MKL for C/Pardiso')
>   for libdir in [os.path.join('lib','64'),os.path.join('lib','ia64'),os.path.join('lib','em64t'),os.path.join('lib','intel64'),'lib','64','ia64','em64t','intel64',
>                  os.path.join('lib','32'),os.path.join('lib','ia32'),'32','ia32','']:
>     if not os.path.exists(os.path.join(dir,libdir)):
>       self.logPrint('MKL Path not found.. skipping: '+os.path.join(dir,libdir))
>     else:
>       self.log.write('Files and directories in that directory:\n'+str(os.listdir(os.path.join(dir,libdir)))+'\n')
>       # iomp5 is provided by the Intel compilers on MacOS. Run source /opt/intel/bin/compilervars.sh intel64 to have it added to LIBRARY_PATH
>       # then locate libimp5.dylib in the LIBRARY_PATH and copy it to os.path.join(dir,libdir)
>       for i in mkl_blacs_64:
>         yield ('User specified MKL-C/Pardiso Intel-Linux64', None, [os.path.join(dir,libdir,'libmkl_intel'+ILP64+'.a'),'mkl_core','mkl_intel_thread']+i+['iomp5','dl','pthread'],known,'yes')
>         yield ('User specified MKL-C/Pardiso GNU-Linux64', None, [os.path.join(dir,libdir,'libmkl_intel'+ILP64+'.a'),'mkl_core','mkl_gnu_thread']+i+['gomp','dl','pthread'],known,'yes')
>         yield ('User specified MKL-Pardiso Intel-Windows64', None, [os.path.join(dir,libdir,'mkl_core.lib'),'mkl_intel'+ILP64+'.lib','mkl_intel_thread.lib']+i+['libiomp5md.lib'],known,'yes')
>       for i in mkl_blacs_32:
>         yield ('User specified MKL-C/Pardiso Intel-Linux32', None, [os.path.join(dir,libdir,'libmkl_intel.a'),'mkl_core','mkl_intel_thread']+i+['iomp5','dl','pthread'],'32','yes')
>         yield ('User specified MKL-C/Pardiso GNU-Linux32', None, [os.path.join(dir,libdir,'libmkl_intel.a'),'mkl_core','mkl_gnu_thread']+i+['gomp','dl','pthread'],'32','yes')
>         yield ('User specified MKL-Pardiso Intel-Windows32', None, [os.path.join(dir,libdir,'mkl_core.lib'),'mkl_intel_c.lib','mkl_intel_thread.lib']+i+['libiomp5md.lib'],'32','yes')
>   return
>
> The assumption is that the link will fail unless the correct libraries
> are in the list. But apparently this is not the case: the check returns
> the first case that links, but that case does not run, which is why it
> appears to be producing "silly" results.
>
> If you set the right MPI and threading libraries at these locations,
> instead of trying all of them, it might resolve the problems.
>
>   if self.openmp.found:
>     ITHREAD='intel_thread'
>     ITHREADGNU='gnu_thread'
>     ompthread = 'yes'
>   else:
>     ITHREAD='sequential'
>     ITHREADGNU='sequential'
>     ompthread = 'no'
>
>   mkl_blacs_64=[['mkl_blacs_intelmpi'+ILP64+''],['mkl_blacs_mpich'+ILP64+''],['mkl_blacs_sgimpt'+ILP64+''],['mkl_blacs_openmpi'+ILP64+'']]
>   mkl_blacs_32=[['mkl_blacs_intelmpi'],['mkl_blacs_mpich'],['mkl_blacs_sgimpt'],['mkl_blacs_openmpi']]
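>
> A rough, untested sketch of what restricting the BLACS candidates to the
> detected MPI could look like (the mpi_family name below is an assumption,
> not an actual BuildSystem attribute; the real object may expose this
> information differently):
>
>   # Hypothetical: pick the single BLACS adaptor matching the MPI that
>   # configure detected, instead of trying every candidate and keeping the
>   # first one that merely links.
>   blacs_by_mpi = {'openmpi' : 'mkl_blacs_openmpi',
>                   'mpich'   : 'mkl_blacs_mpich',
>                   'intelmpi': 'mkl_blacs_intelmpi',
>                   'sgimpt'  : 'mkl_blacs_sgimpt'}
>   blacs = blacs_by_mpi.get(mpi_family)
>   if blacs:
>     mkl_blacs_64=[[blacs+ILP64]]
>     mkl_blacs_32=[[blacs]]
>   else:  # unknown MPI: fall back to trying all candidates, as today
>     mkl_blacs_64=[['mkl_blacs_intelmpi'+ILP64],['mkl_blacs_mpich'+ILP64],['mkl_blacs_sgimpt'+ILP64],['mkl_blacs_openmpi'+ILP64]]
>     mkl_blacs_32=[['mkl_blacs_intelmpi'],['mkl_blacs_mpich'],['mkl_blacs_sgimpt'],['mkl_blacs_openmpi']]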
>
>
>
>
>>
>
>
>> On Mar 3, 2021, at 8:22 AM, Eric Chamberland <Eric.Chamberland at giref.ulaval.ca> wrote:
>>
>> Hi Pierre,
>>
>> On 2021-03-03 2:42 a.m., Pierre Jolivet wrote:
>>>>
>>>> If it ends that there is a problem combining MKL + openMP that
>>>> relies on linking configuration for example, should it be a good
>>>> thing to have this (--with-openmp=1) tested into the pipelines
>>>> (with external packages of course)?
>>>>
>>> As Barry said, there is not much (if any) OpenMP in PETSc.
>>> There are, however, some workers with MKL (+ Intel compilers) turned
>>> on, but I don't think we test MKL + GNU compilers (which I feel is a
>>> very niche combination, hence not really worth testing, IMHO).
>>
>> Ouch, this is almost my personal working configuration, and that of
>> most of our users too... and it worked well until I activated the
>> OpenMP thing...
>>
>> We had good reasons to work with g++ or clang++ instead of the Intel
>> compilers:
>>
>> - You have to pay to work with an Intel compiler (I haven't looked at
>> oneAPI licensing yet, but that may have changed?)
>>
>> - No support for Intel compilers with iceccd (hence slow recompilation)
>>
>> - MKL was freely distributed, so it can be used with any compiler
>>
>> That doesn't mean we don't want to use the Intel compiler, but maybe we
>> just want to do a specific delivery with it and continue to develop
>> with g++ or clang++ (my personal choice).
>>
>> But I understand it is less straightforward to combine gcc and MKL
>> than to use the native Intel toolchain...
>>
>
> I agree that MKL + GNU compilers is a commonly used combination and
> should be tested and maintained in PETSc.
>>
>>>> Are the guys who maintain all these libs reading petsc-dev? ;)
>>>>
>>> I don’t think they are, but don’t worry, we do forward the
>>> appropriate messages to them :)
>> :)
>>>
>>> About yesterday’s failures…
>>> 1) I cannot reproduce any of the PCHYPRE/PCBDDC/PCHPDDM errors
>>> (sorry I didn’t bother putting the SuperLU_DIST tarball on my cluster)
>>
>> Hmmm, maybe my environment variables play a role in this?
>>
>> For comparison purposes, we explicitly set:
>>
>> export MKL_CBWR=COMPATIBLE
>> export MKL_NUM_THREADS=1
>>
>> but it would be surprising if they helped reproduce a problem: they
>> usually stabilize results...
>>
>> Merci,
>>
>> Eric
>>
>>>
>>> Thanks,
>>> Pierre
>>>
>>> http://joliv.et/irene-rome-configure.log
>>> $ /usr/bin/gmake -f gmakefile test test-fail=1
>>> Using MAKEFLAGS: test-fail=1
>>> TEST
>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_baij.counts
>>> ok snes_tutorials-ex12_quad_hpddm_reuse_baij
>>> ok diff-snes_tutorials-ex12_quad_hpddm_reuse_baij
>>> TEST
>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex50_tut_2.counts
>>> ok ksp_ksp_tutorials-ex50_tut_2 # SKIP PETSC_HAVE_SUPERLU_DIST
>>> requirement not met
>>> TEST
>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_hypre.counts
>>> ok snes_tutorials-ex56_hypre
>>> ok diff-snes_tutorials-ex56_hypre
>>> TEST
>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_elas.counts
>>> ok snes_tutorials-ex17_3d_q3_trig_elas
>>> ok diff-snes_tutorials-ex17_3d_q3_trig_elas
>>> TEST
>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij.counts
>>> ok snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
>>> ok diff-snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
>>> TEST
>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_tri_parmetis_hpddm_baij.counts
>>> ok snes_tutorials-ex12_tri_parmetis_hpddm_baij
>>> ok diff-snes_tutorials-ex12_tri_parmetis_hpddm_baij
>>> TEST
>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_tut_3.counts
>>> ok snes_tutorials-ex19_tut_3
>>> ok diff-snes_tutorials-ex19_tut_3
>>> TEST
>>> arch-linux2-c-opt-ompi/tests/counts/mat_tests-ex242_3.counts
>>> not ok mat_tests-ex242_3 # Error code: 137
>>> #[1]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> #[1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation
>>> Violation, probably memory access out of range
>>> #[1]PETSC ERROR: Try option -start_in_debugger or
>>> -on_error_attach_debugger
>>> #[1]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>> #[1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple
>>> Mac OS X to find memory corruption errors
>>> #[1]PETSC ERROR: configure using --with-debugging=yes, recompile,
>>> link, and run
>>> #[1]PETSC ERROR: to get more information on the crash.
>>> #[1]PETSC ERROR: --------------------- Error Message
>>> --------------------------------------------------------------
>>> #[1]PETSC ERROR: Signal received
>>> #[1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
>>> for trouble shooting.
>>> #[1]PETSC ERROR: Petsc Development GIT revision:
>>> v3.14.4-733-g7ab9467ef9 GIT Date: 2021-03-02 16:15:11 +0000
>>> #[2]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> #[2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation
>>> Violation, probably memory access out of range
>>> #[2]PETSC ERROR: Try option -start_in_debugger or
>>> -on_error_attach_debugger
>>> #[2]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>> #[2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple
>>> Mac OS X to find memory corruption errors
>>> #[2]PETSC ERROR: configure using --with-debugging=yes, recompile,
>>> link, and run
>>> #[2]PETSC ERROR: to get more information on the crash.
>>> #[2]PETSC ERROR: --------------------- Error Message
>>> --------------------------------------------------------------
>>> #[2]PETSC ERROR: Signal received
>>> #[2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
>>> for trouble shooting.
>>> #[2]PETSC ERROR: Petsc Development GIT revision:
>>> v3.14.4-733-g7ab9467ef9 GIT Date: 2021-03-02 16:15:11 +0000
>>> #[2]PETSC ERROR:
>>> /ccc/work/cont003/rndm/rndm/petsc/arch-linux2-c-opt-ompi/tests/mat/tests/runex242_3/../ex242
>>> on a arch-linux2-c-opt-ompi named irene4047 by jolivetp Wed Mar 3
>>> 08:21:20 2021
>>> #[2]PETSC ERROR: Configure options --download-hpddm
>>> --download-hpddm-commit=origin/main --download-hypre
>>> --download-metis --download-mumps --download-parmetis
>>> --download-ptscotch --download-slepc
>>> --download-slepc-commit=origin/main --download-tetgen
>>> --known-mpi-c-double-complex --known-mpi-int64_t
>>> --known-mpi-long-double --with-avx512-kernels=1
>>> --with-blaslapack-dir=/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281/mkl/lib/intel64
>>> --with-cc=mpicc --with-cxx=mpicxx --with-debugging=0
>>> --with-fc=mpifort --with-fortran-bindings=0 --with-make-np=40
>>> --with-mkl_cpardiso-dir=/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281
>>> --with-mkl_cpardiso=1
>>> --with-mkl_pardiso-dir=/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281/mkl
>>> --with-mkl_pardiso=1 --with-mpiexec=ccc_mprun --with-openmp=1
>>> --with-packages-download-dir=/ccc/cont003/home/enseeiht/jolivetp/Dude/externalpackages/
>>> --with-scalapack-include=/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281/mkl/include
>>> --with-scalapack-lib="[/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281/mkl/lib/intel64/libmkl_scalapack_lp64.so,/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281/mkl/lib/intel64/libmkl_blacs_openmpi_lp64.so]"
>>> --with-scalar-type=real --with-x=0 COPTFLAGS="-O3 -fp-model fast
>>> -mavx2" CXXOPTFLAGS="-O3 -fp-model fast -mavx2" FOPTFLAGS="-O3
>>> -fp-model fast -mavx2" PETSC_ARCH=arch-linux2-c-opt-ompi
>>> #[2]PETSC ERROR: #1 User provided function() line 0 in unknown file
>>> #[2]PETSC ERROR: Run with -malloc_debug to check if memory
>>> corruption is causing the crash.
>>> #[1]PETSC ERROR:
>>> /ccc/work/cont003/rndm/rndm/petsc/arch-linux2-c-opt-ompi/tests/mat/tests/runex242_3/../ex242
>>> on a arch-linux2-c-opt-ompi named irene4047 by jolivetp Wed Mar 3
>>> 08:21:20 2021
>>> #[1]PETSC ERROR: Configure options --download-hpddm
>>> --download-hpddm-commit=origin/main --download-hypre
>>> --download-metis --download-mumps --download-parmetis
>>> --download-ptscotch --download-slepc
>>> --download-slepc-commit=origin/main --download-tetgen
>>> --known-mpi-c-double-complex --known-mpi-int64_t
>>> --known-mpi-long-double --with-avx512-kernels=1
>>> --with-blaslapack-dir=/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281/mkl/lib/intel64
>>> --with-cc=mpicc --with-cxx=mpicxx --with-debugging=0
>>> --with-fc=mpifort --with-fortran-bindings=0 --with-make-np=40
>>> --with-mkl_cpardiso-dir=/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281
>>> --with-mkl_cpardiso=1
>>> --with-mkl_pardiso-dir=/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281/mkl
>>> --with-mkl_pardiso=1 --with-mpiexec=ccc_mprun --with-openmp=1
>>> --with-packages-download-dir=/ccc/cont003/home/enseeiht/jolivetp/Dude/externalpackages/
>>> --with-scalapack-include=/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281/mkl/include
>>> --with-scalapack-lib="[/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281/mkl/lib/intel64/libmkl_scalapack_lp64.so,/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281/mkl/lib/intel64/libmkl_blacs_openmpi_lp64.so]"
>>> --with-scalar-type=real --with-x=0 COPTFLAGS="-O3 -fp-model fast
>>> -mavx2" CXXOPTFLAGS="-O3 -fp-model fast -mavx2" FOPTFLAGS="-O3
>>> -fp-model fast -mavx2" PETSC_ARCH=arch-linux2-c-opt-ompi
>>> #[1]PETSC ERROR: #1 User provided function() line 0 in unknown file
>>> #[1]PETSC ERROR: Run with -malloc_debug to check if memory
>>> corruption is causing the crash.
>>> #--------------------------------------------------------------------------
>>> #MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
>>> #with errorcode 50176059.
>>> #
>>> #NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>> #You may or may not see output from other processes, depending on
>>> #exactly when Open MPI kills them.
>>> #--------------------------------------------------------------------------
>>> #--------------------------------------------------------------------------
>>> #MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
>>> #with errorcode 50176059.
>>> #
>>> #NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>> #You may or may not see output from other processes, depending on
>>> #exactly when Open MPI kills them.
>>> #--------------------------------------------------------------------------
>>> #srun: Job step aborted: Waiting up to 302 seconds for job step to
>>> finish.
>>> #slurmstepd-irene4047: error: *** STEP 1374176.36 ON irene4047
>>> CANCELLED AT 2021-03-03T08:21:20 ***
>>> #srun: error: irene4047: task 0: Killed
>>> #srun: error: irene4047: tasks 1-2: Exited with exit code 16
>>> ok mat_tests-ex242_3 # SKIP Command failed so no diff
>>> TEST
>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_vlap.counts
>>> ok snes_tutorials-ex17_3d_q3_trig_vlap
>>> ok diff-snes_tutorials-ex17_3d_q3_trig_vlap
>>> TEST
>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre.counts
>>> ok snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
>>> ok
>>> diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
>>> TEST
>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex49_hypre_nullspace.counts
>>> ok ksp_ksp_tutorials-ex49_hypre_nullspace
>>> ok diff-ksp_ksp_tutorials-ex49_hypre_nullspace
>>> TEST
>>> arch-linux2-c-opt-ompi/tests/counts/ts_tutorials-ex18_p1p1_xper_ref.counts
>>> ok ts_tutorials-ex18_p1p1_xper_ref
>>> ok diff-ts_tutorials-ex18_p1p1_xper_ref
>>> TEST
>>> arch-linux2-c-opt-ompi/tests/counts/ts_tutorials-ex18_p1p1_xyper_ref.counts
>>> ok ts_tutorials-ex18_p1p1_xyper_ref
>>> ok diff-ts_tutorials-ex18_p1p1_xyper_ref
>>> TEST
>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre.counts
>>> ok snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
>>> ok
>>> diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
>>> TEST
>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex64_1.counts
>>> ok ksp_ksp_tutorials-ex64_1 # SKIP PETSC_HAVE_SUPERLU_DIST
>>> requirement not met
>>>
>>>> On 3 Mar 2021, at 6:21 AM, Eric Chamberland <Eric.Chamberland at giref.ulaval.ca> wrote:
>>>>
>>>> Just started a discussion on the side:
>>>>
>>>> https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Intel-MKL-Link-Line-Advisor-as-external-tool/m-p/1260895#M30974
>>>>
>>>> Eric
>>>>
>>>> On 2021-03-02 3:50 p.m., Pierre Jolivet wrote:
>>>>> Hello Eric,
>>>>> src/mat/tests/ex237.c is a recent test with some code paths that
>>>>> should be disabled for "old" MKL versions. It's tricky to check
>>>>> directly in the source (we do check in BuildSystem) because there
>>>>> is no such thing as PETSC_PKG_MKL_VERSION_LT, but I guess we can
>>>>> change if defined(PETSC_HAVE_MKL) to if defined(PETSC_HAVE_MKL) &&
>>>>> defined(PETSC_HAVE_MKL_SPARSE_OPTIMIZE). I'll make an MR, thanks
>>>>> for reporting this.
>>>>> For the other issues, I’m sensing this is a problem with gomp +
>>>>> intel_gnu_thread, but this is pure speculation… sorry.
>>>>> I’ll try to reproduce some of these problems if you are not given
>>>>> a more meaningful answer.
>>>>> Thanks,
>>>>> Pierre
>>>>>> On 2 Mar 2021, at 9:14 PM, Eric Chamberland <Eric.Chamberland at giref.ulaval.ca> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> It all started when I wanted to test PETSC/CUDA compatibility for
>>>>>> our code.
>>>>>>
>>>>>> I had to activate --with-openmp to configure with --with-cuda=1
>>>>>> successfully.
>>>>>>
>>>>>> I then saw that PETSC_HAVE_OPENMP is used at least in MUMPS (and
>>>>>> some other places).
>>>>>>
>>>>>> So, I configured and tested petsc with openmp activated, without
>>>>>> CUDA.
>>>>>>
>>>>>> The first thing I see is that our code's CI pipelines now fail for
>>>>>> many tests.
>>>>>>
>>>>>> After looking deeper, it seems that PETSc itself fails many tests
>>>>>> when I activate OpenMP!
>>>>>>
>>>>>> Here are all the configurations I have results for, after/before
>>>>>> activating OpenMP for PETSc:
>>>>>>
>>>>>> ==============================================================================
>>>>>>
>>>>>> ==============================================================================
>>>>>>
>>>>>> For petsc/master + OpenMPI 4.0.4 + MKL 2019.4.243:
>>>>>>
>>>>>> With OpenMP=1
>>>>>>
>>>>>> https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.03.02.02h00m02s_make_test.log
>>>>>>
>>>>>> https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.03.02.02h00m02s_configure.log
>>>>>>
>>>>>> # -------------
>>>>>> # Summary
>>>>>> # -------------
>>>>>> # FAILED snes_tutorials-ex12_quad_hpddm_reuse_baij diff-ksp_ksp_tests-ex33_superlu_dist_2 diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0 diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1 diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0 diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1 diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0 diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1 diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0 diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1 ksp_ksp_tutorials-ex50_tut_2 diff-ksp_ksp_tests-ex33_superlu_dist diff-snes_tutorials-ex56_hypre snes_tutorials-ex17_3d_q3_trig_elas snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij ksp_ksp_tutorials-ex5_superlu_dist_3 ksp_ksp_tutorials-ex5f_superlu_dist snes_tutorials-ex12_tri_parmetis_hpddm_baij diff-snes_tutorials-ex19_tut_3 mat_tests-ex242_3 snes_tutorials-ex17_3d_q3_trig_vlap ksp_ksp_tutorials-ex5f_superlu_dist_3 snes_tutorials-ex19_superlu_dist diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre diff-ksp_ksp_tutorials-ex49_hypre_nullspace ts_tutorials-ex18_p1p1_xper_ref ts_tutorials-ex18_p1p1_xyper_ref snes_tutorials-ex19_superlu_dist_2 ksp_ksp_tutorials-ex5_superlu_dist_2 diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre ksp_ksp_tutorials-ex64_1 ksp_ksp_tutorials-ex5_superlu_dist ksp_ksp_tutorials-ex5f_superlu_dist_2
>>>>>> # success 8275/10003 tests (82.7%)
>>>>>> # failed 33/10003 tests (0.3%)
>>>>>>
>>>>>> With OpenMP=0
>>>>>>
>>>>>> https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.02.26.02h00m16s_make_test.log
>>>>>>
>>>>>> https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.02.26.02h00m16s_configure.log
>>>>>>
>>>>>> # -------------
>>>>>> # Summary
>>>>>> # -------------
>>>>>> # FAILED tao_constrained_tutorials-tomographyADMM_6 snes_tutorials-ex17_3d_q3_trig_elas mat_tests-ex242_3 snes_tutorials-ex17_3d_q3_trig_vlap tao_leastsquares_tutorials-tomography_1 tao_constrained_tutorials-tomographyADMM_5
>>>>>> # success 8262/9983 tests (82.8%)
>>>>>> # failed 6/9983 tests (0.1%)
>>>>>>
>>>>>> ==============================================================================
>>>>>>
>>>>>> ==============================================================================
>>>>>>
>>>>>> For OpenMPI 3.1.x/master:
>>>>>>
>>>>>> With OpenMP=1:
>>>>>>
>>>>>> https://giref.ulaval.ca/~cmpgiref/ompi_3.x/2021.03.01.22h00m01s_make_test.log
>>>>>>
>>>>>> https://giref.ulaval.ca/~cmpgiref/ompi_3.x/2021.03.01.22h00m01s_configure.log
>>>>>>
>>>>>> # -------------
>>>>>> # Summary
>>>>>> # -------------
>>>>>> # FAILED mat_tests-ex242_3 mat_tests-ex242_2 diff-mat_tests-ex219f_1 diff-dm_tutorials-ex11f90_1 ksp_ksp_tutorials-ex5_superlu_dist_3 diff-ksp_ksp_tutorials-ex49_hypre_nullspace ksp_ksp_tutorials-ex5f_superlu_dist_3 snes_tutorials-ex17_3d_q3_trig_vlap diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre diff-snes_tutorials-ex19_tut_3 diff-snes_tutorials-ex56_hypre diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre tao_leastsquares_tutorials-tomography_1 tao_constrained_tutorials-tomographyADMM_4 tao_constrained_tutorials-tomographyADMM_6 diff-tao_constrained_tutorials-toyf_1
>>>>>> # success 8142/9765 tests (83.4%)
>>>>>> # failed 16/9765 tests (0.2%)
>>>>>>
>>>>>> With OpenMP=0:
>>>>>>
>>>>>> https://giref.ulaval.ca/~cmpgiref/ompi_3.x/2021.02.28.22h00m02s_make_test.log
>>>>>>
>>>>>> https://giref.ulaval.ca/~cmpgiref/ompi_3.x/2021.02.28.22h00m02s_configure.log
>>>>>>
>>>>>> # -------------
>>>>>> # Summary
>>>>>> # -------------
>>>>>> # FAILED mat_tests-ex242_3 mat_tests-ex242_2 diff-mat_tests-ex219f_1 diff-dm_tutorials-ex11f90_1 ksp_ksp_tutorials-ex56_2 snes_tutorials-ex17_3d_q3_trig_vlap tao_leastsquares_tutorials-tomography_1 tao_constrained_tutorials-tomographyADMM_4 diff-tao_constrained_tutorials-toyf_1
>>>>>> # success 8151/9767 tests (83.5%)
>>>>>> # failed 9/9767 tests (0.1%)
>>>>>>
>>>>>> ==============================================================================
>>>>>>
>>>>>> ==============================================================================
>>>>>>
>>>>>> For OpenMPI 4.0.x/master:
>>>>>>
>>>>>> With OpenMP=1:
>>>>>>
>>>>>> https://giref.ulaval.ca/~cmpgiref/ompi_4.x/2021.03.01.20h00m01s_make_test.log
>>>>>>
>>>>>> https://giref.ulaval.ca/~cmpgiref/ompi_4.x/2021.03.01.20h00m01s_configure.log
>>>>>>
>>>>>> # FAILED snes_tutorials-ex17_3d_q3_trig_elas snes_tutorials-ex19_hypre ksp_ksp_tutorials-ex56_2 tao_leastsquares_tutorials-tomography_1 tao_constrained_tutorials-tomographyADMM_5 mat_tests-ex242_3 ksp_ksp_tutorials-ex55_hypre ksp_ksp_tutorials-ex5_superlu_dist_2 tao_constrained_tutorials-tomographyADMM_6 snes_tutorials-ex56_hypre snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre ksp_ksp_tutorials-ex5f_superlu_dist_3 ksp_ksp_tutorials-ex34_hyprestruct diff-ksp_ksp_tutorials-ex49_hypre_nullspace snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre ksp_ksp_tutorials-ex5f_superlu_dist ksp_ksp_tutorials-ex5f_superlu_dist_2 ksp_ksp_tutorials-ex5_superlu_dist snes_tutorials-ex19_tut_3 snes_tutorials-ex19_superlu_dist ksp_ksp_tutorials-ex50_tut_2 snes_tutorials-ex17_3d_q3_trig_vlap ksp_ksp_tutorials-ex5_superlu_dist_3 snes_tutorials-ex19_superlu_dist_2 tao_constrained_tutorials-tomographyADMM_4 ts_tutorials-ex26_2
>>>>>> # success 8125/9753 tests (83.3%)
>>>>>> # failed 26/9753 tests (0.3%)
>>>>>>
>>>>>> With OpenMP=0
>>>>>>
>>>>>> https://giref.ulaval.ca/~cmpgiref/ompi_4.x/2021.02.28.20h00m04s_make_test.log
>>>>>>
>>>>>> https://giref.ulaval.ca/~cmpgiref/ompi_4.x/2021.02.28.20h00m04s_configure.log
>>>>>>
>>>>>> # FAILED mat_tests-ex242_3
>>>>>> # success 8174/9777 tests (83.6%)
>>>>>> # failed 1/9777 tests (0.0%)
>>>>>>
>>>>>> ==============================================================================
>>>>>>
>>>>>> ==============================================================================
>>>>>>
>>>>>> Is that known and normal?
>>>>>>
>>>>>> In all cases, I am using MKL and I suspect it may come from
>>>>>> there... :/
>>>>>>
>>>>>> I also saw a second problem: "make test" fails to compile the
>>>>>> PETSc examples on older versions of MKL (that's less important for
>>>>>> me, since I just upgraded to oneAPI to avoid it, but you may want
>>>>>> to know):
>>>>>>
>>>>>> https://giref.ulaval.ca/~cmpgiref/dernier_ompi/2021.03.02.02h16m01s_make_test.log
>>>>>>
>>>>>> https://giref.ulaval.ca/~cmpgiref/dernier_ompi/2021.03.02.02h16m01s_configure.log
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Eric
>>>>>>
>>>>>> --
>>>>>> Eric Chamberland, ing., M. Ing
>>>>>> Professionnel de recherche
>>>>>> GIREF/Université Laval
>>>>>> (418) 656-2131 poste 41 22 42
>>>>>
>>>> --
>>>> Eric Chamberland, ing., M. Ing
>>>> Professionnel de recherche
>>>> GIREF/Université Laval
>>>> (418) 656-2131 poste 41 22 42
>>>
>> --
>> Eric Chamberland, ing., M. Ing
>> Professionnel de recherche
>> GIREF/Université Laval
>> (418) 656-2131 poste 41 22 42
>
--
Eric Chamberland, ing., M. Ing
Professionnel de recherche
GIREF/Université Laval
(418) 656-2131 poste 41 22 42