[petsc-users] Using PETSC with an openMP program
Satish Balay
balay at mcs.anl.gov
Fri Mar 2 13:25:37 CST 2018
I just tried your test code with gfortran [without petsc] - and I
don't understand the output. Does gfortran not support this OpenMP
usage? [tried gfortran 4.8.4 and 7.3.1]
balay at es^/sandbox/balay/omp $ gfortran -fopenmp -c hellocount.F90
balay at es^/sandbox/balay/omp $ gfortran -fopenmp hellocount_main.F90 hellocount.o
balay at es^/sandbox/balay/omp $ ./a.out
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 14 out of 32
Hello from 14 out of 32
Hello from 14 out of 32
Hello from 14 out of 32
Hello from 14 out of 32
Hello from 14 out of 32
Hello from 14 out of 32
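My guess [not verified] is that this gfortran output may actually be
legal behavior for this code: nthreads and mythread are declared
outside the parallel region, so they are shared by default, and all
threads race on the same two variables before printing. A minimal
race-free sketch of the subroutine [assuming one line per thread is
the intent] - a drop-in replacement inside MODULE hello_count, which
already has 'use omp_lib':

subroutine hello_print ()
  integer :: nthreads, mythread

  ! private() gives each thread its own copy of both variables,
  ! so the concurrent assignments no longer race
  !$omp parallel private(nthreads, mythread)
  nthreads = omp_get_num_threads()
  mythread = omp_get_thread_num()
  write(*,'("Hello from",i3," out of",i3)') mythread, nthreads
  !$omp end parallel
end subroutine hello_print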
The ifort-compiled test appears to behave correctly:
balay at es^/sandbox/balay/omp $ ifort -qopenmp -c hellocount.F90
balay at es^/sandbox/balay/omp $ ifort -qopenmp hellocount_main.F90 hellocount.o
balay at es^/sandbox/balay/omp $ ./a.out |sort -n
Hello from 0 out of 32
Hello from 10 out of 32
Hello from 11 out of 32
Hello from 12 out of 32
Hello from 13 out of 32
Hello from 14 out of 32
Hello from 15 out of 32
Hello from 16 out of 32
Hello from 17 out of 32
Hello from 18 out of 32
Hello from 19 out of 32
Hello from 1 out of 32
Hello from 20 out of 32
Hello from 21 out of 32
Hello from 22 out of 32
Hello from 23 out of 32
Hello from 24 out of 32
Hello from 25 out of 32
Hello from 26 out of 32
Hello from 27 out of 32
Hello from 28 out of 32
Hello from 29 out of 32
Hello from 2 out of 32
Hello from 30 out of 32
Hello from 31 out of 32
Hello from 3 out of 32
Hello from 4 out of 32
Hello from 5 out of 32
Hello from 6 out of 32
Hello from 7 out of 32
Hello from 8 out of 32
Hello from 9 out of 32
balay at es^/sandbox/balay/omp
Now I build petsc with:
./configure --with-cc=icc --with-mpi=0 --with-openmp --with-fc=0 --with-cxx=0 PETSC_ARCH=arch-omp
i.e. the resulting library pulls in libiomp5 [and no other OpenMP runtime]:
balay at es^/sandbox/balay/omp $ ldd /sandbox/balay/petsc/arch-omp/lib/libpetsc.so
linux-vdso.so.1 => (0x00007fff8bfb2000)
liblapack.so.3 => /usr/lib/liblapack.so.3 (0x00007f513fbbf000)
libblas.so.3 => /usr/lib/libblas.so.3 (0x00007f513e3b6000)
libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f513e081000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f513de63000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f513dc5f000)
libimf.so => /soft/com/packages/intel/16/u3/lib/intel64/libimf.so (0x00007f513d761000)
libsvml.so => /soft/com/packages/intel/16/u3/lib/intel64/libsvml.so (0x00007f513c855000)
libirng.so => /soft/com/packages/intel/16/u3/lib/intel64/libirng.so (0x00007f513c4e3000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f513c1dd000)
libiomp5.so => /soft/com/packages/intel/16/u3/lib/intel64/libiomp5.so (0x00007f513be99000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f513bc83000)
libintlc.so.5 => /soft/com/packages/intel/16/u3/lib/intel64/libintlc.so.5 (0x00007f513ba17000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f513b64e000)
libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007f513b334000)
libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f513b115000)
/lib64/ld-linux-x86-64.so.2 (0x00007f5142b40000)
libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f513aed9000)
libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007f513acd5000)
libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f513aacf000)
And then link petsc in with your test - and that works fine for me:
balay at es^/sandbox/balay/omp $ rm -f *.o *.mod
balay at es^/sandbox/balay/omp $ ifort -qopenmp -c hellocount.F90
balay at es^/sandbox/balay/omp $ ifort -qopenmp hellocount_main.F90 hellocount.o -Wl,-rpath,/sandbox/balay/petsc/arch-omp/lib -L/sandbox/balay/petsc/arch-omp/lib -lpetsc -liomp5
balay at es^/sandbox/balay/omp $ ./a.out |sort -n
Hello from 0 out of 32
Hello from 10 out of 32
Hello from 11 out of 32
Hello from 12 out of 32
Hello from 13 out of 32
Hello from 14 out of 32
Hello from 15 out of 32
Hello from 16 out of 32
Hello from 17 out of 32
Hello from 18 out of 32
Hello from 19 out of 32
Hello from 1 out of 32
Hello from 20 out of 32
Hello from 21 out of 32
Hello from 22 out of 32
Hello from 23 out of 32
Hello from 24 out of 32
Hello from 25 out of 32
Hello from 26 out of 32
Hello from 27 out of 32
Hello from 28 out of 32
Hello from 29 out of 32
Hello from 2 out of 32
Hello from 30 out of 32
Hello from 31 out of 32
Hello from 3 out of 32
Hello from 4 out of 32
Hello from 5 out of 32
Hello from 6 out of 32
Hello from 7 out of 32
Hello from 8 out of 32
Hello from 9 out of 32
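Also - one guess about the 'forrtl: severe (40): recursive I/O' you
reported [an assumption on my part, I have not reproduced it]: that
error from an OpenMP write usually means a non-thread-safe Fortran
runtime got linked in - for example an explicit -lifcore_pic can take
precedence over the thread-safe libifcoremt that 'ifort -qopenmp'
would select on its own. Note the link command above lists only
-lpetsc and -liomp5, so one experiment is to drop -lifcore_pic from
your link line and let the compiler driver pick the Fortran runtime.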
Satish
On Fri, 2 Mar 2018, Adrián Amor wrote:
> Thanks Satish, I tried the procedure you suggested and I get the same
> behavior, so I guess that MKL is not the problem in this case (I agree
> with you that the linking has to be improved though... my makefile is a
> little chaotic with all the libraries that I use).
>
> And thanks Barry and Matthew! I'll try asking on the Intel compiler
> forum, since I also think this is a compiler-related problem, and if I
> make any progress I'll let you know! In the end, I guess I'll drop
> acceleration through OpenMP threads...
>
> Thanks all!
>
> Adrian.
>
> 2018-03-02 17:11 GMT+01:00 Satish Balay <balay at mcs.anl.gov>:
>
> > When using MKL - PETSc attempts to default to sequential MKL.
> >
> > Perhaps this pulls in a *conflicting* dependency against -liomp5 - and
> > one has to use threaded MKL for this case, i.e., not use
> > -lmkl_sequential.
> >
> > You appear to have multiple MKL libraries linked in - it's not clear
> > what they are for, or whether there are any conflicts there.
> >
> > > -L/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/lib/intel64
> > > -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lpetsc -lmkl_intel_lp64
> > > -lmkl_intel_thread -lmkl_core -lmkl_lapack95_lp64 -liomp5 -lpthread -lm
> >
> > > -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread
> >
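> > For reference [a sketch of the usual MKL link-line conventions, not
> > taken from your makefile]: a consistent threaded-MKL link uses one set
> > of libraries, e.g.
> >
> >   -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl
> >
> > while the sequential variant swaps -lmkl_intel_thread and -liomp5 for
> > -lmkl_sequential. Having both sets in the same link line - as above -
> > is exactly the kind of conflict I mean.
> >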
> > To test this out - I suggest rebuilding PETSc with
> > --download-fblaslapack [and no MKL or related packages] - and then
> > running this test case you have [with OpenMP].
> >
> > And then add back one MKL package at a time...
> >
> > Satish
> >
> >
> > On Fri, 2 Mar 2018, Adrián Amor wrote:
> >
> > > Hi all,
> > >
> > > I have been working for the last few months with PETSc in a FEM
> > > program written in Fortran, so far sequential. Now I want to
> > > parallelize it with OpenMP and I have run into some problems.
> > > Finally, I built a mockup program to try to localize the error.
> > >
> > > 1. I have compiled PETSc with these options:
> > > ./configure --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort
> > > --with-blas-lapack-dir=/opt/intel/mkl/lib/intel64/ --with-debugging=1
> > > --with-scalar-type=complex --with-threadcomm --with-pthreadclasses
> > > --with-openmp
> > > --with-openmp-include=/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64_lin
> > > --with-openmp-lib=/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64_lin/libiomp5.a
> > > PETSC_ARCH=linux-intel-dbg PETSC-AVOID-MPIF-H=1
> > >
> > > (I have also tried removing --with-threadcomm --with-pthreadclasses,
> > > and using libiomp5.so instead.)
> > >
> > > 2. The program to be executed is composed of two files. One is
> > > hellocount.F90:
> > > MODULE hello_count
> > > use omp_lib
> > > IMPLICIT none
> > >
> > > CONTAINS
> > > subroutine hello_print ()
> > > integer :: nthreads,mythread
> > >
> > > !pragma hello-who-omp-f
> > > !$omp parallel
> > > nthreads = omp_get_num_threads()
> > > mythread = omp_get_thread_num()
> > > write(*,'("Hello from",i3," out of",i3)') mythread,nthreads
> > > !$omp end parallel
> > > !pragma end
> > > end subroutine hello_print
> > > END MODULE hello_count
> > >
> > > and the other one is hellocount_main.F90:
> > > Program Hello
> > >
> > > USE hello_count
> > >
> > > call hello_print
> > >
> > > STOP
> > >
> > > end Program Hello
> > >
> > > 3. To compile these two files I use:
> > > rm -rf _obj
> > > mkdir _obj
> > >
> > > ifort -E -I/home/aamor/petsc/include -I/home/aamor/petsc/linux-intel-dbg/include -c hellocount.F90 >_obj/hellocount.f90
> > > ifort -E -I/home/aamor/petsc/include -I/home/aamor/petsc/linux-intel-dbg/include -c hellocount_main.F90 >_obj/hellocount_main.f90
> > >
> > > mpiifort -CB -g -warn all -O0 -shared-intel -check:none -qopenmp
> > > -module _obj -I./_obj -I/home/aamor/MUMPS_5.1.2/include
> > > -I/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/include
> > > -I/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/include/intel64/lp64/
> > > -I/home/aamor/petsc/include -I/home/aamor/petsc/linux-intel-dbg/include
> > > -o _obj/hellocount.o -c _obj/hellocount.f90
> > > mpiifort -CB -g -warn all -O0 -shared-intel -check:none -qopenmp
> > > -module _obj -I./_obj -I/home/aamor/MUMPS_5.1.2/include
> > > -I/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/include
> > > -I/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/include/intel64/lp64/
> > > -I/home/aamor/petsc/include -I/home/aamor/petsc/linux-intel-dbg/include
> > > -o _obj/hellocount_main.o -c _obj/hellocount_main.f90
> > >
> > > mpiifort -CB -g -warn all -O0 -shared-intel -check:none -qopenmp
> > > -module _obj -I./_obj -o exec/HELLO _obj/hellocount.o _obj/hellocount_main.o
> > > /home/aamor/lib_tmp/libarpack_LinuxIntel15.a
> > > /home/aamor/MUMPS_5.1.2/lib/libzmumps.a
> > > /home/aamor/MUMPS_5.1.2/lib/libmumps_common.a
> > > /home/aamor/MUMPS_5.1.2/lib/libpord.a
> > > /home/aamor/parmetis-4.0.3/lib/libparmetis.a
> > > /home/aamor/parmetis-4.0.3/lib/libmetis.a
> > > -L/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/lib/intel64
> > > -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lpetsc -lmkl_intel_lp64
> > > -lmkl_intel_thread -lmkl_core -lmkl_lapack95_lp64 -liomp5 -lpthread -lm
> > > -L/home/aamor/lib_tmp -lgidpost -lz /home/aamor/lua-5.3.3/src/liblua.a
> > > /home/aamor/ESEAS-master/libeseas.a
> > > -Wl,-rpath,/home/aamor/petsc/linux-intel-dbg/lib
> > > -L/home/aamor/petsc/linux-intel-dbg/lib
> > > -Wl,-rpath,/opt/intel/mkl/lib/intel64 -L/opt/intel/mkl/lib/intel64
> > > -Wl,-rpath,/opt/intel/impi/5.1.2.150/intel64/lib/debug_mt
> > > -L/opt/intel/impi/5.1.2.150/intel64/lib/debug_mt
> > > -Wl,-rpath,/opt/intel/impi/5.1.2.150/intel64/lib
> > > -L/opt/intel/impi/5.1.2.150/intel64/lib
> > > -Wl,-rpath,/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64
> > > -L/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64
> > > -Wl,-rpath,/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64_lin
> > > -L/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64_lin
> > > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7
> > > -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7
> > > -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib/debug_mt
> > > -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib -lmkl_intel_lp64
> > > -lmkl_sequential -lmkl_core -lpthread -lX11 -lssl -lcrypto -lifport
> > > -lifcore_pic -lmpicxx -ldl
> > > -Wl,-rpath,/opt/intel/impi/5.1.2.150/intel64/lib/debug_mt
> > > -L/opt/intel/impi/5.1.2.150/intel64/lib/debug_mt
> > > -Wl,-rpath,/opt/intel/impi/5.1.2.150/intel64/lib
> > > -L/opt/intel/impi/5.1.2.150/intel64/lib -lmpifort
> > > -lmpi -lmpigi -lrt -lpthread
> > > -Wl,-rpath,/opt/intel/impi/5.1.2.150/intel64/lib/debug_mt
> > > -L/opt/intel/impi/5.1.2.150/intel64/lib/debug_mt
> > > -Wl,-rpath,/opt/intel/impi/5.1.2.150/intel64/lib
> > > -L/opt/intel/impi/5.1.2.150/intel64/lib
> > > -Wl,-rpath,/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64
> > > -L/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64
> > > -Wl,-rpath,/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64_lin
> > > -L/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64_lin
> > > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7
> > > -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7
> > > -Wl,-rpath,/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64
> > > -L/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64
> > > -Wl,-rpath,/opt/intel/impi/5.1.2.150/intel64/lib/debug_mt
> > > -Wl,-rpath,/opt/intel/impi/5.1.2.150/intel64/lib
> > > -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib/debug_mt
> > > -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib -limf -lsvml -lirng -lm -lipgo
> > > -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lirc_s
> > > -Wl,-rpath,/opt/intel/impi/5.1.2.150/intel64/lib/debug_mt
> > > -L/opt/intel/impi/5.1.2.150/intel64/lib/debug_mt
> > > -Wl,-rpath,/opt/intel/impi/5.1.2.150/intel64/lib
> > > -L/opt/intel/impi/5.1.2.150/intel64/lib
> > > -Wl,-rpath,/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64
> > > -L/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64
> > > -Wl,-rpath,/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64_lin
> > > -L/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64_lin
> > > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7
> > > -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7
> > > -Wl,-rpath,/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64
> > > -L/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64 -ldl
> > >
> > > exec/HELLO
> > >
> > > 4. Then I have seen that:
> > > 4.1. If I set OMP_NUM_THREADS=2 and remove -lpetsc and -lifcore_pic
> > > from the last step, I get:
> > > Hello from 0 out of 2
> > > Hello from 1 out of 2
> > > 4.2. But if I add -lpetsc and -lifcore_pic (because I want to use
> > > PETSc), I get this error:
> > > Hello from 0 out of 2
> > > forrtl: severe (40): recursive I/O operation, unit -1, file unknown
> > > Image              PC                Routine   Line      Source
> > > HELLO              000000000041665C  Unknown   Unknown   Unknown
> > > HELLO              00000000004083C8  Unknown   Unknown   Unknown
> > > libiomp5.so        00007F9C603566A3  Unknown   Unknown   Unknown
> > > libiomp5.so        00007F9C60325007  Unknown   Unknown   Unknown
> > > libiomp5.so        00007F9C603246F5  Unknown   Unknown   Unknown
> > > libiomp5.so        00007F9C603569C3  Unknown   Unknown   Unknown
> > > libpthread.so.0    0000003CE76079D1  Unknown   Unknown   Unknown
> > > libc.so.6          0000003CE6AE88FD  Unknown   Unknown   Unknown
> > > If I set OMP_NUM_THREADS to 8, I get:
> > > forrtl: severe (40): recursive I/O operation, unit -1, file unknown
> > > forrtl: severe (40): recursive I/O operation, unit -1, file unknown
> > > forrtl: severe (40): recursive I/O operation, unit -1, file unknown
> > >
> > > I am sorry if this is a trivial problem, because I guess that lots of
> > > people use PETSc with OpenMP in Fortran, but I have really done my
> > > best to figure out where the error is. Can you help me?
> > >
> > > Thanks a lot!
> > >
> > > Adrian.
> > >
> >
>