[petsc-dev] Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 MPI process
Satish Balay
balay at mcs.anl.gov
Fri Oct 7 17:42:16 CDT 2016
On Fri, 7 Oct 2016, Antonio Trande wrote:
> On 10/07/2016 08:29 PM, Satish Balay wrote:
> > When I build openblas from sources - I don't see this error.
> >
> >>>>>>>
> > balay at asterix /home/balay/petsc/src/snes/examples/tutorials (master=)
> > $ valgrind --tool=memcheck -q ./ex19 -da_refine 3 -pc_type mg -ksp_type fgmres
> > lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
> > Number of SNES iterations = 2
> > <<<<<
> >
> > I'm not sure why it behaves differently with a source build.. [some bugfixes since v0.2.18-5?]
> >
> > But there are other examples that appear to have issues.
> >
> > Satish
>
> I'm already using openblas-0.2.18-5
I meant to say 'newer than version 0.2.18-5', i.e. what's in the git
repo. But I now think there are issues with openblas64 as built by
Fedora.
I tried the following:
1. Get the latest OpenBLAS:
git clone https://github.com/xianyi/OpenBLAS
2. Build in 64-bit integer mode. After checking
http://pkgs.fedoraproject.org/cgit/rpms/openblas.git/tree/openblas.spec
I used:
make DEBUG=1 INTERFACE64=1 -j 4
3. Now the PETSc tests run fine.
4. Next, I attempted to replicate the Fedora build using info from:
https://kojipkgs.fedoraproject.org//packages/openblas/0.2.18/5.fc25/data/logs/x86_64/build.log
and built with:
make TARGET=CORE2 DYNAMIC_ARCH=1 USE_THREAD=0 USE_OPENMP=0 FC=gfortran CC=gcc 'COMMON_OPT=-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic -fPIC' 'FCOMMON_OPT=-frecursive -fPIC' NUM_THREADS=4 LIBPREFIX=libopenfoo-blas64 INTERFACE64=1
5. Now I see errors in the OpenBLAS test codes during the build. [For some reason the Fedora koji build doesn't run these tests?]
>>>>>>>>>>>>>>>>>>
gfortran -frecursive -o sblat1 sblat1.o ../libopenfoo-blas64-r0.2.20.dev.a -lm -lgfortran -lm -lgfortran
gfortran -frecursive -o dblat1 dblat1.o ../libopenfoo-blas64-r0.2.20.dev.a -lm -lgfortran -lm -lgfortran
gfortran -frecursive -o cblat1 cblat1.o ../libopenfoo-blas64-r0.2.20.dev.a -lm -lgfortran -lm -lgfortran
gfortran -frecursive -o zblat1 zblat1.o ../libopenfoo-blas64-r0.2.20.dev.a -lm -lgfortran -lm -lgfortran
gfortran -frecursive -o sblat2 sblat2.o ../libopenfoo-blas64-r0.2.20.dev.a -lm -lgfortran -lm -lgfortran
gfortran -frecursive -o dblat2 dblat2.o ../libopenfoo-blas64-r0.2.20.dev.a -lm -lgfortran -lm -lgfortran
gfortran -frecursive -o cblat2 cblat2.o ../libopenfoo-blas64-r0.2.20.dev.a -lm -lgfortran -lm -lgfortran
gfortran -frecursive -o zblat2 zblat2.o ../libopenfoo-blas64-r0.2.20.dev.a -lm -lgfortran -lm -lgfortran
gfortran -frecursive -o sblat3 sblat3.o ../libopenfoo-blas64-r0.2.20.dev.a -lm -lgfortran -lm -lgfortran
gfortran -frecursive -o dblat3 dblat3.o ../libopenfoo-blas64-r0.2.20.dev.a -lm -lgfortran -lm -lgfortran
gfortran -frecursive -o cblat3 cblat3.o ../libopenfoo-blas64-r0.2.20.dev.a -lm -lgfortran -lm -lgfortran
gfortran -frecursive -o zblat3 zblat3.o ../libopenfoo-blas64-r0.2.20.dev.a -lm -lgfortran -lm -lgfortran
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./sblat1
Real BLAS Test Program Results
Test of subprogram number 1 SDOT
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
rm -f ?BLAT2.SUMM
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./sblat2 < ./sblat2.dat
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x7f7f62862df7 in ???
#1 0x7f7f6286202d in ???
#2 0x7f7f61d5699f in ???
#3 0x16c23b0 in sdot_k_SANDYBRIDGE
at ../kernel/x86_64/sdot.c:110
#4 0x4094f0 in ???
#5 0x40a594 in ???
#6 0x40a67c in ???
#7 0x7f7f61d41400 in ???
#8 0x408089 in ???
#9 0xffffffffffffffff in ???
Makefile:8: recipe for target 'level1' failed
make[1]: *** [level1] Segmentation fault (core dumped)
make[1]: *** Waiting for unfinished jobs....
#0 0x7fc591700df7 in ???
#1 0x7fc59170002d in ???
#2 0x7fc590bf499f in ???
#3 0x16d38a3 in ???
at ../kernel/x86_64/scal_sse.S:372
/bin/sh: line 1: 14464 Segmentation fault (core dumped) OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./sblat2 < ./sblat2.dat
Makefile:29: recipe for target 'level2' failed
make[1]: *** [level2] Error 139
make[1]: Leaving directory '/home/balay/git-repo/github/foo-OpenBLAS/test'
Makefile:111: recipe for target 'tests' failed
make: *** [tests] Error 2
balay at asterix /home/balay/git-repo/github/foo-OpenBLAS (develop=)
$
<<<<<<<<<<<<<<<<<
6. OK, looking at both builds closely, I see:
* my working build with "make DEBUG=1 INTERFACE64=1 -j 4" has compile lines like:
gfortran -Wall -m64 -fdefault-integer-8 -fPIC -g -c stgsen.f -o stgsen.o
gfortran -Wall -m64 -fdefault-integer-8 -fPIC -g -c stgsja.f -o stgsja.o
gfortran -Wall -m64 -fdefault-integer-8 -fPIC -g -c stgsna.f -o stgsna.o
gfortran -Wall -m64 -fdefault-integer-8 -fPIC -g -c stgsy2.f -o stgsy2.o
gfortran -Wall -m64 -fdefault-integer-8 -fPIC -g -c stgsyl.f -o stgsyl.o
gfortran -Wall -m64 -fdefault-integer-8 -fPIC -g -c stpcon.f -o stpcon.o
* the broken builds, both my Fedora-equivalent build and
https://kojipkgs.fedoraproject.org//packages/openblas/0.2.18/5.fc25/data/logs/x86_64/build.log
have compile lines like:
gfortran -frecursive -fPIC -c stgsen.f -o stgsen.o
gfortran -frecursive -fPIC -c stgsja.f -o stgsja.o
gfortran -frecursive -fPIC -c stgsna.f -o stgsna.o
gfortran -frecursive -fPIC -c stgsy2.f -o stgsy2.o
gfortran -frecursive -fPIC -c stgsyl.f -o stgsyl.o
gfortran -frecursive -fPIC -c stpcon.f -o stpcon.o
So is the 'FCOMMON_OPT=-frecursive -fPIC' option replacing the
'-fdefault-integer-8' that's required for a correct build of 64-bit int
OpenBLAS? [thus creating a buggy /usr/lib64/libopenblas64.so]
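A quick way to compare the two builds is to scan a build log for gfortran
compile lines that lack '-fdefault-integer-8'. The sketch below is a
hypothetical helper: the function name missing_int8 and the two-line sample
log are illustrative only and are not part of OpenBLAS or PETSc. It assumes
each compile command sits on a single line that starts with 'gfortran' and
contains ' -c '.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenBLAS or PETSc): read a build log on
# stdin and print every gfortran compile line that lacks -fdefault-integer-8.
missing_int8() {
    # Keep only gfortran compile lines (link lines have no " -c "),
    # then drop the ones that already carry the 64-bit integer flag.
    grep -E '^gfortran .* -c ' | grep -v -F -e '-fdefault-integer-8'
}

# Self-contained demo using two compile lines quoted in this message:
# the first is from the working build, the second from the broken one.
sample_log='gfortran -Wall -m64 -fdefault-integer-8 -fPIC -g -c stgsen.f -o stgsen.o
gfortran -frecursive -fPIC -c stgsen.f -o stgsen.o'

# Prints only the second line, since it is missing the flag.
printf '%s\n' "$sample_log" | missing_int8
```

To scan a real log, save it locally and run 'missing_int8 < build.log'
(build.log is a placeholder name). An empty result would suggest that every
gfortran compile line carries the flag.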
BTW: Another issue with the Fedora build: the Fortran sources are not
built with '-g -O2' [unlike the C sources], so openblas-debuginfo
does not have any debug symbols for the Fortran sources [that are part
of libopenblas64.so].
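If the diagnosis above is right, one possible adjustment is to keep the
Fedora Fortran flags and add the missing ones back explicitly. The sketch
below only composes and prints such a command; it does not run a build. It
rests on the assumption, not verified in this thread, that adding
'-fdefault-integer-8 -g -O2' to FCOMMON_OPT is sufficient. The shell variable
names are illustrative, and the other make variables from step 4 are omitted
for brevity.

```shell
#!/bin/sh
# Sketch only: compose (and print, not run) a make command whose FCOMMON_OPT
# keeps the Fedora flags and adds the ones found missing above.
# Assumption: these extra flags are sufficient; this was not tested here.
fedora_fflags='-frecursive -fPIC'          # flags from the Fedora build log
extra_fflags='-fdefault-integer-8 -g -O2'  # 64-bit integers plus debug info
fcommon_opt="$fedora_fflags $extra_fflags"

# Print the command for inspection instead of executing it.
echo "make INTERFACE64=1 FC=gfortran CC=gcc 'FCOMMON_OPT=$fcommon_opt'"
```

After a rebuild with flags like these, the compile lines in the log should
show '-fdefault-integer-8' and '-g' for the Fortran sources.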
Satish