[petsc-dev] Apparent bug with PGI on Titan with each thread calling a serial LU solve

Mark Adams mfadams at lbl.gov
Thu May 8 09:30:13 CDT 2014


On Wed, May 7, 2014 at 1:02 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:

>
>    Mark,
>
>    Are you sure the PGI version is built with the correct branch of
> barry/make-petscoptionsobject-nonglobal ?
>

examples/tutorials> make ex1
cc -o ex1.o -c -mp   -I/autofs/na3_home1/adams/petsc/include
-I/autofs/na3_home1/adams/petsc/arch-titan-opt-pgi/include
-I/opt/nvidia/cudatoolkit/5.5.20-1.0402.7700.8.1/include
-I/opt/nvidia/cudatoolkit/5.5.20-1.0402.7700.8.1/extras/CUPTI/include
-I/opt/nvidia/cudatoolkit/5.5.20-1.0402.7700.8.1/extras/Debugger/include
-I/opt/cray/udreg/2.3.2-1.0402.7546.1.5.gem/include
-I/opt/cray/ugni/5.0-1.0402.7551.1.10.gem/include
-I/opt/cray/pmi/5.0.1-1.0000.9799.94.9.gem/include
-I/opt/cray/dmapp/4.0.1-1.0402.7784.4.1.gem/include
-I/opt/cray/gni-headers/2.1-1.0402.7541.1.5.gem/include
-I/opt/cray/xpmem/0.1-2.0402.45248.1.5.gem/include
-I/opt/cray/rca/1.0.0-2.0402.47290.7.1.gem/include
-I/opt/cray-hss-devel/7.1.0/include
-I/opt/cray/krca/1.0.0-2.0402.46083.4.47.gem/include
-I/opt/cray/mpt/6.2.0/gni/mpich2-pgi/121/include
-I/opt/acml/5.3.1/pgi64_fma4/include
-I/opt/cray/libsci/12.1.3/pgi/121/interlagos/include -I/usr/include/alps
-I/opt/cray/hdf5-parallel/1.8.11/pgi/121/include
-I/opt/cray/netcdf-hdf5parallel/4.3.0/pgi/121/include
-I/opt/pgi/13.10.0/linux86-64/13.10/include
-I/opt/cray/xe-sysroot/4.2.34/usr/include
 /ccs/home/adams/petsc/src/ksp/ksp/examples/tutorials/ex1.c
cc -mp  -o ex1 ex1.o
 -L/autofs/na3_home1/adams/petsc/arch-titan-opt-pgi/lib  -lpetsc
-Wl,-rpath,/autofs/na3_home1/adams/petsc/arch-titan-opt-pgi/lib
-lsuperlu_4.3 -Wl,-rpath,/opt/pgi/13.10.0/linux86/5.1/lib
-L/opt/pgi/13.10.0/linux86/5.1/lib -llapack -lblas -lpthread -ldl
/autofs/na3_home1/adams/petsc/arch-titan-opt-pgi/lib/libpetsc.a(inode2.o):
In function `MatCreate_SeqAIJ_Inode':
/autofs/na3_home1/adams/petsc/src/mat/impls/aij/seq/inode2.c:90: undefined
reference to `PetscOptionsPublishCount'
/autofs/na3_home1/adams/petsc/src/mat/impls/aij/seq/inode2.c:90: undefined
reference to `PetscOptionsPublishCount'
/autofs/na3_home1/adams/petsc/src/mat/impls/aij/seq/inode2.c:90: undefined
reference to `PetscOptionsBool'
/autofs/na3_home1/adams/petsc/src/mat/impls/aij/seq/inode2.c:95: undefined
reference to `PetscOptionsBool'
/autofs/na3_home1/adams/petsc/src/mat/impls/aij/seq/inode2.c:99: undefined
reference to `PetscOptionsInt'
/autofs/na3_home1/adams/petsc/src/mat/impls/aij/seq/inode2.c:100: undefined
reference to `PetscOptionsPublishCount'
/usr/bin/ld: link errors found, deleting executable `ex1'
examples/tutorials> git status
# On branch barry/make-petscoptionsobject-nonglobal
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working
directory)
#
#       modified:   makefile
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       ../../../../../conftest-arch-titan-opt-pgi
#       ../../../../../conftest.d
#       assert_mod.mod
#       ddt.output
#       omp_module.mod
#       tpetsc.F90
no changes added to commit (use "git add" and/or "git commit -a")



>
>     In that branch, line 516 of aoptions.c has "case OPTION_INT:",
> not a PetscFree.
>
>   Meanwhile, in master, line 516 has
>   ierr = PetscFree(PetscOptionsObject.title);CHKERRQ(ierr);
>
>
>   Barry
>
>
> On May 7, 2014, at 2:47 PM, Mark Adams <mfadams at lbl.gov> wrote:
>
> > Barry, made the LU solver "thread safe" in that each thread calls a
> serial solve, for Ed (cc'ed).
> >
> > Ed has a test code (attached) that can be run with 16 threads (aprun -n
> 1 -d 16) and NUM_OPENMP_THREADS=16.  This code seems to work with Intel
> compilers but fails with PGI.
> >
> > I've appended a stack trace.  Any ideas?
> >
> > Mark
> > Note, the line numbers are not quite right in tpetsc.F90 but the rest
> look OK.
> >
> > #11 tpetsc () at
> /autofs/na3_home1/adams/petsc/src/ksp/ksp/examples/tutorials/tpetsc.F90:144
> (at 0x0000000000422afa)
> > #10 matcreateseqaij_ () at
> /autofs/na3_home1/adams/petsc/src/mat/impls/aij/seq/ftn-custom/zaijf.c:14
> (at 0x0000000000474e95)
> > #9 MatCreateSeqAIJ () at
> /autofs/na3_home1/adams/petsc/src/mat/impls/aij/seq/aij.c:3574 (at
> 0x000000000053c8b1)
> > #8 MatSetType () at
> /autofs/na3_home1/adams/petsc/src/mat/interface/matreg.c:71 (at
> 0x000000000050b292)
> > #7 MatCreate_SeqAIJ () at
> /autofs/na3_home1/adams/petsc/src/mat/impls/aij/seq/aij.c:4129 (at
> 0x000000000053f24a)
> > #6 MatCreate_SeqAIJ_Inode () at
> /autofs/na3_home1/adams/petsc/src/mat/impls/aij/seq/inode2.c:99 (at
> 0x000000000058b205)
> > #5 PetscOptionsEnd_Private () at
> /autofs/na3_home1/adams/petsc/src/sys/objects/aoptions.c:516 (at
> 0x000000000043fc25)
> > #4 PetscFreeAlign () at
> /autofs/na3_home1/adams/petsc/src/sys/memory/mal.c:72 (at
> 0x0000000000483f89)
> > #3 free () from /dsl/lib64/libc-2.11.3.so (at 0x00002aaab5bcc4fc)
> > #2 malloc_printerr () from /dsl/lib64/libc-2.11.3.so (at
> 0x00002aaab5bc7558)
> > #1 __libc_message () from /dsl/lib64/libc-2.11.3.so (at
> 0x00002aaab5bc1e2f)
> > #0 abort () from /dsl/lib64/libc-2.11.3.so (at 0x00002aaab5b85fb0)
> >
> >
> >
> > <tpetsc.F90>
>
>
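
For readers following the backtrace: it ends in free() called from
PetscOptionsEnd_Private, and the branch name suggests the options object is
being made non-global. If the abort does come from several threads freeing a
field of one shared object (an assumption, not confirmed in this thread), the
general remedy is for each thread to own the object it allocates and frees.
The sketch below is a minimal illustration of that pattern only. DemoOptions,
demo_options_begin, demo_options_end, and demo_run are hypothetical names and
are not PETSc API.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an options object; NOT the PETSc struct. */
typedef struct {
    char *title;
} DemoOptions;

/* Allocate the title inside a caller-owned object instead of a global. */
static int demo_options_begin(DemoOptions *opts, const char *title)
{
    size_t len = strlen(title) + 1;
    opts->title = malloc(len);
    if (!opts->title) return 1;      /* allocation failed */
    memcpy(opts->title, title, len);
    return 0;
}

/* Free the title; only the thread that owns opts ever reaches this call. */
static void demo_options_end(DemoOptions *opts)
{
    free(opts->title);
    opts->title = NULL;
}

/*
 * Run n independent begin/end cycles and return how many completed.
 * The loop runs in parallel when compiled with OpenMP enabled (for example
 * -fopenmp with GCC, -mp with PGI); otherwise the pragma is ignored and the
 * loop runs serially with the same result.
 */
int demo_run(int n)
{
    int completed = 0;
    #pragma omp parallel for reduction(+:completed)
    for (int i = 0; i < n; i++) {
        DemoOptions opts;            /* private to this iteration */
        if (demo_options_begin(&opts, "Inode options") == 0) {
            demo_options_end(&opts);
            completed += 1;
        }
    }
    return completed;
}
```

Calling demo_run(16) mirrors the 16-thread case described above. Because
every iteration declares its own DemoOptions, no two threads can free the
same pointer, which is the property a single global object cannot give.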