[petsc-dev] Apparent bug with PGI on Titan with each thread calling a serial LU solve
Mark Adams
mfadams at lbl.gov
Thu May 8 13:06:59 CDT 2014
OK, I nuked it and started over, and that fixed the error below.
On Thu, May 8, 2014 at 7:30 AM, Mark Adams <mfadams at lbl.gov> wrote:
>
>
>
> On Wed, May 7, 2014 at 1:02 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>>
>> Mark,
>>
>> Are you sure the PGI version is built with the correct branch of
>> barry/make-petscoptionsobject-nonglobal ?
>>
>
> examples/tutorials> make ex1
> cc -o ex1.o -c -mp -I/autofs/na3_home1/adams/petsc/include
> -I/autofs/na3_home1/adams/petsc/arch-titan-opt-pgi/include
> -I/opt/nvidia/cudatoolkit/5.5.20-1.0402.7700.8.1/include
> -I/opt/nvidia/cudatoolkit/5.5.20-1.0402.7700.8.1/extras/CUPTI/include
> -I/opt/nvidia/cudatoolkit/5.5.20-1.0402.7700.8.1/extras/Debugger/include
> -I/opt/cray/udreg/2.3.2-1.0402.7546.1.5.gem/include
> -I/opt/cray/ugni/5.0-1.0402.7551.1.10.gem/include
> -I/opt/cray/pmi/5.0.1-1.0000.9799.94.9.gem/include
> -I/opt/cray/dmapp/4.0.1-1.0402.7784.4.1.gem/include
> -I/opt/cray/gni-headers/2.1-1.0402.7541.1.5.gem/include
> -I/opt/cray/xpmem/0.1-2.0402.45248.1.5.gem/include
> -I/opt/cray/rca/1.0.0-2.0402.47290.7.1.gem/include
> -I/opt/cray-hss-devel/7.1.0/include
> -I/opt/cray/krca/1.0.0-2.0402.46083.4.47.gem/include
> -I/opt/cray/mpt/6.2.0/gni/mpich2-pgi/121/include
> -I/opt/acml/5.3.1/pgi64_fma4/include
> -I/opt/cray/libsci/12.1.3/pgi/121/interlagos/include -I/usr/include/alps
> -I/opt/cray/hdf5-parallel/1.8.11/pgi/121/include
> -I/opt/cray/netcdf-hdf5parallel/4.3.0/pgi/121/include
> -I/opt/pgi/13.10.0/linux86-64/13.10/include
> -I/opt/cray/xe-sysroot/4.2.34/usr/include
> /ccs/home/adams/petsc/src/ksp/ksp/examples/tutorials/ex1.c
> cc -mp -o ex1 ex1.o
> -L/autofs/na3_home1/adams/petsc/arch-titan-opt-pgi/lib -lpetsc
> -Wl,-rpath,/autofs/na3_home1/adams/petsc/arch-titan-opt-pgi/lib
> -lsuperlu_4.3 -Wl,-rpath,/opt/pgi/13.10.0/linux86/5.1/lib
> -L/opt/pgi/13.10.0/linux86/5.1/lib -llapack -lblas -lpthread -ldl
> /autofs/na3_home1/adams/petsc/arch-titan-opt-pgi/lib/libpetsc.a(inode2.o): In function `MatCreate_SeqAIJ_Inode':
> /autofs/na3_home1/adams/petsc/src/mat/impls/aij/seq/inode2.c:90: undefined reference to `PetscOptionsPublishCount'
> /autofs/na3_home1/adams/petsc/src/mat/impls/aij/seq/inode2.c:90: undefined reference to `PetscOptionsPublishCount'
> /autofs/na3_home1/adams/petsc/src/mat/impls/aij/seq/inode2.c:90: undefined reference to `PetscOptionsBool'
> /autofs/na3_home1/adams/petsc/src/mat/impls/aij/seq/inode2.c:95: undefined reference to `PetscOptionsBool'
> /autofs/na3_home1/adams/petsc/src/mat/impls/aij/seq/inode2.c:99: undefined reference to `PetscOptionsInt'
> /autofs/na3_home1/adams/petsc/src/mat/impls/aij/seq/inode2.c:100: undefined reference to `PetscOptionsPublishCount'
> /usr/bin/ld: link errors found, deleting executable `ex1'
> examples/tutorials> git status
> # On branch barry/make-petscoptionsobject-nonglobal
> # Changes not staged for commit:
> # (use "git add <file>..." to update what will be committed)
> # (use "git checkout -- <file>..." to discard changes in working directory)
> #
> # modified: makefile
> #
> # Untracked files:
> # (use "git add <file>..." to include in what will be committed)
> #
> # ../../../../../conftest-arch-titan-opt-pgi
> # ../../../../../conftest.d
> # assert_mod.mod
> # ddt.output
> # omp_module.mod
> # tpetsc.F90
> no changes added to commit (use "git add" and/or "git commit -a")
>
>
>
>>
>> In that branch, line 516 of aoptions.c has "case OPTION_INT:",
>> not a PetscFree.
>>
>> Meanwhile, in master, line 516 has
>> ierr = PetscFree(PetscOptionsObject.title);CHKERRQ(ierr);
>>
>>
>> Barry
>>
>>
>> On May 7, 2014, at 2:47 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>
>> > Barry, made the LU solver "thread safe" in that each thread calls a
>> > serial solve, for Ed (cc'ed).
>> >
>> > Ed has a test code (attached) that can be run with 16 threads (aprun -n
>> > 1 -d 16) and NUM_OPENMP_THREADS=16. This code seems to work with Intel
>> > compilers but fails with PGI.
>> >
>> > I've appended a stack trace. Any ideas?
>> >
>> > Mark
>> > Note, the line numbers are not quite right in tpetsc.F90 but the rest
>> > look OK.
>> >
>> > #11 tpetsc () at /autofs/na3_home1/adams/petsc/src/ksp/ksp/examples/tutorials/tpetsc.F90:144 (at 0x0000000000422afa)
>> > #10 matcreateseqaij_ () at /autofs/na3_home1/adams/petsc/src/mat/impls/aij/seq/ftn-custom/zaijf.c:14 (at 0x0000000000474e95)
>> > #9 MatCreateSeqAIJ () at /autofs/na3_home1/adams/petsc/src/mat/impls/aij/seq/aij.c:3574 (at 0x000000000053c8b1)
>> > #8 MatSetType () at /autofs/na3_home1/adams/petsc/src/mat/interface/matreg.c:71 (at 0x000000000050b292)
>> > #7 MatCreate_SeqAIJ () at /autofs/na3_home1/adams/petsc/src/mat/impls/aij/seq/aij.c:4129 (at 0x000000000053f24a)
>> > #6 MatCreate_SeqAIJ_Inode () at /autofs/na3_home1/adams/petsc/src/mat/impls/aij/seq/inode2.c:99 (at 0x000000000058b205)
>> > #5 PetscOptionsEnd_Private () at /autofs/na3_home1/adams/petsc/src/sys/objects/aoptions.c:516 (at 0x000000000043fc25)
>> > #4 PetscFreeAlign () at /autofs/na3_home1/adams/petsc/src/sys/memory/mal.c:72 (at 0x0000000000483f89)
>> > #3 free () from /dsl/lib64/libc-2.11.3.so (at 0x00002aaab5bcc4fc)
>> > #2 malloc_printerr () from /dsl/lib64/libc-2.11.3.so (at 0x00002aaab5bc7558)
>> > #1 __libc_message () from /dsl/lib64/libc-2.11.3.so (at 0x00002aaab5bc1e2f)
>> > #0 abort () from /dsl/lib64/libc-2.11.3.so (at 0x00002aaab5b85fb0)
>> >
>> >
>> >
>> > <tpetsc.F90>
>>
>>
>