[petsc-dev] spock

Mark Adams mfadams at lbl.gov
Fri Dec 10 07:16:56 CST 2021


FWIW, here is my current status from running make check:

08:08 main= spock:/gpfs/alpine/csc314/scratch/adams/petsc$ make PETSC_DIR=/gpfs/alpine/csc314/scratch/adams/petsc PETSC_ARCH=arch-olcf-spock check
Running check examples to verify correct installation
Using PETSC_DIR=/gpfs/alpine/csc314/scratch/adams/petsc and
PETSC_ARCH=arch-olcf-spock
Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process
See http://www.mcs.anl.gov/petsc/documentation/faq.html
lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
    0 KSP Residual norm 0.0406612
    1 KSP Residual norm 0.036923
    2 KSP Residual norm 0.0191849
    3 KSP Residual norm 0.00201589
    4 KSP Residual norm 0.000376045
    5 KSP Residual norm 4.2974e-05
    6 KSP Residual norm 5.96585e-06
    7 KSP Residual norm 4.5398e-07
    8 KSP Residual norm 6.30474e-08
    9 KSP Residual norm 5.55518e-09
   10 KSP Residual norm 6.180e-10
   11 KSP Residual norm 6.211e-11
  Linear solve converged due to CONVERGED_RTOL iterations 11
    0 KSP Residual norm 3.32845e-06
    1 KSP Residual norm 9.0003e-07
    2 KSP Residual norm 1.32594e-07
    3 KSP Residual norm 1.49857e-08
    4 KSP Residual norm 1.31887e-09
    5 KSP Residual norm 2.105e-10
    6 KSP Residual norm 2.827e-11
    7 KSP Residual norm < 1.e-11
    8 KSP Residual norm < 1.e-11
    9 KSP Residual norm < 1.e-11
   10 KSP Residual norm < 1.e-11
  Linear solve converged due to CONVERGED_RTOL iterations 10
Number of SNES iterations = 2
Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes
See http://www.mcs.anl.gov/petsc/documentation/faq.html
lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
    0 KSP Residual norm 0.0406612
    1 KSP Residual norm 0.0281101
    2 KSP Residual norm 0.00773873
    3 KSP Residual norm 0.00165731
    4 KSP Residual norm 0.000395614
    5 KSP Residual norm 8.67655e-05
    6 KSP Residual norm 1.69495e-05
    7 KSP Residual norm 3.70051e-06
    8 KSP Residual norm 5.97067e-07
    9 KSP Residual norm 1.02242e-07
   10 KSP Residual norm 1.75727e-08
   11 KSP Residual norm 3.84826e-09
   12 KSP Residual norm 6.414e-10
   13 KSP Residual norm 1.380e-10
  Linear solve converged due to CONVERGED_RTOL iterations 13
    0 KSP Residual norm 3.32846e-06
    1 KSP Residual norm 8.99139e-07
    2 KSP Residual norm 1.72893e-07
    3 KSP Residual norm 3.733e-08
    4 KSP Residual norm 6.67427e-09
    5 KSP Residual norm 1.22785e-09
    6 KSP Residual norm 2.551e-10
    7 KSP Residual norm 5.458e-11
    8 KSP Residual norm 1.050e-11
    9 KSP Residual norm < 1.e-11
   10 KSP Residual norm < 1.e-11
   11 KSP Residual norm < 1.e-11
   12 KSP Residual norm < 1.e-11
  Linear solve converged due to CONVERGED_RTOL iterations 12
Number of SNES iterations = 2
3,5c3,14
<   1 SNES Function norm 4.12227e-06
<   2 SNES Function norm 6.098e-11
< Number of SNES iterations = 2
---
>     0 KSP Residual norm 0.0406612
>     1 KSP Residual norm 0.21263
>     2 KSP Residual norm 1.09192
>     3 KSP Residual norm 6.9087
>     4 KSP Residual norm 23.4292
>     5 KSP Residual norm 57.7558
>     6 KSP Residual norm 118.076
>     7 KSP Residual norm 213.527
>     8 KSP Residual norm 354.101
>     9 KSP Residual norm 550.58
>   Linear solve did not converge due to DIVERGED_DTOL iterations 9
> Number of SNES iterations = 0
/gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials
Possible problem with ex19 running with hypre, diffs above
=========================================
gmake[3]: [makefile:115: runex3k_kokkos] Error 134 (ignored)
21,25c21,26
<   1 SNES Function norm 2.952582418265e-01
<   2 SNES Function norm 4.502293658739e-04
<   3 SNES Function norm 1.389665806646e-09
< Number of SNES iterations = 3
< Norm of error 1.49752e-10 Iterations 3
---
> Memory access fault by GPU node-4 (Agent handle: 0xb08c90) on address
0xe17000. Reason: Page not present or supervisor privilege.
> Memory access fault by GPU node-5 (Agent handle: 0xb0d3c0) on address
0xe11000. Reason: Page not present or supervisor privilege.
> srun: error: spock25: task 0: Aborted
> srun: launch/slurm: _step_signal: Terminating StepId=304034.3
> slurmstepd: error: *** STEP 304034.3 ON spock25 CANCELLED AT
2021-12-10T08:08:40 ***
> srun: error: spock25: task 1: Aborted (core dumped)
/gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials
Possible problem with ex3k running with kokkos-kernels, diffs above
=========================================
*******************Error detected during compile or link!*******************
See http://www.mcs.anl.gov/petsc/documentation/faq.html
/gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials ex5f
*********************************************************
ftn -fPIC   -fPIC    -I/gpfs/alpine/csc314/scratch/adams/petsc/include
-I/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-spock/include
-I/gpfs/alpine/geo127/proj-shared/spock/petsc/current/arch-opt-cray/include
-I/opt/rocm-4.3.0/include     ex5f.F90
 -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-spock/lib
-L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-spock/lib
-Wl,-rpath,/gpfs/alpine/geo127/proj-shared/spock/petsc/current/arch-opt-cray/lib
-L/gpfs/alpine/geo127/proj-shared/spock/petsc/current/arch-opt-cray/lib
-Wl,-rpath,/opt/rocm-4.3.0/lib -L/opt/rocm-4.3.0/lib
-Wl,-rpath,/opt/cray/pe/mpich/8.1.10/gtl/lib
-L/opt/cray/pe/mpich/8.1.10/gtl/lib
-Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib64
-L/opt/cray/pe/gcc/8.1.0/snos/lib64
-Wl,-rpath,/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib
-L/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib
-Wl,-rpath,/opt/cray/pe/mpich/8.1.10/ofi/cray/10.0/lib
-L/opt/cray/pe/mpich/8.1.10/ofi/cray/10.0/lib
-Wl,-rpath,/opt/cray/pe/dsmml/0.2.2/dsmml/lib
-L/opt/cray/pe/dsmml/0.2.2/dsmml/lib -Wl,-rpath,/opt/cray/pe/pmi/6.0.14/lib
-L/opt/cray/pe/pmi/6.0.14/lib
-Wl,-rpath,/opt/cray/pe/cce/12.0.3/cce/x86_64/lib
-L/opt/cray/pe/cce/12.0.3/cce/x86_64/lib
-Wl,-rpath,/opt/cray/xpmem/2.2.40-2.1_2.44__g3cf3325.shasta/lib64
-L/opt/cray/xpmem/2.2.40-2.1_2.44__g3cf3325.shasta/lib64
-Wl,-rpath,/opt/cray/pe/cce/12.0.3/cce-clang/x86_64/lib/clang/12.0.0/lib/linux
-L/opt/cray/pe/cce/12.0.3/cce-clang/x86_64/lib/clang/12.0.0/lib/linux
-Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0
-L/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0
-Wl,-rpath,/opt/cray/pe/cce/12.0.3/binutils/x86_64/x86_64-unknown-linux-gnu/lib
-L/opt/cray/pe/cce/12.0.3/binutils/x86_64/x86_64-unknown-linux-gnu/lib
-lpetsc -lHYPRE -lkokkoskernels -lkokkoscontainers -lkokkoscore -lhipsparse
-lhipblas -lrocsparse -lrocsolver -lrocblas -lrocrand -lamdhip64 -lstdc++
-ldl -lmpi_gtl_hsa -lmpifort_cray -lmpi_cray -ldsmml -lpmi -lpmi2 -lxpmem
-lpgas-shmem -lquadmath -lmodules -lfi -lcraymath -lf -lu -lcsup -lgfortran
-lpthread -lgcc_eh -lm -lclang_rt.craypgo-x86_64 -lclang_rt.builtins-x86_64
-lquadmath -lstdc++ -ldl -lmpi_gtl_hsa -o ex5f
/opt/cray/pe/cce/12.0.3/binutils/x86_64/x86_64-pc-linux-gnu/bin/ld:
warning: alignment 128 of symbol
`$host_init$$runtime_init_for_iso_c_binding$iso_c_binding_' in
/opt/cray/pe/cce/12.0.3/cce/x86_64/lib/libmodules.so is smaller than 256 in
/tmp/pe_46424/ex5f_1.o
/opt/cray/pe/cce/12.0.3/binutils/x86_64/x86_64-pc-linux-gnu/bin/ld:
warning: alignment 64 of symbol `$data_init$iso_c_binding_' in
/opt/cray/pe/cce/12.0.3/cce/x86_64/lib/libmodules.so is smaller than 256 in
/tmp/pe_46424/ex5f_1.o
Possible error running Fortran example src/snes/tutorials/ex5f with 1 MPI
process
See http://www.mcs.anl.gov/petsc/documentation/faq.html
    0 KSP Residual norm < 1.e-11
  Linear solve converged due to CONVERGED_ATOL iterations 0
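
Since the output above is long, one way to see at a glance what failed is to filter it down to the lines that flag a problem. The following is only a minimal sketch, not part of PETSc: the function name summarize_check_log and the file name check.log are hypothetical, and the marker strings are simply the ones that appear in the output above, so other failure modes would be missed.

```python
import re
import sys

# Marker strings taken from the make check output above; this list is an
# assumption and is not exhaustive.
PROBLEM_MARKERS = re.compile(
    r"Possible (?:error|problem)"
    r"|Error detected during compile or link"
    r"|DIVERGED_\w+"
    r"|Memory access fault by GPU"
)


def summarize_check_log(text):
    """Return (line_number, stripped_line) pairs for lines that flag a problem."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if PROBLEM_MARKERS.search(line)
    ]


if __name__ == "__main__":
    # Hypothetical usage: save the make check output to check.log first.
    path = sys.argv[1] if len(sys.argv) > 1 else "check.log"
    with open(path) as handle:
        for number, line in summarize_check_log(handle.read()):
            print(f"{number}: {line}")
```

Run on the output above, this would keep the "Possible error" and "Possible problem" lines for ex19, ex3k and ex5f, the DIVERGED_DTOL line from the hypre run, and the two GPU memory access faults, and drop the residual histories.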

On Fri, Dec 10, 2021 at 8:07 AM Mark Adams <mfadams at lbl.gov> wrote:

> I am trying to get Spock working (again) and am having problems.
>
> * make check seems to fail but it is hard to see what is going on. Maybe
> we should start here, but let me continue.
>
> * GAMG seems to work on the CPU
>
> * I have this for configuring with Kokkos. I am guessing these versions
> are out of date. What is current practice?
>     '--with-kokkos-hip-arch=VEGA908',
>     '--download-kokkos-commit=3.4.01',
>     '--download-kokkos-kernels-commit=3.4.01',
>
> * Should I hold off (and tell my eager user to do the same)?
>
> Thanks,
> Mark
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: make.log
Type: application/octet-stream
Size: 118051 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20211210/76c6d60d/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: configure.log
Type: application/octet-stream
Size: 3186796 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20211210/76c6d60d/attachment-0003.obj>

