<div dir="ltr">It seems to be hanging on the 2 processor test.<div>I'll try running jobs manually.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Dec 10, 2021 at 9:34 AM Satish Balay <<a href="mailto:balay@mcs.anl.gov">balay@mcs.anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Merged now. And the following now works [for me].<br>
<br>
 1025  git fetch -p<br>
 1026  git checkout origin/main<br>
 1027  ./config/examples/arch-olcf-spock.py && make<br>
 1028  MPIR_CVAR_GPU_EAGER_DEVICE_MEM=0 MPICH_GPU_SUPPORT_ENABLED=1 MPICH_SMP_SINGLE_COPY_MODE=CMA make check<br>
<br>
Satish<br>
<br>
On Fri, 10 Dec 2021, Satish Balay via petsc-dev wrote:<br>
<br>
> Works for me [per instructions in balay/update-spock, config/examples/arch-olcf-spock.py] with main - without these additional options<br>
> <br>
> I'll go ahead and merge in balay/update-spock<br>
> <br>
> Satish<br>
> <br>
> -----<br>
> <br>
>  1009  git fetch -p<br>
>  1015  module load emacs<br>
>  1016  module load rocm/4.3.0<br>
>  1018  git reset --hard<br>
>  1019  git checkout origin/main<br>
>  1020  git merge origin/balay/update-spock<br>
>  1021  ./config/examples/arch-olcf-spock.py && make<br>
> <br>
> <br>
> <br>
> [balay@login2.spock petsc]$ MPIR_CVAR_GPU_EAGER_DEVICE_MEM=0 MPICH_GPU_SUPPORT_ENABLED=1 MPICH_SMP_SINGLE_COPY_MODE=CMA make check<br>
> Running check examples to verify correct installation<br>
> Using PETSC_DIR=/autofs/nccs-svm1_home1/balay/petsc and PETSC_ARCH=arch-olcf-spock<br>
> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process<br>
> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes<br>
> C/C++ example src/snes/tutorials/ex3k run successfully with kokkos-kernels<br>
> *******************Error detected during compile or link!*******************<br>
> See <a href="http://www.mcs.anl.gov/petsc/documentation/faq.html" rel="noreferrer" target="_blank">http://www.mcs.anl.gov/petsc/documentation/faq.html</a><br>
> /ccs/home/balay/petsc/src/snes/tutorials ex5f<br>
> *********************************************************<br>
> ftn -fPIC   -fPIC    -I/autofs/nccs-svm1_home1/balay/petsc/include -I/autofs/nccs-svm1_home1/balay/petsc/arch-olcf-spock/include -I/opt/rocm-4.3.0/include     ex5f.F90  -Wl,-rpath,/autofs/nccs-svm1_home1/balay/petsc/arch-olcf-spock/lib -L/autofs/nccs-svm1_home1/balay/petsc/arch-olcf-spock/lib -Wl,-rpath,/autofs/nccs-svm1_home1/balay/petsc/arch-olcf-spock/lib -L/autofs/nccs-svm1_home1/balay/petsc/arch-olcf-spock/lib -Wl,-rpath,/opt/rocm-4.3.0/lib -L/opt/rocm-4.3.0/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.10/gtl/lib -L/opt/cray/pe/mpich/8.1.10/gtl/lib -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib64 -L/opt/cray/pe/gcc/8.1.0/snos/lib64 -Wl,-rpath,/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib -L/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.10/ofi/cray/10.0/lib -L/opt/cray/pe/mpich/8.1.10/ofi/cray/10.0/lib -Wl,-rpath,/opt/cray/pe/dsmml/0.2.2/dsmml/lib -L/opt/cray/pe/dsmml/0.2.2/dsmml/lib -Wl,-rpath,/opt/cray/pe/pmi/6.0.14/lib -L/opt/cray/pe/pmi/6.0.14/lib -Wl,-rpath,/opt/cray/pe/cce/12.0.3/cce/x86_64/lib -L/opt/cray/pe/cce/12.0.3/cce/x86_64/lib -Wl,-rpath,/opt/cray/xpmem/2.2.40-2.1_2.44__g3cf3325.shasta/lib64 -L/opt/cray/xpmem/2.2.40-2.1_2.44__g3cf3325.shasta/lib64 -Wl,-rpath,/opt/cray/pe/cce/12.0.3/cce-clang/x86_64/lib/clang/12.0.0/lib/linux -L/opt/cray/pe/cce/12.0.3/cce-clang/x86_64/lib/clang/12.0.0/lib/linux -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -L/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -Wl,-rpath,/opt/cray/pe/cce/12.0.3/binutils/x86_64/x86_64-unknown-linux-gnu/lib -L/opt/cray/pe/cce/12.0.3/binutils/x86_64/x86_64-unknown-linux-gnu/lib -lpetsc -lmagma -lkokkoskernels -lkokkoscontainers -lkokkoscore -lhipsparse -lhipblas -lrocsparse -lrocsolver -lrocblas -lrocrand -lamdhip64 -lstdc++ -ldl -lmpi_gtl_hsa -lmpifort_cray -lmpi_cray -ldsmml -lpmi -lpmi2 -lxpmem -lpgas-shmem -lquadmath -lmodules -lfi -lcraymath -lf -lu -lcsup -lgfortran -lpthread -lgcc_eh -lm -lclang_rt.craypgo-x86_64 -lclang_rt.builtins-x86_64 -lquadmath -lstdc++ -ldl -lmpi_gtl_hsa -o ex5f<br>
> /opt/cray/pe/cce/12.0.3/binutils/x86_64/x86_64-pc-linux-gnu/bin/ld: warning: alignment 128 of symbol `$host_init$$runtime_init_for_iso_c_binding$iso_c_binding_' in /opt/cray/pe/cce/12.0.3/cce/x86_64/lib/libmodules.so is smaller than 256 in /tmp/pe_202599/ex5f_1.o<br>
> /opt/cray/pe/cce/12.0.3/binutils/x86_64/x86_64-pc-linux-gnu/bin/ld: warning: alignment 64 of symbol `$data_init$iso_c_binding_' in /opt/cray/pe/cce/12.0.3/cce/x86_64/lib/libmodules.so is smaller than 256 in /tmp/pe_202599/ex5f_1.o<br>
> Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI process<br>
> Completed test examples<br>
> [balay@login2.spock petsc]$ <br>
> <br>
> <br>
> On Fri, 10 Dec 2021, Mark Adams wrote:<br>
> <br>
> > FWIW,  here is my current status.<br>
> > <br>
> > 08:08 main= spock:/gpfs/alpine/csc314/scratch/adams/petsc$ make<br>
> > PETSC_DIR=/gpfs/alpine/csc314/scratch/adams/petsc<br>
> > PETSC_ARCH=arch-olcf-spock check<br>
> > Running check examples to verify correct installation<br>
> > Using PETSC_DIR=/gpfs/alpine/csc314/scratch/adams/petsc and<br>
> > PETSC_ARCH=arch-olcf-spock<br>
> > Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process<br>
> > See <a href="http://www.mcs.anl.gov/petsc/documentation/faq.html" rel="noreferrer" target="_blank">http://www.mcs.anl.gov/petsc/documentation/faq.html</a><br>
> > lid velocity = 0.0016, prandtl # = 1., grashof # = 1.<br>
> >     0 KSP Residual norm 0.0406612<br>
> >     1 KSP Residual norm 0.036923<br>
> >     2 KSP Residual norm 0.0191849<br>
> >     3 KSP Residual norm 0.00201589<br>
> >     4 KSP Residual norm 0.000376045<br>
> >     5 KSP Residual norm 4.2974e-05<br>
> >     6 KSP Residual norm 5.96585e-06<br>
> >     7 KSP Residual norm 4.5398e-07<br>
> >     8 KSP Residual norm 6.30474e-08<br>
> >     9 KSP Residual norm 5.55518e-09<br>
> >    10 KSP Residual norm 6.180e-10<br>
> >    11 KSP Residual norm 6.211e-11<br>
> >   Linear solve converged due to CONVERGED_RTOL iterations 11<br>
> >     0 KSP Residual norm 3.32845e-06<br>
> >     1 KSP Residual norm 9.0003e-07<br>
> >     2 KSP Residual norm 1.32594e-07<br>
> >     3 KSP Residual norm 1.49857e-08<br>
> >     4 KSP Residual norm 1.31887e-09<br>
> >     5 KSP Residual norm 2.105e-10<br>
> >     6 KSP Residual norm 2.827e-11<br>
> >     7 KSP Residual norm < 1.e-11<br>
> >     8 KSP Residual norm < 1.e-11<br>
> >     9 KSP Residual norm < 1.e-11<br>
> >    10 KSP Residual norm < 1.e-11<br>
> >   Linear solve converged due to CONVERGED_RTOL iterations 10<br>
> > Number of SNES iterations = 2<br>
> > Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes<br>
> > See <a href="http://www.mcs.anl.gov/petsc/documentation/faq.html" rel="noreferrer" target="_blank">http://www.mcs.anl.gov/petsc/documentation/faq.html</a><br>
> > lid velocity = 0.0016, prandtl # = 1., grashof # = 1.<br>
> >     0 KSP Residual norm 0.0406612<br>
> >     1 KSP Residual norm 0.0281101<br>
> >     2 KSP Residual norm 0.00773873<br>
> >     3 KSP Residual norm 0.00165731<br>
> >     4 KSP Residual norm 0.000395614<br>
> >     5 KSP Residual norm 8.67655e-05<br>
> >     6 KSP Residual norm 1.69495e-05<br>
> >     7 KSP Residual norm 3.70051e-06<br>
> >     8 KSP Residual norm 5.97067e-07<br>
> >     9 KSP Residual norm 1.02242e-07<br>
> >    10 KSP Residual norm 1.75727e-08<br>
> >    11 KSP Residual norm 3.84826e-09<br>
> >    12 KSP Residual norm 6.414e-10<br>
> >    13 KSP Residual norm 1.380e-10<br>
> >   Linear solve converged due to CONVERGED_RTOL iterations 13<br>
> >     0 KSP Residual norm 3.32846e-06<br>
> >     1 KSP Residual norm 8.99139e-07<br>
> >     2 KSP Residual norm 1.72893e-07<br>
> >     3 KSP Residual norm 3.733e-08<br>
> >     4 KSP Residual norm 6.67427e-09<br>
> >     5 KSP Residual norm 1.22785e-09<br>
> >     6 KSP Residual norm 2.551e-10<br>
> >     7 KSP Residual norm 5.458e-11<br>
> >     8 KSP Residual norm 1.050e-11<br>
> >     9 KSP Residual norm < 1.e-11<br>
> >    10 KSP Residual norm < 1.e-11<br>
> >    11 KSP Residual norm < 1.e-11<br>
> >    12 KSP Residual norm < 1.e-11<br>
> >   Linear solve converged due to CONVERGED_RTOL iterations 12<br>
> > Number of SNES iterations = 2<br>
> > 3,5c3,14<br>
> > <   1 SNES Function norm 4.12227e-06<br>
> > <   2 SNES Function norm 6.098e-11<br>
> > < Number of SNES iterations = 2<br>
> > ---<br>
> > >     0 KSP Residual norm 0.0406612<br>
> > >     1 KSP Residual norm 0.21263<br>
> > >     2 KSP Residual norm 1.09192<br>
> > >     3 KSP Residual norm 6.9087<br>
> > >     4 KSP Residual norm 23.4292<br>
> > >     5 KSP Residual norm 57.7558<br>
> > >     6 KSP Residual norm 118.076<br>
> > >     7 KSP Residual norm 213.527<br>
> > >     8 KSP Residual norm 354.101<br>
> > >     9 KSP Residual norm 550.58<br>
> > >   Linear solve did not converge due to DIVERGED_DTOL iterations 9<br>
> > > Number of SNES iterations = 0<br>
> > /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials<br>
> > Possible problem with ex19 running with hypre, diffs above<br>
> > =========================================<br>
> > gmake[3]: [makefile:115: runex3k_kokkos] Error 134 (ignored)<br>
> > 21,25c21,26<br>
> > <   1 SNES Function norm 2.952582418265e-01<br>
> > <   2 SNES Function norm 4.502293658739e-04<br>
> > <   3 SNES Function norm 1.389665806646e-09<br>
> > < Number of SNES iterations = 3<br>
> > < Norm of error 1.49752e-10 Iterations 3<br>
> > ---<br>
> > > Memory access fault by GPU node-4 (Agent handle: 0xb08c90) on address<br>
> > 0xe17000. Reason: Page not present or supervisor privilege.<br>
> > > Memory access fault by GPU node-5 (Agent handle: 0xb0d3c0) on address<br>
> > 0xe11000. Reason: Page not present or supervisor privilege.<br>
> > > srun: error: spock25: task 0: Aborted<br>
> > > srun: launch/slurm: _step_signal: Terminating StepId=304034.3<br>
> > > slurmstepd: error: *** STEP 304034.3 ON spock25 CANCELLED AT<br>
> > 2021-12-10T08:08:40 ***<br>
> > > srun: error: spock25: task 1: Aborted (core dumped)<br>
> > /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials<br>
> > Possible problem with ex3k running with kokkos-kernels, diffs above<br>
> > =========================================<br>
> > *******************Error detected during compile or link!*******************<br>
> > See <a href="http://www.mcs.anl.gov/petsc/documentation/faq.html" rel="noreferrer" target="_blank">http://www.mcs.anl.gov/petsc/documentation/faq.html</a><br>
> > /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials ex5f<br>
> > *********************************************************<br>
> > ftn -fPIC   -fPIC    -I/gpfs/alpine/csc314/scratch/adams/petsc/include<br>
> > -I/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-spock/include<br>
> > -I/gpfs/alpine/geo127/proj-shared/spock/petsc/current/arch-opt-cray/include<br>
> > -I/opt/rocm-4.3.0/include     ex5f.F90<br>
> >  -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-spock/lib<br>
> > -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-spock/lib<br>
> > -Wl,-rpath,/gpfs/alpine/geo127/proj-shared/spock/petsc/current/arch-opt-cray/lib<br>
> > -L/gpfs/alpine/geo127/proj-shared/spock/petsc/current/arch-opt-cray/lib<br>
> > -Wl,-rpath,/opt/rocm-4.3.0/lib -L/opt/rocm-4.3.0/lib<br>
> > -Wl,-rpath,/opt/cray/pe/mpich/8.1.10/gtl/lib<br>
> > -L/opt/cray/pe/mpich/8.1.10/gtl/lib<br>
> > -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib64<br>
> > -L/opt/cray/pe/gcc/8.1.0/snos/lib64<br>
> > -Wl,-rpath,/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib<br>
> > -L/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib<br>
> > -Wl,-rpath,/opt/cray/pe/mpich/8.1.10/ofi/cray/10.0/lib<br>
> > -L/opt/cray/pe/mpich/8.1.10/ofi/cray/10.0/lib<br>
> > -Wl,-rpath,/opt/cray/pe/dsmml/0.2.2/dsmml/lib<br>
> > -L/opt/cray/pe/dsmml/0.2.2/dsmml/lib -Wl,-rpath,/opt/cray/pe/pmi/6.0.14/lib<br>
> > -L/opt/cray/pe/pmi/6.0.14/lib<br>
> > -Wl,-rpath,/opt/cray/pe/cce/12.0.3/cce/x86_64/lib<br>
> > -L/opt/cray/pe/cce/12.0.3/cce/x86_64/lib<br>
> > -Wl,-rpath,/opt/cray/xpmem/2.2.40-2.1_2.44__g3cf3325.shasta/lib64<br>
> > -L/opt/cray/xpmem/2.2.40-2.1_2.44__g3cf3325.shasta/lib64<br>
> > -Wl,-rpath,/opt/cray/pe/cce/12.0.3/cce-clang/x86_64/lib/clang/12.0.0/lib/linux<br>
> > -L/opt/cray/pe/cce/12.0.3/cce-clang/x86_64/lib/clang/12.0.0/lib/linux<br>
> > -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0<br>
> > -L/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0<br>
> > -Wl,-rpath,/opt/cray/pe/cce/12.0.3/binutils/x86_64/x86_64-unknown-linux-gnu/lib<br>
> > -L/opt/cray/pe/cce/12.0.3/binutils/x86_64/x86_64-unknown-linux-gnu/lib<br>
> > -lpetsc -lHYPRE -lkokkoskernels -lkokkoscontainers -lkokkoscore -lhipsparse<br>
> > -lhipblas -lrocsparse -lrocsolver -lrocblas -lrocrand -lamdhip64 -lstdc++<br>
> > -ldl -lmpi_gtl_hsa -lmpifort_cray -lmpi_cray -ldsmml -lpmi -lpmi2 -lxpmem<br>
> > -lpgas-shmem -lquadmath -lmodules -lfi -lcraymath -lf -lu -lcsup -lgfortran<br>
> > -lpthread -lgcc_eh -lm -lclang_rt.craypgo-x86_64 -lclang_rt.builtins-x86_64<br>
> > -lquadmath -lstdc++ -ldl -lmpi_gtl_hsa -o ex5f<br>
> > /opt/cray/pe/cce/12.0.3/binutils/x86_64/x86_64-pc-linux-gnu/bin/ld:<br>
> > warning: alignment 128 of symbol<br>
> > `$host_init$$runtime_init_for_iso_c_binding$iso_c_binding_' in<br>
> > /opt/cray/pe/cce/12.0.3/cce/x86_64/lib/libmodules.so is smaller than 256 in<br>
> > /tmp/pe_46424/ex5f_1.o<br>
> > /opt/cray/pe/cce/12.0.3/binutils/x86_64/x86_64-pc-linux-gnu/bin/ld:<br>
> > warning: alignment 64 of symbol `$data_init$iso_c_binding_' in<br>
> > /opt/cray/pe/cce/12.0.3/cce/x86_64/lib/libmodules.so is smaller than 256 in<br>
> > /tmp/pe_46424/ex5f_1.o<br>
> > Possible error running Fortran example src/snes/tutorials/ex5f with 1 MPI<br>
> > process<br>
> > See <a href="http://www.mcs.anl.gov/petsc/documentation/faq.html" rel="noreferrer" target="_blank">http://www.mcs.anl.gov/petsc/documentation/faq.html</a><br>
> >     0 KSP Residual norm < 1.e-11<br>
> >   Linear solve converged due to CONVERGED_ATOL iterations 0<br>
> > <br>
> > On Fri, Dec 10, 2021 at 8:07 AM Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> wrote:<br>
> > <br>
> > > I am trying to get Spock working (again) and am having problems.<br>
> > ><br>
> > > * make check seems to fail but it is hard to see what is going on. Maybe<br>
> > > we should start here, but let me continue.<br>
> > ><br>
> > > * GAMG seems to work on the CPU<br>
> > ><br>
> > > * I have this for configuring with Kokkos. I am guessing these versions<br>
> > > are out of date. What is current practice?<br>
> > >     '--with-kokkos-hip-arch=VEGA908',<br>
> > >     '--download-kokkos-commit=3.4.01',<br>
> > >     '--download-kokkos-kernels-commit=3.4.01',<br>
> > ><br>
> > > * Should I hold off (and tell my eager user to do the same)?<br>
> > ><br>
> > > Thanks,<br>
> > > Mark<br>
> > ><br>
> > <br>
> <br>
<br>
</blockquote></div>