[petsc-users] PAMI error on Summit
Junchao Zhang
junchao.zhang at gmail.com
Mon Mar 4 11:10:45 CST 2024
Hi, Sophie,
I tried various modules and compilers on Summit and failed to find one
that works with gpu aware mpi.
The one that could build petsc and kokkos was "module load cuda/11.7.1
gcc/9.3.0-compiler_only spectrum-mpi essl netlib-lapack". But it only
worked with "-use_gpu_aware_mpi 0". Without it, I saw code crashes.
From what I can see, the gpu-aware mpi on Summit is in an unusable and
unmaintained state.
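
As a minimal sketch of that workaround: the PETSc error message quoted further down says the option can be placed in the PETSC_OPTIONS environment variable, so a job script could set it once instead of passing it on every command line. The helper name disable_gpu_aware_mpi is made up for illustration.

```shell
# Sketch: add "-use_gpu_aware_mpi 0" to PETSC_OPTIONS unless the option
# is already present. disable_gpu_aware_mpi is a hypothetical helper name.
disable_gpu_aware_mpi() {
    case " ${PETSC_OPTIONS:-} " in
        *" -use_gpu_aware_mpi "*) ;;  # a value was already chosen: keep it
        *) PETSC_OPTIONS="${PETSC_OPTIONS:+$PETSC_OPTIONS }-use_gpu_aware_mpi 0" ;;
    esac
    export PETSC_OPTIONS
}

disable_gpu_aware_mpi
echo "PETSC_OPTIONS=$PETSC_OPTIONS"
```

Calling the helper twice leaves the variable unchanged, and any options already in PETSC_OPTIONS are preserved in front of the new one.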
--Junchao Zhang
On Fri, Mar 1, 2024 at 3:58 PM Junchao Zhang <junchao.zhang at gmail.com>
wrote:
> It is weird, with
> jsrun --smpiargs "-gpu" -n 6 -a 1 -c 1 -g 1 /gpfs/alpine2/mat267/proj-shared/dependencies/petsc-kokkos/src/ksp/ksp/tutorials/bench_kspsolve
> -mat_type aijkokkos -use_gpu_aware_mpi 1
>
> PETSc tested whether the MPI is GPU-aware (by doing an MPI_Allreduce on
> device buffers). It found that it was not, so it printed the complaint in
> the error message.
>
> From
> https://docs.olcf.ornl.gov/systems/summit_user_guide.html#cuda-aware-mpi ,
> I think your flags were right.
>
> I just got my Summit account reactivated today. I will give it a try.
>
> --Junchao Zhang
>
>
> On Fri, Mar 1, 2024 at 3:32 PM Blondel, Sophie <sblondel at utk.edu> wrote:
>
>> I have been using --smpiargs "-gpu".
>>
>> I tried the benchmark with "jsrun --smpiargs "-gpu" -n 6 -a 1 -c 1 -g 1
>> /gpfs/alpine2/mat267/proj-shared/dependencies/petsc-kokkos/src/ksp/ksp/tutorials/bench_kspsolve
>> -mat_type aijkokkos -use_gpu_aware_mpi 0" and it seems to work:
>> Fri Mar 1 16:27:14 EST 2024
>> ===========================================
>> Test: KSP performance - Poisson
>> Input matrix: 27-pt finite difference stencil
>> -n 100
>> DoFs = 1000000
>> Number of nonzeros = 26463592
>>
>> Step1 - creating Vecs and Mat...
>> Step2 - running KSPSolve()...
>> Step3 - calculating error norm...
>>
>> Error norm: 5.591e-02
>> KSP iters: 63
>> KSPSolve: 3.16646 seconds
>> FOM: 3.158e+05 DoFs/sec
>> ===========================================
>>
>> ------------------------------------------------------------
>> Sender: LSF System <lsfadmin at batch3>
>> Subject: Job 3322694: <xolotlTest> in cluster <summit> Done
>>
>> Job <xolotlTest> was submitted from host <login2> by user <bqo> in
>> cluster <summit> at Fri Mar 1 16:26:58 2024
>> Job was executed on host(s) <1*batch3>, in queue <debug>, as user <bqo>
>> in cluster <summit> at Fri Mar 1 16:27:00 2024
>> <42*a35n05>
>> </ccs/home/bqo> was used as the home directory.
>> </gpfs/alpine2/mat267/scratch/bqo/test> was used as the working directory.
>> Started at Fri Mar 1 16:27:00 2024
>> Terminated at Fri Mar 1 16:27:26 2024
>> Results reported at Fri Mar 1 16:27:26 2024
>>
>> The output (if any) is above this job summary.
>>
>>
>> If I switch to "jsrun --smpiargs "-gpu" -n 6 -a 1 -c 1 -g 1
>> /gpfs/alpine2/mat267/proj-shared/dependencies/petsc-kokkos/src/ksp/ksp/tutorials/bench_kspsolve
>> -mat_type aijkokkos -use_gpu_aware_mpi 1" it complains:
>> Fri Mar 1 16:25:02 EST 2024
>> ===========================================
>> Test: KSP performance - Poisson
>> Input matrix: 27-pt finite difference stencil
>> -n 100
>> DoFs = 1000000
>> Number of nonzeros = 26463592
>>
>> Step1 - creating Vecs and Mat...
>> [5]PETSC ERROR: PETSc is configured with GPU support, but your MPI is not
>> GPU-aware. For better performance, please use a GPU-aware MPI.
>> [5]PETSC ERROR: If you do not care, add option -use_gpu_aware_mpi 0. To
>> not see the message again, add the option to your .petscrc, OR add it to
>> the env var PETSC_OPTIONS.
>> [5]PETSC ERROR: If you do care, for IBM Spectrum MPI on OLCF Summit, you
>> may need jsrun --smpiargs=-gpu.
>> [5]PETSC ERROR: For Open MPI, you need to configure it --with-cuda (
>> https://www.open-mpi.org/faq/?category=buildcuda )
>> [5]PETSC ERROR: For MVAPICH2-GDR, you need to set MV2_USE_CUDA=1 (
>> http://mvapich.cse.ohio-state.edu/userguide/gdr/ )
>> [5]PETSC ERROR: For Cray-MPICH, you need to set
>> MPICH_GPU_SUPPORT_ENABLED=1 (man mpi to see manual of cray-mpich)
>> --------------------------------------------------------------------------
>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF
>> with errorcode 76.
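
To spot this failure in a batch log without reading the whole output, one could grep for the message. This is only a sketch: it assumes the raw job output carries the phrase "your MPI is not GPU-aware" on a single line (the copy above was re-wrapped in the mail), and the function name and the log file name are hypothetical.

```shell
# Succeeds (exit status 0) if the given log file contains PETSc's complaint
# that the MPI library is not GPU-aware. mpi_not_gpu_aware is a made-up name.
mpi_not_gpu_aware() {
    grep -q "your MPI is not GPU-aware" "$1"
}

# Example use on a hypothetical job output file.
log="job_output.txt"
if [ -f "$log" ] && mpi_not_gpu_aware "$log"; then
    echo "PETSc reported a non-GPU-aware MPI; consider -use_gpu_aware_mpi 0"
fi
```

The check only looks for the message text, so it says nothing about why the GPU-awareness test failed.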
>>
>> Best,
>>
>> Sophie
>> ------------------------------
>> *From:* Junchao Zhang <junchao.zhang at gmail.com>
>> *Sent:* Thursday, February 29, 2024 17:09
>> *To:* Blondel, Sophie <sblondel at utk.edu>
>> *Cc:* xolotl-psi-development at lists.sourceforge.net <
>> xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov <
>> petsc-users at mcs.anl.gov>
>> *Subject:* Re: [petsc-users] PAMI error on Summit
>>
>> Could you try a petsc example to see if the environment is good?
>> For example,
>>
>> cd src/ksp/ksp/tutorials
>> make bench_kspsolve
>> mpirun -n 6 ./bench_kspsolve -mat_type aijkokkos -use_gpu_aware_mpi {0 or 1}
>>
>> BTW, I remember that to use GPU-aware MPI on Summit, one needs to pass
>> --smpiargs "-gpu" to jsrun.
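
A small wrapper for this kind of A/B check might look like the following sketch. It reuses the jsrun flags quoted elsewhere in this thread; BENCH, DRY_RUN, and run_bench are hypothetical names introduced here, and by default the script only prints the commands instead of launching them.

```shell
# Sketch: run (or just print) the PETSc benchmark once per GPU-aware-MPI setting.
# BENCH and DRY_RUN are hypothetical knobs introduced for this example.
BENCH="${BENCH:-./bench_kspsolve}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, anything else = run them

run_bench() {
    # $1 is the value passed to -use_gpu_aware_mpi (0 or 1).
    # The jsrun flags are the ones quoted in this thread.
    set -- jsrun --smpiargs "-gpu" -n 6 -a 1 -c 1 -g 1 \
        "$BENCH" -mat_type aijkokkos -use_gpu_aware_mpi "$1"
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

for aware in 0 1; do
    run_bench "$aware"
done
```

Setting DRY_RUN=0 inside a batch job would launch both runs back to back, which makes it easy to compare the two outputs side by side.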
>>
>> --Junchao Zhang
>>
>>
>> On Thu, Feb 29, 2024 at 3:22 PM Blondel, Sophie via petsc-users <
>> petsc-users at mcs.anl.gov> wrote:
>>
>> I still get the same error when deactivating GPU-aware MPI.
>>
>> I also tried unloading spectrum MPI and using openMPI instead
>> (recompiling everything) and I get a segfault in PETSc in that case (still
>> using GPU-aware MPI I think, at least not explicitly turning it off):
>>
>> 0 TS dt 1e-12 time 0.
>>
>> [ERROR] [0]PETSC ERROR:
>>
>> [ERROR]
>> ------------------------------------------------------------------------
>>
>> [ERROR] [0]PETSC ERROR:
>>
>> [ERROR] Caught signal number 11 SEGV: Segmentation Violation, probably
>> memory access out of range
>>
>> [ERROR] [0]PETSC ERROR:
>>
>> [ERROR] Try option -start_in_debugger or -on_error_attach_debugger
>>
>> [ERROR] [0]PETSC ERROR:
>>
>> [ERROR] or see https://petsc.org/release/faq/#valgrind and
>> https://petsc.org/release/faq/
>>
>> [ERROR] [0]PETSC ERROR:
>>
>> [ERROR] or try https://docs.nvidia.com/cuda/cuda-memcheck/index.html on
>> NVIDIA CUDA systems to find memory corruption errors
>>
>> [ERROR] [0]PETSC ERROR:
>>
>> [ERROR] configure using --with-debugging=yes, recompile, link, and run
>>
>> [ERROR] [0]PETSC ERROR:
>>
>> [ERROR] to get more information on the crash.
>>
>> [ERROR] [0]PETSC ERROR:
>>
>> [ERROR] Run with -malloc_debug to check if memory corruption is causing
>> the crash.
>>
>> --------------------------------------------------------------------------
>>
>> Best,
>>
>> Sophie
>> ------------------------------
>> *From:* Blondel, Sophie via Xolotl-psi-development <
>> xolotl-psi-development at lists.sourceforge.net>
>> *Sent:* Thursday, February 29, 2024 10:17
>> *To:* xolotl-psi-development at lists.sourceforge.net <
>> xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov <
>> petsc-users at mcs.anl.gov>
>> *Subject:* [Xolotl-psi-development] PAMI error on Summit
>>
>> Hi,
>>
>> I am using PETSc built with the Kokkos CUDA backend on Summit, but when I
>> run my code with multiple MPI tasks I get the following error:
>> 0 TS dt 1e-12 time 0.
>> errno 14 pid 864558
>> xolotl:
>> /__SMPI_build_dir__________________________/ibmsrc/pami/ibm-pami/buildtools/pami_build_port/../pami/components/devices/shmem/shaddr/CMAShaddr.h:164:
>> size_t PAMI::Device::Shmem::CMAShaddr::read_impl(PAMI::Memregion*, size_t,
>> PAMI::Memregion*, size_t, size_t, bool*): Assertion `cbytes > 0' failed.
>> errno 14 pid 864557
>> xolotl:
>> /__SMPI_build_dir__________________________/ibmsrc/pami/ibm-pami/buildtools/pami_build_port/../pami/components/devices/shmem/shaddr/CMAShaddr.h:164:
>> size_t PAMI::Device::Shmem::CMAShaddr::read_impl(PAMI::Memregion*, size_t,
>> PAMI::Memregion*, size_t, size_t, bool*): Assertion `cbytes > 0' failed.
>> [e28n07:864557] *** Process received signal ***
>> [e28n07:864557] Signal: Aborted (6)
>> [e28n07:864557] Signal code: (-6)
>> [e28n07:864557] [ 0]
>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000604d8]
>> [e28n07:864557] [ 1] /lib64/glibc-hwcaps/power9/libc-2.28.so
>> (gsignal+0xd8)[0x200005d796f8]
>> [e28n07:864557] [ 2] /lib64/glibc-hwcaps/power9/libc-2.28.so
>> (abort+0x164)[0x200005d53ff4]
>> [e28n07:864557] [ 3] /lib64/glibc-hwcaps/power9/libc-2.28.so
>> (+0x3d280)[0x200005d6d280]
>> [e28n07:864557] [ 4] [e28n07:864558] *** Process received signal ***
>> [e28n07:864558] Signal: Aborted (6)
>> [e28n07:864558] Signal code: (-6)
>> [e28n07:864558] [ 0]
>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000604d8]
>> [e28n07:864558] [ 1] /lib64/glibc-hwcaps/power9/libc-2.28.so
>> (gsignal+0xd8)[0x200005d796f8]
>> [e28n07:864558] [ 2] /lib64/glibc-hwcaps/power9/libc-2.28.so
>> (abort+0x164)[0x200005d53ff4]
>> [e28n07:864558] [ 3] /lib64/glibc-hwcaps/power9/libc-2.28.so
>> (+0x3d280)[0x200005d6d280]
>> [e28n07:864558] [ 4] /lib64/glibc-hwcaps/power9/libc-2.28.so
>> (__assert_fail+0x64)[0x200005d6d324]
>> [e28n07:864557] [ 5]
>> /sw/summit/spack-envs/summit-plus/opt/gcc-12.1.0/spectrum-mpi-10.4.0.6-20230210-db5xakaaqowbhp3nqwebpxrdbwtm4knu/container/../lib/pami_port/libpami.so.3
>>
>> (_ZN4PAMI8Protocol3Get7GetRdmaINS_6Device5Shmem8DmaModelINS3_11ShmemDeviceINS_4Fifo8WrapFifoINS7_10FifoPacketILj64ELj4096EEENS_7Counter15IndirectBoundedINS_6Atomic12NativeAt
>>
>> omicEEELj256EEENSB_8IndirectINSB_6NativeEEENS4_9CMAShaddrELj256ELj512EEELb0EEESL_E6simpleEP18pami_rget_simple_t+0x1d8)[0x20007f3971d8]
>> [e28n07:864557] [ 6]
>> /sw/summit/spack-envs/summit-plus/opt/gcc-12.1.0/spectrum-mpi-10.4.0.6-20230210-db5xakaaqowbhp3nqwebpxrdbwtm4knu/container/../lib/pami_port/libpami.so.3
>>
>> (_ZN4PAMI8Protocol3Get13CompositeRGetINS1_4RGetES3_E6simpleEP18pami_rget_simple_t+0x40)[0x20007f2ecc10]
>> [e28n07:864557] [ 7]
>> /sw/summit/spack-envs/summit-plus/opt/gcc-12.1.0/spectrum-mpi-10.4.0.6-20230210-db5xakaaqowbhp3nqwebpxrdbwtm4knu/container/../lib/pami_port/libpami.so.3
>> (_ZN4PAMI7Context9rget_implEP18pami_rget_simple_t+0x28c)[0x20007f31a78c]
>> [e28n07:864557] [ 8]
>> /sw/summit/spack-envs/summit-plus/opt/gcc-12.1.0/spectrum-mpi-10.4.0.6-20230210-db5xakaaqowbhp3nqwebpxrdbwtm4knu/container/../lib/pami_port/libpami.so.3
>> (PAMI_Rget+0x18)[0x20007f2d94a8]
>> [e28n07:864557] [ 9]
>> /sw/summit/spack-envs/summit-plus/opt/gcc-12.1.0/spectrum-mpi-10.4.0.6-20230210-db5xakaaqowbhp3nqwebpxrdbwtm4knu/container/../lib/spectrum_mpi/mca_pml_p
>> ami.so(process_rndv_msg+0x46c)[0x2000a80159ac]
>> [e28n07:864557] [10]
>> /sw/summit/spack-envs/summit-plus/opt/gcc-12.1.0/spectrum-mpi-10.4.0.6-20230210-db5xakaaqowbhp3nqwebpxrdbwtm4knu/container/../lib/spectrum_mpi/mca_pml_p
>> ami.so(pml_pami_recv_rndv_cb+0x2bc)[0x2000a801670c]
>> [e28n07:864557] [11] /lib64/glibc-hwcaps/power9/libc-2.28.so
>> (__assert_fail+0x64)[0x200005d6d324]
>> [e28n07:864558] [ 5]
>> /sw/summit/spack-envs/summit-plus/opt/gcc-12.1.0/spectrum-mpi-10.4.0.6-20230210-db5xakaaqowbhp3nqwebpxrdbwtm4knu/container/../lib/pami_port/libpami.so.3
>>
>> (_ZN4PAMI8Protocol3Get7GetRdmaINS_6Device5Shmem8DmaModelINS3_11ShmemDeviceINS_4Fifo8WrapFifoINS7_10FifoPacketILj64ELj4096EEENS_7Counter15IndirectBoundedINS_6Atomic12NativeAt
>>
>> omicEEELj256EEENSB_8IndirectINSB_6NativeEEENS4_9CMAShaddrELj256ELj512EEELb0EEESL_E6simpleEP18pami_rget_simple_t+0x1d8)[0x20007f3971d8]
>> [e28n07:864558] [ 6]
>> /sw/summit/spack-envs/summit-plus/opt/gcc-12.1.0/spectrum-mpi-10.4.0.6-20230210-db5xakaaqowbhp3nqwebpxrdbwtm4knu/container/../lib/pami_port/libpami.so.3
>>
>> (_ZN4PAMI8Protocol3Get13CompositeRGetINS1_4RGetES3_E6simpleEP18pami_rget_simple_t+0x40)[0x20007f2ecc10]
>> [e28n07:864558] [ 7]
>> /sw/summit/spack-envs/summit-plus/opt/gcc-12.1.0/spectrum-mpi-10.4.0.6-20230210-db5xakaaqowbhp3nqwebpxrdbwtm4knu/container/../lib/pami_port/libpami.so.3
>> (_ZN4PAMI7Context9rget_implEP18pami_rget_simple_t+0x28c)[0x20007f31a78c]
>> [e28n07:864558] [ 8]
>> /sw/summit/spack-envs/summit-plus/opt/gcc-12.1.0/spectrum-mpi-10.4.0.6-20230210-db5xakaaqowbhp3nqwebpxrdbwtm4knu/container/../lib/pami_port/libpami.so.3
>> (PAMI_Rget+0x18)[0x20007f2d94a8]
>> [e28n07:864558] [ 9]
>> /sw/summit/spack-envs/summit-plus/opt/gcc-12.1.0/spectrum-mpi-10.4.0.6-20230210-db5xakaaqowbhp3nqwebpxrdbwtm4knu/container/../lib/spectrum_mpi/mca_pml_p
>> ami.so(process_rndv_msg+0x46c)[0x2000a80159ac]
>> [e28n07:864558] [10]
>> /sw/summit/spack-envs/summit-plus/opt/gcc-12.1.0/spectrum-mpi-10.4.0.6-20230210-db5xakaaqowbhp3nqwebpxrdbwtm4knu/container/../lib/spectrum_mpi/mca_pml_p
>> ami.so(pml_pami_recv_rndv_cb+0x2bc)[0x2000a801670c]
>> [e28n07:864558] [11]
>> /sw/summit/spack-envs/summit-plus/opt/gcc-12.1.0/spectrum-mpi-10.4.0.6-20230210-db5xakaaqowbhp3nqwebpxrdbwtm4knu/container/../lib/pami_port/libpami.so.3
>>
>> (_ZN4PAMI8Protocol4Send11EagerSimpleINS_6Device5Shmem11PacketModelINS3_11ShmemDeviceINS_4Fifo8WrapFifoINS7_10FifoPacketILj64ELj4096EEENS_7Counter15IndirectBoundedINS_6Atomic
>>
>> 12NativeAtomicEEELj256EEENSB_8IndirectINSB_6NativeEEENS4_9CMAShaddrELj256ELj512EEEEELNS1_15configuration_tE5EE15dispatch_packedEPvSP_mSP_SP_+0x4c)[0x20007f2e30ac]
>> [e28n07:864557] [12]
>> /sw/summit/spack-envs/summit-plus/opt/gcc-12.1.0/spectrum-mpi-10.4.0.6-20230210-db5xakaaqowbhp3nqwebpxrdbwtm4knu/container/../lib/pami_port/libpami.so.3
>> (PAMI_Context_advancev+0x6b0)[0x20007f2da540]
>> [e28n07:864557] [13]
>> /sw/summit/spack-envs/summit-plus/opt/gcc-12.1.0/spectrum-mpi-10.4.0.6-20230210-db5xakaaqowbhp3nqwebpxrdbwtm4knu/container/../lib/spectrum_mpi/mca_pml_p
>> ami.so(mca_pml_pami_progress+0x34)[0x2000a80073e4]
>> [e28n07:864557] [14]
>> /sw/summit/spack-envs/summit-plus/opt/gcc-12.1.0/spectrum-mpi-10.4.0.6-20230210-db5xakaaqowbhp3nqwebpxrdbwtm4knu/container/../lib/libopen-pal.so.3(opal_
>> progress+0x6c)[0x20003d60640c]
>> [e28n07:864557] [15]
>> /sw/summit/spack-envs/summit-plus/opt/gcc-12.1.0/spectrum-mpi-10.4.0.6-20230210-db5xakaaqowbhp3nqwebpxrdbwtm4knu/container/../lib/libmpi_ibm.so.3(ompi_r
>> equest_default_wait_all+0x144)[0x2000034c4b04]
>> [e28n07:864557] [16]
>> /sw/summit/spack-envs/summit-plus/opt/gcc-12.1.0/spectrum-mpi-10.4.0.6-20230210-db5xakaaqowbhp3nqwebpxrdbwtm4knu/container/../lib/libmpi_ibm.so.3(PMPI_W
>> aitall+0x10c)[0x20000352790c]
>> [e28n07:864557] [17]
>> /sw/summit/spack-envs/summit-plus/opt/gcc-12.1.0/spectrum-mpi-10.4.0.6-20230210-db5xakaaqowbhp3nqwebpxrdbwtm4knu/container/../lib/pami_port/libpami.so.3
>>
>> (_ZN4PAMI8Protocol4Send11EagerSimpleINS_6Device5Shmem11PacketModelINS3_11ShmemDeviceINS_4Fifo8WrapFifoINS7_10FifoPacketILj64ELj4096EEENS_7Counter15IndirectBoundedINS_6Atomic
>>
>> 12NativeAtomicEEELj256EEENSB_8IndirectINSB_6NativeEEENS4_9CMAShaddrELj256ELj512EEEEELNS1_15configuration_tE5EE15dispatch_packedEPvSP_mSP_SP_+0x4c)[0x20007f2e30ac]
>> [e28n07:864558] [12]
>> /sw/summit/spack-envs/summit-plus/opt/gcc-12.1.0/spectrum-mpi-10.4.0.6-20230210-db5xakaaqowbhp3nqwebpxrdbwtm4knu/container/../lib/pami_port/libpami.so.3
>> (PAMI_Context_advancev+0x6b0)[0x20007f2da540]
>> [e28n07:864558] [13]
>> /sw/summit/spack-envs/summit-plus/opt/gcc-12.1.0/spectrum-mpi-10.4.0.6-20230210-db5xakaaqowbhp3nqwebpxrdbwtm4knu/container/../lib/spectrum_mpi/mca_pml_p
>> ami.so(mca_pml_pami_progress+0x34)[0x2000a80073e4]
>> [e28n07:864558] [14]
>> /sw/summit/spack-envs/summit-plus/opt/gcc-12.1.0/spectrum-mpi-10.4.0.6-20230210-db5xakaaqowbhp3nqwebpxrdbwtm4knu/container/../lib/libopen-pal.so.3(opal_
>> progress+0x6c)[0x20003d60640c]
>> [e28n07:864558] [15]
>> /sw/summit/spack-envs/summit-plus/opt/gcc-12.1.0/spectrum-mpi-10.4.0.6-20230210-db5xakaaqowbhp3nqwebpxrdbwtm4knu/container/../lib/libmpi_ibm.so.3(ompi_r
>> equest_default_wait_all+0x144)[0x2000034c4b04]
>> [e28n07:864558] [16]
>> /sw/summit/spack-envs/summit-plus/opt/gcc-12.1.0/spectrum-mpi-10.4.0.6-20230210-db5xakaaqowbhp3nqwebpxrdbwtm4knu/container/../lib/libmpi_ibm.so.3(PMPI_W
>> aitall+0x10c)[0x20000352790c]
>> [e28n07:864558] [17]
>> /gpfs/alpine2/mat267/proj-shared/dependencies/petsc-kokkos-cuda/lib/libpetsc.so.3.020(+0x3ca7b0)[0x2000004ea7b0]
>> [e28n07:864557] [18]
>> /gpfs/alpine2/mat267/proj-shared/dependencies/petsc-kokkos-cuda/lib/libpetsc.so.3.020(+0x3ca7b0)[0x2000004ea7b0]
>> [e28n07:864558] [18]
>> /gpfs/alpine2/mat267/proj-shared/dependencies/petsc-kokkos-cuda/lib/libpetsc.so.3.020(+0x3c5e68)[0x2000004e5e68]
>> [e28n07:864557] [19]
>> /gpfs/alpine2/mat267/proj-shared/dependencies/petsc-kokkos-cuda/lib/libpetsc.so.3.020(+0x3c5e68)[0x2000004e5e68]
>> [e28n07:864558] [19]
>> /gpfs/alpine2/mat267/proj-shared/dependencies/petsc-kokkos-cuda/lib/libpetsc.so.3.020(PetscSFBcastEnd+0x74)[0x2000004c9214]
>> [e28n07:864557] [20]
>> /gpfs/alpine2/mat267/proj-shared/dependencies/petsc-kokkos-cuda/lib/libpetsc.so.3.020(PetscSFBcastEnd+0x74)[0x2000004c9214]
>> [e28n07:864558] [20]
>> /gpfs/alpine2/mat267/proj-shared/dependencies/petsc-kokkos-cuda/lib/libpetsc.so.3.020(+0x3b4cb0)[0x2000004d4cb0]
>> [e28n07:864557] [21]
>> /gpfs/alpine2/mat267/proj-shared/dependencies/petsc-kokkos-cuda/lib/libpetsc.so.3.020(+0x3b4cb0)[0x2000004d4cb0]
>> [e28n07:864558] [21]
>> /gpfs/alpine2/mat267/proj-shared/dependencies/petsc-kokkos-cuda/lib/libpetsc.so.3.020(VecScatterEnd+0x178)[0x2000004dd038]
>> [e28n07:864558] [22]
>> /gpfs/alpine2/mat267/proj-shared/dependencies/petsc-kokkos-cuda/lib/libpetsc.so.3.020(VecScatterEnd+0x178)[0x2000004dd038]
>> [e28n07:864557] [22]
>> /gpfs/alpine2/mat267/proj-shared/dependencies/petsc-kokkos-cuda/lib/libpetsc.so.3.020(+0x1112be0)[0x200001232be0]
>> [e28n07:864558] [23]
>> /gpfs/alpine2/mat267/proj-shared/dependencies/petsc-kokkos-cuda/lib/libpetsc.so.3.020(+0x1112be0)[0x200001232be0]
>> [e28n07:864557] [23]
>> /gpfs/alpine2/mat267/proj-shared/dependencies/petsc-kokkos-cuda/lib/libpetsc.so.3.020(DMGlobalToLocalEnd+0x470)[0x200000e9b0f0]
>> [e28n07:864557] [24]
>> /gpfs/alpine2/mat267/proj-shared/code/xolotl-stable-cuda/xolotl/solver/libxolotlSolver.so(_ZN6xolotl6solver11PetscSolver11rhsFunctionEP5_p_TSdP6_p_VecS5
>> _+0xc4)[0x200005f710d4]
>> [e28n07:864557] [25]
>> /gpfs/alpine2/mat267/proj-shared/code/xolotl-stable-cuda/xolotl/solver/libxolotlSolver.so(_ZN6xolotl6solver11RHSFunctionEP5_p_TSdP6_p_VecS4_Pv+0x2c)[0x2
>> 00005f7130c]
>> [e28n07:864557] [26]
>> /gpfs/alpine2/mat267/proj-shared/dependencies/petsc-kokkos-cuda/lib/libpetsc.so.3.020(DMGlobalToLocalEnd+0x470)[0x200000e9b0f0]
>> [e28n07:864558] [24]
>> /gpfs/alpine2/mat267/proj-shared/code/xolotl-stable-cuda/xolotl/solver/libxolotlSolver.so(_ZN6xolotl6solver11PetscSolver11rhsFunctionEP5_p_TSdP6_p_VecS5
>> _+0xc4)[0x200005f710d4]
>> [e28n07:864558] [25]
>> /gpfs/alpine2/mat267/proj-shared/code/xolotl-stable-cuda/xolotl/solver/libxolotlSolver.so(_ZN6xolotl6solver11RHSFunctionEP5_p_TSdP6_p_VecS4_Pv+0x2c)[0x2
>> 00005f7130c]
>> [e28n07:864558] [26]
>> /gpfs/alpine2/mat267/proj-shared/dependencies/petsc-kokkos-cuda/lib/libpetsc.so.3.020(TSComputeRHSFunction+0x1bc)[0x2000017621dc]
>> [e28n07:864557] [27]
>> /gpfs/alpine2/mat267/proj-shared/dependencies/petsc-kokkos-cuda/lib/libpetsc.so.3.020(TSComputeRHSFunction+0x1bc)[0x2000017621dc]
>> [e28n07:864558] [27]
>> /gpfs/alpine2/mat267/proj-shared/dependencies/petsc-kokkos-cuda/lib/libpetsc.so.3.020(TSComputeIFunction+0x418)[0x200001763ad8]
>> [e28n07:864557] [28]
>> /gpfs/alpine2/mat267/proj-shared/dependencies/petsc-kokkos-cuda/lib/libpetsc.so.3.020(TSComputeIFunction+0x418)[0x200001763ad8]
>> [e28n07:864558] [28]
>> /gpfs/alpine2/mat267/proj-shared/dependencies/petsc-kokkos-cuda/lib/libpetsc.so.3.020(+0x16f2ef0)[0x200001812ef0]
>> [e28n07:864557] [29]
>> /gpfs/alpine2/mat267/proj-shared/dependencies/petsc-kokkos-cuda/lib/libpetsc.so.3.020(+0x16f2ef0)[0x200001812ef0]
>> [e28n07:864558] [29]
>> /gpfs/alpine2/mat267/proj-shared/dependencies/petsc-kokkos-cuda/lib/libpetsc.so.3.020(TSStep+0x228)[0x200001768088]
>> [e28n07:864557] *** End of error message ***
>>
>> /gpfs/alpine2/mat267/proj-shared/dependencies/petsc-kokkos-cuda/lib/libpetsc.so.3.020(TSStep+0x228)[0x200001768088]
>> [e28n07:864558] *** End of error message ***
>>
>> It seems to be pointing to
>> https://petsc.org/release/manualpages/PetscSF/PetscSFBcastEnd/
>> so I wanted to check if you had seen this type of error before and if it
>> could be related to how the code is compiled or run. Let me know if I can
>> provide any additional information.
>>
>> Best,
>>
>> Sophie
>>
>>