[petsc-users] Fwd: petsc on Cori Haswell

Mark Adams mfadams at lbl.gov
Wed Apr 15 15:33:47 CDT 2020


We are hitting a problem when scaling from 32K to 64K cores on Cori Haswell.
Does anyone have any thoughts?
Thanks,
Mark

---------- Forwarded message ---------
From: David Trebotich <dptrebotich at lbl.gov>
Date: Wed, Apr 15, 2020 at 4:20 PM
Subject: Re: petsc on Cori Haswell
To: Mark Adams <mfadams at lbl.gov>


Hey Mark-
I am running into some issues that I am convinced come from the PETSc build.
I am able to build and run on up to 32K cores. At 64K I start getting errors
like those below (it looks like two issues: PMI bootstrap failures and an
MPI_Init failure). I have been working with Brian Freisen to see if it's a
NERSC problem. At this point, when I build without PETSc and run the native
GMG in Chombo, I have no problems. The problems only appear when building
with PETSc, and only at larger concurrencies. The only thing that has changed
is that this is a new PETSc installation. Perhaps something changed from the
PETSc version you built previously?  Thanks for the help.
Treb

Mon Apr 13 17:49:45 2020: [PE_101955]:_pmi_mmap_tmp: Warning bootstrap
barrier failed: num_syncd=33, pes_this_node=64, timeout=180 secs
Mon Apr 13 17:49:45 2020: [PE_101958]:_pmi_mmap_tmp: Warning bootstrap
barrier failed: num_syncd=33, pes_this_node=64, timeout=180 secs
Mon Apr 13 17:49:45 2020: [PE_101958]:_pmi_init:_pmi_mmap_init returned -1
Mon Apr 13 17:49:45 2020: [PE_101979]:_pmi_mmap_tmp: Warning bootstrap
barrier failed: num_syncd=33, pes_this_node=64, timeout=180 secs
Mon Apr 13 17:49:45 2020: [PE_101979]:_pmi_init:_pmi_mmap_init returned -1
Mon Apr 13 17:49:45 2020: [PE_82712]:_pmi_mmap_tmp: Warning bootstrap
barrier failed: num_syncd=28, pes_this_node=64, timeout=180 secs
Mon Apr 13 17:49:45 2020: [PE_17868]:_pmi_mmap_tmp: Warning bootstrap
barrier failed: num_syncd=32, pes_this_node=64, timeout=180 secs
Mon Apr 13 17:49:45 2020: [PE_97918]:_pmi_mmap_tmp: Warning bootstrap
barrier failed: num_syncd=33, pes_this_node=64, timeout=180 secs
Mon Apr 13 17:49:45 2020: [PE_17869]:_pmi_mmap_tmp: Warning bootstrap
barrier failed: num_syncd=32, pes_this_node=64, timeout=180 secs
Mon Apr 13 17:49:45 2020: [PE_17869]:_pmi_init:_pmi_mmap_init returned -1
Mon Apr 13 17:49:45 2020: [PE_110562]:_pmi_mmap_tmp: Warning bootstrap
barrier failed: num_syncd=27, pes_this_node=64, timeout=180 secs
Mon Apr 13 17:49:45 2020: [PE_110562]:_pmi_init:_pmi_mmap_init returned -1
Mon Apr 13 17:49:45 2020: [PE_110563]:_pmi_mmap_tmp: Warning bootstrap
barrier failed: num_syncd=27, pes_this_node=64, timeout=180 secs
Mon Apr 13 17:49:45 2020: [PE_27899]:_pmi_mmap_tmp: Warning bootstrap
barrier failed: num_syncd=38, pes_this_node=64, timeout=180 secs
[Mon Apr 13 17:49:45 2020] [c7-4c1s6n0] Fatal error in MPI_Init: Other MPI
error, error stack:
MPIR_Init_thread(537):
MPID_Init(246).......: channel initialization failed
MPID_Init(647).......:  PMI2 init failed: 1
Attempting to use an MPI routine before initializing MPICH
[Mon Apr 13 17:49:45 2020] [c7-4c1s6n0] Fatal error in MPI_Init: Other MPI
error, error stack:
MPIR_Init_thread(537):
MPID_Init(246).......: channel initialization failed
MPID_Init(647).......:  PMI2 init failed: 1
Attempting to use an MPI routine before initializing MPICH
Mon Apr 13 17:49:45 2020: [PE_71961]:_pmi_mmap_tmp: Warning bootstrap
barrier failed: num_syncd=35, pes_this_node=64, timeout=180 secs
Mon Apr 13 17:49:45 2020: [PE_71961]:_pmi_init:_pmi_mmap_init returned -1
Mon Apr 13 17:49:45 2020: [PE_71962]:_pmi_mmap_tmp: Warning bootstrap
barrier failed: num_syncd=35, pes_this_node=64, timeout=180 secs
Mon Apr 13 17:49:45 2020: [PE_64329]:_pmi_mmap_tmp: Warning bootstrap
barrier failed: num_syncd=32, pes_this_node=64, timeout=180 secs
Mon Apr 13 17:49:45 2020: [PE_64335]:_pmi_mmap_tmp: Warning bootstrap
barrier failed: num_syncd=32, pes_this_node=64, timeout=180 secs
Mon Apr 13 17:49:45 2020: [PE_64335]:_pmi_init:_pmi_mmap_init returned -1
[Mon Apr 13 17:49:45 2020] [c6-1c2s5n2] Fatal error in MPI_Init: Other MPI
error, error stack:
MPIR_Init_thread(537):
MPID_Init(246).......: channel initialization failed
MPID_Init(647).......:  PMI2 init failed: 1
Attempting to use an MPI routine before initializing MPICH
[Mon Apr 13 17:49:45 2020] [c9-4c2s13n2] Fatal error in MPI_Init: Other MPI
error, error stack:
MPIR_Init_thread(537):
MPID_Init(246).......: channel initialization failed
MPID_Init(647).......:  PMI2 init failed: 1
Attempting to use an MPI routine before initializing MPICH
Mon Apr 13 17:49:45 2020: [PE_71960]:_pmi_mmap_tmp: Warning bootstrap
barrier failed: num_syncd=35, pes_this_node=64, timeout=180 secs
Mon Apr 13 17:49:45 2020: [PE_71960]:_pmi_init:_pmi_mmap_init returned -1
[Mon Apr 13 17:49:45 2020] [c6-3c2s9n1] Fatal error in MPI_Init: Other MPI
error, error stack:
MPIR_Init_thread(537):
MPID_Init(246).......: channel initialization failed
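
One way to get an overview of a log like this is to tally how many ranks each
failing PE saw at the bootstrap barrier. The Python sketch below is only an
illustration: it assumes the warnings have exactly the format shown above
(possibly wrapped across two lines), and the function name
summarize_pmi_warnings is hypothetical, not part of PETSc, Chombo, or any
Cray tool.

```python
import re
from collections import Counter

# Matches one PMI bootstrap-barrier warning once wrapped lines are rejoined.
# The pattern is an assumption based on the log excerpt above.
WARNING_RE = re.compile(
    r"\[PE_(\d+)\]:_pmi_mmap_tmp: Warning bootstrap\s+barrier failed: "
    r"num_syncd=(\d+), pes_this_node=(\d+), timeout=(\d+) secs"
)


def summarize_pmi_warnings(log_text):
    """Return (per-PE records, histogram of num_syncd) for a job log."""
    # Long lines were wrapped, so collapse all whitespace to single spaces.
    joined = " ".join(log_text.split())
    records = [
        {
            "pe": int(pe),
            "num_syncd": int(syncd),
            "pes_this_node": int(pes),
            "timeout": int(timeout),
        }
        for pe, syncd, pes, timeout in WARNING_RE.findall(joined)
    ]
    histogram = Counter(r["num_syncd"] for r in records)
    return records, histogram


# Two warnings copied from the log above, including the line wrapping.
sample = """Mon Apr 13 17:49:45 2020: [PE_101955]:_pmi_mmap_tmp: Warning bootstrap
barrier failed: num_syncd=33, pes_this_node=64, timeout=180 secs
Mon Apr 13 17:49:45 2020: [PE_82712]:_pmi_mmap_tmp: Warning bootstrap
barrier failed: num_syncd=28, pes_this_node=64, timeout=180 secs
"""

records, histogram = summarize_pmi_warnings(sample)
for r in records:
    print(r["pe"], r["num_syncd"], r["pes_this_node"])
```

Run against the full job output, the histogram would show whether the affected
nodes consistently report num_syncd well below pes_this_node, as they do in
the excerpt.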