<div dir="ltr">We have a problem when going from 32K to 64K cores on Cori Haswell.<div>Does anyone have any thoughts?</div><div>Thanks,</div><div>Mark<br><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">---------- Forwarded message ---------<br>From: <strong class="gmail_sendername" dir="auto">David Trebotich</strong> <span dir="auto">&lt;<a href="mailto:dptrebotich@lbl.gov">dptrebotich@lbl.gov</a>&gt;</span><br>Date: Wed, Apr 15, 2020 at 4:20 PM<br>Subject: Re: petsc on Cori Haswell<br>To: Mark Adams &lt;<a href="mailto:mfadams@lbl.gov">mfadams@lbl.gov</a>&gt;<br></div><br><br><div dir="ltr">Hey Mark-<div>I am running into some issues that I am convinced are from the PETSc build. I am able to build and run on up to 32K cores. At 64K I start getting errors like those below (it looks like two issues: PMI bootstrap failures and MPI_Init failures). I have been working with Brian Freisen to see if it's a NERSC problem. At this point, if I build without PETSc and run native gmg in Chombo, I have no problems. The problems appear only when building with PETSc, and only at larger concurrencies. The only thing that has changed is that this is a new PETSc installation. Perhaps something changed from the PETSc version you built from previously? 
Thanks for the help.</div><div>Treb</div><div><br></div><div>Mon Apr 13 17:49:45 2020: [PE_101955]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=33, pes_this_node=64, timeout=180 secs<br>Mon Apr 13 17:49:45 2020: [PE_101958]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=33, pes_this_node=64, timeout=180 secs<br>Mon Apr 13 17:49:45 2020: [PE_101958]:_pmi_init:_pmi_mmap_init returned -1<br>Mon Apr 13 17:49:45 2020: [PE_101979]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=33, pes_this_node=64, timeout=180 secs<br>Mon Apr 13 17:49:45 2020: [PE_101979]:_pmi_init:_pmi_mmap_init returned -1<br>Mon Apr 13 17:49:45 2020: [PE_82712]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=28, pes_this_node=64, timeout=180 secs<br>Mon Apr 13 17:49:45 2020: [PE_17868]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=32, pes_this_node=64, timeout=180 secs<br>Mon Apr 13 17:49:45 2020: [PE_97918]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=33, pes_this_node=64, timeout=180 secs<br>Mon Apr 13 17:49:45 2020: [PE_17869]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=32, pes_this_node=64, timeout=180 secs<br>Mon Apr 13 17:49:45 2020: [PE_17869]:_pmi_init:_pmi_mmap_init returned -1<br>Mon Apr 13 17:49:45 2020: [PE_110562]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=27, pes_this_node=64, timeout=180 secs<br>Mon Apr 13 17:49:45 2020: [PE_110562]:_pmi_init:_pmi_mmap_init returned -1<br>Mon Apr 13 17:49:45 2020: [PE_110563]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=27, pes_this_node=64, timeout=180 secs<br>Mon Apr 13 17:49:45 2020: [PE_27899]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=38, pes_this_node=64, timeout=180 secs<br>[Mon Apr 13 17:49:45 2020] [c7-4c1s6n0] Fatal error in MPI_Init: Other MPI error, error stack:<br>MPIR_Init_thread(537): <br>MPID_Init(246).......: channel initialization failed<br>MPID_Init(647).......: PMI2 init failed: 1 <br>Attempting to 
use an MPI routine before initializing MPICH<br>[Mon Apr 13 17:49:45 2020] [c7-4c1s6n0] Fatal error in MPI_Init: Other MPI error, error stack:<br>MPIR_Init_thread(537): <br>MPID_Init(246).......: channel initialization failed<br>MPID_Init(647).......: PMI2 init failed: 1 <br>Attempting to use an MPI routine before initializing MPICH<br>Mon Apr 13 17:49:45 2020: [PE_71961]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=35, pes_this_node=64, timeout=180 secs<br>Mon Apr 13 17:49:45 2020: [PE_71961]:_pmi_init:_pmi_mmap_init returned -1<br>Mon Apr 13 17:49:45 2020: [PE_71962]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=35, pes_this_node=64, timeout=180 secs<br>Mon Apr 13 17:49:45 2020: [PE_64329]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=32, pes_this_node=64, timeout=180 secs<br>Mon Apr 13 17:49:45 2020: [PE_64335]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=32, pes_this_node=64, timeout=180 secs<br>Mon Apr 13 17:49:45 2020: [PE_64335]:_pmi_init:_pmi_mmap_init returned -1<br>[Mon Apr 13 17:49:45 2020] [c6-1c2s5n2] Fatal error in MPI_Init: Other MPI error, error stack:<br>MPIR_Init_thread(537): <br>MPID_Init(246).......: channel initialization failed<br>MPID_Init(647).......: PMI2 init failed: 1 <br>Attempting to use an MPI routine before initializing MPICH<br>[Mon Apr 13 17:49:45 2020] [c9-4c2s13n2] Fatal error in MPI_Init: Other MPI error, error stack:<br>MPIR_Init_thread(537): <br>MPID_Init(246).......: channel initialization failed<br>MPID_Init(647).......: PMI2 init failed: 1 <br>Attempting to use an MPI routine before initializing MPICH<br>Mon Apr 13 17:49:45 2020: [PE_71960]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=35, pes_this_node=64, timeout=180 secs<br>Mon Apr 13 17:49:45 2020: [PE_71960]:_pmi_init:_pmi_mmap_init returned -1<br>[Mon Apr 13 17:49:45 2020] [c6-3c2s9n1] Fatal error in MPI_Init: Other MPI error, error stack:<br>MPIR_Init_thread(537): <br>MPID_Init(246).......: channel 
initialization failed<br></div></div><div dir="ltr"><br></div>
</div></div></div>