[petsc-users] petsc on Cori Haswell
Junchao Zhang
junchao.zhang at gmail.com
Wed Apr 15 16:21:37 CDT 2020
I want to know who called MPI_Init(): PETSc or Chombo?
--Junchao Zhang
On Wed, Apr 15, 2020 at 4:13 PM Matthew Knepley <knepley at gmail.com> wrote:
> On Wed, Apr 15, 2020 at 5:10 PM Junchao Zhang <junchao.zhang at gmail.com>
> wrote:
>
>> Was there a PETSc error stack?
>>
>
> 1) SNES ex5 is a highly scalable problem. Just give it large enough m and
> n.
>
> 2) Junchao, it looks like MPI_Init() is failing, which I believe comes
> before we install our signal handler to get us the stack.
>
> Thanks,
>
> Matt
>
>
>> --Junchao Zhang
>>
>>
>> On Wed, Apr 15, 2020 at 3:41 PM Mark Adams <mfadams at lbl.gov> wrote:
>>
>>> Whoops, this is actually Cori-KNL.
>>>
>>> On Wed, Apr 15, 2020 at 4:33 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>>> We have a problem when going from 32K to 64K cores on Cori-Haswell.
>>>> Does anyone have any thoughts?
>>>> Thanks,
>>>> Mark
>>>>
>>>> ---------- Forwarded message ---------
>>>> From: David Trebotich <dptrebotich at lbl.gov>
>>>> Date: Wed, Apr 15, 2020 at 4:20 PM
>>>> Subject: Re: petsc on Cori Haswell
>>>> To: Mark Adams <mfadams at lbl.gov>
>>>>
>>>>
>>>> Hey Mark-
>>>> I am running into some issues that I am convinced are from the PETSc
>>>> build. I am able to build and run on up to 32K cores. At 64K I start
>>>> getting stuff like below (looks like two issues: pmi stuff and MPI_Init). I
>>>> have been working with Brian Freisen to see if it's a NERSC problem. At
>>>> this point I build without PETSc and then run native gmg in Chombo and have
>>>> no problems. The problems only come with building with PETSc, and at larger
>>>> concurrencies. The only thing that has changed is that this is a new PETSc
>>>> installation. Perhaps something changed in the PETSc version you built from
>>>> previously? Thanks for the help.
>>>> Treb
>>>>
>>>> Mon Apr 13 17:49:45 2020: [PE_101955]:_pmi_mmap_tmp: Warning bootstrap
>>>> barrier failed: num_syncd=33, pes_this_node=64, timeout=180 secs
>>>> Mon Apr 13 17:49:45 2020: [PE_101958]:_pmi_mmap_tmp: Warning bootstrap
>>>> barrier failed: num_syncd=33, pes_this_node=64, timeout=180 secs
>>>> Mon Apr 13 17:49:45 2020: [PE_101958]:_pmi_init:_pmi_mmap_init returned
>>>> -1
>>>> Mon Apr 13 17:49:45 2020: [PE_101979]:_pmi_mmap_tmp: Warning bootstrap
>>>> barrier failed: num_syncd=33, pes_this_node=64, timeout=180 secs
>>>> Mon Apr 13 17:49:45 2020: [PE_101979]:_pmi_init:_pmi_mmap_init returned
>>>> -1
>>>> Mon Apr 13 17:49:45 2020: [PE_82712]:_pmi_mmap_tmp: Warning bootstrap
>>>> barrier failed: num_syncd=28, pes_this_node=64, timeout=180 secs
>>>> Mon Apr 13 17:49:45 2020: [PE_17868]:_pmi_mmap_tmp: Warning bootstrap
>>>> barrier failed: num_syncd=32, pes_this_node=64, timeout=180 secs
>>>> Mon Apr 13 17:49:45 2020: [PE_97918]:_pmi_mmap_tmp: Warning bootstrap
>>>> barrier failed: num_syncd=33, pes_this_node=64, timeout=180 secs
>>>> Mon Apr 13 17:49:45 2020: [PE_17869]:_pmi_mmap_tmp: Warning bootstrap
>>>> barrier failed: num_syncd=32, pes_this_node=64, timeout=180 secs
>>>> Mon Apr 13 17:49:45 2020: [PE_17869]:_pmi_init:_pmi_mmap_init returned
>>>> -1
>>>> Mon Apr 13 17:49:45 2020: [PE_110562]:_pmi_mmap_tmp: Warning bootstrap
>>>> barrier failed: num_syncd=27, pes_this_node=64, timeout=180 secs
>>>> Mon Apr 13 17:49:45 2020: [PE_110562]:_pmi_init:_pmi_mmap_init returned
>>>> -1
>>>> Mon Apr 13 17:49:45 2020: [PE_110563]:_pmi_mmap_tmp: Warning bootstrap
>>>> barrier failed: num_syncd=27, pes_this_node=64, timeout=180 secs
>>>> Mon Apr 13 17:49:45 2020: [PE_27899]:_pmi_mmap_tmp: Warning bootstrap
>>>> barrier failed: num_syncd=38, pes_this_node=64, timeout=180 secs
>>>> [Mon Apr 13 17:49:45 2020] [c7-4c1s6n0] Fatal error in MPI_Init: Other
>>>> MPI error, error stack:
>>>> MPIR_Init_thread(537):
>>>> MPID_Init(246).......: channel initialization failed
>>>> MPID_Init(647).......: PMI2 init failed: 1
>>>> Attempting to use an MPI routine before initializing MPICH
>>>> [Mon Apr 13 17:49:45 2020] [c7-4c1s6n0] Fatal error in MPI_Init: Other
>>>> MPI error, error stack:
>>>> MPIR_Init_thread(537):
>>>> MPID_Init(246).......: channel initialization failed
>>>> MPID_Init(647).......: PMI2 init failed: 1
>>>> Attempting to use an MPI routine before initializing MPICH
>>>> Mon Apr 13 17:49:45 2020: [PE_71961]:_pmi_mmap_tmp: Warning bootstrap
>>>> barrier failed: num_syncd=35, pes_this_node=64, timeout=180 secs
>>>> Mon Apr 13 17:49:45 2020: [PE_71961]:_pmi_init:_pmi_mmap_init returned
>>>> -1
>>>> Mon Apr 13 17:49:45 2020: [PE_71962]:_pmi_mmap_tmp: Warning bootstrap
>>>> barrier failed: num_syncd=35, pes_this_node=64, timeout=180 secs
>>>> Mon Apr 13 17:49:45 2020: [PE_64329]:_pmi_mmap_tmp: Warning bootstrap
>>>> barrier failed: num_syncd=32, pes_this_node=64, timeout=180 secs
>>>> Mon Apr 13 17:49:45 2020: [PE_64335]:_pmi_mmap_tmp: Warning bootstrap
>>>> barrier failed: num_syncd=32, pes_this_node=64, timeout=180 secs
>>>> Mon Apr 13 17:49:45 2020: [PE_64335]:_pmi_init:_pmi_mmap_init returned
>>>> -1
>>>> [Mon Apr 13 17:49:45 2020] [c6-1c2s5n2] Fatal error in MPI_Init: Other
>>>> MPI error, error stack:
>>>> MPIR_Init_thread(537):
>>>> MPID_Init(246).......: channel initialization failed
>>>> MPID_Init(647).......: PMI2 init failed: 1
>>>> Attempting to use an MPI routine before initializing MPICH
>>>> [Mon Apr 13 17:49:45 2020] [c9-4c2s13n2] Fatal error in MPI_Init: Other
>>>> MPI error, error stack:
>>>> MPIR_Init_thread(537):
>>>> MPID_Init(246).......: channel initialization failed
>>>> MPID_Init(647).......: PMI2 init failed: 1
>>>> Attempting to use an MPI routine before initializing MPICH
>>>> Mon Apr 13 17:49:45 2020: [PE_71960]:_pmi_mmap_tmp: Warning bootstrap
>>>> barrier failed: num_syncd=35, pes_this_node=64, timeout=180 secs
>>>> Mon Apr 13 17:49:45 2020: [PE_71960]:_pmi_init:_pmi_mmap_init returned
>>>> -1
>>>> [Mon Apr 13 17:49:45 2020] [c6-3c2s9n1] Fatal error in MPI_Init: Other
>>>> MPI error, error stack:
>>>> MPIR_Init_thread(537):
>>>> MPID_Init(246).......: channel initialization failed
>>>>
>>>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>
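The PMI warnings in the forwarded log can be tallied with a short script to see how many ranks on each node failed to reach the bootstrap barrier. The following is only a sketch: it assumes the messages keep exactly the format shown in the quoted log, that wrapped lines can be rejoined with a single space, and the helper names (unwrap, summarize) are made up for illustration. It is not a NERSC or PETSc tool.

```python
import re
from collections import defaultdict

# Pattern for the PMI warning as it appears in the quoted log, after the
# lines wrapped by the mail client have been rejoined (assumed format).
BARRIER_RE = re.compile(
    r"\[PE_(?P<pe>\d+)\]:_pmi_mmap_tmp: Warning bootstrap barrier failed: "
    r"num_syncd=(?P<synced>\d+), pes_this_node=(?P<total>\d+), "
    r"timeout=(?P<timeout>\d+) secs"
)


def unwrap(text):
    """Strip mail quote markers and rejoin wrapped lines with single spaces."""
    stripped = (re.sub(r"^[>\s]+", "", line) for line in text.splitlines())
    return " ".join(s for s in stripped if s)


def summarize(text):
    """Group the reporting PEs by (num_syncd, pes_this_node)."""
    groups = defaultdict(set)
    for m in BARRIER_RE.finditer(unwrap(text)):
        key = (int(m["synced"]), int(m["total"]))
        groups[key].add(int(m["pe"]))
    return dict(groups)


if __name__ == "__main__":
    # Three warnings copied from the forwarded message, quote markers included.
    sample = """\
>>>> Mon Apr 13 17:49:45 2020: [PE_101955]:_pmi_mmap_tmp: Warning bootstrap
>>>> barrier failed: num_syncd=33, pes_this_node=64, timeout=180 secs
>>>> Mon Apr 13 17:49:45 2020: [PE_101958]:_pmi_mmap_tmp: Warning bootstrap
>>>> barrier failed: num_syncd=33, pes_this_node=64, timeout=180 secs
>>>> Mon Apr 13 17:49:45 2020: [PE_82712]:_pmi_mmap_tmp: Warning bootstrap
>>>> barrier failed: num_syncd=28, pes_this_node=64, timeout=180 secs
"""
    for (synced, total), pes in sorted(summarize(sample).items()):
        missing = total - synced
        print(f"{synced}/{total} synced ({missing} missing): PEs {sorted(pes)}")
```

Grouping by (num_syncd, pes_this_node) puts PEs that report the same counts together, which may help tell whether a few nodes or many are affected. In the log above, every reporting node had only 27 to 38 of its 64 ranks synchronized when the 180 second timeout expired.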