[petsc-users] PetSc and CrayPat: MPI assertion errors
Matthew Knepley
knepley at gmail.com
Tue Jul 6 09:26:20 CDT 2021
On Tue, Jul 6, 2021 at 9:31 AM Vijay S Kumar <vijayskumar at gmail.com> wrote:
> Hello all,
>
> By way of background, we have a PETSc-based solver that we run on our
> in-house Cray system. We are carrying out performance analysis using
> profilers in the CrayPat suite, which provide more fine-grained
> performance information than the PETSc -log_view summary.
>
> When the code is instrumented using CrayPat perftools, the MPI
> initialization (MPI_Init) invoked internally by PetscInitialize is not
> picked up by the profiler. That is, calling only
>
>   ierr = PetscInitialize(&argc,&argv,(char*)0,NULL);
>   if (ierr) return ierr;
> results in the following runtime error:
>
> CrayPat/X: Version 7.1.1 Revision 7c0ddd79b 08/19/19
> 16:58:46
>
> Attempting to use an MPI routine before initializing MPICH
>
> To circumvent this, we had to explicitly call MPI_Init prior to
> PetscInitialize:
>
>   MPI_Init(&argc,&argv);
>   ierr = PetscInitialize(&argc,&argv,(char*)0,NULL);
>   if (ierr) return ierr;
>
> However, this workaround appears to cause several downstream runtime
> (assertion) errors in the VecAssemblyBegin/End and MatAssemblyBegin/End
> statements:
>
> CrayPat/X: Version 7.1.1 Revision 7c0ddd79b 08/19/19 16:58:46
> main.x: ../rtsum.c:5662: __pat_trsup_trace_waitsome_rtsum: Assertion
> `recv_count != MPI_UNDEFINED' failed.
>
> main at main.c:769
> VecAssemblyEnd at 0x2aaab951b3ba
> VecAssemblyEnd_MPI_BTS at 0x2aaab950b179
> MPI_Waitsome at 0x43a238
> __pat_trsup_trace_waitsome_rtsum at 0x5f1a17
> __GI___assert_fail at 0x2aaabc61e7d1
> __assert_fail_base at 0x2aaabc61e759
> __GI_abort at 0x2aaabc627740
> __GI_raise at 0x2aaabc626160
>
>
> Interestingly, we do not see such errors when there is no explicit
> MPI_Init and no instrumentation for performance.
> Could someone shed more light on why the PETSc Mat/Vec AssemblyEnd
> statements lead to such MPI-level assertion errors when MPI_Init is
> called explicitly?
> (Or alternatively, is there a way to call PetscInitialize so that the
> MPI initialization is picked up by the profilers in question?)
>
> We would highly appreciate any help/pointers,
>
There is no problem calling MPI_Init() before PetscInitialize(), although
then you also have to call MPI_Finalize() explicitly at the end.
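A minimal sketch of that ordering, for illustration only (the solver body in
the middle is just a placeholder, not code from the original program):

  #include <mpi.h>
  #include <petscsys.h>

  int main(int argc, char **argv)
  {
    PetscErrorCode ierr;

    MPI_Init(&argc, &argv);            /* user initializes MPI first            */
    ierr = PetscInitialize(&argc, &argv, (char *)0, NULL);
    if (ierr) return ierr;             /* PETSc sees MPI is already initialized */

    /* ... create and assemble Vec/Mat objects, solve, etc. ... */

    ierr = PetscFinalize();            /* PETSc will not finalize MPI here,     */
    MPI_Finalize();                    /* so the user must call MPI_Finalize()  */
    return ierr;
  }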
Both errors appear to arise from the Cray instrumentation, which is
evidently buggy. Did you try calling MPI_Init() yourself without the
instrumentation? Also, what information are you getting from CrayPat that
we do not already log?
Thanks,
Matt
> Thanks!
> Vijay
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
https://www.cse.buffalo.edu/~knepley/