[petsc-users] PetSc and CrayPat: MPI assertion errors

Matthew Knepley knepley at gmail.com
Tue Jul 6 09:26:20 CDT 2021


On Tue, Jul 6, 2021 at 9:31 AM Vijay S Kumar <vijayskumar at gmail.com> wrote:

> Hello all,
>
> By way of background, we have a PETSc-based solver that we run on our
> in-house Cray system. We are carrying out performance analysis using
> profilers in the CrayPat suite that provide more fine-grained
> performance-related information than the PETSc log_view summary.
>
> When the code is instrumented with CrayPat perftools, the MPI
> initialization (MPI_Init) invoked internally by PetscInitialize is not
> picked up by the profiler. That is, simply specifying the following:
>     ierr = PetscInitialize(&argc,&argv,(char*)0,NULL);
>     if (ierr) return ierr;
> results in the following runtime error:
>
>     CrayPat/X:  Version 7.1.1 Revision 7c0ddd79b  08/19/19 16:58:46
>
> Attempting to use an MPI routine before initializing MPICH
>
> To circumvent this, we had to explicitly call MPI_Init prior to
> PetscInitialize:
>     MPI_Init(&argc,&argv);
>     ierr = PetscInitialize(&argc,&argv,(char*)0,NULL);
>     if (ierr) return ierr;
>
> However, a side effect of the above workaround seems to be several
> downstream runtime (assertion) errors in VecAssemblyBegin/End and
> MatAssemblyBegin/End calls:
>
> CrayPat/X:  Version 7.1.1 Revision 7c0ddd79b  08/19/19 16:58:46
> main.x: ../rtsum.c:5662: __pat_trsup_trace_waitsome_rtsum: Assertion
> `recv_count != MPI_UNDEFINED' failed.
>
>  main at main.c:769
>   VecAssemblyEnd at 0x2aaab951b3ba
>   VecAssemblyEnd_MPI_BTS at 0x2aaab950b179
>   MPI_Waitsome at 0x43a238
>   __pat_trsup_trace_waitsome_rtsum at 0x5f1a17
>   __GI___assert_fail at 0x2aaabc61e7d1
>   __assert_fail_base at 0x2aaabc61e759
>   __GI_abort at 0x2aaabc627740
>   __GI_raise at 0x2aaabc626160
>
>
> Interestingly, we do not see such errors when there is no explicit
> MPI_Init and no performance instrumentation.
> We are looking for someone to help shed more light on why the PETSc
> Mat/Vec AssemblyEnd calls lead to such MPI-level assertion errors when
> MPI_Init is called explicitly.
> (Or alternatively, is there a way to call PetscInitialize in a manner that
> ensures that the MPI initialization is picked up by the profilers in
> question?)
>
> We would highly appreciate any help/pointers,
>

There is no problem calling MPI_Init() before PetscInitialize(), although
then you also have to call MPI_Finalize() explicitly at the end.
Both errors appear to arise from the Cray instrumentation, which is
evidently buggy. Did you try calling MPI_Init() yourself without
instrumentation? Also, what info are you getting from CrayPat that we do
not log?
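
For reference, a minimal sketch of that pattern (user-managed MPI lifetime
around PETSc; the solver body here is just a placeholder):

    #include <petsc.h>

    int main(int argc, char **argv)
    {
      PetscErrorCode ierr;

      /* Initialize MPI ourselves so the profiler's MPI wrappers see MPI_Init */
      MPI_Init(&argc, &argv);

      /* PETSc detects MPI is already initialized and will not init/finalize it itself */
      ierr = PetscInitialize(&argc, &argv, (char *)0, NULL);if (ierr) return ierr;

      /* ... set up and run the solver here ... */

      ierr = PetscFinalize();if (ierr) return ierr;

      /* Since we called MPI_Init, we are responsible for MPI_Finalize */
      MPI_Finalize();
      return 0;
    }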

  Thanks,

     Matt


> Thanks!
>  Vijay
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
