[petsc-users] PetSc and CrayPat: MPI assertion errors

Junchao Zhang junchao.zhang at gmail.com
Tue Jul 6 09:13:47 CDT 2021

On Tue, Jul 6, 2021 at 8:31 AM Vijay S Kumar <vijayskumar at gmail.com> wrote:

> Hello all,
> By way of background, we have a PETSc-based solver that we run on our
> in-house Cray system. We are carrying out performance analysis using
> profilers in the CrayPat suite, which provide more fine-grained
> performance information than the PETSc -log_view summary.
> When instrumented using CrayPat perftools, it turns out that the MPI
> initialization (MPI_Init) internally invoked by PetscInitialize is not
> picked up by the profiler. That is, simply specifying the following:
>               ierr = PetscInitialize(&argc,&argv,(char*)0,NULL); if (ierr) return ierr;
> results in the following runtime error:
>                CrayPat/X:  Version 7.1.1 Revision 7c0ddd79b  08/19/19 16:58:46
> Attempting to use an MPI routine before initializing MPICH
Do you happen to know which MPI routine is being called before initialization?

> To circumvent this, we had to explicitly call MPI_Init prior to
> PetscInitialize:
>             MPI_Init(&argc,&argv);
>             ierr = PetscInitialize(&argc,&argv,(char*)0,NULL); if (ierr) return ierr;
> However, a side effect of this workaround seems to be several
> downstream runtime (assertion) errors in VecAssemblyBegin/End and
> MatAssemblyBegin/End calls:
> CrayPat/X:  Version 7.1.1 Revision 7c0ddd79b  08/19/19 16:58:46
> main.x: ../rtsum.c:5662: __pat_trsup_trace_waitsome_rtsum: Assertion `recv_count != MPI_UNDEFINED' failed.
>  main at main.c:769
>   VecAssemblyEnd at 0x2aaab951b3ba
>   VecAssemblyEnd_MPI_BTS at 0x2aaab950b179
>   MPI_Waitsome at 0x43a238
>   __pat_trsup_trace_waitsome_rtsum at 0x5f1a17
>   __GI___assert_fail at 0x2aaabc61e7d1
>   __assert_fail_base at 0x2aaabc61e759
>   __GI_abort at 0x2aaabc627740
>   __GI_raise at 0x2aaabc626160
> Interestingly, we do not see such errors when there is no explicit
> MPI_Init and no performance instrumentation.
> We are looking for someone to help shed more light on why PETSc Mat/Vec
> AssemblyEnd calls lead to such MPI-level assertion errors in cases
> where MPI_Init is explicitly called.
> (Or alternatively, is there a way to call PetscInitialize in a manner that
> ensures that the MPI initialization is picked up by the profilers in
> question?)
> We would highly appreciate any help/pointers,
> Thanks!
>  Vijay