[petsc-users] PetSc and CrayPat: MPI assertion errors

Barry Smith bsmith at petsc.dev
Tue Jul 6 15:21:40 CDT 2021

> On Jul 6, 2021, at 8:30 AM, Vijay S Kumar <vijayskumar at gmail.com> wrote:
> Hello all,
> By way of background, we have a PETSc-based solver that we run on our in-house Cray system. We are carrying out performance analysis using profilers in the CrayPat suite that provide more fine-grained performance-related information than the PETSc log_view summary.
> When instrumented using CrayPat perftools, it turns out that the MPI initialization (MPI_Init) internally invoked by PetscInitialize is not picked up by the profiler. That is, simply specifying the following:
>               ierr = PetscInitialize(&argc,&argv,(char*)0,NULL);if (ierr) return ierr;
> results in the following runtime error: 
>                CrayPat/X:  Version 7.1.1 Revision 7c0ddd79b  08/19/19 16:58:46
> Attempting to use an MPI routine before initializing MPICH

   This is certainly unexpected behavior. PETSc is "just" an MPI application; it does not do anything special for CrayPat, and we do not expect that one would need to call MPI_Init() outside of PETSc to use a performance tool. Perhaps PETSc is not being configured/compiled with the correct flags for the CrayPat performance tools, or its shared library is not being built appropriately. If CrayPat uses the PMPI_xxx wrapper model for MPI profiling, these kinds of difficulties can arise when the correct profile wrapper functions are not inserted during the build process.
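
   To make the PMPI point concrete, here is a minimal mock in plain C. It does not use a real MPI library or CrayPat, and it is not a statement about how CrayPat is actually implemented; every mock_ name is hypothetical and defined in the snippet itself. It only illustrates the general wrapper model: the tool supplies its own MPI_Init that records the call and forwards to PMPI_Init, so a library whose initialization is bound directly to the underlying implementation initializes MPI without the tool ever noticing.

```c
#include <assert.h>
#include <stddef.h>

/* State of the mock "MPI library" and of the mock "profiling tool".
   Nothing here is real MPI or CrayPat; all names are hypothetical. */
static int mock_mpi_initialized = 0;
static int mock_tool_saw_init = 0;

void mock_reset(void) {
    mock_mpi_initialized = 0;
    mock_tool_saw_init = 0;
}

/* Stand-in for PMPI_Init: the name-shifted entry point that does the work. */
int mock_PMPI_Init(int *argc, char ***argv) {
    (void)argc;
    (void)argv;
    mock_mpi_initialized = 1;
    return 0;
}

/* Stand-in for the tool's MPI_Init wrapper: record the call, then forward. */
int mock_MPI_Init(int *argc, char ***argv) {
    mock_tool_saw_init = 1;
    return mock_PMPI_Init(argc, argv);
}

typedef int (*mock_init_fn)(int *, char ***);

/* Stand-in for a library that initializes MPI internally, through
   whichever symbol it happened to be bound to at build/link time. */
int mock_library_initialize(mock_init_fn init) {
    return init(NULL, NULL);
}

/* The mock tool can only trace MPI calls if it observed initialization. */
int mock_tool_can_trace(void) {
    return mock_mpi_initialized && mock_tool_saw_init;
}

/* Exercise both link scenarios. */
void mock_run_scenarios(void) {
    /* Library bound to the tool's wrapper: the tool sees the init. */
    mock_reset();
    mock_library_initialize(mock_MPI_Init);
    assert(mock_tool_can_trace());

    /* Library bound past the wrapper: MPI is up, but the tool missed it. */
    mock_reset();
    mock_library_initialize(mock_PMPI_Init);
    assert(mock_mpi_initialized && !mock_tool_can_trace());
}
```

   Calling mock_run_scenarios() from a main() exercises both cases. The second case is the kind of mismatch that could produce a "before initializing" complaint from a tool even though MPI itself has been initialized.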

  I would try running a standard PETSc program in a debugger with breakpoints on MPI_Init() (and possibly others) to investigate exactly what is happening and, perhaps, why.

   You can send the configure.log and make.log that were generated to petsc-maint at mcs.anl.gov.


> To circumvent this, we had to explicitly call MPI_Init prior to PetscInitialize:
>             MPI_Init(&argc,&argv);
>             ierr = PetscInitialize(&argc,&argv,(char*)0,NULL);if (ierr) return ierr;
> However, a side effect of this workaround seems to be several downstream runtime (assertion) errors with VecAssemblyBegin/End and MatAssemblyBegin/End statements:
> CrayPat/X:  Version 7.1.1 Revision 7c0ddd79b  08/19/19 16:58:46
> main.x: ../rtsum.c:5662: __pat_trsup_trace_waitsome_rtsum: Assertion `recv_count != MPI_UNDEFINED' failed.
>  main at main.c:769
>   VecAssemblyEnd at 0x2aaab951b3ba
>   VecAssemblyEnd_MPI_BTS at 0x2aaab950b179
>   MPI_Waitsome at 0x43a238
>   __pat_trsup_trace_waitsome_rtsum at 0x5f1a17
>   __GI___assert_fail at 0x2aaabc61e7d1
>   __assert_fail_base at 0x2aaabc61e759
>   __GI_abort at 0x2aaabc627740
>   __GI_raise at 0x2aaabc626160
> Interestingly, we do not see such errors when there is no explicit MPI_Init and no instrumentation for performance.
> We are looking for someone to help shed more light on why the PETSc Mat/Vec AssemblyEnd statements lead to such MPI-level assertion errors in cases where MPI_Init is explicitly called.
> (Or alternatively, is there a way to call PetscInitialize in a manner that ensures that the MPI initialization is picked up by the profilers in question?)
> We would highly appreciate any help/pointers,
> Thanks!
>  Vijay
