[petsc-users] PetSc and CrayPat: MPI assertion errors
Vijay S Kumar
vijayskumar at gmail.com
Tue Jul 6 08:30:14 CDT 2021
Hello all,
By way of background, we have a PETSc-based solver that we run on our
in-house Cray system. We are carrying out performance analysis using
profilers in the CrayPat suite, which provide more fine-grained
performance information than the PETSc -log_view summary.
When the code is instrumented with CrayPat perftools, the MPI
initialization (MPI_Init) invoked internally by PetscInitialize is not
picked up by the profiler. That is, simply specifying the following:
ierr = PetscInitialize(&argc,&argv,(char*)0,NULL);
if (ierr) return ierr;
results in the following runtime error:
CrayPat/X: Version 7.1.1 Revision 7c0ddd79b 08/19/19 16:58:46
Attempting to use an MPI routine before initializing MPICH
To circumvent this, we had to explicitly call MPI_Init prior to
PetscInitialize:
MPI_Init(&argc,&argv);
ierr = PetscInitialize(&argc,&argv,(char*)0,NULL);
if (ierr) return ierr;
However, a side effect of this workaround seems to be several
downstream runtime (assertion) errors in VecAssemblyBegin/End and
MatAssemblyBegin/End calls:
CrayPat/X: Version 7.1.1 Revision 7c0ddd79b 08/19/19 16:58:46
main.x: ../rtsum.c:5662: __pat_trsup_trace_waitsome_rtsum: Assertion `recv_count != MPI_UNDEFINED' failed.
main at main.c:769
VecAssemblyEnd at 0x2aaab951b3ba
VecAssemblyEnd_MPI_BTS at 0x2aaab950b179
MPI_Waitsome at 0x43a238
__pat_trsup_trace_waitsome_rtsum at 0x5f1a17
__GI___assert_fail at 0x2aaabc61e7d1
__assert_fail_base at 0x2aaabc61e759
__GI_abort at 0x2aaabc627740
__GI_raise at 0x2aaabc626160
Interestingly, we do not see such errors when there is no explicit
MPI_Init, and no instrumentation for performance.
We are looking for help understanding why the PETSc Vec/Mat
AssemblyEnd calls lead to such MPI-level assertion errors in cases
where MPI_Init is explicitly called.
(Or alternatively, is there a way to call PetscInitialize in a manner that
ensures that the MPI initialization is picked up by the profilers in
question?)
We would greatly appreciate any help or pointers.
Thanks!
Vijay