[Darshan-users] darshan execl issue

Snyder, Shane ssnyder at mcs.anl.gov
Tue Jul 12 10:53:40 CDT 2022


Hi Adrian,

So you are replacing one of the MPI processes in MPI_COMM_WORLD with a new process? In that case, it is likely that the new replacement process never calls MPI_Finalize, which ultimately causes Darshan to hang -- Darshan intercepts the shutdown call and performs some collective operations for MPI applications, and if one of the ranks disappears these calls will likely just hang. If that's the issue, you could probably reproduce it without Darshan by having your MPI processes run a collective on MPI_COMM_WORLD (like a barrier) _after_ the execl call.

A couple of different ideas:

  *   If possible, it might be worth trying to fork ahead of the execl call so that you still have all MPI processes hanging around at shutdown time?
  *   You may be able to run Darshan in non-MPI mode at runtime (using 'export DARSHAN_ENABLE_NONMPI=1') to work around this problem. This would prevent Darshan from running collectives at shutdown time, but will result in a separate log file for each process in your application.
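
To illustrate the first idea, here is a rough sketch of forking ahead of the execl call, so that the original rank survives and can still reach MPI_Finalize. The helper name run_in_child is made up for this example (it is not part of Darshan or MPI), and the MPI calls are left out to keep it self-contained. Whether fork is safe inside an MPI process depends on your MPI implementation and interconnect, so treat this as a starting point rather than a tested fix.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not a Darshan or MPI function): run the program at
 * `path` in a forked child so that the calling process, e.g. an MPI rank,
 * is not replaced by execl. Returns the child's exit status, 127 if the
 * exec itself failed in the child, or -1 on fork/wait errors. */
int run_in_child(const char *path, const char *arg0)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }

    if (pid == 0) {
        /* Child: only this copy has its process image replaced. */
        execl(path, arg0, (char *)NULL);
        /* Reached only if execl failed; exit without running the
         * parent's atexit handlers. */
        _exit(127);
    }

    /* Parent: wait for the child, then carry on as usual. The rank is
     * still alive, so it can take part in MPI_Finalize later. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In the application, the rank that currently calls execl directly would call something like run_in_child(path, arg0) instead, and then continue to MPI_Finalize along with the other ranks.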

Thanks,
--Shane
________________________________
From: Darshan-users <darshan-users-bounces at lists.mcs.anl.gov> on behalf of Adrian Jackson <a.jackson at epcc.ed.ac.uk>
Sent: Tuesday, July 12, 2022 8:13 AM
To: darshan-users at lists.mcs.anl.gov <darshan-users at lists.mcs.anl.gov>
Subject: [Darshan-users] darshan execl issue

Hi,

I've encountered an issue using Darshan (3.3.1) with a code which calls
execl from one MPI process. With Darshan enabled, the MPI run just hangs.
Is spawning processes from a subset of MPI processes an issue for
Darshan? I would say that I can still spawn processes (i.e. using fork)
and it seems to work, but using execl doesn't.

cheers

adrianj
--
Tel: +44 131 6506470 skype: remoteadrianj
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. Is e buidheann carthannais a th’ ann an Oilthigh Dhùn Èideann, clàraichte an Alba, àireamh clàraidh SC005336.