<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Hi Adrian,</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
So you are replacing one of the MPI processes in MPI_COMM_WORLD with a new process? In that case, it is likely that the replacement process never calls MPI_Finalize, which ultimately causes Darshan to hang. Darshan intercepts the shutdown call
and performs some collective operations for MPI applications, and if one of the ranks disappears these calls will likely just hang. If that's the issue, you could probably reproduce it without Darshan by having your MPI processes run a collective on MPI_COMM_WORLD
(like a barrier) _after_ the execl call. <br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
A couple of different ideas:</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<ul>
<li>If possible, it might be worth trying to fork ahead of the execl call, so that all of the original MPI processes are still around at shutdown time.</li><li>You may be able to run Darshan in non-MPI mode at runtime (using 'export DARSHAN_ENABLE_NONMPI=1') to work around this problem. This would prevent Darshan from running collectives at shutdown time, but will result in a separate log file for each process
in your application. <br>
</li></ul>
</div>
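<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
To make the fork idea concrete, here is a rough sketch in plain POSIX C. It assumes the program you exec can be started through /bin/sh, and run_in_child is a made-up helper name, not something provided by MPI or Darshan. I have not tried this against your code, so treat it as a starting point only.</div>

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of MPI or Darshan): run `sh -c command`
 * in a forked child so that the calling process is not replaced by execl.
 * Returns the child's exit status, or -1 on failure.
 */
int run_in_child(const char *command)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: only this copy is replaced by the new program. */
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        /* Reached only if execl failed; _exit skips atexit handlers. */
        _exit(127);
    }
    /* Parent: wait for the child, then carry on as a normal MPI rank. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
The rank that currently calls execl directly would call run_in_child instead and then continue on to MPI_Finalize like every other rank, so the collectives at shutdown still see all of the original processes.</div>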
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Thanks,</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
--Shane<br>
</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Darshan-users &lt;darshan-users-bounces@lists.mcs.anl.gov&gt; on behalf of Adrian Jackson &lt;a.jackson@epcc.ed.ac.uk&gt;<br>
<b>Sent:</b> Tuesday, July 12, 2022 8:13 AM<br>
<b>To:</b> darshan-users@lists.mcs.anl.gov &lt;darshan-users@lists.mcs.anl.gov&gt;<br>
<b>Subject:</b> [Darshan-users] darshan execl issue</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
<div class="PlainText">Hi,<br>
<br>
I've encountered an issue using Darshan (3.3.1) with a code which calls<br>
execl from one MPI process. With Darshan enabled, the MPI run just hangs.<br>
Is spawning processes from a subset of MPI processes an issue for<br>
Darshan? I would say that I can still spawn processes (i.e. using fork)<br>
and it seems to work, but using execl doesn't.<br>
<br>
cheers<br>
<br>
adrianj<br>
--<br>
Tel: +44 131 6506470 skype: remoteadrianj<br>
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. Is e buidheann carthannais a th’ ann an Oilthigh Dhùn Èideann, clàraichte an Alba, àireamh clàraidh SC005336.<br>
_______________________________________________<br>
Darshan-users mailing list<br>
Darshan-users@lists.mcs.anl.gov<br>
<a href="https://lists.mcs.anl.gov/mailman/listinfo/darshan-users">https://lists.mcs.anl.gov/mailman/listinfo/darshan-users</a><br>
</div>
</span></font></div>
</body>
</html>