[Darshan-users] darshan execl issue

Adrian Jackson a.jackson at epcc.ed.ac.uk
Tue Jul 12 10:58:29 CDT 2022


Hi Shane,

Thanks for the reply. This is spawning an additional process that runs 
for a bit and then ends; the original MPI processes are all still there, 
there's just an extra one for a while. The spawned process 
doesn't do any MPI, it's just doing some system interaction. If 
Darshan intercepts, or is triggered by, a process ending, I think that 
would explain it.

It's something we can work around anyway; we can just not use Darshan 
for this executable. I was just checking whether it's expected behaviour with 
Darshan or whether we'd stumbled across "an accidental feature" :)

cheers

adrianj

On 12/07/2022 16:53, Snyder, Shane wrote:
> Hi Adrian,
> 
> So you are replacing one of the MPI processes in MPI_COMM_WORLD with a 
> new process? In that case, it is likely that this replacement 
> process is not calling MPI_Finalize, which ultimately causes Darshan to 
> hang -- Darshan intercepts the shutdown call and performs some 
> collective operations for MPI applications, and if one of the ranks 
> has disappeared those calls will likely just hang. If that's the issue, you 
> could probably reproduce it without Darshan by having your MPI 
> processes run a collective on MPI_COMM_WORLD (like a barrier) _after_ 
> the execl call.
> 
> A couple of different ideas:
> 
>   * If possible, it might be worth trying to fork ahead of the execl
>     call so that you still have all MPI processes hanging around at
>     shutdown time?
>   * You may be able to run Darshan in non-MPI mode at runtime (using
>     'export DARSHAN_ENABLE_NONMPI=1') to work around this problem. This
>     would prevent Darshan from running collectives at shutdown time, but
>     will result in a different log file for each process in your
>     application.
> 
> Thanks,
> --Shane
> ------------------------------------------------------------------------
> *From:* Darshan-users <darshan-users-bounces at lists.mcs.anl.gov> on 
> behalf of Adrian Jackson <a.jackson at epcc.ed.ac.uk>
> *Sent:* Tuesday, July 12, 2022 8:13 AM
> *To:* darshan-users at lists.mcs.anl.gov <darshan-users at lists.mcs.anl.gov>
> *Subject:* [Darshan-users] darshan execl issue
> Hi,
> 
> I've encountered an issue using Darshan (3.3.1) with a code that calls
> execl from one MPI process. When run with Darshan, the MPI job just hangs.
> Is spawning processes from a subset of MPI processes an issue for
> Darshan? Spawning processes with fork seems to work, but using
> execl doesn't.
> 
> cheers
> 
> adrianj
> --
> Tel: +44 131 6506470 skype: remoteadrianj
> The University of Edinburgh is a charitable body, registered in 
> Scotland, with registration number SC005336.
> _______________________________________________
> Darshan-users mailing list
> Darshan-users at lists.mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/darshan-users 
> <https://lists.mcs.anl.gov/mailman/listinfo/darshan-users>
