[Darshan-users] darshan execl issue

Snyder, Shane ssnyder at mcs.anl.gov
Tue Jul 12 12:57:51 CDT 2022


Just to clarify, are you doing anything explicit to spawn a new process (e.g., fork) ahead of the call to execl? My understanding is that execl replaces the calling process, so generally speaking it shouldn't result in two processes (the MPI one plus one for the system tasks)?

--Shane
________________________________
From: Adrian Jackson <a.jackson at epcc.ed.ac.uk>
Sent: Tuesday, July 12, 2022 10:58 AM
To: Snyder, Shane <ssnyder at mcs.anl.gov>; darshan-users at lists.mcs.anl.gov <darshan-users at lists.mcs.anl.gov>
Subject: Re: [Darshan-users] darshan execl issue

Hi Shane,

Thanks for the reply. This spawns an additional process that runs for a
bit and then ends; the original MPI processes are all still there, there
is just an extra one for a while. The spawned process doesn't do any
MPI, it's just doing some system interaction. If Darshan intercepts, or
is triggered by, a process ending, I think that would explain it.

It's something we can work around anyway; we can simply not use Darshan
for this executable. I was just checking whether it's expected behaviour
with Darshan or whether we'd stumbled across "an accidental feature" :)

cheers

adrianj

On 12/07/2022 16:53, Snyder, Shane wrote:
> Hi Adrian,
>
> So you are replacing one of the MPI processes in MPI_COMM_WORLD with a
> new process? In that case, it is probably the case that this new
> replacement process never calls MPI_Finalize, which ultimately causes
> Darshan to hang -- Darshan intercepts the shutdown call and performs
> some collective operations for MPI applications, and if one of the
> ranks disappears those calls will likely just hang. If that's the
> issue, you could probably reproduce it without Darshan by having your
> MPI processes run a collective on MPI_COMM_WORLD (like a barrier)
> _after_ the execl call.
>
> A couple of different ideas:
>
>   * If possible, it might be worth trying to fork ahead of the execl
>     call so that you still have all of the MPI processes around at
>     shutdown time (see the sketch below)?
>   * You may be able to run Darshan in non-MPI mode at runtime (using
>     'export DARSHAN_ENABLE_NONMPI=1') to work around this problem. This
>     would prevent Darshan from running collectives at shutdown time,
>     but it will result in a separate log file for each process in your
>     application.
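>
> For the first option, here's a minimal sketch of the kind of thing I
> mean (assuming plain POSIX fork/execl/waitpid; the command is again
> just a placeholder):
>
>     #include <mpi.h>
>     #include <stdio.h>
>     #include <sys/wait.h>
>     #include <unistd.h>
>
>     int main(int argc, char **argv)
>     {
>         int rank;
>         MPI_Init(&argc, &argv);
>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>         if (rank == 0) {
>             pid_t pid = fork();
>             if (pid == 0) {
>                 /* child: replace this (non-MPI) helper process */
>                 execl("/bin/true", "true", (char *)NULL);
>                 perror("execl");
>                 _exit(1);
>             }
>             /* parent: the original MPI rank survives and waits */
>             waitpid(pid, NULL, 0);
>         }
>
>         /* every rank still reaches MPI_Finalize, so Darshan's shutdown
>          * collectives can complete */
>         MPI_Finalize();
>         return 0;
>     }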
>
> Thanks,
> --Shane
> ------------------------------------------------------------------------
> *From:* Darshan-users <darshan-users-bounces at lists.mcs.anl.gov> on
> behalf of Adrian Jackson <a.jackson at epcc.ed.ac.uk>
> *Sent:* Tuesday, July 12, 2022 8:13 AM
> *To:* darshan-users at lists.mcs.anl.gov <darshan-users at lists.mcs.anl.gov>
> *Subject:* [Darshan-users] darshan execl issue
> Hi,
>
> I've encountered an issue using Darshan (3.3.1) with a code that calls
> execl from one MPI process. With Darshan enabled, the MPI run just
> hangs. Is spawning processes from a subset of MPI processes an issue
> for Darshan? I should say that I can still spawn processes (e.g. using
> fork) and that seems to work, but using execl doesn't.
>
> cheers
>
> adrianj
> --
> Tel: +44 131 6506470 skype: remoteadrianj
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336. Is e buidheann carthannais
> a th’ ann an Oilthigh Dhùn Èideann, clàraichte an Alba, àireamh clàraidh
> SC005336.
> _______________________________________________
> Darshan-users mailing list
> Darshan-users at lists.mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/darshan-users

--
Tel: +44 131 6506470 skype: remoteadrianj