[Darshan-users] darshan execl issue
Adrian Jackson
a.jackson at epcc.ed.ac.uk
Tue Jul 12 10:58:29 CDT 2022
Hi Shane,
Thanks for the reply. This is spawning an additional process that runs
for a bit and then ends; the original MPI processes are all still there,
there is just an additional one for a while. The spawned process
doesn't do any MPI, it just does some system interaction. If Darshan
intercepts, or is triggered by, a process ending, I think that would
explain it.
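For concreteness, the pattern described here (an extra process spawned
from one rank while the original rank carries on) can be sketched in C
roughly as below. This is a minimal sketch, not the actual application
code: run_helper is a hypothetical name, /bin/sh stands in for whatever
program is really executed, and the MPI calls are left out so the
snippet builds without an MPI installation. Whether this arrangement
avoids the hang under Darshan is not confirmed here.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `sh -c command` in a child process and
 * return the child's exit status, or -1 on failure. */
int run_helper(const char *command)
{
    assert(command != NULL);

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }

    if (pid == 0) {
        /* Child: replace this process image with the helper program.
         * /bin/sh is an assumption standing in for the real program. */
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        /* Only reached if execl failed. _exit skips atexit handlers
         * inherited from the parent. */
        _exit(127);
    }

    /* Parent: in the real application this is the MPI rank, which keeps
     * running and can still reach MPI_Finalize later. */
    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            perror("waitpid");
            return -1;
        }
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The value returned is the helper's exit status, so run_helper("exit 3")
returns 3. Calling execl directly in a rank, without the fork, would
instead replace that rank's process image entirely.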
It's something we can work around anyway, we can just not use Darshan
for this executable. I was just checking whether it's expected behaviour
with Darshan or whether we'd stumbled across "an accidental feature" :)
cheers
adrianj
On 12/07/2022 16:53, Snyder, Shane wrote:
> Hi Adrian,
>
> So you are replacing one of the MPI processes in MPI_COMM_WORLD with a
> new process? In that case, it is probable that this new replacing
> process is not calling MPI_Finalize which ultimately causes Darshan to
> hang -- Darshan is intercepting the shutdown call and performing some
> collective operations for MPI applications, and if one of the ranks
> disappears these calls will likely just hang. If that's the issue, you
> could probably reproduce without using Darshan by having your MPI
> processes run a collective on MPI_COMM_WORLD (like a barrier) _after_
> the execl call.
>
> A couple of different ideas:
>
> * If possible, it might be worth trying to fork ahead of the execl
> call so that you still have all MPI processes hanging around at
> shutdown time?
> * You may be able to run Darshan in non-MPI mode at runtime (using
> 'export DARSHAN_ENABLE_NONMPI=1') to work around this problem. This
> would prevent Darshan from running collectives at shutdown time, but
> will result in a different log file for each process in your
> application.
>
> Thanks,
> --Shane
> ------------------------------------------------------------------------
> *From:* Darshan-users <darshan-users-bounces at lists.mcs.anl.gov> on
> behalf of Adrian Jackson <a.jackson at epcc.ed.ac.uk>
> *Sent:* Tuesday, July 12, 2022 8:13 AM
> *To:* darshan-users at lists.mcs.anl.gov <darshan-users at lists.mcs.anl.gov>
> *Subject:* [Darshan-users] darshan execl issue
> Hi,
>
> I've encountered an issue using Darshan (3.3.1) with a code which calls
> execl from one MPI process. With Darshan enabled, the MPI run just hangs.
> Is spawning processes from a subset of MPI processes an issue for
> Darshan? I would say that I can still spawn processes (i.e. using fork)
> and it seems to work, but using execl doesn't.
>
> cheers
>
> adrianj
> --
> Tel: +44 131 6506470 skype: remoteadrianj
--
Tel: +44 131 6506470 skype: remoteadrianj