[Darshan-users] darshan execl issue

Adrian Jackson a.jackson at epcc.ed.ac.uk
Tue Jul 12 13:08:48 CDT 2022


Ah, good question. No, I don't think I am; I hadn't picked up that that 
was what execl did.

For full disclosure, I'm not actually calling execl; this is a Fortran 
program that calls system(), and the man pages on our Cray say: "The 
system() library function uses fork(2) to create a child process that 
executes the shell command specified in command using execl(3) as 
follows:

            execl("/bin/sh", "sh", "-c", command, (char *) 0);
"

Hence the execl call.
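
As a rough illustration of the man page text above, here is a minimal C 
sketch of the fork-then-execl pattern. It assumes a POSIX system; 
run_shell is a hypothetical name of mine, and the real system() also 
adjusts signal handling while it waits, which is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper mimicking the documented core of system():
 * fork a child, have the child exec "/bin/sh -c command", and wait.
 * Returns the shell's exit status, or -1 on failure or abnormal exit. */
int run_shell(const char *command)
{
    assert(command != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;               /* fork failed */

    if (pid == 0) {
        /* Child: execl replaces this process image with the shell. */
        execl("/bin/sh", "sh", "-c", command, (char *) 0);
        _exit(127);              /* reached only if execl itself failed */
    }

    /* Parent: still the original process; wait for the child to end. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_shell("exit 3") from a small main() should hand back 3. The 
point for this thread is that the execl happens in a forked child, so the 
calling process (an MPI rank in our case) carries on afterwards and an 
extra short-lived process appears alongside it.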

I think I'll just tell the code owners not to use system() for now.

cheers

adrianj



On 12/07/2022 18:57, Snyder, Shane wrote:
> Just to clarify, are you doing anything explicit to spawn a new process 
> (i.e., fork) ahead of the call to execl? My understanding is that execl 
> replaces the calling process, so generally speaking it shouldn't result 
> in 2 processes (MPI one + one for system tasks)?
> 
> --Shane
> ------------------------------------------------------------------------
> *From:* Adrian Jackson <a.jackson at epcc.ed.ac.uk>
> *Sent:* Tuesday, July 12, 2022 10:58 AM
> *To:* Snyder, Shane <ssnyder at mcs.anl.gov>; 
> darshan-users at lists.mcs.anl.gov <darshan-users at lists.mcs.anl.gov>
> *Subject:* Re: [Darshan-users] darshan execl issue
> Hi Shane,
> 
> Thanks for the reply. This is spawning an additional process that runs
> for a bit and then ends; the original MPI processes are all still there,
> there is just an additional one for a while. The spawned process
> doesn't do any MPI, it's just doing some system interaction stuff. If
> Darshan intercepts or is triggered by a process ending, I think that
> would explain it.
> 
> It's something we can work around anyway: we can just not use Darshan
> for this executable. I was just checking whether it's expected behaviour
> with Darshan or whether we'd stumbled across "an accidental feature" :)
> 
> cheers
> 
> adrianj
> 
> On 12/07/2022 16:53, Snyder, Shane wrote:
>> Hi Adrian,
>> 
>> So you are replacing one of the MPI processes in MPI_COMM_WORLD with a 
>> new process? In that case, it is probable that this new replacing 
>> process is not calling MPI_Finalize, which ultimately causes Darshan to 
>> hang -- Darshan is intercepting the shutdown call and performing some 
>> collective operations for MPI applications, and if one of the ranks 
>> disappears these calls will likely just hang. If that's the issue, you 
>> could probably reproduce without using Darshan by having your MPI 
>> processes run a collective on MPI_COMM_WORLD (like a barrier) _after_ 
>> the execl call.
>> 
>> A couple of different ideas:
>> 
>>   * If possible, it might be worth trying to fork ahead of the execl
>>     call so that you still have all MPI processes hanging around at
>>     shutdown time?
>>   * You may be able to run Darshan in non-MPI mode at runtime (using
>>     'export DARSHAN_ENABLE_NONMPI=1') to work around this problem. This
>>     would prevent Darshan from running collectives at shutdown time, but
>>     will result in a different log file for each process in your
>>     application.
>> 
>> Thanks,
>> --Shane
>> ------------------------------------------------------------------------
>> *From:* Darshan-users <darshan-users-bounces at lists.mcs.anl.gov> on 
>> behalf of Adrian Jackson <a.jackson at epcc.ed.ac.uk>
>> *Sent:* Tuesday, July 12, 2022 8:13 AM
>> *To:* darshan-users at lists.mcs.anl.gov <darshan-users at lists.mcs.anl.gov>
>> *Subject:* [Darshan-users] darshan execl issue
>> Hi,
>> 
>> I've encountered an issue using Darshan (3.3.1) with a code which calls
>> execl from one MPI process. When run with Darshan, the MPI run just hangs.
>> Is spawning processes from a subset of MPI processes an issue for
>> Darshan? I would say that I can still spawn processes (i.e. using fork)
>> and it seems to work, but using execl doesn't.
>> 
>> cheers
>> 
>> adrianj
>> --
>> Tel: +44 131 6506470 skype: remoteadrianj
>> The University of Edinburgh is a charitable body, registered in 
>> Scotland, with registration number SC005336. Is e buidheann carthannais 
>> a th’ ann an Oilthigh Dhùn Èideann, clàraichte an Alba, àireamh clàraidh 
>> SC005336.
>> _______________________________________________
>> Darshan-users mailing list
>> Darshan-users at lists.mcs.anl.gov
>> https://lists.mcs.anl.gov/mailman/listinfo/darshan-users
> 
> -- 
> Tel: +44 131 6506470 skype: remoteadrianj

-- 
Tel: +44 131 6506470 skype: remoteadrianj

