[Darshan-users] LD_PRELOAD system() call

Phil Carns carns at mcs.anl.gov
Fri May 1 11:06:57 CDT 2015


Hi Cristian,

I was testing on Edison, an XC30 system at NERSC.  I compiled with 
cray-mpich 7.1.1, and I think it is using Torque as the batch system.  
FYI, to run this example program I have to launch the executable using 
aprun (otherwise MPI won't initialize properly). I think this will be 
reproducible with non-MPI programs as well, though.

thanks,
-Phil
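
For anyone who wants to experiment outside of MPI: the thread below suspects that the child spawned by system() inherits LD_PRELOAD and then hangs. The following is only a sketch of an application-side stand-in for system() that drops LD_PRELOAD in the child before the shell starts. The name run_without_preload is hypothetical (it is not part of Darshan or MPI), and nothing here has been tested on a Cray. If the hang comes from fork() itself rather than from the preloaded child, this will not help.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `cmd` through /bin/sh much like system()
 * does, but remove LD_PRELOAD from the child's environment first.
 * Returns the child's exit status, or -1 on error or abnormal exit. */
int run_without_preload(const char *cmd)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: drop LD_PRELOAD so the shell is not instrumented.
         * Assumes a single-threaded caller, because unsetenv() is
         * not async-signal-safe. */
        unsetenv("LD_PRELOAD");
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127); /* only reached if execl() failed */
    }

    /* Parent: this corresponds to the waitpid that was reported
     * to hang in the thread below. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In the test program quoted below, the call on rank 0 would become run_without_preload("echo calling system") in place of system("echo calling system").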

On 05/01/2015 04:53 AM, Cristian Simarro wrote:
> Hi Phil,
>
> Could you please tell me which batch system you are using on your Cray machine? Is the MPI implementation cray-mpich?
>
> Thanks,
> Cristian
>
> ----- Original Message -----
> From: "Phil Carns" <carns at mcs.anl.gov>
> To: "Cristian Simarro" <cristian.simarro at ecmwf.int>
> Cc: darshan-users at lists.mcs.anl.gov
> Sent: Thursday, 30 April, 2015 8:45:11 PM
> Subject: Re: [Darshan-users] LD_PRELOAD system() call
>
> Thanks for the test program, Cristian. I can confirm that it hangs with
> LD_PRELOAD on a Cray, but not on a Linux workstation.  I'm not exactly
> sure what the underlying difference is in this case, but it is
> definitely 100% reproducible in the Cray environment.
>
> Kalyana Chadalavada has actually observed something very similar when
> using fork() directly; I imagine that it is the underlying fork() within
> the system() call that is causing the problem.
>
> thanks,
> -Phil
>
> On 04/30/2015 03:07 AM, Cristian Simarro wrote:
>> Hi Phil,
>>
>> Actually, any command run under a system() call triggers the problem. The spawned process does not finish, and the task that issued the call then hangs in waitpid.
>>
>> This example hangs when we use the LD_PRELOAD mechanism:
>>
>> #include <stdio.h>
>> #include <mpi.h>
>> #include <stdlib.h>
>>
>> int main (int argc, char *argv[])
>> {
>>     int rank, size;
>>     int ret;
>>
>>     MPI_Init (&argc, &argv);
>>     MPI_Comm_rank (MPI_COMM_WORLD, &rank);
>>     MPI_Comm_size (MPI_COMM_WORLD, &size);
>>     if (rank == 0) {
>>         ret = system("echo calling system");
>>     }
>>     printf( "Hello world from process %d of %d\n", rank, size );
>>     MPI_Finalize();
>>     return 0;
>> }
>>
>> Thanks,
>> Cristian
>>
>> ------------------------------------------------------------------
>> Cristian Simarro
>> Analyst, User Support Section
>> European Centre for Medium-Range Weather Forecasts (ECMWF)
>> Shinfield Park, Reading, RG2 9AX, United Kingdom
>> Tel:    (+44 118) 9499315                Fax:    (+44 118) 9869450
>> E-mail: Cristian.Simarro at ecmwf.int            http://www.ecmwf.int
>> ------------------------------------------------------------------
>>
>> ----- Original Message -----
>> From: "Phil Carns" <carns at mcs.anl.gov>
>> To: darshan-users at lists.mcs.anl.gov
>> Sent: Wednesday, 29 April, 2015 10:13:54 PM
>> Subject: Re: [Darshan-users] LD_PRELOAD system() call
>>
>> On 04/29/2015 02:54 PM, Phil Carns wrote:
>>> On 04/29/2015 12:17 PM, Cristian Simarro wrote:
>>>> Hello,
>>>>
>>>> We have been facing some problems with system() calls inside some
>>>> C/Fortran codes on our Cray machine.
>>>>
>>>> Our method is to compile with dynamic linking and then use LD_PRELOAD.
>>>> When the code calls system(command), execution hangs if Darshan is
>>>> preloaded, because Darshan tries to instrument an internal read()
>>>> inside system() without having been initialized.
>>>>
>>>> The solution we have designed is to unset LD_PRELOAD (if set before)
>>>> in the darshan_mpi_initialize function.
>>>>
>>>> Has anybody found a similar problem with LD_PRELOAD + system() calls?
>>> Hi Cristian,
>>>
>>> I don't think I've seen this exact combination before, but it seems
>>> like something we should be able to reproduce and isolate.
>>>
>>> If I understand correctly, it sounds like the underlying process
>>> spawned by system() is inheriting the LD_PRELOAD environment variable
>>> from the parent program, and it is the underlying process that is
>>> getting hung?  If so, does it matter what you run in the system() call
>>> or does it seem like pretty much anything triggers it?
>>>
>>> thanks,
>>> -Phil
>> The solution you have suggested (unsetting LD_PRELOAD programmatically
>> during Darshan initialization) might not be a bad long-term solution,
>> maybe with some extra safety logic to make sure we don't accidentally
>> unset unrelated LD_PRELOAD entries.  I imagine that once the application
>> has gotten to darshan initialization, then the loader has already
>> processed the LD_PRELOAD environment variable and we don't need to keep
>> it set any longer.  That would help keep it from interfering with child
>> processes.
>>
>> We would definitely need to do some testing to confirm, though.
>>
>> thanks,
>> -Phil
>> _______________________________________________
>> Darshan-users mailing list
>> Darshan-users at lists.mcs.anl.gov
>> https://lists.mcs.anl.gov/mailman/listinfo/darshan-users
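
The "extra safety logic" mentioned above could look something like the sketch below: instead of unsetting LD_PRELOAD outright, remove only the entries that match a given substring and keep the rest. This is an illustration, not Darshan's actual code. The function names strip_preload_entries and scrub_ld_preload are hypothetical, and matching on a substring such as "libdarshan" is an assumption about how the library would be identified. Entries are split on colons and spaces, which are the separators the dynamic loader accepts.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy every entry of `preload` into `out`,
 * joined by ':', except entries that contain `needle`.
 * Returns 0 on success, -1 if `out` is too small. */
int strip_preload_entries(const char *preload, const char *needle,
                          char *out, size_t outlen)
{
    size_t used = 0;
    size_t nlen = strlen(needle);
    const char *p = preload;

    if (outlen == 0)
        return -1;
    out[0] = '\0';

    while (*p != '\0') {
        p += strspn(p, ": ");              /* skip separators */
        size_t len = strcspn(p, ": ");     /* length of this entry */
        if (len == 0)
            break;

        /* Does this entry contain the needle? */
        int match = 0;
        for (size_t i = 0; nlen > 0 && i + nlen <= len; i++) {
            if (memcmp(p + i, needle, nlen) == 0) {
                match = 1;
                break;
            }
        }

        if (!match) {
            size_t sep = (used > 0) ? 1 : 0;
            if (used + sep + len + 1 > outlen)
                return -1;
            if (sep)
                out[used++] = ':';
            memcpy(out + used, p, len);
            used += len;
            out[used] = '\0';
        }
        p += len;
    }
    return 0;
}

/* Hypothetical helper: rewrite LD_PRELOAD in this process so that
 * children no longer inherit entries containing `needle`. */
int scrub_ld_preload(const char *needle)
{
    const char *cur = getenv("LD_PRELOAD");
    if (cur == NULL)
        return 0;

    size_t n = strlen(cur) + 1;
    char *buf = malloc(n);
    if (buf == NULL)
        return -1;

    int rc = strip_preload_entries(cur, needle, buf, n);
    if (rc == 0)
        rc = (buf[0] == '\0') ? unsetenv("LD_PRELOAD")
                              : setenv("LD_PRELOAD", buf, 1);
    free(buf);
    return rc;
}
```

A call such as scrub_ld_preload("libdarshan") during initialization would leave unrelated preloaded libraries in place for child processes. As noted above, whether this actually avoids the hang would still need testing.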


