[Darshan-users] LD_PRELOAD system() call

Cristian Simarro cristian.simarro at ecmwf.int
Fri May 1 03:53:55 CDT 2015


Hi Phil,

Could you please tell me which batch system you are using on your Cray machine? Is the MPI implementation cray-mpich?

Thanks,
Cristian

----- Original Message -----
From: "Phil Carns" <carns at mcs.anl.gov>
To: "Cristian Simarro" <cristian.simarro at ecmwf.int>
Cc: darshan-users at lists.mcs.anl.gov
Sent: Thursday, 30 April, 2015 8:45:11 PM
Subject: Re: [Darshan-users] LD_PRELOAD system() call

Thanks for the test program, Cristian. I can confirm that it hangs with 
LD_PRELOAD on a Cray, but not on a Linux workstation.  I'm not exactly 
sure what the underlying difference is in this case, but it is 
definitely 100% reproducible in the Cray environment.

Kalyana Chadalavada has actually observed something very similar when 
using fork() directly; I imagine that it is the underlying fork() within 
the system() call that is causing the problem.

thanks,
-Phil
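
[Editorial sketch] To illustrate the fork() that sits underneath system(), here is a simplified approximation of what system() does. It is not the glibc implementation: the real call also adjusts signal handling (SIGCHLD, SIGINT, SIGQUIT), which is omitted here. The name run_shell is hypothetical. The child inherits the parent's environment, including LD_PRELOAD, and the parent blocks in waitpid() until the child exits, which matches where the hang was reported in this thread.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `command` through /bin/sh, roughly as system() does.
 * Returns the child's exit status, or -1 on failure or abnormal termination. */
int run_shell(const char *command)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;                      /* fork failed */

    if (pid == 0) {
        /* Child: the environment (LD_PRELOAD included) is inherited as-is. */
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);                     /* only reached if exec failed */
    }

    /* Parent: block until the child finishes, retrying if interrupted. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_shell("echo calling system") from rank 0 in place of system() may help narrow down whether the hang follows the fork() or something else inside system().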

On 04/30/2015 03:07 AM, Cristian Simarro wrote:
> Hi Phil,
>
> Actually, any command run through a system() call triggers the problem. The spawned process does not finish, and the task that issued the call then hangs in waitpid().
>
> This example hangs when we use the LD_PRELOAD mechanism:
>
> #include <stdio.h>
> #include <mpi.h>
> #include <stdlib.h>
>
> int main (int argc, char *argv[])
> {
>    int rank, size;
>    int ret;
>
>    MPI_Init (&argc, &argv);
>    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
>    MPI_Comm_size (MPI_COMM_WORLD, &size);
>    if (rank == 0) {
>       /* With the library preloaded via LD_PRELOAD, this call hangs. */
>       ret = system("echo calling system");
>    }
>    printf( "Hello world from process %d of %d\n", rank, size );
>    MPI_Finalize();
>    return 0;
> }
>
> Thanks,
> Cristian
>
> ------------------------------------------------------------------
> Cristian Simarro
> Analyst, User Support Section
> European Centre for Medium-Range Weather Forecasts (ECMWF)
> Shinfield Park, Reading, RG2 9AX, United Kingdom
> Tel:    (+44 118) 9499315                Fax:    (+44 118) 9869450
> E-mail: Cristian.Simarro at ecmwf.int            http://www.ecmwf.int
> ------------------------------------------------------------------
>
> ----- Original Message -----
> From: "Phil Carns" <carns at mcs.anl.gov>
> To: darshan-users at lists.mcs.anl.gov
> Sent: Wednesday, 29 April, 2015 10:13:54 PM
> Subject: Re: [Darshan-users] LD_PRELOAD system() call
>
> On 04/29/2015 02:54 PM, Phil Carns wrote:
>> On 04/29/2015 12:17 PM, Cristian Simarro wrote:
>>> Hello,
>>>
>>> We have been facing some problems with system() calls inside some
>>> C/Fortran codes on our Cray machine.
>>>
>>> The method used here is to compile dynamically and then use LD_PRELOAD.
>>> When the code calls system(command), execution hangs if Darshan is
>>> preloaded, because it is trying to instrument an internal
>>> system read() with no initialization.
>>>
>>> The solution we have designed is to unset LD_PRELOAD (if it was set)
>>> in the darshan_mpi_initialize function.
>>>
>>> Has anybody found a similar problem with LD_PRELOAD + system() calls?
>> Hi Cristian,
>>
>> I don't think I've seen this exact combination before, but it seems
>> like something we should be able to reproduce and isolate.
>>
>> If I understand correctly, it sounds like the underlying process
>> spawned by system() is inheriting the LD_PRELOAD environment variable
>> from the parent program, and it is the underlying process that is
>> getting hung?  If so, does it matter what you run in the system() call
>> or does it seem like pretty much anything triggers it?
>>
>> thanks,
>> -Phil
> The solution you have suggested (unsetting LD_PRELOAD programmatically
> during Darshan initialization) might not be a bad long-term solution,
> maybe with some extra safety logic to make sure we don't accidentally
> unset unrelated LD_PRELOAD entries.  I imagine that once the application
> has gotten to Darshan initialization, then the loader has already
> processed the LD_PRELOAD environment variable and we don't need to keep
> it set any longer.  That would help keep it from interfering with child
> processes.
>
> We would definitely need to do some testing to confirm, though.
>
> thanks,
> -Phil
> _______________________________________________
> Darshan-users mailing list
> Darshan-users at lists.mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/darshan-users
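
[Editorial sketch] The "extra safety logic" mentioned in the thread could look roughly like the following: rather than unsetting LD_PRELOAD outright, drop only the entries that match the instrumentation library and leave unrelated entries in place. This is a minimal sketch under stated assumptions, not Darshan's actual code. The function names filter_preload and scrub_ld_preload are hypothetical, and matching on a substring such as "libdarshan" is an assumption about how the library is named on a given system. Entries are split on spaces and colons, the two separators ld.so accepts for LD_PRELOAD.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for setenv() and unsetenv() */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `value` into `out`, dropping every entry that
 * contains `needle`. Surviving entries are joined with ':'.
 * Returns 0 on success, -1 if `out` is too small. */
int filter_preload(const char *value, const char *needle,
                   char *out, size_t outsz)
{
    size_t used = 0;
    size_t nlen;
    const char *p = value;

    assert(value != NULL && needle != NULL && out != NULL);
    if (outsz == 0)
        return -1;
    out[0] = '\0';
    nlen = strlen(needle);

    while (*p != '\0') {
        size_t len = strcspn(p, ": ");  /* length of the current entry */
        if (len > 0) {
            int match = 0;
            size_t i;
            /* Search for `needle` inside this entry only. */
            for (i = 0; nlen > 0 && i + nlen <= len; i++) {
                if (strncmp(p + i, needle, nlen) == 0) {
                    match = 1;
                    break;
                }
            }
            if (!match) {
                size_t need = len + (used > 0 ? 1 : 0);
                if (used + need + 1 > outsz)
                    return -1;          /* not enough room in `out` */
                if (used > 0)
                    out[used++] = ':';
                memcpy(out + used, p, len);
                used += len;
                out[used] = '\0';
            }
        }
        p += len;
        if (*p != '\0')
            p++;                        /* skip the separator */
    }
    return 0;
}

/* Hypothetical helper: rewrite LD_PRELOAD in the current process so that
 * child processes no longer inherit entries containing `needle`.
 * Returns 0 on success, -1 on failure. */
int scrub_ld_preload(const char *needle)
{
    const char *cur = getenv("LD_PRELOAD");
    size_t sz;
    char *buf;
    int rc;

    if (cur == NULL)
        return 0;                       /* nothing to do */
    sz = strlen(cur) + 1;
    buf = malloc(sz);
    if (buf == NULL)
        return -1;
    if (filter_preload(cur, needle, buf, sz) != 0) {
        free(buf);
        return -1;
    }
    /* Unset the variable entirely if no unrelated entries remain. */
    rc = (buf[0] == '\0') ? unsetenv("LD_PRELOAD")
                          : setenv("LD_PRELOAD", buf, 1);
    free(buf);
    return rc;
}
```

Under these assumptions, a call such as scrub_ld_preload("libdarshan") would be made once during initialization, after the loader has already processed LD_PRELOAD for the current process. As noted in the thread, this would need testing before being relied on.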


