[Darshan-users] LD_PRELOAD system() call

Phil Carns carns at mcs.anl.gov
Wed May 6 13:33:53 CDT 2015


On 05/05/2015 07:09 AM, Cristian Simarro wrote:
> Hi Phil,
>
> I have tried your test on our Cray but I cannot reproduce the behaviour. The test program actually finishes properly.

Interesting!  I assume it is printing output showing that the wrappers 
were triggered, though, right?

I'm attaching my job script just in case there is any difference there 
in how we are executing the test case.  I think I probably compiled the 
example program with Intel compilers, while the read-wrapper library was 
compiled with GNU, though I wouldn't think that part would matter here.  
The system is running craype 2.2.1.



> Can you add traceback information about the call?
>      
>      Dl_info dli;
>
>      original_read = dlsym(RTLD_NEXT, "read");
>      dladdr(original_read,&dli);
>      fprintf(stderr, "debug trace [%d]: %s "
>                      "called by %p [ %s(%p) %s(%p) ].\n",
>                      getpid(), __func__,
>                       __builtin_return_address(0),
>                      strrchr(dli.dli_fname, '/') ?
>                              strrchr(dli.dli_fname, '/')+1 : dli.dli_fname,
>                      dli.dli_fbase, dli.dli_sname, dli.dli_saddr);

Sure, I'll give that a try and report back.  Thanks for the example; I 
was thinking that a backtrace might be very helpful but I wasn't sure 
how to best go about collecting it :)
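
For reference, here is a minimal sketch of how the quoted snippet could sit inside a complete read() wrapper. It is not the read-wrapper.c attached earlier in the thread. The file name, the file_part() helper and the build commands in the comments are assumptions made for illustration, and it additionally guards against dladdr() failing or returning a NULL symbol name.

```c
/* read-wrapper-sketch.c (hypothetical name; not the attachment from the thread).
 * Assumed build line for a preload library:
 *   gcc -shared -fPIC -o libread-wrapper-sketch.so read-wrapper-sketch.c -ldl
 */
#define _GNU_SOURCE            /* exposes RTLD_NEXT, Dl_info and dladdr() in glibc */
#include <dlfcn.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

typedef ssize_t (*read_fn)(int, void *, size_t);

static read_fn original_read;  /* real read(), looked up on first use */

/* Return the file-name part of a path, or "?" when no path is known. */
static const char *file_part(const char *path)
{
    const char *slash;

    if (path == NULL)
        return "?";
    slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Interposed read(): report where the call came from, then forward it. */
ssize_t read(int fd, void *buf, size_t count)
{
    Dl_info dli;

    if (original_read == NULL)
        original_read = (read_fn)dlsym(RTLD_NEXT, "read");
    if (original_read == NULL) {
        errno = ENOSYS;        /* no underlying read() was found */
        return -1;
    }

    /* dladdr() returns 0 on failure, and dli_sname may be NULL on success. */
    if (dladdr((void *)original_read, &dli) != 0) {
        fprintf(stderr,
                "debug trace [%d]: %s called by %p [ %s(%p) %s(%p) ].\n",
                (int)getpid(), __func__, __builtin_return_address(0),
                file_part(dli.dli_fname), dli.dli_fbase,
                dli.dli_sname ? dli.dli_sname : "?", dli.dli_saddr);
    }
    return original_read(fd, buf, count);
}
```

Preloading the resulting library through LD_PRELOAD, as the attached job script does for the real wrapper, should print one trace line per read() call. Comparing the reported caller addresses between the parent and the child of system() may help show whether the wrapper is being re-entered.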

thanks,
-Phil

> Best regards,
> Cristian
>
> ------------------------------------------------------------------
> Cristian Simarro
> Analyst, User Support Section
> European Centre for Medium-Range Weather Forecasts (ECMWF)
> Shinfield Park, Reading, RG2 9AX, United Kingdom
> Tel:    (+44 118) 9499315                Fax:    (+44 118) 9869450
> E-mail: Cristian.Simarro at ecmwf.int            http://www.ecmwf.int
> ------------------------------------------------------------------
>
> ----- Original Message -----
> From: "Phil Carns" <carns at mcs.anl.gov>
> To: darshan-users at lists.mcs.anl.gov
> Sent: Sunday, 3 May, 2015 3:35:46 PM
> Subject: Re: [Darshan-users] LD_PRELOAD system() call
>
> Hi Cristian,
>
> This is definitely the same problem that Kalyana and I were looking at
> earlier with fork().  I built a very small example reproducer library to
> try to simplify the problem (see Makefile and read-wrapper.c).  The
> read-wrapper.c isn't doing anything except intercepting the read()
> function, printing some information, then calling the real read()
> function.  I've been building this library with PrgEnv-gnu.
>
> The test.c is your example program, and the
> test-preload-read-wrapper.pbs.e* is an example stderr file from trying
> to run it with the example read wrapper library preloaded.
>
> There isn't any Darshan code involved here, but the example still
> hangs.  It looks like you could trigger it with *any* wrapper on the
> read() function in the Cray environment in conjunction with a fork() or
> system() call.  Maybe there is some sort of recursion here?
>
> I'll keep thinking about this some, but I thought I would share what I'm
> seeing with the list in case anyone else has an idea.
>
> thanks,
> -Phil
>
> On 05/01/2015 12:06 PM, Carns, Philip H. wrote:
>> Hi Cristian,
>>
>> I was testing on Edison, an XC30 system at NERSC.  I compiled with
>> cray-mpich 7.1.1, and I think it is using Torque as the batch system.
>> FYI, to run this example program I have to launch the executable using
>> aprun (otherwise MPI won't initialize properly). I think this will be
>> reproducible with non-MPI programs as well, though.
>>
>> thanks,
>> -Phil
>>
>> On 05/01/2015 04:53 AM, Cristian Simarro wrote:
>>> Hi Phil,
>>>
>>> Could you please tell me which batch system you are using on your Cray machine? Is the MPI implementation cray-mpich?
>>>
>>> Thanks,
>>> Cristian
>>>
>>> ----- Original Message -----
>>> From: "Phil Carns" <carns at mcs.anl.gov>
>>> To: "Cristian Simarro" <cristian.simarro at ecmwf.int>
>>> Cc: darshan-users at lists.mcs.anl.gov
>>> Sent: Thursday, 30 April, 2015 8:45:11 PM
>>> Subject: Re: [Darshan-users] LD_PRELOAD system() call
>>>
>>> Thanks for the test program, Cristian. I can confirm that it hangs with
>>> LD_PRELOAD on a Cray, but not on a Linux workstation.  I'm not exactly
>>> sure what the underlying difference is in this case, but it is
>>> definitely 100% reproducible in the Cray environment.
>>>
>>> Kalyana Chadalavada has actually observed something very similar when
>>> using fork() directly; I imagine that it is the underlying fork() within
>>> the system() call that is causing the problem.
>>>
>>> thanks,
>>> -Phil
>>>
>>> On 04/30/2015 03:07 AM, Cristian Simarro wrote:
>>>> Hi Phil,
>>>>
>>>> Actually, any command run through system() triggers the problem. The spawned process does not finish, and the task that issued the call then hangs in waitpid().
>>>>
>>>> This example hangs when the LD_PRELOAD mechanism is in use:
>>>>
>>>> #include <stdio.h>
>>>> #include <mpi.h>
>>>> #include <stdlib.h>
>>>>
>>>> int main (int argc, char *argv[])
>>>> {
>>>>       int rank, size;
>>>>       int ret;
>>>>
>>>>       MPI_Init (&argc, &argv);
>>>>       MPI_Comm_rank (MPI_COMM_WORLD, &rank);
>>>>       MPI_Comm_size (MPI_COMM_WORLD, &size);
>>>>       if(rank == 0) {
>>>>           ret = system("echo calling system");
>>>>       }
>>>>       printf( "Hello world from process %d of %d\n", rank, size );
>>>>       MPI_Finalize();
>>>>       return 0;
>>>> }
>>>>
>>>> Thanks,
>>>> Cristian
>>>>
>>>> ----- Original Message -----
>>>> From: "Phil Carns" <carns at mcs.anl.gov>
>>>> To: darshan-users at lists.mcs.anl.gov
>>>> Sent: Wednesday, 29 April, 2015 10:13:54 PM
>>>> Subject: Re: [Darshan-users] LD_PRELOAD system() call
>>>>
>>>> On 04/29/2015 02:54 PM, Phil Carns wrote:
>>>>> On 04/29/2015 12:17 PM, Cristian Simarro wrote:
>>>>>> Hello,
>>>>>>
>>>>>> We have been facing some problems with the system() call inside some
>>>>>> C/Fortran codes on our Cray machine.
>>>>>>
>>>>>> The method used here is to compile dynamically and then use LD_PRELOAD.
>>>>>> When the code calls system(command), execution hangs if Darshan is
>>>>>> preloaded, because Darshan tries to instrument an internal system
>>>>>> read() with no initialization.
>>>>>>
>>>>>> The solution we have designed is to unset LD_PRELOAD (if set before)
>>>>>> in the darshan_mpi_initialize function.
>>>>>>
>>>>>> Has anybody found a similar problem with LD_PRELOAD + system() calls?
>>>>> Hi Cristian,
>>>>>
>>>>> I don't think I've seen this exact combination before, but it seems
>>>>> like something we should be able to reproduce and isolate.
>>>>>
>>>>> If I understand correctly, it sounds like the underlying process
>>>>> spawned by system() is inheriting the LD_PRELOAD environment variable
>>>>> from the parent program, and it is the underlying process that is
>>>>> getting hung?  If so, does it matter what you run in the system() call
>>>>> or does it seem like pretty much anything triggers it?
>>>>>
>>>>> thanks,
>>>>> -Phil
>>>> The solution you have suggested (unsetting LD_PRELOAD programmatically
>>>> during Darshan initialization) might not be a bad long-term solution,
>>>> maybe with some extra safety logic to make sure we don't accidentally
>>>> unset unrelated LD_PRELOAD entries.  I imagine that once the application
>>>> has gotten to Darshan initialization, then the loader has already
>>>> processed the LD_PRELOAD environment variable and we don't need to keep
>>>> it set any longer.  That would help keep it from interfering with child
>>>> processes.
>>>>
>>>> We would definitely need to do some testing to confirm, though.
>>>>
>>>> thanks,
>>>> -Phil
>>>> _______________________________________________
>>>> Darshan-users mailing list
>>>> Darshan-users at lists.mcs.anl.gov
>>>> https://lists.mcs.anl.gov/mailman/listinfo/darshan-users

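The idea discussed earlier in the thread, unsetting LD_PRELOAD during initialization without disturbing unrelated entries, could look roughly like the sketch below. This is not Darshan code. The function names filter_preload() and drop_from_ld_preload() are hypothetical, and matching an entry by its file name alone is an assumption about what the "extra safety logic" might do.

```c
#include <stdlib.h>
#include <string.h>

/* Copy an LD_PRELOAD-style list (entries separated by ':' or ' ') into out,
 * leaving out every entry whose file name equals libname. Kept entries are
 * joined with ':'. Returns 0 on success, -1 if out is too small. */
int filter_preload(const char *value, const char *libname,
                   char *out, size_t outlen)
{
    size_t used = 0;
    size_t liblen = strlen(libname);
    const char *p = value;

    if (outlen == 0)
        return -1;
    out[0] = '\0';

    while (*p != '\0') {
        size_t len = strcspn(p, ": ");   /* length of the current entry */
        const char *base = p;            /* start of its file-name part */
        size_t i, baselen;

        for (i = 0; i < len; i++)
            if (p[i] == '/')
                base = p + i + 1;
        baselen = len - (size_t)(base - p);

        /* Keep non-empty entries that do not name our library. */
        if (len > 0 &&
            !(baselen == liblen && memcmp(base, libname, liblen) == 0)) {
            if (used + len + (used > 0) + 1 > outlen)
                return -1;
            if (used > 0)
                out[used++] = ':';
            memcpy(out + used, p, len);
            used += len;
            out[used] = '\0';
        }
        p += len;
        if (*p != '\0')
            p++;                         /* step over the separator */
    }
    return 0;
}

/* Remove one library from LD_PRELOAD in the current environment so that
 * children created later by fork() or system() do not inherit it.
 * Uses the POSIX setenv()/unsetenv() calls. Returns 0 on success. */
int drop_from_ld_preload(const char *libname)
{
    const char *cur = getenv("LD_PRELOAD");
    size_t cap;
    char *buf;
    int rc;

    if (cur == NULL)
        return 0;                        /* nothing to do */
    cap = strlen(cur) + 1;               /* the result is never longer */
    buf = malloc(cap);
    if (buf == NULL)
        return -1;
    rc = filter_preload(cur, libname, buf, cap);
    if (rc == 0)
        rc = (buf[0] == '\0') ? unsetenv("LD_PRELOAD")
                              : setenv("LD_PRELOAD", buf, 1);
    free(buf);
    return rc;
}
```

A wrapper library might call drop_from_ld_preload("libread-wrapper.so") once its own initialization has run. Whether that is early enough, and safe, on the Cray systems discussed here would still need the testing mentioned in the thread.
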
-------------- next part --------------
#!/bin/bash
#PBS -l walltime=00:02:00
#PBS -l nodes=1:ppn=1
#PBS -q debug
#PBS -m abe
#PBS -M carns at mcs.anl.gov

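# Abort the job on the first failing command.
set -e

# Preload the read() wrapper library into everything launched below.
export LD_PRELOAD=/global/u1/p/pcarns/working/darshan-cristian-bug/read-wrapper/libread-wrapper.so

# Run from the directory the job was submitted from.
cd "$PBS_O_WORKDIR"

# Launch a single instance of the test program.
aprun -n 1 ./test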

