[Darshan-users] fputs and internal glibc i/o calls
Phil Carns
carns at mcs.anl.gov
Thu Apr 3 13:12:20 CDT 2014
Hi Chuck,
Thanks for the detailed information. Darshan definitely does not
currently provide wrappers for the stream functions, and you are right
that we can't catch the underlying I/O calls either.
As for whether it is intentional, it is at least a long-standing
known problem :) We haven't yet written and tested the wrappers.
There are quite a few variations on the stream functions, and we
would need a separate wrapper for each one.
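
To illustrate what one such wrapper involves, here is a minimal sketch of
an LD_PRELOAD-style interposer for fputs(). It is not Darshan code: the
counter names are hypothetical, and it assumes the real function can be
looked up with dlsym(RTLD_NEXT, ...) as on glibc. A real version would keep
per-file records, and the same pattern would have to be repeated for
fprintf, fwrite, and the rest of the stream family.

```c
#define _GNU_SOURCE           /* for RTLD_NEXT */
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical counters; a real tool would track these per file. */
static unsigned long fputs_calls;
static unsigned long fputs_bytes;

/* Interposed fputs(): record the call, then forward to libc's fputs. */
int fputs(const char *s, FILE *stream)
{
    static int (*real_fputs)(const char *, FILE *);
    int rc;

    if (!real_fputs)
        real_fputs = (int (*)(const char *, FILE *))dlsym(RTLD_NEXT, "fputs");
    assert(real_fputs != NULL);   /* lookup of the real symbol failed */

    rc = real_fputs(s, stream);
    if (rc != EOF) {
        fputs_calls++;
        fputs_bytes += strlen(s);
    }
    return rc;
}
```

Built as a shared object (for example with gcc -shared -fPIC, plus -ldl on
older glibc) and listed in LD_PRELOAD, a definition like this is found ahead
of libc's fputs for calls made by the application.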
Chris Daley of NERSC has run into this recently as well. I think this
is actually going to become a more frequent problem, especially with
applications that access genomics data in text format.
We won't be able to address this in the next point release (2.2.9), but
I can keep you in the loop if you would be interested in helping to test
an experimental version of this feature in the medium term.
We have an old ticket open on this issue. I just clarified the
description and added some comments to help keep track of what's going
on: http://trac.mcs.anl.gov/projects/darshan/ticket/38
thanks,
-Phil
On 04/03/2014 01:17 PM, Chuck Cranor wrote:
> hi-
>
> I downloaded and tried 2.2.8 to better understand how it works
> and what it can do. The test app I used uses MPI to simulate
> checkpoints and then proc 0 writes the performance results to a log
> file. I used the LD_PRELOAD mechanism to load darshan.
>
> Looking at the darshan log file from this app, I noticed that
> darshan captured proc 0 creating the log file, but didn't show
> any writes to it (even though new data was added). This is because
> the app does all its writes with fputs and darshan doesn't intercept
> calls to fputs (or calls to fprintf either, for that matter).
>
> I'm not sure if omitting fputs/fprintf is intentional or not, but
> I didn't see anything in the documentation about it.
>
>
> Also, it is worth noting in the documentation somewhere that
> internal glibc I/O system calls may not be captured in the darshan
> log. For example, fputs() is a libc function that uses write()
> internally, but the internal write() will not appear in the darshan
> log. I believe this is because glibc does not allow internally
> generated system calls to be overridden. See the discussion of
> hidden prototypes here:
>
> https://sourceware.org/glibc/wiki/Testing/Check-localplt
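
The effect described above can be reproduced with a small sketch, assuming
glibc on Linux. It interposes write() the same way an LD_PRELOAD library
would and counts the calls it sees; the counter and helper names are
hypothetical. A direct write() from the application reaches the wrapper,
but the write that stdio issues when flushing fputs() data does not,
because glibc binds that call internally.

```c
#define _GNU_SOURCE           /* for RTLD_NEXT */
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical counter: write() calls that reached the wrapper. */
static unsigned long seen_writes;

/* Interposed write(): count the call, then forward to the real one. */
ssize_t write(int fd, const void *buf, size_t count)
{
    static ssize_t (*real_write)(int, const void *, size_t);

    if (!real_write)
        real_write = (ssize_t (*)(int, const void *, size_t))dlsym(RTLD_NEXT, "write");
    assert(real_write != NULL);   /* lookup of the real symbol failed */

    seen_writes++;
    return real_write(fd, buf, count);
}

/* Hypothetical helper: push data through stdio and report how many
 * write() calls the wrapper saw while doing so. */
unsigned long writes_seen_during_fputs(FILE *fp)
{
    unsigned long before = seen_writes;

    if (fputs("via stdio\n", fp) == EOF)
        return (unsigned long)-1;
    fflush(fp);                   /* forces stdio to issue its internal write */
    return seen_writes - before;
}
```

On glibc the helper is expected to report zero intercepted calls even though
the data does reach the file, which matches the missing writes in the
darshan log.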
>
> An easy way to demo this is with the ctime(3) function. Internally,
> ctime(3) uses the POSIX open(2) I/O call to load local timezone
> information from /etc/localtime. So you might think a simple program like:
>
> #include <stdio.h>
> #include <time.h>
> #include <mpi.h>
>
> int main(int argc, char **argv) {
>     time_t now;
>     MPI_Init(&argc, &argv);
>     now = time(0);
>     /* ctime() loads /etc/localtime internally */
>     printf("time=%s", ctime(&now));
>     MPI_Finalize();
>     return 0;
> }
>
> would show the I/O to /etc/localtime in the darshan log, but it
> doesn't:
>
> h0:/tmp/ctime# ./a.out
> time=Thu Apr 3 11:05:56 2014
> h0:/tmp/ctime#
>
> h0:/tmp/ctime# strace ./a.out |& fgrep localtime
> open("/etc/localtime", O_RDONLY|O_CLOEXEC) = 4
> h0:/tmp/ctime#
>
> h0:/tmp/ctime# env LD_PRELOAD=/usr/local/lib/libdarshan.so mpirun.mpich2 ./a.out
> time=Thu Apr 3 11:06:31 2014
> h0:/tmp/ctime# darshan-parser /m/pvfs/tmp/2014/4/3/root_a.out_id22281_4-3-39991-1756671920181884080_1.darshan.gz
> # darshan log version: 2.02
> # size of file statistics: 1328 bytes
> # size of job statistics: 120 bytes
> # exe: ./a.out
> # uid: 0
> # jobid: 22281
> # start_time: 1396544791
> # start_time_asci: Thu Apr 3 11:06:31 2014
> # end_time: 1396544791
> # end_time_asci: Thu Apr 3 11:06:31 2014
> # nprocs: 1
> # run time: 1
> # metadata: lib_ver = 2.2.8
> # metadata: h = romio_no_indep_rw=true;cb_nodes=4
>
> # mounted file systems (device, mount point, and fs type)
> # -------------------------------------------------------
> # mount entry: 8879946760550363901 /m/pvfs fuse
> # mount entry: -6525176383006974612 /l0 ext2
> # mount entry: 1378004016996341907 /users/garth nfs
> # mount entry: -6395289464566466203 /users/qingzhen nfs
> # mount entry: 3967925232003708285 /proj/TableFS nfs
> # mount entry: -5933552100018119666 /users/chuck nfs
> # mount entry: -9071479661283960111 /share nfs
> # mount entry: 361011049498856045 /users/kair nfs
> # mount entry: 3179183617035706161 /dev devtmpfs
> # mount entry: -648807988769344735 / ext3
> # no files opened.
> h0:/tmp/ctime#
>
>
> The only internal functions that glibc seems to let you override
> are malloc-related:
>
> h0:/l0/glibc-2.14# cat scripts/data/localplt-generic.data
> libc.so: calloc
> libc.so: free
> libc.so: malloc
> libc.so: memalign
> libc.so: realloc
> libm.so: matherr
> h0:/l0/glibc-2.14#
>
> That's probably for malloc debuggers; otherwise they would have
> trouble with functions like strdup(3) that call malloc internally
> for an application [and expect the app to free(3) the memory later].
>
>
> I suspect the behavior might be different using a static-linked
> program, but haven't tried it...
>
>
> chuck