[Darshan-users] Darshan not tracing with tensorflow import.

Sandra A. Mendez smendez.fi.unju at gmail.com
Wed Jul 8 14:48:05 CDT 2020


Just a thought: could it be that, when you import tensorflow, the number of
files to trace exceeds the maximum number of files Darshan will trace? That
could be why you don't see the records for the files you mention.
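
One rough way to check this, as a sketch only: Python 3.8+ can report file
opens that go through the interpreter via sys.addaudithook, so you can count
how many distinct paths an import touches. The helper name
files_opened_during and the ASSUMED_RECORD_LIMIT value are my own
placeholders, not anything taken from Darshan; please check the real limit
for your build. The hook does not see opens made directly by C libraries, so
treat the count as a lower bound.

```python
import importlib
import sys

# Assumption: placeholder for Darshan's per-module file record limit.
# Check your Darshan build/docs for the real value.
ASSUMED_RECORD_LIMIT = 1024


def files_opened_during(action):
    """Run action() and return the set of paths opened through Python meanwhile."""
    opened = set()
    state = {"active": True}

    def hook(event, args):
        # The "open" audit event carries (path, mode, flags).
        if state["active"] and event == "open":
            opened.add(str(args[0]))

    # Audit hooks cannot be removed, so the hook is switched off with a flag.
    sys.addaudithook(hook)
    try:
        action()
    finally:
        state["active"] = False
    return opened


if __name__ == "__main__":
    # "json" is a stand-in so the sketch runs anywhere;
    # use "tensorflow" for the real check.
    paths = files_opened_during(lambda: importlib.import_module("json"))
    print(f"{len(paths)} distinct files opened; "
          f"assumed limit {ASSUMED_RECORD_LIMIT}")
```

If the count for tensorflow comes out far below the limit, this explanation
becomes less likely, although files opened from native code are not counted.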

Sandra.-


On Wed, 8 Jul 2020 at 21:38, Devarajan, Hariharan <hdevarajan at anl.gov>
wrote:

> Here are the reproducers. The working one has the tensorflow import
> commented out (1st line). If you uncomment it, the hdf5 file and the npz
> file stop getting traced. The rest of the py files are traced, so Darshan
> is working.
>
>
>
> Hari
>
>
>
>
>
> *From: *Snyder, Shane <ssnyder at mcs.anl.gov>
> *Sent: *Wednesday, July 8, 2020 10:27 AM
> *To: *Devarajan, Hariharan <hdevarajan at anl.gov>
> *Cc: *Jeffrey Layton <laytonjb at gmail.com>; darshan-users at lists.mcs.anl.gov
> *Subject: *Re: [Darshan-users] Darshan not tracing with tensorflow import.
>
>
>
> I see. Thanks for the clarification.
>
>
>
> The only issue I'm aware of that currently causes us to lose some log data
> is in the case of applications calling fork(). Maybe that or something
> similar is happening in the import of tensorflow, with the h5/numpy I/O
> then happening in the child process?
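
A quick way to test the fork() theory from the Python side, as a sketch: put
something like the following at the top of the reproducer and check the pid
right before the h5py/numpy calls. START_PID, fork_pids and
in_original_process are illustrative names only. Note that
os.register_at_fork only fires for forks that return into the interpreter
(for example os.fork()); forks made inside native code or ordinary subprocess
launches will not show up there, but the pid comparison still works.

```python
import os

START_PID = os.getpid()  # pid of the process that was originally launched
fork_pids = []           # pids of children that re-enter Python after a fork

# Record every child created through a fork that returns to the interpreter.
os.register_at_fork(after_in_child=lambda: fork_pids.append(os.getpid()))

# import tensorflow  # uncomment for the real check


def in_original_process():
    """True if the caller still runs in the process that was launched."""
    return os.getpid() == START_PID


if __name__ == "__main__":
    # Call this right before the I/O that is missing from the log.
    print("pid:", os.getpid(),
          "original process:", in_original_process(),
          "forks seen:", fork_pids)
```

If in_original_process() is False at the point where the files are read, the
I/O is indeed happening in a child process.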
>
>
>
> I can try to reproduce the issue to see if I can get a better idea of
> what's happening. We'd like to make sure tensorflow use cases work, but
> admittedly haven't really tested it.
>
>
>
> --Shane
>
> *From:* Devarajan, Hariharan <hdevarajan at anl.gov>
> *Sent:* Wednesday, July 8, 2020 9:45 AM
> *To:* Snyder, Shane <ssnyder at mcs.anl.gov>
> *Cc:* Jeffrey Layton <laytonjb at gmail.com>; darshan-users at lists.mcs.anl.gov
> <darshan-users at lists.mcs.anl.gov>
> *Subject:* Re: [Darshan-users] Darshan not tracing with tensorflow import.
>
>
>
> It produces logs, but it stops tracing the h5py and np.load calls. If you
> run the working version, you will notice we get traces from both files, but
> when you add the tensorflow import, this stops. I have verified that Darshan
> is initializing, as we get logs; they just don't contain traces for those
> two files in the code.
>
>
>
> Hari
>
>
>
> On Jul 8, 2020, at 9:41 AM, Snyder, Shane <ssnyder at mcs.anl.gov> wrote:
>
>
>
> Hi Hariharan,
>
>
>
> Thanks for letting us know about this issue.
>
>
>
> I can't really think of any reason why importing the tensorflow module
> would result in Darshan no longer producing log files. Just to make sure
> I'm fully understanding, you aren't getting any logfiles at all in the case
> where tensorflow is imported? I ask because when using this non-MPI
> instrumentation, I've noticed it tends to create a lot of log files,
> particularly for Python modules that like to call subprocesses for using
> things like ls, sed, etc. I just want to make sure it's not an issue of you
> missing one particular log file of interest or whether you don't get any
> log files at all.
>
>
>
> At any rate, the environment setup looks correct in both cases to preload
> the Darshan library and to enable the non-MPI instrumentation support. So,
> that doesn't appear to be the issue.
>
>
>
> Just to verify whether Darshan is even being properly
> initialized/shutdown, could you try setting the DARSHAN_INTERNAL_TIMING env
> variable (i.e., export DARSHAN_INTERNAL_TIMING=1) before running? That
> should spit out some more verbose output about how long it takes Darshan to
> init/finalize. If you do see some additional output indicating Darshan is at
> least initializing, you might want to double check that there are no errors
> in your output logs that indicate some issue Darshan encountered.
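
For running both reproducers the same way, the environment described above
can be set from a small Python launcher. This is only a sketch: DARSHAN_LIB
is a hypothetical path that must be replaced with the location of your own
libdarshan.so, and darshan_env and run_traced are illustrative helper names.
I am assuming that non-MPI instrumentation is enabled with
DARSHAN_ENABLE_NONMPI; check this against your Darshan version.

```python
import os
import subprocess
import sys

# Hypothetical install location; point this at your own libdarshan.so.
DARSHAN_LIB = "/path/to/darshan/lib/libdarshan.so"


def darshan_env(lib_path=DARSHAN_LIB, base_env=None):
    """Return a copy of the environment with the Darshan settings added."""
    env = dict(os.environ if base_env is None else base_env)
    env["LD_PRELOAD"] = lib_path          # preload the Darshan runtime library
    env["DARSHAN_ENABLE_NONMPI"] = "1"    # assumed switch for non-MPI programs
    env["DARSHAN_INTERNAL_TIMING"] = "1"  # print init/finalize timing
    return env


def run_traced(script, lib_path=DARSHAN_LIB):
    """Run a Python script under Darshan and capture what it prints."""
    return subprocess.run(
        [sys.executable, script],
        env=darshan_env(lib_path),
        capture_output=True,
        text=True,
    )


# Usage (not executed here):
#   result = run_traced("test.py")
#   print(result.stdout, result.stderr)
```

Comparing the captured output of the working and failing test.py should show
whether Darshan initializes and finalizes in both cases.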
>
>
>
> --Shane
>
> *From:* Darshan-users <darshan-users-bounces at lists.mcs.anl.gov> on behalf
> of Devarajan, Hariharan <hdevarajan at anl.gov>
> *Sent:* Monday, July 6, 2020 9:56 AM
> *To:* Jeffrey Layton <laytonjb at gmail.com>
> *Cc:* darshan-users at lists.mcs.anl.gov <darshan-users at lists.mcs.anl.gov>
> *Subject:* Re: [Darshan-users] Darshan not tracing with tensorflow import.
>
>
>
> The I/O is not through tensorflow. It's through h5py and numpy.load, and I
> verified that both are traced without the tensorflow import. I have also
> verified that importing tensorflow doesn't change any environment
> variables.
>
>
>
> Hari
>
>
>
> *From: *Jeffrey Layton <laytonjb at gmail.com>
> *Sent: *Monday, July 6, 2020 9:41 AM
> *To: *Devarajan, Hariharan <hdevarajan at anl.gov>
> *Cc: *darshan-users at lists.mcs.anl.gov
> *Subject: *Re: [Darshan-users] Darshan not tracing with tensorflow import.
>
>
>
> About 2 months ago, I tried using Darshan to trace a TensorFlow2 DL
> training run. I could not trace the input. What I _think_ happens is that
> TF2 uses mmap() for reading the input files, and I don't think Darshan can
> capture that file I/O. But I'm not a Darshan expert, so perhaps someone has
> tried this before and can help.
>
>
>
> (BTW - it's possible to build TensorFlow so it doesn't use mmap() for
> reading files.)
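
To make the mmap() point concrete, here is a small sketch of the two access
paths. With read(), every byte is transferred by an explicit call that an
interposed wrapper can count. With mmap(), only the mapping itself is a call;
the bytes are then fetched by ordinary memory access. How Darshan accounts
for mapped files is not something this sketch settles, and the function names
are illustrative.

```python
import mmap
import os
import tempfile


def read_via_read(path):
    """Fetch the file contents through explicit read() calls."""
    with open(path, "rb") as f:
        return f.read()


def read_via_mmap(path):
    """Fetch the same bytes through a read-only memory mapping.

    Raises ValueError for an empty file, which cannot be mapped.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[:]  # copied out of the mapping, no read() call involved


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"sample input data")
    try:
        print(read_via_read(tmp.name) == read_via_mmap(tmp.name))
    finally:
        os.unlink(tmp.name)
```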
>
>
>
> Jeff
>
>
>
>
>
>
>
> On Mon, Jul 6, 2020 at 1:08 PM Devarajan, Hariharan <hdevarajan at anl.gov>
> wrote:
>
>
>
> Hello,
>
>
>
> I was able to run my test program with Darshan 3.2.1. However, when I
> trace an app which loads tensorflow, it seems Darshan doesn't produce a
> trace. I am attaching two tars, one with the working example and one with
> the failing one. The only difference between the two is in test.py, where I
> import tensorflow on the first line.
>
>
>
> Can you please assist on how I can further debug the problem?
>
>
>
> Regards
>
> Hariharan
>
>
>
> _______________________________________________
> Darshan-users mailing list
> Darshan-users at lists.mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/darshan-users
>
>
>
>