[Darshan-users] Darshan not tracing with tensorflow import.

Sandra A. Mendez smendez.fi.unju at gmail.com
Wed Jul 8 17:22:43 CDT 2020


I think you can change the value in the darshan-runtime source code (there is
a variable in darshan.h, though I suppose the Darshan developers can provide
a more appropriate solution). I had a similar case, but I decided to filter
folders with the DARSHAN_EXCLUDE_DIRS environment variable, so that Darshan
traces only the files opened by the user's application and not the files
related to the Python libraries.
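
A minimal sketch of that filtering approach, in case it helps. The helper name
darshan_env is hypothetical, and I am assuming that DARSHAN_EXCLUDE_DIRS takes
a comma-separated list of directory prefixes; please check the darshan-runtime
documentation for your version. It only builds the environment, so LD_PRELOAD
and the non-MPI settings are expected to be set already.

```python
import os
import sysconfig


def darshan_env(extra_excludes=(), base_env=None):
    """Return a copy of the environment with DARSHAN_EXCLUDE_DIRS set.

    Hypothetical helper: excludes the directories that hold the Python
    standard library and installed packages, plus any extra prefixes.
    """
    env = dict(os.environ if base_env is None else base_env)
    paths = sysconfig.get_paths()
    # Directories where the interpreter keeps its own modules and packages.
    excludes = {paths["stdlib"], paths["purelib"], paths["platlib"]}
    excludes.update(extra_excludes)
    # Assumption: Darshan reads a comma-separated list of path prefixes.
    env["DARSHAN_EXCLUDE_DIRS"] = ",".join(sorted(excludes))
    return env
```

It could then be used as subprocess.run([sys.executable, "test.py"],
env=darshan_env()), so that the records Darshan keeps are spent on the
application's own files.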
Regards,
Sandra.-



On Thu, 9 Jul 2020 at 00:09, Devarajan, Hariharan <hdevarajan at anl.gov>
wrote:

> I see. There are 1024 files exactly in the darshan trace. What is the
> limit of the total number of files? Can we increase it?
>
>
>
> Hari
>
> *From: *Sandra A. Mendez <smendez.fi.unju at gmail.com>
> *Sent: *Wednesday, July 8, 2020 2:49 PM
> *To: *Devarajan, Hariharan <hdevarajan at anl.gov>
> *Cc: *Snyder, Shane <ssnyder at mcs.anl.gov>; darshan-users at lists.mcs.anl.gov
> *Subject: *Re: [Darshan-users] Darshan not tracing with tensorflow import.
>
>
>
> Just a comment: could it be that when you import tensorflow, the number of
> files to trace exceeds the maximum number of files that Darshan tracks? That
> could be why you don't see the file records you mention.
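
One rough way to test this hypothesis is to count how many distinct files a
Python import opens. The sketch below is only an estimate under stated
assumptions: it needs Python 3.8 or later, the name count_opened_paths is
hypothetical, and it sees only opens that go through Python's own open()
machinery, so it can differ from what Darshan records at the POSIX level.

```python
import sys


def count_opened_paths(action):
    """Run action() and return the set of paths opened through Python."""
    seen = set()
    active = True

    def hook(event, args):
        # "open" audit events carry (path, mode, flags); skip integer fds.
        if active and event == "open" and isinstance(args[0], (str, bytes)):
            seen.add(args[0])

    # Audit hooks cannot be removed, so the hook is gated on `active`.
    sys.addaudithook(hook)
    try:
        action()
    finally:
        active = False
    return seen
```

For example, len(count_opened_paths(lambda: __import__("tensorflow"))) gives a
number to compare against the 1024 files seen in the trace.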
>
>
>
> Sandra.-
>
> On Wed, 8 Jul 2020 at 21:38, Devarajan, Hariharan <hdevarajan at anl.gov>
> wrote:
>
> Here are the reproducers. The working one has the tensorflow import
> commented out (1st line). If you uncomment it, the hdf5 file and the npz
> file stop getting traced. The rest of the .py files are traced, so Darshan
> is working.
>
>
>
> Hari
>
> *From: *Snyder, Shane <ssnyder at mcs.anl.gov>
> *Sent: *Wednesday, July 8, 2020 10:27 AM
> *To: *Devarajan, Hariharan <hdevarajan at anl.gov>
> *Cc: *Jeffrey Layton <laytonjb at gmail.com>; darshan-users at lists.mcs.anl.gov
> *Subject: *Re: [Darshan-users] Darshan not tracing with tensorflow import.
>
>
>
> I see. Thanks for the clarification.
>
>
>
> The only issue I'm aware of that currently causes us to lose some log data
> is in the case of applications calling fork(). Maybe that or something
> similar is happening in the import of tensorflow, with the h5/numpy I/O
> then happening in the child process?
>
>
>
> I can try to reproduce the issue to see if I can get a better idea of
> what's happening. We'd like to make sure tensorflow use cases work, but
> admittedly haven't really tested it.
>
>
>
> --Shane
>
> *From:* Devarajan, Hariharan <hdevarajan at anl.gov>
> *Sent:* Wednesday, July 8, 2020 9:45 AM
> *To:* Snyder, Shane <ssnyder at mcs.anl.gov>
> *Cc:* Jeffrey Layton <laytonjb at gmail.com>; darshan-users at lists.mcs.anl.gov
> <darshan-users at lists.mcs.anl.gov>
> *Subject:* Re: [Darshan-users] Darshan not tracing with tensorflow import.
>
>
>
> It produces logs, but it stops tracing the h5py and np.load calls. If you
> run the working version, you will notice we get traces from both files, but
> when you add the tensorflow import, this stops. I have verified that Darshan
> is initializing, since we get logs; it is just not tracing those two files
> in the code.
>
>
>
> Hari
>
>
>
> On Jul 8, 2020, at 9:41 AM, Snyder, Shane <ssnyder at mcs.anl.gov> wrote:
>
>
>
> Hi Hariharan,
>
>
>
> Thanks for letting us know about this issue.
>
>
>
> I can't really think of any reason why the import of the tensorflow module
> would result in Darshan no longer producing log files. Just to make sure
> I'm fully understanding, you aren't getting any logfiles at all in the case
> where tensorflow is imported? I ask because when using this non-MPI
> instrumentation, I've noticed it tends to create a lot of log files,
> particularly for Python modules that like to call subprocesses for using
> things like ls, sed, etc. I just want to make sure it's not an issue of you
> missing one particular log file of interest or whether you don't get any
> log files at all.
>
>
>
> At any rate, the environment setup looks correct in both cases to preload
> the Darshan library and to enable the non-MPI instrumentation support. So,
> that doesn't appear to be the issue.
>
>
>
> Just to verify whether Darshan is even being properly
> initialized/shutdown, could you try setting the DARSHAN_INTERNAL_TIMING env
> variable (i.e., export DARSHAN_INTERNAL_TIMING=1) before running? That
> should spit out some more verbose output about how long it takes Darshan to
> init/finalize. If you do see some additional output indicating Darshan is at
> least initializing, you might want to double check that there are no errors
> in your output logs that indicate some issue Darshan encountered.
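
The same check can be scripted from Python. This is only a sketch: the
function name run_with_darshan_timing is hypothetical, and it assumes the
Darshan preload and non-MPI settings are already present in the calling
environment. Both output streams are captured because the timing lines may
appear on either one.

```python
import os
import subprocess
import sys


def run_with_darshan_timing(script, base_env=None):
    """Run a Python script with DARSHAN_INTERNAL_TIMING=1 and capture output."""
    env = dict(os.environ if base_env is None else base_env)
    # Ask Darshan to report how long its init/finalize phases take.
    env["DARSHAN_INTERNAL_TIMING"] = "1"
    return subprocess.run(
        [sys.executable, script],
        env=env,
        capture_output=True,
        text=True,
    )
```

After result = run_with_darshan_timing("test.py"), inspecting result.stdout
and result.stderr shows whether any Darshan timing or error lines appear.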
>
>
>
> --Shane
>
> *From:* Darshan-users <darshan-users-bounces at lists.mcs.anl.gov> on behalf
> of Devarajan, Hariharan <hdevarajan at anl.gov>
> *Sent:* Monday, July 6, 2020 9:56 AM
> *To:* Jeffrey Layton <laytonjb at gmail.com>
> *Cc:* darshan-users at lists.mcs.anl.gov <darshan-users at lists.mcs.anl.gov>
> *Subject:* Re: [Darshan-users] Darshan not tracing with tensorflow import.
>
>
>
> The I/O is not through tensorflow. It's through h5py and numpy.load, and I
> verified that both are traced without the tensorflow import. I already
> verified that importing tensorflow doesn’t change any environment
> variables.
>
>
>
> Hari
>
>
>
> *From: *Jeffrey Layton <laytonjb at gmail.com>
> *Sent: *Monday, July 6, 2020 9:41 AM
> *To: *Devarajan, Hariharan <hdevarajan at anl.gov>
> *Cc: *darshan-users at lists.mcs.anl.gov
> *Subject: *Re: [Darshan-users] Darshan not tracing with tensorflow import.
>
>
>
> About 2 months ago, I tried using Darshan to trace a TensorFlow2 DL
> training run. I could not trace the input. What I _think_ happens is that
> TF2 uses mmap() for reading the input files, and I don't think Darshan can
> capture that file I/O. But I'm not a Darshan expert, so perhaps someone has
> tried this before and can help.
>
>
>
> (BTW - it's possible to build TensorFlow so it doesn't use mmap() for
> reading files.)
>
>
>
> Jeff
>
>
> On Mon, Jul 6, 2020 at 1:08 PM Devarajan, Hariharan <hdevarajan at anl.gov>
> wrote:
>
>
>
> Hello,
>
>
>
> I was able to run my test program with Darshan 3.2.1. However, when I
> trace an app that loads tensorflow, it seems Darshan doesn't produce a
> trace. I am attaching two tars, one with a working example and one without.
> The only difference between the two is in test.py, where I import
> tensorflow on the first line.
>
>
>
> Can you please advise on how I can debug the problem further?
>
>
>
> Regards
>
> Hariharan
>
>
>
> _______________________________________________
> Darshan-users mailing list
> Darshan-users at lists.mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/darshan-users