<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>Hi Jeff,</p>
<p>Yes, it includes subdirs. You can kind of think of it as a
prefix match on the file paths.</p>
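<p>A rough sketch of the prefix-match idea in Python, purely as an
illustration and not Darshan's actual code. The helper name
is_excluded and the sample paths are made up for this example; the
exclude list is the one quoted further down in this thread.</p>
<pre>
```python
def is_excluded(path, exclude_dirs):
    """Return True if path begins with any excluded directory prefix."""
    return any(path.startswith(prefix) for prefix in exclude_dirs)


# Exclude list taken from the original question in this thread.
excludes = ["/proc", "/etc", "/dev", "/sys", "/snap", "/run"]

# A file several levels below an excluded directory still matches.
print(is_excluded("/proc/self/status", excludes))  # True

# A hypothetical data file outside the excluded trees does not match.
print(is_excluded("/home/jeff/cifar10/data_batch_1", excludes))  # False
```
</pre>
<p>Note that a plain string prefix like this would also match a
sibling name such as /process_data, so treat it only as a mental
model of the behavior.</p>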
<p>thanks,</p>
<p>-Phil<br>
</p>
<div class="moz-cite-prefix">On 7/27/21 8:07 AM, Jeffrey Layton
wrote:<br>
</div>
<blockquote type="cite" cite="mid:CAJfzO5RO37g_Ps3cK-_ow5wmq6ZV1UMAu2AAKJ-to1WtbY3KQw@mail.gmail.com">
<div dir="ltr">
<div>Oops. Did a reply instead of a reply-all.</div>
<div><br>
</div>
<div>One more question. When I specify a directory to exclude,
does it include all subdirectories as well?</div>
<div><br>
</div>
<div>Thanks!</div>
<div><br>
</div>
<div>Jeff</div>
<div><br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, Jul 27, 2021 at 12:01
PM Jeffrey Layton &lt;<a href="mailto:laytonjb@gmail.com" moz-do-not-send="true">laytonjb@gmail.com</a>&gt; wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div>Shane,</div>
<div><br>
</div>
<div>Thanks for the reply. I'm glad you're making these changes
to Darshan. This will have a big impact on profiling DL
workloads (I realize that wasn't a focus of Darshan
originally, but Darshan has become a victim of its own
success. When people mention 'IO profiling', they
immediately say 'Darshan').</div>
<div><br>
</div>
<div>DL is a tough workload. Let me give you an example. I
just ran a pretty simple model with 1.2M parameters. I
used the CIFAR-10 data set (a collection of images) and ran
PyTorch for 100 epochs (not even close to being fully
trained). During those 100 epochs, PyTorch opened 167,076
files. It closed 1,274,000 files (I measured this using
the strace output). It also used 1,206 threads. The
training code was all Python.</div>
<div><br>
</div>
<div>TensorFlow is better than PyTorch in regard to IO for
CIFAR-10. The model only used 555K parameters. For 100
epochs it opened 4,099 files. It closed 3,617 files. It
used 350 threads during this time.</div>
<div><br>
</div>
<div>You can see that DL frameworks do a lot of stuff in the
name of IO! Being able to track over 100K files is
probably not a bad idea (I might go as far as 1M files).</div>
<div><br>
</div>
<div>In the meantime, is there a limit to the number of
directories you can list in the excludes?</div>
<div><br>
</div>
<div>Thanks!</div>
<div><br>
</div>
<div>Jeff</div>
<div><br>
</div>
<div><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, Jul 26, 2021 at
9:43 PM Snyder, Shane &lt;<a href="mailto:ssnyder@mcs.anl.gov" target="_blank" moz-do-not-send="true">ssnyder@mcs.anl.gov</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">Hi
Jeff,</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)"><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">Existing
Darshan releases do have some hard-coded limits that
have been increasingly problematic for our users, it
seems. The limit you are likely hitting is that
Darshan instrumentation modules currently do not track
more than 1,024 file records. This isn't really
tunable in any way, unfortunately.</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)"><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">You
can get a list of files that Darshan did instrument
by running darshan-parser with the '--file-list'
option. That might give you some more ideas on
directories you could potentially exclude to force
Darshan to reserve instrumentation resources for
other files, but that may not even be sufficient
depending on your workload.<br>
</div>
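<div><br>
</div>
<div>One way to turn such a file list into exclude candidates is to
count how many tracked files fall under each leading directory. The
Python below is only a sketch under assumptions: it expects that the
file paths have already been pulled out of the '--file-list' output
into a list (that parsing step is not shown), and both the function
name top_level_counts and the sample paths are hypothetical.</div>
<pre>
```python
from collections import Counter


def top_level_counts(paths, depth=1):
    """Count paths by their first `depth` directory components."""
    counts = Counter()
    for p in paths:
        parts = [c for c in p.split("/") if c]
        prefix = "/" + "/".join(parts[:depth])
        counts[prefix] += 1
    return counts


# Hypothetical paths standing in for entries from the file list.
paths = [
    "/usr/lib/python3/site.py",
    "/usr/lib/python3/os.py",
    "/home/jeff/cifar10/data_batch_1",
]

# Directories holding the most tracked files are printed first.
for prefix, n in top_level_counts(paths).most_common():
    print(prefix, n)
```
</pre>
<div>Directories near the top of that listing that are unrelated to
the application's own data are the natural candidates to exclude.</div>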
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)"><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">We
do have some functionality we are hoping to have
merged in for our next release to help address this
issue. In fact, it's available to try out in a
branch in our repo if you're really motivated to get
this working soon. There are more details here in a
PR on our GitHub: <a href="https://github.com/darshan-hpc/darshan/pull/405" id="gmail-m_1040552738000251815gmail-m_5528091281693620561LPlnk476838" target="_blank" moz-do-not-send="true">
https://github.com/darshan-hpc/darshan/pull/405</a><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)"><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">Essentially,
you can use a config file to control a number of
different Darshan settings, including the ability to
change the hard-coded file maximum described above and to
provide regular expressions (rather than just
directory names) for files Darshan should exclude
from instrumentation. If you have more specific
questions or feedback about this functionality,
please let us know.<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)"><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">Thanks!</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">--Shane<br>
</div>
<hr style="display:inline-block;width:98%">
<div id="gmail-m_1040552738000251815gmail-m_5528091281693620561divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b>
Darshan-users &lt;<a href="mailto:darshan-users-bounces@lists.mcs.anl.gov" target="_blank" moz-do-not-send="true">darshan-users-bounces@lists.mcs.anl.gov</a>&gt;
on behalf of Jeffrey Layton &lt;<a href="mailto:laytonjb@gmail.com" target="_blank" moz-do-not-send="true">laytonjb@gmail.com</a>&gt;<br>
<b>Sent:</b> Monday, July 26, 2021 9:15 AM<br>
<b>To:</b> <a href="mailto:darshan-users@lists.mcs.anl.gov" target="_blank" moz-do-not-send="true">darshan-users@lists.mcs.anl.gov</a>
&lt;<a href="mailto:darshan-users@lists.mcs.anl.gov" target="_blank" moz-do-not-send="true">darshan-users@lists.mcs.anl.gov</a>&gt;<br>
<b>Subject:</b> [Darshan-users] Error in
job_summary</font>
<div> </div>
</div>
<div>
<div dir="ltr">
<div>Good morning,</div>
<div><br>
</div>
<div>I'm post-processing a Darshan log for a
TensorFlow training run of a simple model
(CIFAR-10). The post-processing completes just
fine, but I see an error on the first page:</div>
<div><br>
</div>
<div><br>
</div>
<div>WARNING: This Darshan log contains incomplete data. This
happens when a module runs out of memory to store new record
data. Please run darshan-parser on the log file for more
information.</div>
<div><br>
</div>
<div>So I ran darshan-parser on the file and I see
the following at the end.</div>
<div><br>
</div>
<div><br>
</div>
<div>#
*******************************************************<br>
# POSIX module data<br>
#
*******************************************************<br>
<br>
# *ERROR*: The POSIX module contains incomplete
data!<br>
# This happens when a module runs out
of<br>
# memory to store new record data.<br>
<br>
# To avoid this error, consult the
darshan-runtime<br>
# documentation and consider setting the<br>
# DARSHAN_EXCLUDE_DIRS environment variable to
prevent<br>
# Darshan from instrumenting unecessary files.<br>
<br>
# You can display the (incomplete) data that is<br>
# present in this log using the
--show-incomplete<br>
# option to darshan-parser.</div>
<div><br>
</div>
<div><br>
</div>
<div>I have a bunch of file systems excluded:
/proc,/etc,/dev,/sys,/snap,/run . <br>
</div>
<div><br>
</div>
<div>How can I get a list of files that Darshan
tracked? Is there a way to increase the amount
of memory?</div>
<div><br>
</div>
<div>Thanks!</div>
<div><br>
</div>
<div>Jeff</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<pre class="moz-quote-pre" wrap="">_______________________________________________
Darshan-users mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Darshan-users@lists.mcs.anl.gov">Darshan-users@lists.mcs.anl.gov</a>
<a class="moz-txt-link-freetext" href="https://lists.mcs.anl.gov/mailman/listinfo/darshan-users">https://lists.mcs.anl.gov/mailman/listinfo/darshan-users</a>
</pre>
</blockquote>
</body>
</html>