<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-2022-jp">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Thank you for the update, Lu.</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Let me see if I can write a test that generates a few thousand file records, like your Test Case 3. That seems to be where you start hitting problems, so it would be good for me to understand what is still limiting you. I can also check whether anything obvious
 breaks our use of zlib compression when users really start dialing up Darshan's memory usage. With those two things resolved, maybe we can get closer to complete coverage on workloads like this.
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
I'll see if I can get an updated branch for you to try soon, if you're interested.<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
--Shane<br>
</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Lu Weizheng <luweizheng36@hotmail.com><br>
<b>Sent:</b> Tuesday, July 6, 2021 4:24 AM<br>
<b>To:</b> Snyder, Shane <ssnyder@mcs.anl.gov><br>
<b>Cc:</b> darshan-users@lists.mcs.anl.gov <darshan-users@lists.mcs.anl.gov><br>
<b>Subject:</b> Re: Using darshan to instrument PyTorch</font>
<div> </div>
</div>
<div class="" style="word-wrap:break-word; line-break:after-white-space"><font size="3" class="">Hi Shane,</font>
<div class=""><font size="3" class=""><br class="">
</font></div>
<div class=""><font size="3" class="">Thank you so much for your reply!</font></div>
<div class=""><font size="3" class="">I have tested your branch, and it seems there are still some problems. </font></div>
<div class=""><font size="3" class=""><br class="">
</font></div>
<div class=""><font size="3" class="">My dataset has the typical ImageNet file structure, shown below. </font></div>
<div class=""><br class="">
</div>
<div class=""><font size="3" class="">train/<br class="">
|-- n01440764<br class="">
|   |-- n01440764_10026.JPEG<br class="">
|   |-- n01440764_10027.JPEG<br class="">
|   |-- n01440764_10029.JPEG<br class="">
|   |-- n01440764_10040.JPEG<br class="">
|   |-- n01440764_10042.JPEG</font></div>
<div class=""><font size="3" class="">...</font></div>
<div class=""><font size="3" class="">val/</font></div>
<div class=""><font size="3" class="">...</font></div>
<div class=""><font size="3" class=""><br class="">
</font></div>
<div class=""><span class="" style="font-size:medium">n01440764 is the ID of one of the dataset's 1000 classes. The train folder contains 1000 such subfolders, one per class, and each class represents a different kind of object in the images.</span><br class="">
</div>
<div class=""><font size="3" class=""><br class="">
</font></div>
<div class=""><span class="" style="font-size:medium">I have two filesystems: a local SSD with XFS on the compute node, and a Lustre filesystem. I ran the tests on both filesystems and got the same results.</span></div>
<div class=""><font size="3" class="">Here is what I did to test Darshan with Python.</font></div>
<div class=""><font size="3" class=""><br class="">
</font></div>
<div class=""><font size="3" class="">I compiled Darshan from the 'snyder/dev-log-filters' branch.</font></div>
<div class=""><font size="3" class="">Test Cases 1-4 are based on a simple image-reading Python program. The program just walks through some folders and uses the Python PIL library (the most commonly used image-reading library in the PyTorch computer vision community)
 to read the JPEG images into memory and convert them to RGB. </font></div>
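<div class=""><font size="3" class="">A minimal sketch of what such a program might look like is below. It is not the exact script used for these tests: the names load_rgb and read_images are hypothetical, and matching files by the .JPEG suffix is an assumption based on the listing above.</font></div>
<pre>
```python
import os


def load_rgb(path):
    """Open one image with PIL and force a full decode by converting it to RGB."""
    # Pillow is imported lazily so the directory walk itself has no hard dependency on it.
    from PIL import Image

    with Image.open(path) as img:
        return img.convert("RGB")


def read_images(root, loader=load_rgb, suffix=".JPEG"):
    """Walk root recursively, load every matching file, and return how many were read."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(suffix):
                loader(os.path.join(dirpath, name))
                count += 1
    return count
```
</pre>
<div class=""><font size="3" class="">Calling read_images("train/n01440764") would read a single class folder as in Test Case 1. The returned count is the number of files the program read, which is the number to compare against the Darshan log.</font></div>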
<div class=""><font size="3" class=""><br class="">
</font></div>
<div class=""><font size="3" class="">Test Case 1:</font></div>
<div class=""><font size="3" class=""><b class="">WITHOUT</b> DARSHAN_CONF_PATH and DARSHAN_MODMEM=2048. </font></div>
<div class=""><font size="3" class="">I read only one folder, e.g. n01440764. There are 1300 JPEG images in this folder, ranging in size from 10 KB to 100 KB.</font></div>
<div class=""><font size="3" class="">The collected log shows that the POSIX module contains incomplete data.</font></div>
<div class=""><font size="3" class=""><br class="">
</font></div>
<div class=""><span class="" style="font-size:medium">Test Case 2:</span></div>
<div class=""><span class="" style="font-size:medium"><b class="">WITH</b> DARSHAN_CONF_PATH pointing to a config file that sets </span><font size="3" class="">MAX_RECORDS to a very large value (1200000),
</font><span class="" style="font-size:medium">and DARSHAN_MODMEM=2048. </span></div>
<div class=""><span class="" style="font-size:medium">The Python program and the folder I read are the same as in Test Case 1. The data is no longer </span><font size="3" class="">incomplete. Using grep on the darshan-parser output shows that the number of recorded files
 is exactly what the Python program reads. Total bytes read is correct.</font></div>
<div class=""><font size="3" class="">So </font><span class="" style="font-size:medium">DARSHAN_CONF_PATH does appear to take effect. The Darshan log before parsing is </span><font size="3" class="">141K. The darshan-parser output is 23M.</font></div>
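<div class=""><font size="3" class="">The grep check can also be sketched in Python. This assumes darshan-parser's tab-separated text output, where each counter line has the columns module, rank, record id, counter, value, file name, mount point and fs type. The helper name count_posix_files is hypothetical.</font></div>
<pre>
```python
def count_posix_files(parser_lines):
    """Count distinct file names among POSIX counter lines of darshan-parser text output."""
    names = set()
    for line in parser_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        # Assumed column layout:
        # module, rank, record id, counter, value, file name, mount pt, fs type
        if len(fields) >= 6 and fields[0] == "POSIX":
            names.add(fields[5])
    return len(names)
```
</pre>
<div class=""><font size="3" class="">Passing an open file containing the darshan-parser output gives the number of files with POSIX records, which can be compared with the number of images the Python program read.</font></div>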
<div class=""><span class="" style="font-size:medium"><br class="">
</span></div>
<div class=""><span class="" style="font-size:medium">Test Case 3:</span></div>
<div class=""><span class="" style="font-size:medium">WITH DARSHAN_CONF_PATH pointing to a config file that sets </span><font size="3" class="">MAX_RECORDS to a very large value (1200000), </font><span class="" style="font-size:medium">and DARSHAN_MODMEM=2048. </span></div>
<div class=""><span class="" style="font-size:medium">I read more folders in the Python program. In total the program reads
<b class="">2 folders and 2600 images</b>. With 2 or more folders, the log shows i</span><span class="" style="font-size:medium">ncomplete data again. </span><span class="" style="font-size:medium">Using grep on the darshan-parser output shows that the number of recorded
 files is 100 fewer than what the Python program reads.</span></div>
<div class=""><span class="" style="font-size:medium"><br class="">
</span></div>
<div class=""><span class="" style="font-size:medium">Test Case 4:</span></div>
<div class=""><span class="" style="font-size:medium">WITH DARSHAN_CONF_PATH pointing to a config file that sets </span><font size="3" class="">MAX_RECORDS to a very large value (1200000), </font><span class="" style="font-size:medium">and<b class=""> DARSHAN_MODMEM=4096</b>. </span></div>
<div class=""><font size="3" class="">The Python program is the same as in Test Case 1. Now I get:<br class="">
darshan_library_warning: error compressing job record.<br class="">
darshan_library_warning: unable to write job record to file.</font></div>
<div class=""><font size="3" class="">These warnings are probably related to a problem I mentioned earlier in this thread, and may be a zlib-related problem (</font><a href="https://github.com/darshan-hpc/darshan/blob/e85b8bc929da91e54ff68fb1210dfe7bee3261a3/darshan-runtime/lib/darshan-core.c#L2039" class="">https://github.com/darshan-hpc/darshan/blob/e85b8bc929da91e54ff68fb1210dfe7bee3261a3/darshan-runtime/lib/darshan-core.c#L2039</a><span class="" style="font-size:medium">).</span></div>
<div class=""><span class="" style="font-size:medium"><br class="">
</span></div>
<div class=""><span class="" style="font-size:medium">Test Case 5:</span></div>
<div class=""><span class="" style="font-size:medium"><b class="">WITH</b> DARSHAN_CONF_PATH pointing to a config file that sets </span><font size="3" class="">MAX_RECORDS to a very large value
(<b class="">3000000</b>), </font><span class="" style="font-size:medium">and DARSHAN_MODMEM=2048. </span></div>
<div class=""><span class="" style="font-size:medium">I used a<b class=""> typical PyTorch ImageNet training program</b>, which includes image reading, data processing and neural network training. </span><span class="" style="font-size:medium">darshan-parser
 shows that Darshan could not record all the counters correctly. The logs are incomplete and total bytes read is 0.</span></div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class=""><font size="3" class="">So I guess DARSHAN_CONF_PATH does take effect. But for a larger number of files, Darshan's POSIX module may run out of memory.</font></div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class="">
<div><br class="">
<blockquote type="cite" class="">
<div class="">On July 3, 2021, at 4:28 AM, Snyder, Shane <<a href="mailto:ssnyder@mcs.anl.gov" class="">ssnyder@mcs.anl.gov</a>> wrote:</div>
<br class="x_Apple-interchange-newline">
<div class="">
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
Hi Lu,</div>
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br class="">
</div>
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
Sure, I can give you some details on how to use it. Most of the details are actually contained in this PR: <a href="https://github.com/darshan-hpc/darshan/pull/405" id="LPlnk470435" class="">https://github.com/darshan-hpc/darshan/pull/405</a><br class="">
</div>
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br class="">
</div>
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
So, to use it, you would need to check out the branch that PR is based on (<a title="darshan-hpc/darshan:snyder/dev-log-filters" class="x_no-underline" href="https://github.com/darshan-hpc/darshan/tree/snyder/dev-log-filters">'snyder/dev-log-filters'</a>)
 and build it just like you would normally build Darshan.<br class="">
</div>
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br class="">
</div>
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
The only additional trick is that you can specify a config file for Darshan to use at runtime via the DARSHAN_CONF_PATH environment variable (i.e., export DARSHAN_CONF_PATH=/path/to/my/darshan/config/file). You can add whatever lines you need to your config
 file to control various aspects of Darshan's runtime behavior as outlined in the PR. Most relevant for you is probably just the ability to request that the POSIX module use more than the default 1,024 records, like this:</div>
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br class="">
</div>
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
# request POSIX store 1.2 million file records rather than 1024 default<br class="">
</div>
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
MAX_RECORDS       1200000                    POSIX<br class="">
</div>
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br class="">
</div>
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
You may also consider using NAME_EXCLUDE options to provide regular expressions of files to ignore that are not related to your ImageNet test case.</div>
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br class="">
</div>
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
# e.g., ignore files in /home and files that end in .txt</div>
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
NAME_EXCLUDE    ^/home    *</div>
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
NAME_EXCLUDE    .txt$          *<br class="">
</div>
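<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
As a rough illustration, the config lines above and the two environment variables discussed in this thread could be set up from Python before launching the instrumented program. The helper names write_darshan_config and darshan_env are hypothetical, and since the branch is experimental the option names may change.</div>
<pre>
```python
import os


def write_darshan_config(path, max_records=1200000, exclude_patterns=()):
    """Write a config file using the MAX_RECORDS and NAME_EXCLUDE lines quoted above."""
    lines = ["MAX_RECORDS    %d    POSIX" % max_records]
    for pattern in exclude_patterns:
        # "*" applies the exclusion to all modules, as in the example lines above.
        lines.append("NAME_EXCLUDE    %s    *" % pattern)
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return path


def darshan_env(conf_path, modmem=2048, base_env=None):
    """Return a copy of the environment with the two Darshan variables from this thread set."""
    env = dict(os.environ if base_env is None else base_env)
    env["DARSHAN_CONF_PATH"] = conf_path
    env["DARSHAN_MODMEM"] = str(modmem)
    return env
```
</pre>
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
The returned dictionary can be passed as the env argument of subprocess.run when starting the Python workload. Preloading the Darshan library itself is assumed to be configured separately.</div>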
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br class="">
</div>
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
I have not tried to get Darshan to instrument such a massive single process workload, but will be interested to see if you have success. As I mentioned in my previous email, you'll probably want to bump DARSHAN_MODMEM up to around 2 GB to handle this, at the
 very least.<br class="">
</div>
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br class="">
</div>
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
Another disclaimer I'll mention is that, since this stuff is experimental, some of these steps or naming conventions could change by the time we merge this into our main branch for eventual release. Not a big deal for now, but just a heads up.<br class="">
</div>
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br class="">
</div>
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
Thanks,</div>
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
--Shane<br class="">
</div>
<div id="x_appendonsend" class="" style="font-family:Helvetica; font-size:12px; font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none">
</div>
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br class="">
</div>
<hr tabindex="-1" class="" style="font-family:Helvetica; font-size:12px; font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; display:inline-block; width:748.71875px">
<span class="" style="font-family:Helvetica; font-size:12px; font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; float:none; display:inline!important"></span>
<div id="x_divRplyFwdMsg" dir="ltr" class="" style="font-family:Helvetica; font-size:12px; font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none">
<font face="Calibri, sans-serif" class="" style="font-size:11pt"><b class="">From:</b><span class="x_Apple-converted-space"> </span>Lu Weizheng <<a href="mailto:luweizheng36@hotmail.com" class="">luweizheng36@hotmail.com</a>><br class="">
<b class="">Sent:</b><span class="x_Apple-converted-space"> </span>Friday, July 2, 2021 4:02 AM<br class="">
<b class="">To:</b><span class="x_Apple-converted-space"> </span>Snyder, Shane <<a href="mailto:ssnyder@mcs.anl.gov" class="">ssnyder@mcs.anl.gov</a>><br class="">
<b class="">Subject:</b><span class="x_Apple-converted-space"> </span>Re: Using darshan to instrument PyTorch</font>
<div class=""> </div>
</div>
<div class="" style="font-family:Helvetica; font-size:12px; font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; word-wrap:break-word; line-break:after-white-space">
Hi Shane,
<div class=""><br class="">
</div>
<div class="">Could you tell me more about the experimental branch? Is it on GitHub? I would like to try it.</div>
<div class=""><br class="">
</div>
<div class="">Thanks!<br class="">
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">On June 18, 2021, at 11:11 PM, Snyder, Shane <<a href="mailto:ssnyder@mcs.anl.gov" class="">ssnyder@mcs.anl.gov</a>> wrote:</div>
<br class="x_x_Apple-interchange-newline">
<div class=""><span class="" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:16px; font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; float:none; display:inline!important">Those
 changes are in an experimental branch right now while I fine tune the implementation, but if you're interested in trying it out I could give you some details.</span></div>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</body>
</html>