<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:"DengXian Light";
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:"\@DengXian Light";}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
p.xmsonormal, li.xmsonormal, div.xmsonormal
{mso-style-name:x_msonormal;
margin:0in;
font-size:12.0pt;
font-family:"Calibri",sans-serif;}
span.EmailStyle22
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal">Bogdan,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">OK. On the part that we work on in Autoperf, we have received a request for a runtime variable to turn the loading of modules on and off, and we will work on that soon, but that is more on the darshan-runtime side. I think you are referring to
the darshan-utils code, though. If possible, it would be good to have a sample list of these constants in the code. Shane can probably comment better if this is about darshan-utils.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="margin-bottom:12.0pt"><b><span style="font-size:12.0pt;color:black">From:
</span></b><span style="font-size:12.0pt;color:black">Nicolae, Bogdan &lt;bnicolae@anl.gov&gt;<br>
<b>Date: </b>Monday, June 7, 2021 at 11:12 AM<br>
<b>To: </b>Chunduri, Sudheer &lt;sudheer@anl.gov&gt;, Jie Liu &lt;jliu279@ucmerced.edu&gt;, darshan-users@lists.mcs.anl.gov &lt;darshan-users@lists.mcs.anl.gov&gt;<br>
<b>Cc: </b>Si, Min &lt;msi@anl.gov&gt;<br>
<b>Subject: </b>Re: Problems about Darshan Logs<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;color:black">Chunduri,<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;color:black"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;color:black">We finally fixed the problem. Darshan simply has a lot of hardcoded constants in the code, so we compiled our own version. Based on our experience, it would be better to expose these constants as
environment variables.<o:p></o:p></span></p>
</div>
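<div>
<p class="MsoNormal"><span style="font-size:12.0pt;color:black">As an illustration of the suggestion above, here is a minimal C sketch of a limit that falls back to a compile-time default but can be overridden through an environment variable. The names <b>EXAMPLE_MAX_RECORDS_DEFAULT</b> and <b>limit_from_env</b>, and the variable name used in the note below, are hypothetical and are not taken from the Darshan source.<o:p></o:p></span></p>
</div>

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical compile-time default, standing in for a hardcoded constant. */
#define EXAMPLE_MAX_RECORDS_DEFAULT 1024UL

/*
 * Return the value of the environment variable `name` parsed as a positive
 * decimal integer. Return `fallback` if the variable is unset, empty,
 * malformed, zero, or out of range for unsigned long.
 */
static unsigned long limit_from_env(const char *name, unsigned long fallback)
{
    assert(name != NULL);

    const char *text = getenv(name);
    if (text == NULL || *text == '\0')
        return fallback;

    /* strtoul accepts leading whitespace and signs; require a digit first. */
    if (text[0] < '0' || text[0] > '9')
        return fallback;

    errno = 0;
    char *end = NULL;
    unsigned long value = strtoul(text, &end, 10);

    /* Reject overflow, trailing garbage, and a zero limit. */
    if (errno != 0 || end == text || *end != '\0' || value == 0)
        return fallback;

    return value;
}
```

<div>
<p class="MsoNormal"><span style="font-size:12.0pt;color:black">A caller would read the limit once at startup, for example limit_from_env("EXAMPLE_MAX_RECORDS", EXAMPLE_MAX_RECORDS_DEFAULT), so that users can raise it without recompiling.<o:p></o:p></span></p>
</div>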
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;color:black"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;color:black">Cheers,<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;color:black">Bogdan<o:p></o:p></span></p>
</div>
<div class="MsoNormal" align="center" style="text-align:center">
<hr size="0" width="100%" align="center">
</div>
<div id="divRplyFwdMsg">
<p class="MsoNormal"><b><span style="color:black">From:</span></b><span style="color:black"> Chunduri, Sudheer &lt;sudheer@anl.gov&gt;<br>
<b>Sent:</b> Monday, June 7, 2021 9:54 AM<br>
<b>To:</b> Jie Liu &lt;jliu279@ucmerced.edu&gt;; darshan-users@lists.mcs.anl.gov &lt;darshan-users@lists.mcs.anl.gov&gt;<br>
<b>Cc:</b> Nicolae, Bogdan &lt;bnicolae@anl.gov&gt;; Si, Min &lt;msi@anl.gov&gt;<br>
<b>Subject:</b> Re: Problems about Darshan Logs</span> <o:p></o:p></p>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="xmsonormal"><span style="font-size:11.0pt">Hi Jie,</span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt">I see you copied the darshan-users mailing list, so Shane should hopefully see this.</span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt">Meanwhile, have you tried using "darshan-parser --show-incomplete"?</span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="xmsonormal" style="margin-bottom:12.0pt"><b><span style="color:black">From:
</span></b><span style="color:black">Darshan-users &lt;darshan-users-bounces@lists.mcs.anl.gov&gt; on behalf of Jie Liu &lt;jliu279@ucmerced.edu&gt;<br>
<b>Date: </b>Monday, June 7, 2021 at 9:42 AM<br>
<b>To: </b>darshan-users@lists.mcs.anl.gov &lt;darshan-users@lists.mcs.anl.gov&gt;<br>
<b>Cc: </b>Nicolae, Bogdan &lt;bnicolae@anl.gov&gt;, Si, Min &lt;msi@anl.gov&gt;<br>
<b>Subject: </b>[Darshan-users] Problems about Darshan Logs</span><o:p></o:p></p>
</div>
<p class="xmsonormal"><span style="font-size:11.0pt;color:black">Hi,</span><o:p></o:p></p>
<p class="xmsonormal"><span style="color:black"> </span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt;color:black">I used Darshan to do some profiling work when training Deep Learning models on ThetaGPU (Resnet50 on ImageNet, mini-batch size is 32).</span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt;color:black">When I used the following command to get the summary of darshan logs:</span><o:p></o:p></p>
<p class="xmsonormal"><b><span style="font-size:11.0pt;font-family:"DengXian Light";color:red">darshan-job-summary.pl</span></b><span style="font-size:11.0pt;font-family:"DengXian Light";color:red"> </span><span style="font-size:11.0pt;font-family:"DengXian Light";color:black">/path/to/.darshan
--output /path/to/summary.pdf</span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt;color:black">The resulting summary.pdf file contains the following error message on the first page:</span><o:p></o:p></p>
<p class="xmsonormal"><b><span style="font-size:11.0pt;font-family:"DengXian Light";color:red">WARNING</span></b><span style="font-size:11.0pt;font-family:"DengXian Light";color:black">: This Darshan log contains incomplete data. This happens when a module
runs out of memory to store new record data. Please run darshan-parser on the log file for more information.</span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt;color:black">I also tried darshan-parser with the following command:</span><o:p></o:p></p>
<p class="xmsonormal"><b><span style="font-size:11.0pt;font-family:"DengXian Light";color:red">darshan-parser</span></b><span style="font-size:11.0pt;font-family:"DengXian Light";color:red"> </span><span style="font-size:11.0pt;font-family:"DengXian Light";color:black">/path/to/.darshan
--output /path/to/summary.txt</span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt;color:black">It also shows an incomplete data error:</span><o:p></o:p></p>
<p class="xmsonormal"><b><span style="font-size:11.0pt;font-family:"DengXian Light";color:red">*ERROR*:</span></b><span style="font-size:11.0pt;font-family:"DengXian Light";color:red"> </span><span style="font-size:11.0pt;font-family:"DengXian Light";color:black">The
POSIX module contains incomplete data! This happens when a module runs out of memory to store new record data.</span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt;color:black">The ImageNet dataset contains about 1.3 million image files, but the Darshan log shows only 14792 opened files when I trained Resnet50 on ThetaGPU with 2 nodes and 16 GPUs. (<b>Please
check the attached file for more information about the logs obtained by Darshan</b>).</span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt;color:black">Is there an efficient way to make the darshan logs contain all the I/O information of 1.3 million image files during the model training?</span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt;color:black"> </span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt;color:black">Previously, I contacted the support team, and their response was "the POSIX module only tracks 1024 files, once we open 1025 files Darshan no longer tracks those files". How can I make the POSIX
module track all the image files?</span><o:p></o:p></p>
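<p class="xmsonormal"><span style="font-size:11.0pt;color:black">The behavior described in the quoted response can be pictured with a small C sketch of a fixed-capacity record table: once the table is full, opens of new files are dropped and a flag marks the data as incomplete. This is only an illustration under stated assumptions, not Darshan's actual implementation. Every name here (example_table, example_track_open, EXAMPLE_MAX_RECORDS) is hypothetical, and the cap of 1024 simply mirrors the figure quoted above.</span><o:p></o:p></p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define EXAMPLE_MAX_RECORDS 1024 /* hypothetical cap, mirroring the quoted figure */
#define EXAMPLE_NAME_LEN 64      /* hypothetical maximum stored name length */

struct example_record {
    char name[EXAMPLE_NAME_LEN];
    unsigned long opens;
};

struct example_table {
    struct example_record records[EXAMPLE_MAX_RECORDS];
    size_t count;    /* number of files currently tracked */
    bool incomplete; /* set once a new file had to be dropped */
};

/*
 * Record one open of `name`. Returns true if the file is tracked,
 * false if the table was full and the open was dropped.
 */
static bool example_track_open(struct example_table *table, const char *name)
{
    assert(table != NULL && name != NULL);

    /* Already tracked: update the existing record. */
    for (size_t i = 0; i < table->count; i++) {
        if (strncmp(table->records[i].name, name, EXAMPLE_NAME_LEN - 1) == 0) {
            table->records[i].opens++;
            return true;
        }
    }

    /* Table full: drop the new file and remember that data is missing. */
    if (table->count == EXAMPLE_MAX_RECORDS) {
        table->incomplete = true;
        return false;
    }

    /* Room left: start a new record (names are truncated to fit). */
    struct example_record *rec = &table->records[table->count++];
    strncpy(rec->name, name, EXAMPLE_NAME_LEN - 1);
    rec->name[EXAMPLE_NAME_LEN - 1] = '\0';
    rec->opens = 1;
    return true;
}
```

<p class="xmsonormal"><span style="font-size:11.0pt;color:black">In this sketch, files that are already tracked keep accumulating counts after the cap is reached. Only files seen for the first time after that point are lost.</span><o:p></o:p></p>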
<p class="xmsonormal"><span style="font-size:11.0pt;color:black"> </span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt;color:black">For the model training on ThetaGPU using 2 nodes and 16 GPUs, my experimental results show that every process handles 14792/16 = 924 image files on average, which is actually less than
1024. How can this be explained?</span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt;color:black"> </span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt;color:black">Thanks for your help.</span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt;color:black"> </span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt;color:black">Best regards,</span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt;color:black">--</span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt;color:black">Jie Liu</span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
</div>
</div>
</div>
</body>
</html>