<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=gb2312">

<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>

</head>

<body dir="ltr">

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

Hi,</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

<br>

</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

I am using Darshan to instrument PyTorch on a local machine. My workload is image classification on the ImageNet dataset. When the training process finished, Darshan had generated a large number of log files, for example:</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

<br>

</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

u2020000_python_id4719_6-15-41351-17690910011763757569_1.darshan     </div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

u2020000_python_id5012_6-15-42860-17690910011763757569_1.darshan

<div>u2020000_python_id4721_6-15-41352-17690910011763757569_1.darshan     </div>

<div>u2020000_uname_id4720_6-15-41351-17690910011763757569_1.darshan</div>

<div>u2020000_python_id4722_6-15-41352-17690910011763757569_1.darshan     </div>

<div>u2020000_uname_id4723_6-15-41354-17690910011763757569_1.darshan</div>

u2020000_python_id4758_6-15-41830-17690910011763757569_1.darshan     </div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

u2020000_uname_id4724_6-15-41354-17690910011763757569_1.darshan</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

...</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

<br>

</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

Running the darshan-util analysis tool on one of the log files above reports: I/O performance estimate (at the POSIX layer): transferred 7.5 MiB at 36.02 MiB/s</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

<br>

</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

The amount of data transferred shown in the PDF report is far less than the size of the whole dataset. <span style="color: rgb(0, 0, 0); font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt;">Since the PyTorch DataLoader uses multiple worker processes, I guess Darshan generates a separate log for each process.</span></div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

<br>

</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

My question is: how can I get an I/O analysis for the whole PyTorch workload instead of a separate analysis for each per-process log?</div>
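<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

To make the question concrete, here is a rough sketch of the aggregation I have in mind. It is only an illustration: the file-name pattern is inferred from the log names listed above, the helper names (group_logs_by_exe, total_bytes) are hypothetical, and the per-log byte totals would have to be filled in from the darshan-util output for each log, which this sketch does not parse.</div>

```python
import re
from collections import defaultdict

# Assumed layout, inferred from the log names above:
#   user_exe_idN_month-day-seconds-logid_n.darshan
LOG_NAME = re.compile(r"^([^_]+)_(.+)_id(\d+)_(\d+)-(\d+)-(\d+)-(\d+)_(\d+)\.darshan$")


def group_logs_by_exe(names):
    """Group log file names by executable name (e.g. 'python', 'uname')."""
    groups = defaultdict(list)
    for name in names:
        name = name.strip()
        match = LOG_NAME.match(name)
        if match is None:
            continue  # skip anything that does not follow the assumed pattern
        groups[match.group(2)].append(name)
    return dict(groups)


def total_bytes(per_log_bytes, names):
    """Sum per-log byte totals (supplied by the caller) over the given logs."""
    return sum(per_log_bytes[name] for name in names)
```

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

With this, the "python" group would hold the logs of the training process and its DataLoader workers, and the "uname" logs could be ignored. Summing the per-log totals over that group would give a workload-wide byte count, but not a combined bandwidth estimate or report, which is what I am really looking for.</div>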

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

<br>

</div>


</body>

</html>