[Darshan-users] Using darshan to instrument PyTorch

Lu Weizheng luweizheng36 at hotmail.com
Tue Jun 15 03:43:41 CDT 2021


Hi,

I am using Darshan to instrument PyTorch on a local machine. My workload is an image classification problem on the ImageNet dataset. When the training process ends, a lot of logs have been generated, for example:

u2020000_python_id4719_6-15-41351-17690910011763757569_1.darshan
u2020000_python_id5012_6-15-42860-17690910011763757569_1.darshan
u2020000_python_id4721_6-15-41352-17690910011763757569_1.darshan
u2020000_uname_id4720_6-15-41351-17690910011763757569_1.darshan
u2020000_python_id4722_6-15-41352-17690910011763757569_1.darshan
u2020000_uname_id4723_6-15-41354-17690910011763757569_1.darshan
u2020000_python_id4758_6-15-41830-17690910011763757569_1.darshan
u2020000_uname_id4724_6-15-41354-17690910011763757569_1.darshan
...
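
To see which executables these logs belong to, here is a minimal sketch that groups the file names by executable name. It assumes the names follow the layout <user>_<exe>_id<id>_<month>-<day>-<seconds>-<logid>_<n>.darshan, which is inferred from the listing above; the regex and the function name group_by_executable are my own and not part of Darshan.

```python
import re
from collections import defaultdict

# Assumed file name layout (inferred from the listing, not taken from Darshan's source):
#   <user>_<exe>_id<id>_<month>-<day>-<seconds>-<logid>_<n>.darshan
LOG_NAME = re.compile(
    r"^(?P<user>[^_]+)_(?P<exe>.+)_id(?P<id>\d+)_"
    r"(?P<month>\d+)-(?P<day>\d+)-(?P<seconds>\d+)-(?P<logid>\d+)_(?P<n>\d+)"
    r"\.darshan$"
)


def group_by_executable(names):
    """Group Darshan log file names by the executable name embedded in them.

    Names that do not match the assumed layout are skipped.
    """
    groups = defaultdict(list)
    for name in names:
        match = LOG_NAME.match(name)
        if match is None:
            continue
        groups[match.group("exe")].append(name)
    return dict(groups)


if __name__ == "__main__":
    logs = [
        "u2020000_python_id4719_6-15-41351-17690910011763757569_1.darshan",
        "u2020000_uname_id4720_6-15-41351-17690910011763757569_1.darshan",
    ]
    for exe, files in group_by_executable(logs).items():
        print(exe, len(files))
```

With this grouping, the logs from the uname helper processes can be set aside and only the python ones examined.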

After running the darshan-util analysis tool on one of the log files above, the report shows: I/O performance estimate (at the POSIX layer): transferred 7.5 MiB at 36.02 MiB/s

The amount of data transferred shown in the PDF report is far less than the size of the whole dataset. Since the PyTorch DataLoader uses multiple processes, I guess Darshan generates a separate log for each process.

My question is: how can I get an I/O analysis for the whole PyTorch workload instead of these per-process logs?
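
One workaround I can imagine, sketched here under assumptions and not as an official Darshan method, is to dump each log to text with darshan-parser and add up a counter such as POSIX_BYTES_READ across all of them. The sketch assumes each data line of the text output has the whitespace-separated fields <module> <rank> <record id> <counter> <value> ..., and that comment lines start with "#". The function names are hypothetical helpers of mine.

```python
import subprocess


def sum_counter(parser_output, counter):
    """Sum one integer counter over all records in darshan-parser text output.

    Assumes data lines look like:
        <module> <rank> <record id> <counter> <value> <file name> ...
    and that header/comment lines start with '#'.
    """
    total = 0
    for line in parser_output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 5 and fields[3] == counter:
            total += int(fields[4])
    return total


def total_over_outputs(outputs, counter):
    """Sum a counter over the text output of several logs (one per process)."""
    return sum(sum_counter(text, counter) for text in outputs)


def dump_log(path):
    """Run the darshan-parser command on one log and return its text output.

    Requires darshan-util to be installed and darshan-parser to be on PATH.
    """
    result = subprocess.run(
        ["darshan-parser", path], check=True, capture_output=True, text=True
    )
    return result.stdout


if __name__ == "__main__":
    import glob

    # Only the logs from the python processes; adjust the pattern as needed.
    outputs = [dump_log(p) for p in sorted(glob.glob("*_python_id*.darshan"))]
    print(total_over_outputs(outputs, "POSIX_BYTES_READ"), "bytes read in total")
```

This only gives a total byte count per counter; it does not reproduce the timing or bandwidth estimate of the PDF report, so it would not replace a proper whole-job summary.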

