[Darshan-users] darshan-job-summary.pl reports non-stop IO

David Shrader dshrader at lanl.gov
Tue May 7 17:10:15 CDT 2013


Hello,

In working a bit with Darshan, I have noticed that the "Timespan from 
first to last read/write access on independent files" graphs report 
continuous IO operations even when processes are not actively doing IO 
to their files. To illustrate what I mean, I have attached a job summary 
from an IO-reproducing program called fs_test, which writes an N-to-N 
pattern using 32 processes. Before each write, fs_test verifies its 
buffers and then issues the actual write command. Additionally, fs_test 
verifies its buffers after each read. However, the "Timespan from first 
to last access" graphs appear to show continuous IO. 
Judging by tracing data, time in the verification functions accounts for 
about 75% of the total time of the job, whereas the rest of the time is 
spent doing actual IO operations (which is accurately reflected in the 
"Average I/O cost per process" graph in the summary), so the depiction 
of continuous IO in those graphs isn't correct. I'm trying to figure out 
if this discrepancy is just the way darshan-job-summary.pl creates the 
Timespan graph or if something else might be to blame.
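
To make the distinction concrete, here is a minimal sketch of how a 
first-to-last span differs from the time actually spent inside IO calls. 
It is only an illustration: the function name and the timing values are 
hypothetical, and it is an assumption about what such a graph might 
plot, not a description of how darshan-job-summary.pl works.

```python
def span_and_busy(intervals):
    """Return (first-to-last span, summed time inside I/O calls).

    intervals: list of (start, end) times in seconds for one process.
    The calls are assumed not to overlap, as in a sequential loop.
    """
    if not intervals:
        return 0.0, 0.0
    first_start = min(start for start, _ in intervals)
    last_end = max(end for _, end in intervals)
    span = last_end - first_start
    busy = sum(end - start for start, end in intervals)
    return span, busy


# Hypothetical pattern: 3 s of verification, then a 1 s write, four times.
writes = [(3.0, 4.0), (7.0, 8.0), (11.0, 12.0), (15.0, 16.0)]
span, busy = span_and_busy(writes)
print(f"first-to-last span: {span:.1f} s, time inside writes: {busy:.1f} s")
```

With these made-up numbers the span is 13.0 s while only 4.0 s is spent 
in write calls, so a bar drawn from first to last access would look 
continuous even though most of that interval is verification.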

Here's how fs_test was conceptually run to get that job summary. It 
basically has two phases: a write phase followed by a read phase. Within 
the write phase, each process writes a certain amount to its own file. 
Write operations continue in a loop for 5 minutes (verification happens 
before each write), but each write goes to a different offset within the 
file. Each process opens its file only once: before the first write. 
Each process closes its file only once: after the last write. The read 
phase works the same way: it does read operations in a loop for 5 
minutes (verification happens after each read), but only opens and 
closes the file once.
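
As a rough sketch of the write phase for a single process (this is not 
fs_test's actual code; the function names, the block size, and the 
trivial verification check are all placeholders I made up for 
illustration, and it assumes a POSIX system where os.pwrite is 
available):

```python
import os
import time


def verify(buf):
    # Placeholder standing in for fs_test's buffer verification.
    return all(b == buf[0] for b in buf)


def write_phase(path, duration_s, block_size=4096):
    """Open once, loop verify-then-write at increasing offsets, close once."""
    buf = bytes([0x5A]) * block_size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)  # single open
    offset = 0
    verify_time = 0.0
    write_time = 0.0
    deadline = time.monotonic() + duration_s
    try:
        while time.monotonic() < deadline:
            t0 = time.monotonic()
            if not verify(buf):              # verification before each write
                raise ValueError("buffer verification failed")
            t1 = time.monotonic()
            written = os.pwrite(fd, buf, offset)  # each write at a new offset
            t2 = time.monotonic()
            verify_time += t1 - t0
            write_time += t2 - t1
            offset += written
    finally:
        os.close(fd)                         # single close after the last write
    return offset, verify_time, write_time
```

The real run used a 5 minute loop per phase and 32 processes, each with 
its own file; the read phase would mirror this with a read followed by 
verification.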

Does anyone have any insight on why the job summary seems to depict 
continuous IO operations? Could the fact that the files are held open 
even though no IO to disk is actually happening be the reason the graphs 
suggest continuous IO operations?

Thank you very much for any insight!
David

-- 
David Shrader
SICORP, Inc
1350 Central Ave
Suite 104
Los Alamos, NM 87544
David.Shrader at SICORP.com

LANL contact information:

LANL #: 505-664-0996
LANL email: dshrader at lanl.gov

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fs_test.moonlight.x-darshan_N-N_summary.pdf
Type: application/pdf
Size: 39295 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/darshan-users/attachments/20130507/dcf1eaa1/attachment-0001.pdf>

