[Darshan-users] darshan-job-summary.pl reports non-stop IO
dshrader at lanl.gov
Tue May 7 17:10:15 CDT 2013
In working a bit with Darshan, I have noticed that the "Timespan from
first to last read/write access on independent files" graphs report
continuous IO operations even when processes are not actively doing IO
to their files. To illustrate, I have attached a job summary from an IO
reproducing program called fs_test, which here writes an N-to-N pattern
(one file per process) using 32 processes. Before
each write, I have fs_test doing some verification on its buffers before
issuing the actual write command. Additionally, I have fs_test doing
verification on buffers after each read. However, the Timespan from
first to last access graphs make it look like continuous IO.
Judging by tracing data, time in the verification functions accounts for
about 75% of the total time of the job, whereas the rest of the time is
spent doing actual IO operations (which is accurately reflected in the
"Average I/O cost per process" graph in the summary), so the depiction
of continuous IO in those graphs isn't correct. I'm trying to figure out
whether this discrepancy is just the way darshan-job-summary.pl creates
the Timespan graph or whether something else might be to blame.
Conceptually, here's how fs_test was run to produce that job summary. It
has two phases: a write phase followed by a read phase. Within
the write phase, each process writes a certain amount to its own file.
Write operations continue in a loop for 5 minutes (verification happens
before each write), but each write goes to a different offset within the
file. Each process opens its file only once: before the first write.
Each process closes its file only once: after the last write. The read
phase works the same way: each process performs read operations in a
loop for 5 minutes (verification happens after each read), but opens and
closes its file only once.
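
The write phase described above can be modeled with a minimal sketch. It
rests on an assumption that this post does not confirm: that the Timespan
graph is drawn from only a first and a last access timestamp per file,
while the cost graph uses time summed over the individual operations.
FileRecord, simulate_write_phase, and the 0.75 s / 0.25 s durations are
hypothetical names and values chosen to match the roughly 75% verification
share; this is not fs_test or Darshan code. The clock is simulated, so
nothing actually sleeps or touches a file.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileRecord:
    """Hypothetical per-file record (not Darshan's actual data structure)."""
    first_access: Optional[float] = None  # start of the first write
    last_access: Optional[float] = None   # end of the most recent write
    io_time: float = 0.0                  # time summed over all writes

    def record(self, start: float, end: float) -> None:
        if self.first_access is None:
            self.first_access = start
        self.last_access = end
        self.io_time += end - start


def simulate_write_phase(phase_seconds: float = 300.0,
                         verify_seconds: float = 0.75,
                         write_seconds: float = 0.25) -> FileRecord:
    """Verify a buffer, then write it, repeating until the phase ends."""
    clock = 0.0  # simulated time in seconds
    rec = FileRecord()
    while clock + verify_seconds + write_seconds <= phase_seconds:
        clock += verify_seconds   # buffer verification, no IO
        start = clock
        clock += write_seconds    # the actual write
        rec.record(start, clock)
    return rec


rec = simulate_write_phase()
timespan = rec.last_access - rec.first_access  # what a first-to-last bar would span
busy_fraction = rec.io_time / timespan         # share of that span spent writing

print(f"first-to-last timespan: {timespan:.2f} s")
print(f"time actually in writes: {rec.io_time:.2f} s")
print(f"busy fraction: {busy_fraction:.1%}")
```

Under that assumption, a bar drawn from first_access to last_access covers
nearly the whole 5 minute phase even though only about a quarter of it is
spent in writes, which would match both graphs in the attached summary.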
Does anyone have any insight on why the job summary seems to depict
continuous IO operations? Could the graphs suggest continuous IO simply
because the files are held open, even while no IO to disk is actually
happening?
Thank you very much for any insight!
1350 Central Ave
Los Alamos, NM 87544
David.Shrader at SICORP.com
LANL contact information:
LANL #: 505-664-0996
LANL email: dshrader at lanl.gov
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 39295 bytes
Desc: not available