[Darshan-commits] [Darshan] branch, master, updated. darshan-2.3.1-7-gf68228a

Wed Aug 19 10:56:09 CDT 2015

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "".

The branch, master has been updated
       via  f68228a5a98d67a3d1e2e29edf94c4090a1a7392 (commit)
      from  fbce8030d52a56218fd8b813b180d8060ce4a99b (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit f68228a5a98d67a3d1e2e29edf94c4090a1a7392
Author: Phil Carns <carns at mcs.anl.gov>
Date:   Wed Aug 19 11:55:50 2015 -0400

    integrate expanded darshan-parser documentation
    
    - provided by Huong Luu

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                         |    1 +
 darshan-util/doc/darshan-util.txt |  166 ++++++++++++++++++++++++++++--------
 2 files changed, 130 insertions(+), 37 deletions(-)


Diff of changes:

diff --git a/ChangeLog b/ChangeLog
index ed45c28..3e1a054 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -10,6 +10,7 @@ darshan-2.3.2-pre1
 * Fix faulty logic in extracting I/O data from the aio_return 
   wrapper (Shane Snyder)
 * Fix bug in common access counter logic (Shane Snyder)
+* Expand and clarify darshan-parser documentation (Huong Luu)
 
 darshan-2.3.1
 =============
diff --git a/darshan-util/doc/darshan-util.txt b/darshan-util/doc/darshan-util.txt
index b9eced7..dbf282e 100644
--- a/darshan-util/doc/darshan-util.txt
+++ b/darshan-util/doc/darshan-util.txt
@@ -133,11 +133,10 @@ specified file.
 
 === darshan-parser
 
-In order to obtained a full, human readable dump of all information
-contained in a log file, you can use the `darshan-parser` command
-line utility.  It does not require any additional command line tools.
-The following example essentially converts the contents of the log file
-into a fully expanded text file:
+You can use the `darshan-parser` command line utility to obtain a
+complete, human-readable, text-format dump of all information contained
+in a log file.   The following example converts the contents of the
+log file into a fully expanded text file:
 
 ----
 darshan-parser carns_my-app_id114525_7-27-58921_19.darshan.gz > ~/job-characterization.txt
@@ -148,8 +147,14 @@ The format of this output is described in the following section.
 === Guide to darshan-parser output
 
 The beginning of the output from darshan-parser displays a summary of
-overall information about the job. The following table defines the meaning
-of each line:
+overall information about the job. Additional job-level summary information
+can also be produced using the `--perf`, `--file`, `--file-list`, or
+`--file-list-detailed` command line options.  See the
+<<addsummary,Additional summary output>> section for more information about
+those options.
+
+The following table defines the meaning
+of each line in the default header section of the output:
 
 [cols="25%,75%",options="header"]
 |====
@@ -307,11 +312,11 @@ each file:
 |====
 
 ==== Additional summary output
+[[addsummary]]
 
 ===== Performance
 
-Use the '--perf' option to get performance approximations using four
-different computations.
+Job performance information can be generated using the `--perf` command-line option.
 
 .Example output
 ----
@@ -344,6 +349,54 @@ different computations.
 # agg_perf_by_slowest: 2206.983935
 ----
 
+The `total_bytes` line shows the total number of bytes transferred
+(read/written) by the job.  That is followed by three sections:
+
+.I/O timing for unique files
+
+This section reports information about any files that were *not* opened
+by every rank in the job.  This includes independent files (opened by
+1 process) and partially shared files (opened by a proper subset of
+the job's processes). The I/O time for this category of file access
+is reported based on the *slowest* rank of all processes that performed this
+type of file access.
+
+* unique files: slowest_rank_io_time: total I/O time for unique files
+  (including both metadata + data transfer time)
+* unique files: slowest_rank_meta_time: metadata time for unique files
+* unique files: slowest_rank: the rank of the slowest process
+
+.I/O timing for shared files
+
+This section reports information about files that were globally shared (i.e.
+opened by every rank in the job).  This section estimates performance for
+globally shared files using four different methods.  The `time_by_slowest`
+is generally the most accurate, but it may not be available in some older Darshan
+log files. 
+
+* shared files: time_by_cumul_*: adds the cumulative time across all
+  processes and divides by the number of processes (inaccurate when there is
+  high variance among processes).
+** shared files: time_by_cumul_io_only: include metadata AND data transfer
+   time for global shared files
+** shared files: time_by_cumul_meta_only: metadata time for global shared
+   files
+* shared files: time_by_open: difference between timestamp of open and
+  close (inaccurate if file is left open without I/O activity)
+* shared files: time_by_open_lastio: difference between timestamp of open
+  and the timestamp of last I/O (similar to above but fixes case where file is
+  left open after I/O is complete)
+* shared files: time_by_slowest: measures time according to which rank was
+  the slowest to perform both metadata operations and data transfer for each
+  shared file (most accurate but requires newer log version)
+
+.Aggregate performance
+
+Performance is calculated by dividing the total bytes by the I/O time
+(shared files and unique files combined) computed using each of the four
+methods described in the previous output section. Note that total bytes is
+reported in bytes and aggregate performance in MiB/s (1024*1024 bytes/s).
+
 ===== Files
 Use the `--file` option to get totals based on file usage.
 The first column is the count of files for that type, the second column is
@@ -353,9 +406,14 @@ accessed.
 * total: All files
 * read_only: Files that were only read from
 * write_only: Files that were only written to
+* read_write: Files that were both read and written
 * unique: Files that were opened on only one rank
 * shared: File that were opened by more than one rank
 
+Each line has 3 columns. The first column is the count of files for that
+type of file, the second column is the number of bytes for that type, and the third
+column is the maximum offset accessed.
+
 .Example output
 ----
 # files
@@ -368,37 +426,20 @@ accessed.
 # shared: 1540 236561051820 154157611
 ----
 
-===== Totals
-
-Use the `--total` option to get all statistics as an aggregate total.
-Statistics that make sense to be aggregated are aggregated. Other statistics
-may be a minimum or maximum if that makes sense. Other data maybe zeroed if
-it doesn't make sense to aggregate the data.
-
-.Example output
-----
-total_CP_INDEP_OPENS: 0
-total_CP_COLL_OPENS: 196608
-total_CP_INDEP_READS: 0
-total_CP_INDEP_WRITES: 0
-total_CP_COLL_READS: 0
-total_CP_COLL_WRITES: 0
-total_CP_SPLIT_READS: 0
-total_CP_SPLIT_WRITES: 1179648
-total_CP_NB_READS: 0
-total_CP_NB_WRITES: 0
-total_CP_SYNCS: 0
-total_CP_POSIX_READS: 983045
-total_CP_POSIX_WRITES: 33795
-total_CP_POSIX_OPENS: 230918
-...
-----
-
 ===== File list
 
 Use the `--file-list` option to produce a list of files opened by the
 application along with estimates of the amount of time spent accessing each
-file.
+file.  Each file is represented with one line with six columns:
+
+* <hash>: hash of file name
+* <suffix>: last 15 characters of file name
+* <type>: MPI or POSIX. A file is considered of MPI type if it is opened
+using an MPI function (directly or indirectly by a higher library such as
+HDF or NetCDF). 
+* <nprocs>: number of processes that opened the file
+* <slowest>: (estimated) time in seconds consumed in IO by slowest process
+* <avg>: average time in seconds consumed in IO per process
 
 .Example output
 ----
@@ -414,11 +455,62 @@ file.
 17028232952633024488    amples/boom.dat MPI 2   0.000363    0.012262
 ----
 
+This data could be post-processed to compute more in-depth statistics, such as
+the total number of MPI files and total number of POSIX files used in a
+job, categorizing files into independent/unique/local files (opened by
+1 process), subset/partially shared files (opened by a proper subset of
+processes) or globally shared files (opened by all processes), and ranking
+files according to how much time was spent performing I/O in each file.
+
 ===== Detailed file list
 
 The `--file-list-detailed` is the same as --file-list except that it
 produces many columns of output containing statistics broken down by file.
-This option is mainly useful for automated analysis.
+This option is mainly useful for automated analysis.  Each file opened by
+the job is represented using one output line with the following columns:
+
+* <hash>: hash of file name
+* <suffix>: last 15 characters of file name
+* <type>: MPI or POSIX. A file is considered of MPI type if it is opened
+using an MPI function (directly or indirectly by a higher library such as HDF or
+NetCDF). 
+* <nprocs>: number of processes that opened the file
+* <slowest>: (estimated) time in seconds consumed in IO by slowest process
+* <avg>: average time in seconds consumed in IO per process
+* <start_{open/read/write}>: start timestamp of first open, read, or write
+* <end_{open/read/write}>: end timestamp of last open, read, or write
+* <mpi_indep_opens>: independent MPI_File_open calls
+* <mpi_coll_opens>: collective MPI_File_open calls
+* <posix_opens>: POSIX open calls
+* <CP_SIZE_READ_*>: POSIX read size histogram
+* <CP_SIZE_WRITE_*>: POSIX write size histogram
+
+===== Totals
+
+Use the `--total` option to get all statistics as an aggregate total rather
+than broken down per file.  Each field is either summed across files and
+processes (for values such as number of opens), set to global minimums and
+maximums (for values such as open time and close time), or zeroed out (for
+statistics that are nonsensical in aggregate).
+
+.Example output
+----
+total_CP_INDEP_OPENS: 0
+total_CP_COLL_OPENS: 196608
+total_CP_INDEP_READS: 0
+total_CP_INDEP_WRITES: 0
+total_CP_COLL_READS: 0
+total_CP_COLL_WRITES: 0
+total_CP_SPLIT_READS: 0
+total_CP_SPLIT_WRITES: 1179648
+total_CP_NB_READS: 0
+total_CP_NB_WRITES: 0
+total_CP_SYNCS: 0
+total_CP_POSIX_READS: 983045
+total_CP_POSIX_WRITES: 33795
+total_CP_POSIX_OPENS: 230918
+...
+----
 
 === Other command line utilities
 

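The "Aggregate performance" text in the diff above describes the calculation
only in words. A minimal sketch of that arithmetic follows, assuming the I/O
time is the sum of the unique-file time and the shared-file time for one
chosen method. The function and argument names are hypothetical, and this is
a reading of the documentation text, not a copy of darshan-parser's code.

```python
# Sketch only: names below are illustrative and are not part of Darshan.
MIB = 1024 * 1024  # the documentation states MiB/s means 1024*1024 bytes/s


def agg_perf_mib_per_s(total_bytes, unique_io_time, shared_io_time):
    """Return aggregate performance in MiB/s.

    total_bytes     -- value of the total_bytes line (bytes)
    unique_io_time  -- slowest_rank_io_time for unique files (seconds)
    shared_io_time  -- one of the shared-file estimates, for example
                       time_by_slowest (seconds)
    """
    total_time = unique_io_time + shared_io_time
    if total_time <= 0:
        # Assumption: report 0.0 rather than dividing by zero when no
        # I/O time was recorded.
        return 0.0
    return total_bytes / MIB / total_time
```

Calling it once per shared-file method (cumul, open, open_lastio, slowest)
would give four estimates, one per method named in the documentation.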

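The new `--file-list` text notes that the six-column output could be
post-processed. A rough sketch of such post-processing is below. It assumes
columns are whitespace-separated, that file name suffixes contain no trailing
whitespace, and that lines starting with '#' are comments. The job's process
count must be supplied by the caller. All function names are hypothetical,
and every sample row except the boom.dat line is made up for illustration.

```python
from collections import Counter


def parse_file_list(text):
    """Parse --file-list style lines into a list of dicts."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # assumed comment marker
            continue
        # Split the four rightmost numeric/type fields off first, so a
        # suffix containing spaces would still stay in one piece.
        parts = line.rsplit(None, 4)
        if len(parts) != 5:
            continue
        left, ftype, nprocs, slowest, avg = parts
        name = left.split(None, 1)
        if len(name) != 2:
            continue
        try:
            records.append({
                "hash": name[0],
                "suffix": name[1],
                "type": ftype,
                "nprocs": int(nprocs),
                "slowest": float(slowest),
                "avg": float(avg),
            })
        except ValueError:
            continue  # skip rows whose numeric columns do not parse
    return records


def sharing_category(nprocs, job_nprocs):
    """Classify a file by how many of the job's processes opened it."""
    if nprocs == 1:
        return "unique"
    if nprocs == job_nprocs:
        return "global_shared"
    return "partial_shared"


def summarize(records, job_nprocs):
    """Count files by type and sharing category; rank by slowest time."""
    by_type = Counter(r["type"] for r in records)
    by_sharing = Counter(
        sharing_category(r["nprocs"], job_nprocs) for r in records
    )
    ranked = sorted(records, key=lambda r: r["slowest"], reverse=True)
    return by_type, by_sharing, ranked


# Only the first data row comes from the documentation; the rest is made up.
SAMPLE = """\
# <hash> <suffix> <type> <nprocs> <slowest> <avg>
17028232952633024488    amples/boom.dat MPI 2   0.000363    0.012262
1111    out/a.log   POSIX   1   0.500000    0.500000
2222    out/b.dat   POSIX   4   0.250000    0.100000
"""
```

For example, `summarize(parse_file_list(SAMPLE), job_nprocs=4)` returns the
per-type counts, the per-category counts, and the records ordered from the
most to the least time-consuming file.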
hooks/post-receive
--