[Darshan-commits] [Git][darshan/darshan][autoperf-mod-update] 2 commits: darshan-runtime docs updates for autoperf

Shane Snyder xgitlab at cels.anl.gov
Mon Mar 29 16:38:51 CDT 2021



Shane Snyder pushed to branch autoperf-mod-update at darshan / darshan


Commits:
cf79e678 by Shane Snyder at 2021-03-26T16:21:54-05:00
darshan-runtime docs updates for autoperf

- - - - -
627dc4e9 by Shane Snyder at 2021-03-29T16:38:10-05:00
updated docs for details on AutoPerf

- - - - -


4 changed files:

- darshan-runtime/configure
- darshan-runtime/doc/darshan-runtime.txt
- darshan-util/doc/darshan-util.txt
- modules/autoperf


Changes:

=====================================
darshan-runtime/configure
=====================================
@@ -688,6 +688,7 @@ infodir
 docdir
 oldincludedir
 includedir
+runstatedir
 localstatedir
 sharedstatedir
 sysconfdir
@@ -785,6 +786,7 @@ datadir='${datarootdir}'
 sysconfdir='${prefix}/etc'
 sharedstatedir='${prefix}/com'
 localstatedir='${prefix}/var'
+runstatedir='${localstatedir}/run'
 includedir='${prefix}/include'
 oldincludedir='/usr/include'
 docdir='${datarootdir}/doc/${PACKAGE_TARNAME}'
@@ -1037,6 +1039,15 @@ do
   | -silent | --silent | --silen | --sile | --sil)
     silent=yes ;;
 
+  -runstatedir | --runstatedir | --runstatedi | --runstated \
+  | --runstate | --runstat | --runsta | --runst | --runs \
+  | --run | --ru | --r)
+    ac_prev=runstatedir ;;
+  -runstatedir=* | --runstatedir=* | --runstatedi=* | --runstated=* \
+  | --runstate=* | --runstat=* | --runsta=* | --runst=* | --runs=* \
+  | --run=* | --ru=* | --r=*)
+    runstatedir=$ac_optarg ;;
+
   -sbindir | --sbindir | --sbindi | --sbind | --sbin | --sbi | --sb)
     ac_prev=sbindir ;;
   -sbindir=* | --sbindir=* | --sbindi=* | --sbind=* | --sbin=* \
@@ -1174,7 +1185,7 @@ fi
 for ac_var in	exec_prefix prefix bindir sbindir libexecdir datarootdir \
 		datadir sysconfdir sharedstatedir localstatedir includedir \
 		oldincludedir docdir infodir htmldir dvidir pdfdir psdir \
-		libdir localedir mandir
+		libdir localedir mandir runstatedir
 do
   eval ac_val=\$$ac_var
   # Remove trailing slashes.
@@ -1327,6 +1338,7 @@ Fine tuning of the installation directories:
   --sysconfdir=DIR        read-only single-machine data [PREFIX/etc]
   --sharedstatedir=DIR    modifiable architecture-independent data [PREFIX/com]
   --localstatedir=DIR     modifiable single-machine data [PREFIX/var]
+  --runstatedir=DIR       modifiable per-process data [LOCALSTATEDIR/run]
   --libdir=DIR            object code libraries [EPREFIX/lib]
   --includedir=DIR        C header files [PREFIX/include]
   --oldincludedir=DIR     C header files for non-gcc [/usr/include]


=====================================
darshan-runtime/doc/darshan-runtime.txt
=====================================
@@ -393,6 +393,28 @@ configuration file:
 export DXT_TRIGGER_CONF_PATH=/path/to/dxt/config/file
 ----
 
+== Using AutoPerf instrumentation modules
+
+AutoPerf offers two additional Darshan instrumentation modules that may be enabled for MPI applications.
+
+* APMPI: Instrumentation of over 70 MPI-3 communication routines, providing operation counts, datatype sizes, and timing information for each application MPI rank.
+* APXC: Instrumentation of Cray XC environments to provide network and compute counters of interest, via PAPI.
+
+Users can request Darshan to build the APMPI and APXC modules by passing `--enable-apmpi-mod` and `--enable-apxc-mod` options to configure, respectively. Note that these options can be requested independently (i.e., you can build Darshan with APMPI support but not APXC support, and vice versa).
+
+The only prerequisite for the APMPI module is that Darshan be configured with an MPI-3 compliant compiler. For APXC, the user must be using a Cray XC system and must make the PAPI interface available to Darshan (i.e., by running `module load papi` before building Darshan).
+
+If using the APMPI module, users can additionally specify the `--enable-apmpi-coll-sync` configure option to force Darshan to synchronize before calling underlying MPI routines and to capture additional timing information on how synchronized processes are. Users should note that this option imposes additional overhead, but it can help diagnose whether applications are spending a lot of time synchronizing as part of collective communication calls. Because of this overhead, we do not recommend that users set this option for production Darshan deployments.
+
+[NOTE]
+====
+The AutoPerf instrumentation modules are provided as Git submodules to Darshan's main repository, so if building Darshan source that has been cloned from Git, it is necessary to first retrieve the AutoPerf submodules by running the following command:
+
+----
+git submodule update --init
+----
+====
+
 == Darshan installation recipes
 
 The following recipes provide examples for prominent HPC systems.


=====================================
darshan-util/doc/darshan-util.txt
=====================================
@@ -56,6 +56,19 @@ method of compilation.
 The `--enable-shared` argument to configure can be used to enable
 compilation of a shared version of the darshan-util library.
 
+The `--enable-autoperf-apmpi` and `--enable-autoperf-apxc` configure 
+arguments must be specified to build darshan-util with support for AutoPerf
+APMPI and APXC modules, respectively.
+
+[NOTE]
+====
+AutoPerf log analysis code is provided as Git submodules to Darshan's main repository, so if building Darshan source that has been cloned from Git, it is necessary to first retrieve the AutoPerf submodules by running the following command:
+
+----
+git submodule update --init
+----
+====
+
 == Analyzing log files
 
 Each time a darshan-instrumented application is executed, it will generate a
@@ -432,6 +445,70 @@ value of 1 MiB for optimal file alignment.
 
 ===== Additional modules 
 
+.Lustre module (if enabled, for Lustre file systems)
+[cols="40%,60%",options="header"]
+|====
+| counter name | description
+| LUSTRE_OSTS | number of OSTs (object storage targets) for the file system
+| LUSTRE_MDTS | number of MDTs (metadata targets) for the file system
+| LUSTRE_STRIPE_OFFSET | OST id offset specified at file creation time
+| LUSTRE_STRIPE_SIZE | stripe size for the file in bytes
+| LUSTRE_STRIPE_WIDTH | number of OSTs over which the file is striped
+| LUSTRE_OST_ID_* | indices of OSTs over which the file is striped
+|====
+
+.APXC module header record (if enabled, for Cray XC systems)
+[cols="40%,60%",options="header"]
+|====
+| counter name | description
+| APXC_GROUPS | total number of groups for the job
+| APXC_CHASSIS | total number of chassis for the job
+| APXC_BLADES | total number of blades for the job
+| APXC_MEMORY_MODE | Intel Xeon memory mode
+| APXC_CLUSTER_MODE | Intel Xeon NUMA configuration
+| APXC_MEMORY_MODE_CONSISTENT | Intel Xeon memory mode consistent across all nodes
+| APXC_CLUSTER_MODE_CONSISTENT | Intel Xeon cluster mode consistent across all nodes
+|====
+
+.APXC module per-router record (if enabled, for Cray XC systems)
+[cols="40%,60%",options="header"]
+|====
+| counter name | description
+| APXC_GROUP | group this router is on
+| APXC_CHASSIS | chassis this router is on
+| APXC_BLADE | blade this router is on
+| APXC_NODE | node connected to this router
+| APXC_AR_RTR_x_y_INQ_PRF_INCOMING_FLIT_VC[0-7] | flits on VCs of x y tile for router-router ports
+| APXC_AR_RTR_x_y_INQ_PRF_ROWBUS_STALL_CNT | stalls on x y tile for router-router ports
+| APXC_AR_RTR_PT_x_y_INQ_PRF_INCOMING_FLIT_VC[0,4] | flits on VCs of x y tile for router-nic ports
+| APXC_AR_RTR_PT_x_y_INQ_PRF_REQ_ROWBUS_STALL_CNT | stalls on x y tile for router-nic ports
+|====
+
+.APMPI module header record (if enabled, for MPI applications)
+[cols="40%,60%",options="header"]
+|====
+| counter name | description
+| MPI_TOTAL_COMM_TIME_VARIANCE | variance in total communication time across all the processes
+| MPI_TOTAL_COMM_SYNC_TIME_VARIANCE | variance in total sync time across all the processes, if enabled
+|====
+
+.APMPI module per-process record (if enabled, for MPI applications)
+[cols="40%,60%",options="header"]
+|====
+| counter name | description
+| MPI_PROCESSOR_NAME | name of the processor used by the MPI process
+| MPI_*_CALL_COUNT | total call count for an MPI op
+| MPI_*_TOTAL_BYTES | total bytes (i.e., cumulative across all calls) moved with an MPI op
+| MPI_*\_MSG_SIZE_AGG_* | histogram of total bytes moved for all the calls of an MPI op
+| MPI_*_TOTAL_TIME | total time (i.e., cumulative across all calls) of an MPI op
+| MPI_*_MIN_TIME | minimum time across all calls of an MPI op
+| MPI_*_MAX_TIME | maximum time across all calls of an MPI op
+| MPI_*_TOTAL_SYNC_TIME | total sync time (cumulative across all calls of an op) of an MPI op, if enabled
+| MPI_TOTAL_COMM_TIME | total communication (MPI) time of a process across all the MPI ops
+| MPI_TOTAL_COMM_SYNC_TIME | total sync time of a process across all the MPI ops, if enabled
+|====
+
+
 .BG/Q module (if enabled on BG/Q systems)
 [cols="40%,60%",options="header"]
 |====
@@ -450,19 +527,6 @@ value of 1 MiB for optimal file alignment.
 | BGQ_F_TIMESTAMP | Timestamp of when BG/Q data was collected
 |====
 
-
-.Lustre module (if enabled, for Lustre file systems)
-[cols="40%,60%",options="header"]
-|====
-| counter name | description
-| LUSTRE_OSTS | number of OSTs (object storage targets) for the file system
-| LUSTRE_MDTS | number of MDTs (metadata targets) for the file system
-| LUSTRE_STRIPE_OFFSET | OST id offset specified at file creation time
-| LUSTRE_STRIPE_SIZE | stripe size for the file in bytes
-| LUSTRE_STRIPE_WIDTH | number of OSTs over which the file is striped
-| LUSTRE_OST_ID_* | indices of OSTs over which the file is striped
-|====
-
 ==== Additional summary output
 [[addsummary]]
 


=====================================
modules/autoperf
=====================================
@@ -1 +1 @@
-Subproject commit ba69643fcd0ed7ba48070ba9c50c4facb4fdb78c
+Subproject commit f1f93ce58605e06a82f32a3e1b207e2eaf61f202



View it on GitLab: https://xgitlab.cels.anl.gov/darshan/darshan/-/compare/f4e6aad4960b7f1a0573fae39fd4847ea9c75ac3...627dc4e956e5b210652ec3635b021b3737e1c4be


