[Darshan-users] Estimating the overlap of MPI I/O wait time

Jean-Thomas Acquaviva jacquaviva at ddn.com
Tue Feb 9 04:38:06 CST 2016


Hi Kevin,

do you think it could make sense to hack the Darshan runtime to record event timestamps?
Maybe in a statistical way, to limit memory overhead. I'm thinking, for instance, of recording event density for every second of execution.
Then at the end of a job I'd be able to estimate contention (assuming that clocks do not drift too much between MPI ranks).
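The per-second event-density idea above could be sketched roughly as follows. This is illustrative Python, not Darshan code; the function and variable names are hypothetical, and a real implementation inside the Darshan runtime would be C with a fixed-size counter array:

```python
from collections import defaultdict

def record_event(histogram, timestamp, job_start):
    """Bucket an I/O event into a 1-second bin relative to job start.

    Storing one counter per second of runtime keeps memory overhead
    proportional to wall-clock duration, not to the number of events.
    """
    histogram[int(timestamp - job_start)] += 1

# hypothetical event timestamps from one rank
hist = defaultdict(int)
job_start = 100.0
for t in [100.2, 100.9, 101.1, 103.5]:
    record_event(hist, t, job_start)

# hist now maps second-offset -> event count: {0: 2, 1: 1, 3: 1}
```

Overlaying such histograms from all writing ranks (given reasonably synchronized clocks) would show in which seconds multiple ranks were active, i.e., where contention is plausible.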

jean-thomas
________________________________________
From: Harms, Kevin [harms at alcf.anl.gov]
Sent: Wednesday, February 03, 2016 6:36 PM
To: Jean-Thomas Acquaviva; darshan-users at lists.mcs.anl.gov
Subject: Re: [Darshan-users]  Estimating the overlap of MPI I/O wait time

Jean-Thomas,

  your analysis is correct. Given that the data collected is statistical, one must make some assumptions about the I/O to estimate performance. If you run darshan-parser --perf <logfile>, it will print some estimates using different heuristics. If you have any feedback on their accuracy, let us know.
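The serialized-vs-parallel bounds described in the quoted message below can be sketched as follows. This is an illustrative Python sketch, not part of Darshan; the per-rank times are made up for the example:

```python
def io_time_bounds(per_rank_write_seconds):
    """Bound the application's aggregate I/O wait time from per-rank times.

    Perfectly overlapped writes -> total is the max across ranks (lower bound).
    Fully serialized writes     -> total is the sum across ranks (upper bound).
    The true value lies somewhere in between.
    """
    times = list(per_rank_write_seconds)
    return max(times), sum(times)

# hypothetical write times for three of the writing ranks
lo, hi = io_time_bounds([4.0, 5.5, 3.25])
# lo == 5.5 (perfect overlap), hi == 12.75 (serialized)
```

darshan-parser --perf applies heuristics of this kind to the counters in the log to narrow the range between these two extremes.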

kevin




>Hi,
>
>I'm using Darshan to analyze the behavior of a parallel MPI Fortran application that writes to a shared file. Among 1024 MPI ranks, only 48 are effectively doing I/O; this "geometry" is successfully reported by Darshan.
>
>However, for these 48 ranks Darshan reports the MPI I/O time spent writing to the shared file on a per-rank basis, but I did not find a way to estimate the overlap between the different ranks.
>
>At one end of the spectrum, if write accesses are serialized, then the total I/O wait time for the application is the sum of the individual wait times of each rank.
>At the other end, if write accesses are perfectly parallel, the total application wait time is the max among the 48 writers.
>
>
>The exact number probably lies between these two values; is there a way to estimate it?
>
>
>Best regards,
>
> jean-thomas
>

