[Darshan-users] Estimating the overlap of MPI I/O wait time

Harms, Kevin harms at alcf.anl.gov
Wed Feb 10 12:20:30 CST 2016


Jean-Thomas,

  there are some variance counters in the Darshan data. Did you take a look at those? They might give you some idea of how much the workload differs among the ranks. If this isn't collective I/O and there is high variance, the performance estimates may be off.
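
For reference, here is one quick way to pull those counters out of the parser text output. This is a minimal sketch only: the log file name is a placeholder and the exact counter names vary by Darshan version, so it just matches on "VARIANCE".

    import subprocess

    # Dump the full text report for the log (file name is a placeholder).
    out = subprocess.check_output(["darshan-parser", "app.darshan"]).decode()

    # Counter records are the non-comment lines; the time/bytes variance
    # counters all contain "VARIANCE" in their name.
    for line in out.splitlines():
        if not line.startswith("#") and "VARIANCE" in line:
            print(line)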

kevin



>Hi Kevin,
>
>do you think it could make sense to hack the Darshan runtime in order to record event time stamps?
>Maybe in a statistical way to limit the memory overhead. I'm thinking, for instance, of recording the event density for every second of execution.
>At the end of a job I would then be able to estimate contention (assuming that the clocks do not drift too much between the MPI ranks).
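>
>As a rough illustration of that idea (a sketch only, not Darshan code: the per-rank timestamp lists are hypothetical, i.e. whatever a modified runtime would dump), the per-second density could then be used like this:
>
>    from collections import defaultdict
>
>    # ranks_events: hypothetical {rank: [timestamps of I/O events]} dump
>    # produced by an instrumented runtime.
>    def contention_seconds(ranks_events):
>        active = defaultdict(set)           # second -> ranks seen doing I/O
>        for rank, stamps in ranks_events.items():
>            for t in stamps:
>                active[int(t)].add(rank)
>        # seconds where more than one rank was active suggest overlapping I/O
>        return sorted(s for s, ranks in active.items() if len(ranks) > 1)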
>
>jean-thomas
>________________________________________
>From: Harms, Kevin [harms at alcf.anl.gov]
>Sent: Wednesday, February 03, 2016 6:36 PM
>To: Jean-Thomas Acquaviva; darshan-users at lists.mcs.anl.gov
>Subject: Re: [Darshan-users]  Estimating the overlap of MPI I/O wait time
>
>Jean-Thomas,
>
>  your analysis is correct. Given that the data collected is statistical, one must make some assumptions about the I/O to estimate performance. If you run darshan-parser --perf <logfile>, it will print some estimates using different heuristics. If you have any feedback on their accuracy, let us know.
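>
>For scripting, the estimate lines can be scraped from that output; a minimal sketch, assuming the estimates show up as "agg_perf" comment lines (the exact labels may differ between Darshan versions, and the log name is a placeholder):
>
>    import subprocess
>
>    out = subprocess.check_output(
>        ["darshan-parser", "--perf", "app.darshan"]).decode()
>    for line in out.splitlines():
>        if line.startswith("#") and "agg_perf" in line:
>            print(line)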
>
>kevin
>
>
>
>
>>Hi,
>>
>>I'm using Darshan to analyze the behavior of a parallel MPI Fortran application that writes to a shared file. Among 1024 MPI ranks, only 48 actually perform I/O, and this "geometry" is correctly reported by Darshan.
>>
>>However, while Darshan reports the MPI I/O time each of these 48 ranks spent writing to the shared file on a per-rank basis, I did not find a way to estimate the overlap between the different ranks.
>>
>>At one end of the spectrum, if the write accesses are serialized, the total I/O wait time for the application is the sum of the individual wait times of the ranks.
>>At the other end, if the write accesses are perfectly parallel, the total application wait time is the maximum among the 48 writers.
>>
>>
>>The exact number probably lies somewhere between these two values; is there a way to estimate it?
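>>
>>For what it's worth, the two bounds themselves are easy to compute from the per-rank write times Darshan reports; a sketch with a made-up input dict:
>>
>>    # per_rank_write_time: hypothetical {rank: MPI-IO write time in seconds}
>>    # for the 48 writer ranks, taken from the per-rank Darshan records.
>>    def wait_time_bounds(per_rank_write_time):
>>        times = per_rank_write_time.values()
>>        upper = sum(times)   # fully serialized writes
>>        lower = max(times)   # perfectly overlapped writes
>>        return lower, upper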
>>
>>
>>Best regards,
>>
>> jean-thomas
>>