[Darshan-users] Darshan & EPCC benchio different behaviour

Shane Snyder ssnyder at mcs.anl.gov
Tue Feb 11 13:15:50 CST 2020


Something strange definitely seems to be happening when Darshan estimates 
the time spent in I/O operations in the serial case (in the very first 
figure, the observed write time barely even registers). That time is 
ultimately what is used to produce the performance estimate.

If you could provide them, the raw Darshan logs would be really helpful. 
They should make it clear whether this is an instrumentation issue (i.e., 
under-accounting for time spent in I/O operations at runtime) or an 
issue with the heuristics in the PDF summary tool you are using, as 
Kevin points out. If it's the latter, having an example log to test 
modifications to our heuristics would be very helpful to us.
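
For anyone who wants a quick look at the counters before sending a log, 
here is a minimal sketch, not Darshan's actual estimation heuristic. It 
assumes the text output of darshan-parser, with one counter per line in 
the order module, rank, record id, counter, value. The function name 
io_time_by_rank and the sample lines are made up for illustration.

```python
from collections import defaultdict

# POSIX timing counters reported by darshan-parser (cumulative seconds).
TIME_COUNTERS = ("POSIX_F_READ_TIME", "POSIX_F_WRITE_TIME", "POSIX_F_META_TIME")


def io_time_by_rank(parser_text):
    """Sum POSIX read/write/metadata time per rank from darshan-parser text.

    Assumes each data line starts with: module, rank, record id, counter,
    value (whitespace separated). Rank -1 marks records shared by all ranks.
    """
    totals = defaultdict(float)
    for line in parser_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split(None, 5)
        if len(fields) < 5 or fields[0] != "POSIX":
            continue  # only POSIX module records are considered here
        rank, counter, value = fields[1], fields[3], fields[4]
        if counter in TIME_COUNTERS:
            totals[int(rank)] += float(value)
    return dict(totals)


# Made-up sample lines for illustration only, not taken from a real log.
SAMPLE = "\n".join([
    "# hypothetical darshan-parser excerpt",
    "POSIX\t0\t123\tPOSIX_F_WRITE_TIME\t12.5\t/scratch/out.dat\t/scratch\tlustre",
    "POSIX\t0\t123\tPOSIX_F_META_TIME\t0.5\t/scratch/out.dat\t/scratch\tlustre",
    "POSIX\t0\t123\tPOSIX_BYTES_WRITTEN\t1048576\t/scratch/out.dat\t/scratch\tlustre",
    "MPIIO\t-1\t456\tMPIIO_F_WRITE_TIME\t3.0\t/scratch/shared.dat\t/scratch\tlustre",
])

if __name__ == "__main__":
    print(io_time_by_rank(SAMPLE))
```

If rank 0 shows far less write time here than benchio reports for the 
serial run, that would point toward instrumentation rather than the 
summary heuristics.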

Thanks,
--Shane

On 2/11/20 8:36 AM, Harms, Kevin wrote:
> Piero,
>
>    The performance estimate is based on heuristics, so it's possible that the 'serial' model is breaking some assumptions about how the I/O is done. Is every rank opening the file while only rank 0 does the actual I/O?
>
>    If possible, you could provide the log and we could check to see what the counters look like.
>
> kevin
>
> ________________________________________
> From: Piero LANUCARA <p.lanucara at cineca.it>
> Sent: Tuesday, February 11, 2020 2:28 AM
> To: Harms, Kevin
> Cc: darshan-users at lists.mcs.anl.gov
> Subject: Re: [Darshan-users] Darshan & EPCC benchio different behaviour
>
> Hi Kevin
>
> First of all, thanks for the investigation. I did some further tests, and
> it seems the issue may appear when using Fortran (MPI, mainly Intel MPI) codes.
>
> Is this information useful?
>
> regards
> Piero
> On 07/02/2020 16:07, Harms, Kevin wrote:
>> Piero,
>>
>>     Just to confirm: the serial case is still running in parallel with 36 processes, but the I/O comes only from rank 0?
>>
>> kevin
>>
>> ________________________________________
>> From: Darshan-users <darshan-users-bounces at lists.mcs.anl.gov> on behalf of Piero LANUCARA <p.lanucara at cineca.it>
>> Sent: Wednesday, February 5, 2020 4:56 AM
>> To: darshan-users at lists.mcs.anl.gov
>> Subject: Re: [Darshan-users] Darshan & EPCC benchio different behaviour
>>
>> P.S.
>>
>> To be more "verbose", I am adding the following to the discussion:
>>
>> Darshan output for the "serial" run (serial.pdf)
>>
>> Darshan output for the MPI-IO run (mpiio.pdf)
>>
>> benchio output for "serial" run (serial.out)
>>
>> benchio output for "MPI-IO" run (mpi-io.out)
>>
>> thanks
>>
>> Piero
>>
>> On 04/02/2020 19:44, Piero LANUCARA wrote:
>>> Dear all
>>>
>>> I'm using Darshan to measure the behaviour of the EPCC benchio benchmark
>>> (https://github.com/EPCCed/benchio) on a given x86 Tier1
>>> machine.
>>>
>>> When running two benchio tests (MPI-IO and serial), the two behave
>>> differently.
>>>
>>> While the Darshan PDF summary recovers the estimated time and
>>> bandwidth in the MPI-IO case, the "serial" run is completely
>>> misestimated by Darshan (the reported time is lower and the bandwidth
>>> higher than in the benchio output).
>>>
>>> Suggestions are welcome.
>>>
>>> thanks
>>>
>>> Piero
>>>
>>> _______________________________________________
>>> Darshan-users mailing list
>>> Darshan-users at lists.mcs.anl.gov
>>> https://lists.mcs.anl.gov/mailman/listinfo/darshan-users