[Darshan-users] Darshan & EPCC benchio different behaviour
Piero LANUCARA
p.lanucara at cineca.it
Wed Feb 12 04:29:49 CST 2020
Hi Shane, Kevin
thanks for the update.
I have attached updated files (log and PDF) to this email.
Also, the log from BENCHIO is attached.
thanks again
regards
Piero
On 11/02/2020 20:15, Shane Snyder wrote:
> Definitely looks like something strange is happening when Darshan is
> estimating the time spent in I/O operations in the serial case (as seen
> in the very first figure, observed write time barely even registers),
> which is ultimately used to provide the performance estimate.
>
> If you could provide them, the raw Darshan logs would be really
> helpful. That should make it clear whether it's an instrumentation
> issue (i.e., under-accounting for time spent in I/O operations at
> runtime) or if it's an issue with the heuristics in the PDF summary
> tool you are using, as Kevin points out. If it's the latter, having an
> example log to test modifications to our heuristics would be very
> helpful to us.
>
> Thanks,
> --Shane
>
> On 2/11/20 8:36 AM, Harms, Kevin wrote:
>> Piero,
>>
>> the performance estimate is based on heuristics; it's possible the
>> 'serial' model is breaking some assumptions about how the I/O is
>> done. Is every rank opening the file, but only rank 0 is doing actual
>> I/O?
>>
>> If possible, you could provide the log and we could check to see
>> what the counters look like.
>>
>> kevin
>>
>> ________________________________________
>> From: Piero LANUCARA <p.lanucara at cineca.it>
>> Sent: Tuesday, February 11, 2020 2:28 AM
>> To: Harms, Kevin
>> Cc: darshan-users at lists.mcs.anl.gov
>> Subject: Re: [Darshan-users] Darshan & EPCC benchio different behaviour
>>
>> Hi Kevin
>>
>> first of all, thanks for the investigation. I did some further tests and
>> it seems like the issue may appear with Fortran (MPI, mainly IntelMPI)
>> codes.
>>
>> Is this information useful?
>>
>> regards
>> Piero
>> On 07/02/2020 16:07, Harms, Kevin wrote:
>>> Piero,
>>>
>>> just to confirm, the serial case is still running in parallel,
>>> 36 processes, but the I/O is only from rank 0?
>>>
>>> kevin
>>>
>>> ________________________________________
>>> From: Darshan-users <darshan-users-bounces at lists.mcs.anl.gov> on
>>> behalf of Piero LANUCARA <p.lanucara at cineca.it>
>>> Sent: Wednesday, February 5, 2020 4:56 AM
>>> To: darshan-users at lists.mcs.anl.gov
>>> Subject: Re: [Darshan-users] Darshan & EPCC benchio different behaviour
>>>
>>> p.s
>>>
>>> to be more "verbose" I add to the discussion:
>>>
>>> Darshan output for the "serial" run (serial.pdf)
>>>
>>> Darshan output for the MPI-IO run (mpiio.pdf)
>>>
>>> benchio output for "serial" run (serial.out)
>>>
>>> benchio output for "MPI-IO" run (mpi-io.out)
>>>
>>> thanks
>>>
>>> Piero
>>>
>>> On 04/02/2020 19:44, Piero LANUCARA wrote:
>>>> Dear all
>>>>
>>>> I'm using Darshan to measure the behaviour of the EPCC benchio
>>>> benchmark (https://github.com/EPCCed/benchio) on a given x86 Tier1
>>>> machine.
>>>>
>>>> Running two benchio tests (MPI-IO and serial), I see different
>>>> behaviour.
>>>>
>>>> While the Darshan PDF summary recovers the estimated time and
>>>> bandwidth in the MPI-IO case, the "serial" run is badly misestimated
>>>> by Darshan (the time is lower and the bandwidth higher than in the
>>>> benchio output).
>>>>
>>>> Suggestions are welcome
>>>>
>>>> thanks
>>>>
>>>> Piero
>>>>
>>>> _______________________________________________
>>>> Darshan-users mailing list
>>>> Darshan-users at lists.mcs.anl.gov
>>>> https://lists.mcs.anl.gov/mailman/listinfo/darshan-users
-------------- next part --------------
A non-text attachment was scrubbed...
Name: benchio_1202.darshan
Type: application/octet-stream
Size: 877 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/darshan-users/attachments/20200212/844476f4/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: benchio_1202.darshan.pdf
Type: application/pdf
Size: 67310 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/darshan-users/attachments/20200212/844476f4/attachment-0001.pdf>
-------------- next part --------------
Simple Parallel IO benchmark
----------------------------
Running on 32 process(es)
Process grid is ( 2 , 4 , 4 )
Array size is ( 256 , 256 , 256 )
Global size is ( 512 , 1024 , 1024 )
Total amount of data = 4096.00000000000 MiB
Clock resolution is 1.00000000000000 , usecs
------
Serial
------
Writing to benchio_files/serial.dat
time = 4.30992698669434 , rate = 950.364127430749 MiB/s
time = 4.26962995529175 , rate = 959.333722802710 MiB/s
time = 4.28477692604065 , rate = 955.942414436243 MiB/s
time = 4.28777194023132 , rate = 955.274687435690 MiB/s
time = 4.28683900833130 , rate = 955.482580997231 MiB/s
time = 4.29409790039062 , rate = 953.867400095232 MiB/s
time = 4.28207302093506 , rate = 956.546042062957 MiB/s
time = 4.26365089416504 , rate = 960.679028765119 MiB/s
time = 4.26587200164795 , rate = 960.178832936777 MiB/s
time = 4.26457810401917 , rate = 960.470156740643 MiB/s
mintime = 4.26365089416504 , maxrate = 960.679028765119 MiB/s
avgtime = 4.28092167377472 , avgrate = 956.813899370335 MiB/s
Deleting: benchio_files/serial.dat
--------
Finished
--------
real 0m45.555s
user 0m0.025s
sys 0m0.025s
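
As a sanity check on the benchio figures above, the reported rates can be reproduced from the process grid, the per-process array size and the measured times. This is only a sketch: the 8-byte element size is an assumption (it is not printed by benchio, but it is the value that makes the total come out at 4096 MiB), and the helper names are made up for illustration.

```python
# Cross-check of the benchio output above.
# ASSUMPTION: each array element is 8 bytes (double precision); benchio does
# not print this, it is inferred from the reported 4096 MiB total.
BYTES_PER_ELEMENT = 8

process_grid = (2, 4, 4)        # "Process grid is ( 2 , 4 , 4 )"
local_array = (256, 256, 256)   # "Array size is ( 256 , 256 , 256 )"


def product(values):
    """Multiply all entries of a tuple of integers."""
    result = 1
    for v in values:
        result *= v
    return result


# Number of processes and global array dimensions implied by the grid.
n_procs = product(process_grid)
global_size = tuple(p * n for p, n in zip(process_grid, local_array))

# Total data volume in MiB under the 8-byte assumption.
total_mib = product(global_size) * BYTES_PER_ELEMENT / 2**20


def rate_mib_per_s(seconds):
    """Write rate implied by moving the whole dataset in `seconds`."""
    return total_mib / seconds


first_time = 4.30992698669434   # first "time =" line of the serial run
min_time = 4.26365089416504     # "mintime" line of the serial run

print("processes:", n_procs)
print("global size:", global_size)
print("total MiB:", total_mib)
print("rate for first write:", rate_mib_per_s(first_time))
print("rate for fastest write:", rate_mib_per_s(min_time))
```

Under that assumption the computed rates agree with the "rate" and "maxrate" values printed by benchio, so the benchmark's own numbers are internally consistent. Ten writes at roughly 4.28 s each also account for most of the 45.555 s wall-clock time.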