[codes-ross-users] Replay HPL's dumpi trace on CODES

Maxime Chevalier maxime.chevalier at inria.fr
Sun Jun 4 11:20:58 CDT 2017


Hi Misbah, 
Thanks for your help. You can find the DUMPI traces, both with and without the "UNDEFINED DATA TYPE" errors, via the link below. The codes-workload-dump utility is very useful, thanks for that (I was using dumpistat). 

https://1drv.ms/f/s!Ati25f8zqy9lnNFi7EX8u1tmdJ4rfw 

Regards, 
Maxime 
----- Original Message -----

> From: "Misbah Mubarak" <mmubarak at anl.gov>
> To: "Maxime Chevalier" <maxime.chevalier at inria.fr>,
> codes-ross-users at lists.mcs.anl.gov
> Sent: Friday, June 2, 2017 18:54:13
> Subject: Re: [codes-ross-users] Replay HPL's dumpi trace on CODES

> Hi Maxime,

> There is a codes-workload-dump utility that helps you inspect the traces and
> provides detailed information on the individual MPI operations, such as the
> number of bytes transmitted (which is derived from the data type and count).
> If you could run the utility on one of the traces and send me the output,
> I can have a look at what's going on. Alternatively, if you could share the
> traces, I can have a look at those.

> Using the utility is simple; here is some documentation on how to run it:

> https://xgitlab.cels.anl.gov/codes/codes/wikis/codes-dumpi-workload

> Thanks,
> Misbah

> From: <codes-ross-users-bounces at lists.mcs.anl.gov> on behalf of Maxime
> Chevalier <maxime.chevalier at inria.fr>
> Date: Friday, June 2, 2017 at 8:52 AM
> To: "codes-ross-users at lists.mcs.anl.gov" <codes-ross-users at lists.mcs.anl.gov>
> Subject: Re: [codes-ross-users] Replay HPL's dumpi trace on CODES

> Hi Misbah,
> Thanks for your fast response. I looked into the data types, but I don't
> really understand them. I have figured out how to avoid the "UNDEFINED DATA TYPE"
> errors by compiling HPL with "HPL_NO_MPI_DATATYPE", but the output is much
> the same (see trace below). I don't know if it's a step forward or
> backward...

> Regards,
> Maxime

> Trace:

> Fri Jun 2 09:15:49 2017

> ROSS Revision: 4c6a7d8eb9c784797d900edfc76725d62ec25941

> tw_net_start: Found world size to be 1

> ROSS Core Configuration:
> Total Nodes 1
> Total Processors [Nodes (1) x PE_per_Node (1)] 1
> Total KPs [Nodes (1) x KPs (16)] 16
> Total LPs 54
> Simulation End Time 300000000000.00
> LP-to-PE Mapping model defined

> ROSS Event Memory Allocation:
> Model events 13825
> Network events 50000
> Total events 63824

> *** START SEQUENTIAL SIMULATION ***

> *** END SIMULATION ***

> LP 1 unmatched irecvs 1 unmatched sends 0 Total sends 0 receives 2
> collectives 0 delays 8 wait alls 0 waits 0 send time 0.000000 wait 0.000000
> LP 3 unmatched irecvs 1 unmatched sends 0 Total sends 1 receives 1
> collectives 0 delays 10 wait alls 0 waits 0 send time 3.202149 wait 0.000000
> LP 5 unmatched irecvs 0 unmatched sends 0 Total sends 0 receives 1
> collectives 0 delays 7 wait alls 0 waits 0 send time 0.000000 wait 0.000000
> LP 7 unmatched irecvs 1 unmatched sends 0 Total sends 1 receives 1
> collectives 0 delays 10 wait alls 0 waits 0 send time 3.189207 wait 0.000000
> : Running Time = 0.0001 seconds

> TW Library Statistics:
> Total Events Processed 56
> Events Aborted (part of RBs) 0
> Events Rolled Back 0
> Event Ties Detected in PE Queues 0
> Efficiency 100.00 %
> Total Remote (shared mem) Events Processed 0
> Percent Remote Events 0.00 %
> Total Remote (network) Events Processed 0
> Percent Remote Events 0.00 %

> Total Roll Backs 0
> Primary Roll Backs 0
> Secondary Roll Backs 0
> Fossil Collect Attempts 0
> Total GVT Computations 0

> Net Events Processed 56
> Event Rate (events/sec) 823529.4
> Total Events Scheduled Past End Time 0

> TW Memory Statistics:
> Events Allocated 63825
> Memory Allocated 62573
> Memory Wasted 683

> TW Data Structure sizes in bytes (sizeof):
> PE struct 608
> KP struct 144
> LP struct 128
> LP Model struct 760
> LP RNGs 80
> Total LP 968
> Event struct 144
> Event struct with Model 928

> TW Clock Cycle Statistics (MAX values in secs at 1.0000 GHz):
> Priority Queue (enq/deq) 0.0000
> AVL Tree (insert/delete) 0.0000
> LZ4 (de)compression 0.0000
> Buddy system 0.0000
> Event Processing 0.0000
> Event Cancel 0.0000
> Event Abort 0.0000

> GVT 0.0000
> Fossil Collect 0.0000
> Primary Rollbacks 0.0000
> Network Read 0.0000
> Statistics Computation 0.0000
> Statistics Write 0.0000
> Total Time (Note: Using Running Time above for Speedup) 0.0002

> TW GVT Statistics: MPI AllReduce
> GVT Interval 16
> GVT Real Time Interval (cycles) 0
> GVT Real Time Interval (sec) 0.00000000
> Batch Size 16

> Forced GVT 0
> Total GVT Computations 0
> Total All Reduce Calls 0
> Average Reduction / GVT -nan

> Total bytes sent 8 recvd 20
> max runtime 0.000000 ns avg runtime 0.000000
> max comm time 0.000000 avg comm time -69573.000000
> max send time 3.202149 avg send time 1.597839
> max recv time 45682.609151 avg recv time 11420.652288
> max wait time 0.000000 avg wait time 0.000000
----- Original Message -----

> > From: "Misbah Mubarak" <mmubarak at anl.gov>
> > To: "Maxime Chevalier" <maxime.chevalier at inria.fr>,
> > codes-ross-users at lists.mcs.anl.gov
> > Sent: Tuesday, May 30, 2017 18:12:46
> > Subject: Re: [codes-ross-users] Replay HPL's dumpi trace on CODES

> > Hi Maxime,

> > Thanks for your message. There seems to be a data type that is not
> > supported by either DUMPI or CODES. Do you know which data types are
> > used in the HPL trace? I will find out whether support for them can be
> > added to the code.

> > Regards,
> > Misbah

> > From: <codes-ross-users-bounces at lists.mcs.anl.gov> on behalf of Maxime
> > Chevalier <maxime.chevalier at inria.fr>
> > Date: Monday, May 29, 2017 at 3:51 AM
> > To: "codes-ross-users at lists.mcs.anl.gov" <codes-ross-users at lists.mcs.anl.gov>
> > Subject: [codes-ross-users] Replay HPL's dumpi trace on CODES

> > Hi,

> > I'm trying to replay, with CODES, a DUMPI trace of HPL generated on my
> > computer. Unfortunately, I get a lot of "Undefined data type" errors (see
> > the trace below).

> > I have already replayed AMG traces (downloaded here) as well as AMG traces
> > I generated myself, and that worked fine.

> > So I'm wondering whether I did something wrong, or whether it's HPL's fault.

> > Best regards,
> > Maxime

> > Trace:

> > > ROSS Revision: 4c6a7d8eb9c784797d900edfc76725d62ec25941

> > > tw_net_start: Found world size to be 1

> > > ROSS Core Configuration:
> > > Total Nodes 1
> > > Total Processors [Nodes (1) x PE_per_Node (1)] 1
> > > Total KPs [Nodes (1) x KPs (16)] 16
> > > Total LPs 5
> > > Simulation End Time 300000000000.00
> > > LP-to-PE Mapping model defined

> > > ROSS Event Memory Allocation:
> > > Model events 1281
> > > Network events 50000
> > > Total events 51280

> > > Undefined data type   [repeated 106 times]

> > > *** START SEQUENTIAL SIMULATION ***

> > > *** END SIMULATION ***

> > > LP 1 unmatched irecvs 1 unmatched sends 0 Total sends 0 receives 1
> > > collectives 0 delays 7 wait alls 0 waits 0 send time 0.000000 wait 0.000000
> > > : Running Time = 0.0000 seconds

> > > TW Library Statistics:
> > > Total Events Processed 8
> > > Events Aborted (part of RBs) 0
> > > Events Rolled Back 0
> > > Event Ties Detected in PE Queues 0
> > > Efficiency 100.00 %
> > > Total Remote (shared mem) Events Processed 0
> > > Percent Remote Events 0.00 %
> > > Total Remote (network) Events Processed 0
> > > Percent Remote Events 0.00 %

> > > Total Roll Backs 0
> > > Primary Roll Backs 0
> > > Secondary Roll Backs 0
> > > Fossil Collect Attempts 0
> > > Total GVT Computations 0

> > > Net Events Processed 8
> > > Event Rate (events/sec) 307692.3
> > > Total Events Scheduled Past End Time 0

> > > TW Memory Statistics:
> > > Events Allocated 51281
> > > Memory Allocated 51168
> > > Memory Wasted 720

> > > TW Data Structure sizes in bytes (sizeof):
> > > PE struct 608
> > > KP struct 144
> > > LP struct 128
> > > LP Model struct 760
> > > LP RNGs 80
> > > Total LP 968
> > > Event struct 144
> > > Event struct with Model 928

> > > TW Clock Cycle Statistics (MAX values in secs at 1.0000 GHz):
> > > Priority Queue (enq/deq) 0.0000
> > > AVL Tree (insert/delete) 0.0000
> > > LZ4 (de)compression 0.0000
> > > Buddy system 0.0000
> > > Event Processing 0.0000
> > > Event Cancel 0.0000
> > > Event Abort 0.0000

> > > GVT 0.0000
> > > Fossil Collect 0.0000
> > > Primary Rollbacks 0.0000
> > > Network Read 0.0000
> > > Statistics Computation 0.0000
> > > Statistics Write 0.0000
> > > Total Time (Note: Using Running Time above for Speedup) 0.0001

> > > TW GVT Statistics: MPI AllReduce
> > > GVT Interval 16
> > > GVT Real Time Interval (cycles) 0
> > > GVT Real Time Interval (sec) 0.00000000
> > > Batch Size 16

> > > Forced GVT 0
> > > Total GVT Computations 0
> > > Total All Reduce Calls 0
> > > Average Reduction / GVT -nan

> > > Total bytes sent 0 recvd 4
> > > max runtime 0.000000 ns avg runtime 0.000000
> > > max comm time 0.000000 avg comm time -66232.000000
> > > max send time 0.000000 avg send time 0.000000
> > > max recv time 0.000000 avg recv time 0.000000
> > > max wait time 0.000000 avg wait time 0.000000
> > > LP-IO: writing output to hpl-trace-25282-1495543803/
> > > LP-IO: data files:
> > > hpl-trace-25282-1495543803/mpi-replay-stats
> > > hpl-trace-25282-1495543803/model-net-category-all