[codes-ross-users] Replay HPL's dumpi trace on CODES

Maxime Chevalier maxime.chevalier at inria.fr
Fri Jun 2 07:52:56 CDT 2017


Hi Misbah, 
Thanks for your fast response. I looked into the data types, but I don't really understand them. I have figured out how to avoid the "Undefined data type" errors by compiling HPL with "HPL_NO_MPI_DATATYPE", but the output is much the same (see the trace below). I don't know whether that's a step forward or backward...

Regards, 
Maxime 

Trace:

Fri Jun 2 09:15:49 2017 

ROSS Revision: 4c6a7d8eb9c784797d900edfc76725d62ec25941 

tw_net_start: Found world size to be 1 

ROSS Core Configuration: 
Total Nodes 1 
Total Processors [Nodes (1) x PE_per_Node (1)] 1 
Total KPs [Nodes (1) x KPs (16)] 16 
Total LPs 54 
Simulation End Time 300000000000.00 
LP-to-PE Mapping model defined 

ROSS Event Memory Allocation: 
Model events 13825 
Network events 50000 
Total events 63824 

*** START SEQUENTIAL SIMULATION *** 

*** END SIMULATION *** 

LP 1 unmatched irecvs 1 unmatched sends 0 Total sends 0 receives 2 collectives 0 delays 8 wait alls 0 waits 0 send time 0.000000 wait 0.000000 
LP 3 unmatched irecvs 1 unmatched sends 0 Total sends 1 receives 1 collectives 0 delays 10 wait alls 0 waits 0 send time 3.202149 wait 0.000000 
LP 5 unmatched irecvs 0 unmatched sends 0 Total sends 0 receives 1 collectives 0 delays 7 wait alls 0 waits 0 send time 0.000000 wait 0.000000 
LP 7 unmatched irecvs 1 unmatched sends 0 Total sends 1 receives 1 collectives 0 delays 10 wait alls 0 waits 0 send time 3.189207 wait 0.000000 
: Running Time = 0.0001 seconds 

TW Library Statistics: 
Total Events Processed 56 
Events Aborted (part of RBs) 0 
Events Rolled Back 0 
Event Ties Detected in PE Queues 0 
Efficiency 100.00 % 
Total Remote (shared mem) Events Processed 0 
Percent Remote Events 0.00 % 
Total Remote (network) Events Processed 0 
Percent Remote Events 0.00 % 

Total Roll Backs 0 
Primary Roll Backs 0 
Secondary Roll Backs 0 
Fossil Collect Attempts 0 
Total GVT Computations 0 

Net Events Processed 56 
Event Rate (events/sec) 823529.4 
Total Events Scheduled Past End Time 0 

TW Memory Statistics: 
Events Allocated 63825 
Memory Allocated 62573 
Memory Wasted 683 

TW Data Structure sizes in bytes (sizeof): 
PE struct 608 
KP struct 144 
LP struct 128 
LP Model struct 760 
LP RNGs 80 
Total LP 968 
Event struct 144 
Event struct with Model 928 

TW Clock Cycle Statistics (MAX values in secs at 1.0000 GHz): 
Priority Queue (enq/deq) 0.0000 
AVL Tree (insert/delete) 0.0000 
LZ4 (de)compression 0.0000 
Buddy system 0.0000 
Event Processing 0.0000 
Event Cancel 0.0000 
Event Abort 0.0000 

GVT 0.0000 
Fossil Collect 0.0000 
Primary Rollbacks 0.0000 
Network Read 0.0000 
Statistics Computation 0.0000 
Statistics Write 0.0000 
Total Time (Note: Using Running Time above for Speedup) 0.0002 

TW GVT Statistics: MPI AllReduce 
GVT Interval 16 
GVT Real Time Interval (cycles) 0 
GVT Real Time Interval (sec) 0.00000000 
Batch Size 16 

Forced GVT 0 
Total GVT Computations 0 
Total All Reduce Calls 0 
Average Reduction / GVT -nan 

Total bytes sent 8 recvd 20 
max runtime 0.000000 ns avg runtime 0.000000 
max comm time 0.000000 avg comm time -69573.000000 
max send time 3.202149 avg send time 1.597839 
max recv time 45682.609151 avg recv time 11420.652288 
max wait time 0.000000 avg wait time 0.000000 
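
For anyone comparing runs, the per-LP summary lines in the trace above can be tallied with a short script. This is only a minimal sketch, assuming the line format shown in this trace; the names `parse_lp_lines` and `unmatched_lps` are hypothetical helpers, not part of CODES or ROSS.

```python
import re

# Matches the per-LP summary lines printed at the end of the replay, e.g.
# "LP 3 unmatched irecvs 1 unmatched sends 0 Total sends 1 receives 1 ..."
# The format is assumed from the trace in this message, not from CODES docs.
LP_LINE = re.compile(
    r"LP (\d+) unmatched irecvs (\d+) unmatched sends (\d+) "
    r"Total sends (\d+) receives (\d+)"
)


def parse_lp_lines(text):
    """Return one dict per LP summary line found in `text`."""
    rows = []
    for m in LP_LINE.finditer(text):
        lp, u_irecv, u_send, sends, recvs = map(int, m.groups())
        rows.append({
            "lp": lp,
            "unmatched_irecvs": u_irecv,
            "unmatched_sends": u_send,
            "sends": sends,
            "receives": recvs,
        })
    return rows


def unmatched_lps(rows):
    """LP ids that ended with at least one unmatched irecv or send."""
    return [r["lp"] for r in rows
            if r["unmatched_irecvs"] or r["unmatched_sends"]]
```

Applied to the four LP lines in this trace, `unmatched_lps` would report LPs 1, 3 and 7, which is the part of the output that still looks suspicious to me.
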
----- Original Message -----

> From: "Misbah Mubarak" <mmubarak at anl.gov>
> To: "Maxime Chevalier" <maxime.chevalier at inria.fr>,
> codes-ross-users at lists.mcs.anl.gov
> Sent: Tuesday, May 30, 2017 18:12:46
> Subject: Re: [codes-ross-users] Replay HPL's dumpi trace on CODES

> Hi Maxime,

> Thanks for your message. There seems to be a data type that is not supported
> by either DUMPI or CODES. Are you familiar with what data types are being
> used by the HPL trace? I will find out if the support for them can be added
> in the code.

> Regards,
> Misbah
> From: < codes-ross-users-bounces at lists.mcs.anl.gov > on behalf of Maxime
> Chevalier < maxime.chevalier at inria.fr >
> Date: Monday, May 29, 2017 at 3:51 AM
> To: " codes-ross-users at lists.mcs.anl.gov " <
> codes-ross-users at lists.mcs.anl.gov >
> Subject: [codes-ross-users] Replay HPL's dumpi trace on CODES

> Hi,
> I'm trying to replay HPL's DUMPI trace generated on my computer with CODES.
> Unfortunately, I get a lot of "Undefined data type" errors (see the trace
> below).
> I have already replayed AMG traces (downloaded here) and replayed my own
> generated AMG traces. Both worked fine.
> So I'm wondering if I did something wrong, or if it's HPL's fault.

> Best regards,
> Maxime

> Trace:

> > ROSS Revision: 4c6a7d8eb9c784797d900edfc76725d62ec25941
> 

> > tw_net_start: Found world size to be 1
> 

> > ROSS Core Configuration:
> 
> > Total Nodes 1
> 
> > Total Processors [Nodes (1) x PE_per_Node (1)] 1
> 
> > Total KPs [Nodes (1) x KPs (16)] 16
> 
> > Total LPs 5
> 
> > Simulation End Time 300000000000.00
> 
> > LP-to-PE Mapping model defined
> 

> > ROSS Event Memory Allocation:
> 
> > Model events 1281
> 
> > Network events 50000
> 
> > Total events 51280
> 

> > Undefined data type
> 
> > [... "Undefined data type" printed 106 times in total ...]
> 
> > *** START SEQUENTIAL SIMULATION ***
> 

> > *** END SIMULATION ***
> 

> > LP 1 unmatched irecvs 1 unmatched sends 0 Total sends 0 receives 1
> > collectives 0 delays 7 wait alls 0 waits 0 send time 0.000000 wait 0.000000
> 
> > : Running Time = 0.0000 seconds
> 

> > TW Library Statistics:
> 
> > Total Events Processed 8
> 
> > Events Aborted (part of RBs) 0
> 
> > Events Rolled Back 0
> 
> > Event Ties Detected in PE Queues 0
> 
> > Efficiency 100.00 %
> 
> > Total Remote (shared mem) Events Processed 0
> 
> > Percent Remote Events 0.00 %
> 
> > Total Remote (network) Events Processed 0
> 
> > Percent Remote Events 0.00 %
> 

> > Total Roll Backs 0
> 
> > Primary Roll Backs 0
> 
> > Secondary Roll Backs 0
> 
> > Fossil Collect Attempts 0
> 
> > Total GVT Computations 0
> 

> > Net Events Processed 8
> 
> > Event Rate (events/sec) 307692.3
> 
> > Total Events Scheduled Past End Time 0
> 

> > TW Memory Statistics:
> 
> > Events Allocated 51281
> 
> > Memory Allocated 51168
> 
> > Memory Wasted 720
> 

> > TW Data Structure sizes in bytes (sizeof):
> 
> > PE struct 608
> 
> > KP struct 144
> 
> > LP struct 128
> 
> > LP Model struct 760
> 
> > LP RNGs 80
> 
> > Total LP 968
> 
> > Event struct 144
> 
> > Event struct with Model 928
> 

> > TW Clock Cycle Statistics (MAX values in secs at 1.0000 GHz):
> 
> > Priority Queue (enq/deq) 0.0000
> 
> > AVL Tree (insert/delete) 0.0000
> 
> > LZ4 (de)compression 0.0000
> 
> > Buddy system 0.0000
> 
> > Event Processing 0.0000
> 
> > Event Cancel 0.0000
> 
> > Event Abort 0.0000
> 

> > GVT 0.0000
> 
> > Fossil Collect 0.0000
> 
> > Primary Rollbacks 0.0000
> 
> > Network Read 0.0000
> 
> > Statistics Computation 0.0000
> 
> > Statistics Write 0.0000
> 
> > Total Time (Note: Using Running Time above for Speedup) 0.0001
> 

> > TW GVT Statistics: MPI AllReduce
> 
> > GVT Interval 16
> 
> > GVT Real Time Interval (cycles) 0
> 
> > GVT Real Time Interval (sec) 0.00000000
> 
> > Batch Size 16
> 

> > Forced GVT 0
> 
> > Total GVT Computations 0
> 
> > Total All Reduce Calls 0
> 
> > Average Reduction / GVT -nan
> 

> > Total bytes sent 0 recvd 4
> 
> > max runtime 0.000000 ns avg runtime 0.000000
> 
> > max comm time 0.000000 avg comm time -66232.000000
> 
> > max send time 0.000000 avg send time 0.000000
> 
> > max recv time 0.000000 avg recv time 0.000000
> 
> > max wait time 0.000000 avg wait time 0.000000
> 
> > LP-IO: writing output to hpl-trace-25282-1495543803/
> 
> > LP-IO: data files:
> 
> > hpl-trace-25282-1495543803/mpi-replay-stats
> 
> > hpl-trace-25282-1495543803/model-net-category-all
>