[codes-ross-users] Replay HPL's dumpi trace on CODES
Mubarak, Misbah
mmubarak at anl.gov
Fri Jun 2 11:54:13 CDT 2017
Hi Maxime,
There is a codes-workload-dump utility that helps you inspect the traces and provides detailed information on the individual MPI operations, such as the number of bytes transmitted (which is derived from the data type and count). If you could run the utility on one of the traces and send me the output, I can have a look at what's going on. Alternatively, if you could share the traces, I can have a look at those.
Using the utility is simple. Here is some documentation on how to run it:
https://xgitlab.cels.anl.gov/codes/codes/wikis/codes-dumpi-workload
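To illustrate how a byte count is derived from the data type and count, here is a minimal sketch. It is not the CODES or DUMPI implementation: the size table is an assumption (typical sizes on an LP64 platform), and the names ASSUMED_TYPE_SIZES and message_bytes are hypothetical. It only shows why a type that is missing from such a table leaves the message size undefined.

```python
# Illustrative sizes in bytes for a few predefined MPI datatypes on a typical
# LP64 platform. These values are assumptions for this sketch; they are not
# taken from the DUMPI or CODES sources.
ASSUMED_TYPE_SIZES = {
    "MPI_CHAR": 1,
    "MPI_BYTE": 1,
    "MPI_INT": 4,
    "MPI_FLOAT": 4,
    "MPI_DOUBLE": 8,
}


def message_bytes(datatype, count):
    """Return count * sizeof(datatype), or None if the type is not in the table."""
    size = ASSUMED_TYPE_SIZES.get(datatype)
    if size is None:
        # A derived or otherwise unrecognized type has no entry in the table,
        # so the number of bytes cannot be computed.
        return None
    return count * size


if __name__ == "__main__":
    examples = [
        ("MPI_DOUBLE", 1000),
        ("MPI_INT", 2),
        ("hypothetical_derived_type", 1),  # stands in for an unsupported type
    ]
    for datatype, count in examples:
        print(datatype, count, message_bytes(datatype, count))
```

In this sketch an unknown type yields None rather than a size, which is the situation a replay tool has to report or work around.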
Thanks,
Misbah
From: <codes-ross-users-bounces at lists.mcs.anl.gov> on behalf of Maxime Chevalier <maxime.chevalier at inria.fr>
Date: Friday, June 2, 2017 at 8:52 AM
To: "codes-ross-users at lists.mcs.anl.gov" <codes-ross-users at lists.mcs.anl.gov>
Subject: Re: [codes-ross-users] Replay HPL's dumpi trace on CODES
Hi Misbah,
Thanks for your fast response. I looked into the data types, but I don't really understand them. I have figured out how to avoid the "UNDEFINED DATA TYPE" errors by compiling HPL with "HPL_NO_MPI_DATATYPE", but the output is much the same (see trace below). I don't know if it's a step forward or backward...
Regards,
Maxime
Trace:
Fri Jun 2 09:15:49 2017
ROSS Revision: 4c6a7d8eb9c784797d900edfc76725d62ec25941
tw_net_start: Found world size to be 1
ROSS Core Configuration:
Total Nodes 1
Total Processors [Nodes (1) x PE_per_Node (1)] 1
Total KPs [Nodes (1) x KPs (16)] 16
Total LPs 54
Simulation End Time 300000000000.00
LP-to-PE Mapping model defined
ROSS Event Memory Allocation:
Model events 13825
Network events 50000
Total events 63824
*** START SEQUENTIAL SIMULATION ***
*** END SIMULATION ***
LP 1 unmatched irecvs 1 unmatched sends 0 Total sends 0 receives 2 collectives 0 delays 8 wait alls 0 waits 0 send time 0.000000 wait 0.000000
LP 3 unmatched irecvs 1 unmatched sends 0 Total sends 1 receives 1 collectives 0 delays 10 wait alls 0 waits 0 send time 3.202149 wait 0.000000
LP 5 unmatched irecvs 0 unmatched sends 0 Total sends 0 receives 1 collectives 0 delays 7 wait alls 0 waits 0 send time 0.000000 wait 0.000000
LP 7 unmatched irecvs 1 unmatched sends 0 Total sends 1 receives 1 collectives 0 delays 10 wait alls 0 waits 0 send time 3.189207 wait 0.000000
: Running Time = 0.0001 seconds
TW Library Statistics:
Total Events Processed 56
Events Aborted (part of RBs) 0
Events Rolled Back 0
Event Ties Detected in PE Queues 0
Efficiency 100.00 %
Total Remote (shared mem) Events Processed 0
Percent Remote Events 0.00 %
Total Remote (network) Events Processed 0
Percent Remote Events 0.00 %
Total Roll Backs 0
Primary Roll Backs 0
Secondary Roll Backs 0
Fossil Collect Attempts 0
Total GVT Computations 0
Net Events Processed 56
Event Rate (events/sec) 823529.4
Total Events Scheduled Past End Time 0
TW Memory Statistics:
Events Allocated 63825
Memory Allocated 62573
Memory Wasted 683
TW Data Structure sizes in bytes (sizeof):
PE struct 608
KP struct 144
LP struct 128
LP Model struct 760
LP RNGs 80
Total LP 968
Event struct 144
Event struct with Model 928
TW Clock Cycle Statistics (MAX values in secs at 1.0000 GHz):
Priority Queue (enq/deq) 0.0000
AVL Tree (insert/delete) 0.0000
LZ4 (de)compression 0.0000
Buddy system 0.0000
Event Processing 0.0000
Event Cancel 0.0000
Event Abort 0.0000
GVT 0.0000
Fossil Collect 0.0000
Primary Rollbacks 0.0000
Network Read 0.0000
Statistics Computation 0.0000
Statistics Write 0.0000
Total Time (Note: Using Running Time above for Speedup) 0.0002
TW GVT Statistics: MPI AllReduce
GVT Interval 16
GVT Real Time Interval (cycles) 0
GVT Real Time Interval (sec) 0.00000000
Batch Size 16
Forced GVT 0
Total GVT Computations 0
Total All Reduce Calls 0
Average Reduction / GVT -nan
Total bytes sent 8 recvd 20
max runtime 0.000000 ns avg runtime 0.000000
max comm time 0.000000 avg comm time -69573.000000
max send time 3.202149 avg send time 1.597839
max recv time 45682.609151 avg recv time 11420.652288
max wait time 0.000000 avg wait time 0.000000
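The per-LP summary lines in the trace above report unmatched receives and sends for each rank. As a rough aid for comparing runs, the sketch below pulls those counts out of a log. The line format is assumed from the lines shown in this trace only, and the names LP_LINE and unmatched_by_lp are hypothetical helpers, not part of CODES or ROSS.

```python
import re

# Matches the start of a per-LP summary line as it appears in the trace above.
# The format is inferred from this one log and may differ in other versions.
LP_LINE = re.compile(
    r"^LP (?P<lp>\d+) unmatched irecvs (?P<irecvs>\d+) unmatched sends (?P<sends>\d+)",
    re.MULTILINE,
)


def unmatched_by_lp(log_text):
    """Map each LP id to a tuple (unmatched irecvs, unmatched sends)."""
    counts = {}
    for match in LP_LINE.finditer(log_text):
        counts[int(match.group("lp"))] = (
            int(match.group("irecvs")),
            int(match.group("sends")),
        )
    return counts


# The four summary lines copied from the trace above.
sample = """\
LP 1 unmatched irecvs 1 unmatched sends 0 Total sends 0 receives 2 collectives 0 delays 8 wait alls 0 waits 0 send time 0.000000 wait 0.000000
LP 3 unmatched irecvs 1 unmatched sends 0 Total sends 1 receives 1 collectives 0 delays 10 wait alls 0 waits 0 send time 3.202149 wait 0.000000
LP 5 unmatched irecvs 0 unmatched sends 0 Total sends 0 receives 1 collectives 0 delays 7 wait alls 0 waits 0 send time 0.000000 wait 0.000000
LP 7 unmatched irecvs 1 unmatched sends 0 Total sends 1 receives 1 collectives 0 delays 10 wait alls 0 waits 0 send time 3.189207 wait 0.000000
"""

if __name__ == "__main__":
    for lp, (irecvs, sends) in sorted(unmatched_by_lp(sample).items()):
        if irecvs or sends:
            print(f"LP {lp}: {irecvs} unmatched irecvs, {sends} unmatched sends")
```

For this trace the helper flags LPs 1, 3 and 7, each with one unmatched irecv, while LP 5 has none.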
________________________________
From: "Misbah Mubarak" <mmubarak at anl.gov>
To: "Maxime Chevalier" <maxime.chevalier at inria.fr>, codes-ross-users at lists.mcs.anl.gov
Sent: Tuesday, May 30, 2017 18:12:46
Subject: Re: [codes-ross-users] Replay HPL's dumpi trace on CODES
Hi Maxime,
Thanks for your message. There seems to be a data type that is not supported by either DUMPI or CODES. Do you know which data types are used in the HPL trace? I will find out whether support for them can be added to the code.
Regards,
Misbah
From: <codes-ross-users-bounces at lists.mcs.anl.gov> on behalf of Maxime Chevalier <maxime.chevalier at inria.fr>
Date: Monday, May 29, 2017 at 3:51 AM
To: "codes-ross-users at lists.mcs.anl.gov" <codes-ross-users at lists.mcs.anl.gov>
Subject: [codes-ross-users] Replay HPL's dumpi trace on CODES
Hi,
I'm trying to replay, with CODES, an HPL DUMPI trace generated on my computer. Unfortunately, I get a lot of "Undefined data type" errors (see the trace below).
I have already replayed AMG traces (downloaded from http://portal.nersc.gov/project/CAL/designforward.htm) as well as AMG traces I generated myself, and both worked fine.
So I'm wondering whether I did something wrong or whether the fault lies with HPL.
Best regards,
Maxime
Trace:
ROSS Revision: 4c6a7d8eb9c784797d900edfc76725d62ec25941
tw_net_start: Found world size to be 1
ROSS Core Configuration:
Total Nodes 1
Total Processors [Nodes (1) x PE_per_Node (1)] 1
Total KPs [Nodes (1) x KPs (16)] 16
Total LPs 5
Simulation End Time 300000000000.00
LP-to-PE Mapping model defined
ROSS Event Memory Allocation:
Model events 1281
Network events 50000
Total events 51280
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
*** START SEQUENTIAL SIMULATION ***
*** END SIMULATION ***
LP 1 unmatched irecvs 1 unmatched sends 0 Total sends 0 receives 1 collectives 0 delays 7 wait alls 0 waits 0 send time 0.000000 wait 0.000000
: Running Time = 0.0000 seconds
TW Library Statistics:
Total Events Processed 8
Events Aborted (part of RBs) 0
Events Rolled Back 0
Event Ties Detected in PE Queues 0
Efficiency 100.00 %
Total Remote (shared mem) Events Processed 0
Percent Remote Events 0.00 %
Total Remote (network) Events Processed 0
Percent Remote Events 0.00 %
Total Roll Backs 0
Primary Roll Backs 0
Secondary Roll Backs 0
Fossil Collect Attempts 0
Total GVT Computations 0
Net Events Processed 8
Event Rate (events/sec) 307692.3
Total Events Scheduled Past End Time 0
TW Memory Statistics:
Events Allocated 51281
Memory Allocated 51168
Memory Wasted 720
TW Data Structure sizes in bytes (sizeof):
PE struct 608
KP struct 144
LP struct 128
LP Model struct 760
LP RNGs 80
Total LP 968
Event struct 144
Event struct with Model 928
TW Clock Cycle Statistics (MAX values in secs at 1.0000 GHz):
Priority Queue (enq/deq) 0.0000
AVL Tree (insert/delete) 0.0000
LZ4 (de)compression 0.0000
Buddy system 0.0000
Event Processing 0.0000
Event Cancel 0.0000
Event Abort 0.0000
GVT 0.0000
Fossil Collect 0.0000
Primary Rollbacks 0.0000
Network Read 0.0000
Statistics Computation 0.0000
Statistics Write 0.0000
Total Time (Note: Using Running Time above for Speedup) 0.0001
TW GVT Statistics: MPI AllReduce
GVT Interval 16
GVT Real Time Interval (cycles) 0
GVT Real Time Interval (sec) 0.00000000
Batch Size 16
Forced GVT 0
Total GVT Computations 0
Total All Reduce Calls 0
Average Reduction / GVT -nan
Total bytes sent 0 recvd 4
max runtime 0.000000 ns avg runtime 0.000000
max comm time 0.000000 avg comm time -66232.000000
max send time 0.000000 avg send time 0.000000
max recv time 0.000000 avg recv time 0.000000
max wait time 0.000000 avg wait time 0.000000
LP-IO: writing output to hpl-trace-25282-1495543803/
LP-IO: data files:
hpl-trace-25282-1495543803/mpi-replay-stats
hpl-trace-25282-1495543803/model-net-category-all