[codes-ross-users] Replay HPL's dumpi trace on CODES

Mubarak, Misbah mmubarak at anl.gov
Wed Jun 7 15:42:14 CDT 2017


Hi Maxime,

I ran the HPL traces generated without MPI data types through the simulation, and here are some observations. I disabled all synchronization operations (waits and wait-alls) in the simulation, so that it only matches the MPI sends with the receives and does nothing else.

- Rank 0 expects 192 messages from Rank 1, but it instead receives 192 messages from Rank 2.
- Rank 1 receives 192 messages from Rank 0, but no corresponding receives are posted, so the messages remain unmatched.
- Rank 2 expects 192 messages from Rank 0, but they don’t arrive (probably because they arrived at Rank 1).
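
As an illustration of the kind of check behind the observations above, here is a minimal sketch that tallies sends and posted receives per (source, destination) pair and reports the pairs whose counts differ. The record layout `(rank, kind, peer)` and the function name `find_mismatches` are hypothetical and are not the output format of any CODES or DUMPI tool. The sketch also ignores tags, communicators and wildcard receives, so it is only a rough first pass.

```python
from collections import Counter


def find_mismatches(ops):
    """Compare sends and posted receives per (source, destination) pair.

    ops: iterable of (rank, kind, peer) tuples, where kind is "send" or
    "recv". This layout is a hypothetical one chosen for the example.
    Returns {(src, dst): (sends_posted, recvs_posted)} for every pair
    whose two counts differ.
    """
    sends = Counter()
    recvs = Counter()
    for rank, kind, peer in ops:
        if kind == "send":
            sends[(rank, peer)] += 1   # rank sends to peer
        elif kind == "recv":
            recvs[(peer, rank)] += 1   # rank expects a message from peer
    pairs = set(sends) | set(recvs)
    return {p: (sends[p], recvs[p]) for p in sorted(pairs) if sends[p] != recvs[p]}


# Toy records that mirror the pattern described in the list above.
ops = (
    [(0, "send", 1)] * 192    # Rank 0 sends to Rank 1
    + [(2, "recv", 0)] * 192  # Rank 2 expects messages from Rank 0
    + [(2, "send", 0)] * 192  # Rank 2 sends to Rank 0
    + [(0, "recv", 1)] * 192  # Rank 0 expects messages from Rank 1
)

mismatches = find_mismatches(ops)
for (src, dst), (n_send, n_recv) in mismatches.items():
    print(f"{src} -> {dst}: {n_send} sends, {n_recv} receives posted")
```

With these toy records every one of the four pairs is reported, which is the signature of senders and receivers disagreeing about who their peer is rather than of messages being lost in transit.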

Is it possible that having no MPI data type resulted in missing messages that introduced these discrepancies? Or maybe the application is terminating earlier than usual?

I will try the version with MPI data types and let you know if the results are different.

Thanks,
Misbah
From: <codes-ross-users-bounces at lists.mcs.anl.gov> on behalf of Maxime Chevalier <maxime.chevalier at inria.fr>
Date: Sunday, June 4, 2017 at 12:20 PM
To: "codes-ross-users at lists.mcs.anl.gov" <codes-ross-users at lists.mcs.anl.gov>
Subject: Re: [codes-ross-users] Replay HPL's dumpi trace on CODES


Hi Misbah,
Thanks for your help. You can find the DUMPI traces, both with and without the "UNDEFINED DATA TYPE" errors, via the link below. The codes-workload-dump utility is very useful, thanks for that (I was using dumpistat).

https://1drv.ms/f/s!Ati25f8zqy9lnNFi7EX8u1tmdJ4rfw

Regards,
Maxime
________________________________
From: "Misbah Mubarak" <mmubarak at anl.gov>
To: "Maxime Chevalier" <maxime.chevalier at inria.fr>, codes-ross-users at lists.mcs.anl.gov
Sent: Friday, June 2, 2017 18:54:13
Subject: Re: [codes-ross-users] Replay HPL's dumpi trace on CODES

Hi Maxime,

There is a codes-workload-dump utility that helps you inspect the traces and provides detailed information on the individual MPI operations, such as the number of bytes transmitted (which is derived from the data type and count). If you could run the utility on one of the traces and send me the output, I can have a look at what's going on. Alternatively, if you could share the traces, I can have a look at those.

Using the utility is simple; here is some documentation on how to run it:
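
To make the "derived from the data type and count" part concrete, here is a minimal sketch of that arithmetic. It is not the CODES or DUMPI implementation: the size table and the function name `message_bytes` are assumptions for illustration, and real element sizes depend on the MPI implementation and platform. The point is only that a data type with no known size leaves the byte count undetermined.

```python
# Hypothetical element sizes in bytes, for illustration only.
TYPE_SIZES = {
    "MPI_CHAR": 1,
    "MPI_BYTE": 1,
    "MPI_INT": 4,
    "MPI_DOUBLE": 8,
}


def message_bytes(datatype, count):
    """Return count * element size, or None if the type is not in the table."""
    size = TYPE_SIZES.get(datatype)
    if size is None:
        return None  # unknown type: the message size cannot be derived
    return size * count


print(message_bytes("MPI_DOUBLE", 4))
print(message_bytes("SOME_USER_DEFINED_TYPE", 4))
```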

https://xgitlab.cels.anl.gov/codes/codes/wikis/codes-dumpi-workload

Thanks,
Misbah
From: <codes-ross-users-bounces at lists.mcs.anl.gov> on behalf of Maxime Chevalier <maxime.chevalier at inria.fr>
Date: Friday, June 2, 2017 at 8:52 AM
To: "codes-ross-users at lists.mcs.anl.gov" <codes-ross-users at lists.mcs.anl.gov>
Subject: Re: [codes-ross-users] Replay HPL's dumpi trace on CODES

Hi Misbah,
Thanks for your fast response. I looked into the data types, but I don't really understand them. I have figured out how to avoid the "UNDEFINED DATA TYPE" errors by compiling HPL with "HPL_NO_MPI_DATATYPE", but the output is much the same (see trace below). I don't know if it's a step forward or backward...

Regards,
Maxime

Trace:

Fri Jun 2 09:15:49 2017

ROSS Revision: 4c6a7d8eb9c784797d900edfc76725d62ec25941

tw_net_start: Found world size to be 1

ROSS Core Configuration:
Total Nodes 1
Total Processors [Nodes (1) x PE_per_Node (1)] 1
Total KPs [Nodes (1) x KPs (16)] 16
Total LPs 54
Simulation End Time 300000000000.00
LP-to-PE Mapping model defined

ROSS Event Memory Allocation:
Model events 13825
Network events 50000
Total events 63824

*** START SEQUENTIAL SIMULATION ***

*** END SIMULATION ***

LP 1 unmatched irecvs 1 unmatched sends 0 Total sends 0 receives 2 collectives 0 delays 8 wait alls 0 waits 0 send time 0.000000 wait 0.000000
LP 3 unmatched irecvs 1 unmatched sends 0 Total sends 1 receives 1 collectives 0 delays 10 wait alls 0 waits 0 send time 3.202149 wait 0.000000
LP 5 unmatched irecvs 0 unmatched sends 0 Total sends 0 receives 1 collectives 0 delays 7 wait alls 0 waits 0 send time 0.000000 wait 0.000000
LP 7 unmatched irecvs 1 unmatched sends 0 Total sends 1 receives 1 collectives 0 delays 10 wait alls 0 waits 0 send time 3.189207 wait 0.000000
: Running Time = 0.0001 seconds

TW Library Statistics:
Total Events Processed 56
Events Aborted (part of RBs) 0
Events Rolled Back 0
Event Ties Detected in PE Queues 0
Efficiency 100.00 %
Total Remote (shared mem) Events Processed 0
Percent Remote Events 0.00 %
Total Remote (network) Events Processed 0
Percent Remote Events 0.00 %

Total Roll Backs 0
Primary Roll Backs 0
Secondary Roll Backs 0
Fossil Collect Attempts 0
Total GVT Computations 0

Net Events Processed 56
Event Rate (events/sec) 823529.4
Total Events Scheduled Past End Time 0

TW Memory Statistics:
Events Allocated 63825
Memory Allocated 62573
Memory Wasted 683

TW Data Structure sizes in bytes (sizeof):
PE struct 608
KP struct 144
LP struct 128
LP Model struct 760
LP RNGs 80
Total LP 968
Event struct 144
Event struct with Model 928

TW Clock Cycle Statistics (MAX values in secs at 1.0000 GHz):
Priority Queue (enq/deq) 0.0000
AVL Tree (insert/delete) 0.0000
LZ4 (de)compression 0.0000
Buddy system 0.0000
Event Processing 0.0000
Event Cancel 0.0000
Event Abort 0.0000

GVT 0.0000
Fossil Collect 0.0000
Primary Rollbacks 0.0000
Network Read 0.0000
Statistics Computation 0.0000
Statistics Write 0.0000
Total Time (Note: Using Running Time above for Speedup) 0.0002

TW GVT Statistics: MPI AllReduce
GVT Interval 16
GVT Real Time Interval (cycles) 0
GVT Real Time Interval (sec) 0.00000000
Batch Size 16

Forced GVT 0
Total GVT Computations 0
Total All Reduce Calls 0
Average Reduction / GVT -nan

Total bytes sent 8 recvd 20
max runtime 0.000000 ns avg runtime 0.000000
max comm time 0.000000 avg comm time -69573.000000
max send time 3.202149 avg send time 1.597839
max recv time 45682.609151 avg recv time 11420.652288
max wait time 0.000000 avg wait time 0.000000

________________________________
From: "Misbah Mubarak" <mmubarak at anl.gov>
To: "Maxime Chevalier" <maxime.chevalier at inria.fr>, codes-ross-users at lists.mcs.anl.gov
Sent: Tuesday, May 30, 2017 18:12:46
Subject: Re: [codes-ross-users] Replay HPL's dumpi trace on CODES

Hi Maxime,

Thanks for your message. There seems to be a data type that is not supported by either DUMPI or CODES. Do you know which data types the HPL trace uses? I will find out whether support for them can be added to the code.

Regards,
Misbah
From: <codes-ross-users-bounces at lists.mcs.anl.gov> on behalf of Maxime Chevalier <maxime.chevalier at inria.fr>
Date: Monday, May 29, 2017 at 3:51 AM
To: "codes-ross-users at lists.mcs.anl.gov" <codes-ross-users at lists.mcs.anl.gov>
Subject: [codes-ross-users] Replay HPL's dumpi trace on CODES

Hi,
I'm trying to use CODES to replay an HPL DUMPI trace that I generated on my computer. Unfortunately, I get a lot of "Undefined data type" errors (see the trace below).
I have already replayed AMG traces (downloaded here<http://portal.nersc.gov/project/CAL/designforward.htm>) as well as AMG traces that I generated myself, and both worked fine.
So I'm wondering if I did something wrong, or if it's HPL's fault.

Best regards,
Maxime


Trace:


ROSS Revision: 4c6a7d8eb9c784797d900edfc76725d62ec25941

tw_net_start: Found world size to be 1

ROSS Core Configuration:
Total Nodes 1
Total Processors [Nodes (1) x PE_per_Node (1)] 1
Total KPs [Nodes (1) x KPs (16)] 16
Total LPs 5
Simulation End Time 300000000000.00
LP-to-PE Mapping model defined

ROSS Event Memory Allocation:
Model events 1281
Network events 50000
Total events 51280

Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type
Undefined data type

*** START SEQUENTIAL SIMULATION ***

*** END SIMULATION ***

LP 1 unmatched irecvs 1 unmatched sends 0 Total sends 0 receives 1 collectives 0 delays 7 wait alls 0 waits 0 send time 0.000000 wait 0.000000
: Running Time = 0.0000 seconds

TW Library Statistics:
Total Events Processed 8
Events Aborted (part of RBs) 0
Events Rolled Back 0
Event Ties Detected in PE Queues 0
Efficiency 100.00 %
Total Remote (shared mem) Events Processed 0
Percent Remote Events 0.00 %
Total Remote (network) Events Processed 0
Percent Remote Events 0.00 %

Total Roll Backs 0
Primary Roll Backs 0
Secondary Roll Backs 0
Fossil Collect Attempts 0
Total GVT Computations 0

Net Events Processed 8
Event Rate (events/sec) 307692.3
Total Events Scheduled Past End Time 0

TW Memory Statistics:
Events Allocated 51281
Memory Allocated 51168
Memory Wasted 720

TW Data Structure sizes in bytes (sizeof):
PE struct 608
KP struct 144
LP struct 128
LP Model struct 760
LP RNGs 80
Total LP 968
Event struct 144
Event struct with Model 928

TW Clock Cycle Statistics (MAX values in secs at 1.0000 GHz):
Priority Queue (enq/deq) 0.0000
AVL Tree (insert/delete) 0.0000
LZ4 (de)compression 0.0000
Buddy system 0.0000
Event Processing 0.0000
Event Cancel 0.0000
Event Abort 0.0000

GVT 0.0000
Fossil Collect 0.0000
Primary Rollbacks 0.0000
Network Read 0.0000
Statistics Computation 0.0000
Statistics Write 0.0000
Total Time (Note: Using Running Time above for Speedup) 0.0001

TW GVT Statistics: MPI AllReduce
GVT Interval 16
GVT Real Time Interval (cycles) 0
GVT Real Time Interval (sec) 0.00000000
Batch Size 16

Forced GVT 0
Total GVT Computations 0
Total All Reduce Calls 0
Average Reduction / GVT -nan

Total bytes sent 0 recvd 4
max runtime 0.000000 ns avg runtime 0.000000
max comm time 0.000000 avg comm time -66232.000000
max send time 0.000000 avg send time 0.000000
max recv time 0.000000 avg recv time 0.000000
max wait time 0.000000 avg wait time 0.000000
LP-IO: writing output to hpl-trace-25282-1495543803/
LP-IO: data files:
hpl-trace-25282-1495543803/mpi-replay-stats
hpl-trace-25282-1495543803/model-net-category-all

