[Swift-devel] Swift and BGP plots
Ioan Raicu
iraicu at cs.uchicago.edu
Mon Oct 26 16:36:00 CDT 2009
Hi Mihael,
This is interesting stuff!
Here is what I understood from the following figures (given the summary
of the execute2 tab):
Mihael Hategan wrote:
> 16k jobs, scratch on GPFS:
> http://www.mcs.anl.gov/~hategan/report-bgp-plain/
>
Shortest event (s): 17.1009998321533
Longest event (s): 521.828999996185
Mean event duration (s): 358.316546020797
Standard deviation of event duration (s): 109.087010337575
> 16k jobs, scratch on compute node:
> http://www.mcs.anl.gov/~hategan/report-bgp-scratch/
>
Shortest event (s): 11.606999874115
Longest event (s): 376.588999986649
Mean event duration (s): 176.960421203068
Standard deviation of event duration (s): 75.0991380521202
> 16k jobs, scratch on compute node, status through provider:
> http://www.mcs.anl.gov/~hategan/report-bgp-scratch-provider/
>
Shortest event (s): 11.8900001049042
Longest event (s): 223.809000015259
Mean event duration (s): 135.653097596136
Standard deviation of event duration (s): 62.1117594571245
For all of the above stats, I don't understand why the shortest event
is only 10~20 seconds when the jobs are sleep 60. Or perhaps you were
running sleep 0 here?
> 64k jobs, 4000 workers:
> http://www.mcs.anl.gov/~hategan/report-dc-4000/
>
Shortest event (s): 106.119999885559
Longest event (s): 1246.60699987411
Mean event duration (s): 334.987874176266
Standard deviation of event duration (s): 290.212811366649
Efficiency: 18%?
> 64k jobs, 6000 workers:
> http://www.mcs.anl.gov/~hategan/report-dc-6000/
>
Shortest event (s): 108.671000003815
Longest event (s): 940.963999986649
Mean event duration (s): 255.069875579873
Standard deviation of event duration (s): 130.231747714145
Efficiency: 23.5%?
I assume this is all with "sleep 60" jobs, right?
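For reference, the efficiency figures I quote for these two runs are
consistent with dividing the nominal task length by the mean event
duration. Here is a minimal Python sketch of that arithmetic. It assumes
60-second sleep jobs (the very thing I am asking about), and the helper
name `efficiency` is purely illustrative, not part of Swift or Falkon:

```python
def efficiency(ideal_seconds, mean_event_seconds):
    """Ratio of the nominal task length to the observed mean event duration."""
    if mean_event_seconds <= 0:
        raise ValueError("mean event duration must be positive")
    return ideal_seconds / mean_event_seconds


# Assumption: every job is "sleep 60"; this is not confirmed for these runs.
ASSUMED_SLEEP_SECONDS = 60.0

# Mean event durations copied from the execute2 summaries quoted above.
runs = {
    "64k jobs, 4000 workers": 334.987874176266,
    "64k jobs, 6000 workers": 255.069875579873,
}

for label, mean_duration in runs.items():
    ratio = efficiency(ASSUMED_SLEEP_SECONDS, mean_duration)
    print(f"{label}: {ratio:.1%}")
```

If the jobs were not sleep 60, only ASSUMED_SLEEP_SECONDS needs to
change; the rest of the calculation stays the same.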
For comparison, here are some numbers from raw Falkon runs:
20K jobs, 2K workers, sleep 32, single Falkon service running on login6
Mean event duration (s): 32.6798
Efficiency: 98.3%
40K jobs, 4K workers, sleep 32, distributed Falkon service running on 16
I/O nodes
Mean event duration (s): 34.659
Efficiency: 92.3%
40K jobs, 4K workers, sleep 64, distributed Falkon service running on 16
I/O nodes
Mean event duration (s): 67.02667
Efficiency: 95.5%
1M jobs (983,040), 160K workers, sleep 64, distributed Falkon service
running on 640 I/O nodes
Mean event duration (s): 70.71823
Efficiency: 90.5%
I did a search through my inbox for an old email from Zhao Zhang. Here
is the summary of a run he made:
Total number of events: 512
Shortest event (s): 30
Longest event (s): 33
Mean event duration (s): 30.986328125
Standard deviation of event duration (s): 0.784249612581342
Efficiency: 96.8%
I believe that in this run he had 512 jobs of sleep 30, running on 256
workers via Falkon, using Swift. This small-scale run got 96.8%
efficiency, which seemed great! He used to have some logs online at
http://www.ci.uchicago.edu/~zzhang/report-sleep-20081016-1808-96bsfgec/,
but they are not there anymore. Perhaps Zhao still has these plots. I
believe he might have been using some of his CIO (collective I/O)
optimizations in these runs. I can't seem to find larger scale runs with
these optimizations.
The numbers I posted above are bits and pieces from various experiments
we ran over the last year, so comparing them directly is not apples to
apples: Swift, Falkon, and even the GPFS file system on the BG/P have
all evolved since then.
I certainly think it would be useful to have a detailed comparison of
Swift+Coaster and Swift+Falkon using the latest Swift.
Cheers,
Ioan
--
=================================================================
Ioan Raicu, Ph.D.
NSF/CRA Computing Innovation Fellow
=================================================================
Center for Ultra-scale Computing and Information Security (CUCIS)
Department of Electrical Engineering and Computer Science
Northwestern University
2145 Sheridan Rd, Tech M384
Evanston, IL 60208-3118
=================================================================
Cel: 1-847-722-0876
Tel: 1-847-491-8163
Email: iraicu at eecs.northwestern.edu
Web: http://www.eecs.northwestern.edu/~iraicu/
https://wiki.cucis.eecs.northwestern.edu/
=================================================================
=================================================================