[Swift-devel] Re: provider staging stage-in rate on localhost and PADS

Michael Wilde wilde at mcs.anl.gov
Wed Jan 19 10:45:03 CST 2011


A few more test results:

moving 3-byte files: this runs at about 20 jobs/sec in the single-node 8-core test.

moving 30 MB files: this runs 100 jobs in 143 secs, i.e. about 40 MB/sec total in/out.

Both tests use a single input file going to all jobs and N unique output files coming back.
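
As a quick sanity check of that rate (a back-of-the-envelope sketch, assuming each job stages the 30 MB input in and a 30 MB result back out, as in the setup above):

    # rough check of the 30 MB test
    jobs, secs, mb_each_way = 100, 143.0, 30
    total_mb = jobs * 2 * mb_each_way          # 30 MB in + 30 MB out per job
    print("%.1f MB/sec" % (total_mb / secs))   # ~42 MB/sec, i.e. "about 40 MB/sec total"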

So the latter test is, I think, in about the same ballpark as Mihael's latest results? And the former confirms that provider staging does not seem to slow down the job rate unacceptably.

- Mike


----- Original Message -----
> I forgot to also state, regarding the test below:
> - tried both stagingMethod proxy and file - no significant performance
> difference
> - tried workersPerNode > 8 (10 seemed to cause slight degradation, so
> I went back to 8; this test may have just been noise)
> - I used this recent Swift and CoG: Swift svn swift-r3997 cog-r3029
> (cog modified locally)
> 
> I think the following tests would be good to do in the micro-series:
> 
> - re-do the above on a quiet dedicated PADS cluster node (qsub -I)
> 
> - try the test with input-only and with output-only staging
> - try the test with a 1-byte file (to see what the protocol issues are)
> - try the test with a 30MB file (to try to replicate Mihael's results)
>
> - try testing from one PADS node client to, say, 3-4 other PADS nodes
> (again, with either qsub -I or a Swift run with auto coasters and the
> maxNode and nodeGranularity set to, say, 4 & 4 or 5 & 5, etc.)
> 
> This last test will probe the ability of Swift to move more tasks/sec
> when there are more concurrent app-job endpoints (i.e. when Swift is
> driving more cores). We *think* Swift trunk should be able to drive
> more than 100 tasks/sec - maybe even 200/sec - when the configuration
> is optimized (all local disk use; log settings tuned, perhaps;
> throttles set right; ...).
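>
> A rough back-of-the-envelope sketch of where that estimate comes from
> (purely illustrative: the per-worker rate is the 4-5 jobs/sec per
> single one-core worker noted further down in this thread, and the
> client-side ceiling is an assumed number, not a measured one):
>
>     # hypothetical capacity model, not Swift code
>     per_worker_rate = 4.5     # jobs/sec to a single one-core worker (measured)
>     cores = 4 * 8             # e.g. 4 PADS nodes x workersPerNode=8
>     client_ceiling = 200.0    # assumed client-side dispatch limit (illustrative)
>     print("~%.0f tasks/sec ceiling" % min(per_worker_rate * cores, client_ceiling))
>     # -> ~144 tasks/sec, consistent with the >100 tasks/sec guess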
> 
> Then also try the Swift fast branch, but Mihael needs to post (or you
> need to check in svn) whether all the latest provider staging
> improvements have been, or could be, applied to the fast branch.
> 
> Lastly, for the wide area test:
> 
> - 10 OSG sites
> - try to keep, say, 2 < N < 10 workers active per site (using the
> queue-N Condor script), with most sites having large numbers of
> workers. That should more closely mimic the load you will need to
> drive for the actual application.
> - workersPerNode=1
> 
> The WAN test will likely require more thought.
> 
> - Mike
> 
> ----- Original Message -----
> > Continuing to work on resolving this problem.
> >
> > I think the next step is to methodically test provider staging,
> > moving from the single-node test to multi-node local (PADS) tests
> > and then to multi-node WAN tests.
> >
> > Now that the native coaster job rate to a single one-core worker is
> > better understood (it seems to be 4-5 jobs per second), we can
> > devise tests with a better understanding of the factors involved.
> >
> > I tried a local test on the PADS login node (at a fairly quiet,
> > unloaded time) as follows:
> > - local coasters service (in the Swift JVM)
> > - app is "mv" (to avoid extra data movement)
> > - the same input data file is used for all jobs (so it's likely in
> >   the kernel block cache)
> > - a unique output file per job
> > - Swift and the cwd are on /scratch local disk
> > - the file is 3 MB (to be closer to Allan's 2.3 MB)
> > - the mv app stages the file to the worker and back (no app reads or
> >   writes)
> > - workers per node = 8 (on an 8-core host)
> > - throttle of 200 jobs (2.0)
> > - 100 jobs per Swift script invocation
> >
> > I get just over 5 apps/sec or 30MB/sec with this setup.
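> >
> > For comparison, here is a standalone analog of this test (a rough
> > sketch, not Swift itself: it just copies a 3 MB file in and out for
> > 100 "jobs" with 8 concurrent workers, to get a local-disk/cache
> > baseline in apps/sec independent of coasters; paths and sizes are
> > illustrative):
> >
> >     #!/usr/bin/env python
> >     # rough local-disk baseline: mimic "stage 3 MB in, 3 MB out" per job
> >     import os, shutil, tempfile, time
> >     from concurrent.futures import ThreadPoolExecutor
> >
> >     WORK = tempfile.mkdtemp(dir="/tmp")        # use /scratch where available
> >     INPUT = os.path.join(WORK, "input.dat")
> >     with open(INPUT, "wb") as f:
> >         f.write(os.urandom(3 * 1024 * 1024))   # 3 MB, as in the test above
> >
> >     def job(i):
> >         d = os.path.join(WORK, "job%03d" % i)
> >         os.mkdir(d)
> >         shutil.copy(INPUT, os.path.join(d, "in.dat"))          # "stage in"
> >         shutil.copy(os.path.join(d, "in.dat"),
> >                     os.path.join(WORK, "out%03d.dat" % i))     # "stage out"
> >
> >     t0 = time.time()
> >     with ThreadPoolExecutor(max_workers=8) as pool:            # 8 workers per node
> >         list(pool.map(job, range(100)))                        # 100 jobs
> >     dt = time.time() - t0
> >     print("100 jobs in %.1f s = %.1f apps/sec" % (dt, 100 / dt))
> >     shutil.rmtree(WORK)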
> >
> > Allan, I'd like to suggest you take it from here, but let's talk as
> > soon as possible this morning to make a plan.
> >
> > One approach that may be fruitful is to re-design a remote test that
> > is closer to what a real SCEC workload would be (basically your
> > prior tests, with some adjustment to the concurrency: more workers
> > per site, and more files overall going in parallel).
> >
> > Then, every time we have a new insight or code change, re-run the
> > larger-scale WAN test in parallel with continuing down the
> > micro-test path. That way, as soon as we hit a breakthrough that
> > reaches your required WAN data transfer rate, you can restart the
> > full SCEC workflow, while we continue to analyze Swift behavior
> > issues with the simpler micro-benchmarks.
> >
> > Regards,
> >
> > Mike
> >
> >
> > ----- Original Message -----
> > > Ok, so I committed a fix to make the worker send files a bit
> > > faster and adjusted the buffer sizes a bit. There is a trade-off
> > > between per-worker performance and the number of workers, so this
> > > should probably be a setting of some sort (since when there are
> > > many workers, the client bandwidth becomes the bottleneck).
> > >
> > > With a plain cat, 4 workers, 1 job/w, and 32M files I get this:
> > > [IN]: Total transferred: 7.99 GB, current rate: 23.6 MB/s, average rate: 16.47 MB/s
> > > [MEM] Heap total: 155.31 MMB, Heap used: 104.2 MMB
> > > [OUT] Total transferred: 8 GB, current rate: 0 B/s, average rate: 16.49 MB/s
> > > Final status: time:498988 Finished successfully:256
> > > Time: 500.653, rate: 0 j/s
> > >
> > > So the system probably sees 96 MB/s combined reads and writes. I'd
> > > be curious how this looks without caching, but during the run the
> > > computer became laggy, so it's saturating something in the OS
> > > and/or hardware.
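> > >
> > > (As a rough check of that 96 MB/s figure, a sketch using the 6x
> > > disk-touch count described further down in this thread:)
> > >
> > >     avg_each_way = 16.5     # MB/s, from the [IN]/[OUT] averages above
> > >     print("%.0f MB/s of combined disk reads and writes" % (avg_each_way * 6))
> > >     # -> ~99 MB/s, consistent with the ~96 MB/s estimate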
> > >
> > > I'll test on a cluster next.
> > >
> > > On Sun, 2011-01-16 at 18:02 -0800, Mihael Hategan wrote:
> > > > On Sun, 2011-01-16 at 19:38 -0600, Allan Espinosa wrote:
> > > > > So for the measurement interface, are you measuring the total
> > > > > data received as the data arrives, or when the received file
> > > > > is completely written to the job directory?
> > > >
> > > > The average is all the bytes that go from the client to all the
> > > > workers, divided by the entire time spent running the jobs.
> > > >
> > > > >
> > > > > I was measuring from the logs, from JOB_START to JOB_END. I
> > > > > assumed the actual job execution time to be 0. The 7 MB/s
> > > > > probably corresponds to Mihael's stage-out results. The cat
> > > > > jobs dump to stdout (redirected to a file in the Swift
> > > > > wrapper), which probably shows the same behavior as the
> > > > > stage-out.
> > > >
> > > > I'm becoming less surprised about 7 MB/s in the local case. You
> > > > have to multiply that by 6 to get the real disk I/O bandwidth:
> > > > 1. client reads from disk
> > > > 2. worker writes to disk
> > > > 3. cat reads from disk
> > > > 4. cat writes to disk
> > > > 5. worker reads from disk
> > > > 6. client writes to disk
> > > >
> > > > If it all happens on a single disk, then it adds up to about 42
> > > > MB/s, which is a reasonable fraction of what a normal disk can
> > > > do. It would be useful to do a dd from /dev/zero to see what the
> > > > actual disk performance is.
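> > > >
> > > > For example, a minimal sketch of such a measurement (dd itself
> > > > would do just as well; the path is illustrative, and unless the
> > > > file is much larger than RAM the page cache will flatter the
> > > > number even with the fsync):
> > > >
> > > >     #!/usr/bin/env python
> > > >     # rough sequential-write test, analogous to dd if=/dev/zero:
> > > >     # write 4 GB of zeros and report MB/s (fsync before the clock stops)
> > > >     import os, time
> > > >     path = "/scratch/ddtest.tmp"    # point at the disk under test
> > > >     block = b"\0" * (1024 * 1024)   # 1 MB blocks
> > > >     nblocks = 4096                  # 4 GB total
> > > >     t0 = time.time()
> > > >     fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
> > > >     for _ in range(nblocks):
> > > >         os.write(fd, block)
> > > >     os.fsync(fd)
> > > >     os.close(fd)
> > > >     dt = time.time() - t0
> > > >     print("%.1f MB/s sequential write" % (nblocks / dt))
> > > >     os.remove(path)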
> > > >
> > > >
> >
> 

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



