[Swift-devel] Re: provider staging stage-in rate on localhost and PADS

Michael Wilde wilde at mcs.anl.gov
Wed Jan 19 10:12:28 CST 2011


Continuing to work on resolving this problem.

I think the next step is to test provider staging methodically, moving from the single-node test to multi-node local (PADS) tests and then to multi-node WAN tests.

Now that the native coaster job rate to a single one-core worker is better understood (it seems to be 4-5 jobs per second), we can devise tests with a better understanding of the factors involved.

I tried a local test on the PADS login node (at a fairly quiet time, with the node essentially unloaded) as follows:
- local coasters service (in Swift jvm)
- app is "mv" (to avoid extra data movement)
- the same input data file is used for every job (so it's likely in the kernel block cache)
- unique output file is used
- Swift and the cwd are on local /scratch disk
- the file is 3 MB (to be closer to Allan's 2.3 MB)
- the mv app stages the file to the worker and back (no app reads or writes)
- workers per node = 8 (on an 8 core host)
- throttle of 200 jobs (2.0)
- 100 jobs per Swift script invocation

I get just over 5 apps/sec, or about 30 MB/sec (5 apps/sec x 3 MB x 2 transfers per app), with this setup.
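
For reference, the test script is essentially the sketch below (the file names, the "move" app name, and the mapper are illustrative rather than the exact code; the 2.0 throttle, workersPerNode=8, and provider staging are set in the usual sites/properties config):

  type file;

  // "mv" simply renames the staged-in copy to the output name, so the
  // only I/O per job is the coaster stage-in and stage-out of the 3 MB
  // file; "mv" is assumed to be listed in tc.data for the site.
  app (file o) move (file i)
  {
    mv @i @o;
  }

  file input <"input-3MB.dat">;
  file outputs[] <simple_mapper; prefix="out.", suffix=".dat">;

  foreach j in [1:100] {
    outputs[j] = move(input);
  }

Each invocation of the script therefore submits 100 independent mv jobs, all reading the same (cached) 3 MB input file and each producing a unique output file.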

Allan, I'd like to suggest that you take it from here, but let's talk as soon as possible this morning to make a plan.

One approach that may be fruitful is to redesign a remote test that is closer to what a real SCEC workload would be (basically your prior tests, with some adjustments to the concurrency: more workers per site, and more files going in parallel overall).

Then, every time we have a new insight or code change, we re-run the larger-scale WAN test in parallel with continuing down the micro-test path. That way, as soon as we hit a breakthrough that reaches your required WAN data transfer rate, you can restart the full SCEC workflow, while we continue to analyze Swift behavior issues with the simpler micro-benchmarks.

Regards,

Mike


----- Original Message -----
> Ok, so I committed a fix to make the worker send files a bit faster
> and adjusted the buffer sizes a bit. There is a trade-off between
> per-worker performance and number of workers, so this should probably
> be a setting of some sort (since when there are many workers, the
> client bandwidth becomes the bottleneck).
> 
> With a plain cat, 4 workers, 1 job/w, and 32M files I get this:
> [IN]: Total transferred: 7.99 GB, current rate: 23.6 MB/s, average rate: 16.47 MB/s
> [MEM] Heap total: 155.31 MMB, Heap used: 104.2 MMB
> [OUT] Total transferred: 8 GB, current rate: 0 B/s, average rate: 16.49 MB/s
> Final status: time:498988 Finished successfully:256
> Time: 500.653, rate: 0 j/s
> 
> So the system probably sees 96 MB/s combined reads and writes. I'd be
> curious how this looks without caching, but during the run the
> computer became laggy, so it's saturating something in the OS and/or
> hardware.
> 
> I'll test on a cluster next.
> 
> On Sun, 2011-01-16 at 18:02 -0800, Mihael Hategan wrote:
> > On Sun, 2011-01-16 at 19:38 -0600, Allan Espinosa wrote:
> > > So for the measurement interface, are you measuring the total
> > > data received as the data arrives, or when the received file is
> > > completely written to the job directory?
> >
> > The average is all the bytes that go from client to all the workers
> > over the entire time spent to run the jobs.
> >
> > >
> > > I was measuring from the logs from JOB_START to JOB_END. I
> > > assumed the actual job execution time to be 0. The 7 MB/s
> > > probably corresponds to Mihael's stage-out results. The cat jobs
> > > dump to stdout (redirected to a file in the Swift wrapper), which
> > > probably shows the same behavior as the stage-out.
> >
> > I'm becoming less surprised about 7 MB/s in the local case. You
> > have to multiply that by 6 to get the real disk I/O bandwidth:
> > 1. client reads from disk
> > 2. worker writes to disk
> > 3. cat reads from disk
> > 4. cat writes to disk
> > 5. worker reads from disk
> > 6. client writes to disk
> >
> > If it all happens on a single disk, then it adds up to about 42
> > MB/s, which is a reasonable fraction of what a normal disk can do.
> > It would be useful to do a dd from /dev/zero to see what the actual
> > disk performance is.
> >
> >

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



