[Swift-devel] Re: Running the GO Swift prototype

Michael Wilde wilde at mcs.anl.gov
Wed Dec 15 18:42:06 CST 2010


One possibility for scaling:

- instead of calling task:execute(external.sh), have dostagein() put the request into the "ready/" queue/dir, just like external.sh does now, but directly from Karajan (i.e., don't fork a process and shell).

- then wait on a future in a map of futures, with the map key being the request id.

- a listener reads the done/ queue periodically and posts the futures of the completed requests, looking them up by the keys that remain associated with the requests in the queue.
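
As a rough illustration of the three steps above, here is a minimal sketch in Python rather than Karajan. Everything named in it is an assumption for illustration only: the class StagingQueue, the use of one file per request id in ready/ and done/, and the "src dst" request file contents are hypothetical and are not the prototype's actual format.

```python
import os
import tempfile
import threading
from concurrent.futures import Future
from pathlib import Path


class StagingQueue:
    """Hypothetical in-process stand-in for forking external.sh per transfer."""

    def __init__(self, root, poll_interval=0.1):
        self.root = Path(root)
        self.ready = self.root / "ready"
        self.done = self.root / "done"
        self.ready.mkdir(parents=True, exist_ok=True)
        self.done.mkdir(parents=True, exist_ok=True)
        self.poll_interval = poll_interval
        self.pending = {}  # request id -> Future (the "map of futures")
        self.lock = threading.Lock()
        self.stop_event = threading.Event()
        self.listener = threading.Thread(target=self._listen, daemon=True)
        self.listener.start()

    def submit(self, request_id, src, dst):
        """Drop one request into ready/ and return a future to wait on."""
        future = Future()
        with self.lock:
            self.pending[request_id] = future
        # Write outside ready/ and rename into place, so whoever scans
        # ready/ never sees a half-written request file.
        fd, tmp = tempfile.mkstemp(dir=str(self.root))
        with os.fdopen(fd, "w") as f:
            f.write(f"{src} {dst}\n")
        os.replace(tmp, self.ready / request_id)
        return future

    def _poll_once(self):
        """Complete the future of every request that has appeared in done/."""
        for name in os.listdir(self.done):
            with self.lock:
                future = self.pending.pop(name, None)
            if future is not None:  # ignore entries we did not submit
                os.remove(self.done / name)
                future.set_result(name)

    def _listen(self):
        # A single listener thread serves all pending requests.
        while not self.stop_event.is_set():
            self._poll_once()
            self.stop_event.wait(self.poll_interval)

    def close(self):
        self.stop_event.set()
        self.listener.join()
```

A caller would do something like future = queue.submit(request_id, src, dst) and then future.result() to block until the external mover places an entry with the same request id in done/. The point of the design is that a pending request costs one map entry and one small file, not a forked process and shell.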

This approach does not replace a true data provider: it's still just a prototype for learning more about how to do staging efficiently using external interfaces.

But it's possible that, if you know Karajan well, the above logic is pretty easy, while writing a real data provider is more like a week of work or more, just to learn the mechanics. (And I think working out the interface logic to the external tool or service, as above, will help in building the provider.)

- Mike

----- Original Message -----
> Hi Allan,
> 
> This is at least partial good news, and nice progress.
> 
> The first step we can try on scaling (maybe easy, maybe not) is to cut
> down on external processes. I'll take a quick look and see if I can
> spot another strategy.
> 
> The obvious strategy would be to bite the bullet and move to a true
> provider that can keep a huge number of requests pending without
> consuming heavy resources.
> 
> I'm fearful that external.sh needs to be synchronous, but maybe we can
> use a slightly different interface to separate the requests from the
> notifications.
> 
> - Mike
> 
> ----- Original Message -----
> > Hi Mike,
> >
> > I got the basic functionality working from your sample external.sh
> > scripts. I was able to synthesize a workload of 200 transfers. I'll
> > send you and Raj details about that in another email.
> >
> > The basic scripts start to break at 3000 transfers. I set my number
> > of files to 10k and foreach.maxthread to 3000. I guess at this
> > point external.sh already creates too many files at a time for the
> > ready queue to handle. The number of processes forked is probably
> > too high as well. communicado is already crawling at this point. Even
> > though Swift already reported 3000 files staged in, the logs in
> > external.sh only reported 758 transfers initiated to Globus Online.
> >
> > A CDM external handler will probably blow up in general, as it will
> > fork a process/shell script for each transfer. If foreach is set to
> > 10000, we can't scale.
> >
> > I guess a more scalable solution for Swift is to make a native call
> > (Karajan/Java) to a queueing service (something like Stork in
> > Condor) for data transfer.
> >
> > -Allan
> >
> > 2010/12/13 Michael Wilde <wilde at mcs.anl.gov>:
> > > Allan, the code is on PADS login1 under /scratch and seems to work.
> > >
> > > You will need to look into the swift/src/trunk.gomods src tree to
> > > see what I changed in there. Some but perhaps not all the diffs in
> > > that tree are for supporting globus online.
> > >
> > > Let me know if you can replicate the example test below.
> > >
> > > Justin, it would be good if we can integrate the mods for this
> > > into trunk in some non-invasive way as a way to share these tests,
> > > even if we do it as a separate vdl-int.GO.k that the
> > > user/experimenter needs to manually copy to vdl-int.k.
> > >
> > > Or I guess we could put them in an experimental branch?
> > >
> > > - Mike
> > >
> > > ---
> > >
> > > 1) gorunner.sh >& gorunner.out
> > > 2) PATH=/scratch/local/wilde/swift/src/trunk.gomods/cog/modules/swift/dist/swift-svn/bin:$PATH
> > > 3) swift -config cf -tc.file tc.data -sites.file sites.xml -cdm.file fs.ftponly gcat2.swift
> > > That's it.
> > > In gorunner.out, you should see:
> > > ...
> > > ./gorunner.sh: joblist is empty
> > > ./gorunner.sh: joblist is empty
> > > ./gorunner.sh: joblist is:
> > > cp-yuptfz2k.job.in
> > > ./gorunner.sh: started transfer task 4dfe903e-06f7-11e0-aa30-1231350018b1
> > > /home/wilde/swift/lab/go/gowaiter.sh: waiting on 4dfe903e-06f7-11e0-aa30-1231350018b1
> > > ./gorunner.sh: joblist is empty
> > > ./gorunner.sh: joblist is empty
> > > ./gorunner.sh: joblist is empty
> > > /home/wilde/swift/lab/go/gowaiter.sh: 4dfe903e-06f7-11e0-aa30-1231350018b1 has completed
> > > /home/wilde/swift/lab/go/gowaiter.sh: marked cp-yuptfz2k.job.in transferred
> > > ./gorunner.sh: joblist is empty
> > > ./gorunner.sh: joblist is empty
> > > ...
> > > On Swift stdout/err, you should see:
> > > login1$ swift -config cf -tc.file tc.data -sites.file sites.xml -cdm.file fs.ftponly gcat2.swift
> > > CDM file: fs.ftponly
> > > Swift svn swift-r3707 (swift modified locally) cog-r2932 (cog modified locally)
> > >
> > > RunID: 20101213-1426-dff3my97
> > > Progress:
> > > /home/wilde/swift/lab/go/external.sh: running in /home/wilde/swift/lab/go
> > > /home/wilde/swift/lab/go/external.sh: running in /home/wilde/swift/lab/go
> > > Progress: Submitting:1
> > > in /home/wilde/swift/lab/go/cp.sh: wd=/home/wilde/swift/lab/go/work/gcat2-20101213-1426-dff3my97/jobs/y/cp-yuptfz2k arg1=etc/group arg2=output/plainoutput.txt
> > > in /home/wilde/swift/lab/go/cp.sh: rc=0
> > > Progress: Checking status:1
> > > Final status: Finished successfully:1
> > > login1$
> > > That's it.
> 
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



