[Swift-devel] Coaster provider staging data xfer problem

Michael Wilde wilde at mcs.anl.gov
Sun Oct 3 21:20:05 CDT 2010


Interesting: at 5000 jobs, the run completes normally; at 10,000 it fails again, as before. (I adjusted the script slightly to give 10,000 jobs instead of 9999.)
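
For anyone trying to reproduce this at different job counts, here is a minimal sketch of generating a set of randomly sized input files under 500KB, one per job. This is not the actual run.sh from /tmp/wilde/run01; the function name, file naming pattern, and defaults are assumptions made for illustration only.

```python
import os
import random
import tempfile


def make_inputs(directory, count, max_bytes=500 * 1024, seed=0):
    """Write `count` files of random size (1 .. max_bytes-1 bytes) into `directory`.

    Hypothetical helper: the names and defaults are illustrative and are not
    taken from the original test script.
    """
    rng = random.Random(seed)  # fixed seed so the file sizes are repeatable
    os.makedirs(directory, exist_ok=True)
    paths = []
    for i in range(count):
        size = rng.randrange(1, max_bytes)
        path = os.path.join(directory, "in%05d.dat" % i)
        with open(path, "wb") as f:
            f.write(os.urandom(size))  # random bytes, so content is not compressible
        paths.append(path)
    return paths


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp(prefix="staging-test-")
    files = make_inputs(out_dir, count=10)
    print(len(files), "files written to", out_dir)
```

Raising `count` in steps (5000, 10,000) would then give comparable inputs for finding where the failure starts.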

- Mike


----- "Michael Wilde" <wilde at mcs.anl.gov> wrote:

> I just re-ran what I thought was my failing test, and it ran OK, but
> failed strangely in the swift cleanup process.
> 
> Localhost coasters; provider staging.
> 
> This run is currently on communicado in /tmp/wilde/run01 (I'm
> running in /tmp due to the nature of the IO).
> 
> The test is running 9999 cat jobs; one file in and one out per job. 
> The file sizes are on the order of <500KB each (random sizes).
> 
> All 9999 files were produced, but then I got a lot of unlink messages
> and a strange exit code 11 error.
> 
> The messages are in swift.stdouterr
> 
> The script was executed using ./run.sh; the tc and sites files are in that
> run01 dir.
> 
> This is worth looking at but low prio I think. I think the script
> terminated cleanly on smaller runs (-n=5, -n=100). So perhaps provider
> staging gets confused or has sync/mutex problems related to cleanup
> that occur at larger volumes of file copies???
> 
> At any rate, this was *not* the error that I was referring to in the
> message below; in that test, staging died in the middle of a run.  I
> will also try to test between two hosts.
> 
> - Mike
> 
> 
> ----- "Mihael Hategan" <hategan at mcs.anl.gov> wrote:
> 
> > On Sat, 2010-10-02 at 17:51 -0600, wilde at mcs.anl.gov wrote:
> > > ----- "Mihael Hategan" <hategan at mcs.anl.gov> wrote:
> > > > 
> > > > Ok. I'll look at that. Just to be clear, you are talking about
> > > > gridftp=coaster rather than use.provider.staging, right?
> > > 
> > > No, I don't *think* so!
> > 
> > Ok.
> > 
> > > 
> > > What I meant above was provider staging via the coaster execution
> > > provider, which is the only coaster-based data transport technique I
> > > knew of.
> > > 
> > > I'll try to replicate my test and send it.
> > 
> > Ok. I tried 1024 jobs, 8 concurrent, 7MB files and I can't reproduce
> > it, so it may not be straightforward.
> > 
> > > 
> > > I didn't know there was such a thing as gridftp=coaster!
> > > 
> > > Would that be done by saying <filesystem provider="coaster"> ?
> > 
> > Yes.
> > 
> > > I didn't know you could say either of those. Can you explain what that
> > > would do and how to say it? Is it a different data provider path than
> > > provider staging, but which still uses coasters?  Independent of
> > > coaster execution? (I might be way off base here, sorry!)
> > 
> > Yes, and yes. Though I recommend provider staging.
> > 
> > Mihael
> 
> -- 
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
> 

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



