[Swift-devel] coaster io with NIO.

David Kelly davidk at ci.uchicago.edu
Wed Apr 11 01:00:36 CDT 2012


I just ran a test on OSG similar to what Ketan described in the initial entry for ticket #690:

Submit host: communicado using data on GPFS
100 nodes using Condor GlideinWMS
500 jobs
10MB data files

The test completed without errors.

David

----- Original Message -----
> From: "David Kelly" <davidk at ci.uchicago.edu>
> To: "Michael Wilde" <wilde at mcs.anl.gov>
> Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> Sent: Tuesday, April 10, 2012 11:33:20 PM
> Subject: Re: [Swift-devel] coaster io with NIO.
> Since the latest update, which fixes coaster-service, I have tested
> with two configurations:
> 
> 1 machine only, 4 jobs per node, 100 200MB files (ran twice, passed
> twice)
> 2 MCS machines - swift and coaster-service running on one machine, 1
> worker, 4 jobs per node, 500 20MB files (also ran twice, passed twice)
> 
> These tests were failing pretty consistently yesterday. I am not
> positive it is completely fixed yet, but things have definitely
> improved.
> 
> I have never been able to reproduce provider staging problems with
> jobsPerNode set to 1. It was only when I got to a value of 4 that I
> started seeing issues.
> 
> I will write a test tonight that runs on OSG and let you know what
> happens.
> 
> David
> 
> 
> ----- Original Message -----
> > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > To: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>
> > Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> > Sent: Tuesday, April 10, 2012 10:51:05 PM
> > Subject: Re: [Swift-devel] coaster io with NIO.
> > Thanks, Ketan. David, can you try to reproduce the problem with
> > jobsPerNode=1?
> >
> > - Mike
> >
> > ----- Original Message -----
> > > From: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>
> > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > Sent: Tuesday, April 10, 2012 9:31:34 PM
> > > Subject: Re: [Swift-devel] coaster io with NIO.
> > > The jobsPerNode setting was indeed 1 on the tests done on OSG.
> > >
> > >
> > > I do not recall seeing the blocking messages that appear in David's
> > > recent tests.
> > >
> > >
> > > On Tuesday, April 10, 2012, Michael Wilde wrote:
> > >
> > >
> > > Mihael, while the scenario below seems plausible, I thought that
> > > the timeout problem was first detected on OSG nodes, which should
> > > have been running with jobsPerNode=1.
> > >
> > > David, Ketan, can you comment on the jobsPerNode settings for
> > > the many tests you have done which encountered this problem?
> > >
> > > - Mike
> > >
> > > ----- Original Message -----
> > > > From: "Mihael Hategan" <hategan at mcs.anl.gov>
> > > > To: "David Kelly" <davidk at ci.uchicago.edu>
> > > > Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> > > > Sent: Tuesday, April 10, 2012 7:04:56 PM
> > > > Subject: Re: [Swift-devel] coaster io with NIO.
> > > > On Tue, 2012-04-10 at 17:25 -0500, David Kelly wrote:
> > > > > Yep, I gave it a try with automatic coasters, but am still
> > > > > seeing the timeouts.
> > > > >
> > > >
> > > > I think I see the problem. With multiple jobs per worker, the
> > > > situation may be such that a stagein and a stageout happen at
> > > > the same time (on the same TCP connection). If the stageout runs
> > > > out of buffers, the write to the socket on the worker side
> > > > blocks, which prevents the read loop from running. This
> > > > eventually fills the other direction of the TCP link, and
> > > > everything deadlocks.
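A toy Java model may make the deadlock above easier to see. It is a hypothetical sketch under simplifying assumptions, not the coaster implementation: each direction of the TCP connection is an ArrayBlockingQueue whose capacity (BUFFER, an assumed value) stands in for the socket buffers, and each end is reduced to a single thread that both sends and receives chunks. The names StagingDeadlock, blockingPeer, nonBlockingPeer, BUFFER, and CHUNKS are illustrative only.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Toy model (hypothetical, not coaster code): each direction of one TCP
// connection is a bounded queue whose capacity stands in for socket buffers.
public class StagingDeadlock {
    static final int BUFFER = 4;  // assumed per-direction buffer size, in chunks
    static final int CHUNKS = 16; // chunks each side sends; larger than BUFFER

    interface Peer {
        boolean run(BlockingQueue<Integer> out, BlockingQueue<Integer> in)
                throws InterruptedException;
    }

    // Blocking style: write all chunks, then read. While a write waits for
    // buffer space, nothing drains 'in'. Returns false if the peer stalls.
    static boolean blockingPeer(BlockingQueue<Integer> out, BlockingQueue<Integer> in)
            throws InterruptedException {
        for (int i = 0; i < CHUNKS; i++) {
            // The timeout only exists so the demo ends instead of hanging forever.
            if (!out.offer(i, 500, TimeUnit.MILLISECONDS)) {
                return false;
            }
        }
        for (int i = 0; i < CHUNKS; i++) {
            if (in.poll(500, TimeUnit.MILLISECONDS) == null) {
                return false;
            }
        }
        return true;
    }

    // Non-blocking style: write only when there is room, and keep reading
    // on every pass, so one direction can never starve the other.
    static boolean nonBlockingPeer(BlockingQueue<Integer> out, BlockingQueue<Integer> in)
            throws InterruptedException {
        int sent = 0;
        int received = 0;
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (sent < CHUNKS || received < CHUNKS) {
            if (System.nanoTime() > deadline) {
                return false;
            }
            boolean progress = false;
            if (sent < CHUNKS && out.offer(sent)) { // offer() returns false if full
                sent++;
                progress = true;
            }
            if (in.poll() != null) {                // poll() returns null if empty
                received++;
                progress = true;
            }
            if (!progress) {
                Thread.sleep(1); // nothing writable or readable right now
            }
        }
        return true;
    }

    // Runs a "worker" and a "service" peer of the same style; true if both finish.
    static boolean simulate(Peer peer) throws InterruptedException {
        BlockingQueue<Integer> workerToService = new ArrayBlockingQueue<>(BUFFER);
        BlockingQueue<Integer> serviceToWorker = new ArrayBlockingQueue<>(BUFFER);
        boolean[] ok = new boolean[2];
        Thread worker = new Thread(() -> ok[0] = call(peer, workerToService, serviceToWorker));
        Thread service = new Thread(() -> ok[1] = call(peer, serviceToWorker, workerToService));
        worker.start();
        service.start();
        worker.join(); // join() also makes the ok[] writes visible here
        service.join();
        return ok[0] && ok[1];
    }

    static boolean call(Peer p, BlockingQueue<Integer> out, BlockingQueue<Integer> in) {
        try {
            return p.run(out, in);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("blocking peers finish:     " + simulate(StagingDeadlock::blockingPeer));
        System.out.println("non-blocking peers finish: " + simulate(StagingDeadlock::nonBlockingPeer));
    }
}
```

Running main should report that the blocking peers do not finish (each fills its outbound buffer and then waits on a reader that never runs) while the non-blocking peers do. The non-blocking loop is roughly what non-blocking NIO channels permit: a full outbound buffer just means "try the write again later" instead of stalling the read side.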
> > > >
> > > >
> > > > _______________________________________________
> > > > Swift-devel mailing list
> > > > Swift-devel at ci.uchicago.edu
> > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> > >
> > > --
> > > Michael Wilde
> > > Computation Institute, University of Chicago
> > > Mathematics and Computer Science Division
> > > Argonne National Laboratory
> > >
> > >
> > >
> > > --
> > > Ketan
> >
> > --
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
> >
