[Swift-devel] coaster io with NIO.

Michael Wilde wilde at mcs.anl.gov
Tue Apr 10 21:07:08 CDT 2012


Mihael, while the scenario below seems plausible, I thought that the timeout problem was first detected on OSG nodes, which should have been running with jobsPerNode=1.

David, Ketan, can you comment on the jobsPerNode settings used in the many tests you ran that encountered this problem?

- Mike

----- Original Message -----
> From: "Mihael Hategan" <hategan at mcs.anl.gov>
> To: "David Kelly" <davidk at ci.uchicago.edu>
> Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> Sent: Tuesday, April 10, 2012 7:04:56 PM
> Subject: Re: [Swift-devel] coaster io with NIO.
> On Tue, 2012-04-10 at 17:25 -0500, David Kelly wrote:
> > Yep, I gave it a try with automatic coasters, but am still seeing
> > the timeouts.
> >
> 
> I think I see the problem. With multiple jobs per worker, the situation
> may be such that both a stagein and a stageout happen at the same time
> (on the same TCP connection). If the stageout runs out of buffers, the
> write to the socket on the worker side blocks, so the read loop never
> runs. This eventually fills the other direction of the TCP link
> and everything deadlocks.
> 
> 
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
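
For reference, the deadlock scenario Mihael describes above can be sketched as a small deterministic simulation. This is only an illustrative model, not the coaster code: the names Link, Peer and simulate are hypothetical, each TCP direction is reduced to a bounded byte counter, and both ends are assumed to run the same single-threaded loop. In "blocking" mode a peer stays inside its write until the whole payload is sent and never reaches its read loop; in "non-blocking" mode (roughly what NIO allows) it writes what fits and keeps reading.

```python
class Link:
    """One direction of the TCP connection, modelled as a bounded buffer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def try_write(self, n):
        # Succeeds only if the whole chunk fits in the remaining buffer space.
        if self.used + n > self.capacity:
            return False
        self.used += n
        return True

    def try_read(self, n):
        taken = min(n, self.used)
        self.used -= taken
        return taken


class Peer:
    """Hypothetical endpoint (worker or service) with a single I/O thread."""

    def __init__(self, payload, out_link, in_link, chunk, nonblocking):
        self.to_send = payload
        self.received = 0
        self.out_link = out_link
        self.in_link = in_link
        self.chunk = chunk
        self.nonblocking = nonblocking

    def step(self):
        """Run one scheduling slice; return True if any bytes moved."""
        progressed = False
        if self.to_send > 0:
            n = min(self.chunk, self.to_send)
            if self.out_link.try_write(n):
                self.to_send -= n
                progressed = True
            if not self.nonblocking:
                # Still inside the blocking write: the read loop is not reached.
                return progressed
        got = self.in_link.try_read(self.chunk)
        if got:
            self.received += got
            progressed = True
        return progressed


def simulate(payload, capacity, nonblocking, chunk=4, max_rounds=10_000):
    """Stage-in and stage-out of `payload` bytes at once over one connection."""
    worker_to_service = Link(capacity)
    service_to_worker = Link(capacity)
    worker = Peer(payload, worker_to_service, service_to_worker, chunk, nonblocking)
    service = Peer(payload, service_to_worker, worker_to_service, chunk, nonblocking)
    for _ in range(max_rounds):
        worker_moved = worker.step()
        service_moved = service.step()
        if all(p.to_send == 0 and p.received == payload for p in (worker, service)):
            return "completed"
        if not (worker_moved or service_moved):
            return "deadlock"
    return "gave up"


if __name__ == "__main__":
    print(simulate(payload=100, capacity=16, nonblocking=False))
    print(simulate(payload=8, capacity=16, nonblocking=False))
    print(simulate(payload=100, capacity=16, nonblocking=True))
```

In this model the blocking case only stalls when both transfers exceed the buffer capacity at the same time, which would be consistent with the problem needing concurrent stage-in and stage-out on one connection. Whether that matches what is seen with jobsPerNode=1 is exactly the open question.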

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



