[Swift-devel] Coaster socket issue

Michael Wilde wilde at mcs.anl.gov
Wed Mar 28 21:21:18 CDT 2012


Now that I think about it, I suspect the pipes may be from Swift running various commands, like qsub/qstat from the localscheduler provider, and/or app() calls from the local execution provider. I don't know if we ever paid much attention to whether these were all getting cleaned up.
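
To illustrate the general mechanism only (Swift is Java, so this Python sketch is an analogy, not Swift's code): every child process started with a captured stdout leaves a pipe descriptor in the parent until the parent closes it. The helper names count_pipes, run_and_leak and run_and_clean are made up for this example, and the counting relies on Linux's /proc/self/fd.

```python
import os
import subprocess
import sys

def count_pipes():
    """Number of pipe descriptors held by this process (Linux only)."""
    n = 0
    for name in os.listdir("/proc/self/fd"):
        try:
            if os.readlink(f"/proc/self/fd/{name}").startswith("pipe:"):
                n += 1
        except OSError:
            pass  # descriptor was closed while we were listing
    return n

def run_and_leak(cmd):
    """Run cmd but never close the parent's end of the stdout pipe."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    proc.wait()
    return proc  # proc.stdout is still open here

def run_and_clean(cmd):
    """Run cmd and close the pipe as soon as the output has been read."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        return proc.stdout.read()

if __name__ == "__main__":
    cmd = [sys.executable, "-c", "pass"]  # stand-in for qsub/qstat
    before = count_pipes()
    held = [run_and_leak(cmd) for _ in range(5)]
    print("pipes gained without cleanup:", count_pipes() - before)
    for p in held:
        p.stdout.close()
    run_and_clean(cmd)
    print("pipes gained after cleanup:", count_pipes() - before)
```

If something like the first pattern happens once per qsub/qstat or app() invocation, the pipe count would grow steadily over a long run, which would match what lsof is showing.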

- Mike

----- Original Message -----
> From: "Michael Wilde" <wilde at mcs.anl.gov>
> To: "David Kelly" <davidk at ci.uchicago.edu>
> Cc: "swift-devel at ci.uchicago.edu Devel" <swift-devel at ci.uchicago.edu>
> Sent: Wednesday, March 28, 2012 9:10:38 PM
> Subject: Re: [Swift-devel] Coaster socket issue
> I think that on Jon's Beagle runs we saw about 100 pipes but several
> thousand sockets, so we didn't pay any attention to the pipes (yet).
> 
> The sockets were clearly from workers to the coaster service.
> 
> I have no idea yet what the pipes are. ls -l of /proc/<pid>/fd/ does a nice
> job of trying to identify and format the file name or object
> associated with each file descriptor. I suspect it's doing the same
> thing lsof does.
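
A rough sketch of the same classification that ls -l shows, done programmatically: on Linux each entry in /proc/<pid>/fd is a symlink whose target starts with "socket:" or "pipe:" for those object types. The function name classify_fds is made up for this example, and reading another user's process will need matching permissions.

```python
import os
from collections import Counter

def classify_fds(pid="self"):
    """Count a process's open file descriptors by kind (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if target.startswith("socket:"):
            counts["socket"] += 1
        elif target.startswith("pipe:"):
            counts["pipe"] += 1
        else:
            counts["other"] += 1  # regular files, devices, and so on
    return counts

if __name__ == "__main__":
    print(dict(classify_fds()))
```

Calling classify_fds with the Swift or coaster service JVM's pid every few minutes would show whether it is the socket count or the pipe count that keeps growing.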
> 
> - Mike
> 
> ----- Original Message -----
> > From: "David Kelly" <davidk at ci.uchicago.edu>
> > To: "Jonathan Monette" <jonmon at mcs.anl.gov>
> > Cc: "swift-devel at ci.uchicago.edu Devel"
> > <swift-devel at ci.uchicago.edu>
> > Sent: Wednesday, March 28, 2012 8:49:21 PM
> > Subject: Re: [Swift-devel] Coaster socket issue
> > Strange, I just ran into a similar issue tonight while running on
> > ibicluster (SGE). I saw the "too many open files" error after
> > sitting in the queue waiting for a job to start. I restarted the job
> > and then periodically ran 'lsof' and saw the number of java pipes
> > increasing over time. I thought at first this might be SGE specific,
> > but perhaps it is something else. (This was with 0.93)
> >
> > ----- Original Message -----
> > > From: "Jonathan Monette" <jonmon at mcs.anl.gov>
> > > To: "swift-devel at ci.uchicago.edu Devel"
> > > <swift-devel at ci.uchicago.edu>
> > > Sent: Wednesday, March 28, 2012 8:30:52 PM
> > > Subject: [Swift-devel] Coaster socket issue
> > > Hello,
> > > In running the SciColSim app on raven (which is a cluster similar to
> > > Beagle) I noticed that the app hung. It was not the kind of hang
> > > where the hang checker kicks in; Swift was waiting for jobs to
> > > become active, but none had been submitted to PBS. I took a look at
> > > the log file and noticed that a java.io.IOException had been thrown
> > > for "too many open files". Since I killed it I couldn't probe the
> > > run, but I had the same run running on Beagle. Upon Mike's
> > > suggestion I took a look at the /proc/<pid>/fd directory. There were
> > > over 2000 sockets in the CLOSE_WAIT state with a single message in
> > > the receive queue. Raven has a limit of 1024 open files at a time
> > > while Beagle has a limit of around 60K open files. I got these
> > > limits using ulimit -n.
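
A small sketch for comparing a process's descriptor count with its limit, which is the comparison being made by hand above. It assumes Linux, where /proc/<pid>/limits has a "Max open files" row; the function name fd_usage is made up for this example.

```python
import os

def fd_usage(pid="self"):
    """Return (open_fd_count, soft_limit) for a process (Linux only).

    soft_limit is the string from /proc/<pid>/limits, so it is either a
    decimal number or "unlimited"; it is None if the row is missing.
    """
    # When pid is "self", the count includes the temporary descriptor
    # that listdir() itself holds open while reading the directory.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    soft_limit = None
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                # Columns: "Max open files" <soft> <hard> <units>
                soft_limit = line.split()[3]
                break
    return open_fds, soft_limit

if __name__ == "__main__":
    used, limit = fd_usage()
    print(f"{used} descriptors open, soft limit {limit}")
```

With a limit of 1024, a couple of thousand lingering sockets is more than enough to produce the "too many open files" exception, while a limit near 60K would hide the same leak for much longer.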
> > >
> > > So my question is, why are there so many sockets waiting to be
> > > closed? I did some reading about the CLOSE_WAIT state and it seems
> > > this happens when one end closes its socket but the other does
> > > not. Is Coaster not closing the socket when a worker shuts down?
> > > What other information should I be looking for to help debug the
> > > issue?
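
A minimal loopback sketch of how a socket ends up in CLOSE_WAIT, assuming Linux: the peer closes, the local side sees end-of-stream but has not yet called close(). This is not Coaster code; demo_half_closed and tcp_state are names made up for this example. State code "08" in /proc/net/tcp is CLOSE_WAIT.

```python
import socket

def tcp_state(local_port, remote_port):
    """Hex state code from /proc/net/tcp, or None if not found/unavailable."""
    try:
        with open("/proc/net/tcp") as f:
            next(f)  # skip the header line
            for line in f:
                fields = line.split()
                lport = int(fields[1].rsplit(":", 1)[1], 16)
                rport = int(fields[2].rsplit(":", 1)[1], 16)
                if (lport, rport) == (local_port, remote_port):
                    return fields[3]
    except OSError:
        pass
    return None

def demo_half_closed():
    """Return (data, state) seen by a server whose peer has closed."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    conn, peer = listener.accept()

    client.close()       # the "worker" goes away and sends FIN
    data = conn.recv(1)  # returns b"" once the FIN has arrived
    # Until conn.close() is called, the kernel keeps this socket in
    # CLOSE_WAIT and its file descriptor stays allocated.
    state = tcp_state(conn.getsockname()[1], peer[1])
    conn.close()
    listener.close()
    return data, state

if __name__ == "__main__":
    print(demo_half_closed())
```

The point of the sketch is that the empty read is the only notification the remaining side gets; if the code that owns the socket never reads it or never reacts to end-of-stream by closing, the descriptor stays in CLOSE_WAIT indefinitely.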
> > > _______________________________________________
> > > Swift-devel mailing list
> > > Swift-devel at ci.uchicago.edu
> > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> 
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
> 

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



