[Swift-user] Swift is stuck with 5K jobs

Michael Wilde wilde at mcs.anl.gov
Mon Mar 14 11:19:00 CDT 2011


Hi Andriy,

Can you post your sites.xml, tc.data, properties file (if you are changing any properties), and your swift command line?

It looks to me like your script and configuration may be trying to use provider staging with coasters, or perhaps the coaster data provider. Was that intended? (I say this because of the PutFileCommand class that appears in the traceback. Mihael, is that used for provider staging or for the coaster data provider?)

If so, can you try this with the local file provider, i.e., using something like this in sites.xml:

    <filesystem provider="local"/>
    <workdirectory>/home/yourhomedir/swiftwork</workdirectory>

If that's already what you have, then I'm not sure what's happening.

If you were trying one of the coaster-based data transfer methods, we need to dig deeper into what's failing, but hopefully the local file provider will get you further for now.

- Mike


----- Original Message -----
> Thanks, Allan. Now I have a different exception:
> 
> class org.globus.cog.abstraction.impl.file.coaster.buffers.NIOChannelReadBuffer throws exception in doStuff. Fix it!
> java.lang.NullPointerException
>     at org.globus.cog.abstraction.impl.file.coaster.commands.PutFileCommand.error(PutFileCommand.java:95)
>     at org.globus.cog.abstraction.impl.file.coaster.buffers.ReadBuffer.error(ReadBuffer.java:79)
>     at org.globus.cog.abstraction.impl.file.coaster.buffers.NIOChannelReadBuffer.doStuff(NIOChannelReadBuffer.java:42)
>     at org.globus.cog.abstraction.impl.file.coaster.buffers.Buffers.run(Buffers.java:133)
> 
> 
> 
> On Mon, Mar 14, 2011 at 11:15, Allan Espinosa
> <aespinosa at cs.uchicago.edu> wrote:
> > Hello Andriy,
> >
> > The default package may set a small maximum Java heap size. I usually
> > apply this patch whenever I get a new version of Swift:
> >
> > --- old/bin/swift 2010-10-12 12:18:47.000000000 -0500
> > +++ new/bin/swift 2010-10-12 12:18:37.000000000 -0500
> > @@ -9,7 +9,7 @@
> >
> >  CYGWIN=
> >  CPDELIM=":"
> > -HEAPMAX=256M
> > +HEAPMAX=4096M
> >
> >  if echo `uname` | grep -i "cygwin"; then
> >   CYGWIN="yes"
> >
> >
> > Works well with 800K jobs.
> >
> > -Allan
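Allan's patch above changes a single line of bin/swift by hand. As a hedged sketch (not something shipped with Swift), the same edit can be scripted with sed so it is easy to reapply after installing a new release. The helper name set_swift_heapmax is hypothetical, and the sketch assumes the launcher still sets the limit on a line beginning with HEAPMAX=, as the 0.92 script in the diff does.

```shell
#!/bin/sh
# Sketch only: apply the same one-line HEAPMAX change as the patch above.
# Assumption: the launcher sets the limit on a line starting with "HEAPMAX=".
# set_swift_heapmax is a hypothetical helper name, not part of Swift.
set_swift_heapmax() {
    script="$1"   # path to the launcher, e.g. .../bin/swift
    newmax="$2"   # new limit in the same form as the original, e.g. 4096M

    # Refuse to touch a file that does not contain the expected line.
    if ! grep -q '^HEAPMAX=' "$script"; then
        echo "no HEAPMAX= line in $script" >&2
        return 1
    fi

    # Keep a backup so the change is easy to revert.
    cp "$script" "$script.orig" || return 1

    # Rewrite only the HEAPMAX assignment; all other lines are copied as-is.
    # Redirecting into the existing file keeps its execute permission.
    sed "s/^HEAPMAX=.*/HEAPMAX=$newmax/" "$script.orig" > "$script"
}
```

For example, `set_swift_heapmax "$HOME/swift-0.92/bin/swift" 4096M`, where the install path is only a placeholder. Copying bin/swift.orig back over bin/swift restores the original limit.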
> >
> > 2011/3/14 Andriy Fedorov <fedorov at bwh.harvard.edu>:
> >> Hi,
> >>
> >> I am using Swift with coasters on NCSA Abe, with the binary build of
> >> Swift 0.92. My script should generate about 5K individual jobs. When I
> >> try to run it, I get:
> >>
> >> Swift svn swift-r4157 cog-r3056
> >>
> >> RunID: 20110314-0951-f3c45zja
> >> Progress:
> >> Exception in thread "Timer-0" java.lang.OutOfMemoryError: Java heap space
> >>
> >> Exception in thread "SIGINT handler"
> >> Exception in thread "SIGINT handler" Exception in thread "SIGTERM handler"
> >>
> >> After this error I am not able to terminate the script, and apparently
> >> no jobs get scheduled to PBS.
> >>
> >> Am I hitting some limit? Is 5K jobs too many?
> >>
> >> How do I terminate Swift now, so that it does not waste cycles on the
> >> head node?
> >>
> >> Thanks
> >> --
> >> Andriy Fedorov, Ph.D.
> >>
> >> Research Fellow
> >> Brigham and Women's Hospital
> >> Harvard Medical School
> >> 75 Francis Street
> >> Boston, MA 02115 USA
> >> fedorov at bwh.harvard.edu
> >> (617) 525-6258 (office)
> >> _______________________________________________
> >> Swift-user mailing list
> >> Swift-user at ci.uchicago.edu
> >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
> >>
> >>
> >
> >
> >
> > --
> > Allan M. Espinosa <http://amespinosa.wordpress.com>
> > PhD student, Computer Science
> > University of Chicago <http://people.cs.uchicago.edu/~aespinosa>
> >

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



