[Swift-user] Swift is stuck with 5K jobs

Andriy Fedorov fedorov at bwh.harvard.edu
Mon Mar 14 12:43:08 CDT 2011


Hi all,

Thank you for your help! I was indeed using the coaster data provider. After
reporting the original problem, I switched back to an earlier version
of Swift and observed some very strange errors: the input files
arrived corrupted. Once I changed the data provider to local, the data
corruption problem seems to have disappeared. I have not yet tried
0.92 again; I will let you know if I run into problems when I do.

Sorry, I cannot set aside time to debug this right now; I am trying to get
the actual work done...

AF
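For reference, here is a minimal sketch of a complete pool entry that uses the local data provider, built around the two lines Mike suggests in the quoted message below. The pool handle `abe`, the output file name `sites-local.xml`, and the `<config>`/`<pool>` wrapper are assumptions for illustration; the coaster execution settings are left as a placeholder comment, to be copied from an existing sites.xml.

```shell
#!/bin/sh
# Sketch only: the pool handle "abe" and the file name sites-local.xml are
# hypothetical. Writing to a new file avoids overwriting an existing sites.xml.
cat > sites-local.xml <<'EOF'
<config>
  <pool handle="abe">
    <!-- copy your existing coaster execution and profile elements here -->
    <filesystem provider="local"/>
    <workdirectory>/home/yourhomedir/swiftwork</workdirectory>
  </pool>
</config>
EOF

# Print the filesystem element so you can confirm which data provider is selected.
grep -o '<filesystem provider="[^"]*"/>' sites-local.xml
```

The quoted heredoc delimiter (`'EOF'`) keeps the shell from expanding anything inside the XML.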



On Mon, Mar 14, 2011 at 12:19, Michael Wilde <wilde at mcs.anl.gov> wrote:
> Hi Andriy,
>
> Can you post your sites.xml, tc, properties (if you are changing any) and swift command line?
>
> It looks to me like perhaps your script and configuration are trying to use provider staging with coasters, or perhaps the coaster data provider. Was that intended? (I ask because of the "PutFileCommand" method listed in the traceback. Mihael, is that for provider staging or for the coaster data provider?)
>
> If so, can you try this with the local file provider, i.e. using something like this in sites.xml:
>
>    <filesystem provider="local"/>
>    <workdirectory >/home/yourhomedir/swiftwork</workdirectory>
>
> If that's already what you have, then I'm not sure what's happening.
>
> If you were trying one of the coaster-based data transfer methods, we need to dig deeper into what's failing, but hopefully the local data provider will get you further for now.
>
> - Mike
>
>
> ----- Original Message -----
>> Thanks, Allan. Now I have a different exception:
>>
>> class org.globus.cog.abstraction.impl.file.coaster.buffers.NIOChannelReadBuffer throws exception in doStuff. Fix it!
>> java.lang.NullPointerException
>>     at org.globus.cog.abstraction.impl.file.coaster.commands.PutFileCommand.error(PutFileCommand.java:95)
>>     at org.globus.cog.abstraction.impl.file.coaster.buffers.ReadBuffer.error(ReadBuffer.java:79)
>>     at org.globus.cog.abstraction.impl.file.coaster.buffers.NIOChannelReadBuffer.doStuff(NIOChannelReadBuffer.java:42)
>>     at org.globus.cog.abstraction.impl.file.coaster.buffers.Buffers.run(Buffers.java:133)
>>
>>
>>
>> On Mon, Mar 14, 2011 at 11:15, Allan Espinosa <aespinosa at cs.uchicago.edu> wrote:
>> > Hello Andriy,
>> >
>> > The default package may have a small max heap limit. Usually, I apply
>> > this patch whenever I get a new version of Swift:
>> >
>> > --- old/bin/swift 2010-10-12 12:18:47.000000000 -0500
>> > +++ new/bin/swift 2010-10-12 12:18:37.000000000 -0500
>> > @@ -9,7 +9,7 @@
>> >
>> >  CYGWIN=
>> >  CPDELIM=":"
>> > -HEAPMAX=256M
>> > +HEAPMAX=4096M
>> >
>> >  if echo `uname` | grep -i "cygwin"; then
>> >   CYGWIN="yes"
>> >
>> >
>> > Works well with 800K jobs.
>> >
>> > -Allan
>> >
>> > 2011/3/14 Andriy Fedorov <fedorov at bwh.harvard.edu>:
>> >> Hi,
>> >>
>> >> I am using Swift with coasters on NCSA Abe, with the binary build of
>> >> Swift 0.92. My script should generate about 5K individual jobs. When I
>> >> try to run it, I get:
>> >>
>> >> Swift svn swift-r4157 cog-r3056
>> >>
>> >> RunID: 20110314-0951-f3c45zja
>> >> Progress:
>> >> Exception in thread "Timer-0" java.lang.OutOfMemoryError: Java heap space
>> >>
>> >> Exception in thread "SIGINT handler"
>> >> Exception in thread "SIGINT handler" Exception in thread "SIGTERM handler"
>> >>
>> >> After this error, I am not able to terminate the script, and apparently
>> >> no jobs get scheduled to PBS.
>> >>
>> >> Am I hitting some limit? Is 5K jobs too much?
>> >>
>> >> How do I terminate Swift now so that it stops wasting cycles on the head node?
>> >>
>> >> Thanks
>> >> --
>> >> Andriy Fedorov, Ph.D.
>> >>
>> >> Research Fellow
>> >> Brigham and Women's Hospital
>> >> Harvard Medical School
>> >> 75 Francis Street
>> >> Boston, MA 02115 USA
>> >> fedorov at bwh.harvard.edu
>> >> (617) 525-6258 (office)
>> >> _______________________________________________
>> >> Swift-user mailing list
>> >> Swift-user at ci.uchicago.edu
>> >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > Allan M. Espinosa <http://amespinosa.wordpress.com>
>> > PhD student, Computer Science
>> > University of Chicago <http://people.cs.uchicago.edu/~aespinosa>
>> >
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>
>
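Allan's HEAPMAX change in the quoted thread can also be scripted, which helps when re-applying it to every new Swift release. The shell function below is a minimal sketch, not part of Swift: the name `raise_heap` is hypothetical, and it assumes the launcher contains a line of exactly the form `HEAPMAX=256M`, as in Allan's patch. If your launcher sets the value differently, that line is left unchanged.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the HEAPMAX line of a Swift launcher script.
# Assumes the launcher has a line exactly "HEAPMAX=256M" (as in the patch).
raise_heap() {
  launcher=$1              # path to bin/swift in your installation
  newmax=${2:-4096M}       # new heap limit; 4096M matches Allan's patch
  # Keep a backup of the original launcher next to it.
  cp "$launcher" "$launcher.orig" || return 1
  # Portable in-place edit (avoids the non-POSIX sed -i): write the edited
  # copy to a temp file, move it over the launcher, then restore exec permission.
  sed "s/^HEAPMAX=256M\$/HEAPMAX=$newmax/" "$launcher.orig" > "$launcher.tmp" &&
    mv "$launcher.tmp" "$launcher" &&
    chmod +x "$launcher"
}
```

For example, `raise_heap /path/to/swift-0.92/bin/swift` followed by `grep '^HEAPMAX=' /path/to/swift-0.92/bin/swift` should show the new value (the path is a placeholder). Running the function twice overwrites the `.orig` backup with the already-patched file, so keep a separate copy if you need the pristine launcher.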


