[Swift-devel] coaster error on Midway

Michael Wilde wilde at mcs.anl.gov
Fri Oct 19 18:04:49 CDT 2012


Indeed, with the latest trunk the error has not recurred, and I've done several 1000-job runs of the EpiSnp app.

But that means that the trunk version in Midway's default swift module has this bug.
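
For reference, the message in the trace quoted below ("Wrong data size: 4", raised from unpackLong while handling a heartbeat reply) suggests that a 4-byte payload arrived where an 8-byte long was expected. The following is only a minimal sketch of that kind of check, not the actual CoG source: the class name UnpackLongSketch, the big-endian byte order, and the exact message format are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class UnpackLongSketch {
    // Hypothetical stand-in for the size check that appears to fail in the trace.
    // Assumes a long is sent as exactly 8 bytes, big-endian (an assumption).
    static long unpackLong(byte[] data) {
        if (data.length != 8) {
            throw new IllegalArgumentException(
                "Wrong data size: " + data.length + ". Data was "
                    + new String(data, StandardCharsets.ISO_8859_1));
        }
        return ByteBuffer.wrap(data).getLong();
    }

    public static void main(String[] args) {
        // An 8-byte payload decodes normally.
        byte[] eight = ByteBuffer.allocate(8).putLong(42L).array();
        System.out.println(unpackLong(eight));

        // A 4-byte payload is rejected, as in the reported exception.
        byte[] four = ByteBuffer.allocate(4).putInt(42).array();
        try {
            unpackLong(four);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

If that reading is right, the sender and receiver disagreed on the heartbeat payload width in the affected revision, which would be consistent with the error disappearing in latest trunk.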

Let's talk on Monday about how to make and test a 0.94 release, which I would define simply as "a trustable trunk snapshot", and then how to get that out to all systems and users.

- Mike




----- Original Message -----
> From: "David Kelly" <davidk at ci.uchicago.edu>
> To: "Michael Wilde" <wilde at mcs.anl.gov>
> Cc: "swift-devel" <swift-devel at ci.uchicago.edu>, "Mihael Hategan" <hategan at mcs.anl.gov>
> Sent: Friday, October 19, 2012 5:59:56 PM
> Subject: Re: coaster error on Midway
> I remember something like this happening in one of the versions of
> trunk - I believe it has been fixed in the latest version.
> 
> ----- Original Message -----
> > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > To: "David Kelly" <davidk at ci.uchicago.edu>, "Mihael Hategan"
> > <hategan at mcs.anl.gov>
> > Cc: "swift-devel" <swift-devel at ci.uchicago.edu>
> > Sent: Friday, October 19, 2012 3:10:01 PM
> > Subject: coaster error on Midway
> > I'm getting the error below on Midway (running the swift from module
> > load swift)
> >
> > Is this a known issue, with a known fix?
> >
> > I'll try latest trunk next.
> >
> > This happened both without and with provider staging.
> >
> > I will post a bug report with log files if it recurs with latest trunk.
> >
> > - Mike
> >
> > Swift trunk swift-r5939 cog-r3472
> >
> > RunID: 20121019-2005-yptzr8l6
> > Progress: time: Fri, 19 Oct 2012 20:05:54 +0000
> > Progress: time: Fri, 19 Oct 2012 20:05:56 +0000 Stage in:1 Submitted:99
> > Progress: time: Fri, 19 Oct 2012 20:05:57 +0000 Stage in:33 Submitted:67
> > Progress: time: Fri, 19 Oct 2012 20:05:58 +0000 Stage in:56 Submitted:36 Active:8
> > Progress: time: Fri, 19 Oct 2012 20:05:59 +0000 Stage in:65 Submitted:3 Active:32
> > Progress: time: Fri, 19 Oct 2012 20:06:00 +0000 Stage in:51 Active:49
> > Progress: time: Fri, 19 Oct 2012 20:06:08 +0000 Active:99 Stage out:1
> > Progress: time: Fri, 19 Oct 2012 20:06:24 +0000 Active:96 Finished successfully:4
> > Progress: time: Fri, 19 Oct 2012 20:06:28 +0000 Active:95 Stage out:1 Finished successfully:4
> > Progress: time: Fri, 19 Oct 2012 20:06:29 +0000 Active:74 Stage out:12 Finished successfully:14
> > Progress: time: Fri, 19 Oct 2012 20:06:30 +0000 Active:52 Stage out:3 Finished successfully:45
> > Progress: time: Fri, 19 Oct 2012 20:06:31 +0000 Active:25 Stage out:24 Finished successfully:51
> > Exception caught while processing reply
> > java.lang.IllegalArgumentException: Wrong data size: 4. Data was @].z
> > at org.globus.cog.karajan.workflow.service.RequestReply.unpackLong(RequestReply.java:237)
> > at org.globus.cog.karajan.workflow.service.RequestReply.getInDataAsLong(RequestReply.java:232)
> > at org.globus.cog.karajan.workflow.service.commands.HeartBeatCommand.replyReceived(HeartBeatCommand.java:40)
> > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleReply(AbstractKarajanChannel.java:401)
> > at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel.stepNIO(AbstractStreamKarajanChannel.java:234)
> > at org.globus.cog.karajan.workflow.service.channels.NIOMultiplexer.loop(NIOMultiplexer.java:97)
> > at org.globus.cog.karajan.workflow.service.channels.NIOMultiplexer.run(NIOMultiplexer.java:56)
> > Exception caught while processing reply
> > java.lang.IllegalArgumentException: Wrong data size: 4. Data was @].z
> > at org.globus.cog.karajan.workflow.service.RequestReply.unpackLong(RequestReply.java:237)
> > at org.globus.cog.karajan.workflow.service.RequestReply.getInDataAsLong(RequestReply.java:232)
> > at org.globus.cog.karajan.workflow.service.commands.HeartBeatCommand.replyReceived(HeartBeatCommand.java:40)
> > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleReply(AbstractKarajanChannel.java:401)
> > at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel.stepNIO(AbstractStreamKarajanChannel.java:234)
> > at org.globus.cog.karajan.workflow.service.channels.NIOMultiplexer.loop(NIOMultiplexer.java:97)
> > at org.globus.cog.karajan.workflow.service.channels.NIOMultiplexer.run(NIOMultiplexer.java:56)
> > Exception caught while processing reply
> > java.lang.IllegalArgumentException: Wrong data size: 4. Data was @].z
> > at org.globus.cog.karajan.workflow.service.RequestReply.unpackLong(RequestReply.java:237)
> > at org.globus.cog.karajan.workflow.service.RequestReply.getInDataAsLong(RequestReply.java:232)
> > at org.globus.cog.karajan.workflow.service.commands.HeartBeatCommand.replyReceived(HeartBeatCommand.java:40)
> > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleReply(AbstractKarajanChannel.java:401)
> > at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel.stepNIO(AbstractStreamKarajanChannel.java:234)
> > at org.globus.cog.karajan.workflow.service.channels.NIOMultiplexer.loop(NIOMultiplexer.java:97)
> > at org.globus.cog.karajan.workflow.service.channels.NIOMultiplexer.run(NIOMultiplexer.java:56)
> > Progress: time: Fri, 19 Oct 2012 20:06:33 +0000 Stage out:10 Finished successfully:90
> > Final status: Fri, 19 Oct 2012 20:06:33 +0000 Finished successfully:100
> > mid$
> >
> >
> > --
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



