[Swift-devel] trunk coasters

Michael Wilde wilde at mcs.anl.gov
Wed Aug 3 10:12:41 CDT 2011


I'm testing the persistent coaster setup that was failing as described in the quoted message below, but instead of starting the workers on remote OSG sites, I'm starting a single worker locally on communicado, where the service is running.

This seems to fail in a different way than the test against OSG sites did.

I see this in the service log (swift.log file):

2011-08-03 10:01:32,000-0500 INFO  Settings Local contacts: [http://128.135.125.17:35852]
2011-08-03 10:01:32,014-0500 INFO  CoasterService Started local service: http://128.135.125.17:35852
2011-08-03 10:01:32,014-0500 INFO  CoasterService Started coaster service: http://128.135.125.17:41176
2011-08-03 10:05:50,884-0500 INFO  AbstractStreamKarajanChannel$Multiplexer Multiplexer 0 started
2011-08-03 10:05:50,884-0500 INFO  AbstractStreamKarajanChannel$Multiplexer (0) Scheduling SC-null for addition
2011-08-03 10:05:50,885-0500 INFO  AbstractStreamKarajanChannel nullChannel started
2011-08-03 10:05:50,885-0500 INFO  AbstractStreamKarajanChannel$Multiplexer Multiplexer 1 started
2011-08-03 10:05:50,909-0500 INFO  LocalTCPService Received registration: blockid = twork, url = communicado.ci.uchicago.edu
2011-08-03 10:05:50,919-0500 INFO  AbstractKarajanChannel MetaChannel: 700804192[1615734796: {}] -> null: Disabling heartbeats (config is null)
2011-08-03 10:05:50,920-0500 INFO  MetaChannel MetaChannel: 700804192[1615734796: {}] -> null.bind -> SC-null
2011-08-03 10:05:50,922-0500 DEBUG Cpu workerStarted: twork:communicado.ci.uchicago.edu:0
2011-08-03 10:05:50,922-0500 DEBUG Cpu twork:0 pullLater
2011-08-03 10:05:50,924-0500 INFO  Block Started CPU 0:1312383950s
2011-08-03 10:05:50,924-0500 INFO  Block Started worker twork:000000
2011-08-03 10:05:50,924-0500 INFO  Cpu twork:0 pull
2011-08-03 10:05:50,926-0500 WARN  BlockQueueProcessor Failed to send worker status update to client
java.lang.NullPointerException
        at org.globus.cog.karajan.workflow.service.channels.ChannelManager.getMetaChannel(ChannelManager.java:434)
        at org.globus.cog.karajan.workflow.service.channels.ChannelManager.reserveChannel(ChannelManager.java:227)
        at org.globus.cog.abstraction.coaster.service.job.manager.PassiveQueueProcessor.registrationReceived(PassiveQueueProcessor.java:72)
        at org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.registrationReceived(JobQueue.java:143)
        at org.globus.cog.abstraction.coaster.service.LocalTCPService.registrationReceived(LocalTCPService.java:64)
        at org.globus.cog.abstraction.coaster.service.local.RegistrationHandler.requestComplete(RegistrationHandler.java:57)
        at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.receiveCompleted(RequestHandler.java:84)
        at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleRequest(AbstractKarajanChannel.java:416)
        at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel.step(AbstractStreamKarajanChannel.java:157)
        at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Multiplexer.run(AbstractStreamKarajanChannel.java:375)
2011-08-03 10:06:00,893-0500 INFO  AbstractStreamKarajanChannel$Multiplexer Avg stream buf: 0
2011-08-03 10:06:00,971-0500 INFO  PullThread runTime: 4, sleepTime: 10043
2011-08-03 10:06:10,904-0500 INFO  AbstractStreamKarajanChannel$Multiplexer Avg stream buf: 0
2011-08-03 10:06:11,012-0500 INFO  PullThread runTime: 1, sleepTime: 10040
2011-08-03 10:06:20,911-0500 INFO  AbstractStreamKarajanChannel$Multiplexer Avg stream buf: 0
2011-08-03 10:06:21,050-0500 INFO  PullThread runTime: 2, sleepTime: 10036
etc...

=== and this in the service's std out/err log:

Local contacts: [http://128.135.125.17:35852]
Started local service: http://128.135.125.17:35852
Started coaster service: http://128.135.125.17:41176
Started coaster service: http://128.135.125.17:41176
Multiplexer 0 started
(0) Scheduling SC-null for addition
nullChannel started
Multiplexer 1 started
Received registration: blockid = twork, url = communicado.ci.uchicago.edu
MetaChannel: 700804192[1615734796: {}] -> null: Disabling heartbeats (config is null)
MetaChannel: 700804192[1615734796: {}] -> null.bind -> SC-null
Started CPU 0:1312383950s
Started worker twork:000000
twork:0 pull
Failed to send worker status update to client
java.lang.NullPointerException
        at org.globus.cog.karajan.workflow.service.channels.ChannelManager.getMetaChannel(ChannelManager.java:434)
        at org.globus.cog.karajan.workflow.service.channels.ChannelManager.reserveChannel(ChannelManager.java:227)
        at org.globus.cog.abstraction.coaster.service.job.manager.PassiveQueueProcessor.registrationReceived(PassiveQueueProcessor.java:72)
        at org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.registrationReceived(JobQueue.java:143)
        at org.globus.cog.abstraction.coaster.service.LocalTCPService.registrationReceived(LocalTCPService.java:64)
        at org.globus.cog.abstraction.coaster.service.local.RegistrationHandler.requestComplete(RegistrationHandler.java:57)
        at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.receiveCompleted(RequestHandler.java:84)
        at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleRequest(AbstractKarajanChannel.java:416)
        at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel.step(AbstractStreamKarajanChannel.java:157)
        at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Multiplexer.run(AbstractStreamKarajanChannel.java:375)
Avg stream buf: 0
runTime: 4, sleepTime: 10043
Avg stream buf: 0
runTime: 1, sleepTime: 10040
Avg stream buf: 0
runTime: 2, sleepTime: 10036
Sender 742510685 queue size: 1
Avg stream buf: 0
runTime: 1, sleepTime: 10042
etc...

=== and this in the worker log (log level DEBUG):

1312383950.848 INFO  - twork Logging started: Wed Aug  3 10:05:50 2011
1312383950.848 INFO  - Running on node communicado.ci.uchicago.edu
1312383950.848 DEBUG - uri=http://communicado.ci.uchicago.edu:35852
1312383950.848 DEBUG - scheme=http
1312383950.848 DEBUG - host=communicado.ci.uchicago.edu
1312383950.848 DEBUG - port=35852
1312383950.848 DEBUG - blockid=twork
1312383950.848 INFO  - Connecting (0)...
1312383950.848 DEBUG - Trying communicado.ci.uchicago.edu:35852...
1312383950.862 INFO  - Connected
1312383950.862 DEBUG - Replies: {}
1312383950.862 DEBUG - OUT: len=8, tag=0, flags=0
1312383950.863 DEBUG - OUT: len=5, tag=0, flags=0
1312383950.863 DEBUG - OUT: len=27, tag=0, flags=0
1312383950.863 DEBUG - OUT: len=16, tag=0, flags=2
1312383950.863 DEBUG - done sending frags for 0
1312383950.931 DEBUG - Fin flag set
1312383950.931 INFO  000000 Registration successful. ID=000000
1312383980.863 DEBUG 000000 Replies: {}
1312383980.863 DEBUG 000000 OUT: len=9, tag=1, flags=2
1312383980.864 DEBUG 000000 done sending frags for 1
1312383980.868 DEBUG 000000 Fin flag set
1312383980.868 DEBUG 000000 Heartbeat acknowledged
1312383986.739 DEBUG 000000 New request (1)
1312383986.739 DEBUG 000000 Fin flag set
1312383986.739 DEBUG 000000 Processing request
1312383986.739 DEBUG 000000 Cmd is HEARTBEAT
1312383986.739 DEBUG 000000 OUT: len=2, tag=1, flags=3
1312383986.739 DEBUG 000000 done sending frags for 1
etc...
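
The worker log stamps each line with epoch seconds, while the service log uses local wall-clock time, so lining the two up takes a conversion. A minimal sketch of that conversion follows; the function name is made up for illustration, and the fixed UTC-5 offset is an assumption taken from the "-0500" suffix in the service log.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the service host logs in CDT (UTC-5), as the "-0500" suffix suggests.
CDT = timezone(timedelta(hours=-5))


def worker_ts_to_service_ts(stamp: str) -> str:
    """Convert a worker-log stamp such as '1312383950.931' to service-log form."""
    # Split on the decimal point instead of parsing a float, so the
    # millisecond digits are carried over exactly as logged.
    secs, _, millis = stamp.partition(".")
    when = datetime.fromtimestamp(int(secs), tz=CDT)
    # Pad or trim the fraction to exactly three digits.
    millis = (millis + "000")[:3]
    return when.strftime("%Y-%m-%d %H:%M:%S") + "," + millis + when.strftime("%z")


if __name__ == "__main__":
    # The worker's "Registration successful" line.
    print(worker_ts_to_service_ts("1312383950.931"))  # 2011-08-03 10:05:50,931-0500
```

Read this way, the worker's "Registration successful" at 1312383950.931 lands about 5 ms after the service logged the NullPointerException warning at 10:05:50,926, so the worker appears to register normally even though the status update to the client fails.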



----- Original Message -----
> From: "Michael Wilde" <wilde at mcs.anl.gov>
> To: "Jonathan Monette" <jonmon at mcs.anl.gov>
> Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> Sent: Wednesday, August 3, 2011 9:08:54 AM
> Subject: Re: [Swift-devel] trunk coasters
> Last night (on current trunk) I was getting this:
> 
> 2011-08-02 23:24:49,863-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION
> jobid=cat-4q4lmvdk - Application exception: null
> Caused by:
> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
> Could not submit job
> Caused by: org.globus.cog.karajan.workflow.service.ProtocolException:
> Failed to set configuration: For input string: ""
> Caused by: org.globus.cog.karajan.workflow.service.RemoteException:
> Failed to set configuration: For input string: ""
> 2011-08-02 23:24:49,866-0500 INFO vdl:execute END_FAILURE
> thread=0-3-0-1 tr=cat
> 2011-08-02 23:24:49,868-0500 DEBUG VDL2ExecutionContext Exception in
> cat:
> Arguments: [data.txt]
> Host: localhost
> Directory: catsn-20110802-2324-ze1lfx8f/jobs/4/cat-4q4lmvdk
> - - -
> 
> Exception in cat:
> Arguments: [data.txt]
> Host: localhost
> Directory: catsn-20110802-2324-ze1lfx8f/jobs/4/cat-4q4lmvdk
> - - -
> 
> Caused by: null
> Caused by:
> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
> Could not submit job
> Caused by: org.globus.cog.karajan.workflow.service.ProtocolException:
> Failed to set configuration: For input string: ""
> Caused by: org.globus.cog.karajan.workflow.service.RemoteException:
> Failed to set configuration: For input string: ""
> at
> org.globus.cog.karajan.workflow.nodes.functions.KException.function(KException.java:29)
> 
> --- But that was a rather new configuration, so I need to do more
> diagnosis.
> 
> Why did you ask, Mihael - are you seeing problems too?
> 
> I hope to work with Alberto this week to resume site config testing
> with a focus on coaster configs.
> 
> - Mike
> 
> 
> 
> 
> ----- Original Message -----
> > From: "Jonathan Monette" <jonmon at mcs.anl.gov>
> > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > Cc: "Mihael Hategan" <hategan at mcs.anl.gov>, "Swift Devel"
> > <swift-devel at ci.uchicago.edu>
> > Sent: Wednesday, August 3, 2011 8:54:48 AM
> > Subject: Re: [Swift-devel] trunk coasters
> > I have been using automatic coasters and submitting to PADS. I
> > haven't
> > tried any large scale runs recently though.
> > On Aug 3, 2011, at 8:53 AM, Michael Wilde wrote:
> >
> > > They were failing for me using persistent coasters to osg sites;
> > > will
> > > be testing further today and file bugs as needed.
> > >
> > > On 8/3/11, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> > >> Has anybody run a job with trunk coasters recently?
> > >>
> > >> Mihael
> > >>
> > >> _______________________________________________
> > >> Swift-devel mailing list
> > >> Swift-devel at ci.uchicago.edu
> > >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> > >>
> > >
> > > --
> > > Sent from my mobile device
> 
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
> 

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



