[Swift-devel] trunk coasters

Michael Wilde wilde at mcs.anl.gov
Wed Aug 3 10:16:34 CDT 2011


Correction: the simple local worker test below *is* failing in the same manner as the test to OSG sites. A Swift run against the service with a single local worker returns the same error I reported earlier in this thread:

com$ swift -config cf.ps -tc.file tc -sites.file sites.grid-ps.xml catsn.swift -n=1
Swift svn swift-r4934 (swift modified locally) cog-r3184 (cog modified locally)

RunID: 20110803-1013-2v63ui0g
Progress:  time: Wed, 03 Aug 2011 10:13:49 -0500
Find: http://localhost:41176
Find:  keepalive(120), reconnect - http://localhost:41176
Execution failed:
        Failed to set configuration: For input string: ""
com$ 
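
The text 'For input string: ""' matches the message that Java's NumberFormatException carries when an integer parse is given an empty string. My guess, not yet checked against the coaster source, is that one of the configuration values reaches the service empty. A minimal sketch of that behaviour follows; the class and method names are made up for illustration and are not from Swift or CoG:

```java
public class EmptySettingDemo {
    // Hypothetical helper (not part of Swift/CoG): attempt an integer parse
    // and return the exception message instead of propagating the exception.
    static String parseMessage(String value) {
        try {
            Integer.parseInt(value);
            return "ok";
        } catch (NumberFormatException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // An empty setting value produces the same tail as the error above.
        System.out.println("Failed to set configuration: " + parseMessage(""));
    }
}
```

If that guess is right, the place to look is whichever numeric setting is being passed to the service as an empty string.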

- Mike

----- Original Message -----
> From: "Michael Wilde" <wilde at mcs.anl.gov>
> To: "Jonathan Monette" <jonmon at mcs.anl.gov>
> Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> Sent: Wednesday, August 3, 2011 10:12:41 AM
> Subject: Re: [Swift-devel] trunk coasters
> I'm testing the persistent coaster setup that was failing as below, but
> instead of starting the workers on remote OSG sites I'm starting a
> single worker locally on communicado, where the service is running.
> 
> This seems to fail in a different manner than the test to OSG sites.
> 
> I see this in the service log (swift.log file):
> 
> 2011-08-03 10:01:32,000-0500 INFO Settings Local contacts:
> [http://128.135.125.17:35852]
> 2011-08-03 10:01:32,014-0500 INFO CoasterService Started local
> service: http://128.135.125.17:35852
> 2011-08-03 10:01:32,014-0500 INFO CoasterService Started coaster
> service: http://128.135.125.17:41176
> 2011-08-03 10:05:50,884-0500 INFO
> AbstractStreamKarajanChannel$Multiplexer Multiplexer 0 started
> 2011-08-03 10:05:50,884-0500 INFO
> AbstractStreamKarajanChannel$Multiplexer (0) Scheduling SC-null for
> addition
> 2011-08-03 10:05:50,885-0500 INFO AbstractStreamKarajanChannel
> nullChannel started
> 2011-08-03 10:05:50,885-0500 INFO
> AbstractStreamKarajanChannel$Multiplexer Multiplexer 1 started
> 2011-08-03 10:05:50,909-0500 INFO LocalTCPService Received
> registration: blockid = twork, url = communicado.ci.uchicago.edu
> 2011-08-03 10:05:50,919-0500 INFO AbstractKarajanChannel MetaChannel:
> 700804192[1615734796: {}] -> null: Disabling heartbeats (config is null)
> 2011-08-03 10:05:50,920-0500 INFO MetaChannel MetaChannel:
> 700804192[1615734796: {}] -> null.bind -> SC-null
> 2011-08-03 10:05:50,922-0500 DEBUG Cpu workerStarted:
> twork:communicado.ci.uchicago.edu:0
> 2011-08-03 10:05:50,922-0500 DEBUG Cpu twork:0 pullLater
> 2011-08-03 10:05:50,924-0500 INFO Block Started CPU 0:1312383950s
> 2011-08-03 10:05:50,924-0500 INFO Block Started worker twork:000000
> 2011-08-03 10:05:50,924-0500 INFO Cpu twork:0 pull
> 2011-08-03 10:05:50,926-0500 WARN BlockQueueProcessor Failed to send
> worker status update to client
> java.lang.NullPointerException
> at
> org.globus.cog.karajan.workflow.service.channels.ChannelManager.getMetaChannel(ChannelManager.java:434)
> at
> org.globus.cog.karajan.workflow.service.channels.ChannelManager.reserveChannel(ChannelManager.java:227)
> at
> org.globus.cog.abstraction.coaster.service.job.manager.PassiveQueueProcessor.registrationReceived(PassiveQueueProcessor.java:72)
> at
> org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.registrationReceived(JobQueue.java:143)
> at
> org.globus.cog.abstraction.coaster.service.LocalTCPService.registrationReceived(LocalTCPService.java:64)
> at
> org.globus.cog.abstraction.coaster.service.local.RegistrationHandler.requestComplete(RegistrationHandler.java:57)
> at
> org.globus.cog.karajan.workflow.service.handlers.RequestHandler.receiveCompleted(RequestHandler.java:84)
> at
> org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleRequest(AbstractKarajanChannel.java:416)
> at
> org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel.step(AbstractStreamKarajanChannel.java:157)
> at
> org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Multiplexer.run(AbstractStreamKarajanChannel.java:375)
> 2011-08-03 10:06:00,893-0500 INFO
> AbstractStreamKarajanChannel$Multiplexer Avg stream buf: 0
> 2011-08-03 10:06:00,971-0500 INFO PullThread runTime: 4, sleepTime:
> 10043
> 2011-08-03 10:06:10,904-0500 INFO
> AbstractStreamKarajanChannel$Multiplexer Avg stream buf: 0
> 2011-08-03 10:06:11,012-0500 INFO PullThread runTime: 1, sleepTime:
> 10040
> 2011-08-03 10:06:20,911-0500 INFO
> AbstractStreamKarajanChannel$Multiplexer Avg stream buf: 0
> 2011-08-03 10:06:21,050-0500 INFO PullThread runTime: 2, sleepTime:
> 10036
> etc...
> 
> === and this in the service's stdout/stderr log:
> 
> Local contacts: [http://128.135.125.17:35852]
> Started local service: http://128.135.125.17:35852
> Started coaster service: http://128.135.125.17:41176
> Started coaster service: http://128.135.125.17:41176
> Multiplexer 0 started
> (0) Scheduling SC-null for addition
> nullChannel started
> Multiplexer 1 started
> Received registration: blockid = twork, url =
> communicado.ci.uchicago.edu
> MetaChannel: 700804192[1615734796: {}] -> null: Disabling heartbeats
> (config is null)
> MetaChannel: 700804192[1615734796: {}] -> null.bind -> SC-null
> Started CPU 0:1312383950s
> Started worker twork:000000
> twork:0 pull
> Failed to send worker status update to client
> java.lang.NullPointerException
> at
> org.globus.cog.karajan.workflow.service.channels.ChannelManager.getMetaChannel(ChannelManager.java:434)
> at
> org.globus.cog.karajan.workflow.service.channels.ChannelManager.reserveChannel(ChannelManager.java:227)
> at
> org.globus.cog.abstraction.coaster.service.job.manager.PassiveQueueProcessor.registrationReceived(PassiveQueueProcessor.java:72)
> at
> org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.registrationReceived(JobQueue.java:143)
> at
> org.globus.cog.abstraction.coaster.service.LocalTCPService.registrationReceived(LocalTCPService.java:64)
> at
> org.globus.cog.abstraction.coaster.service.local.RegistrationHandler.requestComplete(RegistrationHandler.java:57)
> at
> org.globus.cog.karajan.workflow.service.handlers.RequestHandler.receiveCompleted(RequestHandler.java:84)
> at
> org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleRequest(AbstractKarajanChannel.java:416)
> at
> org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel.step(AbstractStreamKarajanChannel.java:157)
> at
> org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Multiplexer.run(AbstractStreamKarajanChannel.java:375)
> Avg stream buf: 0
> runTime: 4, sleepTime: 10043
> Avg stream buf: 0
> runTime: 1, sleepTime: 10040
> Avg stream buf: 0
> runTime: 2, sleepTime: 10036
> Sender 742510685 queue size: 1
> Avg stream buf: 0
> runTime: 1, sleepTime: 10042
> etc...
> 
> === and this in the worker log (log level DEBUG):
> 
> 1312383950.848 INFO - twork Logging started: Wed Aug 3 10:05:50 2011
> 1312383950.848 INFO - Running on node communicado.ci.uchicago.edu
> 1312383950.848 DEBUG - uri=http://communicado.ci.uchicago.edu:35852
> 1312383950.848 DEBUG - scheme=http
> 1312383950.848 DEBUG - host=communicado.ci.uchicago.edu
> 1312383950.848 DEBUG - port=35852
> 1312383950.848 DEBUG - blockid=twork
> 1312383950.848 INFO - Connecting (0)...
> 1312383950.848 DEBUG - Trying communicado.ci.uchicago.edu:35852...
> 1312383950.862 INFO - Connected
> 1312383950.862 DEBUG - Replies: {}
> 1312383950.862 DEBUG - OUT: len=8, tag=0, flags=0
> 1312383950.863 DEBUG - OUT: len=5, tag=0, flags=0
> 1312383950.863 DEBUG - OUT: len=27, tag=0, flags=0
> 1312383950.863 DEBUG - OUT: len=16, tag=0, flags=2
> 1312383950.863 DEBUG - done sending frags for 0
> 1312383950.931 DEBUG - Fin flag set
> 1312383950.931 INFO 000000 Registration successful. ID=000000
> 1312383980.863 DEBUG 000000 Replies: {}
> 1312383980.863 DEBUG 000000 OUT: len=9, tag=1, flags=2
> 1312383980.864 DEBUG 000000 done sending frags for 1
> 1312383980.868 DEBUG 000000 Fin flag set
> 1312383980.868 DEBUG 000000 Heartbeat acknowledged
> 1312383986.739 DEBUG 000000 New request (1)
> 1312383986.739 DEBUG 000000 Fin flag set
> 1312383986.739 DEBUG 000000 Processing request
> 1312383986.739 DEBUG 000000 Cmd is HEARTBEAT
> 1312383986.739 DEBUG 000000 OUT: len=2, tag=1, flags=3
> 1312383986.739 DEBUG 000000 done sending frags for 1
> etc...
> 
> 
> 
> ----- Original Message -----
> > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > To: "Jonathan Monette" <jonmon at mcs.anl.gov>
> > Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> > Sent: Wednesday, August 3, 2011 9:08:54 AM
> > Subject: Re: [Swift-devel] trunk coasters
> > Last night (on current trunk) I was getting this:
> >
> > 2011-08-02 23:24:49,863-0500 DEBUG vdl:execute2
> > APPLICATION_EXCEPTION
> > jobid=cat-4q4lmvdk - Application exception: null
> > Caused by:
> > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
> > Could not submit job
> > Caused by:
> > org.globus.cog.karajan.workflow.service.ProtocolException:
> > Failed to set configuration: For input string: ""
> > Caused by: org.globus.cog.karajan.workflow.service.RemoteException:
> > Failed to set configuration: For input string: ""
> > 2011-08-02 23:24:49,866-0500 INFO vdl:execute END_FAILURE
> > thread=0-3-0-1 tr=cat
> > 2011-08-02 23:24:49,868-0500 DEBUG VDL2ExecutionContext Exception in
> > cat:
> > Arguments: [data.txt]
> > Host: localhost
> > Directory: catsn-20110802-2324-ze1lfx8f/jobs/4/cat-4q4lmvdk
> > - - -
> >
> > Exception in cat:
> > Arguments: [data.txt]
> > Host: localhost
> > Directory: catsn-20110802-2324-ze1lfx8f/jobs/4/cat-4q4lmvdk
> > - - -
> >
> > Caused by: null
> > Caused by:
> > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
> > Could not submit job
> > Caused by:
> > org.globus.cog.karajan.workflow.service.ProtocolException:
> > Failed to set configuration: For input string: ""
> > Caused by: org.globus.cog.karajan.workflow.service.RemoteException:
> > Failed to set configuration: For input string: ""
> > at
> > org.globus.cog.karajan.workflow.nodes.functions.KException.function(KException.java:29)
> >
> > --- But that was a rather new configuration, so I need to do more
> > diagnosis.
> >
> > Why did you ask, Mihael - are you seeing problems too?
> >
> > I hope to work with Alberto this week to resume site config testing
> > with a focus on coaster configs.
> >
> > - Mike
> >
> >
> >
> >
> > ----- Original Message -----
> > > From: "Jonathan Monette" <jonmon at mcs.anl.gov>
> > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > Cc: "Mihael Hategan" <hategan at mcs.anl.gov>, "Swift Devel"
> > > <swift-devel at ci.uchicago.edu>
> > > Sent: Wednesday, August 3, 2011 8:54:48 AM
> > > Subject: Re: [Swift-devel] trunk coasters
> > > I have been using automatic coasters and submitting to PADS. I
> > > haven't
> > > tried any large scale runs recently though.
> > > On Aug 3, 2011, at 8:53 AM, Michael Wilde wrote:
> > >
> > > > They were failing for me using persistent coasters to osg sites;
> > > > will
> > > > be testing further today and file bugs as needed.
> > > >
> > > > On 8/3/11, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> > > > >> Has anybody run a job with trunk coasters recently?
> > > >>
> > > >> Mihael
> > > >>
> > > >> _______________________________________________
> > > >> Swift-devel mailing list
> > > >> Swift-devel at ci.uchicago.edu
> > > >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> > > >>
> > > >
> > > > --
> > > > Sent from my mobile device
> > > > _______________________________________________
> > > > Swift-devel mailing list
> > > > Swift-devel at ci.uchicago.edu
> > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> >
> > --
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
> >
> 
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
> 

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



