[Swift-user] Re: understanding coaster service logs

Allan Espinosa aespinosa at cs.uchicago.edu
Tue Oct 19 12:44:25 CDT 2010


Btw, on the client (submit host) side, I get reply timeout exceptions
from the service:

2010-10-19 12:41:15,173-0500 WARN  Command Command(581490, HEARTBEAT)fault was: Reply timeout
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
        at org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:280)
        at org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:285)
        at java.util.TimerThread.mainLoop(Timer.java:512)
        at java.util.TimerThread.run(Timer.java:462)
2010-10-19 12:41:15,173-0500 INFO  Command Sending Command(581490, HEARTBEAT) on MetaChannel: 715323878[834961640: {}] -> GSSCChannel-https://communicado.ci.uchicago.edu:61999(1)[834961640: {}]
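
To see how often this happens, the client log can be scanned for these
warnings. The following is only a minimal sketch: the pattern is modeled on
the single WARN line quoted above and assumes the real log keeps that entry
on one line. The names TIMEOUT_RE and count_reply_timeouts are made up for
this example and are not part of Swift or CoG.

```python
import re
from collections import Counter

# Pattern modeled on the WARN entry quoted above (an assumption: other
# Swift/CoG versions may format this line differently).
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\S+ \S+)\s+WARN\s+Command Command\((?P<id>\d+),\s*(?P<name>\w+)\)"
    r"fault was: Reply timeout"
)


def count_reply_timeouts(lines):
    """Count reply-timeout warnings per command name (e.g. HEARTBEAT)."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.match(line)
        if match:
            counts[match.group("name")] += 1
    return counts


# Sample input: the entries from this message, each on a single line.
sample = [
    "2010-10-19 12:41:15,173-0500 WARN  Command Command(581490, HEARTBEAT)"
    "fault was: Reply timeout",
    "org.globus.cog.karajan.workflow.service.ReplyTimeoutException",
    "2010-10-19 12:41:15,173-0500 INFO  Command Sending Command(581490, "
    "HEARTBEAT) on MetaChannel: 715323878[834961640: {}] -> "
    "GSSCChannel-https://communicado.ci.uchicago.edu:61999(1)[834961640: {}]",
]

if __name__ == "__main__":
    for name, n in count_reply_timeouts(sample).items():
        print(name, n)
```

To run it on a real log, pass an open file object instead of the sample
list; any iterable of lines works.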


2010/10/19 Allan Espinosa <aespinosa at cs.uchicago.edu>:
> Hi,
>
> My coaster services are registering 1.2k CPUs at a time.  Yet my jobs
> do not seem to get sent to some workers:
>
> Progress:  Selecting site:2762  Submitted:580  Finished successfully:424 Failed but can retry:138
> Progress:  Selecting site:2762  Submitted:580  Finished successfully:424 Failed but can retry:138
>
> Could it be that my workers only have outbound connections and no inbound ones?
>
> Which coaster-service log entries should I look for to tell (1) that
> a job is received, (2) that a job is dispatched to a worker, and
> (3) which block ID the job was sent to?
>
> Does <Block-ID>: pull refer to a worker receiving a job?

-- 
Allan M. Espinosa <http://amespinosa.wordpress.com>
PhD student, Computer Science
University of Chicago <http://people.cs.uchicago.edu/~aespinosa>
