[Swift-user] Re: understanding coaster service logs
Allan Espinosa
aespinosa at cs.uchicago.edu
Tue Oct 19 12:44:25 CDT 2010
Btw, on the client (submit host) side, I get reply timeout exceptions
from the service:
2010-10-19 12:41:15,173-0500 WARN Command Command(581490, HEARTBEAT)fault was: Reply timeout
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
    at org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:280)
    at org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:285)
    at java.util.TimerThread.mainLoop(Timer.java:512)
    at java.util.TimerThread.run(Timer.java:462)
2010-10-19 12:41:15,173-0500 INFO Command Sending Command(581490, HEARTBEAT) on MetaChannel: 715323878[834961640: {}] -> GSSCChannel-https://communicado.ci.uchicago.edu:61999(1)[834961640: {}]
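
To get a rough idea of how often these timeouts occur, the client log can be
scanned for the warning shown above. This is only a sketch: the pattern is
taken from the single log line in this message, the function name
count_reply_timeouts is made up for illustration, and other Swift/CoG
versions may format the message differently.

```python
import re
from collections import Counter

# Pattern assumed from the one WARN line quoted in this message, e.g.
# "... WARN Command Command(581490, HEARTBEAT)fault was: Reply timeout"
TIMEOUT_RE = re.compile(r"Command\((\d+),\s*(\w+)\)\s*fault was: Reply timeout")


def count_reply_timeouts(log_text):
    """Return a Counter mapping command name to number of reply timeouts.

    count_reply_timeouts is a hypothetical helper, not part of Swift or CoG.
    """
    # Long log lines are sometimes wrapped, so collapse all whitespace runs
    # into single spaces before matching.
    flat = " ".join(log_text.split())
    return Counter(m.group(2) for m in TIMEOUT_RE.finditer(flat))


# The two log entries from this message, wrapped as they were pasted.
SAMPLE = """2010-10-19 12:41:15,173-0500 WARN Command Command(581490,
HEARTBEAT)fault was: Reply timeout
2010-10-19 12:41:15,173-0500 INFO Command Sending Command(581490,
HEARTBEAT) on MetaChannel: 715323878[834961640: {}]
"""

timeouts = count_reply_timeouts(SAMPLE)
```

To use it on a real run, read the client-side log into a string, for example
open("swift.log").read() (the file name here is a placeholder), and pass it
to count_reply_timeouts. Only the WARN entry is counted; the INFO "Sending"
entry does not match the pattern.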
2010/10/19 Allan Espinosa <aespinosa at cs.uchicago.edu>:
> Hi,
>
> My coaster services are registering 1.2k CPUs at a time. Yet my jobs
> do not seem to get sent to some workers:
>
> Progress: Selecting site:2762 Submitted:580 Finished successfully:424 Failed but can retry:138
> Progress: Selecting site:2762 Submitted:580 Finished successfully:424 Failed but can retry:138
>
> Could it be that my workers only have outbound connections and no inbound ones?
>
> Which coaster-service log entries should I look for to tell (1) that
> a job was received, (2) that a job was dispatched to a worker, and (3)
> which block ID the job was sent to?
>
> Does a "<Block-ID>: pull" entry refer to a worker receiving a job?
--
Allan M. Espinosa <http://amespinosa.wordpress.com>
PhD student, Computer Science
University of Chicago <http://people.cs.uchicago.edu/~aespinosa>