[Swift-devel] execution.retries
Mihael Hategan
hategan at mcs.anl.gov
Wed Jun 11 15:43:29 CDT 2008
On Wed, 2008-06-11 at 14:57 -0500, lixi at uchicago.edu wrote:
> >Look how many lines there are in the log like this:
> >
> >2008-06-10 10:48:03,137-0500 INFO vdl:initshareddir START host=OSG_LIGO_MIT
> >
> >followed closely by:
> >
> >2008-06-10 10:48:03,196-0500 DEBUG TaskImpl Task(type=FILE_OPERATION, identity=urn:0-1-701-1-1213112750531) setting status to Failed org.globus.cog.abstraction.impl.file.IrrecoverableResourceException: Error communicating with the GridFTP server
>
> I'm sorry to return to the old question, but I'm still
> confused about it. Both of you said that these lines
> represent different retries. However, among these lines in the
> log file there is no site selection action, which would be
> represented by "WeightedHostScoreScheduler Sorted".
Hmm?
2008-06-10 10:47:27,429-0500 DEBUG WeightedHostScoreScheduler Releasing contact 7
2008-06-10 10:47:27,430-0500 INFO WeightedHostScoreScheduler Sorted: [OSG_LIGO_MIT:21.822(51.667):2/4]
2008-06-10 10:47:27,430-0500 DEBUG WeightedHostScoreScheduler Rand: 15.78147400479908, sum: 100.07797485652034
2008-06-10 10:47:27,431-0500 DEBUG WeightedHostScoreScheduler Next contact: OSG_LIGO_MIT:21.822(51.667):2/4
That seems to be your only contact. Running
cat /home/lixi/newswift/latest/score/3500/workflowtest-20080610-1045-58kc7p6f.log|grep "Next contact: OSG_LIGO_MIT"|wc
produces: 4376 30632 469760
The first number is the line count, so there are 4376 site selections there.
If you remove the |wc you can see the evolution of the score.
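A minimal sketch of pulling just the score out of those lines, assuming each record sits on a single line in the raw log (the wrapping above comes from the mail client) and that the first number after the host name is the score. The function name score_evolution is made up for illustration; only grep and awk are used.

```shell
# Print "<time> <first number after host>" for every selection of a site.
# Assumption: the contact is the last whitespace-separated field and looks
# like HOST:21.822(51.667):2/4, as in the "Next contact:" lines above.
score_evolution() {
    log=$1
    site=$2
    grep "Next contact: $site:" "$log" |
        awk '{ split($NF, f, "[:(]"); print $2, f[2] }'
}
```

Calling it as score_evolution workflowtest-20080610-1045-58kc7p6f.log OSG_LIGO_MIT prints one line per selection, which can then be plotted or eyeballed.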
> Then I wonder what
> would be included in one try. Does one try just mean
> repeating the same operation on the same site, or does it
> mean selecting the next site (which may be another one or
> the same one) and doing the same operation there? In the log
> file, what kind of expression marks the beginning or end of
> one try?
You only seem to have one site there. Re-trying means full re-scheduling
(so possibly another site, if there is one).
There isn't much marking the start of a try besides the scheduler
allocating a site. The successful end of a try is represented by
"JOB_END"; a failed one by "APPLICATION_EXCEPTION".
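As a rough sketch, the three markers can be counted side by side. This only matches the literal strings mentioned above; it assumes each marker appears once per try, which I have not checked against the full log format, and the function name count_tries is made up for illustration.

```shell
# Count scheduler allocations and try outcomes in a log file.
# Assumption: one "Next contact: " line per site allocation, and one
# JOB_END or APPLICATION_EXCEPTION line per finished try.
count_tries() {
    log=$1
    echo "site selections: $(grep -c 'Next contact: ' "$log")"
    echo "successful tries: $(grep -c 'JOB_END' "$log")"
    echo "failed tries: $(grep -c 'APPLICATION_EXCEPTION' "$log")"
}
```

If the counts do not add up, the difference is a hint that the one-marker-per-try assumption does not hold for that run.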
>
> Thanks,
>
> Xi