[Swift-devel] execution.retries

Mihael Hategan hategan at mcs.anl.gov
Wed Jun 11 15:43:29 CDT 2008


On Wed, 2008-06-11 at 14:57 -0500, lixi at uchicago.edu wrote:
> >Look how many lines there are in the log like this:
> >
> >2008-06-10 10:48:03,137-0500 INFO  vdl:initshareddir START host=OSG_LIGO_MIT
> >
> >followed closely by:
> >
> >2008-06-10 10:48:03,196-0500 DEBUG TaskImpl Task(type=FILE_OPERATION, identity=urn:0-1-701-1-1213112750531) setting status to Failed org.globus.cog.abstraction.impl.file.IrrecoverableResourceException: Error communicating with the GridFTP server
> 
> I'm sorry to come back to the old question, but I'm still
> confused about it. Both of you said that these lines
> represent different retries. However, among these lines in the
> log file there is no site selection action, which is indicated
> by "WeightedHostScoreScheduler Sorted".

Hmm?
2008-06-10 10:47:27,429-0500 DEBUG WeightedHostScoreScheduler Releasing contact 7
2008-06-10 10:47:27,430-0500 INFO  WeightedHostScoreScheduler Sorted: [OSG_LIGO_MIT:21.822(51.667):2/4]
2008-06-10 10:47:27,430-0500 DEBUG WeightedHostScoreScheduler Rand: 15.78147400479908, sum: 100.07797485652034
2008-06-10 10:47:27,431-0500 DEBUG WeightedHostScoreScheduler Next contact: OSG_LIGO_MIT:21.822(51.667):2/4


That seems to be your only contact. Running
cat /home/lixi/newswift/latest/score/3500/workflowtest-20080610-1045-58kc7p6f.log|grep "Next contact: OSG_LIGO_MIT"|wc

produces: 4376   30632  469760

So there are 4376 site selections there (the first number wc prints is
the line count).

If you remove the |wc you can see the evolution of the score.
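
To look at the score evolution without scrolling through grep output, a
small Python sketch along these lines could pull the numbers out. It
assumes each "Next contact" entry sits on a single line in the real log
(the line breaks in the excerpt above come from mail wrapping) and that
the first number after the host name is the score. Both are assumptions
drawn from the excerpt, and the function name score_evolution is made up
for illustration.

```python
import re

# Assumed line shape, taken from the excerpt in this thread:
#   <date> <time> DEBUG WeightedHostScoreScheduler Next contact: HOST:N1(N2):A/B
# Treating N1 as the score is an assumption, not documented behaviour.
NEXT_CONTACT = re.compile(
    r"Next contact: (?P<host>[^:\s]+):(?P<score>[-+0-9.eE]+)\("
)


def score_evolution(lines, host):
    """Return (timestamp, score) pairs for every 'Next contact' line for host."""
    result = []
    for line in lines:
        match = NEXT_CONTACT.search(line)
        if match and match.group("host") == host:
            # The first two whitespace-separated fields are date and time.
            timestamp = " ".join(line.split()[:2])
            result.append((timestamp, float(match.group("score"))))
    return result


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python score_evolution.py LOGFILE HOST
    with open(sys.argv[1]) as log:
        for timestamp, score in score_evolution(log, sys.argv[2]):
            print(timestamp, score)
```

The length of the returned list should match the line count that wc
reports for the same host.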

>  Then I wonder what
> would be included in one try. Does one try just mean trying to
> do the same operation on the same site, or does it mean selecting
> the next site (maybe another one, maybe the same one) to do the
> same operation? In the log file, what kind of expression marks the
> beginning or end of one try?

You only seem to have one site there. Re-trying means full re-scheduling
(so maybe another site if there is one).

There isn't much marking the start of a try besides the scheduler
allocating a site. The successful end of a try is marked by
"JOB_END"; a failed try ends with "APPLICATION_EXCEPTION".
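
For a rough tally of how tries ended, the three markers mentioned in this
thread can be counted directly. The following Python sketch assumes the
markers appear as plain substrings of single log lines; the names MARKERS
and count_tries are invented for illustration. Since the scheduler may also
allocate a site for other operations, the "scheduled" count need not equal
the sum of the other two.

```python
from collections import Counter

# Marker strings as named in this thread. Matching them as plain
# substrings is an assumption about the log layout.
MARKERS = {
    "scheduled": "Next contact:",
    "succeeded": "JOB_END",
    "failed": "APPLICATION_EXCEPTION",
}


def count_tries(lines):
    """Count the lines that contain each marker string."""
    counts = Counter({name: 0 for name in MARKERS})
    for line in lines:
        for name, marker in MARKERS.items():
            if marker in line:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python count_tries.py LOGFILE
    with open(sys.argv[1]) as log:
        for name, total in count_tries(log).items():
            print(name, total)
```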

> 
> Thanks,
> 
> Xi



