[Swift-devel] Some remote workers + provider staging logs (ReplyTimeouts on large workflows)

Allan Espinosa aespinosa at cs.uchicago.edu
Thu Dec 30 11:51:40 CST 2010


I reran the OSG workflow with only 1 worker per coaster service, and
the same workflow finished without problems.  I'll check whether the
problem is specific to multiple workers by setting up a test case on
PADS as well.

2010/12/30 Mihael Hategan <hategan at mcs.anl.gov>:
> On Wed, 2010-12-29 at 15:28 -0600, Allan Espinosa wrote:
>
>> Does the timeout occur from the jobs being too long in the coaster
>> service queue?
>
> No. The coaster protocol requires each command sent on a channel to be
> acknowledged (pretty much like TCP does). Either the worker was very
> busy (unlikely by design), or it has a fault that disturbed its main
> event loop, or there was an actual networking problem (also unlikely).
>
>>
>>
>> I ran the same workflow on PADS only (the site throttle limits it to
>> a maximum of 400 jobs).  I got the same errors at some point, when
>> my workers failed in less time than the timeout period.
>>
>> The last line below shows the worker.pl message when it exited:
>>
>> rmdir /gpfs/pads/swift/aespinosa/swift-runs/catsall-20101229-1501-x92u64yc-0-cat-0asfsp3k/RuptureVariations/111/5
>> rmdir /gpfs/pads/swift/aespinosa/swift-runs/catsall-20101229-1501-x92u64yc-0-cat-0asfsp3k/RuptureVariations/111
>> rmdir /gpfs/pads/swift/aespinosa/swift-runs/catsall-20101229-1501-x92u64yc-0-cat-0asfsp3k/RuptureVariations
>> unlink /gpfs/pads/swift/aespinosa/swift-runs/catsall-20101229-1501-x92u64yc-0-cat-0asfsp3k/wrapper.log
>> unlink /gpfs/pads/swift/aespinosa/swift-runs/catsall-20101229-1501-x92u64yc-0-cat-0asfsp3k/stdout.txt
>> rmdir /gpfs/pads/swift/aespinosa/swift-runs/catsall-20101229-1501-x92u64yc-0-cat-0asfsp3k
>> Failed to process data:  at
>> /home/aespinosa/swift/cogkit/modules/provider-coaster/resources/worker.pl
>> line 639.
>
> I wish perl had a stack trace. Can you enable TRACE on the worker and
> re-run and send me the log for the failing worker?
>
> Mihael
>
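
To illustrate the acknowledgement rule Mihael describes above, here is
a minimal sketch in Python. It is not the coaster implementation: the
class AckTracker, the exception ReplyTimeout, the command names and the
120-second timeout are all hypothetical, chosen only for illustration.
It shows why a reply timeout depends on a command going unacknowledged,
not on how long a job waits in the service queue.

```python
from dataclasses import dataclass, field


class ReplyTimeout(Exception):
    """Raised when a sent command was not acknowledged in time."""


@dataclass
class AckTracker:
    """Tracks commands sent on one channel until they are acknowledged.

    Hypothetical helper for illustration; not part of Swift or CoG.
    """
    timeout: float                                # seconds allowed per ack
    pending: dict = field(default_factory=dict)   # tag -> (command, deadline)
    next_tag: int = 0

    def send(self, command: str, now: float) -> int:
        """Record a command as sent at time `now` and return its tag."""
        tag = self.next_tag
        self.next_tag += 1
        self.pending[tag] = (command, now + self.timeout)
        return tag

    def ack(self, tag: int) -> None:
        """The peer acknowledged the command; stop tracking it."""
        self.pending.pop(tag, None)

    def check(self, now: float) -> None:
        """Raise ReplyTimeout if any pending command is past its deadline."""
        for tag, (command, deadline) in self.pending.items():
            if now >= deadline:
                raise ReplyTimeout(f"no ack for {command!r} (tag {tag})")


def demo() -> str:
    """Walk through one acknowledged and one unacknowledged command."""
    tracker = AckTracker(timeout=120.0)           # assumed timeout value
    first = tracker.send("CMD_A", now=0.0)        # hypothetical command names
    tracker.send("CMD_B", now=10.0)
    tracker.ack(first)                            # only CMD_A is acknowledged
    tracker.check(now=100.0)                      # CMD_B still within deadline
    try:
        tracker.check(now=130.0)                  # CMD_B deadline was 130.0
    except ReplyTimeout as exc:
        return str(exc)
    return "no timeout"
```

In this sketch a worker whose event loop has stalled or died never
calls ack(), so the sender sees a timeout no matter how few jobs are
queued.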



-- 
Allan M. Espinosa <http://amespinosa.wordpress.com>
PhD student, Computer Science
University of Chicago <http://people.cs.uchicago.edu/~aespinosa>


