[Swift-devel] Coaster Task Submission Stalling

Tim Armstrong tim.g.armstrong at gmail.com
Wed Sep 3 20:26:33 CDT 2014


Here are client and service logs, with part of the service log edited down
to a reasonable size (I have the full thing if needed, but it was over a
gigabyte).

One relevant section is from 19:49:35 onwards.  The client submits 4 jobs
(its limit), but they don't complete until 19:51:32 or so (I can see that
one task completed based on ncompleted=1 in the check_tasks log message).
It looks like something has happened with broken pipes and workers being
lost, but I'm not sure what the ultimate cause of that is likely to be.
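
For anyone else looking through the logs, here is a minimal sketch of how
the stalls could be located automatically. It assumes each relevant log line
carries an HH:MM:SS,mmm timestamp (the form seen in the burst intervals
quoted below) and contains the word "submitting"; the helper name
find_stalls and the 60 second threshold are my own choices for illustration,
not anything taken from Swift or Coasters.

```python
import re
from datetime import datetime, timedelta

# Matches HH:MM:SS,mmm timestamps such as 16:07:04,603.
# Assumption: every log line of interest contains one of these.
TS = re.compile(r"\b(\d{2}:\d{2}:\d{2},\d{3})\b")


def find_stalls(lines, needle="submitting", min_gap=timedelta(seconds=60)):
    """Return (previous, current, gap) for each pair of consecutive
    matching lines whose timestamps are at least min_gap apart.

    Only the time of day is parsed, so a log that crosses midnight
    is not handled correctly by this sketch.
    """
    stalls = []
    prev = None
    for line in lines:
        if needle not in line:
            continue
        match = TS.search(line)
        if match is None:
            continue
        # %f accepts the three-digit millisecond field.
        t = datetime.strptime(match.group(1), "%H:%M:%S,%f")
        if prev is not None and t - prev >= min_gap:
            stalls.append((prev.time(), t.time(), t - prev))
        prev = t
    return stalls
```

It accepts any iterable of strings, so if the attached client log follows
that layout it could be passed the object returned by
gzip.open("swift-t-client.out.gz", "rt") directly.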

- Tim



On Wed, Sep 3, 2014 at 6:20 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:

> Hi Tim,
>
> I've never seen this before with pure Java.
>
> Do you have logs from these runs?
>
> Mihael
>
> On Wed, 2014-09-03 at 16:49 -0500, Tim Armstrong wrote:
> > I'm running a test Swift/T script that submits tasks to Coasters through
> > the C++ client and I'm seeing some odd behaviour where task
> > submission/execution is stalling for ~2 minute periods.  For example, I'm
> > seeing submit log messages like "submitting
> > urn:133-1409778135377-1409778135378: /bin/hostname" in bursts of several
> > seconds with a gap of roughly 2 minutes in between, e.g. I'm seeing
> > bursts with the following intervals in my logs:
> >
> > 16:07:04,603 to 16:07:10,391
> > 16:09:07,377 to 16:09:13,076
> > 16:11:10,005 to 16:11:16,770
> > 16:13:13,291 to 16:13:19,296
> > 16:15:16,000 to 16:15:21,602
> >
> > From what I can tell, the delay is on the coaster service side: the C
> > client is just waiting for a response.
> >
> > The jobs are just being submitted through the local job manager, so I
> > wouldn't expect any delays there.  The tasks are also just
> > "/bin/hostname", so should return immediately.
> >
> > I'm going to continue digging into this on my own, but the 2 minute delay
> > seems like a big clue: does anyone have an idea what could cause stalls
> > in task submission of 2 minute duration?
> >
> > Cheers,
> > Tim
>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: coaster-service.out.gz
Type: application/x-gzip
Size: 36069 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-devel/attachments/20140903/7b1970ad/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: swift-t-client.out.gz
Type: application/x-gzip
Size: 1049192 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-devel/attachments/20140903/7b1970ad/attachment-0001.bin>

