[Swift-devel] Reducing swift log size

Mihael Hategan hategan at mcs.anl.gov
Thu Jun 12 13:36:44 CDT 2014


On Thu, 2014-06-12 at 13:02 -0500, Michael Wilde wrote:
> In general we're moving to running coasters in all configurations (in 
> part to reduce the number of configurations to explain and test).

Right. Although we could default to the local provider for local things.

> 
> Yadu's also looking at using provider staging shared-filesystem mode to 
> avoid unnecessary staging for local filesystems.
> 
> Can you explain the connection between this and the excessive logging? 
> Can that be fixed rather than resorting to an alternate provider?

Local coaster services run in the same JVM, so static variables are
shared by all local coaster service instances. The code was written on
the assumption that there would be one service per JVM, a scenario we
did not expect to deviate from when it was written a few years ago.
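To make the static-variable problem concrete, here is a minimal Java
sketch. `CoasterServiceSketch` and `SharedStaticDemo` are hypothetical
names, not the actual coaster classes; the only point is that a static
field is one slot per class per JVM, so a second local service silently
replaces the first.

```java
// Hypothetical illustration; CoasterServiceSketch is not the real coaster class.
class CoasterServiceSketch {
    // A static field belongs to the class, not to an instance, so every
    // service created in the same JVM reads and writes the same slot.
    static CoasterServiceSketch current;

    final String name;

    CoasterServiceSketch(String name) {
        this.name = name;
        // Written under the one-service-per-JVM assumption: the most
        // recently created service replaces any earlier one.
        current = this;
    }
}

class SharedStaticDemo {
    public static void main(String[] args) {
        CoasterServiceSketch a = new CoasterServiceSketch("local-1");
        CoasterServiceSketch b = new CoasterServiceSketch("local-2");
        // Both lookups go through the static field, so both see "local-2".
        System.out.println(a.name + " sees " + CoasterServiceSketch.current.name);
        System.out.println(b.name + " sees " + CoasterServiceSketch.current.name);
    }
}
```

Running `SharedStaticDemo` prints `local-1 sees local-2` followed by
`local-2 sees local-2`.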

The job-to-worker-node submission scheme consists of a thread that
looks at queued jobs and matches them with free workers. This runs in a
loop that polls both the job queue and the worker queue. However, a free
worker may be unable to fit any of the queued jobs because of walltime
constraints, and in that case you don't want to loop constantly.
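A rough sketch of one pass of such a matching loop, assuming each job
carries a walltime and each worker knows how much time it has left.
`Job`, `Worker` and `Matcher` are hypothetical (records need Java 16+),
and the real scheduler is more involved. It shows why naive polling is
wasteful: when no queued job fits, `matchOnce()` returns 0 but the
worker stays free, so a loop that just retries would spin without
making progress.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical types for illustration; not the actual coaster classes.
record Job(String id, long walltimeSeconds) {}

record Worker(String id, long remainingSeconds) {
    // A worker can only accept a job whose walltime fits in its remaining time.
    boolean fits(Job job) {
        return job.walltimeSeconds() <= remainingSeconds;
    }
}

class Matcher {
    final Deque<Job> jobs = new ArrayDeque<>();
    final List<Worker> freeWorkers = new ArrayList<>();

    /** One pass of the matching loop; returns the number of assignments made. */
    int matchOnce() {
        int assigned = 0;
        Iterator<Worker> it = freeWorkers.iterator();
        while (it.hasNext()) {
            Worker w = it.next();
            // Pick the first queued job this worker can fit, if any.
            Job match = null;
            for (Job j : jobs) {
                if (w.fits(j)) {
                    match = j;
                    break;
                }
            }
            if (match != null) {
                jobs.remove(match);
                it.remove(); // worker is now busy
                assigned++;
            }
        }
        return assigned;
    }
}
```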

The good news is that if a worker cannot run a queued job now due to
time constraints, it will never be able to. So unless a new job with a
smaller walltime comes in, you can safely assume that you don't need to
bother waking up said worker.

This is achieved using a sequence number. The job queue keeps one and
updates it monotonically when new jobs come in. Sleeping workers take a
snapshot of it and are only awakened if it differs from the one in the
job queue (i.e. new jobs came in since we last determined that this
worker cannot run any of the already queued jobs).
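A minimal sketch of the sequence-number check, assuming a single queue
per service. `SeqJobQueue` and `SleepingWorker` are hypothetical names.
The worker stores the queue's sequence number when it goes to sleep, and
`shouldWake()` becomes true only once a new job has bumped the counter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the sequence-number scheme; names are illustrative.
class SeqJobQueue {
    private final List<Long> walltimes = new ArrayList<>();
    private long seq = 0;

    synchronized void enqueue(long walltimeSeconds) {
        walltimes.add(walltimeSeconds);
        seq++; // changes on every arrival and never goes backwards
    }

    synchronized long sequence() {
        return seq;
    }

    /** True if any queued job fits in the given remaining time. */
    synchronized boolean anyFits(long remainingSeconds) {
        for (long w : walltimes) {
            if (w <= remainingSeconds) {
                return true;
            }
        }
        return false;
    }
}

class SleepingWorker {
    final long remainingSeconds;
    private long snapshot = -1; // -1 means "never put to sleep"

    SleepingWorker(long remainingSeconds) {
        this.remainingSeconds = remainingSeconds;
    }

    /** Called when no queued job fits: remember the queue's sequence number. */
    void sleep(SeqJobQueue queue) {
        snapshot = queue.sequence();
    }

    /** Only worth re-examining the queue if jobs arrived since we slept. */
    boolean shouldWake(SeqJobQueue queue) {
        return queue.sequence() != snapshot;
    }
}
```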

The problem is that there are two job queues, one for each coaster
service, but the code only looks at one static instance when checking
whether a worker should be awakened. So the worker gets a low sequence
number from the right job queue, but then it checks it against the
other job queue, which has a higher sequence number. So it gets
awakened, only to be put back to sleep because it has nothing to run.
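The bug can be reproduced in a few lines, again with hypothetical
classes: the snapshot comes from the worker's own queue, but the
comparison goes through a static reference that points at whichever
queue was created last. Once the other service's queue has seen more
jobs, the check is true on every pass, so the worker is woken and put
back to sleep over and over; if each of those transitions is logged,
that would produce a steady stream of log lines.

```java
// Hypothetical reproduction: the wake-up check reads a static queue
// reference instead of the queue the worker actually belongs to.
class BuggyQueue {
    // Static: shared by every service in the JVM; last constructed wins.
    static BuggyQueue instance;
    long seq;

    BuggyQueue() {
        instance = this;
    }

    void enqueueJob() {
        seq++;
    }
}

class BuggyWorker {
    final BuggyQueue own; // queue of the service this worker belongs to
    long snapshot;

    BuggyWorker(BuggyQueue own) {
        this.own = own;
    }

    void sleep() {
        snapshot = own.seq; // snapshot taken from the right queue...
    }

    boolean shouldWake() {
        return BuggyQueue.instance.seq != snapshot; // ...compared against the wrong one
    }
}
```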

Anyway, two things should be fixed there: the static variables, and the
matching loop, which should be made threadless.
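One possible shape for both fixes, sketched under assumptions: each
scheduler instance owns its own queues (no statics), and matching runs
on the caller's thread only when a job arrives or a worker becomes
idle, which is one reading of "threadless". `EventDrivenScheduler` and
its methods are hypothetical. When nothing fits, `match()` simply
returns and nothing runs until the next event, so there is no loop to
spin and no cross-service sequence number to compare.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: per-instance state and event-driven matching.
class EventDrivenScheduler {
    private final Deque<Long> jobWalltimes = new ArrayDeque<>();      // per instance
    private final List<Long> idleWorkerRemaining = new ArrayList<>(); // per instance
    private int started = 0;

    /** A new job may fit an idle worker that nothing else fit. */
    synchronized void jobArrived(long walltimeSeconds) {
        jobWalltimes.add(walltimeSeconds);
        match();
    }

    /** A worker became free: try it against the queued jobs. */
    synchronized void workerIdle(long remainingSeconds) {
        idleWorkerRemaining.add(remainingSeconds);
        match();
    }

    synchronized int started() { return started; }

    synchronized int queued() { return jobWalltimes.size(); }

    // Runs on the caller's thread with the lock held; if nothing fits it
    // returns and waits for the next event instead of looping.
    private void match() {
        Iterator<Long> wi = idleWorkerRemaining.iterator();
        while (wi.hasNext()) {
            long remaining = wi.next();
            Iterator<Long> ji = jobWalltimes.iterator();
            while (ji.hasNext()) {
                if (ji.next() <= remaining) {
                    ji.remove(); // job leaves the queue
                    wi.remove(); // worker is no longer idle
                    started++;
                    break;
                }
            }
        }
    }
}
```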

Mihael
