[Swift-devel] New 0.93 problem: <jobname>.error No such file or directory

Mihael Hategan hategan at mcs.anl.gov
Tue Aug 9 14:05:36 CDT 2011


On Tue, 2011-08-09 at 07:16 -0500, Michael Wilde wrote:
> I stopped this run and started a larger one: 5M catsn jobs to a pool
> of 300-400 workers (varies over time).  It finished 2.2M and was still
> running, albeit slowly, when I ended it.
> 
> The job rate ramped up quickly as the external QueueN script obtained
> workers. After about 15 mins it had obtained 80 workers and seemed to be
> running at several hundred tasks per second. I had moved all the test
> clients, IO, and logging to local hard disk on communicado for speed.
> I set a retry count of 5, and turned on lazy failure mode.
> 
> After about 6 hours, the test had passed 2.2M jobs and was still
> progressing, but seemed to have drastically slowed down from its
> earlier rate. It seemed to have dropped below a few jobs per second.
> Possibly it ate through its throttle due to failed/hung workers.

That shouldn't be the case any more. My first suspicion would be that Swift
is running out of memory. But then it could also be some leak in the
coaster staging buffers. I'll look at the logs later today.



