[Swift-devel] coaster termination problems cause large runs to hang

Mihael Hategan hategan at mcs.anl.gov
Mon Jun 27 13:56:18 CDT 2011


Ah, sorry.

On Mon, 2011-06-27 at 13:48 -0500, Michael Wilde wrote:
> >  (which I will copy to ~wilde/dssat.run01 on the CI net)?
> 
> bri$ pwd
> /home/wilde
> bri$ ls -ld dssat.run01/
> drwxr-xr-x 2 wilde ci-users 4096 Jun 24 19:01 dssat.run01//
> bri$ 
> 
> 
> ----- Original Message -----
> > [hategan at login ~]$ cd /home/papia/dssat/run01
> > -bash: cd: /home/papia/dssat/run01: Permission denied
> > 
> > 
> > On Fri, 2011-06-24 at 18:54 -0500, Michael Wilde wrote:
> > > Mihael,
> > >
> > > Papia is running large sweeps of the DSSAT land use model on PADS,
> > > and getting failures, it seems, when the coasters time out. Her
> > > script is attempting about 120K model invocations, each taking about
> > > 60 seconds to run. She gets between 30K and 60K of these done before
> > > it fails.
> > >
> > > Can you look at the example below, on the CI network in
> > > /home/papia/dssat/run01
> > >  (which I will copy to ~wilde/dssat.run01 on the CI net)?
> > >
> > > The swift.out file shows the run progressing nicely until the first
> > > coaster worker timeout occurs.
> > >
> > > The run was started with ./RunSweep.sh:
> > > time swift -tc.file tc -sites.file sites.xml -config cf RunDssat.swift >& swift.out
> > >
> > > The run id is RunID: 20110624-1333-r17fczk0
> > > Swift is 0.92.1.
> > >
> > > Thanks,
> > >
> > > Mike
> > >
> > >
> > > login2$ head swift.out
> > > Swift svn swift-r4371 cog-r3096
> > >
> > > RunID: 20110624-1333-r17fczk0
> > > Progress:
> > > Progress: uninitialized:2
> > > Progress: Selecting site:36 Stage in:53 Submitting:1 Submitted:10
> > > Progress: Selecting site:36 Stage in:8 Submitting:2 Submitted:54
> > > Progress: Selecting site:36 Submitted:64
> > > Progress: Selecting site:36 Submitted:64
> > > Progress: Selecting site:36 Submitted:63 Active:1
> > > login2$ ls -l *zk0.log
> > > -rw-r--r-- 1 papia ci-users 161039247 Jun 24 17:25 RunDssat-20110624-1333-r17fczk0.log
> > > login2$ pwd
> > > /home/papia/dssat/run01
> > > login2$
> > >
> > >
> 
