[Swift-devel] coaster termination problems cause large runs to hang

Michael Wilde wilde at mcs.anl.gov
Mon Jun 27 13:48:50 CDT 2011


>  (which I will copy to ~wilde/dssat.run01 on the CI net)?

bri$ pwd
/home/wilde
bri$ ls -ld dssat.run01/
drwxr-xr-x 2 wilde ci-users 4096 Jun 24 19:01 dssat.run01//
bri$ 


----- Original Message -----
> [hategan at login ~]$ cd /home/papia/dssat/run01
> -bash: cd: /home/papia/dssat/run01: Permission denied
> 
> 
> On Fri, 2011-06-24 at 18:54 -0500, Michael Wilde wrote:
> > Mihael,
> >
> > Papia is running large sweeps of the DSSAT land use model on PADS,
> > and getting failures, apparently when the coaster workers time out. Her
> > script is attempting about 120K model invocations, each taking about
> > 60 seconds to run. She gets between 30K and 60K of these done before
> > it fails.
> >
> > Can you look at the example below, on the CI network in
> > /home/papia/dssat/run01
> >  (which I will copy to ~wilde/dssat.run01 on the CI net)?
> >
> > The swift.out file shows the run progressing nicely until the first
> > coaster worker timeout occurs.
> >
> > The run was started with ./RunSweep.sh:
> > time swift -tc.file tc -sites.file sites.xml -config cf RunDssat.swift >& swift.out
> >
> > The run ID is 20110624-1333-r17fczk0.
> > Swift is 0.92.1.
> >
> > Thanks,
> >
> > Mike
> >
> >
> > login2$ head swift.out
> > Swift svn swift-r4371 cog-r3096
> >
> > RunID: 20110624-1333-r17fczk0
> > Progress:
> > Progress: uninitialized:2
> > Progress: Selecting site:36 Stage in:53 Submitting:1 Submitted:10
> > Progress: Selecting site:36 Stage in:8 Submitting:2 Submitted:54
> > Progress: Selecting site:36 Submitted:64
> > Progress: Selecting site:36 Submitted:64
> > Progress: Selecting site:36 Submitted:63 Active:1
> > login2$ ls -l *zk0.log
> > -rw-r--r-- 1 papia ci-users 161039247 Jun 24 17:25 RunDssat-20110624-1333-r17fczk0.log
> > login2$ pwd
> > /home/papia/dssat/run01
> > login2$
> >
> >

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory
