[Swift-devel] misassignment of jobs

Michael Wilde wilde at mcs.anl.gov
Sun Nov 21 17:10:15 CST 2010


Mihael,

If you're in fixin' mode, I'll spend some time now trying to reproduce the 3 coaster problems that are high on my "needed for users" list:

1. Swift hangs/fails talking to a persistent server if it sits idle for a few minutes, even with large timeout values (which were possibly not set correctly or fully).

2. With normal coaster mode, if workers start timing out for lack of work, the Swift run dies.

3. Errors in provider staging at high volume.

If you already have test cases for these issues, let me know, and I'll focus on the missing ones. But I'm assuming for now you need all three.

- Mike


----- Original Message -----
> Sadly though, I can't reproduce this.
> 
> Can you give me more details, such as the swift script, the version of
> swift used, and anything that would be unusual compared to vanilla
> swift use.
> 
> Mihael
> 
> On Thu, 2010-11-18 at 21:57 -0800, Mihael Hategan wrote:
> > I was ready to blame cosmic rays, but this seems to be a pretty
> > common occurrence in your log. So I'm on it.
> >
> > mike at blabla2 tmp$ cat worker-20101117-1538-fe9aq209.log | grep JOB_START | awk '{print $7 " " $13}' | sort | uniq
> > tr=worker0 host=BNL-ATLAS_gridgk01.racf.bnl.gov
> > tr=worker0 host=LIGO_UWM_NEMO_osg-nemo-ce.phys.uwm.edu
> > tr=worker1 host=BNL-ATLAS_gridgk02.racf.bnl.gov
> > tr=worker10 host=Nebraska_red.unl.edu
> > tr=worker11 host=Prairiefire_pf-grid.unl.edu
> > tr=worker12 host=Purdue-RCAC_osg.rcac.purdue.edu
> > tr=worker13 host=GridUNESP_CENTRAL_ce.grid.unesp.br
> > tr=worker13 host=RENCI-Engagement_belhaven-1.renci.org
> > tr=worker14 host=SBGrid-Harvard-East_osg-east.hms.harvard.edu
> > tr=worker15 host=LIGO_UWM_NEMO_osg-nemo-ce.phys.uwm.edu
> > tr=worker15 host=SPRACE_osg-ce.sprace.org.br
> > tr=worker16 host=UCHC_CBG_vdgateway.vcell.uchc.edu
> > tr=worker17 host=UCR-HEP_top.ucr.edu
> > tr=worker18 host=UFlorida-HPC_osg.hpc.ufl.edu
> > tr=worker19 host=UFlorida-PG_pg.ihepa.ufl.edu
> > tr=worker2 host=FNAL_FERMIGRID_fnpcosg1.fnal.gov
> > tr=worker20 host=MIT_CMS_ce01.cmsaf.mit.edu
> > tr=worker20 host=UMissHEP_umiss001.hep.olemiss.edu
> > tr=worker21 host=Firefly_ff-grid3.unl.edu
> > tr=worker21 host=Nebraska_red.unl.edu
> > tr=worker21 host=UTA_SWT2_gk04.swt2.uta.edu
> > tr=worker22 host=WQCG-Harvard-OSG_tuscany.med.harvard.edu
> > tr=worker3 host=Firefly_ff-grid3.unl.edu
> > tr=worker3 host=LIGO_UWM_NEMO_osg-nemo-ce.phys.uwm.edu
> > tr=worker4 host=GridUNESP_CENTRAL_ce.grid.unesp.br
> > tr=worker5 host=Firefly_ff-grid3.unl.edu
> > tr=worker5 host=LIGO_UWM_NEMO_osg-nemo-ce.phys.uwm.edu
> > tr=worker6 host=MIT_CMS_ce01.cmsaf.mit.edu
> > tr=worker7 host=LIGO_UWM_NEMO_osg-nemo-ce.phys.uwm.edu
> > tr=worker7 host=MIT_CMS_ce02.cmsaf.mit.edu
> > tr=worker8 host=NYSGRID_CORNELL_NYS1_nys1.cac.cornell.edu
> > tr=worker9 host=Nebraska_gpn-husker.unl.edu
> >
> >
> >
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> 

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory
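A minimal Python sketch, not part of the original thread, of the check the grep/awk pipeline is being used for: it takes the `tr=<worker> host=<site>` pairs that the pipeline prints and reports every worker that logged JOB_START on more than one host (in the listing, for example, worker13, worker21 and others). The function name `find_multi_host_workers` is hypothetical, and the code assumes only the two-field output format shown in the quoted listing, not any particular layout of the raw worker log.

```python
from collections import defaultdict


def find_multi_host_workers(pairs):
    """Return {worker: [hosts...]} for workers seen on more than one host.

    Each input line is assumed to look like 'tr=worker13 host=SOME_SITE',
    i.e. the output of: grep JOB_START | awk '{print $7 " " $13}'.
    """
    hosts = defaultdict(set)
    for line in pairs:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        worker = parts[0].partition("=")[2]  # strip the 'tr=' prefix
        host = parts[1].partition("=")[2]    # strip the 'host=' prefix
        hosts[worker].add(host)
    # Keep only workers whose jobs started on two or more distinct hosts.
    return {w: sorted(h) for w, h in sorted(hosts.items()) if len(h) > 1}


# Usage with three lines copied from the quoted listing.
sample = [
    "tr=worker13 host=GridUNESP_CENTRAL_ce.grid.unesp.br",
    "tr=worker13 host=RENCI-Engagement_belhaven-1.renci.org",
    "tr=worker14 host=SBGrid-Harvard-East_osg-east.hms.harvard.edu",
]
print(find_multi_host_workers(sample))
# {'worker13': ['GridUNESP_CENTRAL_ce.grid.unesp.br', 'RENCI-Engagement_belhaven-1.renci.org']}
```

Piping the full `sort | uniq` output into such a script would list every suspect worker at once instead of eyeballing the listing.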



