[Swift-devel] Re: Swift hang

Michael Wilde wilde at mcs.anl.gov
Tue Jan 18 17:54:52 CST 2011


PBS:

<config>
  <pool handle="pbs">
    <execution provider="pbs" url="none" />
    <!-- <profile namespace="globus" key="maxWallTime">7200</profile> -->
    <profile namespace="globus" key="maxWallTime">02:00:00</profile>
    <profile namespace="karajan" key="jobThrottle">2.55</profile> <!--256 concurrent tasks-->
    <profile namespace="karajan" key="initialScore">10000</profile>
    <profile namespace="globus" key="queue">short</profile>
    <filesystem provider="local"/>
    <workdirectory>/home/wilde/swiftwork</workdirectory>
  </pool>
</config>

Adjust maxWallTime, jobThrottle, and queue as needed.

Maybe try running with just localhost and jobThrottle=0.07 or 0.08 on a PADS compute node, with your workdirectory and as much of your data as possible on a local disk (/scratch/local). You could also try that on Ranger. On PADS, use qsub -I -l walltime=01:00:00 to get an interactive login on a quiet compute node that you have all to yourself.
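
For that localhost test, the pool entry might look roughly like the following. This is an untested sketch modeled on the PBS entry above, not a verified config: the "localhost" handle and the swiftwork directory under /scratch/local are placeholders, and the task count in the comment assumes the same throttle arithmetic as the 2.55 -> 256 comment above (about jobThrottle * 100 + 1 concurrent tasks).

localhost:

<config>
  <pool handle="localhost">
    <execution provider="local" />
    <profile namespace="karajan" key="jobThrottle">0.07</profile> <!-- about 8 concurrent tasks, by the assumed arithmetic -->
    <profile namespace="karajan" key="initialScore">10000</profile>
    <filesystem provider="local"/>
    <!-- placeholder path on the node-local disk -->
    <workdirectory>/scratch/local/swiftwork</workdirectory>
  </pool>
</config>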

- Mike 



----- Original Message -----
> I have tried several configurations using coasters (using different
> numbers of blocks, workers per node, etc.) and tried following the
> coasters code in cog to see if this is where Swift hung. I still haven't
> found out if coasters is the culprit for the hang. This has proved to be
> hard since the workflows that hang regularly are very large and take a
> while to get to the point where they hang.
> 
> Can anyone help by sending me a sample configuration of what using
> straight pbs would look like? Something that doesn't use coasters and
> just uses pbs. I believe this will help me narrow down whether coasters
> is the culprit or whether the problem is really in Swift.
> 
> Mihael, were you ever able to take a look at the log files for my runs
> to see if you saw anything? I know cleaning up for the next release has
> been more of the priority; I'm just wondering if you ever got a chance.
> 
> On the plus side, I have also been developing a script that runs Swift
> and lets me choose certain configurations so I don't have to be
> constantly changing files. I believe this will be useful in the overall
> final product (whatever that may be).
> 
> On 1/18/11 10:41 AM, Daniel S. Katz wrote:
> > Hi Jon,
> >
> > How is this going?
> >
> > Dan
> >
> >
> > On Jan 5, 2011, at 1:50 PM, Jonathan Monette wrote:
> >
> >> Hello,
> >>    I have encountered Swift hanging. The deadlock appears to be in
> >>    the same place every time. The deadlock does seem to be
> >>    intermittent, since smaller work sizes do complete. This job
> >>    size is approximately 1200 files. The behavior that the
> >>    logs show is that the files needed for the job submission are
> >>    staged in but no jobs are submitted. The Coaster heartbeat that
> >>    appears in the Swift logs shows that the job queue is empty. The
> >>    logs for the runs are in
> >>    ~jonmon/Workspace/Swift/Montage/m101_j_6x6/run.000[1,2,3]. I will
> >>    try to recreate the problem using simple cat jobs.
> >>
> >> --
> >> Jon
> >>
> >> Computers are incredibly fast, accurate, and stupid. Human beings
> >> are incredibly slow, inaccurate, and brilliant. Together they are
> >> powerful beyond imagination.
> >> - Albert Einstein
> >>

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



