[Swift-devel] Re: Swift hang
Michael Wilde
wilde at mcs.anl.gov
Tue Jan 18 17:54:52 CST 2011
PBS:
<config>
<pool handle="pbs">
<execution provider="pbs" url="none" />
<!-- profile namespace="globus" key="maxWallTime">7200</profile> -->
<profile namespace="globus" key="maxWallTime">02:00:00</profile>
<profile namespace="karajan" key="jobThrottle">2.55</profile> <!--256 concurrent tasks-->
<profile namespace="karajan" key="initialScore">10000</profile>
<profile namespace="globus" key="queue">short</profile>
<filesystem provider="local"/>
<workdirectory>/home/wilde/swiftwork</workdirectory>
</pool>
</config>
Adjust maxWallTime, jobThrottle, and queue as needed for your site.
Maybe try running with just localhost and jobThrottle=0.07 or 0.08 on a PADS compute node, with as much of your data as possible, and your workdirectory, on local disk (/scratch/local). You could also try that on Ranger. On PADS, use qsub -I -l walltime=01:00:00 to get an interactive node to log in to, so you have a quiet node all to yourself.
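For reference, a localhost pool entry along those lines might look roughly like this (it goes inside the same <config> element; the workdirectory path is just illustrative, point it at your local scratch):

<pool handle="localhost">
<execution provider="local" url="none" />
<profile namespace="karajan" key="jobThrottle">0.07</profile> <!--8 concurrent tasks-->
<profile namespace="karajan" key="initialScore">10000</profile>
<filesystem provider="local"/>
<workdirectory>/scratch/local/swiftwork</workdirectory>
</pool>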
- Mike
----- Original Message -----
> I have tried several configurations using coasters (different numbers
> of blocks, workers per node, etc.) and tried following the coasters
> code in cog to see if that is where Swift hung. I still haven't found
> out whether coasters is the culprit for the hang. This has proved to
> be hard since the workflows that hang regularly are very large and
> take a while to get to the point where they hang.
>
> Can anyone send me a sample configuration of what using straight PBS
> would look like? Something that doesn't use coasters, just plain PBS.
> I believe this will help me narrow down whether coasters is the
> culprit or the problem is really in Swift.
>
> Mihael, were you ever able to take a look at the log files for my runs
> to see if you saw anything? I know cleaning up for the next release
> has been the higher priority; I'm just wondering if you ever got a
> chance.
>
> On the plus side, I have also been developing a script that runs
> Swift and allows me to choose certain configurations so I don't have
> to be constantly changing files. I believe this will be useful in the
> overall final product (whatever that may be).
>
> On 1/18/11 10:41 AM, Daniel S. Katz wrote:
> > Hi Jon,
> >
> > How is this going?
> >
> > Dan
> >
> >
> > On Jan 5, 2011, at 1:50 PM, Jonathan Monette wrote:
> >
> >> Hello,
> >> I have encountered Swift hanging. The deadlock appears to be in
> >> the same place every time. It does seem to be intermittent, since
> >> smaller work sizes do complete. This job involves approximately
> >> 1200 files. The behavior the logs show is that the files needed
> >> for the job submission are staged in but no jobs are submitted.
> >> The Coaster heartbeat that appears in the Swift logs shows that
> >> the job queue is empty. The logs for the runs are in
> >> ~jonmon/Workspace/Swift/Montage/m101_j_6x6/run.000[1,2,3]. I will
> >> try to recreate the problem using simple cat jobs.
> >>
> >> --
> >> Jon
> >>
> >> Computers are incredibly fast, accurate, and stupid. Human beings
> >> are incredibly slow, inaccurate, and brilliant. Together they are
> >> powerful beyond imagination.
> >> - Albert Einstein
> >>
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory