[Swift-devel] Re: Swift hang

Jonathan Monette jon.monette at gmail.com
Tue Jan 18 17:59:07 CST 2011


Thanks for the PBS entry.  Will try that right now.

I was going to try just localhost last, since my work flows already take
quite a bit of time (at least 30 mins to get to a hang using 40
workers).  Adjusting the parameters for optimal use would probably bring
that time down, but fixing the hang has been more of a priority than
finding the optimal numbers.

On 1/18/11 5:54 PM, Michael Wilde wrote:
> PBS:
>
> <config>
>    <pool handle="pbs">
>      <execution provider="pbs" url="none" />
>      <!-- profile namespace="globus" key="maxWallTime">7200</profile>  -->
>      <profile namespace="globus" key="maxWallTime">02:00:00</profile>
>      <profile namespace="karajan" key="jobThrottle">2.55</profile>  <!--256 concurrent tasks-->
>      <profile namespace="karajan" key="initialScore">10000</profile>
>      <profile namespace="globus" key="queue">short</profile>
>      <filesystem provider="local"/>
>      <workdirectory>/home/wilde/swiftwork</workdirectory>
>    </pool>
> </config>
>
> Adjust maxWallTime, jobThrottle, and queue as needed.
>
> Maybe try running with just localhost and jobThrottle=0.07 or 0.08 on a PADS compute node, with as much of your data as possible, and your workdirectory, on a local disk (/scratch/local).  You could also try that on Ranger.  For PADS, use qsub -I -l walltime=01:00:00 to get an interactive login on a quiet compute node all to yourself.
>
> - Mike
>
>
>
> ----- Original Message -----
>> I have tried several configurations using coasters (different numbers
>> of blocks, workers per node, etc.) and tried following the coasters
>> code in cog to see if this is where Swift hung. I still haven't found
>> out if coasters is the culprit for the hang. This has proved to be
>> hard since the work flows that hang regularly are very large and take
>> a while to get to the point where they hang.
>>
>> Can anyone send me a sample configuration of what using straight pbs
>> would look like? Something that doesn't use coasters, just pbs. I
>> believe this will help me narrow down whether coasters is the culprit
>> or whether the problem is really in Swift.
>>
>> Mihael, were you ever able to take a look at the log files for my runs
>> to see if you saw anything? I know cleaning up for the next release
>> has been more of a priority; I'm just wondering if you ever got a
>> chance.
>>
>> On the plus side, I have also been developing a script that runs Swift
>> and lets me choose certain configurations so I don't have to be
>> constantly changing files. I believe this will be useful in the
>> overall final product (whatever that may be).
>>
>> On 1/18/11 10:41 AM, Daniel S. Katz wrote:
>>> Hi Jon,
>>>
>>> How is this going?
>>>
>>> Dan
>>>
>>>
>>> On Jan 5, 2011, at 1:50 PM, Jonathan Monette wrote:
>>>
>>>> Hello,
>>>>     I have encountered swift hanging. The deadlock appears to be in
>>>>     the same place every time. It does seem to be intermittent,
>>>>     since smaller work sizes do complete. This job size is
>>>>     approximately 1200 files. The logs show that the files needed
>>>>     for the job submission are staged in but no jobs are submitted.
>>>>     The Coaster heartbeat that appears in the swift logs shows that
>>>>     the job queue is empty. The logs for the runs are in
>>>>     ~jonmon/Workspace/Swift/Montage/m101_j_6x6/run.000[1,2,3]. I will
>>>>     try to recreate the problem using simple cat jobs.
>>>>
>>>> --
>>>> Jon
>>>>
>>>> Computers are incredibly fast, accurate, and stupid. Human beings
>>>> are incredibly slow, inaccurate, and brilliant. Together they are
>>>> powerful beyond imagination.
>>>> - Albert Einstein
>>>>
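
For reference, the jobThrottle values mentioned in this thread (2.55, 0.07, 0.08) can be related to a number of concurrent tasks with a small helper. This is only a sketch: it assumes the relationship concurrent = jobThrottle * 100 + 1, which matches the "256 concurrent tasks" comment next to jobThrottle=2.55 in the quoted PBS configuration, and it assumes initialScore is high enough (as in that configuration) that the limit applies from the start. The function names are made up for illustration and are not part of Swift.

```python
def max_concurrent_jobs(job_throttle: float) -> int:
    """Concurrent tasks allowed for a given jobThrottle.

    Assumes concurrent = jobThrottle * 100 + 1, as implied by the
    "2.55 -> 256 concurrent tasks" comment in the quoted sites file.
    """
    # round() guards against float noise such as 2.55 * 100 == 254.999...
    return round(job_throttle * 100) + 1


def throttle_for(concurrent: int) -> float:
    """Inverse of the above: the jobThrottle to request N concurrent tasks."""
    if concurrent < 1:
        raise ValueError("need at least one concurrent task")
    return (concurrent - 1) / 100


if __name__ == "__main__":
    for throttle in (2.55, 0.07, 0.08):
        print(f"jobThrottle={throttle} -> {max_concurrent_jobs(throttle)} tasks")
    # Hypothetical target of 40 concurrent tasks, echoing the 40 workers above
    print(f"40 tasks -> jobThrottle={throttle_for(40)}")
```

Under this assumption, the suggested localhost settings of 0.07 and 0.08 correspond to 8 and 9 concurrent tasks respectively.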


