[Swift-devel] Re: Swift hang
Jonathan Monette
jon.monette at gmail.com
Tue Jan 18 17:59:07 CST 2011
Thanks for the PBS entry. Will try that right now.
I was going to try just localhost last, since my workflows already take
quite a bit of time (at least 30 minutes to reach a hang using 40
workers). Adjusting the parameters for optimal use would probably bring
that time down, but the hang has been a higher priority than finding the
optimal numbers.
On 1/18/11 5:54 PM, Michael Wilde wrote:
> PBS:
>
> <config>
> <pool handle="pbs">
> <execution provider="pbs" url="none" />
> <!-- profile namespace="globus" key="maxWallTime">7200</profile> -->
> <profile namespace="globus" key="maxWallTime">02:00:00</profile>
> <profile namespace="karajan" key="jobThrottle">2.55</profile> <!--256 concurrent tasks-->
> <profile namespace="karajan" key="initialScore">10000</profile>
> <profile namespace="globus" key="queue">short</profile>
> <filesystem provider="local"/>
> <workdirectory>/home/wilde/swiftwork</workdirectory>
> </pool>
> </config>
>
> Adjust maxWallTime, jobThrottle, and queue as needed.
>
> Maybe try running with just localhost and jobThrottle=0.07 or 0.08 on a PADS compute node, with as much of your data as possible, and your work directory, on a local disk (/scratch/local). You could also try that on Ranger. On PADS, use qsub -I -l walltime=01:00:00 to log in to a quiet node all to yourself.
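>
> For example, a localhost pool along those lines might look like this (a minimal sketch; the throttle value and work directory path are illustrative):
>
> <config>
>   <pool handle="localhost">
>     <execution provider="local" url="none" />
>     <profile namespace="karajan" key="jobThrottle">0.07</profile> <!-- 8 concurrent tasks -->
>     <profile namespace="karajan" key="initialScore">10000</profile>
>     <filesystem provider="local"/>
>     <workdirectory>/scratch/local/jonmon/swiftwork</workdirectory>
>   </pool>
> </config>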
>
> - Mike
>
>
>
> ----- Original Message -----
>> I have tried several configurations using coasters (different
>> numbers of blocks, workers per node, etc.) and tried following the
>> coaster code in cog to see if that is where Swift hung. I still
>> haven't found out whether coasters is the culprit for the hang. This
>> has proved difficult, since the workflows that hang regularly are
>> very large and take a while to get to the point where they hang.
>>
>> Can anyone send me a sample configuration of what using straight
>> PBS would look like? Something that doesn't use coasters, just PBS.
>> I believe this will help me narrow down whether coasters is the
>> culprit or whether the problem is really in Swift.
>>
>> Mihael, were you ever able to take a look at the log files for my
>> runs to see if you saw anything? I know cleaning up for the next
>> release has been a higher priority; I'm just wondering if you ever
>> got a chance.
>>
>> On the plus side, I have also been developing a script that runs
>> Swift and lets me choose certain configurations so I don't have to
>> keep changing files by hand. I believe this will be useful in the
>> overall final product (whatever that may be).
>>
>> On 1/18/11 10:41 AM, Daniel S. Katz wrote:
>>> Hi Jon,
>>>
>>> How is this going?
>>>
>>> Dan
>>>
>>>
>>> On Jan 5, 2011, at 1:50 PM, Jonathan Monette wrote:
>>>
>>>> Hello,
>>>> I have encountered Swift hanging. The deadlock appears to be in
>>>> the same place every time, but it does seem to be intermittent,
>>>> since smaller work sizes do complete. This job involves
>>>> approximately 1200 files. The behavior the logs show is that the
>>>> files needed for job submission are staged in, but no jobs are
>>>> submitted. The Coaster heartbeat that appears in the Swift logs
>>>> shows that the job queue is empty. The logs for the runs are in
>>>> ~jonmon/Workspace/Swift/Montage/m101_j_6x6/run.000[1,2,3]. I will
>>>> try to recreate the problem using simple cat jobs.
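>>>>
>>>> A minimal cat workflow for that might look like the following (a
>>>> sketch; the mapper parameters are illustrative, and it assumes cat
>>>> is registered in tc.data):
>>>>
>>>> type file;
>>>>
>>>> app (file o) cat (file i)
>>>> {
>>>>   cat @i stdout=@o;
>>>> }
>>>>
>>>> file data<"data.txt">;
>>>> file out[]<simple_mapper; prefix="out.", suffix=".txt">;
>>>>
>>>> foreach j in [1:1200] {
>>>>   out[j] = cat(data);
>>>> }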
>>>>
>>>> --
>>>> Jon
>>>>
>>>> Computers are incredibly fast, accurate, and stupid. Human beings
>>>> are incredibly slow, inaccurate, and brilliant. Together they are
>>>> powerful beyond imagination.
>>>> - Albert Einstein
>>>>