[Swift-devel] [Swift-user] gram on ranger

Michael Wilde wilde at mcs.anl.gov
Tue Oct 11 15:28:48 CDT 2011


We have another example of Swift hanging at the end of a ParVis
script, which I believe I reported on the list. To debug it, Mihael
needs a jstack dump of the hung Swift process along with the Swift log.
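
For anyone who has not captured one before, here is a minimal sketch of
saving a jstack dump from a hung run. It assumes the JDK's jstack tool is
on the PATH and that you have already found the PID of the Swift JVM
(for example with "jps -l"). The helper names and the output file name
are hypothetical, not part of Swift.

```python
import subprocess
from pathlib import Path
from typing import List


def jstack_command(pid: int) -> List[str]:
    """Build the jstack invocation; -l adds lock information to the dump."""
    return ["jstack", "-l", str(pid)]


def dump_threads(pid: int, out_path: str) -> Path:
    """Run jstack against a live JVM and save the thread dump to out_path.

    Raises subprocess.CalledProcessError if jstack exits with an error,
    e.g. when the PID does not belong to a running JVM.
    """
    result = subprocess.run(
        jstack_command(pid), capture_output=True, text=True, check=True
    )
    out = Path(out_path)
    out.write_text(result.stdout)
    return out


# Example (hypothetical PID and file name):
#   dump_threads(12345, "swift-hang.jstack.txt")
```

Take the dump while the run is still hung, and send it together with the
Swift log from the same run.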

On 10/11/11, Sarah Kenny <skenny at uchicago.edu> wrote:
> On Tue, Oct 11, 2011 at 11:49 AM, David Kelly <davidk at ci.uchicago.edu> wrote:
>
>>
>> That could be it.. maybe a cleanup script is not getting the right
>> parameters and failing. Do you happen to have a copy of the coaster log?
>
>
> just put it in /home/skenny/swift_logs
>
>
>
>> Maybe there will be some clues in there.
>>
>> ----- Original Message -----
>> > From: "Sarah Kenny" <skenny at uchicago.edu>
>> > To: "David Kelly" <davidk at ci.uchicago.edu>
>> > Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>, "Swift User" <swift-user at ci.uchicago.edu>,
>> > "Justin M Wozniak" <wozniak at mcs.anl.gov>
>> > Sent: Tuesday, October 11, 2011 1:32:37 PM
>> > Subject: Re: [Swift-user] gram on ranger
>> > so, this workflow completes all the jobs but then just hangs
>> > indefinitely at the end...maybe a stray cleanup job?
>> >
>> > log is here:
>> >
>> > /home/skenny/swift_logs/corr-20111010-2104-fl5yngd9.log
>> >
>> > just tweaked the sites file a bit from what david sent me:
>> >
>> > <config>
>> > <pool handle="RANGER">
>> > <execution provider="coaster" jobManager="gt2:SGE" url="gatekeeper.ranger.tacc.teragrid.org"/>
>> > <filesystem provider="gsiftp" url="gsiftp://gridftp.ranger.tacc.teragrid.org"/>
>> > <profile namespace="globus" key="maxtime">28800</profile>
>> > <profile namespace="globus" key="maxWallTime">00:15:00</profile>
>> > <profile namespace="globus" key="jobsPerNode">1</profile>
>> > <profile namespace="globus" key="nodeGranularity">64</profile>
>> > <profile namespace="globus" key="maxNodes">256</profile>
>> > <profile namespace="globus" key="queue">normal</profile>
>> > <profile namespace="karajan" key="jobThrottle">1</profile>
>> > <profile namespace="globus" key="project">TG-DBS080004N</profile>
>> > <profile namespace="globus" key="pe">16way</profile>
>> > <profile namespace="karajan" key="initialScore">10000</profile>
>> > <workdirectory>/work/00043/tg457040/sidgrid_out/skenny</workdirectory>
>> > </pool>
>> > </config>
>> >
>> >
>> >
>> > On Mon, Oct 10, 2011 at 3:43 PM, Sarah Kenny < skenny at uchicago.edu >
>> > wrote:
>> >
>> >
>> > ok, thanks, got in the queue now...also, realized my last run may have
>> > been using the old swift. apparently i had SWIFT_HOME set in my env
>> > and that overrides the newer swift i had set in my PATH.
>> >
>> > ~sk
>> >
>> >
>> >
>> > On Mon, Oct 10, 2011 at 12:28 PM, David Kelly < davidk at ci.uchicago.edu > wrote:
>> >
>> >
>> >
>> >
>> >
>> > Sarah,
>> >
>> > Can you give this another try with the latest 0.93? I made some
>> > changes to the coaster and sge providers and was able to get it
>> > working with a simple catns script. Here is the configuration file I
>> > was using:
>> >
>> > <config>
>> > <pool handle="ranger">
>> > <execution provider="coaster" jobManager="gt2:SGE" url="gatekeeper.ranger.tacc.teragrid.org"/>
>> >
>> > <filesystem provider="gsiftp" url="gsiftp://gridftp.ranger.tacc.teragrid.org"/>
>> > <profile namespace="globus" key="maxtime">3600</profile>
>> > <profile namespace="globus" key="maxWallTime">00:00:03</profile>
>> > <profile namespace="globus" key="jobsPerNode">1</profile>
>> > <profile namespace="globus" key="nodeGranularity">16</profile>
>> > <profile namespace="globus" key="maxNodes">16</profile>
>> > <profile namespace="globus" key="queue">development</profile>
>> > <profile namespace="karajan" key="jobThrottle">0.9</profile>
>> >
>> > <profile namespace="globus" key="project">TG-DBS080004N</profile>
>> >
>> > <profile namespace="globus" key="pe">16way</profile>
>> > <workdirectory>/share/home/01503/davidkel/swiftwork</workdirectory>
>> > </pool>
>> > </config>
>> >
>> > Thanks,
>> >
>> > David
>> >
>> > ----- Original Message -----
>> >
>> > > From: "Sarah Kenny" < skenny at uchicago.edu >
>> > > To: "Justin M Wozniak" < wozniak at mcs.anl.gov >
>> > > Cc: "Swift Devel" < swift-devel at ci.uchicago.edu >, "Swift User" < swift-user at ci.uchicago.edu >
>> > > Sent: Friday, October 7, 2011 3:13:57 PM
>> > > Subject: Re: [Swift-user] gram on ranger
>> > > /home/skenny/swift_logs/dummy-20111005-0126-6575n7x5.log
>> > >
>> > > on ci
>> > >
>> > >
>> > > On Fri, Oct 7, 2011 at 8:16 AM, Justin M Wozniak < wozniak at mcs.anl.gov > wrote:
>> > >
>> > >
>> > >
>> > > Can I take a look at the log?
>> > >
>> > >
>> > >
>> > >
>> > > On Thu, 6 Oct 2011, Sarah Kenny wrote:
>> > >
>> > >
>> > >
>> > > hey all, i'm trying to submit to gram on ranger using the latest swift
>> > > (built from trunk). it fails like so:
>> > >
>> > > Cannot submit job
>> > > Caused by:
>> > > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Cannot submit job
>> > > Caused by: org.globus.gram.GramException: Parameter not supported
>> > > Cannot submit job
>> > >
>> > > the gram log was saying first that 'jobsPerNode' is not supported so i
>> > > changed it to workersPerNode and then it was saying 'maxnodes' is not
>> > > supported. here's my sites file:
>> > >
>> > > <config>
>> > > <pool handle="RANGER">
>> > > <profile namespace="karajan" key="initialScore">10000</profile>
>> > > <profile namespace="karajan" key="jobThrottle">1</profile>
>> > > <profile namespace="globus" key="maxWallTime">00:15:00</profile>
>> > > <profile namespace="globus" key="maxTime">86400</profile>
>> > > <profile namespace="globus" key="slots">1</profile>
>> > > <profile namespace="globus" key="maxNodes">256</profile>
>> > > <profile namespace="globus" key="pe">16way</profile>
>> > > <profile namespace="globus" key="workersPerNode">1</profile>
>> > > <profile namespace="globus" key="nodeGranularity">64</profile>
>> > > <profile namespace="globus" key="queue">normal</profile>
>> > > <profile namespace="globus" key="project">TG-DBS080004N</profile>
>> > > <filesystem provider="gsiftp" url="gsiftp://gridftp.ranger.tacc.teragrid.org"/>
>> > > <execution provider="coaster" jobManager="gt2:gt2:SGE" url="gatekeeper.ranger.tacc.teragrid.org"/>
>> > > <execution provider="gt2" jobManager="SGE" url="gatekeeper.ranger.tacc.teragrid.org"/>
>> > > <workdirectory>/work/00043/tg457040</workdirectory>
>> > > </pool>
>> > > </config>
>> > >
>> > > thoughts? ideas?
>> > >
>> > > --
>> > > Justin M Wozniak
>> > >
>> > >
>> > >
>> > > --
>> > > Sarah Kenny
>> > > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III
>> > > University of California Irvine, Dept. of Neurology ~ 773-818-8300
>> > >
>> > >
>> > > _______________________________________________
>> > > Swift-user mailing list
>> > > Swift-user at ci.uchicago.edu
>> > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
>>
