[Swift-user] [Swift-devel] swift on ranger

Sarah Kenny skenny at uci.edu
Thu Dec 22 16:44:37 CST 2011


Yeah, the latest build works... I don't get a warning when specifying 1 job per
node with the 16way PE.
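
For reference, the combination that now works for me without the warning is
essentially the two lines below, taken from the sites file quoted further down
(so this is a sketch of the relevant settings rather than a complete sites.xml):

    <profile namespace="globus" key="jobsPerNode">1</profile>
    <profile namespace="globus" key="pe">16way</profile>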

On Wed, Dec 21, 2011 at 11:54 PM, David Kelly <davidk at ci.uchicago.edu> wrote:

> Sarah,
>
> Can you please give this another try? I believe it should work now with
> your original sites.xml.
>
> David
>
> ----- Original Message -----
> > From: "Sarah Kenny" <skenny at uci.edu>
> > To: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>
> > Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>, "Swift User" <swift-user at ci.uchicago.edu>
> > Sent: Wednesday, December 21, 2011 4:50:48 PM
> > Subject: Re: [Swift-devel] [Swift-user] swift on ranger
> > You're right, Ketan: if I change it to <profile namespace="globus"
> > key="jobsPerNode">16</profile>, the warning message goes away. However,
> > there are times I don't want to run 16 jobs per node, e.g. because a
> > single job needs all the available memory, so even though the node has
> > 16 processors I can't actually use them all. So perhaps this is just a
> > scheduling policy on Ranger/SGE, in that they don't want you to submit
> > a job that's going to leave processors idle? That seems a bit
> > restrictive, though...
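> >
> > For concreteness, and going only by the syntax in the JSV rejection quoted
> > further down: whatever submit script ends up on Ranger has to carry a PE
> > request of roughly the form below, with the slot count a multiple of 16
> > (this is a sketch of the expected shape, not the exact script the coaster
> > provider generates):
> >
> >     #$ -pe 16way 16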
> >
> >
> > On Wed, Dec 21, 2011 at 7:58 AM, Ketan Maheshwari <ketancmaheshwari at gmail.com> wrote:
> >
> >
> > Sarah,
> >
> > I checked my sites.xml. The only difference between yours and mine is
> > the value of jobsPerNode, which is 16 in my case. I have also had this
> > value set to other multiples of 16, which has worked fine for me.
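> >
> > For example, the relevant line in my file is essentially the one below;
> > the other values I mentioned were just larger multiples of 16 (32, 48,
> > and so on) substituted in the same place. Treat it as an illustration
> > rather than a copy of my exact sites.xml:
> >
> >     <profile namespace="globus" key="jobsPerNode">16</profile>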
> >
> > On Wed, Dec 21, 2011 at 6:57 AM, Sarah Kenny <skenny at uci.edu> wrote:
> >
> > Getting this when submitting to Ranger with both the latest and our
> > previous version of Swift (swift-r5259 cog-r3313):
> >
> > Final status: time: Wed, 21 Dec 2011 04:49:15 -0800  Finished successfully:100
> > The following warnings have occurred:
> > 1. org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
> >    Cannot submit job: Could not submit job (qsub reported an exit code of 1).
> >
> > --------------------------------------------------------------------------
> > Welcome to TACC's Ranger System, an NSF XD Resource
> > --------------------------------------------------------------------------
> > --> Checking that you specified -V...
> > --> Checking that you specified a time limit...
> > --> Checking that you specified a queue...
> > --> Setting project...
> > --> Checking that you specified a parallel environment...
> > --> Checking that you specified a valid parallel environment name...
> > --> Checking that the minimum and maximum PE counts are the same...
> > --> Checking that the number of PEs requested is valid...
> > ------------------> Rejecting job <------------------
> > Your slot (or core) request is not a multiple of 16.
> > Syntax: -pe <pe_name> <n>
> > where <n> is a multiple of 16.
> > -----------------------------------------------------
> > Unable to run job: JSV rejected job.
> > Exiting.
> >
> >     at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:63)
> >     at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:45)
> >     at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:57)
> >     at org.globus.cog.abstraction.coaster.service.job.manager.LocalQueueProcessor.run(LocalQueueProcessor.java:40)
> > Caused by: org.globus.cog.abstraction.impl.scheduler.common.ProcessException:
> >    Could not submit job (qsub reported an exit code of 1).
> >    [followed by the same TACC banner and JSV rejection output as above]
> >     at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.start(AbstractExecutor.java:108)
> >     at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:53)
> >     ... 3 more
> >
> > ################### sites file
> >
> > <config>
> >   <pool handle="RANGER">
> >     <execution provider="coaster" jobManager="gt2:SGE" url="gatekeeper.ranger.tacc.teragrid.org"/>
> >     <filesystem provider="gsiftp" url="gsiftp://gridftp.ranger.tacc.teragrid.org"/>
> >     <profile namespace="globus" key="maxtime">86400</profile>
> >     <profile namespace="globus" key="maxWallTime">02:00:00</profile>
> >     <profile namespace="globus" key="jobsPerNode">1</profile>
> >     <profile namespace="globus" key="nodeGranularity">64</profile>
> >     <profile namespace="globus" key="maxNodes">4096</profile>
> >     <profile namespace="globus" key="queue">normal</profile>
> >     <profile namespace="karajan" key="jobThrottle">1.28</profile>
> >     <profile namespace="globus" key="project">TG-DBS080004N</profile>
> >     <profile namespace="globus" key="pe">16way</profile>
> >     <profile namespace="karajan" key="initialScore">10000</profile>
> >     <workdirectory>/work/00043/tg457040/swiftwork</workdirectory>
> >   </pool>
> > </config>
> >
> > These are the same settings we've been using for a while; I'm not sure
> > why this seems to be popping up now, but it's rather consistent. All
> > jobs are finishing successfully, so it's rather confusing... any idea
> > what I might be missing here?
> >
> > thanks
> > ~sk
> >
> > --
> > Ketan
> >



-- 
Sarah Kenny
Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III
University of California Irvine, Dept. of Neurology ~ 773-818-8300