[Swift-devel] swift on ranger
Michael Wilde
wilde at mcs.anl.gov
Wed Dec 21 10:20:12 CST 2011
Node granularity is the size increment, in nodes, of the number of nodes requested in each coaster block. So it can be anything that the user wants, as long as it's valid for the local scheduler.
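As a rough illustration of the arithmetic, here is a minimal Python sketch. It is not Swift or coaster code: the function names are hypothetical, rounding a request up to the next multiple of the granularity is an assumption about how an increment would be applied, and the default of 16 cores per node is taken only from the "multiple of 16" rule in the qsub rejection message quoted below.

```python
def round_up_to_granularity(nodes_needed: int, node_granularity: int) -> int:
    """Round a node count up to the next multiple of node_granularity.

    Assumption: a block's node count is always a whole multiple of the
    granularity, so a request is rounded up rather than down.
    """
    if nodes_needed < 1 or node_granularity < 1:
        raise ValueError("node counts must be positive integers")
    increments = -(-nodes_needed // node_granularity)  # ceiling division
    return increments * node_granularity


def is_valid_slot_request(slots: int, cores_per_node: int = 16) -> bool:
    """Mimic the scheduler-side rule: slots must be a multiple of cores_per_node."""
    return slots > 0 and slots % cores_per_node == 0


if __name__ == "__main__":
    # 70 nodes with a granularity of 64 becomes a request for 128 nodes.
    print(round_up_to_granularity(70, 64))
    # A slot (core) count of 70 would be rejected; 64 would pass.
    print(is_valid_slot_request(70), is_valid_slot_request(64))
```

The point is only that the granularity counts nodes, while the scheduler's check counts slots (cores), so the two values are constrained separately.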
We recently discussed the need to improve and clarify the user guide documentation on how to specify node request parameters for the coaster provider. I'm going to file this as a ticket now for 0.94.
- Mike
----- Original Message -----
> From: "Jonathan Monette" <jonmon at mcs.anl.gov>
> To: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>
> Cc: "Sarah Kenny" <skenny at uci.edu>, "Swift Devel" <swift-devel at ci.uchicago.edu>, "Swift User"
> <swift-user at ci.uchicago.edu>
> Sent: Wednesday, December 21, 2011 10:04:48 AM
> Subject: Re: [Swift-devel] [Swift-user] swift on ranger
> Also, shouldn't node granularity be set to 16 on ranger and not 64?
>
>
>
>
> On Dec 21, 2011, at 9:58 AM, Ketan Maheshwari <
> ketancmaheshwari at gmail.com > wrote:
>
>
>
>
>
> Sarah,
>
> I checked my sites.xml. The only difference between yours and mine
> is the value of jobsPerNode, which is 16 in my case. I have also set
> this value to other multiples of 16, which has worked fine for me.
>
>
>
> On Wed, Dec 21, 2011 at 6:57 AM, Sarah Kenny < skenny at uci.edu > wrote:
>
>
> getting this when submitting to ranger with both the latest and our
> previous version of swift (swift-r5259 cog-r3313)
>
> Final status: time: Wed, 21 Dec 2011 04:49:15 -0800 Finished
> successfully:100
> The following warnings have occurred:
> 1.
> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
> Cannot submit job: Could not submit job (qsub reported an exit code of
> 1).
> --------------------------------------------------------------------------
> Welcome to TACC's Ranger System, an NSF XD Resource
> --------------------------------------------------------------------------
> --> Checking that you specified -V...
> --> Checking that you specified a time limit...
> --> Checking that you specified a queue...
> --> Setting project...
> --> Checking that you specified a parallel environment...
> --> Checking that you specified a valid parallel environment name...
> --> Checking that the minimum and maximum PE counts are the same...
> --> Checking that the number of PEs requested is valid...
> ------------------> Rejecting job <------------------
> Your slot (or core) request is not a multiple of 16.
> Syntax: -pe <pe_name> <n>
> where <n> is a multiple of 16.
> -----------------------------------------------------
> Unable to run job: JSV rejected job.
> Exiting.
>
> at
> org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:63)
> at
> org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:45)
> at
> org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:57)
> at
> org.globus.cog.abstraction.coaster.service.job.manager.LocalQueueProcessor.run(LocalQueueProcessor.java:40)
> Caused by:
> org.globus.cog.abstraction.impl.scheduler.common.ProcessException:
> Could not submit job (qsub reported an exit code of 1).
> --------------------------------------------------------------------------
> Welcome to TACC's Ranger System, an NSF XD Resource
> --------------------------------------------------------------------------
> --> Checking that you specified -V...
> --> Checking that you specified a time limit...
> --> Checking that you specified a queue...
> --> Setting project...
> --> Checking that you specified a parallel environment...
> --> Checking that you specified a valid parallel environment name...
> --> Checking that the minimum and maximum PE counts are the same...
> --> Checking that the number of PEs requested is valid...
> ------------------> Rejecting job <------------------
> Your slot (or core) request is not a multiple of 16.
> Syntax: -pe <pe_name> <n>
> where <n> is a multiple of 16.
> -----------------------------------------------------
> Unable to run job: JSV rejected job.
> Exiting.
>
> at
> org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.start(AbstractExecutor.java:108)
> at
> org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:53)
> ... 3 more
>
> ################### sites file
>
> <config>
> <pool handle="RANGER">
> <execution provider="coaster" jobManager="gt2:SGE" url="gatekeeper.ranger.tacc.teragrid.org"/>
> <filesystem provider="gsiftp" url="gsiftp://gridftp.ranger.tacc.teragrid.org"/>
> <profile namespace="globus" key="maxtime">86400</profile>
> <profile namespace="globus" key="maxWallTime">02:00:00</profile>
> <profile namespace="globus" key="jobsPerNode">1</profile>
> <profile namespace="globus" key="nodeGranularity">64</profile>
> <profile namespace="globus" key="maxNodes">4096</profile>
> <profile namespace="globus" key="queue">normal</profile>
> <profile namespace="karajan" key="jobThrottle">1.28</profile>
> <profile namespace="globus" key="project">TG-DBS080004N</profile>
> <profile namespace="globus" key="pe">16way</profile>
> <profile namespace="karajan" key="initialScore">10000</profile>
> <workdirectory>/work/00043/tg457040/swiftwork</workdirectory>
> </pool>
> </config>
>
> same settings we've been using for a while, i'm not sure why this
> seems to be popping up now, but it's rather consistent. all jobs are
> finishing successfully, so it's rather confusing...any idea what i
> might be missing here?
>
> thanks
> ~sk
>
>
>
>
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>
>
>
>
> --
> Ketan
>
>
>
>
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory