[Swift-user] [Swift-devel] swift on ranger

Jonathan Monette jonmon at mcs.anl.gov
Wed Dec 21 10:04:48 CST 2011


Also, shouldn't node granularity be set to 16 on ranger and not 64?
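
For reference, the one difference Ketan reports in the quoted message below (jobsPerNode of 16 instead of 1) would amount to a single changed line in the quoted sites file. This is only a minimal sketch, assuming every other profile stays exactly as Sarah posted it; whether this change alone satisfies the scheduler's multiple-of-16 check is not confirmed anywhere in this thread, and the nodeGranularity question above is left open.

```xml
<!-- Sketch only: the jobsPerNode line from the quoted sites file, set to the
     value Ketan says works for him. All other profiles are assumed unchanged. -->
<profile namespace="globus" key="jobsPerNode">16</profile>
```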



On Dec 21, 2011, at 9:58 AM, Ketan Maheshwari <ketancmaheshwari at gmail.com> wrote:

> Sarah,
> 
> I checked my sites.xml. The only difference between yours and mine is the value of jobsPerNode, which is 16 in my case. I have also set this value to other multiples of 16, which worked fine for me.
> 
> 
> On Wed, Dec 21, 2011 at 6:57 AM, Sarah Kenny <skenny at uci.edu> wrote:
> getting this when submitting to ranger with both the latest and our previous version of swift (swift-r5259 cog-r3313)
> 
> Final status:  time: Wed, 21 Dec 2011 04:49:15 -0800  Finished successfully:100
> The following warnings have occurred:
> 1. org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Cannot submit job: Could not submit job (qsub reported an exit code of 1).
> --------------------------------------------------------------------------
>  Welcome to TACC's Ranger System, an NSF XD Resource
> --------------------------------------------------------------------------
> --> Checking that you specified -V...
> --> Checking that you specified a time limit...
> --> Checking that you specified a queue...
> --> Setting project...
> --> Checking that you specified a parallel environment...
> --> Checking that you specified a valid parallel environment name...
> --> Checking that the minimum and maximum PE counts are the same...
> --> Checking that the number of PEs requested is valid...
> ------------------> Rejecting job <------------------
> Your slot (or core) request is not a multiple of 16.
> Syntax: -pe <pe_name> <n>
> where <n> is a multiple of 16.
> -----------------------------------------------------
> Unable to run job: JSV rejected job.
> Exiting.
> 
>         at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:63)
>         at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:45)
>         at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:57)
>         at org.globus.cog.abstraction.coaster.service.job.manager.LocalQueueProcessor.run(LocalQueueProcessor.java:40)
> Caused by: org.globus.cog.abstraction.impl.scheduler.common.ProcessException: Could not submit job (qsub reported an exit code of 1).
> --------------------------------------------------------------------------
>  Welcome to TACC's Ranger System, an NSF XD Resource
> --------------------------------------------------------------------------
> --> Checking that you specified -V...
> --> Checking that you specified a time limit...
> --> Checking that you specified a queue...
> --> Setting project...
> --> Checking that you specified a parallel environment...
> --> Checking that you specified a valid parallel environment name...
> --> Checking that the minimum and maximum PE counts are the same...
> --> Checking that the number of PEs requested is valid...
> ------------------> Rejecting job <------------------
> Your slot (or core) request is not a multiple of 16.
> Syntax: -pe <pe_name> <n>
> where <n> is a multiple of 16.
> -----------------------------------------------------
> Unable to run job: JSV rejected job.
> Exiting.
> 
>         at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.start(AbstractExecutor.java:108)
>         at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:53)
>         ... 3 more
> 
> ################### sites file
> 
> <config>
> <pool handle="RANGER">
>      <execution provider="coaster" jobManager="gt2:SGE" url="gatekeeper.ranger.tacc.teragrid.org"/>
>      <filesystem provider="gsiftp" url="gsiftp://gridftp.ranger.tacc.teragrid.org"/>
>      <profile namespace="globus" key="maxtime">86400</profile>
>      <profile namespace="globus" key="maxWallTime">02:00:00</profile>
>      <profile namespace="globus" key="jobsPerNode">1</profile>
>      <profile namespace="globus" key="nodeGranularity">64</profile>
>      <profile namespace="globus" key="maxNodes">4096</profile>
>      <profile namespace="globus" key="queue">normal</profile>
>      <profile namespace="karajan" key="jobThrottle">1.28</profile>
>      <profile namespace="globus" key="project">TG-DBS080004N</profile>
>      <profile namespace="globus" key="pe">16way</profile>
>      <profile namespace="karajan" key="initialScore">10000</profile>
>      <workdirectory>/work/00043/tg457040/swiftwork</workdirectory>
> </pool>
> </config>
> 
> same settings we've been using for a while, i'm not sure why this seems to be popping up now, but it's rather consistent. all jobs are finishing successfully, so it's rather confusing...any idea what i might be missing here? 
> 
> thanks
> ~sk
> 
> 
>    
> 
> 
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> 
> 
> 
> 
> -- 
> Ketan
> 
> 