[Swift-devel] [Swift-user] swift on ranger
Sarah Kenny
skenny at uci.edu
Wed Dec 21 16:50:48 CST 2011
you're right, ketan. if i change it to:
<profile namespace="globus" key="jobsPerNode">16</profile>
the warning message goes away. however, there are times i don't want to
run 16 jobs per node, e.g. because a single job needs all the available
memory, so even though the node has 16 processors i can't actually use
them all. so perhaps this is just a scheduling issue with ranger/sge, in
that they don't want you to submit a job that's going to leave processors
idle? that seems a bit restrictive though...
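
for reference, the rule quoted in the rejection message (the slot request must be a multiple of 16) can be sketched as below. this is only an illustration of the arithmetic in the error text, assuming 16 cores per node as described above; it is not ranger's actual JSV script, and how swift derives the -pe count from jobsPerNode is not shown in this thread. all function names are hypothetical.

```python
CORES_PER_NODE = 16  # assumption: 16 cores per Ranger node, as stated in the thread


def is_valid_slot_request(slots: int, cores_per_node: int = CORES_PER_NODE) -> bool:
    """Mimic the check in the error text: slots must be a positive multiple of 16."""
    return slots > 0 and slots % cores_per_node == 0


def round_up_slots(slots: int, cores_per_node: int = CORES_PER_NODE) -> int:
    """Round a slot request up to the next whole-node multiple."""
    if slots < 1:
        raise ValueError("slot request must be at least 1")
    # Ceiling division using integer arithmetic only.
    return -(-slots // cores_per_node) * cores_per_node


if __name__ == "__main__":
    for requested in (1, 16, 17, 64):
        print(requested, is_valid_slot_request(requested), round_up_slots(requested))
```

under this reading, a request of 1 slot fails the check while 16 passes, so reserving a whole node and leaving cores idle inside it would still mean asking for 16 slots.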
On Wed, Dec 21, 2011 at 7:58 AM, Ketan Maheshwari <
ketancmaheshwari at gmail.com> wrote:
> Sarah,
>
> I checked my sites.xml. The only difference between yours and mine is
> the value of jobsPerNode, which is 16 in my case. I have also set this
> value to other multiples of 16, which has worked fine for me.
>
>
> On Wed, Dec 21, 2011 at 6:57 AM, Sarah Kenny <skenny at uci.edu> wrote:
>
>> getting this when submitting to ranger with both the latest and our
>> previous version of swift (swift-r5259 cog-r3313)
>>
>> Final status: time: Wed, 21 Dec 2011 04:49:15 -0800 Finished
>> successfully:100
>> The following warnings have occurred:
>> 1. org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
>> Cannot submit job: Could not submit job (qsub reported an exit code of 1).
>> --------------------------------------------------------------------------
>> Welcome to TACC's Ranger System, an NSF XD Resource
>> --------------------------------------------------------------------------
>> --> Checking that you specified -V...
>> --> Checking that you specified a time limit...
>> --> Checking that you specified a queue...
>> --> Setting project...
>> --> Checking that you specified a parallel environment...
>> --> Checking that you specified a valid parallel environment name...
>> --> Checking that the minimum and maximum PE counts are the same...
>> --> Checking that the number of PEs requested is valid...
>> ------------------> Rejecting job <------------------
>> Your slot (or core) request is not a multiple of 16.
>> Syntax: -pe <pe_name> <n>
>> where <n> is a multiple of 16.
>> -----------------------------------------------------
>> Unable to run job: JSV rejected job.
>> Exiting.
>>
>>     at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:63)
>>     at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:45)
>>     at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:57)
>>     at org.globus.cog.abstraction.coaster.service.job.manager.LocalQueueProcessor.run(LocalQueueProcessor.java:40)
>> Caused by: org.globus.cog.abstraction.impl.scheduler.common.ProcessException: Could not submit job (qsub reported an exit code of 1).
>> --------------------------------------------------------------------------
>> Welcome to TACC's Ranger System, an NSF XD Resource
>> --------------------------------------------------------------------------
>> --> Checking that you specified -V...
>> --> Checking that you specified a time limit...
>> --> Checking that you specified a queue...
>> --> Setting project...
>> --> Checking that you specified a parallel environment...
>> --> Checking that you specified a valid parallel environment name...
>> --> Checking that the minimum and maximum PE counts are the same...
>> --> Checking that the number of PEs requested is valid...
>> ------------------> Rejecting job <------------------
>> Your slot (or core) request is not a multiple of 16.
>> Syntax: -pe <pe_name> <n>
>> where <n> is a multiple of 16.
>> -----------------------------------------------------
>> Unable to run job: JSV rejected job.
>> Exiting.
>>
>>     at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.start(AbstractExecutor.java:108)
>>     at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:53)
>> ... 3 more
>>
>> ################### sites file
>>
>> <config>
>> <pool handle="RANGER">
>> <execution provider="coaster" jobManager="gt2:SGE" url="gatekeeper.ranger.tacc.teragrid.org"/>
>> <filesystem provider="gsiftp" url="gsiftp://gridftp.ranger.tacc.teragrid.org"/>
>> <profile namespace="globus" key="maxtime">86400</profile>
>> <profile namespace="globus" key="maxWallTime">02:00:00</profile>
>> <profile namespace="globus" key="jobsPerNode">1</profile>
>> <profile namespace="globus" key="nodeGranularity">64</profile>
>> <profile namespace="globus" key="maxNodes">4096</profile>
>> <profile namespace="globus" key="queue">normal</profile>
>> <profile namespace="karajan" key="jobThrottle">1.28</profile>
>> <profile namespace="globus" key="project">TG-DBS080004N</profile>
>> <profile namespace="globus" key="pe">16way</profile>
>> <profile namespace="karajan" key="initialScore">10000</profile>
>> <workdirectory>/work/00043/tg457040/swiftwork</workdirectory>
>> </pool>
>> </config>
>>
>> same settings we've been using for a while, i'm not sure why this seems
>> to be popping up now, but it's rather consistent. all jobs are finishing
>> successfully, so it's rather confusing...any idea what i might be missing
>> here?
>>
>> thanks
>> ~sk
>>
>>
>>
>>
>>
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>>
>>
>
>
> --
> Ketan
--
Sarah Kenny
Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III
University of California Irvine, Dept. of Neurology ~ 773-818-8300