[Swift-devel] Re: Error in pbs job submission with coasters in PADS fast queue
wilde at mcs.anl.gov
Thu Jun 3 14:42:39 CDT 2010
Forgot to add:
for now, a remedy is to change your config:
<pool handle="coasterpads">
  <execution provider="coaster" url="login1.pads.ci.uchicago.edu" jobmanager="ssh:pbs"/>
  <profile namespace="globus" key="maxtime">3000</profile>
  <profile namespace="globus" key="workersPerNode">8</profile>
  <profile namespace="globus" key="slots">1</profile>
  <profile namespace="globus" key="nodeGranularity">1</profile>
  <profile namespace="globus" key="maxNodes">10</profile>
  <profile namespace="globus" key="queue">short</profile>
  <profile namespace="karajan" key="jobThrottle">0.5</profile>
  <profile namespace="karajan" key="initialScore">10000</profile>
  <filesystem provider="ssh" url="login1.pads.ci.uchicago.edu"/>
  <workdirectory>/home/wilde/swift/lab/wwjbug.2010.0602</workdirectory>
</pool>
That is, change the queue from "fast" to "short". Alternatively, use slots 10 and maxNodes 1, so that no single job requests more than one node.
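As a rough sketch of the second option, the profile lines that would change are shown below. This assumes the rest of the pool entry stays exactly as above and that the queue is left as fast; it is an illustration of the suggestion, not a tested config.

```xml
<!-- Assumed variant: stay in the fast queue, one node per PBS job -->
<profile namespace="globus" key="slots">10</profile>
<profile namespace="globus" key="maxNodes">1</profile>
<profile namespace="globus" key="queue">fast</profile>
```

With maxNodes at 1, each generated submit file should ask for a single node, which is what the fast queue allows; slots at 10 lets Swift submit up to 10 such jobs instead of one larger one.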
This will enable you (and me) to pop up one level and debug the original coaster reply timeout problem that we were trying to re-create here.
- Mike
----- wilde at mcs.anl.gov wrote:
> Wenjun,
>
> The error that was causing your "cat.swift" script to fail in the PADS
> fast queue as soon as the Swift script generates more jobs than
> "workersPerNode" is that the fast queue is limited to one node.
>
> Trapping the pbs submit script (I think debug=true in the
>
> login1$ qsub PBS5612813565711306842.submit
>
> which has:
> #PBS -l nodes=2
> #PBS -q fast
>
> Gives:
>
> qsub: Job exceeds queue resource limits MSG=cannot satisfy queue max
> nodect requirement
>
> We (swift developers) need to find out why the error message wasn't
> made prominent.
>
> We need to document this limitation, make sure the error message
> gets to the user, and document procedures for finding and debugging
> Swift's generated .submit files.
>
> - Mike
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory