[Swift-devel] swift changing walltime of prews-gram jobs
Michael Wilde
wilde at mcs.anl.gov
Tue Jan 27 12:54:28 CST 2009
I'm trying to duplicate Allan's success with coasters using the
local:pbs configuration on TeraPort. (I'm trying local:pbs because my
gt2:gt2:pbs coaster jobs are also failing; I am still debugging those and
will send a separate email about them.)
I'm running 0.8rc1, submitting from tp-login to TeraPort.
This combination still seems to take the walltime from the Globus profile
options, but it appears to put the time estimate in the seconds field of
the PBS request instead of the minutes field, so my jobs are being killed
by PBS for exceeding their walltime (4 jobs, needing 30-60 seconds each).
My sites.xml is:
<config>
<pool handle="teraport" >
<profile namespace="globus" key="queue">fast</profile>
<profile namespace="globus" key="maxwalltime">00:05:00</profile>
<execution provider="coaster" url="none" jobmanager="local:pbs" />
<filesystem provider="coaster" url="local://localhost" />
<workdirectory>/home/wilde/swiftwork</workdirectory>
</pool>
</config>
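To double-check which walltime value Swift should be reading from this file, the profile entries can be pulled out with a few lines of Python. This is only a minimal sketch: the helper name globus_profiles is made up for illustration and is not part of Swift, and the XML is simply the sites.xml above pasted into a string.

```python
import xml.etree.ElementTree as ET

# The sites.xml from this message, inlined so the sketch is self-contained.
SITES_XML = """<config>
<pool handle="teraport">
<profile namespace="globus" key="queue">fast</profile>
<profile namespace="globus" key="maxwalltime">00:05:00</profile>
<execution provider="coaster" url="none" jobmanager="local:pbs" />
<filesystem provider="coaster" url="local://localhost" />
<workdirectory>/home/wilde/swiftwork</workdirectory>
</pool>
</config>"""


def globus_profiles(xml_text, handle):
    """Return {key: value} for the globus-namespace profiles of one pool.

    Hypothetical helper for illustration; not a Swift API.
    """
    root = ET.fromstring(xml_text)
    for pool in root.iter("pool"):
        if pool.get("handle") == handle:
            return {
                profile.get("key"): (profile.text or "").strip()
                for profile in pool.findall("profile")
                if profile.get("namespace") == "globus"
            }
    raise KeyError(handle)


profiles = globus_profiles(SITES_XML, "teraport")
print(profiles["maxwalltime"])
```

The value printed here is the string as written in the file, which is what the PBS request would be expected to match.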
But PBS qstat -f shows:
Resource_List.walltime = 00:00:51
(full qstat -f below)
And I get a walltime-exceeded email message from PBS for each pbs
coaster job submitted.
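For comparing what PBS actually received against what was requested, the walltime can be extracted from saved qstat -f output. The following is a rough sketch, assuming the output contains a line of the form shown above; the function name requested_walltime_seconds is hypothetical.

```python
import re


def requested_walltime_seconds(qstat_text):
    """Return Resource_List.walltime from `qstat -f` text, in seconds.

    Returns None when no walltime line is present.
    Hypothetical helper for illustration only.
    """
    match = re.search(
        r"Resource_List\.walltime\s*=\s*(\d+):(\d{2}):(\d{2})", qstat_text
    )
    if match is None:
        return None
    hours, minutes, seconds = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds


sample = "Resource_List.nodect = 1\nResource_List.walltime = 00:00:51\n"
print(requested_walltime_seconds(sample))
```

With the line from this job, the request works out to 51 seconds, well short of the 5 minutes (300 seconds) asked for in sites.xml.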
The coaster log shows: attr=maxwalltime=00:05:00
When I change the sites.xml maxwalltime to "05:00:00" I do indeed get 50
minutes, and the entire script runs to completion.
So it seems to be placing the walltime request one unit to the right of
where it should.
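To make the "one unit to the right" idea concrete, here is a small sketch of an HH:MM:SS round trip and of what a minutes/seconds mix-up would look like. It is not Swift's actual code, and the function names are invented for illustration. It also does not reproduce the exact 00:00:51 seen above, so something beyond a pure shift may be involved.

```python
def parse_hms(text):
    """Parse 'HH:MM:SS' into a total number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def format_hms(total_seconds):
    """Format a number of seconds as 'HH:MM:SS'."""
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


requested = parse_hms("00:05:00")      # 300 seconds
correct = format_hms(requested)        # round-trips to "00:05:00"

# Hypothetical unit mix-up: a value in minutes is passed where seconds
# are expected, so 5 minutes ends up in the seconds field.
shifted = format_hms(requested // 60)  # "00:00:05"

print(correct, shifted)
```

Under the same hypothetical mix-up, a request of 05:00:00 would come out as 00:05:00, which is the general direction of the behaviour described above.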
- Mike
Job Id: 848667.tp-mgt.ci.uchicago.edu
Job_Name = null
Job_Owner = wilde at tp-login2.ci.uchicago.edu
job_state = R
queue = fast
server = tp-mgt.ci.uchicago.edu
Checkpoint = u
ctime = Tue Jan 27 12:22:21 2009
Error_Path = tp-login2.ci.uchicago.edu:/home/wilde/.globus/scripts/pbs20627.qsub.stderr
exec_host = tp-c118/0
Hold_Types = n
Join_Path = n
Keep_Files = n
Mail_Points = n
mtime = Tue Jan 27 12:22:24 2009
Output_Path = tp-login2.ci.uchicago.edu:/home/wilde/.globus/scripts/pbs20627.qsub.stdout
Priority = 0
qtime = Tue Jan 27 12:22:21 2009
Rerunable = True
Resource_List.nodect = 1
Resource_List.nodes = 1:rhel4-compute
Resource_List.walltime = 00:00:51
session_id = 32414
Shell_Path_List = /bin/sh
Variable_List = PBS_O_HOME=/home/wilde,PBS_O_LANG=en_US.UTF-8,
PBS_O_LOGNAME=wilde,
PBS_O_PATH=/autonfs/home/wilde/tutorials/osgedu/build/docbook-xsl/too
ls/bin:/home/wilde/bin:/soft/java-1.5.0_06-sun-r1/bin:/soft/java-1.5.0
_06-sun-r1/jre/bin:/soft/apache-ant-1.7.1-r1/bin:/software/common/gx-m
ap-0.5.3.3-r1/bin:/soft/condor-6.8.1-r1/bin:/soft/apache-ant-1.6.5-r1/
bin:/software/common/cert-scripts-2-5.rev44-r1/bin:/soft/globus-4.0.3-
r1/bin:/soft/globus-4.0.3-r1/sbin:/usr/kerberos/bin:/bin:/usr/bin:/usr
/X11R6/bin:/usr/local/bin:/software/common/softenv-1.6.0-r1/bin:/home/
wilde/bin/linux-rhel4-x86_64:/home/wilde/bin:/soft/R-2.4.0-r1/bin:/sof
t/R-2.4.0-r1/lib/R/bin:/soft/torque-2.3.3-r1/bin:/soft/maui-3.2.6p19-g
cc-r1/bin:/soft/maui-3.2.6p19-gcc-r1/sbin:/soft/matlab-7.5-r1/bin:/sof
t/xcat-1.2.0-r1/bin:/soft/xcat-1.2.0-r1/sbin:/soft/xcat-1.2.0-r1/x86_6
4/bin:/soft/xcat-1.2.0-r1/x86_64/sbin:/soft/xcat-1.2.0-r1/contrib/bin:
/soft/xcat-1.2.0-r1/contrib/sbin:/soft/xcat-1.2.0-r1/contrib/x86_64/bi
n:/soft/xcat-1.2.0-r1/contrib/x86_64/sbin:/home/wilde/swift/tools:/hom
e/wilde/swift/rev/latest/bin:/home/wilde/blast/ncbi/bin,
PBS_O_MAIL=/var/spool/mail/wilde,PBS_O_SHELL=/bin/bash,
PBS_SERVER=tp-login2.ci.uchicago.edu,
PBS_O_HOST=tp-login2.ci.uchicago.edu,PBS_O_WORKDIR=/home/wilde,
PBS_O_QUEUE=fast
etime = Tue Jan 27 12:22:21 2009
submit_args = /home/wilde/.globus/scripts/pbs20627.qsub
start_time = Tue Jan 27 12:22:23 2009
start_count = 1
tp$
On 1/25/09 9:34 AM, Ben Clifford wrote:
> On Sun, 25 Jan 2009, Allan Espinosa wrote:
>
>> Oh right. The coaster service on the site runs over fork and the
>> workers over LRM right? so just one queue is needed to be specified.
>
> yes.
>