[Swift-devel] swift changing walltime of prews-gram jobs

Michael Wilde wilde at mcs.anl.gov
Tue Jan 27 12:54:28 CST 2009


I'm trying to duplicate Allan's success with coasters using the 
local:pbs configuration on TeraPort. (I'm trying local:pbs because my 
gt2:gt2:pbs coaster jobs are also failing; I'm still debugging those and 
will send a separate email about them.)

I'm running 0.8rc1, submitting from tp-login to TeraPort.

This combination still seems to take the walltime from the Globus profile 
options, but it appears to put the time estimate in the seconds field of 
the PBS request instead of the minutes field, so PBS kills my jobs for 
exceeding their walltime (4 jobs, each needing 30-60 seconds).

My sites.xml is:

<config>
<pool handle="teraport" >
   <profile namespace="globus" key="queue">fast</profile>
   <profile namespace="globus" key="maxwalltime">00:05:00</profile>
   <execution provider="coaster" url="none" jobmanager="local:pbs" />
   <filesystem provider="coaster" url="local://localhost" />
   <workdirectory>/home/wilde/swiftwork</workdirectory>
</pool>
</config>

But PBS qstat -f shows:
     Resource_List.walltime = 00:00:51
(full qstat -f below)

And I get a walltime-exceeded email message from PBS for each pbs 
coaster job submitted.

The coaster log shows: attr=maxwalltime=00:05:00

When I change the sites.xml maxwalltime to "05:00:00", the PBS job does 
indeed get 50 minutes, and the entire script runs to completion.

So it seems to be placing the walltime request one unit to the right of 
where it should.
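
To make the arithmetic concrete, here is a minimal Python sketch of one 
guess that reproduces both numbers above. The scale factor of 10 and the 
extra 1 are assumptions inferred only from these two data points 
(00:05:00 -> 00:00:51, 05:00:00 -> about 50 minutes); they are not taken 
from the coaster source, and the function names are hypothetical.

```python
def parse_hms(text):
    """Convert an 'HH:MM:SS' walltime string to total seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def format_hms(total_seconds):
    """Format a number of seconds as 'HH:MM:SS'."""
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def guessed_pbs_walltime(maxwalltime, scale=10, pad=1):
    """Hypothetical model of the suspected bug.

    Assumption: the requested walltime is reduced to whole minutes,
    scaled and padded (scale and pad are guesses), and the resulting
    number of minutes is then written out as if it were seconds.
    """
    minutes = parse_hms(maxwalltime) // 60
    return format_hms(minutes * scale + pad)


if __name__ == "__main__":
    for requested in ("00:05:00", "05:00:00"):
        print(requested, "->", guessed_pbs_walltime(requested))
```

If this guess is right, the value that reaches PBS is too small by a 
factor of 60, which would explain why jobs needing 30-60 seconds each 
run out of time under a 51-second limit.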

- Mike




Job Id: 848667.tp-mgt.ci.uchicago.edu
     Job_Name = null
     Job_Owner = wilde at tp-login2.ci.uchicago.edu
     job_state = R
     queue = fast
     server = tp-mgt.ci.uchicago.edu
     Checkpoint = u
     ctime = Tue Jan 27 12:22:21 2009
     Error_Path = tp-login2.ci.uchicago.edu:/home/wilde/.globus/scripts/pbs20627.qsub.stderr
     exec_host = tp-c118/0
     Hold_Types = n
     Join_Path = n
     Keep_Files = n
     Mail_Points = n
     mtime = Tue Jan 27 12:22:24 2009
     Output_Path = tp-login2.ci.uchicago.edu:/home/wilde/.globus/scripts/pbs20627.qsub.stdout
     Priority = 0
     qtime = Tue Jan 27 12:22:21 2009
     Rerunable = True
     Resource_List.nodect = 1
     Resource_List.nodes = 1:rhel4-compute
     Resource_List.walltime = 00:00:51
     session_id = 32414
     Shell_Path_List = /bin/sh
     Variable_List = PBS_O_HOME=/home/wilde,PBS_O_LANG=en_US.UTF-8,
         PBS_O_LOGNAME=wilde,
         PBS_O_PATH=/autonfs/home/wilde/tutorials/osgedu/build/docbook-xsl/too
         ls/bin:/home/wilde/bin:/soft/java-1.5.0_06-sun-r1/bin:/soft/java-1.5.0
         _06-sun-r1/jre/bin:/soft/apache-ant-1.7.1-r1/bin:/software/common/gx-m
         ap-0.5.3.3-r1/bin:/soft/condor-6.8.1-r1/bin:/soft/apache-ant-1.6.5-r1/
         bin:/software/common/cert-scripts-2-5.rev44-r1/bin:/soft/globus-4.0.3-
         r1/bin:/soft/globus-4.0.3-r1/sbin:/usr/kerberos/bin:/bin:/usr/bin:/usr
         /X11R6/bin:/usr/local/bin:/software/common/softenv-1.6.0-r1/bin:/home/
         wilde/bin/linux-rhel4-x86_64:/home/wilde/bin:/soft/R-2.4.0-r1/bin:/sof
         t/R-2.4.0-r1/lib/R/bin:/soft/torque-2.3.3-r1/bin:/soft/maui-3.2.6p19-g
         cc-r1/bin:/soft/maui-3.2.6p19-gcc-r1/sbin:/soft/matlab-7.5-r1/bin:/sof
         t/xcat-1.2.0-r1/bin:/soft/xcat-1.2.0-r1/sbin:/soft/xcat-1.2.0-r1/x86_6
         4/bin:/soft/xcat-1.2.0-r1/x86_64/sbin:/soft/xcat-1.2.0-r1/contrib/bin:
         /soft/xcat-1.2.0-r1/contrib/sbin:/soft/xcat-1.2.0-r1/contrib/x86_64/bi
         n:/soft/xcat-1.2.0-r1/contrib/x86_64/sbin:/home/wilde/swift/tools:/hom
         e/wilde/swift/rev/latest/bin:/home/wilde/blast/ncbi/bin,
         PBS_O_MAIL=/var/spool/mail/wilde,PBS_O_SHELL=/bin/bash,
         PBS_SERVER=tp-login2.ci.uchicago.edu,
         PBS_O_HOST=tp-login2.ci.uchicago.edu,PBS_O_WORKDIR=/home/wilde,
         PBS_O_QUEUE=fast
     etime = Tue Jan 27 12:22:21 2009
     submit_args = /home/wilde/.globus/scripts/pbs20627.qsub
     start_time = Tue Jan 27 12:22:23 2009
     start_count = 1

tp$


On 1/25/09 9:34 AM, Ben Clifford wrote:
> On Sun, 25 Jan 2009, Allan Espinosa wrote:
> 
>> Oh right. The coaster service on the site runs over fork and the
>> workers over LRM right? so just one queue is needed to be specified.
> 
> yes.
> 


