[Swift-devel] active jobs vs available processors on submitted coaster queues

Allan Espinosa aespinosa at cs.uchicago.edu
Wed Jun 10 16:42:35 CDT 2009


Here's a run with 1k jobs: only 2 jobs were active at a time. I think
the 18 procs shown here in the LRM are from the 2nd block request:

[aespinosa@tg-login1 ~]$ showq -u $USER

active jobs------------------------
JOBID              USERNAME      STATE PROCS   REMAINING            STARTTIME

2016757            aespinos    Running    18    00:15:09  Wed Jun 10 16:29:31

1 active job             18 of 114 processors in use by local jobs (15.79%)
                          50 of 57 nodes active      (87.72%)

eligible jobs----------------------
JOBID              USERNAME      STATE PROCS     WCLIMIT            QUEUETIME


0 eligible jobs

blocked jobs-----------------------
JOBID              USERNAME      STATE PROCS     WCLIMIT            QUEUETIME
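
As a quick sanity check on the showq summary above, here is a minimal Python sketch that recomputes the utilization percentages from the "N of M" counts. The regular expression and the `utilization` helper are my own illustration, not part of showq or Swift, and they assume the summary lines keep the layout shown in this paste.

```python
import re

# Summary lines copied from the showq output in this message.
SUMMARY = """\
1 active job             18 of 114 processors in use by local jobs (15.79%)
                          50 of 57 nodes active      (87.72%)
"""

# Matches "<used> of <total> processors" or "<used> of <total> nodes".
PATTERN = re.compile(r"(\d+) of (\d+) (processors|nodes)")


def utilization(text):
    """Return {resource: (used, total, percent)} for each summary line."""
    result = {}
    for used, total, kind in PATTERN.findall(text):
        used, total = int(used), int(total)
        result[kind] = (used, total, round(100.0 * used / total, 2))
    return result


usage = utilization(SUMMARY)
for kind, (used, total, pct) in usage.items():
    print(f"{kind}: {used}/{total} = {pct}%")
```

The recomputed values can be compared against the percentages showq prints, which is a cheap way to confirm that the 18 procs belong to a single running job.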


swift session:
Swift svn swift-r2949 cog-r2406

RunID: out.run_000
Progress:
Progress:  uninitialized:1
Progress:  Initializing:1000  Selecting site:1
Progress:  Selecting site:1000  Initializing site shared directory:1
Progress:  Selecting site:999  Initializing site shared directory:1  Stage in:1
Progress:  Selecting site:996  Stage in:5
Progress:  Selecting site:996  Stage in:5
Progress:  Selecting site:995  Stage in:6
Progress:  Selecting site:994  Stage in:7
Progress:  Selecting site:994  Stage in:7
Progress:  Selecting site:993  Stage in:8
Progress:  Selecting site:993  Stage in:8
Progress:  Selecting site:993  Stage in:8
Progress:  Selecting site:993  Stage in:8
Progress:  Selecting site:992  Stage in:9
Progress:  Selecting site:992  Stage in:9
Progress:  Selecting site:992  Stage in:8  Submitting:1
Progress:  Selecting site:991  Stage in:1  Submitting:8  Submitted:1
Progress:  Selecting site:991  Submitted:9  Active:1
Progress:  Selecting site:991  Submitted:9  Active:1
Progress:  Selecting site:991  Submitted:8  Active:2
Progress:  Selecting site:991  Submitted:1  Active:2  Checking status:6 Failed but can retry:1
Progress:  Selecting site:991  Active:1  Checking status:4 Failed but can retry:5
Progress:  Selecting site:990  Stage in:1  Active:1 Failed but can retry:9
Progress:  Selecting site:990  Active:1  Checking status:1 Failed but can retry:9
Progress:  Selecting site:989  Submitting:1  Active:1 Failed but can retry:10
Progress:  Selecting site:989  Active:1  Checking status:1 Failed but can retry:10
Progress:  Selecting site:988  Submitting:1  Active:1 Failed but can retry:11
Progress:  Selecting site:988  Active:1  Checking status:1 Failed but can retry:11
Progress:  Selecting site:987  Submitting:1  Active:1 Failed but can retry:12
Progress:  Selecting site:987  Active:1  Checking status:1 Failed but can retry:12
Progress:  Selecting site:986  Stage in:1  Active:1 Failed but can retry:13
Progress:  Selecting site:986  Active:1  Checking status:1 Failed but can retry:13
Progress:  Selecting site:985  Stage in:1  Active:1 Failed but can retry:14
Progress:  Selecting site:985  Active:1  Checking status:1 Failed but can retry:14
Progress:  Selecting site:984  Stage in:1  Active:1 Failed but can retry:15
Progress:  Selecting site:984  Active:1  Checking status:1 Failed but can retry:15
Progress:  Selecting site:983  Stage in:1  Active:1 Failed but can retry:16
Progress:  Selecting site:983  Active:2 Failed but can retry:16
Progress:  Selecting site:983  Active:2 Failed but can retry:16
Progress:  Selecting site:983  Active:1  Checking status:1 Failed but can retry:16
Progress:  Selecting site:982  Stage in:1  Active:1  Finished successfully:1 Failed but can retry:16
Progress:  Selecting site:982  Active:1  Checking status:1  Finished successfully:1 Failed but can retry:16
Progress:  Selecting site:981  Submitting:1  Active:1  Finished successfully:1 Failed but can retry:17
Progress:  Selecting site:981  Active:1  Finished successfully:1 Failed but can retry:18
Progress:  Selecting site:980  Submitting:1  Active:1  Finished successfully:1 Failed but can retry:18
Progress:  Selecting site:980  Active:1  Checking status:1  Finished successfully:1 Failed but can retry:18
Progress:  Selecting site:979  Stage in:1  Active:1  Finished successfully:1 Failed but can retry:19
Progress:  Selecting site:979  Active:1  Checking status:1  Finished successfully:1 Failed but can retry:19
Progress:  Selecting site:979  Active:1  Finished successfully:1 Failed but can retry:20
Progress:  Selecting site:978  Stage in:1  Active:1  Finished successfully:1 Failed but can retry:20
Progress:  Selecting site:978  Active:1  Checking status:1  Finished successfully:1 Failed but can retry:20
Progress:  Selecting site:977  Stage in:1  Active:1  Finished successfully:1 Failed but can retry:21
Progress:  Selecting site:977  Active:1  Checking status:1  Finished successfully:1 Failed but can retry:21
Progress:  Selecting site:976  Stage in:1  Active:1  Finished successfully:1 Failed but can retry:22
Progress:  Selecting site:976  Submitted:1  Active:1  Finished successfully:1 Failed but can retry:22
Progress:  Selecting site:976  Submitted:1  Active:1  Finished successfully:1 Failed but can retry:22
Progress:  Selecting site:976  Submitted:1  Active:1  Finished successfully:1 Failed but can retry:22
Progress:  Selecting site:976  Submitted:1  Active:1  Finished successfully:1 Failed but can retry:22
Progress:  Selecting site:976  Submitted:1  Finished successfully:1 Failed but can retry:23
Progress:  Selecting site:975  Stage in:1  Submitted:1  Finished successfully:1 Failed but can retry:23
Progress:  Selecting site:975  Submitted:2  Finished successfully:1 Failed but can retry:23
Progress:  Selecting site:975  Submitted:2  Finished successfully:1 Failed but can retry:23
Progress:  Selecting site:975  Submitted:2  Finished successfully:1 Failed but can retry:23
Progress:  Selecting site:975  Submitted:2  Finished successfully:1 Failed but can retry:23
Progress:  Selecting site:975  Submitted:2  Finished successfully:1 Failed but can retry:23
Progress:  Selecting site:975  Submitted:2  Finished successfully:1 Failed but can retry:23
Progress:  Selecting site:975  Submitted:2  Finished successfully:1 Failed but can retry:23
Progress:  Selecting site:975  Submitted:2  Finished successfully:1 Failed but can retry:23
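
To make the "only 2 active" observation easier to check against the full log, here is a rough Python sketch that turns Progress lines into per-state counts and reports the peak for a given state. The `parse_progress` and `peak` helpers and the field regex are hypothetical names of mine, not anything Swift provides, and the sketch assumes each line has already been rejoined if the mail client wrapped it.

```python
import re

# A few progress lines taken from the session in this message
# (lines wrapped by the mail client have been rejoined).
LOG = """\
Progress:  Selecting site:992  Stage in:8  Submitting:1
Progress:  Selecting site:991  Submitted:8  Active:2
Progress:  Selecting site:983  Active:2 Failed but can retry:16
Progress:  Selecting site:975  Submitted:2  Finished successfully:1 Failed but can retry:23
"""

# A field is a state name (letters and spaces) followed by ":<count>".
FIELD = re.compile(r"([A-Za-z][A-Za-z ]*?):(\d+)")


def parse_progress(line):
    """Return {state: count} for a Progress line, or None for other lines."""
    if not line.startswith("Progress:"):
        return None
    body = line[len("Progress:"):]
    return {name.strip(): int(count) for name, count in FIELD.findall(body)}


def peak(lines, state):
    """Largest count seen for one state across all Progress lines."""
    parsed = (parse_progress(line) for line in lines)
    return max((p.get(state, 0) for p in parsed if p is not None), default=0)


lines = LOG.splitlines()
print("peak Active:", peak(lines, "Active"))
print("peak Failed but can retry:", peak(lines, "Failed but can retry"))
```

Running the same two calls over the complete swift output would show whether Active ever goes above 2 and how quickly the retry count grows.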




2009/6/10 Allan Espinosa <aespinosa at cs.uchicago.edu>:
> hi mihael,
>
> I reran the job and attached the log files (coaster log, swift-log, gram logs).
>
> swift session:
> Progress:  Submitted:1  Active:1  Finished successfully:4
> Progress:  Submitted:1  Active:1  Finished successfully:4
> Progress:  Submitted:1  Active:1  Finished successfully:4
> Progress:  Submitted:1  Checking status:1  Finished successfully:4
> Progress:  Active:1  Finished successfully:5
> Progress:  Active:1  Finished successfully:5
> Progress:  Active:1  Finished successfully:5
> Progress:  Active:1  Finished successfully:5
> Progress:  Active:1  Finished successfully:5
> Progress:  Active:1  Finished successfully:5
> Progress:  Active:1  Finished successfully:5
> Progress:  Active:1  Finished successfully:5
> Progress:  Checking status:1  Finished successfully:5
> Progress:  Stage out:1  Finished successfully:5
> Progress:  Submitted:1  Finished successfully:6
> Progress:  Submitted:1  Finished successfully:6
> Progress:  Submitted:1  Finished successfully:6
> Progress:  Submitted:1  Finished successfully:6
> Progress:  Submitted:1  Finished successfully:6
> ...
>
> sites.xml (i may have changed it during this run):
> <config>
>        <pool handle="UCANL" sysinfo="INTEL32::LINUX">
>                <execution provider="coaster" url="tg-grid.uc.teragrid.org"  jobmanager="gt2:gt2:pbs" />
>                <gridftp url="gsiftp://tg-gridftp.uc.teragrid.org" />
>                <workdirectory >/home/aespinosa/blast-runs</workdirectory>
>
>                <profile namespace="karajan" key="initialScore">1</profile>
>                <profile namespace="karajan" key="jobThrottle">1.26</profile>
>
>                <profile namespace="globus" key="host_types">ia64-compute</profile>
>                <profile namespace="globus" key="slots">4</profile>
>                <profile namespace="globus" key="maxnodes">2</profile>
>        </pool>
> </config>
>
> it looks like the last job was submitted but has not yet registered
> with the gram service at the ucanl remote site.  at this point the
> coaster for the previous 5 jobs had already ended.
> -Allan
>
> 2009/6/10 Mihael Hategan <hategan at mcs.anl.gov>:
>> I need to look at the coaster log.
>>
>> On Tue, 2009-06-09 at 15:10 -0500, Allan Espinosa wrote:
>>> I was expecting to have 2 active jobs at a time from the swift log but
>>> instead got only one at a time:
>>> Swift svn swift-r2949 cog-r2406
>>>
>>> RunID: out.run_000
>>> Progress:
>>> Progress:  Selecting site:4  Initializing site shared directory:1  Stage in:1
>>> Progress:  Stage in:6
>>> Progress:  Stage in:6
>>>
>>>
>>>
>>> Progress:  Stage in:6
>>> Progress:  Stage in:6
>>> Progress:  Stage in:6
>>> Progress:  Stage in:6
>>> Progress:  Stage in:5  Submitting:1
>>> Progress:  Submitting:5  Submitted:1
>>> Progress:  Submitted:6
>>> Progress:  Submitted:5  Active:1
>>> Progress:  Submitted:5  Active:1
>>> Progress:  Submitted:5  Active:1
>>> Progress:  Submitted:5  Active:1
>>> Progress:  Submitted:5  Active:1
>>> Progress:  Submitted:5  Checking status:1
>>> Progress:  Submitted:4  Active:1  Finished successfully:1
>>> Progress:  Submitted:4  Active:1  Finished successfully:1
>>> Progress:  Submitted:4  Active:1  Finished successfully:1
>>> Progress:  Submitted:4  Active:1  Finished successfully:1
>>> Progress:  Submitted:4  Active:1  Finished successfully:1
>>> Progress:  Submitted:4  Active:1  Finished successfully:1
>>> Progress:  Submitted:4  Checking status:1  Finished successfully:1
>>> Progress:  Submitted:3  Active:1  Finished successfully:2
>>> Progress:  Submitted:3  Active:1  Finished successfully:2
>>> Progress:  Submitted:3  Active:1  Finished successfully:2
>>> Progress:  Submitted:3  Checking status:1  Finished successfully:2
>>> Progress:  Submitted:2  Active:1  Finished successfully:3
>>> ...
>>> ...
>>>
>>>
>>> uc-teragrid queue status: $showq -u $USER
>>> [aespinosa@tg-login1 ~]$ showq -u $USER
>>>
>>> active jobs------------------------
>>> JOBID              USERNAME      STATE PROCS   REMAINING            STARTTIME
>>>
>>> 2015982            aespinos    Running     2    00:55:41  Tue Jun  9 15:02:18
>>>
>>> 1 active job              2 of 116 processors in use by local jobs (1.72%)
>>>                           42 of 58 nodes active      (72.41%)
>>>
>>> eligible jobs----------------------
>>> JOBID              USERNAME      STATE PROCS     WCLIMIT            QUEUETIME
>>>
>>>
>>> 0 eligible jobs
>>>
>>> blocked jobs-----------------------
>>> JOBID              USERNAME      STATE PROCS     WCLIMIT            QUEUETIME
>>>
>>>
>>> 0 blocked jobs
>>>
>>> Total job:  1
>>>
>>>
>>> sites.xml:
>>> <config>
>>>         <pool handle="UCANL" sysinfo="INTEL32::LINUX">
>>>                 <execution provider="coaster" url="tg-grid.uc.teragrid.org"  jobmanager="gt2:gt2:pbs" />
>>>                 <gridftp url="gsiftp://tg-gridftp.uc.teragrid.org" />
>>>                 <workdirectory >/home/aespinosa/blast-runs</workdirectory>
>>>
>>>                 <profile namespace="karajan" key="initialScore">5</profile>
>>>                 <profile namespace="karajan" key="jobThrottle">1.26</profile>
>>>
>>>                 <profile namespace="globus" key="host_types">ia64-compute</profile>
>>>                 <profile namespace="globus" key="slots">4</profile>
>>>                 <profile namespace="globus" key="maxnodes">16</profile>
>>>         </pool>
>>> </config>
>
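
Since the sites.xml changed between the two runs quoted above, a small Python sketch like the following can list which profile values differ. The `TEMPLATE`, `profiles` and `changed` names are my own and only for illustration. The XML is a cleaned-up copy of the quoted files with the two values that differ filled in, and nothing here says how Swift itself interprets those keys.

```python
import xml.etree.ElementTree as ET

# Cleaned-up copy of the quoted sites.xml; only the two placeholders differ
# between the earlier (Jun 9) and later (Jun 10) versions.
TEMPLATE = """\
<config>
  <pool handle="UCANL" sysinfo="INTEL32::LINUX">
    <execution provider="coaster" url="tg-grid.uc.teragrid.org" jobmanager="gt2:gt2:pbs" />
    <gridftp url="gsiftp://tg-gridftp.uc.teragrid.org" />
    <workdirectory>/home/aespinosa/blast-runs</workdirectory>
    <profile namespace="karajan" key="initialScore">{initial_score}</profile>
    <profile namespace="karajan" key="jobThrottle">1.26</profile>
    <profile namespace="globus" key="host_types">ia64-compute</profile>
    <profile namespace="globus" key="slots">4</profile>
    <profile namespace="globus" key="maxnodes">{maxnodes}</profile>
  </pool>
</config>
"""

EARLIER = TEMPLATE.format(initial_score=5, maxnodes=16)
LATER = TEMPLATE.format(initial_score=1, maxnodes=2)


def profiles(xml_text):
    """Map (namespace, key) to the text value of every <profile> element."""
    root = ET.fromstring(xml_text)
    return {(p.get("namespace"), p.get("key")): (p.text or "").strip()
            for p in root.iter("profile")}


def changed(before, after):
    """Return {(namespace, key): (old, new)} for values that differ."""
    keys = sorted(set(before) | set(after))
    return {k: (before.get(k), after.get(k))
            for k in keys if before.get(k) != after.get(k)}


diff = changed(profiles(EARLIER), profiles(LATER))
for (namespace, key), (old, new) in diff.items():
    print(f"{namespace}:{key}: {old} -> {new}")
```

For these two files only initialScore and maxnodes differ, so any change in behaviour between the runs should be traced to one of those two settings (or to something outside sites.xml).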



-- 
Allan M. Espinosa <http://allan.88-mph.net/blog>
PhD student, Computer Science
University of Chicago <http://people.cs.uchicago.edu/~aespinosa>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tarball.tar.gz
Type: application/x-gzip
Size: 182455 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-devel/attachments/20090610/40c88233/attachment.bin>

