[Swift-devel] more active processes than requested cores

Allan Espinosa aespinosa at cs.uchicago.edu
Wed Jun 17 14:08:50 CDT 2009


I also get this after a while.

Attached are the logs from when the workflow stopped.  It did not
actually finish, because the coaster got an out-of-memory error.  This
does not happen when coasters are not used.

2009/6/17 Zhao Zhang <zhaozhang at uchicago.edu>:
> Hi, All
>
> Here is something in my test case:
>
> Swift says:
> Progress:  Selecting site:80  Submitted:828  Active:115  Finished in previous run:487  Finished successfully:295
> Progress:  Selecting site:80  Submitted:828  Active:115  Finished in previous run:487  Finished successfully:295
> Progress:  Selecting site:80  Submitted:828  Active:115  Finished in previous run:487  Finished successfully:295
> Progress:  Selecting site:80  Submitted:828  Active:115  Finished in previous run:487  Finished successfully:295
> Progress:  Selecting site:80  Submitted:828  Active:115  Finished in previous run:487  Finished successfully:295
>
> And showq -u says
> login3% showq -u
> ACTIVE JOBS--------------------------
> JOBID     JOBNAME    USERNAME      STATE   CORE  REMAINING  STARTTIME
> ================================================================================
>
>    0 active jobs :    0 of 3828 hosts (  0.00 %)
>
> Why are there no active SGE jobs when Swift says there are 115 active jobs?
>
> zhao
>
> Mihael Hategan wrote:
>>
>> On Tue, 2009-06-16 at 17:30 -0500, Allan Espinosa wrote:
>>
>>>
>>> Given the throttling parameters below, I do expect to have a thousand
>>> jobs active at a time.  But shouldn't the coaster request larger
>>> blocks to accommodate the 277 active jobs?
>>>
>>
>> Not if they fit in existing blocks (either vertically or horizontally).
>> This should be thought through some more, but for short jobs it seems
>> OK.
>>
>>
>>>
>>> SGE snapshot:
>>> ACTIVE JOBS--------------------------
>>> JOBID     JOBNAME    USERNAME      STATE   CORE  REMAINING  STARTTIME
>>> ================================================================================
>>> 779616    data       tg802895      Running 16     00:36:01  Tue Jun 16 15:59:41
>>> 779723    data       tg802895      Running 16     01:44:01  Tue Jun 16 17:07:41
>>> 779724    data       tg802895      Running 16     01:44:01  Tue Jun 16 17:07:41
>>> 779727    data       tg802895      Running 16     01:45:58  Tue Jun 16 17:09:38
>>>
>>>
>>> Swift session snippet:
>>> Progress:  Selecting site:38  Submitted:707  Active:278  Finished successfully:1861
>>> Progress:  Selecting site:38  Submitted:707  Active:277  Checking status:1  Finished successfully:1861
>>>
>>>
>>> sites.xml
>>> <config>
>>>  <pool handle="RANGER" >
>>>    <gridftp  url="gsiftp://gridftp.ranger.tacc.teragrid.org" />
>>>    <execution  provider="coaster"
>>> url="gatekeeper.ranger.tacc.teragrid.org" jobManager="gt2:gt2:SGE"/>
>>>    <profile namespace="globus" key="project">TG-CCR080022N</profile>
>>>    <workdirectory >/work/01035/tg802895/blast-runs</workdirectory>
>>>    <profile namespace="globus" key="workersPerNode">16</profile>
>>>    <profile namespace="globus" key="queue">development</profile>
>>>    <profile namespace="globus" key="slots">4</profile>
>>>    <profile namespace="globus" key="maxwalltime">00:30:00</profile>
>>>    <profile namespace="globus" key="nodeGranularity">2</profile>
>>>    <profile namespace="karajan" key="initialScore">2</profile>
>>>    <profile namespace="karajan" key="jobThrottle">10</profile>
>>>  </pool>
>>> </config>
>>>
>>> I'll send the Swift and coaster logs once the run finishes.
>>>
>>> -Allan
>>>
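
As a rough sanity check on the numbers quoted above, here is a minimal
Python sketch (it is not part of Swift or the coaster code).  The figures
are copied from the quoted showq snapshot, Swift progress line, and
sites.xml.  The helper names are hypothetical, and the
"jobThrottle * 100 + 1" limit is an assumption about how the throttle
maps to concurrent jobs, so treat the result as an estimate only.

```python
# Figures copied from the quoted thread; nothing here queries SGE or Swift.
sge_job_cores = [16, 16, 16, 16]  # CORE column of the four running SGE jobs
swift_active = 277                # "Active:" count in the Swift progress line
job_throttle = 10                 # karajan jobThrottle from sites.xml


def throttle_limit(throttle):
    """Estimated concurrent-job limit.

    Assumption (not confirmed in this thread): the limit is roughly
    throttle * 100 + 1.
    """
    return int(throttle * 100 + 1)


def oversubscription(active_jobs, cores):
    """Ratio of jobs Swift reports as active to cores held in SGE."""
    return active_jobs / cores


total_cores = sum(sge_job_cores)
ratio = oversubscription(swift_active, total_cores)

print("cores held in SGE:      ", total_cores)
print("jobs Swift calls active:", swift_active)
print("active jobs per core:    %.2f" % ratio)
print("estimated throttle limit:", throttle_limit(job_throttle))
```

Under these assumptions the throttle alone would not stop Swift from
marking far more jobs active than there are cores in the running blocks,
which matches the mismatch described in the subject line.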

-- 
Allan M. Espinosa <http://allan.88-mph.net/blog>
PhD student, Computer Science
University of Chicago <http://people.cs.uchicago.edu/~aespinosa>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bug05.tar.gz
Type: application/x-gzip
Size: 5444754 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-devel/attachments/20090617/539b3a5d/attachment.bin>

