[Swift-devel] more active processes than requested cores

Allan Espinosa aespinosa at cs.uchicago.edu
Wed Jun 17 14:22:55 CDT 2009


Oops, I forgot to include the wrapper logs.

This next attachment should have them.

2009/6/17 Allan Espinosa <aespinosa at cs.uchicago.edu>:
> I also get this after a while.
>
> Attached are the logs from when the workflow ended.  It did not actually
> finish, because the coaster got an out-of-memory error.  This does not
> happen when coasters are not used.
>
> 2009/6/17 Zhao Zhang <zhaozhang at uchicago.edu>:
>> Hi, All
>>
>> Here is something in my test case:
>>
>> Swift says:
>> Progress:  Selecting site:80  Submitted:828  Active:115  Finished in previous run:487  Finished successfully:295
>> Progress:  Selecting site:80  Submitted:828  Active:115  Finished in previous run:487  Finished successfully:295
>> Progress:  Selecting site:80  Submitted:828  Active:115  Finished in previous run:487  Finished successfully:295
>> Progress:  Selecting site:80  Submitted:828  Active:115  Finished in previous run:487  Finished successfully:295
>> Progress:  Selecting site:80  Submitted:828  Active:115  Finished in previous run:487  Finished successfully:295
>>
>> And showq -u says
>> login3% showq -u
>> ACTIVE JOBS--------------------------
>> JOBID     JOBNAME    USERNAME      STATE   CORE  REMAINING  STARTTIME
>> ================================================================================
>>
>>    0 active jobs :    0 of 3828 hosts (  0.00 %)
>>
>> Why are there no active SGE jobs when Swift says there are 115 active jobs?
>>
>> zhao
>>
>> Mihael Hategan wrote:
>>>
>>> On Tue, 2009-06-16 at 17:30 -0500, Allan Espinosa wrote:
>>>
>>>>
>>>> By the throttling parameters below, I do expect to have a thousand
>>>> jobs active at a time.  But shouldn't the coaster request larger
>>>> blocks to accommodate the 277 active jobs?
>>>>
>>>
>>> Not if they fit in existing blocks (either vertically or horizontally).
>>> This is something that should be thought of some more, but for short
>>> jobs it seems ok.
>>>
>>>
>>>>
>>>> sge snapshot:
>>>> ACTIVE JOBS--------------------------
>>>> JOBID     JOBNAME    USERNAME      STATE   CORE  REMAINING  STARTTIME
>>>> ================================================================================
>>>> 779616    data       tg802895      Running 16     00:36:01  Tue Jun 16 15:59:41
>>>> 779723    data       tg802895      Running 16     01:44:01  Tue Jun 16 17:07:41
>>>> 779724    data       tg802895      Running 16     01:44:01  Tue Jun 16 17:07:41
>>>> 779727    data       tg802895      Running 16     01:45:58  Tue Jun 16 17:09:38
>>>>
>>>>
>>>> swift session snippet
>>>> Progress:  Selecting site:38  Submitted:707  Active:278  Finished successfully:1861
>>>> Progress:  Selecting site:38  Submitted:707  Active:277  Checking status:1  Finished successfully:1861
>>>>
>>>>
>>>> sites.xml
>>>> <config>
>>>>  <pool handle="RANGER" >
>>>>    <gridftp  url="gsiftp://gridftp.ranger.tacc.teragrid.org" />
>>>>    <execution  provider="coaster" url="gatekeeper.ranger.tacc.teragrid.org" jobManager="gt2:gt2:SGE"/>
>>>>    <profile namespace="globus" key="project">TG-CCR080022N</profile>
>>>>    <workdirectory >/work/01035/tg802895/blast-runs</workdirectory>
>>>>    <profile namespace="globus" key="workersPerNode">16</profile>
>>>>    <profile namespace="globus" key="queue">development</profile>
>>>>    <profile namespace="globus" key="slots">4</profile>
>>>>    <profile namespace="globus" key="maxwalltime">00:30:00</profile>
>>>>    <profile namespace="globus" key="nodeGranularity">2</profile>
>>>>    <profile namespace="karajan" key="initialScore">2</profile>
>>>>    <profile namespace="karajan" key="jobThrottle">10</profile>
>>>>  </pool>
>>>> </config>
>>>>
>>>> I'll send the swift and coaster logs once the run finishes.
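
For anyone who wants to cross-check the numbers quoted above, here is a
rough Python sketch that pulls the Active count out of a Swift "Progress:"
line and sums the CORE column of the running jobs in the showq output. The
function names are made up for illustration and are not part of Swift or
SGE. The throttle formula (jobThrottle * 100 + 1) is the rule of thumb from
the Swift user guide as far as I recall it, so treat it as an assumption.
The comparison also assumes one coaster worker per core.

```python
import re


def throttle_limit(job_throttle: float) -> int:
    """Approximate per-site job limit (assumed rule: jobThrottle * 100 + 1)."""
    return int(job_throttle * 100) + 1


def parse_progress(line: str) -> dict:
    """Turn one Swift 'Progress:' line into a {state name: count} dict."""
    body = line.split("Progress:", 1)[1]
    pairs = re.finditer(r"([A-Za-z][A-Za-z ]*?):(\d+)", body)
    return {m.group(1).strip(): int(m.group(2)) for m in pairs}


def running_cores(showq_text: str) -> int:
    """Sum the CORE column over showq rows whose STATE is 'Running'."""
    total = 0
    for row in showq_text.splitlines():
        fields = row.split()
        # Job rows start with a numeric JOBID; headers and rules do not.
        if len(fields) >= 5 and fields[0].isdigit() and fields[3] == "Running":
            total += int(fields[4])
    return total


# Values copied from the snapshot and session snippet quoted above.
showq = """\
779616    data       tg802895      Running 16     00:36:01  Tue Jun 16 15:59:41
779723    data       tg802895      Running 16     01:44:01  Tue Jun 16 17:07:41
779724    data       tg802895      Running 16     01:44:01  Tue Jun 16 17:07:41
779727    data       tg802895      Running 16     01:45:58  Tue Jun 16 17:09:38
"""
progress = parse_progress(
    "Progress:  Selecting site:38  Submitted:707  Active:278  "
    "Finished successfully:1861"
)

active = progress["Active"]
cores = running_cores(showq)
print(f"Swift active jobs: {active}")
print(f"Cores held by running SGE jobs: {cores}")
print(f"Approximate throttle limit: {throttle_limit(10)}")
```

With the values above this reports 278 active jobs against 64 cores, with
a throttle limit of about 1001, which is the mismatch the thread is about.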
-------------- next part --------------
A non-text attachment was scrubbed...
Name: buf04.tar.gz
Type: application/x-gzip
Size: 5654046 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-devel/attachments/20090617/f74436c3/attachment.bin>

