[Swift-devel] more active processes than requested cores
Allan Espinosa
aespinosa at cs.uchicago.edu
Wed Jun 17 14:22:55 CDT 2009
Oops, I forgot to include the wrapper logs.
This attachment should have them.
2009/6/17 Allan Espinosa <aespinosa at cs.uchicago.edu>:
> I also get this after a while.
>
> Attached are the logs from when the workflow stopped. It did not actually
> finish, because the coaster got an out-of-memory error. This does not
> happen when coasters are not used.
>
> 2009/6/17 Zhao Zhang <zhaozhang at uchicago.edu>:
>> Hi, All
>>
>> Here is something in my test case:
>>
>> Swift says:
>> Progress: Selecting site:80 Submitted:828 Active:115 Finished in previous run:487 Finished successfully:295
>> Progress: Selecting site:80 Submitted:828 Active:115 Finished in previous run:487 Finished successfully:295
>> Progress: Selecting site:80 Submitted:828 Active:115 Finished in previous run:487 Finished successfully:295
>> Progress: Selecting site:80 Submitted:828 Active:115 Finished in previous run:487 Finished successfully:295
>> Progress: Selecting site:80 Submitted:828 Active:115 Finished in previous run:487 Finished successfully:295
>>
>> And showq -u says
>> login3% showq -u
>> ACTIVE JOBS--------------------------
>> JOBID JOBNAME USERNAME STATE CORE REMAINING STARTTIME
>> ================================================================================
>>
>> 0 active jobs : 0 of 3828 hosts ( 0.00 %)
>>
>> Why are there no active SGE jobs when Swift says there are 115 active jobs?
>>
>> zhao
>>
>> Mihael Hategan wrote:
>>>
>>> On Tue, 2009-06-16 at 17:30 -0500, Allan Espinosa wrote:
>>>
>>>>
>>>> Given the throttling parameters below, I do expect to have a thousand
>>>> jobs active at a time. But shouldn't the coaster request larger
>>>> blocks to accommodate the 277 active jobs?
>>>>
>>>
>>> Not if they fit in existing blocks (either vertically or horizontally).
>>> This is something that should be thought about some more, but for short
>>> jobs it seems ok.
>>>
>>>
>>>>
>>>> sge snapshot:
>>>> ACTIVE JOBS--------------------------
>>>> JOBID     JOBNAME  USERNAME  STATE    CORE  REMAINING  STARTTIME
>>>> ================================================================================
>>>> 779616    data     tg802895  Running  16    00:36:01   Tue Jun 16 15:59:41
>>>> 779723    data     tg802895  Running  16    01:44:01   Tue Jun 16 17:07:41
>>>> 779724    data     tg802895  Running  16    01:44:01   Tue Jun 16 17:07:41
>>>> 779727    data     tg802895  Running  16    01:45:58   Tue Jun 16 17:09:38
>>>>
>>>>
>>>> swift session snippet:
>>>> Progress: Selecting site:38 Submitted:707 Active:278 Finished successfully:1861
>>>> Progress: Selecting site:38 Submitted:707 Active:277 Checking status:1 Finished successfully:1861
>>>>
>>>>
>>>> sites.xml
>>>> <config>
>>>>   <pool handle="RANGER">
>>>>     <gridftp url="gsiftp://gridftp.ranger.tacc.teragrid.org" />
>>>>     <execution provider="coaster"
>>>>                url="gatekeeper.ranger.tacc.teragrid.org" jobManager="gt2:gt2:SGE" />
>>>>     <profile namespace="globus" key="project">TG-CCR080022N</profile>
>>>>     <workdirectory>/work/01035/tg802895/blast-runs</workdirectory>
>>>>     <profile namespace="globus" key="workersPerNode">16</profile>
>>>>     <profile namespace="globus" key="queue">development</profile>
>>>>     <profile namespace="globus" key="slots">4</profile>
>>>>     <profile namespace="globus" key="maxwalltime">00:30:00</profile>
>>>>     <profile namespace="globus" key="nodeGranularity">2</profile>
>>>>     <profile namespace="karajan" key="initialScore">2</profile>
>>>>     <profile namespace="karajan" key="jobThrottle">10</profile>
>>>>   </pool>
>>>> </config>
>>>>
>>>> I'll send the swift and coaster logs once the run finishes.
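
For reference, the mismatch discussed in this thread can be reproduced from the quoted output alone. The following is a minimal sketch, not part of Swift or SGE: it sums the CORE column of the showq snapshot quoted above and compares it with the Active count in the Swift progress line. The function names are hypothetical, and the throttle formula (roughly jobThrottle * 100 + 1 concurrent jobs per site) is an assumption based on my reading of the Swift user guide; it is at least consistent with the "thousand jobs" expectation stated above.

```python
import re

# Rows copied from the showq snapshot quoted above (wrapped start times rejoined).
SHOWQ_SNAPSHOT = """\
779616 data tg802895 Running 16 00:36:01 Tue Jun 16 15:59:41
779723 data tg802895 Running 16 01:44:01 Tue Jun 16 17:07:41
779724 data tg802895 Running 16 01:44:01 Tue Jun 16 17:07:41
779727 data tg802895 Running 16 01:45:58 Tue Jun 16 17:09:38
"""

# First progress line from the quoted Swift session (line wrap rejoined).
PROGRESS_LINE = (
    "Progress: Selecting site:38 Submitted:707 Active:278 "
    "Finished successfully:1861"
)


def running_cores(showq_text):
    """Sum the CORE column over rows whose STATE is Running."""
    total = 0
    for line in showq_text.splitlines():
        fields = line.split()
        # Assumed layout: JOBID JOBNAME USERNAME STATE CORE REMAINING STARTTIME
        if len(fields) >= 5 and fields[0].isdigit() and fields[3] == "Running":
            total += int(fields[4])
    return total


def progress_counts(line):
    """Parse the 'Name:count' pairs of a Swift progress line into a dict."""
    body = line.split(":", 1)[1]  # drop the leading "Progress:" label
    pairs = re.findall(r"([A-Za-z ]+?):(\d+)", body)
    return {name.strip(): int(count) for name, count in pairs}


def throttle_limit(job_throttle):
    """Assumed rule of thumb: about jobThrottle * 100 + 1 concurrent jobs."""
    return int(job_throttle * 100 + 1)


if __name__ == "__main__":
    cores = running_cores(SHOWQ_SNAPSHOT)
    active = progress_counts(PROGRESS_LINE)["Active"]
    print(f"cores held by running SGE jobs: {cores}")
    print(f"jobs Swift reports as active:   {active}")
    print(f"active jobs per core:           {active / cores:.2f}")
    print(f"limit implied by jobThrottle=10: {throttle_limit(10)}")
```

With the numbers quoted above, this reports 64 cores held by SGE jobs against 278 jobs that Swift counts as active, which is the discrepancy in the subject line. The throttle setting alone would allow far more active jobs than that, so it is not what limits the run here.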
-------------- next part --------------
A non-text attachment was scrubbed...
Name: buf04.tar.gz
Type: application/x-gzip
Size: 5654046 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-devel/attachments/20090617/f74436c3/attachment.bin>