[Swift-devel] more active processes than requested cores
Allan Espinosa
aespinosa at cs.uchicago.edu
Wed Jun 17 14:08:50 CDT 2009
I also get this after a while.
Attached are the logs from the end of the run. The workflow did not
actually finish, because the coaster hit an out-of-memory error. This
does not happen when coasters are not used.
2009/6/17 Zhao Zhang <zhaozhang at uchicago.edu>:
> Hi, All
>
> Here is something in my test case:
>
> Swift says:
> Progress: Selecting site:80 Submitted:828 Active:115 Finished in previous run:487 Finished successfully:295
> Progress: Selecting site:80 Submitted:828 Active:115 Finished in previous run:487 Finished successfully:295
> Progress: Selecting site:80 Submitted:828 Active:115 Finished in previous run:487 Finished successfully:295
> Progress: Selecting site:80 Submitted:828 Active:115 Finished in previous run:487 Finished successfully:295
> Progress: Selecting site:80 Submitted:828 Active:115 Finished in previous run:487 Finished successfully:295
>
> And showq -u says
> login3% showq -u
> ACTIVE JOBS--------------------------
> JOBID JOBNAME USERNAME STATE CORE REMAINING STARTTIME
> ================================================================================
>
> 0 active jobs : 0 of 3828 hosts ( 0.00 %)
>
> Why are there no active SGE jobs when Swift says there are 115 active jobs?
>
> zhao
>
> Mihael Hategan wrote:
>>
>> On Tue, 2009-06-16 at 17:30 -0500, Allan Espinosa wrote:
>>
>>>
>>> Given the throttling parameters below, I do expect to have a thousand
>>> jobs active at a time. But shouldn't the coaster request larger
>>> blocks to accommodate the 277 active jobs?
>>>
>>
>> Not if they fit in existing blocks (either vertically or horizontally).
>> This needs some more thought, but for short jobs it seems OK.
>>
>>
>>>
>>> sge snapshot:
>>> ACTIVE JOBS--------------------------
>>> JOBID   JOBNAME  USERNAME  STATE    CORE  REMAINING  STARTTIME
>>> ================================================================================
>>> 779616  data     tg802895  Running  16    00:36:01   Tue Jun 16 15:59:41
>>> 779723  data     tg802895  Running  16    01:44:01   Tue Jun 16 17:07:41
>>> 779724  data     tg802895  Running  16    01:44:01   Tue Jun 16 17:07:41
>>> 779727  data     tg802895  Running  16    01:45:58   Tue Jun 16 17:09:38
>>>
>>>
>>> swift session snippet
>>> Progress: Selecting site:38 Submitted:707 Active:278 Finished successfully:1861
>>> Progress: Selecting site:38 Submitted:707 Active:277 Checking status:1 Finished successfully:1861
>>>
>>>
>>> sites.xml
>>> <config>
>>> <pool handle="RANGER" >
>>> <gridftp url="gsiftp://gridftp.ranger.tacc.teragrid.org" />
>>> <execution provider="coaster"
>>> url="gatekeeper.ranger.tacc.teragrid.org" jobManager="gt2:gt2:SGE"/>
>>> <profile namespace="globus" key="project">TG-CCR080022N</profile>
>>> <workdirectory >/work/01035/tg802895/blast-runs</workdirectory>
>>> <profile namespace="globus" key="workersPerNode">16</profile>
>>> <profile namespace="globus" key="queue">development</profile>
>>> <profile namespace="globus" key="slots">4</profile>
>>> <profile namespace="globus" key="maxwalltime">00:30:00</profile>
>>> <profile namespace="globus" key="nodeGranularity">2</profile>
>>> <profile namespace="karajan" key="initialScore">2</profile>
>>> <profile namespace="karajan" key="jobThrottle">10</profile>
>>> </pool>
>>> </config>
>>>
>>> I'll send the Swift and coaster logs once the run finishes.
>>>
>>> -Allan
>>>
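
For anyone who wants to check the mismatch quickly, here is a minimal
sketch that sums the CORE column of the showq snapshot quoted above and
compares it with the Active count from a Swift progress line. It assumes
showq rows are whitespace-separated with the columns in the order shown
in the header. The function names cores_in_use and active_jobs are made
up for this example and are not part of Swift or SGE.

```python
import re


def cores_in_use(showq_text):
    """Sum the CORE column over showq rows whose STATE is Running."""
    total = 0
    for line in showq_text.splitlines():
        fields = line.split()
        # Job rows look like: JOBID JOBNAME USERNAME STATE CORE REMAINING ...
        # (assumed column order, taken from the header in the snapshot)
        if len(fields) >= 5 and fields[0].isdigit() and fields[3] == "Running":
            total += int(fields[4])
    return total


def active_jobs(progress_line):
    """Extract the Active:N counter from a Swift progress line (0 if absent)."""
    match = re.search(r"Active:(\d+)", progress_line)
    return int(match.group(1)) if match else 0


# Data copied from the snapshot and progress line quoted in this thread.
SHOWQ = """\
JOBID JOBNAME USERNAME STATE CORE REMAINING STARTTIME
779616 data tg802895 Running 16 00:36:01 Tue Jun 16 15:59:41
779723 data tg802895 Running 16 01:44:01 Tue Jun 16 17:07:41
779724 data tg802895 Running 16 01:44:01 Tue Jun 16 17:07:41
779727 data tg802895 Running 16 01:45:58 Tue Jun 16 17:09:38
"""
PROGRESS = ("Progress: Selecting site:38 Submitted:707 Active:278 "
            "Finished successfully:1861")

cores = cores_in_use(SHOWQ)
active = active_jobs(PROGRESS)
print("cores held in SGE:", cores)        # 4 jobs x 16 cores = 64
print("jobs Swift calls active:", active)  # 278
print("excess:", active - cores)           # 214
```

With the numbers from this thread, Swift reports 278 active jobs while
the four SGE jobs hold only 64 cores.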
--
Allan M. Espinosa <http://allan.88-mph.net/blog>
PhD student, Computer Science
University of Chicago <http://people.cs.uchicago.edu/~aespinosa>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bug05.tar.gz
Type: application/x-gzip
Size: 5444754 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-devel/attachments/20090617/539b3a5d/attachment.bin>