[Swift-devel] more active processes than requested cores
Mihael Hategan
hategan at mcs.anl.gov
Thu Jun 18 07:14:47 CDT 2009
Ok. This is getting messy, and I need to be able to reproduce it.
I suggest testing with one of the existing workflows, such as
066-many.swift, and if that does not trigger the problem, a custom
version of it with /bin/sleep instead. If that fails too, I'll need
access to your blast installation.
I also need to know if this is an intermittent issue or not, so testing
more than once would be desirable.
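To compare repeated runs the same way each time, here is a minimal Python sketch (not part of the original thread). It parses a Swift `Progress:` line, sums the CORE column of `Running` rows from `showq`, and flags runs where `Active` exceeds the cores actually held. The helper names are hypothetical. Two points are assumptions to verify: that each coaster worker runs one Swift job at a time, and that a site's concurrency cap is 100 × jobThrottle + 1, the rule described in the Swift user guide.

```python
import re

def parse_progress(line: str) -> dict:
    """Parse a Swift 'Progress:' status line into {state name: count}."""
    body = line.split("Progress:", 1)[1]
    # State names contain letters and spaces ("Finished in previous run"),
    # each followed by ':' and an integer count.
    pairs = re.findall(r"\s*([A-Za-z][A-Za-z ]*?):(\d+)", body)
    return {name: int(count) for name, count in pairs}

def busy_cores(showq_rows) -> int:
    """Sum the CORE column over 'Running' rows of showq output.

    Assumes the column order JOBID JOBNAME USERNAME STATE CORE ...
    as in the snapshots quoted in this thread; header rows are skipped.
    """
    total = 0
    for row in showq_rows:
        fields = row.split()
        if len(fields) >= 5 and fields[3] == "Running":
            total += int(fields[4])
    return total

def job_throttle_limit(job_throttle: float) -> int:
    # Assumption: max concurrent jobs per site = 100 * jobThrottle + 1.
    return round(100 * job_throttle) + 1

if __name__ == "__main__":
    progress = parse_progress(
        "Progress: Selecting site:38 Submitted:707 Active:277 "
        "Checking status:1 Finished successfully:1861")
    rows = [
        "779616 data tg802895 Running 16 00:36:01 Tue Jun 16 15:59:41",
        "779723 data tg802895 Running 16 01:44:01 Tue Jun 16 17:07:41",
        "779724 data tg802895 Running 16 01:44:01 Tue Jun 16 17:07:41",
        "779727 data tg802895 Running 16 01:45:58 Tue Jun 16 17:09:38",
    ]
    active, cores = progress["Active"], busy_cores(rows)
    print(f"Active={active} cores={cores} throttle cap={job_throttle_limit(10)}")
    if active > cores:
        print("WARNING: more active Swift jobs than cores held in SGE")
```

Running this once per test and keeping the output would make it easy to see whether the mismatch is intermittent.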
On Wed, 2009-06-17 at 14:08 -0500, Allan Espinosa wrote:
> I also get this after a while.
>
> Attached are the logs from when the workflow ended. Actually, it did not
> finish: the coaster service ran out of memory. This does not
> happen when coasters are not used.
>
> 2009/6/17 Zhao Zhang <zhaozhang at uchicago.edu>:
> > Hi, All
> >
> > Here is something in my test case:
> >
> > Swift says:
> > Progress: Selecting site:80 Submitted:828 Active:115 Finished in previous run:487 Finished successfully:295
> > Progress: Selecting site:80 Submitted:828 Active:115 Finished in previous run:487 Finished successfully:295
> > Progress: Selecting site:80 Submitted:828 Active:115 Finished in previous run:487 Finished successfully:295
> > Progress: Selecting site:80 Submitted:828 Active:115 Finished in previous run:487 Finished successfully:295
> > Progress: Selecting site:80 Submitted:828 Active:115 Finished in previous run:487 Finished successfully:295
> >
> > And showq -u says
> > login3% showq -u
> > ACTIVE JOBS--------------------------
> > JOBID JOBNAME USERNAME STATE CORE REMAINING STARTTIME
> > ================================================================================
> >
> > 0 active jobs : 0 of 3828 hosts ( 0.00 %)
> >
> > Why are there no active SGE jobs when Swift says there are 115 active jobs?
> >
> > zhao
> >
> > Mihael Hategan wrote:
> >>
> >> On Tue, 2009-06-16 at 17:30 -0500, Allan Espinosa wrote:
> >>
> >>>
> >>> Given the throttling parameters below, I do expect to have about a thousand
> >>> jobs active at a time. But shouldn't the coaster service request larger
> >>> blocks to accommodate the 277 active jobs?
> >>>
> >>
> >> Not if they fit in existing blocks (either vertically or horizontally).
> >> This is something that should be thought of some more, but for short
> >> jobs it seems ok.
> >>
> >>
> >>>
> >>> sge snapshot:
> >>> ACTIVE JOBS--------------------------
> >>> JOBID JOBNAME USERNAME STATE CORE REMAINING STARTTIME
> >>> ================================================================================
> >>> 779616 data tg802895 Running 16 00:36:01 Tue Jun 16 15:59:41
> >>> 779723 data tg802895 Running 16 01:44:01 Tue Jun 16 17:07:41
> >>> 779724 data tg802895 Running 16 01:44:01 Tue Jun 16 17:07:41
> >>> 779727 data tg802895 Running 16 01:45:58 Tue Jun 16 17:09:38
> >>>
> >>>
> >>>
> >>>
> >>> Swift session snippet:
> >>> Progress: Selecting site:38 Submitted:707 Active:278 Finished successfully:1861
> >>> Progress: Selecting site:38 Submitted:707 Active:277 Checking status:1 Finished successfully:1861
> >>>
> >>>
> >>> sites.xml
> >>> <config>
> >>> <pool handle="RANGER" >
> >>> <gridftp url="gsiftp://gridftp.ranger.tacc.teragrid.org" />
> >>> <execution provider="coaster"
> >>> url="gatekeeper.ranger.tacc.teragrid.org" jobManager="gt2:gt2:SGE"/>
> >>> <profile namespace="globus" key="project">TG-CCR080022N</profile>
> >>> <workdirectory >/work/01035/tg802895/blast-runs</workdirectory>
> >>> <profile namespace="globus" key="workersPerNode">16</profile>
> >>> <profile namespace="globus" key="queue">development</profile>
> >>> <profile namespace="globus" key="slots">4</profile>
> >>> <profile namespace="globus" key="maxwalltime">00:30:00</profile>
> >>> <profile namespace="globus" key="nodeGranularity">2</profile>
> >>> <profile namespace="karajan" key="initialScore">2</profile>
> >>> <profile namespace="karajan" key="jobThrottle">10</profile>
> >>> </pool>
> >>> </config>
> >>>
> >>> I'll send the Swift and coaster logs once the run finishes.
> >>>
> >>> -Allan
> >>>
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel