[Swift-devel] more active processes than requested cores

Mihael Hategan hategan at mcs.anl.gov
Thu Jun 18 07:14:47 CDT 2009


Ok. This is getting messy, and I need to be able to reproduce it.

I suggest testing with one of the existing workflows, such as
066-many.swift, and if that does not trigger the problem, a custom
version of it with /bin/sleep instead. If that fails too, I'll need
access to your blast installation.

I also need to know if this is an intermittent issue or not, so testing
more than once would be desirable.
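Since the open question is whether the problem is intermittent, each repeated run could be checked for the symptom reported below: Swift's Active count exceeding the cores the scheduler shows as running. This is a minimal Python sketch. It assumes the Progress-line and showq output formats quoted in this thread. The function names are hypothetical, and the sample data is copied from the messages below.

```python
import re


def parse_progress(line):
    """Map each 'Label:count' field of a Swift progress line to its count."""
    # Fields look like 'Submitted:828' or 'Finished in previous run:487';
    # the leading 'Progress:' has no count after it, so the regex skips it.
    pairs = re.findall(r"([A-Za-z][A-Za-z ]*?):(\d+)", line)
    return {label.strip(): int(count) for label, count in pairs}


def running_cores(showq_text):
    """Sum the CORE column over the Running rows of showq output."""
    cores = 0
    for row in showq_text.splitlines():
        fields = row.split()
        # Job rows: JOBID JOBNAME USERNAME STATE CORE REMAINING STARTTIME...
        # The '0 active jobs : ...' summary line fails the STATE check.
        if len(fields) >= 5 and fields[0].isdigit() and fields[3] == "Running":
            cores += int(fields[4])
    return cores


def oversubscribed(progress_line, showq_text):
    """Return (active, cores) when Swift's Active count exceeds the
    cores the scheduler reports as running, otherwise None."""
    active = parse_progress(progress_line).get("Active", 0)
    cores = running_cores(showq_text)
    return (active, cores) if active > cores else None


if __name__ == "__main__":
    # Zhao's case: showq lists no jobs while Swift reports Active:115.
    zhao_progress = ("Progress:  Selecting site:80  Submitted:828  Active:115  "
                     "Finished in previous run:487  Finished successfully:295")
    zhao_showq = "   0 active jobs :    0 of 3828 hosts (  0.00 %)"
    print(oversubscribed(zhao_progress, zhao_showq))  # (115, 0)

    # Allan's case: four 16-core SGE jobs while Swift reports Active:277.
    allan_progress = ("Progress:  Selecting site:38  Submitted:707  Active:277  "
                      "Checking status:1  Finished successfully:1861")
    allan_showq = """\
779616    data       tg802895      Running 16     00:36:01  Tue Jun 16 15:59:41
779723    data       tg802895      Running 16     01:44:01  Tue Jun 16 17:07:41
779724    data       tg802895      Running 16     01:44:01  Tue Jun 16 17:07:41
779727    data       tg802895      Running 16     01:45:58  Tue Jun 16 17:09:38
"""
    print(oversubscribed(allan_progress, allan_showq))  # (277, 64)
```

For the comparison to mean anything, both inputs should be captured at about the same time, for example the latest Swift progress line and a `showq -u` taken right after it.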

On Wed, 2009-06-17 at 14:08 -0500, Allan Espinosa wrote:
> I also get this after a while.
> 
> Attached are the logs from when the workflow stopped. Actually, it did
> not finish, because the coaster got an out-of-memory error. This does
> not happen when coasters are not used.
> 
> 2009/6/17 Zhao Zhang <zhaozhang at uchicago.edu>:
> > Hi, All
> >
> > Here is something in my test case:
> >
> > Swift says:
> > Progress:  Selecting site:80  Submitted:828  Active:115  Finished in previous run:487  Finished successfully:295
> > Progress:  Selecting site:80  Submitted:828  Active:115  Finished in previous run:487  Finished successfully:295
> > Progress:  Selecting site:80  Submitted:828  Active:115  Finished in previous run:487  Finished successfully:295
> > Progress:  Selecting site:80  Submitted:828  Active:115  Finished in previous run:487  Finished successfully:295
> > Progress:  Selecting site:80  Submitted:828  Active:115  Finished in previous run:487  Finished successfully:295
> >
> > And showq -u says
> > login3% showq -u
> > ACTIVE JOBS--------------------------
> > JOBID     JOBNAME    USERNAME      STATE   CORE  REMAINING  STARTTIME
> > ================================================================================
> >
> >    0 active jobs :    0 of 3828 hosts (  0.00 %)
> >
> > Why are there no active SGE jobs while Swift says there are 115 active jobs?
> >
> > zhao
> >
> > Mihael Hategan wrote:
> >>
> >> On Tue, 2009-06-16 at 17:30 -0500, Allan Espinosa wrote:
> >>
> >>>
> >>> Given the throttling parameters below, I do expect to have a thousand
> >>> jobs active at a time. But shouldn't the coaster request larger
> >>> blocks to accommodate the 277 active jobs?
> >>>
> >>
> >> Not if they fit in existing blocks (either vertically or horizontally).
> >> This deserves more thought, but for short jobs it seems OK.
> >>
> >>
> >>>
> >>> SGE snapshot:
> >>> ACTIVE JOBS--------------------------
> >>> JOBID     JOBNAME    USERNAME      STATE   CORE  REMAINING  STARTTIME
> >>> ================================================================================
> >>> 779616    data       tg802895      Running 16     00:36:01  Tue Jun 16 15:59:41
> >>> 779723    data       tg802895      Running 16     01:44:01  Tue Jun 16 17:07:41
> >>> 779724    data       tg802895      Running 16     01:44:01  Tue Jun 16 17:07:41
> >>> 779727    data       tg802895      Running 16     01:45:58  Tue Jun 16 17:09:38
> >>>
> >>>
> >>> Swift session snippet:
> >>> Progress:  Selecting site:38  Submitted:707  Active:278  Finished successfully:1861
> >>> Progress:  Selecting site:38  Submitted:707  Active:277  Checking status:1  Finished successfully:1861
> >>>
> >>>
> >>> sites.xml
> >>> <config>
> >>>  <pool handle="RANGER" >
> >>>    <gridftp  url="gsiftp://gridftp.ranger.tacc.teragrid.org" />
> >>>    <execution  provider="coaster" url="gatekeeper.ranger.tacc.teragrid.org" jobManager="gt2:gt2:SGE"/>
> >>>    <profile namespace="globus" key="project">TG-CCR080022N</profile>
> >>>    <workdirectory >/work/01035/tg802895/blast-runs</workdirectory>
> >>>    <profile namespace="globus" key="workersPerNode">16</profile>
> >>>    <profile namespace="globus" key="queue">development</profile>
> >>>    <profile namespace="globus" key="slots">4</profile>
> >>>    <profile namespace="globus" key="maxwalltime">00:30:00</profile>
> >>>    <profile namespace="globus" key="nodeGranularity">2</profile>
> >>>    <profile namespace="karajan" key="initialScore">2</profile>
> >>>    <profile namespace="karajan" key="jobThrottle">10</profile>
> >>>  </pool>
> >>> </config>
> >>>
> >>> I'll send the Swift and coaster logs once the run finishes.
> >>>
> >>> -Allan
> >>>
> 
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
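Allan's "a thousand jobs" expectation follows from the jobThrottle setting in the sites.xml above. This is a minimal Python sketch that reads the RANGER pool's profiles and applies the throttle arithmetic. It assumes the rule described in the Swift user guide, that a site's concurrent-job limit is jobThrottle × 100 + 1. The helper names are hypothetical. The resulting limit only caps how many jobs Swift will hand to the site. It says nothing about how many cores the coaster blocks actually hold, and that gap is the discrepancy discussed in this thread.

```python
import xml.etree.ElementTree as ET

# Allan's sites.xml from the thread, with the wrapped <execution> tag rejoined.
SITES_XML = """\
<config>
 <pool handle="RANGER" >
   <gridftp  url="gsiftp://gridftp.ranger.tacc.teragrid.org" />
   <execution  provider="coaster" url="gatekeeper.ranger.tacc.teragrid.org" jobManager="gt2:gt2:SGE"/>
   <profile namespace="globus" key="project">TG-CCR080022N</profile>
   <workdirectory >/work/01035/tg802895/blast-runs</workdirectory>
   <profile namespace="globus" key="workersPerNode">16</profile>
   <profile namespace="globus" key="queue">development</profile>
   <profile namespace="globus" key="slots">4</profile>
   <profile namespace="globus" key="maxwalltime">00:30:00</profile>
   <profile namespace="globus" key="nodeGranularity">2</profile>
   <profile namespace="karajan" key="initialScore">2</profile>
   <profile namespace="karajan" key="jobThrottle">10</profile>
 </pool>
</config>
"""


def pool_profiles(xml_text, handle):
    """Return {(namespace, key): value} for the <profile> elements of one pool."""
    root = ET.fromstring(xml_text)
    for pool in root.iter("pool"):
        if pool.get("handle") == handle:
            return {(p.get("namespace"), p.get("key")): (p.text or "").strip()
                    for p in pool.iter("profile")}
    raise KeyError(handle)


def job_limit(job_throttle):
    """Concurrent jobs Swift allows on a site for a given jobThrottle.

    ASSUMPTION: follows the Swift user guide's rule
    limit = jobThrottle * 100 + 1 (for example, 0.2 gives 21).
    """
    return round(job_throttle * 100) + 1


if __name__ == "__main__":
    profiles = pool_profiles(SITES_XML, "RANGER")
    throttle = float(profiles[("karajan", "jobThrottle")])
    print(job_limit(throttle))  # 1001
    print(profiles[("globus", "workersPerNode")],
          profiles[("globus", "slots")])  # 16 4
```

With jobThrottle set to 10, the sketch gives a limit of 1001 jobs. That is far above the 64 cores held by the four 16-core SGE jobs in Allan's snapshot, so the throttle alone does not prevent Swift from reporting more active jobs than there are cores.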



