[Swift-devel] Suggested test for multiple local coaster pools

Michael Wilde wilde at mcs.anl.gov
Wed Jul 10 09:31:37 CDT 2013


Yadu, this bug could use a test (and User Guide clarification): ensure that two pools on the same local cluster can each have unique attributes (most notably, jobsPerNode).

This could use the same techniques we discussed to ensure that jobsPerNode itself is working properly.
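One way such a test could work (a sketch only; the `tasks.log` name and its "host start end" record format are assumptions, produced by a hypothetical wrapper around each app call): run the same app through both pools and compute the peak number of tasks that overlapped in time on each node. A pool configured with jobsPerNode=1 should never show a peak above 1.

```shell
# Hypothetical tasks.log: one "host start-epoch end-epoch" record per app
# invocation, written by an assumed per-task wrapper.
cat > tasks.log <<'EOF'
nid001 100 110
nid001 105 115
nid002 100 110
EOF

# Report the peak number of time-overlapping tasks per node.
# For a pool with jobsPerNode=1, every node's peak should be 1.
peaks=$(awk '{ h[NR]=$1; s[NR]=$2; e[NR]=$3 }
END {
  for (i = 1; i <= NR; i++) {
    c = 0
    for (j = 1; j <= NR; j++)
      if (h[j] == h[i] && s[j] < e[i] && e[j] > s[i]) c++
    if (c > max[h[i]]) max[h[i]] = c
  }
  for (n in max) print n, max[n]
}' tasks.log | sort)
echo "$peaks"
```

In the sample data, nid001 ran two overlapping tasks and nid002 only one, so a jobsPerNode=1 pool producing that log would fail the check on nid001.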

- Mike


----- Forwarded Message -----
From: "Michael Wilde" <wilde at mcs.anl.gov>
To: swift-support at ci.uchicago.edu
Sent: Wednesday, July 10, 2013 9:28:35 AM
Subject: Re: [Swift Support #23172] Issue with using multiple pools


This symptom was discussed in Swift bug report 869:

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=869

As I understand Mihael's resolution to this bug, in order to use multiple pools per host you should add the "url=" option to each pool, like this:

For pool 1:
<execution provider="coaster" jobmanager="local:local" url="localhost:1" />

For pool 2:
<execution provider="coaster" jobmanager="local:local" url="localhost:2" />
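Putting the url= option together with distinct jobsPerNode values, a two-pool sites file for the same cluster might look like this (a sketch only; jobsPerNode=8 is illustrative, and the remaining profile keys from your existing pools would carry over unchanged):

```xml
<config>
  <pool handle="pbs">
    <!-- a unique url= forces a separate coaster service for this pool -->
    <execution provider="coaster" jobmanager="local:pbs" url="localhost:1"/>
    <profile namespace="globus" key="jobsPerNode">8</profile>
    <!-- remaining profile entries as in the original pool -->
  </pool>
  <pool handle="one">
    <execution provider="coaster" jobmanager="local:pbs" url="localhost:2"/>
    <profile namespace="globus" key="jobsPerNode">1</profile>
    <!-- remaining profile entries as in the original pool -->
  </pool>
</config>
```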

Could you try that and let us know if it works or not?

Thanks,

- Mike


----- Original Message -----
> From: "Mike Wilde" <swift-support at ci.uchicago.edu>
> Sent: Wednesday, July 10, 2013 9:21:48 AM
> Subject: Re: [Swift Support #23172] Issue with using multiple pools
> 
> The following addresses are receiving this ticket:
>  pittjj at uchicago.edu swift-support at ci.uchicago.edu
> lpesce at ci.uchicago.edu
> 
> Jason, I think we have seen cases where two pools served by the
> same coaster service can only support one value of jobsPerNode.
> 
> I recall a fix provided by Mihael that forces coasters to use one
> service per pool by specifying unique URLs for each pool. (By
> default, I think it uses one service per host/site.) I'll try to
> hunt that down to see whether it explains (and fixes) what you are
> seeing.
> 
> Mihael, or anyone familiar with this problem, can you clarify?
> 
> Thanks,
> 
> - Mike
> 
> ----- Original Message -----
> > From: "Jason J. Pitt" <swift-support at ci.uchicago.edu>
> > Sent: Wednesday, July 10, 2013 1:09:24 AM
> > Subject: [Swift Support #23172] Issue with using multiple pools
> > 
> > 
> > Wed Jul 10 01:09:23 2013: Request 23172 was acted upon.
> >  Transaction: Ticket created by pittjj at uchicago.edu
> >        Queue: swift-support
> >      Subject: Issue with using multiple pools
> >        Owner: Nobody
> >   Requestors: pittjj at uchicago.edu
> >       Status: new
> >  Ticket <URL:
> >  https://rt.ci.uchicago.edu/Ticket/Display.html?id=23172
> >  >
> > 
> > 
> > Hi Everyone,
> > 
> > I'm having issues with using multiple pools within my Swift script.
> > Without going into the details (I believe we have discussed why
> > multiple pools are necessary on this forum before), I need to run
> > one of my apps at 1 job per node (handle=one), while the remainder
> > can run at 8 per node (handle=pbs). Right now we are using Beagle
> > for the computations.
> > 
> > The issue is that even when I specify the "one" pool to run one
> > job per node in the .tc and .xml files, it seems to run multiple
> > jobs on the same node (jobs fail at that app with OOM and
> > bad-allocation errors). Notably, if I specify the pbs pool (which
> > usually runs at 4-8 per node) to run only 1 job per node, the code
> > runs to completion with no errors. Below I have copied my .xml
> > file and part of my .tc file. Is there anything that I may be
> > missing to make the pooling work properly? Since this is a very
> > brief introduction to the issue, Lorenzo and I welcome any
> > questions.
> > 
> > .xml file
> > ------------------------------------------------
> > <config>
> >   <pool handle="pbs">
> >     <execution provider="coaster" jobManager="local:pbs"/>
> >     <!-- replace with your project -->
> >     <profile namespace="globus" key="project">${PROJECT}</profile>
> > 
> >     <profile namespace="globus"
> >     key="providerAttributes">${provider_attributes}</profile>
> >     $queueLine
> > 
> >     <profile namespace="globus"
> >     key="jobsPerNode">${jobsPerNode}</profile>
> >     <profile namespace="globus" key="maxTime">${walltime}</profile>
> >     <profile namespace="globus"
> >     key="maxwalltime">${apptime}</profile>
> >     <profile namespace="globus"
> >     key="lowOverallocation">100</profile>
> >     <profile namespace="globus"
> >     key="highOverallocation">100</profile>
> > 
> >     <profile namespace="globus" key="slots">${numnodes}</profile>
> >     <profile namespace="globus" key="nodeGranularity">1</profile>
> >     <profile namespace="globus" key="maxNodes">1</profile>
> > 
> >     <profile namespace="karajan"
> >     key="jobThrottle">${jobThrottle}</profile>
> >     <profile namespace="karajan" key="initialScore">10000</profile>
> > 
> >     <filesystem provider="local"/>
> >     <workdirectory>${swiftworkdir}</workdirectory>
> >   </pool>
> >   
> >   <pool handle="one">
> >     <execution provider="coaster" jobManager="local:pbs"/>
> >     <!-- replace with your project -->
> >     <profile namespace="globus" key="project">${PROJECT}</profile>
> > 
> >     <profile namespace="globus"
> >     key="providerAttributes">${provider_attributes}</profile>
> >     $queueLine
> > 
> >     <profile namespace="globus" key="jobsPerNode">1</profile>
> >     <profile namespace="globus" key="maxTime">${walltime}</profile>
> >     <profile namespace="globus"
> >     key="maxwalltime">${apptime}</profile>
> >     <profile namespace="globus"
> >     key="lowOverallocation">100</profile>
> >     <profile namespace="globus"
> >     key="highOverallocation">100</profile>
> > 
> >     <profile namespace="globus" key="slots">${numnodes}</profile>
> >     <profile namespace="globus" key="nodeGranularity">1</profile>
> >     <profile namespace="globus" key="maxNodes">1</profile>
> > 
> >     <profile namespace="karajan"
> >     key="jobThrottle">${jobThrottle}</profile>
> >     <profile namespace="karajan" key="initialScore">10000</profile>
> > 
> >     <filesystem provider="local"/>
> >     <workdirectory>${swiftworkdir}</workdirectory>
> >   </pool>
> > </config>
> > --------------------------------------------
> > 
> > 
> > .tc file
> > _______________________________
> > # sitename  transformation path
> > pbs   echo           /bin/echo
> > pbs   cat            /bin/cat
> > pbs   ls             /bin/ls
> > pbs   grep           /bin/grep
> > pbs   sort           /bin/sort
> > pbs   paste          /bin/paste
> > pbs   cp             /bin/cp
> > pbs   touch          /bin/touch
> > pbs   wc             /usr/bin/wc
> > 
> > # custom entries
> > pbs   flagstatWrapper ${flagstatWrapper} INSTALLED  AMD64::LINUX
> >  ENV::TMP="$LUSTRE_TMP";GLOBUS::maxwalltime="5:00:00"
> > pbs   coverageBedWrapper ${coverageBedWrapper} INSTALLED
> >  AMD64::LINUX  ENV::TMP="$LUSTRE_TMP";GLOBUS::maxwalltime="5:00:00"
> > one   mosaikAlnBam2FastqWrapper ${mosaikAlnBam2FastqWrapper}
> > INSTALLED  AMD64::LINUX
> >  ENV::TMP="$LUSTRE_TMP";GLOBUS::maxwalltime="17:40:00"
> > -----------------------------------------------
> > 
> > Also, I'm not sure if it is worth noting, but the test runs were
> > completed using a 20-node reservation on Beagle. numnodes=20
> > (slots) was given for both pools. I'm not sure whether this has
> > anything to do with the behavior we're seeing.
> > 
> > Thanks for your help!
> > 
> > Best,
> > 
> > Jason
> > 
> 


