[Swift-devel] persistent coaster service

Mihael Hategan hategan at mcs.anl.gov
Mon Aug 9 18:17:11 CDT 2010


On Mon, 2010-08-09 at 18:14 -0500, Allan Espinosa wrote:
> So the url should be "bridled.ci.uchicago.edu", since I run the service
> there. But this same field is also used for spawning the workers, unless it
> specifies "manual coasters", right?

Right.
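
As an untested sketch, this is the Firefly pool entry from the quoted
sites.xml with only the url changed to the host where the service was
started. The host name is the one given above, the profile entries are
left out for brevity, and nothing here has been checked against a running
service.

```xml
<!-- Sketch only: the Firefly pool from the quoted sites.xml, with url
     pointed at the host where bin/coaster-service was started. -->
<pool handle="Firefly">
  <!-- Note: this same url is also used for spawning the workers through
       the jobmanager below, unless manual coasters are specified. -->
  <execution provider="coaster-persistent" url="bridled.ci.uchicago.edu"
    jobmanager="gt2:gt2:pbs" />

  <gridftp url="gsiftp://ff-grid3.unl.edu"/>
  <workdirectory>/panfs/panasas/CMS/data/engage-scec/swift_scratch</workdirectory>
</pool>
```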

> 
> -Allan
> 
> On Mon, Aug 09, 2010 at 06:07:40PM -0500, Mihael Hategan wrote:
> > On Mon, 2010-08-09 at 18:03 -0500, Allan Espinosa wrote:
> > > Ah, so the persistent coaster service is meant to run with the manual workers?
> > 
> > No. It's like, say, GRAM, in that you need to start a service on some
> > head node, and you need to supply the URL of that head node in
> > sites.xml.
> > 
> > It won't start the service automatically.
> > 
> > > 
> > > -Allan
> > > 
> > > On Mon, Aug 09, 2010 at 05:36:59PM -0500, Mihael Hategan wrote:
> > > > ff-grid2.unl.edu is the url you are supplying in sites.xml. It's
> > > > connecting to that. Though I'm surprised it works given that you are
> > > > implying that there is no service running there.
> > > > 
> > > > On Mon, 2010-08-09 at 17:09 -0500, Allan Espinosa wrote:
> > > > > I tried it today on OSG. The coaster service was run on bridled.ci. But from
> > > > > the session below, it looks like it's connecting to the site headnode instead:
> > > > > 
> > > > > RunID: coaster
> > > > > Progress:
> > > > > Progress:  uninitialized:1  Selecting site:675  Initializing site shared directory:1
> > > > > Progress:  Initializing:2  Selecting site:1444  Initializing site shared directory:1
> > > > > Progress:  uninitialized:1  Selecting site:2499  Initializing site shared directory:1
> > > > > Progress:  uninitialized:1  Selecting site:3818  Initializing site shared directory:1
> > > > > Progress:  uninitialized:1  Initializing:1  Selecting site:4201  Initializing site shared directory:1
> > > > > Progress:  Initializing:1  Selecting site:3  Stage in:4202
> > > > > Progress:  uninitialized:1  Initializing:1  Selecting site:5  Submitting:4202
> > > > > Progress:  Initializing:1  Selecting site:6  Stage in:2  Submitting:4202
> > > > > Find: https://ff-grid2.unl.edu:1984
> > > > > Find:  keepalive(120), reconnect - https://ff-grid2.unl.edu:1984
> > > > > Progress:  Initializing:2  Selecting site:6  Stage in:144  Submitting:4303  Failed but can retry:16
> > > > > Progress:  Initializing:2  Selecting site:31  Stage in:80  Submitting:4945  Failed but can retry:54
> > > > > Progress:  Initializing:1  Selecting site:6  Stage in:2  Submitting:5222  Failed but can retry:68
> > > > > Progress:  Initializing:1  Selecting site:6  Stage in:1  Submitting:5686  Submitted:1  Failed but can retry:95
> > > > > ...
> > > > > ...
> > > > > 
> > > > > Corresponding log entry (IMO):
> > > > > 2010-08-09 17:01:31,690-0500 WARN  RemoteConfiguration Find: https://ff-grid2.unl.edu:1984
> > > > > 2010-08-09 17:01:31,690-0500 WARN  RemoteConfiguration Find:  keepalive(120), reconnect - https://ff-grid2.unl.edu:1984
> > > > > 
> > > > > 
> > > > > 
> > > > > sites.xml
> > > > >   <pool handle="Firefly">
> > > > >     <execution provider="coaster-persistent" url="ff-grid2.unl.edu"
> > > > >       jobmanager="gt2:gt2:pbs" />
> > > > > 
> > > > >     <profile namespace="globus" key="maxTime">86400</profile>
> > > > >     <profile namespace="globus" key="maxNodes">1290</profile>
> > > > >     <profile namespace="globus" key="spread">0.8</profile>
> > > > >     <profile namespace="globus" key="slots">10</profile>
> > > > >     <profile namespace="globus" key="lowOverallocation">20</profile>
> > > > >     <profile namespace="globus" key="remoteMonitorEnabled">true</profile>
> > > > > 
> > > > >     <profile namespace="karajan" key="initialScore">1500.0</profile>
> > > > >     <profile namespace="karajan" key="jobThrottle">51.54</profile>
> > > > > 
> > > > >     <gridftp  url="gsiftp://ff-grid3.unl.edu"/>
> > > > >     <workdirectory>/panfs/panasas/CMS/data/engage-scec/swift_scratch</workdirectory>
> > > > >   </pool>
> > > > > 
> > > > > 
> > > > > -Allan
> > > > > 
> > > > > On Thu, Aug 05, 2010 at 10:34:34PM -0500, Mihael Hategan wrote:
> > > > > 
> > > > > > ... was added in cog r2834.
> > > > > > 
> > > > > > Despite having run a few jobs with it, I don't feel very confident about
> > > > > > it. So please test.
> > > > > > 
> > > > > > Start with bin/coaster-service and use "coaster-persistent" as provider
> > > > > > in sites.xml. Everything else would be the same as in the "coaster"
> > > > > > case.
> > > > > > 
> > > > > > Mihael




