[Swift-devel] persistent coaster service

Mihael Hategan hategan at mcs.anl.gov
Mon Aug 9 17:36:59 CDT 2010


ff-grid2.unl.edu is the URL you are supplying in sites.xml, so that is
what it's connecting to. Though I'm surprised it works, given that you
are implying there is no service running there.
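
For illustration, a sketch of the same execution entry pointed at the
machine where the service was started. SERVICE_HOST is a placeholder for
the host running bin/coaster-service (bridled.ci in the report below),
not a verified value, and it is an assumption here that the url attribute
is the only thing that needs to change:

```xml
<!-- Sketch only: SERVICE_HOST is a placeholder, not a real host name -->
<pool handle="Firefly">
  <execution provider="coaster-persistent" url="SERVICE_HOST"
    jobmanager="gt2:gt2:pbs" />
  <!-- profile, gridftp and workdirectory entries as in the original -->
</pool>
```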

On Mon, 2010-08-09 at 17:09 -0500, Allan Espinosa wrote:
> I tried it today on OSG. The coaster service was running on bridled.ci, but
> from the session below, it looks like it's connecting to the site headnode
> instead:
> 
> RunID: coaster
> Progress:
> Progress:  uninitialized:1  Selecting site:675  Initializing site shared directory:1
> Progress:  Initializing:2  Selecting site:1444  Initializing site shared directory:1
> Progress:  uninitialized:1  Selecting site:2499  Initializing site shared directory:1
> Progress:  uninitialized:1  Selecting site:3818  Initializing site shared directory:1
> Progress:  uninitialized:1  Initializing:1  Selecting site:4201  Initializing site shared directory:1
> Progress:  Initializing:1  Selecting site:3  Stage in:4202
> Progress:  uninitialized:1  Initializing:1  Selecting site:5  Submitting:4202
> Progress:  Initializing:1  Selecting site:6  Stage in:2  Submitting:4202
> Find: https://ff-grid2.unl.edu:1984
> Find:  keepalive(120), reconnect - https://ff-grid2.unl.edu:1984
> Progress:  Initializing:2  Selecting site:6  Stage in:144  Submitting:4303  Failed but can retry:16
> Progress:  Initializing:2  Selecting site:31  Stage in:80  Submitting:4945  Failed but can retry:54
> Progress:  Initializing:1  Selecting site:6  Stage in:2  Submitting:5222  Failed but can retry:68
> Progress:  Initializing:1  Selecting site:6  Stage in:1  Submitting:5686  Submitted:1  Failed but can retry:95
> ...
> ...
> 
> Corresponding log entry (IMO):
> 2010-08-09 17:01:31,690-0500 WARN  RemoteConfiguration Find: https://ff-grid2.unl.edu:1984
> 2010-08-09 17:01:31,690-0500 WARN  RemoteConfiguration Find:  keepalive(120), reconnect - https://ff-grid2.unl.edu:1984
> 
> 
> 
> sites.xml
>   <pool handle="Firefly">
>     <execution provider="coaster-persistent" url="ff-grid2.unl.edu"
>       jobmanager="gt2:gt2:pbs" />
> 
>     <profile namespace="globus" key="maxTime">86400</profile>
>     <profile namespace="globus" key="maxNodes">1290</profile>
>     <profile namespace="globus" key="spread">0.8</profile>
>     <profile namespace="globus" key="slots">10</profile>
>     <profile namespace="globus" key="lowOverallocation">20</profile>
>     <profile namespace="globus" key="remoteMonitorEnabled">true</profile>
> 
>     <profile namespace="karajan" key="initialScore">1500.0</profile>
>     <profile namespace="karajan" key="jobThrottle">51.54</profile>
> 
>     <gridftp  url="gsiftp://ff-grid3.unl.edu"/>
>     <workdirectory>/panfs/panasas/CMS/data/engage-scec/swift_scratch</workdirectory>
>   </pool>
> 
> 
> -Allan
> 
> On Thu, Aug 05, 2010 at 10:34:34PM -0500, Mihael Hategan wrote:
> 
> > ... was added in cog r2834.
> > 
> > Despite having run a few jobs with it, I don't feel very confident about
> > it. So please test.
> > 
> > Start with bin/coaster-service and use "coaster-persistent" as provider
> > in sites.xml. Everything else would be the same as in the "coaster"
> > case.
> > 
> > Mihael
> > 
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > 
