[Swift-devel] persistent coaster service

wilde at mcs.anl.gov
Thu Aug 26 16:00:21 CDT 2010


I have 2 questions on the persistent coaster service:

1) Is -nosec working correctly? I get this when I specify it:

bri$ coaster-service -port 55123 -nosec
Error loading credential: [JGLOBUS-5] Proxy file (/tmp/x509up_u1031) not found.
Error loading credential
org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file (/tmp/x509up_u1031) not found.
	at org.globus.gsi.GlobusCredential.<init>(GlobusCredential.java:114)
	at org.globus.gsi.GlobusCredential.reloadDefaultCredential(GlobusCredential.java:590)
	at org.globus.gsi.GlobusCredential.getDefaultCredential(GlobusCredential.java:575)
	at org.globus.cog.abstraction.coaster.service.CoasterPersistentService.main(CoasterPersistentService.java:73)
bri$

I would have expected it to fully eliminate the need for a proxy, no?
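The stack trace shows CoasterPersistentService.main calling GlobusCredential.getDefaultCredential() even with -nosec, and failing because /tmp/x509up_u1031 does not exist. Until that is resolved, a pre-flight check can at least show whether a proxy is where Globus tools normally look. This is a minimal sketch that assumes the usual Globus convention (the X509_USER_PROXY environment variable if set, otherwise /tmp/x509up_u<uid>). The function names are hypothetical. It only checks that the file exists and does not validate the credential.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def default_proxy_path(uid: int, env: Mapping[str, str]) -> Path:
    """Return the proxy location Globus tools conventionally use.

    Assumption: X509_USER_PROXY overrides the default; otherwise the
    proxy is expected at /tmp/x509up_u<uid> (as in the error above).
    """
    override = env.get("X509_USER_PROXY")
    if override:
        return Path(override)
    return Path(f"/tmp/x509up_u{uid}")


def proxy_missing(uid: Optional[int] = None,
                  env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the expected proxy path if no file exists there, else None."""
    uid = os.getuid() if uid is None else uid
    env = os.environ if env is None else env
    path = default_proxy_path(uid, env)
    return None if path.is_file() else path


if __name__ == "__main__":
    missing = proxy_missing()
    if missing is not None:
        print(f"No proxy at {missing}; coaster-service may fail to load a credential")
```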

2) Following up on Allan's last question, can you clarify:

When you start a persistent coaster service, do you have the option of either:

(a) the Swift client starts workers per the sites.xml profile settings, or

(b) the user starts the workers manually, connecting to the persistent server, when you specify workerManager passive in the globus profile tag:

  <profile namespace="globus" key="workerManager">passive</profile>
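One way to double-check what a pool actually asks for is to read its profile entries back with a generic XML parser. The sketch below uses only the Python standard library, and the function name is hypothetical. It also shows why attribute placement matters: a stray ">" right after namespace="globus" is still well-formed XML, but the profile then has no key attribute, so it cannot be found by key. Whether Swift itself warns about such an entry is not covered here.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def profile_value(pool_xml: str, namespace: str, key: str) -> Optional[str]:
    """Return the text of <profile namespace=... key=...> in a pool, or None."""
    pool = ET.fromstring(pool_xml)
    for profile in pool.iter("profile"):
        if profile.get("namespace") == namespace and profile.get("key") == key:
            return (profile.text or "").strip()
    return None


GOOD = """<pool handle="example">
  <profile namespace="globus" key="workerManager">passive</profile>
</pool>"""

# Stray '>' after the namespace attribute: still well-formed XML, but the
# key ends up in the element's text instead of in an attribute.
BAD = """<pool handle="example">
  <profile namespace="globus"> key="workerManager">passive</profile>
</pool>"""

print(profile_value(GOOD, "globus", "workerManager"))  # passive
print(profile_value(BAD, "globus", "workerManager"))   # None
```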

Thanks,

- Mike

----- "Mihael Hategan" <hategan at mcs.anl.gov> wrote:

> On Mon, 2010-08-09 at 18:14 -0500, Allan Espinosa wrote:
> > So the url should be "bridled.ci.uchicago.edu" since I run the service
> > there. But this same field is also used for spawning the workers unless it
> > specifies "manual coasters", right?
> 
> Right.
> 
> > 
> > -Allan
> > 
> > On Mon, Aug 09, 2010 at 06:07:40PM -0500, Mihael Hategan wrote:
> > > On Mon, 2010-08-09 at 18:03 -0500, Allan Espinosa wrote:
> > > > Ah, so the persistent coaster service is meant to run with the manual
> > > > workers?
> > > 
> > > No. It's like, say, GRAM, in that you need to start a service on some
> > > head node, and you need to supply the URL of that head node in
> > > sites.xml.
> > > 
> > > It won't start the service automatically.
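Since the client will not start the service, a quick check that something is listening at the host and port you intend to name in sites.xml can save a failed run. This is a minimal sketch using a plain TCP connect; the function name is hypothetical. It only shows that some process accepts connections there, not that it is a coaster service or that GSI authentication will succeed.

```python
import socket


def service_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    This only shows that *something* accepts connections there; it does not
    check that it is a coaster service or that authentication will work.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures and timeouts.
        return False
```

For example, service_reachable("bridled.ci.uchicago.edu", 1984) before starting Swift. The host and port here are only illustrative: 1984 is the port the client used in the log further down when the url carried no port, which is an inference from that log rather than documented behavior.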
> > > 
> > > > 
> > > > -Allan
> > > > 
> > > > On Mon, Aug 09, 2010 at 05:36:59PM -0500, Mihael Hategan wrote:
> > > > > ff-grid2.unl.edu is the url you are supplying in sites.xml. It's
> > > > > connecting to that. Though I'm surprised it works given that you are
> > > > > implying that there is no service running there.
> > > > > 
> > > > > On Mon, 2010-08-09 at 17:09 -0500, Allan Espinosa wrote:
> > > > > > I tried it today on OSG.  The coaster service was run on bridled.ci .  But from
> > > > > > the session below, it looks like it's connecting to the site headnode instead:
> > > > > > 
> > > > > > RunID: coaster
> > > > > > Progress:
> > > > > > Progress:  uninitialized:1  Selecting site:675  Initializing site shared directory:1
> > > > > > Progress:  Initializing:2  Selecting site:1444  Initializing site shared directory:1
> > > > > > Progress:  uninitialized:1  Selecting site:2499  Initializing site shared directory:1
> > > > > > Progress:  uninitialized:1  Selecting site:3818  Initializing site shared directory:1
> > > > > > Progress:  uninitialized:1  Initializing:1  Selecting site:4201  Initializing site shared directory:1
> > > > > > Progress:  Initializing:1  Selecting site:3  Stage in:4202
> > > > > > Progress:  uninitialized:1  Initializing:1  Selecting site:5  Submitting:4202
> > > > > > Progress:  Initializing:1  Selecting site:6  Stage in:2  Submitting:4202
> > > > > > Find: https://ff-grid2.unl.edu:1984
> > > > > > Find:  keepalive(120), reconnect - https://ff-grid2.unl.edu:1984
> > > > > > Progress:  Initializing:2  Selecting site:6  Stage in:144  Submitting:4303  Failed but can retry:16
> > > > > > Progress:  Initializing:2  Selecting site:31  Stage in:80  Submitting:4945  Failed but can retry:54
> > > > > > Progress:  Initializing:1  Selecting site:6  Stage in:2  Submitting:5222  Failed but can retry:68
> > > > > > Progress:  Initializing:1  Selecting site:6  Stage in:1  Submitting:5686  Submitted:1  Failed but can retry:95
> > > > > > ...
> > > > > > ...
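Each Progress line above is a series of "state:count" pairs. When comparing lines like these, for example to watch "Failed but can retry" grow, a small parser helps. This is a sketch inferred only from the lines in this log; the function name is hypothetical and the set of state names is not taken from Swift's source.

```python
import re
from typing import Dict

# A state label is letters and spaces up to a colon, followed by a count.
_PAIR = re.compile(r"([A-Za-z][A-Za-z ]*?):(\d+)")


def parse_progress(line: str) -> Dict[str, int]:
    """Turn 'Progress:  Stage in:2  Submitting:4202' into {'Stage in': 2, ...}."""
    body = line.split("Progress:", 1)[-1]
    return {label: int(count) for label, count in _PAIR.findall(body)}


line = ("Progress:  Initializing:1  Selecting site:6  Stage in:1  "
        "Submitting:5686  Submitted:1  Failed but can retry:95")
print(parse_progress(line)["Failed but can retry"])  # 95
```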
> > > > > > 
> > > > > > Corresponding log entry (IMO):
> > > > > > 2010-08-09 17:01:31,690-0500 WARN  RemoteConfiguration Find: https://ff-grid2.unl.edu:1984
> > > > > > 2010-08-09 17:01:31,690-0500 WARN  RemoteConfiguration Find:  keepalive(120), reconnect - https://ff-grid2.unl.edu:1984
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > sites.xml
> > > > > >   <pool handle="Firefly">
> > > > > >     <execution provider="coaster-persistent" url="ff-grid2.unl.edu"
> > > > > >       jobmanager="gt2:gt2:pbs" />
> > > > > > 
> > > > > >     <profile namespace="globus" key="maxTime">86400</profile>
> > > > > >     <profile namespace="globus" key="maxNodes">1290</profile>
> > > > > >     <profile namespace="globus" key="spread">0.8</profile>
> > > > > >     <profile namespace="globus" key="slots">10</profile>
> > > > > >     <profile namespace="globus" key="lowOverallocation">20</profile>
> > > > > >     <profile namespace="globus" key="remoteMonitorEnabled">true</profile>
> > > > > > 
> > > > > >     <profile namespace="karajan" key="initialScore">1500.0</profile>
> > > > > >     <profile namespace="karajan" key="jobThrottle">51.54</profile>
> > > > > > 
> > > > > >     <gridftp  url="gsiftp://ff-grid3.unl.edu"/>
> > > > > >     <workdirectory>/panfs/panasas/CMS/data/engage-scec/swift_scratch</workdirectory>
> > > > > >   </pool>
> > > > > > 
> > > > > > 
> > > > > > -Allan
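Mihael's point that the client connects to whatever url the pool names can be checked mechanically. In the log above, url="ff-grid2.unl.edu" was contacted as https://ff-grid2.unl.edu:1984, so this sketch fills in https and port 1984 when the url omits them. Those defaults are inferred from that one log line, not from documentation, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Assumed defaults, inferred from the "Find: https://ff-grid2.unl.edu:1984"
# log line produced for url="ff-grid2.unl.edu".
DEFAULT_SCHEME = "https"
DEFAULT_PORT = 1984


def service_contact(pool_xml: str) -> str:
    """Return the URL a coaster-persistent pool would be contacted at."""
    pool = ET.fromstring(pool_xml)
    execution = pool.find("execution")
    if execution is None or execution.get("provider") != "coaster-persistent":
        raise ValueError("pool has no coaster-persistent execution element")
    url = execution.get("url", "")
    if "://" not in url:
        url = f"{DEFAULT_SCHEME}://{url}"
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORT
    return f"{parts.scheme}://{parts.hostname}:{port}"


POOL = """<pool handle="Firefly">
  <execution provider="coaster-persistent" url="ff-grid2.unl.edu"
    jobmanager="gt2:gt2:pbs" />
</pool>"""

print(service_contact(POOL))  # https://ff-grid2.unl.edu:1984
```

Run against Allan's pool, this makes it plain that the client targets ff-grid2.unl.edu rather than bridled.ci, where the service was actually running.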
> > > > > > 
> > > > > > On Thu, Aug 05, 2010 at 10:34:34PM -0500, Mihael Hategan wrote:
> > > > > > 
> > > > > > > ... was added in cog r2834.
> > > > > > > 
> > > > > > > Despite having run a few jobs with it, I don't feel very confident about
> > > > > > > it. So please test.
> > > > > > > 
> > > > > > > Start with bin/coaster-service and use "coaster-persistent" as provider
> > > > > > > in sites.xml. Everything else would be the same as in the "coaster"
> > > > > > > case.
> > > > > > > 
> > > > > > > Mihael
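As described, moving an existing pool to the persistent service is a one-attribute change: provider="coaster" becomes provider="coaster-persistent", with everything else left as it was. This is a minimal sketch of that edit with Python's standard library; the function name is hypothetical. It deliberately leaves url alone, since which host url should name is exactly what the rest of this thread discusses.

```python
import xml.etree.ElementTree as ET


def make_persistent(pool_xml: str) -> str:
    """Switch a pool's execution provider from 'coaster' to 'coaster-persistent'.

    All other attributes and profiles are left untouched.
    """
    pool = ET.fromstring(pool_xml)
    for execution in pool.iter("execution"):
        if execution.get("provider") == "coaster":
            execution.set("provider", "coaster-persistent")
    return ET.tostring(pool, encoding="unicode")


POOL = """<pool handle="Firefly">
  <execution provider="coaster" url="ff-grid2.unl.edu" jobmanager="gt2:gt2:pbs" />
  <profile namespace="globus" key="slots">10</profile>
</pool>"""

print(make_persistent(POOL))
```

ElementTree drops XML comments by default and may normalize formatting, so for a hand-maintained sites.xml a plain text edit of the provider attribute may be preferable.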
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> 
> 

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



