[Swift-devel] Re: Using coaster provider with jobmanager ssh:pbs

Arjun Comar mandaya at rose-hulman.edu
Sun Jun 6 22:50:43 CDT 2010


Alright, I've been playing with this for a few hours, but I can't get any
further. The sites.xml file isn't up to date; the one you want to look at is
sites-pads-pbs-coasters.xml. I ran it a couple of times, saving the logs, and
noticed in the .globus/coasters/coasters.log file that the JVM was being
started with -DGLOBUS_HOSTNAME=login.pads.ci.uchicago. So I tried setting
GLOBUS_HOSTNAME to login1.pads.ci.uchicago, but even after that the log file
still showed the former value, and the log shows an exception being thrown.
My hunch is that I need to figure out how to force GLOBUS_HOSTNAME to be
set. Anyone have any thoughts? Am I barking up the wrong tree?
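
A minimal sketch of that log check, assuming the JVM command line appears in
the log with a space-delimited -DGLOBUS_HOSTNAME=... argument. The function
name is made up for illustration and is not part of Swift or Globus:

```shell
# Print each distinct -DGLOBUS_HOSTNAME value found in a log file.
# Assumption: the value ends at the next space or at end of line.
# Usage: globus_hostnames_in_log "$HOME/.globus/coasters/coasters.log"
globus_hostnames_in_log() {
    grep -o -e '-DGLOBUS_HOSTNAME=[^ ]*' "$1" \
        | sed 's/^-DGLOBUS_HOSTNAME=//' \
        | LC_ALL=C sort -u
}
```

If this prints more than one value across runs, the setting is changing
between runs; if it only ever prints the old value, the new setting is not
reaching the JVM.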

Arjun

On Sat, Jun 5, 2010 at 9:53 AM, wilde at mcs.anl.gov <wilde at mcs.anl.gov> wrote:

> Looking at your latest logs, in particular coasters.log in your
> ~/.globus/coasters dir, Swift is still unable to create a secure connection
> using GSI: it thinks there is no valid proxy in /tmp/x509/.
>
> Looking at your sites.xml files, this is because you are telling Swift to
> run at the hostname "login.pads.ci.uchicago.edu", a load-balancing virtual
> DNS host that rotates between login1 and login2.
>
> I suspect that the coaster service tried to start on login2 while you made
> the proxy on login1, or something similar. It's a good exercise for you to
> examine all the logs involved to confirm or disprove this theory. Look at:
>
> - the detailed swift .log file
> - the $HOME/.globus/coasters/coasters.log file
> - the $HOME/.globus/scripts PBS submit file, stdout/err, and exitcode files
> - your proxy files in the local /tmp dirs of the machines that
> grid-proxy-init was run on
> - ifconfig (note that pads login hosts have multiple networks)
>
> ---
> login1.pads.ci.uchicago.edu
> login1$ ls -lt /tmp/x* | head
> -rw------- 1 arjun   ci-users 2995 Jun  4 22:01 /tmp/x509up_u1857
> ---
>
> I don't have time at the moment to trace this all back for you, but I
> suggest two steps:
>
> 1) specify login1 everywhere you have "login" in sites.xml and
> auth.defaults
>
> 2) look at the logs in your ~/.globus/coasters and ~/.globus/scripts
> directories, perhaps moving the old logs out to a save/ directory each time
> (save them for debugging till you resolve this). You'll be able to tell
> from the host names and IP addresses which host each run actually used.
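
The two steps above can be sketched in shell roughly as follows. This is an
illustration only: the function names are invented here, and it assumes the
hostnames and directories are exactly the ones quoted in this thread.

```shell
# Step 1: rewrite the load-balanced alias to login1 in a site file.
# Entries that already say login1 are left alone, because the pattern
# requires "login" to be followed directly by ".pads".
# Usage: pin_login1 sites.xml
pin_login1() {
    f=$1
    sed 's/login\.pads\.ci\.uchicago\.edu/login1.pads.ci.uchicago.edu/g' \
        "$f" > "$f.tmp" && mv "$f.tmp" "$f"
}

# Step 2: move the regular files in a log directory into a timestamped
# save/ subdirectory, and print where they went.
# Usage: archive_logs "$HOME/.globus/coasters"
archive_logs() {
    dir=$1
    dest="$dir/save/$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$dest" || return 1
    for f in "$dir"/*; do
        if [ -f "$f" ]; then
            mv "$f" "$dest"/
        fi
    done
    echo "$dest"
}
```

Running archive_logs on both ~/.globus/coasters and ~/.globus/scripts before
each attempt keeps every run's logs separate, so host names can be compared
run by run.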
>
> You may need to set GLOBUS_HOSTNAME, but I am not sure about that (see the
> users guide and swift-user and devel lists for more info on that, then ask
> on the list if still not clear).
>
> If the problem persists after you set everything to use the specific login
> host login1, then be sure to send the exact error message you are
> getting and the locations of all the log files, as even though the top-level
> error seems the same to you, the logs may indicate that the underlying error
> changes as you correct various aspects of the configuration and security
> context.
>
> - Mike
>
>
>
> login1$ grep login.pads *.xml
> sites.xml:    <filesystem url="login.pads.ci.uchicago.edu" provider="ssh"/>
> sites.xml:    <execution url="login.pads.ci.uchicago.edu" provider="ssh"/>
> testsites.xml:   <execution provider="coaster" url="login.pads.ci.uchicago.edu" jobmanager="ssh:pbs"/>
> testsites.xml:   <filesystem provider="ssh" url="login.pads.ci.uchicago.edu"/>
>
>
>
> ----- "Arjun Comar" <mandaya at rose-hulman.edu> wrote:
>
> > Just realized I only sent this to Mike. I'm resending it to
> > swift-devel.
> >
> >
> > On Fri, Jun 4, 2010 at 10:11 PM, Arjun Comar <mandaya at rose-hulman.edu> wrote:
> >
> >
> > Nope, no luck. Here's grid-proxy-info from both:
> >
> > pads:
> > subject : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820/CN=53942264
> > issuer : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820
> > identity : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820
> > type : RFC 3820 compliant impersonation proxy
> > strength : 512 bits
> > path : /tmp/x509up_u1857
> > timeleft : 11:52:08
> >
> > bridled:
> > subject : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820/CN=1363223477
> > issuer : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820
> > identity : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820
> > type : RFC 3820 compliant impersonation proxy
> > strength : 512 bits
> > path : /tmp/x509up_u1857
> > timeleft : 11:57:52
> >
> > I used the same passphrase to get both proxies, and set no options on
> > grid-proxy-init.
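
A small sketch for comparing the two proxies mechanically rather than by eye,
assuming grid-proxy-info output in the "field : value" layout shown above.
The helper name is hypothetical:

```shell
# Read grid-proxy-info output on stdin and print only the identity value.
proxy_identity() {
    sed -n 's/^identity *: *//p'
}

# Example use (not run here; assumes ssh access to the login host):
#   here=$(grid-proxy-info | proxy_identity)
#   there=$(ssh login1.pads.ci.uchicago.edu grid-proxy-info | proxy_identity)
#   [ "$here" = "$there" ] && echo "same identity on both hosts"
```

Matching identities only show that both proxies come from the same
certificate; they say nothing about which login host the coaster service
actually started on.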
> >
> > Arjun
> >
> >
> >
> >
> >
> > On Fri, Jun 4, 2010 at 9:00 PM, wilde at mcs.anl.gov <wilde at mcs.anl.gov> wrote:
> >
> >
> > When you use this configuration for running jobs from a submit host to
> > a PBS cluster using ssh to launch the coaster service on the PBS login
> > host, you need to create a GSI proxy (using grid-proxy-init) on both
> > the client and on the remote login host, using the same certificate.
> >
> > <pool handle="coasterpads">
> > <execution provider="coaster" url="login1.pads.ci.uchicago.edu"
> > jobmanager="ssh:pbs"/>
> > <profile namespace="globus" key="maxtime">3000</profile>
> > <profile namespace="globus" key="workersPerNode">8</profile>
> > <profile namespace="globus" key="slots">1</profile>
> > <profile namespace="globus" key="nodeGranularity">1</profile>
> > <profile namespace="globus" key="maxNodes">1</profile>
> > <profile namespace="globus" key="queue">fast</profile>
> > <profile namespace="karajan" key="jobThrottle">0.5</profile>
> > <profile namespace="karajan" key="initialScore">10000</profile>
> > <filesystem provider="ssh" url="login1.pads.ci.uchicago.edu"/>
> > <workdirectory>/home/wilde/swift/lab</workdirectory>
> > </pool>
> >
> > Arjun, this is, I think, what was causing your workflow to fail.
> >
> > I thought that in the past we at least got a GSI (Grid Security
> > Infrastructure) error in the detailed log file, but I don't see that in
> > this case.
> >
> > Let me know if creating proxies on both sides works for you. Be sure
> > to create it on the right PADS login host.
> >
> > David and Arjun, can you coordinate on integrating this use case into
> > the tutorial (and eventually the Users Guide)? I suggested we do a
> > series of "profiles" (with diagrams) to show the various ways of
> > running Swift locally and remotely, and provide accompanying site file
> > entries. Dennis, when you get started next week and try these cases,
> > we'll want to find a way to do automated tests for them.
> >
> > Thanks,
> >
> > Mike
> >
> > --
> >
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
> >
> >
> >
> >
> > --
> > Arjun Comar, Rose-Hulman '12
> >
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>
>


-- 
Arjun Comar, Rose-Hulman '12