[Swift-devel] Re: Using coaster provider with jobmanager ssh:pbs
wilde at mcs.anl.gov
Sat Jun 5 09:53:12 CDT 2010
Looking at your latest logs, in particular coasters.log in your ~/.globus/coasters dir, Swift is still unable to create a secure connection using GSI: it thinks there is not a valid proxy in /tmp/x509/:
Looking at your sites.xml files, this is because you are telling Swift to run at the hostname "login.pads.ci.uchicago.edu", a load-balancing virtual DNS host that rotates between login1 and login2.
I suspect that the coaster service tried to start on login2 while you made the proxy on login1, or something similar. It's a good exercise for you to examine all the logs involved to confirm or disprove this theory. Look at:
- the detailed swift .log file
- the $HOME/.globus/coasters/coasters.log file
- the $HOME/.globus/scripts PBS submit file, stdout/err, and exitcode files
- your proxy files in the local /tmp dirs of the machines that grid-proxy-init was run on
- ifconfig (note that the PADS login hosts have multiple networks)
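One way to test the load-balancing theory is to ask DNS what the generic name maps to. The following is only a minimal sketch: the function names are made up for illustration, and how the PADS alias is actually implemented is an assumption. A round-robin alias may also hand back one address at a time, so a single lookup is not conclusive.

```python
import socket


def resolve_all(hostname, resolver=socket.gethostbyname_ex):
    """Return the sorted, de-duplicated IPv4 addresses for `hostname`.

    `resolver` is injectable so the logic can be exercised without a
    network; by default it is the standard library lookup, which returns
    a (canonical_name, alias_list, address_list) tuple.
    """
    _canonical, _aliases, addresses = resolver(hostname)
    return sorted(set(addresses))


def is_load_balanced(hostname, resolver=socket.gethostbyname_ex):
    """True when a single lookup shows more than one address for the name."""
    return len(resolve_all(hostname, resolver)) > 1
```

Calling resolve_all("login.pads.ci.uchicago.edu") and resolve_all("login1.pads.ci.uchicago.edu") from the submit host, and comparing the results with the addresses ifconfig reports on each login host, should show whether the generic name can land on a different machine. A failed lookup raises an OSError subclass such as socket.gaierror.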
---
login1.pads.ci.uchicago.edu
login1$ ls -lt /tmp/x* | head
-rw------- 1 arjun ci-users 2995 Jun 4 22:01 /tmp/x509up_u1857
---
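The same check as the ls above can be scripted so it is easy to repeat on each login host. This is a rough sketch, not a Swift or Globus tool: proxy_status is a hypothetical helper, and it assumes the default proxy location /tmp/x509up_u<uid> seen in the listing. It only inspects the file; it does not verify that the proxy is valid or unexpired (grid-proxy-info does that).

```python
import os
import socket
import stat
import time


def proxy_status(uid=None, tmp_dir="/tmp"):
    """Describe the default proxy file for `uid` on the current host.

    Assumes the proxy lives at <tmp_dir>/x509up_u<uid>. Returns a dict
    with the host name, the path, and whether the file exists; when it
    does, also whether it is private to its owner and its age in hours.
    """
    if uid is None:
        uid = os.getuid()
    path = os.path.join(tmp_dir, f"x509up_u{uid}")
    info = {"host": socket.gethostname(), "path": path, "exists": False}
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return info
    info["exists"] = True
    # No permission bits for group or other, e.g. mode 0600.
    info["owner_only"] = (stat.S_IMODE(st.st_mode) & 0o077) == 0
    info["age_hours"] = (time.time() - st.st_mtime) / 3600.0
    return info
```

Running print(proxy_status()) on login1 and on login2 should make it obvious which host actually holds the proxy, since the host name is part of the result.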
I don't have time at the moment to trace this all back for you, but I suggest two steps:
1) specify login1 everywhere you have "login" in sites.xml and auth.defaults
2) look at the logs in your ~/.globus/coasters and ~/.globus/scripts directories, perhaps moving the old logs out to a save/ directory each time (save them for debugging until you resolve this). You'll be able to tell from the host names and IP addresses in them which host each piece ran on.
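Moving the old logs aside before each run can be automated. Here is a small sketch under stated assumptions: archive_logs is a hypothetical helper, and the save/<timestamp>/ layout is just one possible convention, not something Swift expects.

```python
import shutil
import time
from pathlib import Path


def archive_logs(log_dir, stamp=None):
    """Move every regular file in `log_dir` into log_dir/save/<stamp>/.

    Subdirectories (including save/ itself) are left in place. Returns
    the destination directory and the names of the files that were moved.
    """
    log_dir = Path(log_dir).expanduser()
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    dest = log_dir / "save" / stamp
    moved = []
    # sorted() takes a snapshot of the listing before anything is moved.
    for entry in sorted(log_dir.iterdir()):
        if entry.is_file():
            dest.mkdir(parents=True, exist_ok=True)
            shutil.move(str(entry), str(dest / entry.name))
            moved.append(entry.name)
    return dest, moved
```

For example, calling archive_logs("~/.globus/coasters") and archive_logs("~/.globus/scripts") before each test run keeps the logs from earlier attempts separate, so each run's files can be read on their own.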
You may need to set GLOBUS_HOSTNAME, but I am not sure about that (see the Users Guide and the swift-user and swift-devel lists for more info on that, then ask on the list if it is still not clear).
If the problem persists after you set everything to use the specific login host login1, be sure to send the exact error message you are getting and the locations of all the log files. Even though the top-level error may seem the same to you, the logs may indicate that the underlying error changes as you correct various aspects of the configuration and security context.
- Mike
login1$ grep login.pads *.xml
sites.xml: <filesystem url="login.pads.ci.uchicago.edu" provider="ssh"/>
sites.xml: <execution url="login.pads.ci.uchicago.edu" provider="ssh"/>
testsites.xml: <execution provider="coaster" url="login.pads.ci.uchicago.edu" jobmanager="ssh:pbs"/>
testsites.xml: <filesystem provider="ssh" url="login.pads.ci.uchicago.edu"/>
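The entries found by the grep above could be pinned to a specific login host with a small text substitution. This is only an illustrative sketch: pin_login_host is a hypothetical helper, and it works on the file text rather than parsing the XML so that formatting and comments are left alone. It does not touch auth.defaults, which would need the same change by hand.

```python
import re


def pin_login_host(xml_text, alias, pinned):
    """Replace url="<alias>" with url="<pinned>" in a sites file's text.

    Only url attributes whose whole value is the alias (ignoring
    surrounding whitespace) are changed. Returns (new_text, count).
    """
    pattern = re.compile(
        r'(\burl\s*=\s*")\s*' + re.escape(alias) + r'\s*(")'
    )
    # A function replacement avoids any escape handling in `pinned`.
    return pattern.subn(lambda m: m.group(1) + pinned + m.group(2), xml_text)
```

A possible use is to read sites.xml, call pin_login_host(text, "login.pads.ci.uchicago.edu", "login1.pads.ci.uchicago.edu"), check that the returned count matches the number of grep hits, and write the result to a new file rather than overwriting the original.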
----- "Arjun Comar" <mandaya at rose-hulman.edu> wrote:
> Just realized I only sent this to Mike. I'm resending it to
> swift-devel.
>
>
> On Fri, Jun 4, 2010 at 10:11 PM, Arjun Comar <mandaya at rose-hulman.edu> wrote:
>
>
> Nope, no luck. Here's grid-proxy-info from both:
>
> pads:
> subject : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820/CN=53942264
> issuer : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820
> identity : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820
> type : RFC 3820 compliant impersonation proxy
> strength : 512 bits
> path : /tmp/x509up_u1857
> timeleft : 11:52:08
>
> bridled:
> subject : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820/CN=1363223477
> issuer : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820
> identity : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820
> type : RFC 3820 compliant impersonation proxy
> strength : 512 bits
> path : /tmp/x509up_u1857
> timeleft : 11:57:52
>
> Used the same passphrase to get both proxies, and set no options on
> grid-proxy-init.
>
> Arjun
>
> On Fri, Jun 4, 2010 at 9:00 PM, wilde at mcs.anl.gov <wilde at mcs.anl.gov> wrote:
>
>
> When you use this configuration for running jobs from a submit host to
> a PBS cluster using ssh to launch the coaster service on the PBS login
> host, you need to create a GSI proxy (using grid-proxy-init) on both
> the client and on the remote login host, using the same certificate.
>
> <pool handle="coasterpads">
> <execution provider="coaster" url="login1.pads.ci.uchicago.edu"
> jobmanager="ssh:pbs"/>
> <profile namespace="globus" key="maxtime">3000</profile>
> <profile namespace="globus" key="workersPerNode">8</profile>
> <profile namespace="globus" key="slots">1</profile>
> <profile namespace="globus" key="nodeGranularity">1</profile>
> <profile namespace="globus" key="maxNodes">1</profile>
> <profile namespace="globus" key="queue">fast</profile>
> <profile namespace="karajan" key="jobThrottle">0.5</profile>
> <profile namespace="karajan" key="initialScore">10000</profile>
> <filesystem provider="ssh" url="login1.pads.ci.uchicago.edu"/>
> <workdirectory>/home/wilde/swift/lab</workdirectory>
> </pool>
>
> Arjun, this is, I think, what was causing your workflow to fail.
>
> I thought that in the past we used to get at least a GSI (grid
> security infrastructure) error in the detailed log file. But I don't
> see that in this case.
>
> Let me know if creating proxies on both sides works for you. Be sure
> to create it on the right PADS login host.
>
> David and Arjun, can you coordinate on integrating this use case into
> the tutorial (and eventually the Users Guide)? I suggested we do a
> series of "profiles" (with diagrams) to show the various ways of
> running Swift locally and remotely, and provide accompanying site file
> entries. Dennis, when you get started next week and try these cases,
> we'll want to find a way to do automated tests for them.
>
> Thanks,
>
> Mike
>
> --
>
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>
>
>
>
> --
> Arjun Comar, Rose-Hulman '12
>
>
>
> --
> Arjun Comar, Rose-Hulman '12
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory