[Swift-devel] Re: Using coaster provider with jobmanager ssh:pbs
wilde at mcs.anl.gov
Mon Jun 7 00:13:48 CDT 2010
Arjun, looking briefly at your logs, it seems the run you tried at about 18:36 on Friday came close: your coasters.log file shows that it failed because there was no valid proxy on login1.
After that, you reverted from the more recent stable branch code (in /home/wilde/swift/src/stable/.../dist/) back to the old 0.9 release in /common.
As I mentioned Friday, the old 0.9 release does not have the latest ssh provider code and thus doesn't recognize your auth.defaults parameters.
So use my swift (or build your own from stable branch), make sure you have a valid proxy on both sides, and try again. I suspect that will progress further.
You can see from your ~/.globus/coasters/coasters.log file that after you reverted to 0.9, Swift never again got as far as starting coasters, most likely because the ssh connection failed.
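To check the proxy on both sides before retrying, here is a minimal sketch. It assumes grid-proxy-info prints a "timeleft" line in the H:MM:SS form seen in the grid-proxy-info output quoted further down in this thread. The helper name proxy_seconds_left is hypothetical and is not part of Globus or Swift.

```shell
# Read grid-proxy-info output on stdin and print the remaining proxy
# lifetime in seconds. Prints 0 if no "timeleft" line is found.
# Assumption: the last field of that line has the form H:MM:SS.
proxy_seconds_left() {
    awk '$1 ~ /^timeleft/ {
             n = split($NF, t, ":")
             if (n == 3) secs = t[1] * 3600 + t[2] * 60 + t[3]
         }
         END { print secs + 0 }'
}

# Example use, once on the submit host and once against the login host:
#   grid-proxy-info | proxy_seconds_left
#   ssh login1.pads.ci.uchicago.edu grid-proxy-info | proxy_seconds_left
```

A result of 0 on either side would suggest the proxy there is missing or expired and that grid-proxy-init needs to be rerun on that host.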
- Mike
From your .log files:
login1$ fgrep .home $(ls -1t hello*.log | head -20)
helloworld-20100606-2209-uuldx126.log: vds.home = /software/common/swift-0.9-r1/bin/..
helloworld-20100606-2207-n9aul0q5.log: vds.home = /software/common/swift-0.9-r1/bin/..
helloworld-20100606-2204-f2x1rm9f.log: vds.home = /software/common/swift-0.9-r1/bin/..
helloworld-20100606-1958-zf7ppjl6.log: vds.home = /software/common/swift-0.9-r1/bin/..
helloworld-20100604-2208-omool1yb.log: vds.home = /software/common/swift-0.9-r1/bin/..
helloworld-20100604-2206-17fmgozg.log: vds.home = /software/common/swift-0.9-r1/bin/..
helloworld-20100604-1836-jp5jbuy5.log: swift.home = /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/..
helloworld-20100604-1835-83mngdfe.log: swift.home = /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/..
helloworld-20100604-1835-mvmb56f5.log: swift.home = /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/..
helloworld-20100604-1834-833fef14.log: swift.home = /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/..
helloworld-20100604-1833-7tgi5o87.log: swift.home = /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/..
helloworld-20100604-1832-gbenp2xa.log: swift.home = /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/..
helloworld-20100604-1831-044dbd38.log: swift.home = /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/..
helloworld-20100604-1830-ua5qxocg.log: swift.home = /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/..
helloworld-20100604-1827-b31yuh98.log: swift.home = /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/..
helloworld-20100604-1826-zxygui3c.log: swift.home = /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/..
helloworld-20100604-1824-iym4edt3.log: swift.home = /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/..
helloworld-20100604-1820-74936sp7.log: swift.home = /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/..
login1$
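The listing above can be condensed into a count of runs per Swift install. This is only a sketch: it assumes every input line ends with the home path, as in the fgrep output above, and summarize_homes is a hypothetical helper name.

```shell
# Count runs per Swift install from "file: key = path" lines on stdin.
# The last whitespace-separated field is taken to be the home path.
summarize_homes() {
    awk 'NF { count[$NF]++ }
         END { for (h in count) print count[h], h }' | sort -rn
}

# Example use on the submit host:
#   fgrep .home $(ls -1t hello*.log | head -20) | summarize_homes
```

Each output line is a run count followed by the install path, most frequent first, which makes it easy to see how many recent runs used the 0.9 release versus the stable branch build.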
----- "Arjun Comar" <mandaya at rose-hulman.edu> wrote:
> Alright, I've been playing with this for a few hours, but I can't
> manage to get any further. The sites.xml file isn't up to date, the
> one you want to see is sites-pads-pbs-coasters.xml. So I ran it a
> couple times, saving logs, etc. and noticed that in the
> .globus/coasters/coasters.log file, the jvm was being started with a
> -DGLOBUS_HOSTNAME=login.pads.ci.uchicago. So I tried setting
> GLOBUS_HOSTNAME to login1.pads.ci.uchicago. But even after that, the
> log file still showed the former. And the log shows an exception being
> thrown. So my hunch is to figure out how to force GLOBUS_HOSTNAME to
> get set. Anyone have any thoughts? Am I barking up the wrong tree?
>
> Arjun
>
>
> On Sat, Jun 5, 2010 at 9:53 AM, wilde at mcs.anl.gov <wilde at mcs.anl.gov> wrote:
>
>
> Looking at your latest logs, in particular coasters.log in your
> ~/.globus/coasters dir, Swift is still unable to create a secure
> connection using GSI: it thinks there is not a valid proxy in
> /tmp/x509/:
>
> Looking at your sites.xml files, this is because you are telling Swift
> to run at the hostname "login.ci.uchicago.edu", a load-balancing
> virtual DNS host that rotates between login1 and login2.
>
> I suspect that the coaster service tried to start on login2 while you
> made the proxy on login1, or something similar. It's a good exercise
> for you to examine all the logs involved to confirm or disprove this
> theory. Look at:
>
> - the detailed swift .log file
> - the $HOME/.globus/coasters/coasters.log file
> - the $HOME/.globus/scripts PBS submit file, stdout/err, and exitcode
> files
> - your proxy files in the local /tmp dirs of the machines that
> grid-proxy-init was run on
> - ifconfig (note that pads login hosts have multiple networks)
>
> ---
>
> login1.pads.ci.uchicago.edu
> login1$ ls -lt /tmp/x* | head
> -rw------- 1 arjun ci-users 2995 Jun 4 22:01 /tmp/x509up_u1857
> ---
>
> I don't have time at the moment to trace this all back for you, but I
> suggest two steps:
>
> 1) specify login1 everywhere you have "login" in sites.xml and
> auth.defaults
>
> 2) look at the logs in your ~/.globus/coasters and /scripts directory,
> perhaps moving the old logs out to a save/ directory each time (save
> them for debugging till you resolve this). You'll be able to tell from
> the host names and IP addresses which login host each run used.
>
> You may need to set GLOBUS_HOSTNAME, but I am not sure about that (see
> the users guide and swift-user and devel lists for more info on that,
> then ask on the list if still not clear).
>
> If the problem persists after you set everything to use the specific
> login host login1, then be sure to send the exact error message
> you are getting and the locations of all the log files, as even
> though the top-level error seems the same to you, the logs may
> indicate that the underlying error changes as you correct various
> aspects of the configuration and security context.
>
> - Mike
>
>
>
> login1$ grep login.pads *.xml
> sites.xml: <filesystem url="login.pads.ci.uchicago.edu" provider="ssh"/>
> sites.xml: <execution url="login.pads.ci.uchicago.edu" provider="ssh"/>
> testsites.xml: <execution provider="coaster" url="login.pads.ci.uchicago.edu" jobmanager="ssh:pbs"/>
> testsites.xml: <filesystem provider="ssh" url="login.pads.ci.uchicago.edu"/>
>
>
>
>
>
>
> ----- "Arjun Comar" < mandaya at rose-hulman.edu > wrote:
>
> > Just realized I only sent this to Mike. I'm resending it to
> > swift-devel.
> >
> >
> > On Fri, Jun 4, 2010 at 10:11 PM, Arjun Comar <mandaya at rose-hulman.edu> wrote:
> >
> >
> > Nope, no luck. Here's grid-proxy-info from both:
> >
> > pads:
> > subject : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820/CN=53942264
> > issuer : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820
> > identity : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820
> > type : RFC 3820 compliant impersonation proxy
> > strength : 512 bits
> > path : /tmp/x509up_u1857
> > timeleft : 11:52:08
> >
> > bridled:
> > subject : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820/CN=1363223477
> > issuer : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820
> > identity : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820
> > type : RFC 3820 compliant impersonation proxy
> > strength : 512 bits
> > path : /tmp/x509up_u1857
> > timeleft : 11:57:52
> >
> > Used the same passphrase to get both proxies, and set no options on
> > grid-proxy-init.
> >
> > Arjun
> >
> >
> >
> >
> >
> > On Fri, Jun 4, 2010 at 9:00 PM, wilde at mcs.anl.gov <wilde at mcs.anl.gov> wrote:
> >
> >
> > When you use this configuration for running jobs from a submit host to
> > a PBS cluster using ssh to launch the coaster service on the PBS login
> > host, you need to create a GSI proxy (using grid-proxy-init) on both
> > the client and on the remote login host, using the same certificate.
> >
> > <pool handle="coasterpads">
> > <execution provider="coaster" url="login1.pads.ci.uchicago.edu" jobmanager="ssh:pbs"/>
> > <profile namespace="globus" key="maxtime">3000</profile>
> > <profile namespace="globus" key="workersPerNode">8</profile>
> > <profile namespace="globus" key="slots">1</profile>
> > <profile namespace="globus" key="nodeGranularity">1</profile>
> > <profile namespace="globus" key="maxNodes">1</profile>
> > <profile namespace="globus" key="queue">fast</profile>
> > <profile namespace="karajan" key="jobThrottle">0.5</profile>
> > <profile namespace="karajan" key="initialScore">10000</profile>
> > <filesystem provider="ssh" url="login1.pads.ci.uchicago.edu"/>
> > <workdirectory>/home/wilde/swift/lab</workdirectory>
> > </pool>
> >
> > Arjun, this is, I think, what was causing your workflow to fail.
> >
> > I thought that in the past we used to get at least a GSI (grid
> > security infrastructure) error in the detailed log file. But I don't
> > see that in this case.
> >
> > Let me know if creating proxies on both sides works for you. Be sure
> > to create it on the right PADS login host.
> >
> > David and Arjun, can you coordinate on integrating this use case into
> > the tutorial (and eventually the Users Guide)? I suggested we do a
> > series of "profiles" (with diagrams) to show the various ways of
> > running Swift locally and remotely, and provide accompanying site file
> > entries. Dennis, when you get started next week and try these cases,
> > we'll want to find a way to do automated tests for them.
> >
> > Thanks,
> >
> > Mike
> >
> > --
> >
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
> >
> >
> >
> >
> > --
> > Arjun Comar, Rose-Hulman '12
> >
> >
> >
> > --
> > Arjun Comar, Rose-Hulman '12
>
> --
>
>
>
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>
>
>
>
> --
> Arjun Comar, Rose-Hulman '12
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory