[Swift-devel] Re: Problem with incorrect host cert DN in coaster GSI authentication

Michael Wilde wilde at mcs.anl.gov
Thu Apr 29 17:45:24 CDT 2010


Yi, from where and to where were you running? If the "to" is a Nimbus workspace in AWS, I am assuming that with provider-coasters and jobmanager=gt2:pbs, what happens is this:

1. Swift on the submit host sends a gt2 job to the Nimbus head node.

2. That job runs under the ID that your client-side proxy cert is mapped to.

3. That login on the Nimbus head node should have qsub in its PATH.

(You can test this with globus-job-run of something like /bin/sh -c "which qsub")

If there's some issue with things like .profile execution that keeps /opt/torque-2.3.6/bin out of the remote PATH, perhaps on your workspace head node you can symlink qsub into /usr/bin or similar?

You'll need to experiment, unless between Mihael and the Nimbus team someone can provide a definitive answer on what options you have for getting the remote qsub into the head node's PATH for a GT2 job.
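One way to see exactly which directories the GT2 job environment lacks is to diff the two PATH values quoted below. The following is a minimal Python sketch, not part of Swift or Globus; the helper names `path_dirs` and `missing_from_job` are hypothetical, and the two sample strings are abbreviated from the PATH lines in Yi's message rather than copied in full.

```python
def path_dirs(path_value):
    """Split a PATH string into directories, dropping empty entries,
    trailing slashes, and duplicates while keeping the original order."""
    seen = []
    for entry in path_value.split(":"):
        if not entry:
            continue
        norm = entry.rstrip("/") or "/"
        if norm not in seen:
            seen.append(norm)
    return seen


def missing_from_job(login_path, job_path):
    """Return directories present in the login PATH but absent from the job PATH."""
    job = set(path_dirs(job_path))
    return [d for d in path_dirs(login_path) if d not in job]


# Abbreviated samples (assumption: shortened from the PATH values quoted in this thread).
LOGIN_PATH = (
    "/opt/torque-2.3.6/bin/:/opt/torque-2.3.6/sbin:/opt/vdt-1.10.1/globus/bin:"
    "/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/home/torqueuser/bin"
)
JOB_PATH = (
    "/opt/vdt-1.10.1/globus/bin:/opt/vdt-1.10.1/globus/sbin:"
    "/sbin:/usr/sbin:/bin:/usr/bin:/usr/X11R6/bin"
)

for directory in missing_from_job(LOGIN_PATH, JOB_PATH):
    print("missing from job PATH:", directory)
```

Pasting the full PATH values from the `globus-job-run ... /bin/env` output and from an interactive `env` into the two constants should show whether the Torque directories are the only difference.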

- Mike

----- "Yi Zhu" <yizhu at cs.uchicago.edu> wrote:

> Hi,
> 
> I tried "gt2:pbs" and got a "qsub not found" error. To investigate
> further, I pulled the environment used by Globus and found that
> "/opt/torque-2.3.6/bin" (where qsub lives) is not in its PATH. I think
> that is what causes the "qsub not found" problem.
> 
> Any suggested solution?
> 
> 
> Many thanks!
> 
> -Yi Zhu
> 
> 
> Swift screen dump:
> 
> Thu Apr 29 17:25:22 CDT 2010
> -bash-3.2$ swift -tc.file tc.test.data -sites.file sshpbscoast.xml first.swift
> Swift svn swift-r3262 cog-r2729 (cog modified locally)
> 
> RunID: 20100429-1725-16xmtae7
> Progress:
> Progress: Stage in:1
> Progress: Submitted:1
> Failed to transfer wrapper log from first-20100429-1725-16xmtae7/info/9 on ec2
> Progress: Failed:1
> Execution failed:
> Exception in echo:
> Arguments: [Hello, world!]
> Host: ec2
> Directory: first-20100429-1725-16xmtae7/jobs/9/echo-91j9i9rj
> stderr.txt:
> 
> stdout.txt:
> 
> ----
> 
> Caused by:
> Task failed: Error submitting block task
> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Cannot submit job: java.io.IOException: qsub: not found
> at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:63)
> at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:46)
> at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:50)
> at org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.run(BlockTaskSubmitter.java:66)
> Caused by: java.io.IOException: java.io.IOException: qsub: not found
> at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
> at java.lang.ProcessImpl.start(ProcessImpl.java:65)
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:451)
> at java.lang.Runtime.exec(Runtime.java:591)
> at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.start(AbstractExecutor.java:89)
> at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:53)
> ... 3 more
> 
> Cleaning up...
> Shutting down service at https://10.251.214.179:59447
> Got channel MetaChannel: 1535747955 -> GSSSChannel-02065467484(1)
> Command(3, SHUTDOWNSERVICE): handling reply timeout; sendReqTime=100429-172549.902, sendTime=100429-172549.903, now=100429-172559.908
> - Done
> 
> Env pulled from remote:
> 
> -bash-3.2$ globus-job-run ec2-204-236-204-71.compute-1.amazonaws.com /bin/env
> [...]
> PATH=/opt/vdt-1.10.1/gums/scripts:/opt/vdt-1.10.1/prima/bin:/opt/vdt-1.10.1/cert-scripts/bin:/opt/vdt-1.10.1/glite/sbin:/opt/vdt-1.10.1/glite/bin:/opt/vdt-1.10.1/jdk1.5/bin:/opt/vdt-1.10.1/edg/sbin:/opt/vdt-1.10.1/gip/bin:/opt/vdt-1.10.1/gpt/sbin:/opt/vdt-1.10.1/globus/bin:/opt/vdt-1.10.1/globus/sbin:/opt/vdt-1.10.1/wget/bin:/opt/vdt-1.10.1/logrotate/sbin:/opt/vdt-1.10.1/perl/bin:/opt/pacman-3.26/bin:/opt/vdt-1.10.1/vdt/sbin:/opt/vdt-1.10.1/vdt/bin:/opt/vdt-1.10.1/gums/scripts:/opt/vdt-1.10.1/prima/bin:/opt/vdt-1.10.1/cert-scripts/bin:/opt/vdt-1.10.1/glite/sbin:/opt/vdt-1.10.1/glite/bin:/opt/vdt-1.10.1/jdk1.5/bin:/opt/vdt-1.10.1/edg/sbin:/opt/vdt-1.10.1/gip/bin:/opt/vdt-1.10.1/gpt/sbin:/opt/vdt-1.10.1/wget/bin:/opt/vdt-1.10.1/logrotate/sbin:/opt/vdt-1.10.1/perl/bin:/opt/pacman-3.26/bin:/opt/vdt-1.10.1/vdt/sbin:/opt/vdt-1.10.1/vdt/bin:/sbin:/usr/sbin:/bin:/usr/bin:/usr/X11R6/bin
> PERL5LIB=/opt/vdt-1.10.1/vdt/lib:/opt/vdt-1.10.1/perl/lib/5.8.0:/opt/vdt-1.10.1/perl/lib/5.8.0/i686-linux:/opt/vdt-1.10.1/perl/lib/site_perl/5.8.0:/opt/vdt-1.10.1/perl/lib/site_perl/5.8.0/i686-linux:/opt/vdt-1.10.1/vdt/lib:/opt/vdt-1.10.1/perl/lib/5.8.0:/opt/vdt-1.10.1/perl/lib/5.8.0/i686-linux-thread-multi:/opt/vdt-1.10.1/perl/lib/site_perl/5.8.0:/opt/vdt-1.10.1/perl/lib/site_perl/5.8.0/i686-linux-thread-multi::/opt/vdt-1.10.1/perl/lib/5.8.8:/opt/vdt-1.10.1/perl/lib/site_perl:/opt/vdt-1.10.1/perl/lib/5.8.8:/opt/vdt-1.10.1/perl/lib/site_perl
> X509_USER_PROXY=/home/torqueuser/.globus/job/ec2-204-236-204-71.compute-1.amazonaws.com/24453.1272579976/x509_up
> -bash-3.2$
> 
> Compare with the env from an interactive shell on the remote machine:
> 
> [torqueuser at ip-10-251-214-179 ~]$ env
> [...]
> PATH=/opt/torque-2.3.6/bin/:/opt/torque-2.3.6/sbin:/opt/vdt-1.10.1/gums/scripts:/opt/vdt-1.10.1/prima/bin:/opt/vdt-1.10.1/cert-scripts/bin:/opt/vdt-1.10.1/glite/sbin:/opt/vdt-1.10.1/glite/bin:/opt/vdt-1.10.1/jdk1.5/bin:/opt/vdt-1.10.1/edg/sbin:/opt/vdt-1.10.1/gip/bin:/opt/vdt-1.10.1/gpt/sbin:/opt/vdt-1.10.1/globus/bin:/opt/vdt-1.10.1/globus/sbin:/opt/vdt-1.10.1/wget/bin:/opt/vdt-1.10.1/logrotate/sbin:/opt/vdt-1.10.1/perl/bin:/opt/pacman-3.26/bin:/opt/vdt-1.10.1/vdt/sbin:/opt/vdt-1.10.1/vdt/bin:/opt/pacman-3.26/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/home/torqueuser/bin
> [...]
> [torqueuser at ip-10-251-214-179 ~]$
> 
> 
> On 4/29/2010 11:18 AM, Mihael Hategan wrote:
> 
> On Thu, 2010-04-29 at 10:57 -0500, Michael Wilde wrote:
> 
> OK, thanks. It's not clear to me exactly what's happening, but I get the
> high-level idea that it relates to trust relationships that get broken
> because of differences in DN settings and/or interpretations.
> 
> No. It's something that someone, while writing up GSI, thought was going
> to make things easier. Well, it doesn't, and it makes things insecure.
> But once in, it never changed.
> 
> Normally, when you connect to bankofamerica.com, the browser resolves
> that name to an IP, contacts that IP, gets a certificate and checks the
> DN against the name you typed.
> 
> In GSI, when you connect to bankofamerica.com, the browser resolves that
> name to an IP, contacts that IP, gets a certificate, does a
> reverse-resolution on that IP and then checks the DN of the cert against
> the reverse-resolved name of the IP. That reverse-resolved name may not
> be bankofamerica.com.
> 
> This was done to provide easy (for the sysadmin) ways of having multiple
> DNS entries be used with the same machine. The problem is that it also
> fails for some scenarios (like the one we have). Not only that, it is an
> abomination in terms of security since impersonating a service can now
> be done with DNS hacks instead of the more difficult schemes involving
> cracking RSA/DSA.
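
The difference between the two checks described above can be sketched in a few lines of Python. This is a simplified model of the description in this message, not actual GSI or CoG code: the function names, the dictionary-based DNS tables, the example host names, and the plain string comparison standing in for DN matching are all assumptions made for illustration.

```python
def conventional_expected_name(typed_name, forward_dns, reverse_dns):
    """Conventional check: the cert must match the name the user typed."""
    return typed_name


def gsi_expected_name(typed_name, forward_dns, reverse_dns):
    """Check as described for GSI: resolve the typed name to an IP, reverse-resolve
    that IP, and require the cert to match the reverse-resolved name."""
    ip = forward_dns[typed_name]
    return reverse_dns[ip]


def accepts(cert_name, expected_name):
    """Stand-in for DN matching: a case-insensitive host name comparison."""
    return cert_name.lower() == expected_name.lower()


# Hypothetical DNS data: the forward and reverse names for the same IP differ.
FORWARD_DNS = {"service.example.org": "192.0.2.10"}
REVERSE_DNS = {"192.0.2.10": "node-10.internal.example"}
CERT_NAME = "service.example.org"  # name the host cert was issued for

typed = "service.example.org"
print("conventional check accepts:",
      accepts(CERT_NAME, conventional_expected_name(typed, FORWARD_DNS, REVERSE_DNS)))
print("GSI-style check accepts:",
      accepts(CERT_NAME, gsi_expected_name(typed, FORWARD_DNS, REVERSE_DNS)))
```

In this model a correctly issued cert is rejected whenever reverse DNS returns a different name, and whoever controls the reverse DNS answer decides which cert name is accepted, which is the security concern raised above.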
> 
> Yi, can you try gt2:pbs?
> 
> Mihael, at some point can you post a note explaining the issues?
> 
> I think we need to document or automate/fix the various interactions
> between coasters and GSI:
> 
> - this new issue/restriction with gt2:gt2:pbs
> - the GSI needs and user config procedures for ssh:pbs
> 
> Thanks,
> 
> Mike
> 
> ----- "Mihael Hategan" <hategan at mcs.anl.gov> wrote:
> 
> The host cert isn't incorrect. It's GSI with its silly reverse lookup
> that causes things to fail.
> 
> gt2:pbs should work (assuming the pbs provider does).
> 
> On Wed, 2010-04-28 at 23:54 -0500, Michael Wilde wrote:
> 
> Mihael,
> 
> Can you post an update on Yi's problem in getting coasters running
> over Nimbus/AWS?
> 
> Easy to fix or hard?
> 
> Should he try SSH for the coaster launch? (jobmanager=ssh:pbs ???)
> 
> Thanks,
> 
> Mike

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



