[Swift-devel] Re: Problem with incorrect host cert DN in coaster GSI authentication
Yi Zhu
yizhu at cs.uchicago.edu
Thu Apr 29 17:34:44 CDT 2010
Hi,
I tried it with "gt2:pbs" and got a "qsub: not found" error. To
investigate further, I pulled the environment used by Globus and found
that "/opt/torque-2.3.6/bin", where qsub lives, is not in its PATH. I
think that is what causes the "qsub: not found" problem.
Any suggested solutions?
Many thanks!
-Yi Zhu
Swift screen dump:
Thu Apr 29 17:25:22 CDT 2010
-bash-3.2$ swift -tc.file tc.test.data -sites.file sshpbscoast.xml first.swift
Swift svn swift-r3262 cog-r2729 (cog modified locally)
RunID: 20100429-1725-16xmtae7
Progress:
Progress: Stage in:1
Progress: Submitted:1
Failed to transfer wrapper log from first-20100429-1725-16xmtae7/info/9 on ec2
Progress: Failed:1
Execution failed:
Exception in echo:
Arguments: [Hello, world!]
Host: ec2
Directory: first-20100429-1725-16xmtae7/jobs/9/echo-91j9i9rj
stderr.txt:
stdout.txt:
----
Caused by:
Task failed: Error submitting block task
org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Cannot submit job: java.io.IOException: qsub: not found
at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:63)
at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:46)
at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:50)
at org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.run(BlockTaskSubmitter.java:66)
Caused by: java.io.IOException: java.io.IOException: qsub: not found
at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
at java.lang.ProcessImpl.start(ProcessImpl.java:65)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:451)
at java.lang.Runtime.exec(Runtime.java:591)
at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.start(AbstractExecutor.java:89)
at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:53)
... 3 more
Cleaning up...
Shutting down service at https://10.251.214.179:59447
Got channel MetaChannel: 1535747955 -> GSSSChannel-02065467484(1)
Command(3, SHUTDOWNSERVICE): handling reply timeout; sendReqTime=100429-172549.902, sendTime=100429-172549.903, now=100429-172559.908
- Done
Env pulled from remote:
-bash-3.2$ globus-job-run ec2-204-236-204-71.compute-1.amazonaws.com /bin/env
[...]
PATH=/opt/vdt-1.10.1/gums/scripts:/opt/vdt-1.10.1/prima/bin:/opt/vdt-1.10.1/cert-scripts/bin:/opt/vdt-1.10.1/glite/sbin:/opt/vdt-1.10.1/glite/bin:/opt/vdt-1.10.1/jdk1.5/bin:/opt/vdt-1.10.1/edg/sbin:/opt/vdt-1.10.1/gip/bin:/opt/vdt-1.10.1/gpt/sbin:/opt/vdt-1.10.1/globus/bin:/opt/vdt-1.10.1/globus/sbin:/opt/vdt-1.10.1/wget/bin:/opt/vdt-1.10.1/logrotate/sbin:/opt/vdt-1.10.1/perl/bin:/opt/pacman-3.26/bin:/opt/vdt-1.10.1/vdt/sbin:/opt/vdt-1.10.1/vdt/bin:/opt/vdt-1.10.1/gums/scripts:/opt/vdt-1.10.1/prima/bin:/opt/vdt-1.10.1/cert-scripts/bin:/opt/vdt-1.10.1/glite/sbin:/opt/vdt-1.10.1/glite/bin:/opt/vdt-1.10.1/jdk1.5/bin:/opt/vdt-1.10.1/edg/sbin:/opt/vdt-1.10.1/gip/bin:/opt/vdt-1.10.1/gpt/sbin:/opt/vdt-1.10.1/wget/bin:/opt/vdt-1.10.1/logrotate/sbin:/opt/vdt-1.10.1/perl/bin:/opt/pacman-3.26/bin:/opt/vdt-1.10.1/vdt/sbin:/opt/vdt-1.10.1/vdt/bin:/sbin:/usr/sbin:/bin:/usr/bin:/usr/X11R6/bin
PERL5LIB=/opt/vdt-1.10.1/vdt/lib:/opt/vdt-1.10.1/perl/lib/5.8.0:/opt/vdt-1.10.1/perl/lib/5.8.0/i686-linux:/opt/vdt-1.10.1/perl/lib/site_perl/5.8.0:/opt/vdt-1.10.1/perl/lib/site_perl/5.8.0/i686-linux:/opt/vdt-1.10.1/vdt/lib:/opt/vdt-1.10.1/perl/lib/5.8.0:/opt/vdt-1.10.1/perl/lib/5.8.0/i686-linux-thread-multi:/opt/vdt-1.10.1/perl/lib/site_perl/5.8.0:/opt/vdt-1.10.1/perl/lib/site_perl/5.8.0/i686-linux-thread-multi::/opt/vdt-1.10.1/perl/lib/5.8.8:/opt/vdt-1.10.1/perl/lib/site_perl:/opt/vdt-1.10.1/perl/lib/5.8.8:/opt/vdt-1.10.1/perl/lib/site_perl
X509_USER_PROXY=/home/torqueuser/.globus/job/ec2-204-236-204-71.compute-1.amazonaws.com/24453.1272579976/x509_up
-bash-3.2$
Compare to the env on the remote machine:
[torqueuser at ip-10-251-214-179 ~]$ env
[...]
PATH=/opt/torque-2.3.6/bin/:/opt/torque-2.3.6/sbin:/opt/vdt-1.10.1/gums/scripts:/opt/vdt-1.10.1/prima/bin:/opt/vdt-1.10.1/cert-scripts/bin:/opt/vdt-1.10.1/glite/sbin:/opt/vdt-1.10.1/glite/bin:/opt/vdt-1.10.1/jdk1.5/bin:/opt/vdt-1.10.1/edg/sbin:/opt/vdt-1.10.1/gip/bin:/opt/vdt-1.10.1/gpt/sbin:/opt/vdt-1.10.1/globus/bin:/opt/vdt-1.10.1/globus/sbin:/opt/vdt-1.10.1/wget/bin:/opt/vdt-1.10.1/logrotate/sbin:/opt/vdt-1.10.1/perl/bin:/opt/pacman-3.26/bin:/opt/vdt-1.10.1/vdt/sbin:/opt/vdt-1.10.1/vdt/bin:/opt/pacman-3.26/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/home/torqueuser/bin
[...]
[torqueuser at ip-10-251-214-179 ~]$
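The difference between the two PATH values above can also be checked mechanically. The following is a minimal Python sketch, not part of Swift or Globus: the function name missing_path_entries is made up for illustration, and the two PATH strings are shortened stand-ins for the full values pasted above.

```python
def missing_path_entries(login_path, job_path):
    """Return directories present in login_path but absent from job_path.

    Order follows login_path; duplicates and empty entries are skipped.
    """
    def normalize(entry):
        # Treat "/opt/x/bin/" and "/opt/x/bin" as the same directory.
        return entry.rstrip("/") or "/"

    job_dirs = {normalize(d) for d in job_path.split(":") if d}
    missing = []
    for d in login_path.split(":"):
        if not d:
            continue
        n = normalize(d)
        if n not in job_dirs and n not in missing:
            missing.append(n)
    return missing


# Shortened stand-ins (assumption: abbreviated from the env output above).
login_path = (
    "/opt/torque-2.3.6/bin/:/opt/torque-2.3.6/sbin:"
    "/opt/vdt-1.10.1/globus/bin:/usr/local/bin:/bin:/usr/bin:"
    "/usr/X11R6/bin:/home/torqueuser/bin"
)
job_path = (
    "/opt/vdt-1.10.1/globus/bin:/sbin:/usr/sbin:/bin:/usr/bin:/usr/X11R6/bin"
)

for directory in missing_path_entries(login_path, job_path):
    print(directory)
```

With these inputs the Torque directories show up in the list of entries that the Globus job environment lacks, which is consistent with qsub not being found there.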
On 4/29/2010 11:18 AM, Mihael Hategan wrote:
> On Thu, 2010-04-29 at 10:57 -0500, Michael Wilde wrote:
>
>> OK, thanks. It's not clear to me exactly what's happening, but I get the
>> high-level idea that it relates to trust relationships that get broken
>> because of differences in DN settings and/or interpretations.
>>
> No. It's something that someone while writing up GSI thought was going
> to make things easier. Well, it doesn't and it makes things insecure.
> But once in, it never changed.
>
> Normally, when you connect to bankofamerica.com, the browser resolves
> that name to an IP, contacts that IP, gets a certificate and checks the
> DN against the name you typed.
>
> In GSI, when you connect to bankofamerica.com, the browser resolves that
> name to an IP, contacts that IP, gets a certificate, does a
> reverse-resolution on that IP and then checks the DN of the cert against
> the reverse-resolved name of the IP. That reverse-resolved name may not
> be bankofamerica.com.
>
> This was done to provide easy (for the sysadmin) ways of having multiple
> DNS entries be used with the same machine. The problem is that it also
> fails for some scenarios (like the one we have). Not only that, it is an
> abomination in terms of security since impersonating a service can now
> be done with DNS hacks instead of the more difficult schemes involving
> cracking RSA/DSA.
>
>
>> Yi, can you try gt2:pbs?
>>
>> Mihael, at some point can you post a note explaining the issues?
>>
>> I think we need to document or automate/fix the various interactions between coasters and GSI:
>>
>> - this new issue/restriction with gt2:gt2:pbs
>> - the GSI needs and user config procedures for ssh:pbs
>>
>> Thanks,
>>
>> Mike
>>
>> ----- "Mihael Hategan"<hategan at mcs.anl.gov> wrote:
>>
>>
>>> The host cert isn't incorrect. It's GSI with its silly reverse lookup
>>> that causes things to fail.
>>>
>>> gt2:pbs should work (assuming the pbs provider does).
>>>
>>> On Wed, 2010-04-28 at 23:54 -0500, Michael Wilde wrote:
>>>
>>>> Mihael,
>>>>
>>>> Can you post an update on Yi's problem in getting coasters running
>>>> over Nimbus/AWS?
>>>>
>>>> Easy to fix or hard?
>>>>
>>>> Should he try SSH for the coaster launch? (jobmanager=ssh:pbs ???)
>>>>
>>>> Thanks,
>>>>
>>>> Mike
>>>>
>>
>
>
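The two name checks described in the quoted explanation can be contrasted with a small sketch. This is an illustration only, not the GSI or CoG implementation: the function names, the example host names, and the dictionary-based DNS tables are all hypothetical, and a certificate DN is simplified to a bare host name.

```python
def standard_check(typed_name, cert_name):
    """Conventional check: the certificate must match the name that was typed."""
    return cert_name.lower() == typed_name.lower()


def reverse_lookup_check(typed_name, cert_name, forward, reverse):
    """Check as described above: resolve the typed name to an IP,
    reverse-resolve that IP, and compare the certificate to the result."""
    ip = forward[typed_name]
    # Fall back to the IP itself when no reverse record exists.
    reverse_name = reverse.get(ip, ip)
    return cert_name.lower() == reverse_name.lower()


# Hypothetical DNS data: the public name and the reverse record differ.
forward = {"service.example.org": "192.0.2.10"}
reverse = {"192.0.2.10": "node17.internal.example.net"}

# A certificate issued for the public name passes the conventional check
# but fails the reverse-lookup check.
print(standard_check("service.example.org", "service.example.org"))
print(reverse_lookup_check("service.example.org", "service.example.org",
                           forward, reverse))

# If forward DNS is tampered with to point at another host, a certificate
# for that other host passes the reverse-lookup check.
spoofed_forward = {"service.example.org": "198.51.100.7"}
spoofed_reverse = {"198.51.100.7": "other-host.example.net"}
print(reverse_lookup_check("service.example.org", "other-host.example.net",
                           spoofed_forward, spoofed_reverse))
```

The first pair of calls mirrors the failure mode where a valid host certificate is rejected; the last call mirrors the security concern that a DNS change alone can make a different host's certificate acceptable.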