[Swift-devel] Re: Problem with incorrect host cert DN in coaster GSI authentication

Yi Zhu yizhu at cs.uchicago.edu
Thu Apr 29 17:34:44 CDT 2010


Hi,

I tried it with "gt2:pbs" and got a "qsub: not found" error. To
investigate further, I pulled the environment (env) that Globus uses
for the job and found that "/opt/torque-2.3.6/bin", the directory that
holds qsub, is not in its PATH. I think that is what causes the
"qsub: not found" problem.

Any suggested solution?


Many thanks!

-Yi Zhu


Swift screen dump:

Thu Apr 29 17:25:22 CDT 2010
-bash-3.2$ swift -tc.file tc.test.data -sites.file sshpbscoast.xml first.swift
Swift svn swift-r3262 cog-r2729 (cog modified locally)

RunID: 20100429-1725-16xmtae7
Progress:
Progress:  Stage in:1
Progress:  Submitted:1
Failed to transfer wrapper log from first-20100429-1725-16xmtae7/info/9 on ec2
Progress:  Failed:1
Execution failed:
         Exception in echo:
Arguments: [Hello, world!]
Host: ec2
Directory: first-20100429-1725-16xmtae7/jobs/9/echo-91j9i9rj
stderr.txt:

stdout.txt:

----

Caused by:
         Task failed: Error submitting block task
org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Cannot submit job: java.io.IOException: qsub: not found
         at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:63)
         at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:46)
         at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:50)
         at org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.run(BlockTaskSubmitter.java:66)
Caused by: java.io.IOException: java.io.IOException: qsub: not found
         at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
         at java.lang.ProcessImpl.start(ProcessImpl.java:65)
         at java.lang.ProcessBuilder.start(ProcessBuilder.java:451)
         at java.lang.Runtime.exec(Runtime.java:591)
         at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.start(AbstractExecutor.java:89)
         at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:53)
         ... 3 more

Cleaning up...
Shutting down service at https://10.251.214.179:59447
Got channel MetaChannel: 1535747955 -> GSSSChannel-02065467484(1)
Command(3, SHUTDOWNSERVICE): handling reply timeout; sendReqTime=100429-172549.902, sendTime=100429-172549.903, now=100429-172559.908
- Done

Env pulled from remote:

-bash-3.2$ globus-job-run ec2-204-236-204-71.compute-1.amazonaws.com /bin/env
[...]
PATH=/opt/vdt-1.10.1/gums/scripts:/opt/vdt-1.10.1/prima/bin:/opt/vdt-1.10.1/cert-scripts/bin:/opt/vdt-1.10.1/glite/sbin:/opt/vdt-1.10.1/glite/bin:/opt/vdt-1.10.1/jdk1.5/bin:/opt/vdt-1.10.1/edg/sbin:/opt/vdt-1.10.1/gip/bin:/opt/vdt-1.10.1/gpt/sbin:/opt/vdt-1.10.1/globus/bin:/opt/vdt-1.10.1/globus/sbin:/opt/vdt-1.10.1/wget/bin:/opt/vdt-1.10.1/logrotate/sbin:/opt/vdt-1.10.1/perl/bin:/opt/pacman-3.26/bin:/opt/vdt-1.10.1/vdt/sbin:/opt/vdt-1.10.1/vdt/bin:/opt/vdt-1.10.1/gums/scripts:/opt/vdt-1.10.1/prima/bin:/opt/vdt-1.10.1/cert-scripts/bin:/opt/vdt-1.10.1/glite/sbin:/opt/vdt-1.10.1/glite/bin:/opt/vdt-1.10.1/jdk1.5/bin:/opt/vdt-1.10.1/edg/sbin:/opt/vdt-1.10.1/gip/bin:/opt/vdt-1.10.1/gpt/sbin:/opt/vdt-1.10.1/wget/bin:/opt/vdt-1.10.1/logrotate/sbin:/opt/vdt-1.10.1/perl/bin:/opt/pacman-3.26/bin:/opt/vdt-1.10.1/vdt/sbin:/opt/vdt-1.10.1/vdt/bin:/sbin:/usr/sbin:/bin:/usr/bin:/usr/X11R6/bin
PERL5LIB=/opt/vdt-1.10.1/vdt/lib:/opt/vdt-1.10.1/perl/lib/5.8.0:/opt/vdt-1.10.1/perl/lib/5.8.0/i686-linux:/opt/vdt-1.10.1/perl/lib/site_perl/5.8.0:/opt/vdt-1.10.1/perl/lib/site_perl/5.8.0/i686-linux:/opt/vdt-1.10.1/vdt/lib:/opt/vdt-1.10.1/perl/lib/5.8.0:/opt/vdt-1.10.1/perl/lib/5.8.0/i686-linux-thread-multi:/opt/vdt-1.10.1/perl/lib/site_perl/5.8.0:/opt/vdt-1.10.1/perl/lib/site_perl/5.8.0/i686-linux-thread-multi::/opt/vdt-1.10.1/perl/lib/5.8.8:/opt/vdt-1.10.1/perl/lib/site_perl:/opt/vdt-1.10.1/perl/lib/5.8.8:/opt/vdt-1.10.1/perl/lib/site_perl
X509_USER_PROXY=/home/torqueuser/.globus/job/ec2-204-236-204-71.compute-1.amazonaws.com/24453.1272579976/x509_up
-bash-3.2$

Compare with env run directly on the remote machine:

[torqueuser@ip-10-251-214-179 ~]$ env
[...]
PATH=/opt/torque-2.3.6/bin/:/opt/torque-2.3.6/sbin:/opt/vdt-1.10.1/gums/scripts:/opt/vdt-1.10.1/prima/bin:/opt/vdt-1.10.1/cert-scripts/bin:/opt/vdt-1.10.1/glite/sbin:/opt/vdt-1.10.1/glite/bin:/opt/vdt-1.10.1/jdk1.5/bin:/opt/vdt-1.10.1/edg/sbin:/opt/vdt-1.10.1/gip/bin:/opt/vdt-1.10.1/gpt/sbin:/opt/vdt-1.10.1/globus/bin:/opt/vdt-1.10.1/globus/sbin:/opt/vdt-1.10.1/wget/bin:/opt/vdt-1.10.1/logrotate/sbin:/opt/vdt-1.10.1/perl/bin:/opt/pacman-3.26/bin:/opt/vdt-1.10.1/vdt/sbin:/opt/vdt-1.10.1/vdt/bin:/opt/pacman-3.26/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/home/torqueuser/bin
[...]
[torqueuser@ip-10-251-214-179 ~]$
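
To make the difference between the two PATH values easier to see, here
is a minimal Python sketch that lists the directories present in the
interactive PATH but absent from the PATH seen by the Globus job. The
function names are my own, and the two strings are abbreviated from the
dumps above (the full values are longer), so treat this as an
illustration and not as part of Swift or Globus.

```python
def path_entries(path_value):
    """Split a PATH string into de-duplicated entries, keeping order.

    Trailing slashes are dropped so "/opt/x/bin/" and "/opt/x/bin" match.
    """
    entries = []
    for raw in path_value.split(":"):
        if not raw:
            continue
        entry = raw.rstrip("/") or "/"
        if entry not in entries:
            entries.append(entry)
    return entries


def missing_entries(interactive_path, job_path):
    """Return directories in interactive_path that job_path lacks."""
    job_dirs = set(path_entries(job_path))
    return [d for d in path_entries(interactive_path) if d not in job_dirs]


# Abbreviated copies of the two PATH values shown above (assumption:
# the omitted /opt/vdt-1.10.1 entries do not change the result).
interactive_path = (
    "/opt/torque-2.3.6/bin/:/opt/torque-2.3.6/sbin:"
    "/opt/vdt-1.10.1/globus/bin:/usr/local/bin:/bin:/usr/bin:"
    "/usr/X11R6/bin:/home/torqueuser/bin"
)
job_path = (
    "/opt/vdt-1.10.1/globus/bin:/sbin:/usr/sbin:/bin:/usr/bin:"
    "/usr/X11R6/bin"
)

missing = missing_entries(interactive_path, job_path)
for directory in missing:
    print(directory)
```

With these inputs the Torque bin and sbin directories show up in the
list, which matches the "qsub: not found" error.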


On 4/29/2010 11:18 AM, Mihael Hategan wrote:
> On Thu, 2010-04-29 at 10:57 -0500, Michael Wilde wrote:
>    
>> OK, thanks. Its not clear to me exactly whats happening, but I get the
>> high-level idea that it relates to trust relationships that get broken
>> because of differences in DN settings and/or interpretations.
>>      
> No. It's something that someone while writing up GSI thought was going
> to make things easier. Well, it doesn't and it makes things unsecure.
> But once in, it never changed.
>
> Normally, when you connect to bankofamerica.com, the browser resolves
> that name to an IP, contacts that IP, gets a certificate and checks the
> DN against the name you typed.
>
> In GSI, when you connect to bankofamerica.com, the browser resolves that
> name to an IP, contacts that IP, gets a certificate, does a
> reverse-resolution on that IP and then checks the DN of the cert against
> the reverse-resolved name of the IP. That reverse-resolved name may not
> be bankofamerica.com.
>
> This was done to provide easy (for the sysadmin) ways of having multiple
> DNS entries be used with the same machine. The problem is that it also
> fails for some scenarios (like the one we have). Not only that, it is an
> abomination in terms of security since impersonating a service can now
> be done with DNS hacks instead of the more difficult schemes involving
> cracking RSA/DSA.
>
>    
>> Yi, can you try gt2:pbs?
>>
>> Mihael, at some point can you post a note explaining the issues?
>>
>> I think we need to document or automate/fix the various interactions between coasters and GSI:
>>
>> - this new issue/restriction with gt2:gt2:pbs
>> - the GSI needs and user config procedures for ssh:pbs
>>
>> Thanks,
>>
>> Mike
>>
>> ----- "Mihael Hategan"<hategan at mcs.anl.gov>  wrote:
>>
>>      
>>> The host cert isn't incorrect. It's GSI with its silly reverse lookup
>>> that causes things to fail.
>>>
>>> gt2:pbs should work (assuming the pbs provider does).
>>>
>>> On Wed, 2010-04-28 at 23:54 -0500, Michael Wilde wrote:
>>>        
>>>> Mihael,
>>>>
>>>> Can you post an update on Yi's problem in getting coasters running
>>>> over Nimbus/AWS?
>>>>
>>>> Easy to fix or hard?
>>>>
>>>> Should he try SSH for the coaster launch? (jobmanager=ssh:pbs ???)
>>>>
>>>> Thanks,
>>>>
>>>> Mike
>>>>          
>>      
>
>    
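
For reference, the two host-name checks Mihael describes in the quoted
message can be sketched in Python as below. This is only a toy model
under stated assumptions: the function names, the address 192.0.2.10
and the reverse name host-10.example.net are hypothetical, and the
lookups are plain dictionaries standing in for DNS (a real client would
use something like socket.gethostbyname and socket.gethostbyaddr). It
is not how the GSI libraries are actually implemented.

```python
def name_checked_conventional(typed_name, resolve, reverse):
    """Conventional check: compare the cert against the typed name."""
    resolve(typed_name)  # used only to find the address to connect to
    return typed_name


def name_checked_gsi(typed_name, resolve, reverse):
    """Check as described in the thread: compare the cert against the
    reverse-resolved name of the address that was contacted."""
    ip = resolve(typed_name)
    return reverse(ip)


# Hypothetical DNS data: the forward and reverse records disagree.
forward_records = {"bankofamerica.com": "192.0.2.10"}
reverse_records = {"192.0.2.10": "host-10.example.net"}
resolve = forward_records.__getitem__
reverse = reverse_records.__getitem__

cert_name = "bankofamerica.com"  # name carried in the host cert DN

conventional_ok = (
    name_checked_conventional("bankofamerica.com", resolve, reverse)
    == cert_name
)
gsi_ok = (
    name_checked_gsi("bankofamerica.com", resolve, reverse) == cert_name
)
print("conventional check passes:", conventional_ok)
print("reverse-lookup check passes:", gsi_ok)
```

The same certificate passes the first check and fails the second
whenever the reverse record differs from the name that was typed, and
whoever controls the reverse record controls which name is checked.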
