[Swift-devel] ssh:pbs to beagle

Ketan Maheshwari ketancmaheshwari at gmail.com
Thu Apr 28 13:01:18 CDT 2011


Hi,

It looks better now. However, I am getting the following:

=====

[ketan@bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)

RunID: 20110428-1251-oi9theh8
Progress:
Progress:  Stage in:1
Could not submit job
Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job
Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not start coaster service
Caused by: org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file (/tmp/x509up_u2006) not found.
Caused by: org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file (/tmp/x509up_u2006) not found.
Failed to transfer wrapper log from catsn-20110428-1251-oi9theh8/info/e on beagle-remote-pbs-coasters-ssh

=====
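The run above fails because the coaster service expects a Globus proxy credential and finds none. Globus tools conventionally look for the proxy at /tmp/x509up_u<uid> (uid 2006 here) unless the X509_USER_PROXY environment variable points somewhere else. The Python sketch below assumes only that convention and checks whether a proxy exists before a run is submitted. The helper names are hypothetical, and the sketch does not answer the "-nosec" question.

```python
import os


def default_proxy_path(uid=None):
    """Return the expected Globus proxy path.

    Assumption: X509_USER_PROXY overrides the conventional default
    /tmp/x509up_u<uid>, which matches /tmp/x509up_u2006 in the error above.
    """
    override = os.environ.get("X509_USER_PROXY")
    if override:
        return override
    if uid is None:
        uid = os.getuid()  # POSIX only
    return f"/tmp/x509up_u{uid}"


def check_proxy(uid=None):
    """Report whether a proxy file exists at the expected path."""
    path = default_proxy_path(uid)
    if os.path.isfile(path):
        return f"proxy found: {path}"
    return f"no proxy at {path}; create one first (for example with grid-proxy-init)"


if __name__ == "__main__":
    print(check_proxy())
```

Running it on the submit host before `swift` shows whether the JGLOBUS-5 "Proxy file not found" error is simply a missing credential.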

How do I specify "-nosec" on automatic coasters?

Ketan

On Apr 28, 2011, at 12:00 PM, Michael Wilde wrote:

> OK. Was there a cookbook on the ssh settings? Did you set up a $HOME/.ssh/auth.defaults per the user guide?
> 
> Here is an auth.defaults example. I'm not sure it's 100% correct, but it could serve as a base for you:
> 
> xlogin1.pads.ci.uchicago.edu.type=password
> xlogin1.pads.ci.uchicago.edu.username=wilde
> 
> login.pads.ci.uchicago.edu.type=key
> login.pads.ci.uchicago.edu.username=wilde
> login.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa
> # MAKE SURE mode=600!!!
> login.pads.ci.uchicago.edu.passphrase=yourpassphrasehere
> 
> login1.pads.ci.uchicago.edu.type=key
> login1.pads.ci.uchicago.edu.username=wilde
> login1.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa
> # MAKE SURE mode=600!!!
> login1.pads.ci.uchicago.edu.passphrase=yourpassphrasehere
> 
> login.mcs.anl.gov.type=key
> login.mcs.anl.gov.username=wilde
> login.mcs.anl.gov.key=/home/wilde/.ssh/swift_rsa
> # MAKE SURE mode=600!!!
> login.mcs.anl.gov.passphrase=yourpassphrasehere
> 
> - Mike
> 
> ----- Original Message -----
>> It does look like an ssh problem. I am getting the same stderr and log
>> messages when trying to communicate from Bridled to Communicado.
>> 
>> Ketan
>> 
>> On Apr 28, 2011, at 11:19 AM, Michael Wilde wrote:
>> 
>>> Have you already run a simple hello-world swift test from
>>> communicado to bridled to make sure you have ssh configured
>>> correctly? I would do that first.
>>> 
>>> I'm not sure whether an ssh problem explains what you show below.
>>> 
>>> - Mike
>>> 
>>> ----- Original Message -----
>>>> Thanks, I made the change. However, I am now getting the following
>>>> on my stderr:
>>>> 
>>>> 
>>>> ===========
>>>> [ketan@bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
>>>> 
>>>> RunID: 20110428-1022-n9s0k0e0
>>>> Progress:
>>>> [ketan]
>>>> Progress: Initializing site shared directory:1
>>>> [ketan] Progress: Initializing site shared directory:1
>>>> Progress: Initializing site shared directory:1
>>>> Progress: Initializing site shared directory:1
>>>> Progress: Initializing site shared directory:1
>>>> Progress: Initializing site shared directory:1
>>>> Progress: Initializing site shared directory:1
>>>> Progress: Initializing site shared directory:1
>>>> Progress: Initializing site shared directory:1
>>>> Progress: Initializing site shared directory:1
>>>> Progress: Initializing site shared directory:1
>>>> Progress: Initializing site shared directory:1
>>>> Progress: Initializing site shared directory:1
>>>> Progress: Initializing site shared directory:1
>>>> ========
>>>> 
>>>> And from the log it seems some network transmission has failed:
>>>> 
>>>> 2011-04-28 10:22:45,261-0500 INFO TransportProtocolCommon Sending SSH_MSG_SERVICE_REQUEST
>>>> 2011-04-28 10:22:45,264-0500 INFO TransportProtocolCommon Received SSH_MSG_SERVICE_ACCEPT
>>>> 2011-04-28 10:24:27,626-0500 INFO TransportProtocolCommon The Transport Protocol thread failed
>>>> java.io.IOException: The socket is EOF
>>>> at com.sshtools.j2ssh.transport.TransportProtocolInputStream.readBufferedData(TransportProtocolInputStream.java:183)
>>>> at com.sshtools.j2ssh.transport.TransportProtocolInputStream.readMessage(TransportProtocolInputStream.java:226)
>>>> at com.sshtools.j2ssh.transport.TransportProtocolCommon.processMessages(TransportProtocolCommon.java:1440)
>>>> at com.sshtools.j2ssh.transport.TransportProtocolCommon.startBinaryPacketProtocol(TransportProtocolCommon.java:1034)
>>>> at com.sshtools.j2ssh.transport.TransportProtocolCommon.run(TransportProtocolCommon.java:393)
>>>> at java.lang.Thread.run(Thread.java:662)
>>>> 
>>>> 
>>>> Any clues?
>>>> Ketan
>>>> 
>>>> 
>>>> On Apr 28, 2011, at 10:20 AM, Michael Wilde wrote:
>>>> 
>>>>> The pool name in your sites file is pads-remote-pbs-coasters-ssh
>>>>> but you used pbs in your tc.data.
>>>>> 
>>>>> - Mike
>>>>> 
>>>>> ----- Original Message -----
>>>>>> Hello,
>>>>>> 
>>>>>> Some context:
>>>>>> I am trying to submit a big run on Beagle using swift + coasters.
>>>>>> However, a previous run is already underway on Beagle, so there are
>>>>>> two difficulties with starting a new run from its login node:
>>>>>> 
>>>>>> 1. Running another swift instance from the same JVM will result in
>>>>>> chaos in the logs (as far as I know; please correct me if this is
>>>>>> no longer the case).
>>>>>> 
>>>>>> 2. The login node is already under load from my previous big run,
>>>>>> which is still running.
>>>>>> 
>>>>>> /context
>>>>>> 
>>>>>> So, I am now trying to submit this big run from a remote host
>>>>>> (bridled). I know this has been done on PADS using ssh:pbs with the
>>>>>> coaster provider. I tried a similar approach on a trial swift script
>>>>>> but am getting an error.
>>>>>> 
>>>>>> Following is the error message:
>>>>>> 
>>>>>> [ketan@bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
>>>>>> 
>>>>>> RunID: 20110428-1002-c8rvqhe6
>>>>>> Progress:
>>>>>> The application "cat" is not available in your tc.data catalog
>>>>>> Caused by:
>>>>>> org.globus.cog.karajan.scheduler.NoSuchResourceException
>>>>>> Final status: Failed:1
>>>>>> The following errors have occurred:
>>>>>> 1. The application "cat" is not available in your tc.data catalog
>>>>>> 
>>>>>> 
>>>>>> Attached are my .swift, sites.xml and tc.data files.
>>>>>> 
>>>>>> Could someone tell me whether what I am doing is doable and, if so,
>>>>>> how I can correctly configure my sites and tc setup?
>>>>>> 
>>>>>> Thanks.
>>>>>> Ketan
>>>>>> 
>>>>>> 
>>>>>> _______________________________________________
>>>>>> Swift-devel mailing list
>>>>>> Swift-devel at ci.uchicago.edu
>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>> 
>>>>> --
>>>>> Michael Wilde
>>>>> Computation Institute, University of Chicago
>>>>> Mathematics and Computer Science Division
>>>>> Argonne National Laboratory
>>>>> 
>>> 
>>> --
>>> Michael Wilde
>>> Computation Institute, University of Chicago
>>> Mathematics and Computer Science Division
>>> Argonne National Laboratory
>>> 
> 
> -- 
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
> 
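Mike's auth.defaults example in the thread keys every setting by host name (host.type, host.username, host.key, host.passphrase) and warns that the file must have mode 600 because it holds passphrases. The Python sketch below is a hypothetical helper, not part of Swift or CoG: it assumes that host.property=value layout, groups entries by host, and flags a file readable by group or others, a type=key entry with no key path, and a passphrase containing '#'. The last check exists because, if the file is read as a Java properties file (an assumption), a trailing `# ...` on a passphrase line becomes part of the value rather than a comment.

```python
import os
import stat


def parse_auth_defaults(text):
    """Group host.property=value lines by host; '#' lines are comments."""
    hosts = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        # The property name is the last dot-separated part of the key.
        host, _, prop = key.strip().rpartition(".")
        hosts.setdefault(host, {})[prop] = value.strip()
    return hosts


def problems(hosts, mode):
    """List likely mistakes given parsed entries and the file's st_mode."""
    issues = []
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        issues.append("file is accessible by group/others; chmod 600 it")
    for host, props in hosts.items():
        if props.get("type") == "key" and "key" not in props:
            issues.append(f"{host}: type=key but no key file given")
        if "#" in props.get("passphrase", ""):
            issues.append(f"{host}: passphrase contains '#'; an inline comment may be part of the value")
    return issues


if __name__ == "__main__":
    path = os.path.expanduser("~/.ssh/auth.defaults")
    if not os.path.isfile(path):
        print(f"{path} not found")
    else:
        with open(path) as f:
            parsed = parse_auth_defaults(f.read())
        for issue in problems(parsed, os.stat(path).st_mode):
            print(issue)
```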

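Mike's first suggestion in the thread is to confirm that plain ssh between the hosts works before debugging Swift. The Python sketch below covers only the very first step of that: it connects to a host's SSH port and returns the identification line that, per RFC 4253, the server sends starting with "SSH-". In the j2ssh log in the thread, the connection got as far as SSH_MSG_SERVICE_ACCEPT before hitting EOF, so this check would likely pass there; it mainly rules out a wrong host name or a firewall before looking at authentication settings. The function name is hypothetical.

```python
import socket


def read_ssh_banner(host, port=22, timeout=10.0):
    """Return the SSH identification line the server sends on connect.

    Per RFC 4253 (section 4.2) the server sends a line starting with "SSH-";
    it may send other text lines first, which are skipped here.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rb") as stream:
            for _ in range(50):  # arbitrary cap on lines before the banner
                line = stream.readline(256)
                if not line:
                    # The peer closed the connection (EOF) before identifying itself.
                    raise ConnectionError("connection closed before SSH banner")
                text = line.decode("ascii", "replace").rstrip("\r\n")
                if text.startswith("SSH-"):
                    return text
    raise ConnectionError("no SSH banner in the first 50 lines")
```

Calling `read_ssh_banner("<host>")` with the remote login host's name should return a string such as "SSH-2.0-..."; a timeout or ConnectionError points at the network rather than at Swift.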

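The original "application cat is not available in your tc.data catalog" error came from a site-name mismatch: the sites file declared pool pads-remote-pbs-coasters-ssh while tc.data listed cat under pbs. The Python sketch below catches that class of mistake before a run. It assumes sites.xml declares sites as `<pool handle="...">` elements (any XML namespace is ignored) and that tc.data is whitespace-separated with the site name in the first column and the transformation name in the second; both layouts are assumptions, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def site_handles(sites_xml):
    """Collect the handle attribute of every <pool> element, ignoring namespaces."""
    root = ET.fromstring(sites_xml)
    return {
        el.get("handle")
        for el in root.iter()
        if isinstance(el.tag, str)
        and el.tag.rsplit("}", 1)[-1] == "pool"
        and el.get("handle")
    }


def tc_entries(tc_text):
    """Map each transformation name in tc.data to the set of sites listing it."""
    entries = {}
    for raw in tc_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split()
        if len(cols) >= 2:
            entries.setdefault(cols[1], set()).add(cols[0])
    return entries


def unresolved_apps(sites_xml, tc_text):
    """Return apps whose tc.data sites match no pool handle in sites.xml."""
    handles = site_handles(sites_xml)
    return {
        app: sorted(sites)
        for app, sites in tc_entries(tc_text).items()
        if not sites & handles
    }
```

With a sites file whose only pool is pads-remote-pbs-coasters-ssh and a tc.data line beginning `pbs cat`, `unresolved_apps` returns `{"cat": ["pbs"]}`, the same mismatch Mike spotted.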

