[Swift-devel] ssh:pbs to beagle

Ketan Maheshwari ketancmaheshwari at gmail.com
Thu Apr 28 13:36:15 CDT 2011


OK, I got past the CredentialException with grid-proxy-init. Now I am facing this (note: I have turned on provider staging):

========
[ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)

RunID: 20110428-1332-llaa031f
Progress:
Could not start connection handler
java.io.EOFException
	at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61)
	at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65)
	at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127)
	at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147)
	at org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177)
	at org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30)
	at org.globus.cog.karajan.workflow.service.channels.GSSChannel.<init>(GSSChannel.java:47)
	at org.globus.cog.karajan.workflow.service.ConnectionHandler.<init>(ConnectionHandler.java:41)
	at org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63)
	at org.globus.net.BaseServer.run(BaseServer.java:247)
	at java.lang.Thread.run(Thread.java:662)
Progress:  Submitted:1
Could not start connection handler
java.io.EOFException
	at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61)
	at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65)
	at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127)
	at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147)
	at org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177)
	at org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30)
	at org.globus.cog.karajan.workflow.service.channels.GSSChannel.<init>(GSSChannel.java:47)
	at org.globus.cog.karajan.workflow.service.ConnectionHandler.<init>(ConnectionHandler.java:41)
	at org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63)
	at org.globus.net.BaseServer.run(BaseServer.java:247)
	at java.lang.Thread.run(Thread.java:662)
Progress:  Submitted:1
Exception in cat:
Arguments: [data.txt]
Host: beagle-remote-pbs-coasters-ssh
Directory: catsn-20110428-1332-llaa031f/jobs/b/cat-bxal1d9kTODO: outs
----

Caused by: Could not submit job
Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job
Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not start coaster service
Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Task ended before registration was received. 
STDOUT: 
STDERR: 
Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 1
Final status:  Failed:1
The following errors have occurred:
1. Job failed with an exit code of 1

========
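
Since the handshake above depends on the proxy created with grid-proxy-init, it may be worth confirming that the proxy file is actually where it is expected before rerunning. The following is only a minimal sketch: it assumes the default location /tmp/x509up_u<uid> (the path from the earlier JGLOBUS-5 error) and assumes that an X509_USER_PROXY environment variable, if set, overrides it. The function names default_proxy_path and proxy_problems are hypothetical helpers, not part of Swift or CoG.

```python
import os
import stat


def default_proxy_path(uid=None, env=None):
    """Return the proxy path to check (assumed convention, see note above)."""
    env = os.environ if env is None else env
    override = env.get("X509_USER_PROXY")
    if override:
        return override
    uid = os.getuid() if uid is None else uid
    return "/tmp/x509up_u%d" % uid


def proxy_problems(path):
    """Return a list of human-readable problems found with the proxy file."""
    if not os.path.isfile(path):
        return ["proxy file %s not found" % path]
    problems = []
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # A proxy holds a private key, so group/other access bits should be unset.
    if mode & 0o077:
        problems.append("proxy file %s has mode %o, expected 600" % (path, mode))
    return problems


if __name__ == "__main__":
    for problem in proxy_problems(default_proxy_path()):
        print(problem)
```

This only checks presence and permissions. It does not check whether the proxy has expired; grid-proxy-info can report the remaining lifetime.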


From bridled to communicado, I see the following error:

**************
[ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file coaster-local-ssh-communicado.xml catsn.swift -n=1
Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)

RunID: 20110428-1335-k685b2ye
Progress:
Progress:  Submitted:1
Progress:  Active:1
Exception in cat:
Arguments: [data.txt]
Host: communicado-ssh
Directory: catsn-20110428-1335-k685b2ye/jobs/c/cat-coip1d9kTODO: outs
----

Caused by: Job failed with an exit code of 524
Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 524
Final status:  Failed:1
The following errors have occurred:
1. Job failed with an exit code of 524

************
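
Because I am using the same tc file with two different sites files, the pool-name mismatch from the start of this thread (quoted below) is easy to reintroduce. A rough sketch of a consistency check follows. It assumes the sites file declares each site as a <pool handle="..."> element and that tc.data is whitespace-separated with the site name in the first column and "#" starting a comment line. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def pool_handles(sites_xml_text):
    """Collect the handle attribute of every <pool> element (namespace ignored)."""
    root = ET.fromstring(sites_xml_text)
    handles = set()
    for elem in root.iter():
        tag = elem.tag.rsplit("}", 1)[-1]  # drop a "{namespace}" prefix if present
        if tag == "pool" and elem.get("handle"):
            handles.add(elem.get("handle"))
    return handles


def tc_sites(tc_text):
    """Collect the first column (assumed to be the site name) of each tc.data entry."""
    sites = set()
    for line in tc_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sites.add(line.split()[0])
    return sites


def pools_without_apps(sites_xml_text, tc_text):
    """Return pool handles that have no entry at all in tc.data."""
    return sorted(pool_handles(sites_xml_text) - tc_sites(tc_text))


if __name__ == "__main__":
    sites = '<config><pool handle="pads-remote-pbs-coasters-ssh"/></config>'
    tc = "pbs cat /bin/cat INSTALLED INTEL32::LINUX null\n"
    print(pools_without_apps(sites, tc))
```

With the example strings above, the pool handle is reported because tc.data only names "pbs", which matches the situation Mike pointed out.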


--
Ketan




On Apr 28, 2011, at 1:03 PM, Michael Wilde wrote:

> For now, create a proxy using grid-proxy-init on the swift execution machine.
> I think there is an option to set "no security" for this config, but I can't recall where it is specified. Maybe swift.properties.
> 
> - Mike
> 
> ----- Original Message -----
>> Hi,
>> 
>> It looks better now. However, I am getting the following:
>> 
>> =====
>> 
>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
>> 
>> RunID: 20110428-1251-oi9theh8
>> Progress:
>> Progress: Stage in:1
>> Could not submit job
>> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job
>> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not start coaster service
>> Caused by: org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file (/tmp/x509up_u2006) not found.
>> Caused by: org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file (/tmp/x509up_u2006) not found.
>> Failed to transfer wrapper log from catsn-20110428-1251-oi9theh8/info/e on beagle-remote-pbs-coasters-ssh
>> 
>> =====
>> 
>> How do I specify "-nosec" on automatic coasters?
>> 
>> Ketan
>> 
>> On Apr 28, 2011, at 12:00 PM, Michael Wilde wrote:
>> 
>>> OK. Was there a cookbook on the ssh settings? Did you set up a
>>> $HOME/.ssh/auth.defaults per the user guide?
>>> 
>>> Here is an auth.defaults example. I'm not sure it's 100% correct, but
>>> it could serve as a base for you:
>>> 
>>> xlogin1.pads.ci.uchicago.edu.type=password
>>> xlogin1.pads.ci.uchicago.edu.username=wilde
>>> 
>>> login.pads.ci.uchicago.edu.type=key
>>> login.pads.ci.uchicago.edu.username=wilde
>>> login.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa
>>> login.pads.ci.uchicago.edu.passphrase=yourpassphrasehere # MAKE SURE mode=600!!!
>>> 
>>> login1.pads.ci.uchicago.edu.type=key
>>> login1.pads.ci.uchicago.edu.username=wilde
>>> login1.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa
>>> login1.pads.ci.uchicago.edu.passphrase=yourpassphrasehere # MAKE SURE mode=600!!!
>>> 
>>> login.mcs.anl.gov.type=key
>>> login.mcs.anl.gov.username=wilde
>>> login.mcs.anl.gov.key=/home/wilde/.ssh/swift_rsa
>>> login.mcs.anl.gov.passphrase=yourpassphrasehere # MAKE SURE mode=600!!!
>>> 
>>> - Mike
>>> 
>>> ----- Original Message -----
>>>> It does look like an ssh problem. I am getting the same stderr and log
>>>> messages when trying to communicate from bridled to communicado.
>>>> 
>>>> Ketan
>>>> 
>>>> On Apr 28, 2011, at 11:19 AM, Michael Wilde wrote:
>>>> 
>>>>> Have you already run a simple hello-world swift test from
>>>>> communicado to bridled to make sure you have ssh configured
>>>>> correctly? I would do that first.
>>>>> 
>>>>> I'm not sure whether an ssh problem explains what you show below.
>>>>> 
>>>>> - Mike
>>>>> 
>>>>> ----- Original Message -----
>>>>>> Thanks, I made the change. However, I am now getting the following on
>>>>>> my stderr:
>>>>>> 
>>>>>> 
>>>>>> ===========
>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
>>>>>> 
>>>>>> RunID: 20110428-1022-n9s0k0e0
>>>>>> Progress:
>>>>>> [ketan]
>>>>>> Progress: Initializing site shared directory:1
>>>>>> [ketan] Progress: Initializing site shared directory:1
>>>>>> Progress: Initializing site shared directory:1
>>>>>> Progress: Initializing site shared directory:1
>>>>>> Progress: Initializing site shared directory:1
>>>>>> Progress: Initializing site shared directory:1
>>>>>> Progress: Initializing site shared directory:1
>>>>>> Progress: Initializing site shared directory:1
>>>>>> Progress: Initializing site shared directory:1
>>>>>> Progress: Initializing site shared directory:1
>>>>>> Progress: Initializing site shared directory:1
>>>>>> Progress: Initializing site shared directory:1
>>>>>> Progress: Initializing site shared directory:1
>>>>>> Progress: Initializing site shared directory:1
>>>>>> ========
>>>>>> 
>>>>>> And from the log it seems some network transmission has failed:
>>>>>> 
>>>>>> 2011-04-28 10:22:45,261-0500 INFO TransportProtocolCommon Sending SSH_MSG_SERVICE_REQUEST
>>>>>> 2011-04-28 10:22:45,264-0500 INFO TransportProtocolCommon Received SSH_MSG_SERVICE_ACCEPT
>>>>>> 2011-04-28 10:24:27,626-0500 INFO TransportProtocolCommon The Transport Protocol thread failed
>>>>>> java.io.IOException: The socket is EOF
>>>>>> at com.sshtools.j2ssh.transport.TransportProtocolInputStream.readBufferedData(TransportProtocolInputStream.java:183)
>>>>>> at com.sshtools.j2ssh.transport.TransportProtocolInputStream.readMessage(TransportProtocolInputStream.java:226)
>>>>>> at com.sshtools.j2ssh.transport.TransportProtocolCommon.processMessages(TransportProtocolCommon.java:1440)
>>>>>> at com.sshtools.j2ssh.transport.TransportProtocolCommon.startBinaryPacketProtocol(TransportProtocolCommon.java:1034)
>>>>>> at com.sshtools.j2ssh.transport.TransportProtocolCommon.run(TransportProtocolCommon.java:393)
>>>>>> at java.lang.Thread.run(Thread.java:662)
>>>>>> 
>>>>>> 
>>>>>> Any clues?
>>>>>> Ketan
>>>>>> 
>>>>>> 
>>>>>> On Apr 28, 2011, at 10:20 AM, Michael Wilde wrote:
>>>>>> 
>>>>>>> The pool name in your sites file is pads-remote-pbs-coasters-ssh, but
>>>>>>> you used pbs in your tc.data.
>>>>>>> 
>>>>>>> - Mike
>>>>>>> 
>>>>>>> ----- Original Message -----
>>>>>>>> Hello,
>>>>>>>> 
>>>>>>>> Some context:
>>>>>>>> I am trying to submit a big run on Beagle using swift + coasters.
>>>>>>>> However, a previous run is already underway on Beagle. So there are
>>>>>>>> two difficulties in starting a new run from its login node:
>>>>>>>> 
>>>>>>>> 1. Running another swift from the same jvm will result in chaos in the
>>>>>>>> logs (as far as I know; please correct me if this is no longer the
>>>>>>>> case).
>>>>>>>> 
>>>>>>>> 2. The login node is already under load because of my previous big
>>>>>>>> run, which is still running.
>>>>>>>> 
>>>>>>>> /context
>>>>>>>> 
>>>>>>>> So, I am now trying to submit this big run from a remote host
>>>>>>>> (bridled). I know this has been done on PADS using ssh:pbs with the
>>>>>>>> coaster provider. I tried a similar approach on a trial swift script
>>>>>>>> but am getting an error.
>>>>>>>> 
>>>>>>>> Following is the error message:
>>>>>>>> 
>>>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
>>>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
>>>>>>>> 
>>>>>>>> RunID: 20110428-1002-c8rvqhe6
>>>>>>>> Progress:
>>>>>>>> The application "cat" is not available in your tc.data catalog
>>>>>>>> Caused by: org.globus.cog.karajan.scheduler.NoSuchResourceException
>>>>>>>> Final status: Failed:1
>>>>>>>> The following errors have occurred:
>>>>>>>> 1. The application "cat" is not available in your tc.data catalog
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Attached are my .swift, sites.xml and tc.data files.
>>>>>>>> 
>>>>>>>> Could someone indicate whether what I am doing is doable and, if so,
>>>>>>>> how I can correctly configure my sites and tc setup?
>>>>>>>> 
>>>>>>>> Thanks.
>>>>>>>> Ketan
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> Swift-devel mailing list
>>>>>>>> Swift-devel at ci.uchicago.edu
>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>>>> 
>>>>>>> --
>>>>>>> Michael Wilde
>>>>>>> Computation Institute, University of Chicago
>>>>>>> Mathematics and Computer Science Division
>>>>>>> Argonne National Laboratory
>>>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
> 
> 



