[Swift-devel] ssh:pbs to beagle

Ketan Maheshwari ketancmaheshwari at gmail.com
Thu Apr 28 14:14:00 CDT 2011


Ok, I am trying a manual coaster setup from bridled (service, swift) to beagle (worker.pl).

--Ketan
 
On Apr 28, 2011, at 2:11 PM, Michael Wilde wrote:

> As far as I can tell from the swift-devel archives, the only feature for disabling coaster security is the -nosec option of the coaster-service command.
> 
> - Mike
> 
> 
> ----- Original Message -----
>> Now I think you need to create the same proxy on the Beagle side. For
>> starters, just try copying your proxy file from /tmp on communicado to
>> /tmp on the Beagle login node on which you are running Swift. Later
>> you can do this by creating a proxy on the Beagle side using
>> grid-proxy-init, but you'll need to install CA certs there.
>> 
>> Also, have you considered running a passive coaster server on the
>> communicado side, and just having Beagle worker.pl scripts connect
>> back to it?
>> 
>> - Mike
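
The proxy file named in the error further down this thread, /tmp/x509up_u2006, follows the usual default of /tmp/x509up_u<uid>, where <uid> is the numeric user ID on the local machine. A minimal Python sketch for checking that location before starting Swift follows. The helper names are made up for illustration, and it assumes the default location is in use rather than an override through the X509_USER_PROXY environment variable.

```python
import os

def default_proxy_path(uid=None):
    """Return the conventional proxy location /tmp/x509up_u<uid>.

    Assumption: the default location is in use. If uid is None, the
    uid of the current user is taken (POSIX only).
    """
    if uid is None:
        uid = os.getuid()
    return "/tmp/x509up_u%d" % uid

def proxy_present(uid=None):
    """True if a regular file exists at the conventional proxy path."""
    return os.path.isfile(default_proxy_path(uid))
```

Calling proxy_present() on the Beagle login node would show whether the copied file is where the local uid expects it. If the uid on Beagle is not 2006, the copied file would presumably need to be named after the Beagle uid.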
>> 
>> ----- Original Message -----
>>> Ok, I got past the CredentialException with grid-proxy-init. Now I am
>>> facing this (note: I have turned on provider staging):
>>> 
>>> ========
>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
>>> 
>>> RunID: 20110428-1332-llaa031f
>>> Progress:
>>> Could not start connection handler
>>> java.io.EOFException
>>> at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61)
>>> at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65)
>>> at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127)
>>> at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147)
>>> at org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177)
>>> at org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30)
>>> at org.globus.cog.karajan.workflow.service.channels.GSSChannel.<init>(GSSChannel.java:47)
>>> at org.globus.cog.karajan.workflow.service.ConnectionHandler.<init>(ConnectionHandler.java:41)
>>> at org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63)
>>> at org.globus.net.BaseServer.run(BaseServer.java:247)
>>> at java.lang.Thread.run(Thread.java:662)
>>> Progress: Submitted:1
>>> Could not start connection handler
>>> java.io.EOFException
>>> at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61)
>>> at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65)
>>> at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127)
>>> at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147)
>>> at org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177)
>>> at org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30)
>>> at org.globus.cog.karajan.workflow.service.channels.GSSChannel.<init>(GSSChannel.java:47)
>>> at org.globus.cog.karajan.workflow.service.ConnectionHandler.<init>(ConnectionHandler.java:41)
>>> at org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63)
>>> at org.globus.net.BaseServer.run(BaseServer.java:247)
>>> at java.lang.Thread.run(Thread.java:662)
>>> Progress: Submitted:1
>>> Exception in cat:
>>> Arguments: [data.txt]
>>> Host: beagle-remote-pbs-coasters-ssh
>>> Directory: catsn-20110428-1332-llaa031f/jobs/b/cat-bxal1d9kTODO:
>>> outs
>>> ----
>>> 
>>> Caused by: Could not submit job
>>> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job
>>> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not start coaster service
>>> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Task ended before registration was received.
>>> STDOUT:
>>> STDERR:
>>> Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 1
>>> Final status: Failed:1
>>> The following errors have occurred:
>>> 1. Job failed with an exit code of 1
>>> 
>>> ========
>>> 
>>> 
>>> From bridled to communicado, I see the following error:
>>> 
>>> **************
>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file coaster-local-ssh-communicado.xml catsn.swift -n=1
>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
>>> 
>>> RunID: 20110428-1335-k685b2ye
>>> Progress:
>>> Progress: Submitted:1
>>> Progress: Active:1
>>> Exception in cat:
>>> Arguments: [data.txt]
>>> Host: communicado-ssh
>>> Directory: catsn-20110428-1335-k685b2ye/jobs/c/cat-coip1d9kTODO:
>>> outs
>>> ----
>>> 
>>> Caused by: Job failed with an exit code of 524
>>> Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 524
>>> Final status: Failed:1
>>> The following errors have occurred:
>>> 1. Job failed with an exit code of 524
>>> 
>>> ************
>>> 
>>> 
>>> --
>>> Ketan
>>> 
>>> 
>>> 
>>> 
>>> On Apr 28, 2011, at 1:03 PM, Michael Wilde wrote:
>>> 
>>>> For now, create a proxy using grid-proxy-init on the swift
>>>> execution machine.
>>>> I think there is an option to set "no security" for this config,
>>>> but I can't recall where it is specified. Maybe swift.properties.
>>>> 
>>>> - Mike
>>>> 
>>>> ----- Original Message -----
>>>>> Hi,
>>>>> 
>>>>> It looks better now. However, I am getting the following:
>>>>> 
>>>>> =====
>>>>> 
>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
>>>>> 
>>>>> RunID: 20110428-1251-oi9theh8
>>>>> Progress:
>>>>> Progress: Stage in:1
>>>>> Could not submit job
>>>>> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job
>>>>> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not start coaster service
>>>>> Caused by: org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file (/tmp/x509up_u2006) not found.
>>>>> Caused by: org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file (/tmp/x509up_u2006) not found.
>>>>> Failed to transfer wrapper log from catsn-20110428-1251-oi9theh8/info/e on beagle-remote-pbs-coasters-ssh
>>>>> 
>>>>> =====
>>>>> 
>>>>> How do I specify "-nosec" on automatic coasters?
>>>>> 
>>>>> Ketan
>>>>> 
>>>>> On Apr 28, 2011, at 12:00 PM, Michael Wilde wrote:
>>>>> 
>>>>>> OK. Was there a cookbook on the ssh settings? Did you set up a
>>>>>> $HOME/.ssh/auth.defaults per the user guide?
>>>>>> 
>>>>>> Here is an auth.defaults example. I'm not sure it's 100% correct,
>>>>>> but it could serve as a base for you:
>>>>>> 
>>>>>> xlogin1.pads.ci.uchicago.edu.type=password
>>>>>> xlogin1.pads.ci.uchicago.edu.username=wilde
>>>>>> 
>>>>>> login.pads.ci.uchicago.edu.type=key
>>>>>> login.pads.ci.uchicago.edu.username=wilde
>>>>>> login.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa
>>>>>> # MAKE SURE mode=600!!!
>>>>>> login.pads.ci.uchicago.edu.passphrase=yourpassphrasehere
>>>>>> 
>>>>>> login1.pads.ci.uchicago.edu.type=key
>>>>>> login1.pads.ci.uchicago.edu.username=wilde
>>>>>> login1.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa
>>>>>> # MAKE SURE mode=600!!!
>>>>>> login1.pads.ci.uchicago.edu.passphrase=yourpassphrasehere
>>>>>> 
>>>>>> login.mcs.anl.gov.type=key
>>>>>> login.mcs.anl.gov.username=wilde
>>>>>> login.mcs.anl.gov.key=/home/wilde/.ssh/swift_rsa
>>>>>> # MAKE SURE mode=600!!!
>>>>>> login.mcs.anl.gov.passphrase=yourpassphrasehere
>>>>>> 
>>>>>> - Mike
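
The "mode=600" reminder in the example above can be applied with a few lines of Python. This is only a sketch: it reads the reminder as applying to the auth.defaults file itself, since that file holds the passphrase in clear text, and the helper name restrict_to_owner is made up for illustration.

```python
import os
import stat

def restrict_to_owner(path):
    """Set mode 600 (owner read/write only) on path.

    Returns the permission bits actually in effect afterwards, so the
    caller can confirm the change took hold.
    """
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return stat.S_IMODE(os.stat(path).st_mode)
```

For instance, restrict_to_owner(os.path.expanduser("~/.ssh/auth.defaults")) would tighten the file under $HOME/.ssh mentioned above. The same call could be applied to the private key file.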
>>>>>> 
>>>>>> ----- Original Message -----
>>>>>>> It does look like an ssh problem. I am getting the same stderr and
>>>>>>> log messages when trying to communicate from Bridled to Communicado.
>>>>>>> 
>>>>>>> Ketan
>>>>>>> 
>>>>>>> On Apr 28, 2011, at 11:19 AM, Michael Wilde wrote:
>>>>>>> 
>>>>>>>> Have you already run a simple hello-world swift test from
>>>>>>>> communicado to bridled to make sure you have ssh configured
>>>>>>>> correctly? I would do that first.
>>>>>>>> 
>>>>>>>> I'm not sure whether an ssh problem explains what you show below.
>>>>>>>> 
>>>>>>>> - Mike
>>>>>>>> 
>>>>>>>> ----- Original Message -----
>>>>>>>>> Thanks, I made the change. However, I am now getting the
>>>>>>>>> following on my stderr:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> ===========
>>>>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
>>>>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
>>>>>>>>> 
>>>>>>>>> RunID: 20110428-1022-n9s0k0e0
>>>>>>>>> Progress:
>>>>>>>>> [ketan]
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> [ketan] Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> ========
>>>>>>>>> 
>>>>>>>>> And from the log it seems some network transmission has
>>>>>>>>> failed:
>>>>>>>>> 
>>>>>>>>> 2011-04-28 10:22:45,261-0500 INFO TransportProtocolCommon Sending SSH_MSG_SERVICE_REQUEST
>>>>>>>>> 2011-04-28 10:22:45,264-0500 INFO TransportProtocolCommon Received SSH_MSG_SERVICE_ACCEPT
>>>>>>>>> 2011-04-28 10:24:27,626-0500 INFO TransportProtocolCommon The Transport Protocol thread failed
>>>>>>>>> java.io.IOException: The socket is EOF
>>>>>>>>> at com.sshtools.j2ssh.transport.TransportProtocolInputStream.readBufferedData(TransportProtocolInputStream.java:183)
>>>>>>>>> at com.sshtools.j2ssh.transport.TransportProtocolInputStream.readMessage(TransportProtocolInputStream.java:226)
>>>>>>>>> at com.sshtools.j2ssh.transport.TransportProtocolCommon.processMessages(TransportProtocolCommon.java:1440)
>>>>>>>>> at com.sshtools.j2ssh.transport.TransportProtocolCommon.startBinaryPacketProtocol(TransportProtocolCommon.java:1034)
>>>>>>>>> at com.sshtools.j2ssh.transport.TransportProtocolCommon.run(TransportProtocolCommon.java:393)
>>>>>>>>> at java.lang.Thread.run(Thread.java:662)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Any clues?
>>>>>>>>> Ketan
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Apr 28, 2011, at 10:20 AM, Michael Wilde wrote:
>>>>>>>>> 
>>>>>>>>>> The pool name in your sites file is pads-remote-pbs-coasters-ssh,
>>>>>>>>>> but you used pbs in your tc.data.
>>>>>>>>>> 
>>>>>>>>>> - Mike
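
A mismatch like the one described above can be caught before a run by comparing the two files. The following Python sketch rests on two assumptions about the file layouts that should be checked against the Swift version in use: that the sites file declares each site as a <pool handle="..."> element, and that the first whitespace-separated column of each non-comment tc.data line is the site name. All function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

def pool_handles(sites_xml_text):
    """Collect the handle attribute of every <pool> element.

    Assumption: sites are declared as <pool handle="...">. Any XML
    namespace on the tag is ignored.
    """
    root = ET.fromstring(sites_xml_text)
    return {el.get("handle") for el in root.iter()
            if el.tag.split("}")[-1] == "pool" and el.get("handle")}

def tc_sites(tc_text):
    """Collect the first column of each non-blank, non-comment line.

    Assumption: the first column of tc.data is the site name.
    """
    sites = set()
    for line in tc_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sites.add(line.split()[0])
    return sites

def unmatched_tc_sites(sites_xml_text, tc_text):
    """Site names used in tc.data that no pool in the sites file declares."""
    return sorted(tc_sites(tc_text) - pool_handles(sites_xml_text))
```

With a sites file declaring only pads-remote-pbs-coasters-ssh and a tc.data entry whose first column is pbs, unmatched_tc_sites would report pbs as unmatched, which is the situation that leads to the "not available in your tc.data catalog" error.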
>>>>>>>>>> 
>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>> Hello,
>>>>>>>>>>> 
>>>>>>>>>>> Some context:
>>>>>>>>>>> I am trying to submit a big run on Beagle using swift + coasters.
>>>>>>>>>>> However, a previous run is already underway on beagle. So, there
>>>>>>>>>>> are two difficulties with starting a new run from its login node:
>>>>>>>>>>> 
>>>>>>>>>>> 1. Running another swift from the same jvm will result in chaos
>>>>>>>>>>> in the logs (as far as I know; please correct me if this is no
>>>>>>>>>>> longer the case).
>>>>>>>>>>> 
>>>>>>>>>>> 2. The login node is already under load because of my previous
>>>>>>>>>>> big run, which is still running.
>>>>>>>>>>> 
>>>>>>>>>>> /context
>>>>>>>>>>> 
>>>>>>>>>>> So, I am now trying to submit this big run from a remote host
>>>>>>>>>>> (bridled). I know this has been done on PADS using ssh:pbs and
>>>>>>>>>>> the coaster provider. I tried a similar approach on a trial swift
>>>>>>>>>>> script but am getting an error.
>>>>>>>>>>> 
>>>>>>>>>>> Following is the error message:
>>>>>>>>>>> 
>>>>>>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
>>>>>>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
>>>>>>>>>>> 
>>>>>>>>>>> RunID: 20110428-1002-c8rvqhe6
>>>>>>>>>>> Progress:
>>>>>>>>>>> The application "cat" is not available in your tc.data catalog
>>>>>>>>>>> Caused by: org.globus.cog.karajan.scheduler.NoSuchResourceException
>>>>>>>>>>> Final status: Failed:1
>>>>>>>>>>> The following errors have occurred:
>>>>>>>>>>> 1. The application "cat" is not available in your tc.data catalog
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Attached are my .swift, sites.xml and tc.data files.
>>>>>>>>>>> 
>>>>>>>>>>> Could someone indicate whether what I am doing is doable and, if
>>>>>>>>>>> so, how I can correctly configure my sites and tc setup?
>>>>>>>>>>> 
>>>>>>>>>>> Thanks.
>>>>>>>>>>> Ketan
>>>>>>>>>>> 
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Swift-devel mailing list
>>>>>>>>>>> Swift-devel at ci.uchicago.edu
>>>>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Michael Wilde
>>>>>>>>>> Computation Institute, University of Chicago
>>>>>>>>>> Mathematics and Computer Science Division
>>>>>>>>>> Argonne National Laboratory
>>>>>>>>>> 



