[Swift-devel] ssh:pbs to beagle
Ketan Maheshwari
ketancmaheshwari at gmail.com
Thu Apr 28 14:14:00 CDT 2011
Ok, I am trying a manual coaster setup from bridled (service, swift) to beagle (worker.pl).
--Ketan
On Apr 28, 2011, at 2:11 PM, Michael Wilde wrote:
> As far as I can tell from the swift-devel archives, the only feature for disabling coaster security is the -nosec option of the coaster-service command.
>
> - Mike
>
>
> ----- Original Message -----
>> Now I think you need to create the same proxy on the Beagle side. For
>> starters, just try copying your proxy file from /tmp on communicado to
>> /tmp on the Beagle login node on which you are running Swift. Later
>> you can do this by creating a proxy on the Beagle side using
>> grid-proxy-init, but you'll need to install CA certs there.
>>
>> Also, have you considered running a passive coaster server on the
>> communicado side, and just having Beagle worker.pl scripts connect
>> back to it?
>>
>> - Mike
>>
>> ----- Original Message -----
>>> Ok, I got past CredentialException with grid-proxy-init, now I am
>>> facing this (note: I have turned on provider staging) :
>>>
>>> ========
>>> [ketan@bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
>>>
>>> RunID: 20110428-1332-llaa031f
>>> Progress:
>>> Could not start connection handler
>>> java.io.EOFException
>>> at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61)
>>> at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65)
>>> at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127)
>>> at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147)
>>> at org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177)
>>> at org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30)
>>> at org.globus.cog.karajan.workflow.service.channels.GSSChannel.<init>(GSSChannel.java:47)
>>> at org.globus.cog.karajan.workflow.service.ConnectionHandler.<init>(ConnectionHandler.java:41)
>>> at org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63)
>>> at org.globus.net.BaseServer.run(BaseServer.java:247)
>>> at java.lang.Thread.run(Thread.java:662)
>>> Progress: Submitted:1
>>> Could not start connection handler
>>> java.io.EOFException
>>> at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61)
>>> at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65)
>>> at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127)
>>> at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147)
>>> at org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177)
>>> at org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30)
>>> at org.globus.cog.karajan.workflow.service.channels.GSSChannel.<init>(GSSChannel.java:47)
>>> at org.globus.cog.karajan.workflow.service.ConnectionHandler.<init>(ConnectionHandler.java:41)
>>> at org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63)
>>> at org.globus.net.BaseServer.run(BaseServer.java:247)
>>> at java.lang.Thread.run(Thread.java:662)
>>> Progress: Submitted:1
>>> Exception in cat:
>>> Arguments: [data.txt]
>>> Host: beagle-remote-pbs-coasters-ssh
>>> Directory: catsn-20110428-1332-llaa031f/jobs/b/cat-bxal1d9kTODO: outs
>>> ----
>>>
>>> Caused by: Could not submit job
>>> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job
>>> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not start coaster service
>>> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Task ended before registration was received.
>>> STDOUT:
>>> STDERR:
>>> Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 1
>>> Final status: Failed:1
>>> The following errors have occurred:
>>> 1. Job failed with an exit code of 1
>>>
>>> ========
>>>
>>>
>>> From bridled to communicado, I see the following error:
>>>
>>> **************
>>> [ketan@bridled catsn.works]$ swift -config cf -tc.file tc -sites.file coaster-local-ssh-communicado.xml catsn.swift -n=1
>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
>>>
>>> RunID: 20110428-1335-k685b2ye
>>> Progress:
>>> Progress: Submitted:1
>>> Progress: Active:1
>>> Exception in cat:
>>> Arguments: [data.txt]
>>> Host: communicado-ssh
>>> Directory: catsn-20110428-1335-k685b2ye/jobs/c/cat-coip1d9kTODO: outs
>>> ----
>>>
>>> Caused by: Job failed with an exit code of 524
>>> Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 524
>>> Final status: Failed:1
>>> The following errors have occurred:
>>> 1. Job failed with an exit code of 524
>>>
>>> ************
>>>
>>>
>>> --
>>> Ketan
>>>
>>>
>>>
>>>
>>> On Apr 28, 2011, at 1:03 PM, Michael Wilde wrote:
>>>
>>>> For now, create a proxy using grid-proxy-init on the Swift
>>>> execution machine.
>>>> I think there is an option to set "no security" for this config,
>>>> but I can't recall where it is specified. Maybe swift.properties.
>>>>
>>>> - Mike
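
The proxy file named in the error quoted below (/tmp/x509up_u2006) follows the usual Globus convention of /tmp/x509up_u<uid>, which the X509_USER_PROXY environment variable can override. A minimal sketch for checking which path applies and whether the file exists before launching swift; the function name `proxy_path` is hypothetical and not part of Swift or CoG:

```python
import os


def proxy_path(uid=None, env=None):
    """Return the proxy file path a Globus client would normally look for.

    Assumption: X509_USER_PROXY, when set, takes precedence over the
    default /tmp/x509up_u<uid> location.
    """
    env = os.environ if env is None else env
    override = env.get("X509_USER_PROXY")
    if override:
        return override
    if uid is None:
        uid = os.getuid()  # POSIX only
    return f"/tmp/x509up_u{uid}"


if __name__ == "__main__":
    path = proxy_path()
    state = "found" if os.path.isfile(path) else "missing (run grid-proxy-init)"
    print(f"{path}: {state}")
```

Run on the machine where swift is started; a "missing" result corresponds to the JGLOBUS-5 error.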
>>>>
>>>> ----- Original Message -----
>>>>> Hi,
>>>>>
>>>>> It looks better now. However, I am getting the following:
>>>>>
>>>>> =====
>>>>>
>>>>> [ketan@bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
>>>>>
>>>>> RunID: 20110428-1251-oi9theh8
>>>>> Progress:
>>>>> Progress: Stage in:1
>>>>> Could not submit job
>>>>> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job
>>>>> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not start coaster service
>>>>> Caused by: org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file (/tmp/x509up_u2006) not found.
>>>>> Caused by: org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file (/tmp/x509up_u2006) not found.
>>>>> Failed to transfer wrapper log from catsn-20110428-1251-oi9theh8/info/e on beagle-remote-pbs-coasters-ssh
>>>>>
>>>>> =====
>>>>>
>>>>> How do I specify "-nosec" on automatic coasters?
>>>>>
>>>>> Ketan
>>>>>
>>>>> On Apr 28, 2011, at 12:00 PM, Michael Wilde wrote:
>>>>>
>>>>>> OK. Was there a cookbook on the ssh settings? Did you set up a
>>>>>> $HOME/.ssh/auth.defaults per the user guide?
>>>>>>
>>>>>> Here is an auth.defaults example. I'm not sure it's 100% correct,
>>>>>> but it could serve as a base for you:
>>>>>>
>>>>>> xlogin1.pads.ci.uchicago.edu.type=password
>>>>>> xlogin1.pads.ci.uchicago.edu.username=wilde
>>>>>>
>>>>>> login.pads.ci.uchicago.edu.type=key
>>>>>> login.pads.ci.uchicago.edu.username=wilde
>>>>>> login.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa
>>>>>> # MAKE SURE mode=600!!!
>>>>>> login.pads.ci.uchicago.edu.passphrase=yourpassphrasehere
>>>>>>
>>>>>> login1.pads.ci.uchicago.edu.type=key
>>>>>> login1.pads.ci.uchicago.edu.username=wilde
>>>>>> login1.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa
>>>>>> # MAKE SURE mode=600!!!
>>>>>> login1.pads.ci.uchicago.edu.passphrase=yourpassphrasehere
>>>>>>
>>>>>> login.mcs.anl.gov.type=key
>>>>>> login.mcs.anl.gov.username=wilde
>>>>>> login.mcs.anl.gov.key=/home/wilde/.ssh/swift_rsa
>>>>>> # MAKE SURE mode=600!!!
>>>>>> login.mcs.anl.gov.passphrase=yourpassphrasehere
>>>>>>
>>>>>> - Mike
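
The "mode=600" reminder in the example above can be checked mechanically. A minimal sketch, assuming a POSIX file system; the helper name `is_private` is hypothetical, and the path would be $HOME/.ssh/auth.defaults or the key file it points to:

```python
import os
import stat
import tempfile


def is_private(path):
    """True if neither group nor others have any permission bits set."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


if __name__ == "__main__":
    # Demonstrate on a throwaway file rather than a real credentials file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        demo = tmp.name
    os.chmod(demo, 0o644)
    print(is_private(demo))   # group/other can read
    os.chmod(demo, 0o600)     # the fix the comment asks for
    print(is_private(demo))
    os.remove(demo)
```

On a real setup the equivalent fix is `os.chmod(path, 0o600)` or `chmod 600` from the shell.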
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> It does look like an ssh problem. I am getting the same stderr and
>>>>>>> log messages when trying to communicate from Bridled to Communicado.
>>>>>>>
>>>>>>> Ketan
>>>>>>>
>>>>>>> On Apr 28, 2011, at 11:19 AM, Michael Wilde wrote:
>>>>>>>
>>>>>>>> Have you already run a simple hello-world Swift test from
>>>>>>>> communicado to bridled to make sure you have ssh configured
>>>>>>>> correctly? I would do that first.
>>>>>>>>
>>>>>>>> I'm not sure whether an ssh problem explains what you show below.
>>>>>>>>
>>>>>>>> - Mike
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>>> Thanks, I made the change. However, I am now getting the
>>>>>>>>> following on my stderr:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ===========
>>>>>>>>> [ketan@bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
>>>>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
>>>>>>>>>
>>>>>>>>> RunID: 20110428-1022-n9s0k0e0
>>>>>>>>> Progress:
>>>>>>>>> [ketan]
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> [ketan] Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> ========
>>>>>>>>>
>>>>>>>>> And from the log it seems some network transmission has
>>>>>>>>> failed:
>>>>>>>>>
>>>>>>>>> 2011-04-28 10:22:45,261-0500 INFO TransportProtocolCommon Sending SSH_MSG_SERVICE_REQUEST
>>>>>>>>> 2011-04-28 10:22:45,264-0500 INFO TransportProtocolCommon Received SSH_MSG_SERVICE_ACCEPT
>>>>>>>>> 2011-04-28 10:24:27,626-0500 INFO TransportProtocolCommon The Transport Protocol thread failed
>>>>>>>>> java.io.IOException: The socket is EOF
>>>>>>>>> at com.sshtools.j2ssh.transport.TransportProtocolInputStream.readBufferedData(TransportProtocolInputStream.java:183)
>>>>>>>>> at com.sshtools.j2ssh.transport.TransportProtocolInputStream.readMessage(TransportProtocolInputStream.java:226)
>>>>>>>>> at com.sshtools.j2ssh.transport.TransportProtocolCommon.processMessages(TransportProtocolCommon.java:1440)
>>>>>>>>> at com.sshtools.j2ssh.transport.TransportProtocolCommon.startBinaryPacketProtocol(TransportProtocolCommon.java:1034)
>>>>>>>>> at com.sshtools.j2ssh.transport.TransportProtocolCommon.run(TransportProtocolCommon.java:393)
>>>>>>>>> at java.lang.Thread.run(Thread.java:662)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Any clues?
>>>>>>>>> Ketan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Apr 28, 2011, at 10:20 AM, Michael Wilde wrote:
>>>>>>>>>
>>>>>>>>>> The pool name in your sites file is pads-remote-pbs-coasters-ssh,
>>>>>>>>>> but you used pbs in your tc.data.
>>>>>>>>>>
>>>>>>>>>> - Mike
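
A mismatch like the one described above (a site name in tc.data that matches no pool in the sites file) can be caught before running swift. A minimal sketch under stated assumptions: the sites file declares each site as a `<pool handle="...">` element, and tc.data is whitespace-separated with the site name in the first column and `#` starting a comment line. The function names and the sample file contents are hypothetical, not taken from the attached files:

```python
import xml.etree.ElementTree as ET


def pool_handles(sites_xml):
    """Collect the handle attribute of every <pool> element (namespace-agnostic)."""
    root = ET.fromstring(sites_xml)
    return {
        el.get("handle")
        for el in root.iter()
        if el.tag.split("}")[-1] == "pool" and el.get("handle")
    }


def tc_sites(tc_data):
    """Collect the first column (site name) of every non-comment tc.data line."""
    sites = set()
    for line in tc_data.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            sites.add(line.split()[0])
    return sites


def unmatched_tc_sites(sites_xml, tc_data):
    """Site names used in tc.data that have no matching pool handle."""
    return sorted(tc_sites(tc_data) - pool_handles(sites_xml))


if __name__ == "__main__":
    # Illustrative contents only.
    sites_xml = '<config><pool handle="pads-remote-pbs-coasters-ssh"/></config>'
    tc_data = "pbs cat /bin/cat INSTALLED INTEL32::LINUX null\n"
    print(unmatched_tc_sites(sites_xml, tc_data))
```

Any name this reports would produce the "not available in your tc.data catalog" error for applications mapped only to that site.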
>>>>>>>>>>
>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> Some context:
>>>>>>>>>>> I am trying to submit a big run on Beagle using swift + coasters.
>>>>>>>>>>> However, a previous run is already underway on beagle. So, there
>>>>>>>>>>> are two difficulties with starting a new run from its login node:
>>>>>>>>>>>
>>>>>>>>>>> 1. Running another swift from the same jvm will result in chaos
>>>>>>>>>>> in the logs (as far as I know; please correct me if this is no
>>>>>>>>>>> longer the case).
>>>>>>>>>>>
>>>>>>>>>>> 2. The login node is already under load because of my previous
>>>>>>>>>>> big run, which is still running.
>>>>>>>>>>>
>>>>>>>>>>> /context
>>>>>>>>>>>
>>>>>>>>>>> So, I am now trying to submit this big run from a remote host
>>>>>>>>>>> (bridled). I know this has been done on PADS using ssh:pbs with
>>>>>>>>>>> the coaster provider. I tried a similar approach on a trial swift
>>>>>>>>>>> script but am getting an error.
>>>>>>>>>>>
>>>>>>>>>>> Following is the error message:
>>>>>>>>>>>
>>>>>>>>>>> [ketan@bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
>>>>>>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
>>>>>>>>>>>
>>>>>>>>>>> RunID: 20110428-1002-c8rvqhe6
>>>>>>>>>>> Progress:
>>>>>>>>>>> The application "cat" is not available in your tc.data catalog
>>>>>>>>>>> Caused by: org.globus.cog.karajan.scheduler.NoSuchResourceException
>>>>>>>>>>> Final status: Failed:1
>>>>>>>>>>> The following errors have occurred:
>>>>>>>>>>> 1. The application "cat" is not available in your tc.data catalog
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Attached are my .swift, sites.xml and tc.data files.
>>>>>>>>>>>
>>>>>>>>>>> Could someone indicate whether what I am doing is doable and, if
>>>>>>>>>>> so, how I can correctly configure my sites and tc setup?
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>> Ketan
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Swift-devel mailing list
>>>>>>>>>>> Swift-devel at ci.uchicago.edu
>>>>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Michael Wilde
>>>>>>>>>> Computation Institute, University of Chicago
>>>>>>>>>> Mathematics and Computer Science Division
>>>>>>>>>> Argonne National Laboratory
>>>>>>>>>>