[Swift-devel] ssh:pbs to beagle

Michael Wilde wilde at mcs.anl.gov
Thu Apr 28 14:11:56 CDT 2011


As far as I can tell from the swift-devel archives, the only feature for disabling coaster security is the -nosec option of the coaster-service command.

- Mike


----- Original Message -----
> Now I think you need to create the same proxy on the Beagle side. For
> starters, just try copying your proxy file from /tmp on communicado to
> /tmp on the Beagle login node on which you are running Swift. Later
> you can do this by creating a proxy on the Beagle side using
> grid-proxy-init, but you'll need to install CA certs there.
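The JGLOBUS-5 error quoted later in this thread names the file the client looks for, /tmp/x509up_u2006. The Python sketch below assumes the usual Globus convention ($X509_USER_PROXY if it is set, otherwise /tmp/x509up_u<uid>) and checks that the proxy exists and is not accessible by group or others, since a proxy is normally kept at mode 600. If your numeric uid differs between communicado and the Beagle login node, a copied proxy would presumably need to be renamed to match the Beagle uid, or X509_USER_PROXY pointed at it. The function names are hypothetical helpers, not part of Swift or CoG.

```python
import os
import stat


def default_proxy_path(env=None, uid=None):
    """Return the path a Globus client would normally read the proxy from.

    Assumption: $X509_USER_PROXY wins if set; otherwise /tmp/x509up_u<uid>.
    """
    env = os.environ if env is None else env
    if env.get("X509_USER_PROXY"):
        return env["X509_USER_PROXY"]
    uid = os.getuid() if uid is None else uid
    return f"/tmp/x509up_u{uid}"


def check_proxy(path):
    """Return a list of problems with the proxy file (empty list means OK)."""
    if not os.path.exists(path):
        # Mirrors the JGLOBUS-5 "Proxy file (...) not found." message.
        return [f"proxy file ({path}) not found"]
    problems = []
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        problems.append(f"{path} has mode {oct(mode)}; expected 0o600")
    return problems


if __name__ == "__main__":
    path = default_proxy_path()
    problems = check_proxy(path)
    print("\n".join(problems) if problems else f"ok: {path}")
```

Running it on both the Swift host and the Beagle login node shows quickly whether each side has a proxy where the client expects it.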
> 
> Also, have you considered running a passive coaster server on the
> communicado side, and just having Beagle worker.pl scripts connect
> back to it?
> 
> - Mike
> 
> ----- Original Message -----
> > OK, I got past the CredentialException with grid-proxy-init; now I am
> > facing this (note: I have turned on provider staging):
> >
> > ========
> > [ketan@bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
> > Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
> >
> > RunID: 20110428-1332-llaa031f
> > Progress:
> > Could not start connection handler
> > java.io.EOFException
> > at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61)
> > at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65)
> > at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127)
> > at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147)
> > at org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177)
> > at org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30)
> > at org.globus.cog.karajan.workflow.service.channels.GSSChannel.<init>(GSSChannel.java:47)
> > at org.globus.cog.karajan.workflow.service.ConnectionHandler.<init>(ConnectionHandler.java:41)
> > at org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63)
> > at org.globus.net.BaseServer.run(BaseServer.java:247)
> > at java.lang.Thread.run(Thread.java:662)
> > Progress: Submitted:1
> > Could not start connection handler
> > java.io.EOFException
> > at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61)
> > at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65)
> > at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127)
> > at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147)
> > at org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177)
> > at org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30)
> > at org.globus.cog.karajan.workflow.service.channels.GSSChannel.<init>(GSSChannel.java:47)
> > at org.globus.cog.karajan.workflow.service.ConnectionHandler.<init>(ConnectionHandler.java:41)
> > at org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63)
> > at org.globus.net.BaseServer.run(BaseServer.java:247)
> > at java.lang.Thread.run(Thread.java:662)
> > Progress: Submitted:1
> > Exception in cat:
> > Arguments: [data.txt]
> > Host: beagle-remote-pbs-coasters-ssh
> > Directory: catsn-20110428-1332-llaa031f/jobs/b/cat-bxal1d9k
> > TODO: outs
> > ----
> >
> > Caused by: Could not submit job
> > Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job
> > Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not start coaster service
> > Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Task ended before registration was received.
> > STDOUT:
> > STDERR:
> > Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 1
> > Final status: Failed:1
> > The following errors have occurred:
> > 1. Job failed with an exit code of 1
> >
> > ========
> >
> >
> > From bridled to communicado, I see the following error:
> >
> > **************
> > [ketan@bridled catsn.works]$ swift -config cf -tc.file tc -sites.file coaster-local-ssh-communicado.xml catsn.swift -n=1
> > Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
> >
> > RunID: 20110428-1335-k685b2ye
> > Progress:
> > Progress: Submitted:1
> > Progress: Active:1
> > Exception in cat:
> > Arguments: [data.txt]
> > Host: communicado-ssh
> > Directory: catsn-20110428-1335-k685b2ye/jobs/c/cat-coip1d9k
> > TODO: outs
> > ----
> >
> > Caused by: Job failed with an exit code of 524
> > Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 524
> > Final status: Failed:1
> > The following errors have occurred:
> > 1. Job failed with an exit code of 524
> >
> > ************
> >
> >
> > --
> > Ketan
> >
> >
> >
> >
> > On Apr 28, 2011, at 1:03 PM, Michael Wilde wrote:
> >
> > > For now, create a proxy using grid-proxy-init on the Swift
> > > execution machine. I think there is an option to set "no security"
> > > for this config, but I can't recall where that is specified. Maybe
> > > swift.properties, but I'm not sure.
> > >
> > > - Mike
> > >
> > > ----- Original Message -----
> > >> Hi,
> > >>
> > >> It looks better now. However, I am getting the following:
> > >>
> > >> =====
> > >>
> > >> [ketan@bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
> > >> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
> > >>
> > >> RunID: 20110428-1251-oi9theh8
> > >> Progress:
> > >> Progress: Stage in:1
> > >> Could not submit job
> > >> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job
> > >> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not start coaster service
> > >> Caused by: org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file (/tmp/x509up_u2006) not found.
> > >> Caused by: org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file (/tmp/x509up_u2006) not found.
> > >> Failed to transfer wrapper log from catsn-20110428-1251-oi9theh8/info/e on beagle-remote-pbs-coasters-ssh
> > >>
> > >> =====
> > >>
> > >> How do I specify "-nosec" on automatic coasters?
> > >>
> > >> Ketan
> > >>
> > >> On Apr 28, 2011, at 12:00 PM, Michael Wilde wrote:
> > >>
> > >>> OK. Was there a cookbook on the ssh settings? Did you set up a
> > >>> $HOME/.ssh/auth.defaults per the user guide?
> > >>>
> > >>> Here is an auth.defaults example. I'm not sure it's 100% correct,
> > >>> but it could serve as a base for you:
> > >>>
> > >>> xlogin1.pads.ci.uchicago.edu.type=password
> > >>> xlogin1.pads.ci.uchicago.edu.username=wilde
> > >>>
> > >>> login.pads.ci.uchicago.edu.type=key
> > >>> login.pads.ci.uchicago.edu.username=wilde
> > >>> login.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa
> > >>> # MAKE SURE mode=600!!!
> > >>> login.pads.ci.uchicago.edu.passphrase=yourpassphrasehere
> > >>>
> > >>> login1.pads.ci.uchicago.edu.type=key
> > >>> login1.pads.ci.uchicago.edu.username=wilde
> > >>> login1.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa
> > >>> # MAKE SURE mode=600!!!
> > >>> login1.pads.ci.uchicago.edu.passphrase=yourpassphrasehere
> > >>>
> > >>> login.mcs.anl.gov.type=key
> > >>> login.mcs.anl.gov.username=wilde
> > >>> login.mcs.anl.gov.key=/home/wilde/.ssh/swift_rsa
> > >>> # MAKE SURE mode=600!!!
> > >>> login.mcs.anl.gov.passphrase=yourpassphrasehere
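The auth.defaults example is a flat list of <host>.<field>=<value> entries. The Python sketch below groups them by host and flags a few easy mistakes. It assumes (not verified against the CoG source) that the file is read with Java properties semantics, where '#' starts a comment only at the beginning of a line; under that assumption a trailing "# MAKE SURE ..." on a passphrase line would become part of the passphrase, and a wrapped fragment such as "mode=600!!!" would be read as a property. The parser only handles '=' separators, the type check covers just the two types used in the example (key and password), and the helper names are hypothetical.

```python
import os
import stat
from collections import defaultdict


def parse_auth_defaults(text):
    """Group '<host>.<field>=<value>' lines into {host: {field: value}}.

    Simplified Java-properties reading (an assumption): '#' or '!' starts a
    comment only at the beginning of a line, only '=' separates key and
    value, and the value is the rest of the line, stripped.
    """
    hosts = defaultdict(dict)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # no '=': ignored by this simplified parser
        host, dot, field = key.strip().rpartition(".")
        if dot:
            hosts[host][field] = value.strip()
    return dict(hosts)


def lint_auth_defaults(hosts, path=None):
    """Return warnings for a few common auth.defaults mistakes."""
    warnings = []
    for host, fields in sorted(hosts.items()):
        kind = fields.get("type")
        if kind not in ("key", "password"):
            warnings.append(f"{host}: type should be key or password, got {kind!r}")
        if kind == "key" and "key" not in fields:
            warnings.append(f"{host}: type=key but no {host}.key path")
        if "#" in fields.get("passphrase", ""):
            warnings.append(f"{host}: passphrase contains '#' (inline comment?)")
    if path is not None and stat.S_IMODE(os.stat(path).st_mode) & 0o077:
        warnings.append(f"{path}: accessible by group/others; use chmod 600")
    return warnings


if __name__ == "__main__":
    path = os.path.expanduser("~/.ssh/auth.defaults")
    if os.path.exists(path):
        with open(path) as f:
            for warning in lint_auth_defaults(parse_auth_defaults(f.read()), path):
                print(warning)
```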
> > >>>
> > >>> - Mike
> > >>>
> > >>> ----- Original Message -----
> > >>>> It does look like an ssh problem. I am getting the same stderr and
> > >>>> log messages when trying to communicate from Bridled to Communicado.
> > >>>>
> > >>>> Ketan
> > >>>>
> > >>>> On Apr 28, 2011, at 11:19 AM, Michael Wilde wrote:
> > >>>>
> > >>>>> Have you already run a simple hello-world Swift test from
> > >>>>> communicado to bridled to make sure you have ssh configured
> > >>>>> correctly? I would do that first.
> > >>>>>
> > >>>>> I'm not sure whether an ssh problem explains what you show below.
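A quicker, lower-level check than a full hello-world run (swapped in here, not what was suggested above) is whether the remote sshd answers at all: per RFC 4253, section 4.2, an SSH server sends an identification line beginning with "SSH-" as soon as the TCP connection opens. The Python sketch below reads that line. It only confirms reachability and says nothing about whether key or password authentication will succeed; the function name and defaults are illustrative.

```python
import socket


def read_ssh_banner(host, port=22, timeout=10.0, max_lines=20):
    """Return the server's SSH identification line, or None on early EOF.

    RFC 4253 (section 4.2): the server sends 'SSH-protoversion-softwareversion'
    terminated by CR LF, and may send other lines before it. Only
    reachability is tested here, not authentication.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock, \
            sock.makefile("rb") as stream:
        for _ in range(max_lines):
            line = stream.readline(256)
            if not line:
                return None  # peer closed the connection before identifying
            text = line.decode("ascii", "replace").rstrip("\r\n")
            if text.startswith("SSH-"):
                return text
    return None
```

For example, calling read_ssh_banner("communicado") from bridled (assuming that short name resolves there) should return a string such as "SSH-2.0-..." if the server is reachable; None or a timeout points at a network problem rather than at Swift's configuration.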
> > >>>>>
> > >>>>> - Mike
> > >>>>>
> > >>>>> ----- Original Message -----
> > >>>>>> Thanks, I made the change. However, I am now getting the following
> > >>>>>> on my stderr:
> > >>>>>>
> > >>>>>>
> > >>>>>> ===========
> > >>>>>> [ketan@bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
> > >>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
> > >>>>>>
> > >>>>>> RunID: 20110428-1022-n9s0k0e0
> > >>>>>> Progress:
> > >>>>>> [ketan]
> > >>>>>> Progress: Initializing site shared directory:1
> > >>>>>> [ketan] Progress: Initializing site shared directory:1
> > >>>>>> Progress: Initializing site shared directory:1
> > >>>>>> Progress: Initializing site shared directory:1
> > >>>>>> Progress: Initializing site shared directory:1
> > >>>>>> Progress: Initializing site shared directory:1
> > >>>>>> Progress: Initializing site shared directory:1
> > >>>>>> Progress: Initializing site shared directory:1
> > >>>>>> Progress: Initializing site shared directory:1
> > >>>>>> Progress: Initializing site shared directory:1
> > >>>>>> Progress: Initializing site shared directory:1
> > >>>>>> Progress: Initializing site shared directory:1
> > >>>>>> Progress: Initializing site shared directory:1
> > >>>>>> Progress: Initializing site shared directory:1
> > >>>>>> ========
> > >>>>>>
> > >>>>>> And from the log it seems some network transmission has failed:
> > >>>>>>
> > >>>>>> 2011-04-28 10:22:45,261-0500 INFO TransportProtocolCommon Sending SSH_MSG_SERVICE_REQUEST
> > >>>>>> 2011-04-28 10:22:45,264-0500 INFO TransportProtocolCommon Received SSH_MSG_SERVICE_ACCEPT
> > >>>>>> 2011-04-28 10:24:27,626-0500 INFO TransportProtocolCommon The Transport Protocol thread failed
> > >>>>>> java.io.IOException: The socket is EOF
> > >>>>>> at com.sshtools.j2ssh.transport.TransportProtocolInputStream.readBufferedData(TransportProtocolInputStream.java:183)
> > >>>>>> at com.sshtools.j2ssh.transport.TransportProtocolInputStream.readMessage(TransportProtocolInputStream.java:226)
> > >>>>>> at com.sshtools.j2ssh.transport.TransportProtocolCommon.processMessages(TransportProtocolCommon.java:1440)
> > >>>>>> at com.sshtools.j2ssh.transport.TransportProtocolCommon.startBinaryPacketProtocol(TransportProtocolCommon.java:1034)
> > >>>>>> at com.sshtools.j2ssh.transport.TransportProtocolCommon.run(TransportProtocolCommon.java:393)
> > >>>>>> at java.lang.Thread.run(Thread.java:662)
> > >>>>>>
> > >>>>>>
> > >>>>>> Any clues?
> > >>>>>> Ketan
> > >>>>>>
> > >>>>>>
> > >>>>>> On Apr 28, 2011, at 10:20 AM, Michael Wilde wrote:
> > >>>>>>
> > >>>>>>> The pool name in your sites file is pads-remote-pbs-coasters-ssh,
> > >>>>>>> but you used pbs in your tc.data.
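A mismatch like this can be checked mechanically: each site name in the first column of tc.data should match a pool handle in sites.xml, which is consistent with the "not available in your tc.data catalog" error quoted below. The Python sketch below assumes the usual layout of both files (one <pool handle="..."> element per site, possibly under a namespaced <config> root, and whitespace-separated tc.data columns that begin with site and transformation name, with '#' comment lines) and lists tc.data entries whose site has no matching pool. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def pool_handles(sites_xml):
    """Return the set of pool handle names defined in a sites.xml string."""
    root = ET.fromstring(sites_xml)
    # Compare local tag names so a default xmlns on <config> does not matter.
    return {el.get("handle") for el in root.iter()
            if el.tag.rsplit("}", 1)[-1] == "pool"}


def tc_entries(tc_text):
    """Return {(site, transformation)} pairs from tc.data text.

    Assumed layout: whitespace-separated columns starting with site and
    transformation name; lines beginning with '#' are comments.
    """
    entries = set()
    for line in tc_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and not fields[0].startswith("#"):
            entries.add((fields[0], fields[1]))
    return entries


def unmatched_tc_entries(sites_xml, tc_text):
    """Sorted tc.data entries whose site column names no pool in sites.xml."""
    handles = pool_handles(sites_xml)
    return sorted(e for e in tc_entries(tc_text) if e[0] not in handles)


if __name__ == "__main__":
    sites = '<config><pool handle="pads-remote-pbs-coasters-ssh"/></config>'
    tc = "pbs  cat  /bin/cat  INSTALLED  INTEL32::LINUX  null\n"
    print(unmatched_tc_entries(sites, tc))  # [('pbs', 'cat')]
```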
> > >>>>>>>
> > >>>>>>> - Mike
> > >>>>>>>
> > >>>>>>> ----- Original Message -----
> > >>>>>>>> Hello,
> > >>>>>>>>
> > >>>>>>>> Some context:
> > >>>>>>>> I am trying to submit a big run on Beagle using Swift + coasters.
> > >>>>>>>> However, a previous run is already underway on Beagle, so there
> > >>>>>>>> are two difficulties with starting a new run from its login node:
> > >>>>>>>>
> > >>>>>>>> 1. Running another Swift from the same JVM will result in chaos
> > >>>>>>>> in the logs (as far as I know; please correct me if this is no
> > >>>>>>>> longer the case).
> > >>>>>>>>
> > >>>>>>>> 2. The login node is already under load from my previous big run,
> > >>>>>>>> which is still running.
> > >>>>>>>>
> > >>>>>>>> /context
> > >>>>>>>>
> > >>>>>>>> So, I am now trying to submit this big run from a remote host
> > >>>>>>>> (bridled). I know this has been done on PADS using ssh:pbs with
> > >>>>>>>> provider coaster. I tried a similar approach on a trial Swift
> > >>>>>>>> script but am getting an error.
> > >>>>>>>>
> > >>>>>>>> Following is the error message:
> > >>>>>>>>
> > >>>>>>>> [ketan@bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
> > >>>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
> > >>>>>>>>
> > >>>>>>>> RunID: 20110428-1002-c8rvqhe6
> > >>>>>>>> Progress:
> > >>>>>>>> The application "cat" is not available in your tc.data catalog
> > >>>>>>>> Caused by: org.globus.cog.karajan.scheduler.NoSuchResourceException
> > >>>>>>>> Final status: Failed:1
> > >>>>>>>> The following errors have occurred:
> > >>>>>>>> 1. The application "cat" is not available in your tc.data catalog
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Attached are my .swift, sites.xml and tc.data files.
> > >>>>>>>>
> > >>>>>>>> Could someone tell me whether what I am doing is doable and, if
> > >>>>>>>> so, how I can correctly configure my sites and tc setup?
> > >>>>>>>>
> > >>>>>>>> Thanks.
> > >>>>>>>> Ketan
> > >>>>>>>>
> > >>>>>>>> _______________________________________________
> > >>>>>>>> Swift-devel mailing list
> > >>>>>>>> Swift-devel at ci.uchicago.edu
> > >>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > >>>>>>>
> > >>>>>>> --
> > >>>>>>> Michael Wilde
> > >>>>>>> Computation Institute, University of Chicago
> > >>>>>>> Mathematics and Computer Science Division
> > >>>>>>> Argonne National Laboratory
> > >>>>>>>
> > >>>>>
> > >>>>>
> > >>>
> > >>>
> > >
> > >
> 

