[Swift-devel] ssh:pbs to beagle

Michael Wilde wilde at mcs.anl.gov
Thu Apr 28 13:46:41 CDT 2011


Now I think you need to create the same proxy on the Beagle side. For starters, just try copying your proxy file from /tmp on communicado to /tmp on the Beagle login node on which you are running Swift. Later you can create the proxy directly on the Beagle side using grid-proxy-init, but you'll need to install CA certs there first.
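
A minimal sketch of that copy step, in Python, assuming the proxy sits at the default Globus location (/tmp/x509up_u<uid>, or wherever X509_USER_PROXY points) and that plain scp works between the two hosts. The host name "beagle-login" and the remote uid are placeholders, not real values, and the helper names are made up for illustration. Note that the file name on the Beagle side has to carry your uid there, which may differ from your uid on the submit host.

```python
import os
import shlex


def proxy_path(uid=None, env=None):
    """Return the proxy file location: $X509_USER_PROXY if set,
    otherwise the default /tmp/x509up_u<uid>."""
    env = os.environ if env is None else env
    if env.get("X509_USER_PROXY"):
        return env["X509_USER_PROXY"]
    uid = os.getuid() if uid is None else uid
    return f"/tmp/x509up_u{uid}"


def scp_command(local_proxy, remote_host, remote_uid):
    """Build (but do not run) an scp command that copies the proxy to
    /tmp on the remote host, named after the uid used on that host.
    -p asks scp to preserve the file mode of the proxy."""
    remote = f"{remote_host}:/tmp/x509up_u{remote_uid}"
    return ["scp", "-p", local_proxy, remote]


if __name__ == "__main__":
    # Placeholders: replace with the real login node and your uid there.
    REMOTE_HOST = "beagle-login"
    REMOTE_UID = 2006
    cmd = scp_command(proxy_path(), REMOTE_HOST, REMOTE_UID)
    # Print the command for review instead of executing it.
    print(shlex.join(cmd))
```

The script only prints the command so you can check the paths before running it by hand. The -p flag is there because proxy files are expected to be readable only by their owner, so the mode should survive the copy.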

Also, have you considered running a passive coaster server on the communicado side, and just having Beagle worker.pl scripts connect back to it?

- Mike

----- Original Message -----
> Ok, I got past CredentialException with grid-proxy-init, now I am
> facing this (note: I have turned on provider staging) :
> 
> ========
> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
> 
> RunID: 20110428-1332-llaa031f
> Progress:
> Could not start connection handler
> java.io.EOFException
> at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61)
> at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65)
> at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127)
> at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147)
> at org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177)
> at org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30)
> at org.globus.cog.karajan.workflow.service.channels.GSSChannel.<init>(GSSChannel.java:47)
> at org.globus.cog.karajan.workflow.service.ConnectionHandler.<init>(ConnectionHandler.java:41)
> at org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63)
> at org.globus.net.BaseServer.run(BaseServer.java:247)
> at java.lang.Thread.run(Thread.java:662)
> Progress: Submitted:1
> Could not start connection handler
> java.io.EOFException
> at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61)
> at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65)
> at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127)
> at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147)
> at org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177)
> at org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30)
> at org.globus.cog.karajan.workflow.service.channels.GSSChannel.<init>(GSSChannel.java:47)
> at org.globus.cog.karajan.workflow.service.ConnectionHandler.<init>(ConnectionHandler.java:41)
> at org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63)
> at org.globus.net.BaseServer.run(BaseServer.java:247)
> at java.lang.Thread.run(Thread.java:662)
> Progress: Submitted:1
> Exception in cat:
> Arguments: [data.txt]
> Host: beagle-remote-pbs-coasters-ssh
> Directory: catsn-20110428-1332-llaa031f/jobs/b/cat-bxal1d9kTODO: outs
> ----
> 
> Caused by: Could not submit job
> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job
> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not start coaster service
> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Task ended before registration was received.
> STDOUT:
> STDERR:
> Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 1
> Final status: Failed:1
> The following errors have occurred:
> 1. Job failed with an exit code of 1
> 
> ========
> 
> 
> From bridled to communicado, I see the following error:
> 
> **************
> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file coaster-local-ssh-communicado.xml catsn.swift -n=1
> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
> 
> RunID: 20110428-1335-k685b2ye
> Progress:
> Progress: Submitted:1
> Progress: Active:1
> Exception in cat:
> Arguments: [data.txt]
> Host: communicado-ssh
> Directory: catsn-20110428-1335-k685b2ye/jobs/c/cat-coip1d9kTODO: outs
> ----
> 
> Caused by: Job failed with an exit code of 524
> Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 524
> Final status: Failed:1
> The following errors have occurred:
> 1. Job failed with an exit code of 524
> 
> ************
> 
> 
> --
> Ketan
> 
> 
> 
> 
> On Apr 28, 2011, at 1:03 PM, Michael Wilde wrote:
> 
> > For now, create a proxy using grid-proxy-init on the Swift
> > execution machine.
> > I think there is an option to set "no security" for this config, but
> > I can't recall where that is specified. Maybe swift.properties.
> >
> > - Mike
> >
> > ----- Original Message -----
> >> Hi,
> >>
> >> It looks better now. However, I am getting the following:
> >>
> >> =====
> >>
> >> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
> >> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
> >>
> >> RunID: 20110428-1251-oi9theh8
> >> Progress:
> >> Progress: Stage in:1
> >> Could not submit job
> >> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job
> >> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not start coaster service
> >> Caused by: org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file (/tmp/x509up_u2006) not found.
> >> Caused by: org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file (/tmp/x509up_u2006) not found.
> >> Failed to transfer wrapper log from catsn-20110428-1251-oi9theh8/info/e on beagle-remote-pbs-coasters-ssh
> >>
> >> =====
> >>
> >> How do I specify "-nosec" on automatic coasters?
> >>
> >> Ketan
> >>
> >> On Apr 28, 2011, at 12:00 PM, Michael Wilde wrote:
> >>
> >>> OK. Was there a cookbook on the ssh settings? Did you set up a
> >>> $HOME/.ssh/auth.defaults per the user guide?
> >>>
> >>> Here is an auth.defaults example. I'm not sure it's 100% correct,
> >>> but it could serve as a base for you:
> >>>
> >>> xlogin1.pads.ci.uchicago.edu.type=password
> >>> xlogin1.pads.ci.uchicago.edu.username=wilde
> >>>
> >>> login.pads.ci.uchicago.edu.type=key
> >>> login.pads.ci.uchicago.edu.username=wilde
> >>> login.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa
> >>> # MAKE SURE mode=600!!!
> >>> login.pads.ci.uchicago.edu.passphrase=yourpassphrasehere
> >>>
> >>> login1.pads.ci.uchicago.edu.type=key
> >>> login1.pads.ci.uchicago.edu.username=wilde
> >>> login1.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa
> >>> # MAKE SURE mode=600!!!
> >>> login1.pads.ci.uchicago.edu.passphrase=yourpassphrasehere
> >>>
> >>> login.mcs.anl.gov.type=key
> >>> login.mcs.anl.gov.username=wilde
> >>> login.mcs.anl.gov.key=/home/wilde/.ssh/swift_rsa
> >>> # MAKE SURE mode=600!!!
> >>> login.mcs.anl.gov.passphrase=yourpassphrasehere
> >>>
> >>> - Mike
> >>>
> >>> ----- Original Message -----
> >>>> It does look like an ssh problem. I am getting the same stderr and log
> >>>> messages when trying to communicate from bridled to communicado.
> >>>>
> >>>> Ketan
> >>>>
> >>>> On Apr 28, 2011, at 11:19 AM, Michael Wilde wrote:
> >>>>
> >>>>> Have you already run a simple hello-world swift test from
> >>>>> communicado to bridled to make sure you have ssh configured
> >>>>> correctly? I would do that first.
> >>>>>
> >>>>> I'm not sure whether an ssh problem explains what you show below.
> >>>>>
> >>>>> - Mike
> >>>>>
> >>>>> ----- Original Message -----
> >>>>>> Thanks, I made the change. However, now I am getting the following
> >>>>>> on my stderr:
> >>>>>>
> >>>>>>
> >>>>>> ===========
> >>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
> >>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
> >>>>>>
> >>>>>> RunID: 20110428-1022-n9s0k0e0
> >>>>>> Progress:
> >>>>>> [ketan]
> >>>>>> Progress: Initializing site shared directory:1
> >>>>>> [ketan] Progress: Initializing site shared directory:1
> >>>>>> Progress: Initializing site shared directory:1
> >>>>>> Progress: Initializing site shared directory:1
> >>>>>> Progress: Initializing site shared directory:1
> >>>>>> Progress: Initializing site shared directory:1
> >>>>>> Progress: Initializing site shared directory:1
> >>>>>> Progress: Initializing site shared directory:1
> >>>>>> Progress: Initializing site shared directory:1
> >>>>>> Progress: Initializing site shared directory:1
> >>>>>> Progress: Initializing site shared directory:1
> >>>>>> Progress: Initializing site shared directory:1
> >>>>>> Progress: Initializing site shared directory:1
> >>>>>> Progress: Initializing site shared directory:1
> >>>>>> ========
> >>>>>>
> >>>>>> And from the log it seems some network transmission has failed:
> >>>>>>
> >>>>>> 2011-04-28 10:22:45,261-0500 INFO TransportProtocolCommon Sending SSH_MSG_SERVICE_REQUEST
> >>>>>> 2011-04-28 10:22:45,264-0500 INFO TransportProtocolCommon Received SSH_MSG_SERVICE_ACCEPT
> >>>>>> 2011-04-28 10:24:27,626-0500 INFO TransportProtocolCommon The Transport Protocol thread failed
> >>>>>> java.io.IOException: The socket is EOF
> >>>>>> at com.sshtools.j2ssh.transport.TransportProtocolInputStream.readBufferedData(TransportProtocolInputStream.java:183)
> >>>>>> at com.sshtools.j2ssh.transport.TransportProtocolInputStream.readMessage(TransportProtocolInputStream.java:226)
> >>>>>> at com.sshtools.j2ssh.transport.TransportProtocolCommon.processMessages(TransportProtocolCommon.java:1440)
> >>>>>> at com.sshtools.j2ssh.transport.TransportProtocolCommon.startBinaryPacketProtocol(TransportProtocolCommon.java:1034)
> >>>>>> at com.sshtools.j2ssh.transport.TransportProtocolCommon.run(TransportProtocolCommon.java:393)
> >>>>>> at java.lang.Thread.run(Thread.java:662)
> >>>>>>
> >>>>>>
> >>>>>> Any clues?
> >>>>>> Ketan
> >>>>>>
> >>>>>>
> >>>>>> On Apr 28, 2011, at 10:20 AM, Michael Wilde wrote:
> >>>>>>
> >>>>>>> The pool name in your sites file is pads-remote-pbs-coasters-ssh,
> >>>>>>> but you used pbs in your tc.data.
> >>>>>>>
> >>>>>>> - Mike
> >>>>>>>
> >>>>>>> ----- Original Message -----
> >>>>>>>> Hello,
> >>>>>>>>
> >>>>>>>> Some context:
> >>>>>>>> I am trying to submit a big run on Beagle using swift + coasters.
> >>>>>>>> However, a previous run is already underway on beagle. So, there are
> >>>>>>>> two difficulties running a new run from its login node:
> >>>>>>>>
> >>>>>>>> 1. Running another swift from the same jvm will result in chaos on the
> >>>>>>>> logs (As far as I know, please correct me if this is not the case
> >>>>>>>> anymore)
> >>>>>>>>
> >>>>>>>> 2. Login node is already under load because of my running previous big
> >>>>>>>> run.
> >>>>>>>>
> >>>>>>>> /context
> >>>>>>>>
> >>>>>>>> So, I am now trying to submit this big run from a remote host
> >>>>>>>> (bridled). I know this has been done on PADS using ssh:pbs with the
> >>>>>>>> coaster provider. I tried a similar approach on a trial swift script
> >>>>>>>> but am getting an error.
> >>>>>>>>
> >>>>>>>> Following is the error message:
> >>>>>>>>
> >>>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
> >>>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
> >>>>>>>>
> >>>>>>>> RunID: 20110428-1002-c8rvqhe6
> >>>>>>>> Progress:
> >>>>>>>> The application "cat" is not available in your tc.data catalog
> >>>>>>>> Caused by: org.globus.cog.karajan.scheduler.NoSuchResourceException
> >>>>>>>> Final status: Failed:1
> >>>>>>>> The following errors have occurred:
> >>>>>>>> 1. The application "cat" is not available in your tc.data catalog
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Attached are my .swift, sites.xml and tc.data files.
> >>>>>>>>
> >>>>>>>> Could someone indicate whether what I am doing is doable and, if so,
> >>>>>>>> how I can correctly configure my sites and tc setup?
> >>>>>>>>
> >>>>>>>> Thanks.
> >>>>>>>> Ketan
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> _______________________________________________
> >>>>>>>> Swift-devel mailing list
> >>>>>>>> Swift-devel at ci.uchicago.edu
> >>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >>>>>>>
> >>>>>>> --
> >>>>>>> Michael Wilde
> >>>>>>> Computation Institute, University of Chicago
> >>>>>>> Mathematics and Computer Science Division
> >>>>>>> Argonne National Laboratory
> >>>>>>>

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



