[Swift-devel] ssh:pbs to beagle
Michael Wilde
wilde at mcs.anl.gov
Thu Apr 28 13:03:46 CDT 2011
For now, create a proxy using grid-proxy-init on the swift execution machine.
I think there is an option to set "no security" for this config, but I can't recall where it is specified. Possibly in swift.properties.
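To confirm the proxy is actually in place before rerunning swift, something like the following could help. It is only a minimal sketch: it assumes the default proxy location shown in the quoted error (/tmp/x509up_u followed by your numeric uid) and treats the X509_USER_PROXY environment variable as an optional override, which may not apply to your setup. The function names are made up for illustration.

```python
import os
from pathlib import Path


def default_proxy_path(uid=None, env=None):
    """Return the path where a grid proxy is conventionally looked up.

    Assumption: X509_USER_PROXY, if set, overrides the /tmp default.
    """
    env = os.environ if env is None else env
    override = env.get("X509_USER_PROXY")
    if override:
        return Path(override)
    # Fall back to the per-user default, e.g. /tmp/x509up_u2006
    uid = os.getuid() if uid is None else uid
    return Path("/tmp") / f"x509up_u{uid}"


def proxy_present(path):
    """True if the proxy file exists and is not empty."""
    return path.is_file() and path.stat().st_size > 0


if __name__ == "__main__":
    proxy = default_proxy_path()
    if proxy_present(proxy):
        print(f"proxy found: {proxy}")
    else:
        print(f"no proxy at {proxy}; run grid-proxy-init first")
```

This only checks that the file exists; it does not check whether the proxy has expired.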
- Mike
----- Original Message -----
> Hi,
>
> It looks better now. However, I am getting the following:
>
> =====
>
> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
>
> RunID: 20110428-1251-oi9theh8
> Progress:
> Progress: Stage in:1
> Could not submit job
> Caused by:
> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
> Could not submit job
> Caused by:
> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
> Could not start coaster service
> Caused by:
> org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException:
> org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file
> (/tmp/x509up_u2006) not found.
> Caused by: org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy
> file (/tmp/x509up_u2006) not found.
> Failed to transfer wrapper log from
> catsn-20110428-1251-oi9theh8/info/e on beagle-remote-pbs-coasters-ssh
>
> =====
>
> How do I specify "-nosec" on automatic coasters?
>
> Ketan
>
> On Apr 28, 2011, at 12:00 PM, Michael Wilde wrote:
>
> > OK. Was there a cookbook on the ssh settings? Did you set up a
> > $HOME/.ssh/auth.defaults per the user guide?
> >
> > Here is an auth.defaults example. I'm not sure it's 100% correct, but
> > it could serve as a base for you:
> >
> > xlogin1.pads.ci.uchicago.edu.type=password
> > xlogin1.pads.ci.uchicago.edu.username=wilde
> >
> > login.pads.ci.uchicago.edu.type=key
> > login.pads.ci.uchicago.edu.username=wilde
> > login.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa
> > # MAKE SURE mode=600!!!
> > login.pads.ci.uchicago.edu.passphrase=yourpassphrasehere
> >
> > login1.pads.ci.uchicago.edu.type=key
> > login1.pads.ci.uchicago.edu.username=wilde
> > login1.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa
> > # MAKE SURE mode=600!!!
> > login1.pads.ci.uchicago.edu.passphrase=yourpassphrasehere
> >
> > login.mcs.anl.gov.type=key
> > login.mcs.anl.gov.username=wilde
> > login.mcs.anl.gov.key=/home/wilde/.ssh/swift_rsa
> > # MAKE SURE mode=600!!!
> > login.mcs.anl.gov.passphrase=yourpassphrasehere
> >
> > - Mike
> >
> > ----- Original Message -----
> >> It does look like an ssh problem. I am getting the same stderr and log
> >> messages when trying to communicate from Bridled to Communicado.
> >>
> >> Ketan
> >>
> >> On Apr 28, 2011, at 11:19 AM, Michael Wilde wrote:
> >>
> >>> Have you already run a simple hello-world swift test from
> >>> communicado to bridled to make sure you have ssh configured
> >>> correctly? I would do that first.
> >>>
> >>> I'm not sure whether an ssh problem explains what you show below.
> >>>
> >>> - Mike
> >>>
> >>> ----- Original Message -----
> >>>> Thanks, I made the change. However, I am now getting the following
> >>>> on my stderr:
> >>>>
> >>>>
> >>>> ===========
> >>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
> >>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
> >>>>
> >>>> RunID: 20110428-1022-n9s0k0e0
> >>>> Progress:
> >>>> [ketan]
> >>>> Progress: Initializing site shared directory:1
> >>>> [ketan] Progress: Initializing site shared directory:1
> >>>> Progress: Initializing site shared directory:1
> >>>> Progress: Initializing site shared directory:1
> >>>> Progress: Initializing site shared directory:1
> >>>> Progress: Initializing site shared directory:1
> >>>> Progress: Initializing site shared directory:1
> >>>> Progress: Initializing site shared directory:1
> >>>> Progress: Initializing site shared directory:1
> >>>> Progress: Initializing site shared directory:1
> >>>> Progress: Initializing site shared directory:1
> >>>> Progress: Initializing site shared directory:1
> >>>> Progress: Initializing site shared directory:1
> >>>> Progress: Initializing site shared directory:1
> >>>> ========
> >>>>
> >>>> And from the log it seems some network transmission has failed:
> >>>>
> >>>> 2011-04-28 10:22:45,261-0500 INFO TransportProtocolCommon Sending SSH_MSG_SERVICE_REQUEST
> >>>> 2011-04-28 10:22:45,264-0500 INFO TransportProtocolCommon Received SSH_MSG_SERVICE_ACCEPT
> >>>> 2011-04-28 10:24:27,626-0500 INFO TransportProtocolCommon The Transport Protocol thread failed
> >>>> java.io.IOException: The socket is EOF
> >>>> at com.sshtools.j2ssh.transport.TransportProtocolInputStream.readBufferedData(TransportProtocolInputStream.java:183)
> >>>> at com.sshtools.j2ssh.transport.TransportProtocolInputStream.readMessage(TransportProtocolInputStream.java:226)
> >>>> at com.sshtools.j2ssh.transport.TransportProtocolCommon.processMessages(TransportProtocolCommon.java:1440)
> >>>> at com.sshtools.j2ssh.transport.TransportProtocolCommon.startBinaryPacketProtocol(TransportProtocolCommon.java:1034)
> >>>> at com.sshtools.j2ssh.transport.TransportProtocolCommon.run(TransportProtocolCommon.java:393)
> >>>> at java.lang.Thread.run(Thread.java:662)
> >>>>
> >>>>
> >>>> Any clues?
> >>>> Ketan
> >>>>
> >>>>
> >>>> On Apr 28, 2011, at 10:20 AM, Michael Wilde wrote:
> >>>>
> >>>>> The pool name in your sites file is pads-remote-pbs-coasters-ssh,
> >>>>> but you used pbs in your tc.data.
> >>>>>
> >>>>> - Mike
> >>>>>
> >>>>> ----- Original Message -----
> >>>>>> Hello,
> >>>>>>
> >>>>>> Some context:
> >>>>>> I am trying to submit a big run on Beagle using swift + coasters.
> >>>>>> However, a previous run is already underway on Beagle, so there are
> >>>>>> two difficulties in starting a new run from its login node:
> >>>>>>
> >>>>>> 1. Running another swift from the same jvm will result in chaos in
> >>>>>> the logs (as far as I know; please correct me if this is no longer
> >>>>>> the case).
> >>>>>>
> >>>>>> 2. The login node is already under load from my previous big run,
> >>>>>> which is still running.
> >>>>>>
> >>>>>> /context
> >>>>>>
> >>>>>> So, I am now trying to submit this big run from a remote host
> >>>>>> (bridled). I know this has been done on PADS using ssh:pbs with the
> >>>>>> coaster provider. I tried a similar approach on a trial swift script
> >>>>>> but am getting an error.
> >>>>>>
> >>>>>> Following is the error message:
> >>>>>>
> >>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
> >>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
> >>>>>>
> >>>>>> RunID: 20110428-1002-c8rvqhe6
> >>>>>> Progress:
> >>>>>> The application "cat" is not available in your tc.data catalog
> >>>>>> Caused by:
> >>>>>> org.globus.cog.karajan.scheduler.NoSuchResourceException
> >>>>>> Final status: Failed:1
> >>>>>> The following errors have occurred:
> >>>>>> 1. The application "cat" is not available in your tc.data catalog
> >>>>>>
> >>>>>>
> >>>>>> Attached are my .swift, sites.xml and tc.data files.
> >>>>>>
> >>>>>> Could someone indicate whether what I am doing is doable and, if so,
> >>>>>> how I can correctly configure my sites and tc setup?
> >>>>>>
> >>>>>> Thanks.
> >>>>>> Ketan
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Swift-devel mailing list
> >>>>>> Swift-devel at ci.uchicago.edu
> >>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >>>>>
> >>>>> --
> >>>>> Michael Wilde
> >>>>> Computation Institute, University of Chicago
> >>>>> Mathematics and Computer Science Division
> >>>>> Argonne National Laboratory
> >>>>>
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory