[Swift-devel] ssh:pbs to beagle

Michael Wilde wilde at mcs.anl.gov
Thu Apr 28 12:00:24 CDT 2011


OK. Was there a cookbook on the ssh settings? Did you set up a $HOME/.ssh/auth.defaults per the user guide?

Here is an auth.defaults example. I'm not sure it's 100% correct, but it could serve as a base for you:

xlogin1.pads.ci.uchicago.edu.type=password
xlogin1.pads.ci.uchicago.edu.username=wilde

login.pads.ci.uchicago.edu.type=key
login.pads.ci.uchicago.edu.username=wilde
login.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa
# MAKE SURE mode=600!!!
login.pads.ci.uchicago.edu.passphrase=yourpassphrasehere

login1.pads.ci.uchicago.edu.type=key
login1.pads.ci.uchicago.edu.username=wilde
login1.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa
# MAKE SURE mode=600!!!
login1.pads.ci.uchicago.edu.passphrase=yourpassphrasehere

login.mcs.anl.gov.type=key
login.mcs.anl.gov.username=wilde
login.mcs.anl.gov.key=/home/wilde/.ssh/swift_rsa
# MAKE SURE mode=600!!!
login.mcs.anl.gov.passphrase=yourpassphrasehere
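
If it helps, here is a rough Python sketch for sanity-checking a file laid out like the example above before starting a run. It assumes the file is a flat list of host.property=value lines, reads the mode=600 note as applying to auth.defaults itself, and assumes that type=key entries need an existing private key file. The function name check_auth_defaults and these rules are my own for illustration; they are not taken from Swift or CoG.

```python
import os
import stat
from collections import defaultdict


def check_auth_defaults(path):
    """Return a list of human-readable problems found in an auth.defaults-style file."""
    problems = []

    # The file may hold a passphrase, so it should be accessible by the owner only.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        problems.append("mode is %o, expected 600" % mode)

    # Group "host.property=value" lines by host; the property name is the
    # last dot-separated component of the key.
    hosts = defaultdict(dict)
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            host, _, prop = key.strip().rpartition(".")
            hosts[host][prop] = value.strip()

    for host, props in sorted(hosts.items()):
        if "username" not in props:
            problems.append("%s: no username" % host)
        # Assumption: entries of type "key" need an existing private key file.
        if props.get("type") == "key":
            key_file = props.get("key")
            if not key_file:
                problems.append("%s: type=key but no key path" % host)
            elif not os.path.isfile(key_file):
                problems.append("%s: key file %s not found" % (host, key_file))
    return problems
```

Calling check_auth_defaults(os.path.expanduser("~/.ssh/auth.defaults")) returns an empty list when none of these checks trips. It does not test whether ssh itself can log in.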

- Mike

----- Original Message -----
> It does look like an ssh problem. I am getting the same stderr and log
> messages when trying to communicate from Bridled to Communicado.
> 
> Ketan
> 
> On Apr 28, 2011, at 11:19 AM, Michael Wilde wrote:
> 
> > Have you already run a simple hello-world Swift test from
> > communicado to bridled to make sure you have ssh configured
> > correctly? I would do that first.
> >
> > I'm not sure whether an ssh problem explains what you show below.
> >
> > - Mike
> >
> > ----- Original Message -----
> >> Thanks, I made the change. However, I am now getting the following on
> >> my stderr:
> >>
> >>
> >> ===========
> >> [ketan@bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
> >> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
> >>
> >> RunID: 20110428-1022-n9s0k0e0
> >> Progress:
> >> [ketan]
> >> Progress: Initializing site shared directory:1
> >> [ketan] Progress: Initializing site shared directory:1
> >> Progress: Initializing site shared directory:1
> >> Progress: Initializing site shared directory:1
> >> Progress: Initializing site shared directory:1
> >> Progress: Initializing site shared directory:1
> >> Progress: Initializing site shared directory:1
> >> Progress: Initializing site shared directory:1
> >> Progress: Initializing site shared directory:1
> >> Progress: Initializing site shared directory:1
> >> Progress: Initializing site shared directory:1
> >> Progress: Initializing site shared directory:1
> >> Progress: Initializing site shared directory:1
> >> Progress: Initializing site shared directory:1
> >> ========
> >>
> >> And from the log it seems some network transmission has failed:
> >>
> >> 2011-04-28 10:22:45,261-0500 INFO TransportProtocolCommon Sending SSH_MSG_SERVICE_REQUEST
> >> 2011-04-28 10:22:45,264-0500 INFO TransportProtocolCommon Received SSH_MSG_SERVICE_ACCEPT
> >> 2011-04-28 10:24:27,626-0500 INFO TransportProtocolCommon The Transport Protocol thread failed
> >> java.io.IOException: The socket is EOF
> >>     at com.sshtools.j2ssh.transport.TransportProtocolInputStream.readBufferedData(TransportProtocolInputStream.java:183)
> >>     at com.sshtools.j2ssh.transport.TransportProtocolInputStream.readMessage(TransportProtocolInputStream.java:226)
> >>     at com.sshtools.j2ssh.transport.TransportProtocolCommon.processMessages(TransportProtocolCommon.java:1440)
> >>     at com.sshtools.j2ssh.transport.TransportProtocolCommon.startBinaryPacketProtocol(TransportProtocolCommon.java:1034)
> >>     at com.sshtools.j2ssh.transport.TransportProtocolCommon.run(TransportProtocolCommon.java:393)
> >>     at java.lang.Thread.run(Thread.java:662)
> >>
> >>
> >> Any clues?
> >> Ketan
> >>
> >>
> >> On Apr 28, 2011, at 10:20 AM, Michael Wilde wrote:
> >>
> >>> The pool name in your sites file is pads-remote-pbs-coasters-ssh, but
> >>> you used pbs in your tc.data. The site name in tc.data has to match
> >>> the pool handle in the sites file.
> >>>
> >>> - Mike
> >>>
> >>> ----- Original Message -----
> >>>> Hello,
> >>>>
> >>>> Some context:
> >>>> I am trying to submit a big run on Beagle using Swift + coasters.
> >>>> However, a previous run is already underway on Beagle, so there are
> >>>> two difficulties with starting a new run from its login node:
> >>>>
> >>>> 1. Running another swift from the same jvm will result in chaos in
> >>>> the logs (as far as I know; please correct me if this is no longer
> >>>> the case).
> >>>>
> >>>> 2. The login node is already under load because of my previous big
> >>>> run, which is still running.
> >>>>
> >>>> /context
> >>>>
> >>>> So, I am now trying to submit this big run from a remote host
> >>>> (bridled). I know this has been done on PADS using ssh:pbs with the
> >>>> coaster provider. I tried a similar approach on a trial Swift script
> >>>> but am getting an error.
> >>>>
> >>>> Following is the error message:
> >>>>
> >>>> [ketan@bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
> >>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
> >>>>
> >>>> RunID: 20110428-1002-c8rvqhe6
> >>>> Progress:
> >>>> The application "cat" is not available in your tc.data catalog
> >>>> Caused by:
> >>>> org.globus.cog.karajan.scheduler.NoSuchResourceException
> >>>> Final status: Failed:1
> >>>> The following errors have occurred:
> >>>> 1. The application "cat" is not available in your tc.data catalog
> >>>>
> >>>>
> >>>> Attached are my .swift, sites.xml and tc.data files.
> >>>>
> >>>> Could someone indicate whether what I am doing is doable and, if so,
> >>>> how I can correctly configure my sites and tc setup?
> >>>>
> >>>> Thanks.
> >>>> Ketan
> >>>
> >>> --
> >>> Michael Wilde
> >>> Computation Institute, University of Chicago
> >>> Mathematics and Computer Science Division
> >>> Argonne National Laboratory
> >>>

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



