[Swift-devel] ssh:pbs to beagle

Ketan Maheshwari ketancmaheshwari at gmail.com
Thu Apr 28 11:47:52 CDT 2011


It does look like an ssh problem. I get the same stderr and log messages when trying to communicate from Bridled to Communicado.

Ketan

On Apr 28, 2011, at 11:19 AM, Michael Wilde wrote:

> Have you already run a simple hello-world swift test from communicado to bridled to make sure you have ssh configured correctly? I would do that first.
> 
> I'm not sure whether an ssh problem explains what you show below.
> 
> - Mike
> 
> ----- Original Message -----
>> Thanks, I made the change. However, I am now getting the following on
>> my stderr:
>> 
>> 
>> ===========
>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file
>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1
>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified
>> locally)
>> 
>> RunID: 20110428-1022-n9s0k0e0
>> Progress:
>> [ketan]
>> Progress: Initializing site shared directory:1
>> [ketan] Progress: Initializing site shared directory:1
>> Progress: Initializing site shared directory:1
>> Progress: Initializing site shared directory:1
>> Progress: Initializing site shared directory:1
>> Progress: Initializing site shared directory:1
>> Progress: Initializing site shared directory:1
>> Progress: Initializing site shared directory:1
>> Progress: Initializing site shared directory:1
>> Progress: Initializing site shared directory:1
>> Progress: Initializing site shared directory:1
>> Progress: Initializing site shared directory:1
>> Progress: Initializing site shared directory:1
>> Progress: Initializing site shared directory:1
>> ========
>> 
>> And from the log it seems some network transmission has failed:
>> 
>> 2011-04-28 10:22:45,261-0500 INFO TransportProtocolCommon Sending
>> SSH_MSG_SERVICE_REQUEST
>> 2011-04-28 10:22:45,264-0500 INFO TransportProtocolCommon Received
>> SSH_MSG_SERVICE_ACCEPT
>> 2011-04-28 10:24:27,626-0500 INFO TransportProtocolCommon The
>> Transport Protocol thread failed
>> java.io.IOException: The socket is EOF
>> at
>> com.sshtools.j2ssh.transport.TransportProtocolInputStream.readBufferedData(TransportProtocolInputStream.java:183)
>> at
>> com.sshtools.j2ssh.transport.TransportProtocolInputStream.readMessage(TransportProtocolInputStream.java:226)
>> at
>> com.sshtools.j2ssh.transport.TransportProtocolCommon.processMessages(TransportProtocolCommon.java:1440)
>> at
>> com.sshtools.j2ssh.transport.TransportProtocolCommon.startBinaryPacketProtocol(TransportProtocolCommon.java:1034)
>> at
>> com.sshtools.j2ssh.transport.TransportProtocolCommon.run(TransportProtocolCommon.java:393)
>> at java.lang.Thread.run(Thread.java:662)
>> 
>> 
>> Any clues?
>> Ketan
>> 
>> 
>> On Apr 28, 2011, at 10:20 AM, Michael Wilde wrote:
>> 
>>> The pool name in your sites file is pads-remote-pbs-coasters-ssh but
>>> you used pbs in your tc.data.
>>> 
>>> - Mike
>>> 
>>> ----- Original Message -----
>>>> Hello,
>>>> 
>>>> Some context:
>>>> I am trying to submit a big run on Beagle using swift + coasters.
>>>> However, a previous run is already underway on Beagle, so there are
>>>> two difficulties with starting a new run from its login node:
>>>> 
>>>> 1. Running another swift from the same jvm will result in chaos in the
>>>> logs (as far as I know; please correct me if this is no longer the case).
>>>> 
>>>> 2. The login node is already under load because of my previous big run,
>>>> which is still running.
>>>> 
>>>> /context
>>>> 
>>>> So, I am now trying to submit this big run from a remote host
>>>> (bridled). I know this has been done on PADS using ssh:pbs with the
>>>> coaster provider. I tried a similar approach on a trial swift script but
>>>> am getting an error.
>>>> 
>>>> Following is the error message:
>>>> 
>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc
>>>> -sites.file
>>>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1
>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog
>>>> modified
>>>> locally)
>>>> 
>>>> RunID: 20110428-1002-c8rvqhe6
>>>> Progress:
>>>> The application "cat" is not available in your tc.data catalog
>>>> Caused by: org.globus.cog.karajan.scheduler.NoSuchResourceException
>>>> Final status: Failed:1
>>>> The following errors have occurred:
>>>> 1. The application "cat" is not available in your tc.data catalog
>>>> 
>>>> 
>>>> Attached are my .swift, sites.xml and tc.data files.
>>>> 
>>>> Could someone indicate whether what I am doing is doable and, if so, how
>>>> I can correctly configure my sites and tc setup?
>>>> 
>>>> Thanks.
>>>> Ketan
>>>> 
>>>> _______________________________________________
>>>> Swift-devel mailing list
>>>> Swift-devel at ci.uchicago.edu
>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>> 
>>> --
>>> Michael Wilde
>>> Computation Institute, University of Chicago
>>> Mathematics and Computer Science Division
>>> Argonne National Laboratory
>>> 
> 
> -- 
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
> 
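
The pool-name mismatch Mike points out above (the sites file defines pads-remote-pbs-coasters-ssh while tc.data refers to pbs) is the kind of thing a short script can catch before a run. What follows is a minimal Python sketch, not part of Swift itself. It assumes the sites file declares each site as a <pool handle="..."> element and that the first whitespace-separated column of each non-comment tc.data line is the site name. The function names and the sample data are hypothetical, since the attached files are not shown in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample inputs; the real attached files are not shown in the thread.
SITES_XML = '<config><pool handle="pads-remote-pbs-coasters-ssh"/></config>'
TC_DATA = "# site  app  path  status  platform  profile\npbs cat /bin/cat INSTALLED INTEL32::LINUX null\n"


def site_handles(sites_xml):
    """Return the handle of every <pool> element (assumed sites-file layout)."""
    root = ET.fromstring(sites_xml)
    return {
        el.get("handle")
        for el in root.iter()
        # Strip an XML namespace prefix such as "{uri}pool" if one is present.
        if el.tag.rsplit("}", 1)[-1] == "pool" and el.get("handle")
    }


def tc_sites(tc_data):
    """Return the site names found in the first column of tc.data (assumed layout)."""
    sites = set()
    for line in tc_data.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sites.add(line.split()[0])
    return sites


def unknown_tc_sites(sites_xml, tc_data):
    """List tc.data site names that no <pool handle=...> in the sites file defines."""
    return sorted(tc_sites(tc_data) - site_handles(sites_xml))


if __name__ == "__main__":
    print(unknown_tc_sites(SITES_XML, TC_DATA))  # prints ['pbs']
```

An empty list means every site named in tc.data has a matching pool in the sites file. Any name that is printed would need to be renamed on one side or the other.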



