[Swift-devel] ssh:pbs to beagle

Michael Wilde wilde at mcs.anl.gov
Thu Apr 28 15:08:28 CDT 2011


An exit code of 524 is most likely generated by worker.pl. You can typically find the reason by searching for that number in the worker.pl source.
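
A minimal sketch of that lookup, assuming a POSIX shell and a local copy of worker.pl. Its location varies by installation, so WORKER_PL below is only a placeholder, and find_exit_code is a hypothetical helper name, not part of Swift:

```shell
# Hypothetical helper: print the lines of a script that mention a given exit code.
# Usage: find_exit_code CODE FILE
find_exit_code() {
    code="$1"
    file="$2"
    # -n prints line numbers; -w matches the code as a whole word,
    # so searching for 524 does not also match 5240.
    grep -nw -- "$code" "$file"
}

# Example (WORKER_PL is an assumed variable; set it to the worker.pl
# that your Swift installation actually uses):
#   find_exit_code 524 "$WORKER_PL"
```

The line numbers in the output show where in the source to read for the condition that triggers the exit.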

- Mike

----- Original Message -----
> The EOFException persists.
> 
> However, on bridled-communicado I get this one:
> 
> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file coaster-local-ssh-communicado.xml catsn.swift -n=1
> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
> 
> RunID: 20110428-1457-r7bzx1ha
> Progress:
> Progress: Active:1
> Exception in cat:
> Arguments: [data.txt]
> Host: communicado-ssh
> Directory: catsn-20110428-1457-r7bzx1ha/jobs/t/cat-tlf05d9kTODO: outs
> ----
> 
> Caused by: Job failed with an exit code of 524
> Caused by:
> org.globus.cog.abstraction.impl.common.execution.JobException: Job
> failed with an exit code of 524
> Final status: Failed:1
> The following errors have occurred:
> 1. Job failed with an exit code of 524
> 
> 
> Any clue what it could be due to?
> 
> Ketan
> 
> On Apr 28, 2011, at 2:49 PM, Michael Wilde wrote:
> 
> > Copying these might work for you, Ketan:
> >
> > com$ env | grep 509
> > X509_CERT_DIR=/home/wilde/TRUSTEDCA
> > X509_CADIR=/home/wilde/TRUSTEDCA
> > com$
> >
> >
> > ----- Original Message -----
> >> You have a bunch of unknown CA errors in there.
> >>
> >> You should have the CA public key for your proxy in
> >> ~/.globus/certificates (on both client and server machines).
> >>
> >> Mihael
> >>
> >> On Thu, 2011-04-28 at 14:29 -0500, Ketan Maheshwari wrote:
> >>> They are here : /home/ketan/.globus/coasters
> >>>
> >>>
> >>> On Apr 28, 2011, at 2:26 PM, Mihael Hategan wrote:
> >>>
> >>>> That EOFException doesn't make much sense.
> >>>>
> >>>> On beagle you should have something called coaster.log in
> >>>> ~/.globus/coasters.
> >>>>
> >>>> Can you post a link to that?
> >>>>
> >>>> Mihael
> >>>>
> >>>> On Thu, 2011-04-28 at 14:21 -0500, Ketan Maheshwari wrote:
> >>>>> On Apr 28, 2011, at 2:17 PM, Mihael Hategan wrote:
> >>>>>
> >>>>>> What does your sites file look like?
> >>>>>
> >>>>> ** For beagle **
> >>>>>
> >>>>> <config>
> >>>>>   <!--<pool handle="pbs">-->
> >>>>> <pool handle="beagle-remote-pbs-coasters-ssh">
> >>>>>   <execution provider="coaster"
> >>>>>   url="login1.beagle.ci.uchicago.edu" jobmanager="ssh:pbs"/>
> >>>>>   <profile namespace="globus"
> >>>>>   key="project">CI-CCR000013</profile>
> >>>>>
> >>>>>   <profile namespace="globus" key="ppn">24:cray:pack</profile>
> >>>>>
> >>>>>   <profile namespace="globus" key="workersPerNode">24</profile>
> >>>>>   <profile namespace="globus" key="maxTime">1000</profile>
> >>>>>   <profile namespace="globus" key="slots">1</profile>
> >>>>>   <profile namespace="globus" key="nodeGranularity">1</profile>
> >>>>>   <profile namespace="globus" key="maxNodes">1</profile>
> >>>>>
> >>>>>   <profile namespace="karajan" key="jobThrottle">.63</profile>
> >>>>>   <profile namespace="karajan"
> >>>>>   key="initialScore">10000</profile>
> >>>>>
> >>>>>   <filesystem provider="ssh" url="login1.beagle.ci.uchicago.edu"
> >>>>>   />
> >>>>>   <workdirectory>$HOME/swift.workdir</workdirectory>
> >>>>> </pool>
> >>>>> </config>
> >>>>>
> >>>>>
> >>>>>
> >>>>> ** for communicado **
> >>>>>
> >>>>> <config>
> >>>>>   <!--<pool handle="pbs">-->
> >>>>> <pool handle="communicado-ssh">
> >>>>>   <execution provider="coaster"
> >>>>>   url="communicado.ci.uchicago.edu" jobmanager="ssh:ssh"/>
> >>>>>
> >>>>>   <profile namespace="karajan" key="jobThrottle">.63</profile>
> >>>>>   <profile namespace="karajan"
> >>>>>   key="initialScore">10000</profile>
> >>>>>
> >>>>>   <filesystem provider="ssh" url="communicado.ci.uchicago.edu"
> >>>>>   />
> >>>>>   <workdirectory>$HOME/swift.workdir</workdirectory>
> >>>>> </pool>
> >>>>> </config>
> >>>>>
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> On Thu, 2011-04-28 at 13:36 -0500, Ketan Maheshwari wrote:
> >>>>>>> Ok, I got past CredentialException with grid-proxy-init, now I
> >>>>>>> am facing this (note: I have turned on provider staging) :
> >>>>>>>
> >>>>>>> ========
> >>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc
> >>>>>>> -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
> >>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog
> >>>>>>> modified locally)
> >>>>>>>
> >>>>>>> RunID: 20110428-1332-llaa031f
> >>>>>>> Progress:
> >>>>>>> Could not start connection handler
> >>>>>>> java.io.EOFException
> >>>>>>> 	at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61)
> >>>>>>> 	at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65)
> >>>>>>> 	at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127)
> >>>>>>> 	at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147)
> >>>>>>> 	at org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177)
> >>>>>>> 	at org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30)
> >>>>>>> 	at org.globus.cog.karajan.workflow.service.channels.GSSChannel.<init>(GSSChannel.java:47)
> >>>>>>> 	at org.globus.cog.karajan.workflow.service.ConnectionHandler.<init>(ConnectionHandler.java:41)
> >>>>>>> 	at org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63)
> >>>>>>> 	at org.globus.net.BaseServer.run(BaseServer.java:247)
> >>>>>>> 	at java.lang.Thread.run(Thread.java:662)
> >>>>>>> Progress: Submitted:1
> >>>>>>> Could not start connection handler
> >>>>>>> java.io.EOFException
> >>>>>>> 	at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61)
> >>>>>>> 	at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65)
> >>>>>>> 	at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127)
> >>>>>>> 	at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147)
> >>>>>>> 	at org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177)
> >>>>>>> 	at org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30)
> >>>>>>> 	at org.globus.cog.karajan.workflow.service.channels.GSSChannel.<init>(GSSChannel.java:47)
> >>>>>>> 	at org.globus.cog.karajan.workflow.service.ConnectionHandler.<init>(ConnectionHandler.java:41)
> >>>>>>> 	at org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63)
> >>>>>>> 	at org.globus.net.BaseServer.run(BaseServer.java:247)
> >>>>>>> 	at java.lang.Thread.run(Thread.java:662)
> >>>>>>> Progress: Submitted:1
> >>>>>>> Exception in cat:
> >>>>>>> Arguments: [data.txt]
> >>>>>>> Host: beagle-remote-pbs-coasters-ssh
> >>>>>>> Directory:
> >>>>>>> catsn-20110428-1332-llaa031f/jobs/b/cat-bxal1d9kTODO: outs
> >>>>>>> ----
> >>>>>>>
> >>>>>>> Caused by: Could not submit job
> >>>>>>> Caused by:
> >>>>>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
> >>>>>>> Could not submit job
> >>>>>>> Caused by:
> >>>>>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
> >>>>>>> Could not start coaster service
> >>>>>>> Caused by:
> >>>>>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
> >>>>>>> Task ended before registration was received.
> >>>>>>> STDOUT:
> >>>>>>> STDERR:
> >>>>>>> Caused by:
> >>>>>>> org.globus.cog.abstraction.impl.common.execution.JobException:
> >>>>>>> Job failed with an exit code of 1
> >>>>>>> Final status: Failed:1
> >>>>>>> The following errors have occurred:
> >>>>>>> 1. Job failed with an exit code of 1
> >>>>>>>
> >>>>>>> ========
> >>>>>>>
> >>>>>>>
> >>>>>>> From bridled to communicado, I see the following error:
> >>>>>>>
> >>>>>>> **************
> >>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc
> >>>>>>> -sites.file coaster-local-ssh-communicado.xml catsn.swift -n=1
> >>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog
> >>>>>>> modified locally)
> >>>>>>>
> >>>>>>> RunID: 20110428-1335-k685b2ye
> >>>>>>> Progress:
> >>>>>>> Progress: Submitted:1
> >>>>>>> Progress: Active:1
> >>>>>>> Exception in cat:
> >>>>>>> Arguments: [data.txt]
> >>>>>>> Host: communicado-ssh
> >>>>>>> Directory:
> >>>>>>> catsn-20110428-1335-k685b2ye/jobs/c/cat-coip1d9kTODO: outs
> >>>>>>> ----
> >>>>>>>
> >>>>>>> Caused by: Job failed with an exit code of 524
> >>>>>>> Caused by:
> >>>>>>> org.globus.cog.abstraction.impl.common.execution.JobException:
> >>>>>>> Job failed with an exit code of 524
> >>>>>>> Final status: Failed:1
> >>>>>>> The following errors have occurred:
> >>>>>>> 1. Job failed with an exit code of 524
> >>>>>>>
> >>>>>>> ************
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Ketan
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Apr 28, 2011, at 1:03 PM, Michael Wilde wrote:
> >>>>>>>
> >>>>>>>> For now, create a proxy using grid-proxy-init on the Swift
> >>>>>>>> execution machine.
> >>>>>>>> I think there is an option to set "no security" for this
> >>>>>>>> config, but I can't recall where that is specified. Maybe
> >>>>>>>> swift.properties.
> >>>>>>>>
> >>>>>>>> - Mike
> >>>>>>>>
> >>>>>>>> ----- Original Message -----
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> It looks better now. However, I am getting the following:
> >>>>>>>>>
> >>>>>>>>> =====
> >>>>>>>>>
> >>>>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
> >>>>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
> >>>>>>>>>
> >>>>>>>>> RunID: 20110428-1251-oi9theh8
> >>>>>>>>> Progress:
> >>>>>>>>> Progress: Stage in:1
> >>>>>>>>> Could not submit job
> >>>>>>>>> Caused by:
> >>>>>>>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
> >>>>>>>>> Could not submit job
> >>>>>>>>> Caused by:
> >>>>>>>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
> >>>>>>>>> Could not start coaster service
> >>>>>>>>> Caused by:
> >>>>>>>>> org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException:
> >>>>>>>>> org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file (/tmp/x509up_u2006) not found.
> >>>>>>>>> Caused by: org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file (/tmp/x509up_u2006) not found.
> >>>>>>>>> Failed to transfer wrapper log from
> >>>>>>>>> catsn-20110428-1251-oi9theh8/info/e on
> >>>>>>>>> beagle-remote-pbs-coasters-ssh
> >>>>>>>>>
> >>>>>>>>> =====
> >>>>>>>>>
> >>>>>>>>> How do I specify "-nosec" on automatic coasters?
> >>>>>>>>>
> >>>>>>>>> Ketan
> >>>>>>>>>
> >>>>>>>>> On Apr 28, 2011, at 12:00 PM, Michael Wilde wrote:
> >>>>>>>>>
> >>>>>>>>>> OK. Was there a cookbook on the ssh settings? Did you set up a
> >>>>>>>>>> $HOME/.ssh/auth.defaults per the user guide?
> >>>>>>>>>>
> >>>>>>>>>> Here is an auth.defaults example. I'm not sure it's 100% correct,
> >>>>>>>>>> but it could serve as a base for you:
> >>>>>>>>>>
> >>>>>>>>>> xlogin1.pads.ci.uchicago.edu.type=password
> >>>>>>>>>> xlogin1.pads.ci.uchicago.edu.username=wilde
> >>>>>>>>>>
> >>>>>>>>>> login.pads.ci.uchicago.edu.type=key
> >>>>>>>>>> login.pads.ci.uchicago.edu.username=wilde
> >>>>>>>>>> login.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa
> >>>>>>>>>> # MAKE SURE mode=600!!!
> >>>>>>>>>> login.pads.ci.uchicago.edu.passphrase=yourpassphrasehere
> >>>>>>>>>>
> >>>>>>>>>> login1.pads.ci.uchicago.edu.type=key
> >>>>>>>>>> login1.pads.ci.uchicago.edu.username=wilde
> >>>>>>>>>> login1.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa
> >>>>>>>>>> # MAKE SURE mode=600!!!
> >>>>>>>>>> login1.pads.ci.uchicago.edu.passphrase=yourpassphrasehere
> >>>>>>>>>>
> >>>>>>>>>> login.mcs.anl.gov.type=key
> >>>>>>>>>> login.mcs.anl.gov.username=wilde
> >>>>>>>>>> login.mcs.anl.gov.key=/home/wilde/.ssh/swift_rsa
> >>>>>>>>>> # MAKE SURE mode=600!!!
> >>>>>>>>>> login.mcs.anl.gov.passphrase=yourpassphrasehere
> >>>>>>>>>>
> >>>>>>>>>> - Mike
> >>>>>>>>>>
> >>>>>>>>>> ----- Original Message -----
> >>>>>>>>>>> It does look like an ssh problem. I am getting the same stderr and
> >>>>>>>>>>> log messages on trying to communicate from Bridled to Communicado.
> >>>>>>>>>>>
> >>>>>>>>>>> Ketan
> >>>>>>>>>>>
> >>>>>>>>>>> On Apr 28, 2011, at 11:19 AM, Michael Wilde wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Have you already run a simple hello-world Swift test from
> >>>>>>>>>>>> communicado to bridled to make sure you have ssh configured
> >>>>>>>>>>>> correctly? I would do that first.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm not sure whether an ssh problem explains what you show below.
> >>>>>>>>>>>>
> >>>>>>>>>>>> - Mike
> >>>>>>>>>>>>
> >>>>>>>>>>>> ----- Original Message -----
> >>>>>>>>>>>>> Thanks, I made the change. However, I am now getting the
> >>>>>>>>>>>>> following on my stderr:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> ===========
> >>>>>>>>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
> >>>>>>>>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> RunID: 20110428-1022-n9s0k0e0
> >>>>>>>>>>>>> Progress:
> >>>>>>>>>>>>> [ketan]
> >>>>>>>>>>>>> Progress: Initializing site shared directory:1
> >>>>>>>>>>>>> [ketan] Progress: Initializing site shared directory:1
> >>>>>>>>>>>>> Progress: Initializing site shared directory:1
> >>>>>>>>>>>>> Progress: Initializing site shared directory:1
> >>>>>>>>>>>>> Progress: Initializing site shared directory:1
> >>>>>>>>>>>>> Progress: Initializing site shared directory:1
> >>>>>>>>>>>>> Progress: Initializing site shared directory:1
> >>>>>>>>>>>>> Progress: Initializing site shared directory:1
> >>>>>>>>>>>>> Progress: Initializing site shared directory:1
> >>>>>>>>>>>>> Progress: Initializing site shared directory:1
> >>>>>>>>>>>>> Progress: Initializing site shared directory:1
> >>>>>>>>>>>>> Progress: Initializing site shared directory:1
> >>>>>>>>>>>>> Progress: Initializing site shared directory:1
> >>>>>>>>>>>>> Progress: Initializing site shared directory:1
> >>>>>>>>>>>>> ========
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> And from the log it seems some network transmission has
> >>>>>>>>>>>>> failed:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 2011-04-28 10:22:45,261-0500 INFO TransportProtocolCommon Sending SSH_MSG_SERVICE_REQUEST
> >>>>>>>>>>>>> 2011-04-28 10:22:45,264-0500 INFO TransportProtocolCommon Received SSH_MSG_SERVICE_ACCEPT
> >>>>>>>>>>>>> 2011-04-28 10:24:27,626-0500 INFO TransportProtocolCommon The Transport Protocol thread failed
> >>>>>>>>>>>>> java.io.IOException: The socket is EOF
> >>>>>>>>>>>>> at com.sshtools.j2ssh.transport.TransportProtocolInputStream.readBufferedData(TransportProtocolInputStream.java:183)
> >>>>>>>>>>>>> at com.sshtools.j2ssh.transport.TransportProtocolInputStream.readMessage(TransportProtocolInputStream.java:226)
> >>>>>>>>>>>>> at com.sshtools.j2ssh.transport.TransportProtocolCommon.processMessages(TransportProtocolCommon.java:1440)
> >>>>>>>>>>>>> at com.sshtools.j2ssh.transport.TransportProtocolCommon.startBinaryPacketProtocol(TransportProtocolCommon.java:1034)
> >>>>>>>>>>>>> at com.sshtools.j2ssh.transport.TransportProtocolCommon.run(TransportProtocolCommon.java:393)
> >>>>>>>>>>>>> at java.lang.Thread.run(Thread.java:662)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Any clues?
> >>>>>>>>>>>>> Ketan
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Apr 28, 2011, at 10:20 AM, Michael Wilde wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> The pool name in your sites file is pads-remote-pbs-coasters-ssh,
> >>>>>>>>>>>>>> but you used pbs in your tc.data.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> - Mike
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> ----- Original Message -----
> >>>>>>>>>>>>>>> Hello,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Some context:
> >>>>>>>>>>>>>>> I am trying to submit a big run on Beagle using Swift + coasters.
> >>>>>>>>>>>>>>> However, a previous run is already underway on Beagle, so there
> >>>>>>>>>>>>>>> are two difficulties in starting a new run from its login node:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 1. Running another Swift from the same JVM will result in chaos
> >>>>>>>>>>>>>>> in the logs (as far as I know; please correct me if this is no
> >>>>>>>>>>>>>>> longer the case).
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 2. The login node is already under load because of my previous
> >>>>>>>>>>>>>>> big run, which is still running.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> /context
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> So, I am now trying to submit this big run from a remote host
> >>>>>>>>>>>>>>> (bridled). I know this has been done on PADS using ssh:pbs with
> >>>>>>>>>>>>>>> the coaster provider. I tried a similar approach on a trial
> >>>>>>>>>>>>>>> Swift script, but I am getting an error.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Following is the error message:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
> >>>>>>>>>>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> RunID: 20110428-1002-c8rvqhe6
> >>>>>>>>>>>>>>> Progress:
> >>>>>>>>>>>>>>> The application "cat" is not available in your tc.data catalog
> >>>>>>>>>>>>>>> Caused by:
> >>>>>>>>>>>>>>> org.globus.cog.karajan.scheduler.NoSuchResourceException
> >>>>>>>>>>>>>>> Final status: Failed:1
> >>>>>>>>>>>>>>> The following errors have occurred:
> >>>>>>>>>>>>>>> 1. The application "cat" is not available in your tc.data catalog
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Attached are my .swift, sites.xml and tc.data files.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Could someone indicate whether what I am doing is doable and, if
> >>>>>>>>>>>>>>> so, how I can correctly configure my sites and tc setup?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks.
> >>>>>>>>>>>>>>> Ketan
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> _______________________________________________
> >>>>>>>>>>>>>>> Swift-devel mailing list
> >>>>>>>>>>>>>>> Swift-devel at ci.uchicago.edu
> >>>>>>>>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >
> >

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



