[Swift-devel] ssh test case on pads/beagle

Michael Wilde wilde at mcs.anl.gov
Thu Aug 11 10:08:47 CDT 2011


Alberto's ssh test now runs.

It was failing because provider staging was specified in the -config file; that appears to be what caused exit code 127.  I did not go back and search the earlier log Alberto sent for a message to that effect, but we should, to see whether it was reported in some reasonable fashion that could be presented more clearly to the user.
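
For reference, 127 is the status a POSIX shell conventionally returns when it cannot find the command it was asked to run, which may be why the failure first looked like a bad application path. A minimal Python sketch of that convention (the command name is made up and assumed not to exist on the machine; this says nothing about how Swift itself launches jobs):

```python
import subprocess


def shell_exit_code(command):
    """Run a command line through /bin/sh and return its exit status."""
    completed = subprocess.run(
        ["/bin/sh", "-c", command],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode


# Hypothetical command name, assumed not to be on PATH.
missing = shell_exit_code("no-such-command-for-swift-test")
# "true" exists on any POSIX system and exits successfully.
found = shell_exit_code("true")
print("missing command:", missing, "existing command:", found)
```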

We might want to check to ensure that provider staging is not specified for providers that can't support it.  Is such a check feasible and sensible?
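
As a rough illustration of what such a check could look like, here is a standalone Python sketch, not Swift code. It reads a properties file and a sites file in the formats quoted in this thread and warns when use.provider.staging is true for a pool whose execution provider is not in an allowed set. The set STAGING_CAPABLE is purely an assumption for the example; the real list would have to come from the providers themselves.

```python
import xml.etree.ElementTree as ET

# Assumption for illustration only: which execution providers can do
# provider staging. The real answer must come from the providers.
STAGING_CAPABLE = {"coaster", "local"}


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_provider_staging(config_text, sites_xml):
    """Return one message per pool whose execution provider is not in
    STAGING_CAPABLE while use.provider.staging is set to true."""
    props = parse_properties(config_text)
    if props.get("use.provider.staging", "false").lower() != "true":
        return []
    problems = []
    for pool in ET.fromstring(sites_xml).iter("pool"):
        for execution in pool.iter("execution"):
            provider = execution.get("provider")
            if provider not in STAGING_CAPABLE:
                problems.append(
                    "pool '%s': provider staging requested but execution "
                    "provider '%s' is not known to support it"
                    % (pool.get("handle"), provider)
                )
    return problems


CONFIG = """\
execution.retries=0
use.provider.staging=true
"""

SITES = """\
<config>
  <pool handle="ssh">
    <execution provider="ssh" url="steamroller" jobmanager="ssh"/>
    <filesystem provider="ssh" url="steamroller" />
    <workdirectory>/home/achavez/swiftwork</workdirectory>
  </pool>
</config>
"""

for message in check_provider_staging(CONFIG, SITES):
    print("WARNING:", message)
```

Failing early with a message like this, rather than letting the job run and return 127, would point the user at the config file instead of at the tc entries.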

Also, this case illustrates the benefit of having the properties settings (and -config overrides) echoed in the .log file.
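
A sketch of the kind of echo I have in mind, again in standalone Python rather than Swift's own code: merge the defaults with the -config overrides, record where each value came from, and mask anything that looks like a secret before it reaches the log. The default values, the "Loader" logger name, and the list of sensitive key fragments are all assumptions made up for the example.

```python
import logging

# Hypothetical key fragments whose values should never be logged in clear.
SENSITIVE_FRAGMENTS = ("password", "passphrase")


def effective_settings(defaults, overrides):
    """Merge two dicts into {key: (value, source)}; overrides win."""
    merged = {key: (value, "default") for key, value in defaults.items()}
    for key, value in overrides.items():
        merged[key] = (value, "-config")
    return merged


def mask(key, value):
    """Hide the value when the key name suggests it is a secret."""
    lowered = key.lower()
    if any(fragment in lowered for fragment in SENSITIVE_FRAGMENTS):
        return "********"
    return value


def format_settings(settings):
    """Return one 'key=value (source)' line per setting, sorted by key."""
    return [
        "%s=%s (%s)" % (key, mask(key, value), source)
        for key, (value, source) in sorted(settings.items())
    ]


logging.basicConfig(level=logging.DEBUG,
                    format="%(levelname)s %(name)s %(message)s")
log = logging.getLogger("Loader")

# Illustrative values only; not claimed to be Swift's real defaults.
DEFAULTS = {"execution.retries": "2", "use.provider.staging": "false"}
OVERRIDES = {"execution.retries": "0", "use.provider.staging": "true"}

for line in format_settings(effective_settings(DEFAULTS, OVERRIDES)):
    log.debug("property %s", line)
```

With the source recorded next to each value, the use.provider.staging=true override would have been visible at the top of the log.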

- Mike


----- Original Message -----
> From: "Michael Wilde" <wilde at mcs.anl.gov>
> To: "Alberto Chavez" <alberto_chavez at live.com>
> Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> Sent: Thursday, August 11, 2011 8:57:36 AM
> Subject: Re: [Swift-devel] ssh test case on pads/beagle
> Mihael, I've never seen sites.xml entries showing up in the log - are
> they supposed to be now? They are not in the log Alberto attached, nor
> have I seen them in any other log yet.
> 
> Can we log all the files mentioned in the command line report (the
> first line of the log) right at the front, along with the source text?
> I.e., script, tc, sites, and config? Ideally values for all of the
> swift.properties? Ideally auth.defaults with suitable masking? 0.94
> feature?
> 
> >> 2011-08-11 08:28:03,762-0500 DEBUG Loader arguments:
> >> [001-catsn-ssh.swift, -tc.file, tc.template.data, -sites.file,
> >> sites.template.xml, -config, cf]
> 
> Alberto, stop by and we can try to debug this in person, as ssh
> requires a fair bit of correct configuration to work.
> 
> We need to look at the tc.template.data, sites.template.xml, and cf files.
> 
> - Mike
> 
> 
> ----- Original Message -----
> 
> 
> From: "Alberto Chavez" <alberto_chavez at live.com>
> To: "Mihael Hategan" <hategan at mcs.anl.gov>
> Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> Sent: Thursday, August 11, 2011 8:31:53 AM
> Subject: Re: [Swift-devel] ssh test case on pads/beagle
> 
> 
> Sure, attached are the output of stdout and stderror, and the log
> generated by swift.
> 
> 
> 
> > Subject: RE: [Swift-devel] ssh test case on pads/beagle
> > From: hategan at mcs.anl.gov
> > To: alberto_chavez at live.com
> > CC: jonmon at mcs.anl.gov; swift-devel at ci.uchicago.edu
> > Date: Thu, 11 Aug 2011 00:18:07 -0700
> >
> > Can you post (a link to) the entire log file? Since it contains both the
> > tc.data and sites.xml and the error, it's probably always better to post
> > that than individual snippets.
> >
> > On Thu, 2011-08-11 at 01:17 -0500, Alberto Chavez wrote:
> > > Sure:
> > >
> > > <config>
> > > <pool handle="ssh">
> > > <execution provider="ssh" url="steamroller" jobmanager="ssh"/>
> > > <filesystem provider="ssh" url="steamroller" />
> > > <profile key="jobThrottle" namespace="karajan">0</profile>
> > > <workdirectory>/home/achavez/swiftwork</workdirectory>
> > > </pool>
> > > </config>
> > >
> > >
> > > ______________________________________________________________________
> > > To: alberto_chavez at live.com
> > > From: jonmon at mcs.anl.gov
> > > CC: hategan at mcs.anl.gov; swift-devel at ci.uchicago.edu
> > > Subject: Re: [Swift-devel] ssh test case on pads/beagle
> > > Date: Wed, 10 Aug 2011 23:54:24 -0500
> > >
> > > Could you post the sites file?
> > >
> > > ----- Reply message -----
> > > From: "Alberto Chavez" <alberto_chavez at live.com>
> > > Date: Wed, Aug 10, 2011 7:16 pm
> > > Subject: [Swift-devel] ssh test case on pads/beagle
> > > To: <jonmon at mcs.anl.gov>
> > > Cc: "Mihael Hategan" <hategan at mcs.anl.gov>, "Swift Devel"
> > > <swift-devel at ci.uchicago.edu>
> > >
> > >
> > >
> > > Exit code "127" normally means that a particular function doesn't
> > > exist. Are you sure that all those paths to apps exist?
> > > > > Yes, I double-checked that and those are the right paths to the
> > > > > apps.
> > >
> > >
> > > > Also, I am not sure if this is a problem, but shouldn't there be a
> > > > third column in the app file? Like
> > > > "ssh echo /bin/echo null null null"
> > >
> > >
> > >
> > >
> > > Looking at the documentation for the transformation catalog, the
> > > structure should be:
> > >
> > > site, transformation name, executable path, installation status,
> > > platform, and profile entries.
> > >
> > >
> > >
> > >
> > >
> > > The installation status and platform fields are not used. Set them
> > > to INSTALLED and INTEL32::LINUX respectively.
> > >
> > > > The profiles field should be set to null if no profile entries are to
> > > > be specified, or should contain the profile entries separated by
> > > > semicolons.
> > >
> > >
> > > > but even when I switch the columns to INSTALLED and INTEL32::LINUX and
> > > > keep the profiles field set to null, I'm still getting the same exit
> > > > code.
> > >
> > >
> > > On Aug 10, 2011, at 6:41 PM, Alberto Chavez wrote:
> > >
> > > I changed my ssh-key, and they worked on the MCS machines
> > > because the authorized_keys file has not been updated yet on
> > > the CI Machines.
> > > I created a new ssh-key using:
> > > ssh-keygen -t rsa -b 2048
> > > exactly as the MCS site suggested,
> > > > However, I still have a problem: I am getting the
> > > > following error:
> > >
> > >
> > > Swift svn swift-r4978 cog-r3226
> > >
> > >
> > > RunID: 20110810-1819-1cdo2o62
> > > Progress: time: Wed, 10 Aug 2011 18:19:42 -0500
> > > > Exception in cat:
> > > > Arguments: [data.txt]
> > > > Host: ssh
> > > > Directory: 001-catsn-ssh-20110810-1819-1cdo2o62/jobs/9/cat-9jd0g9ek
> > > > - - -
> > > > Caused by: null
> > > > Caused by:
> > > > org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 127
> > > Final status: time: Wed, 10 Aug 2011 18:20:00 -0500
> > > Failed:10
> > > The following errors have occurred:
> > > 1. Job failed with an exit code of 127 (10 times)
> > >
> > >
> > >
> > >
> > > These are the contents of the log:
> > >
> > >
> > > Execution completed with errors
> > >
> > >
> > > 2011-08-10 18:19:43,251-0500 INFO ConnectionProtocol Freeing
> > > channel 0 [Unnamed Channel]
> > > 2011-08-10 18:19:43,263-0500 INFO Exec Exit code 127
> > > 2011-08-10 18:19:43,269-0500 INFO ConnectionProtocol Freeing
> > > channel 0 [Unnamed Channel]
> > > 2011-08-10 18:19:43,277-0500 DEBUG vdl:execute2
> > > APPLICATION_EXCEPTION jobid=cat-9jd0g9ek - Application
> > > exception: null
> > > Caused by:
> > > org.globus.cog.abstraction.impl.common.execution.JobException:
> > > Job failed with an exit code of 127
> > > 2011-08-10 18:19:43,280-0500 INFO vdl:execute END_FAILURE
> > > thread=0-5-3-1 tr=cat
> > > 2011-08-10 18:19:43,281-0500 INFO vdl:execute Exception in
> > > cat:
> > > at
> > > org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:250)
> > > at
> > > org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:254)
> > > at
> > > org.globus.cog.karajan.workflow.nodes.GenerateErrorNode.post(GenerateErrorNode.java:27)
> > > at
> > > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194)
> > > at
> > > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214)
> > > at
> > > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
> > > at
> > > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28)
> > > at
> > > org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29)
> > > at
> > > org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20)
> > > at
> > > org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
> > > at
> > > org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139)
> > > at
> > > org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197)
> > > at
> > > org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227)
> > > at
> > > org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104)
> > > at
> > > org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40)
> > > at java.util.concurrent.Executors
> > > $RunnableAdapter.call(Executors.java:471)
> > > at java.util.concurrent.FutureTask
> > > $Sync.innerRun(FutureTask.java:334)
> > > at
> > > java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > > at
> > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> > > at java.util.concurrent.ThreadPoolExecutor
> > > $Worker.run(ThreadPoolExecutor.java:603)
> > > at java.lang.Thread.run(Thread.java:636)
> > > 2011-08-10 18:20:00,332-0500 INFO ExecutionContext Detailed
> > > exception:
> > >
> > >
> > > Execution completed with errors
> > >
> > >
> > > at
> > > org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:250)
> > > at
> > > org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:254)
> > > at
> > > org.globus.cog.karajan.workflow.nodes.GenerateErrorNode.post(GenerateErrorNode.java:27)
> > > at
> > > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194)
> > > at
> > > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214)
> > > at
> > > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
> > > at
> > > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28)
> > > at
> > > org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29)
> > > at
> > > org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20)
> > > at
> > > org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
> > > at
> > > org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139)
> > > at
> > > org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197)
> > > at
> > > org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227)
> > > at
> > > org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104)
> > > at
> > > org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40)
> > > at java.util.concurrent.Executors
> > > $RunnableAdapter.call(Executors.java:471)
> > > at java.util.concurrent.FutureTask
> > > $Sync.innerRun(FutureTask.java:334)
> > > at
> > > java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > > at
> > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> > > at java.util.concurrent.ThreadPoolExecutor
> > > $Worker.run(ThreadPoolExecutor.java:603)
> > > at java.lang.Thread.run(Thread.java:636)
> > >
> > > > I believe that the problem lies in the TC file because when
> > > > I run a much simpler SwiftScript like:
> > >
> > >
> > > int i = 9;
> > > trace(i);
> > >
> > >
> > > I get the following output:
> > >
> > >
> > > > swift traceme.swift -tc.file tc.template.data -sites.file sites.template.xml -config cf
> > > Swift svn swift-r4978 cog-r3226
> > >
> > >
> > > RunID: 20110810-1832-buktjj3d
> > > Progress: time: Wed, 10 Aug 2011 18:32:30 -0500
> > > SwiftScript trace: 9.0
> > > Final status: time: Wed, 10 Aug 2011 18:32:30 -0500
> > >
> > >
> > > > but as soon as I start using the commands listed in the TC file,
> > > > I get the "exit code 127"
> > >
> > >
> > > My tc file reads:
> > >
> > >
> > > ssh echo /bin/echo null null
> > > ssh cat /bin/cat null null
> > > ssh ls /bin/ls null null
> > > ssh grep /bin/grep null null
> > > ssh sort /bin/sort null null
> > > ssh paste /bin/paste null null
> > > ssh wc /usr/bin/wc null null
> > >
> > >
> > > I am working on the login node of the MCS machine trying to
> > > ssh via Swift to steamroller.
> > >
> > >
> > > > Subject: Re: [Swift-devel] ssh test case on pads/beagle
> > > > From: hategan at mcs.anl.gov
> > > > To: alberto_chavez at live.com
> > > > CC: swift-devel at ci.uchicago.edu
> > > > Date: Tue, 9 Aug 2011 11:57:06 -0700
> > > >
> > > > Hmm: Unsupported passphrase algorithm: AES-128-CBC
> > > >
> > > > > I'll try to see how that can be fixed. In the meantime, can you
> > > > > generate a new key pair with 3DES encryption instead and use that?
> > > >
> > > > On Tue, 2011-08-09 at 13:43 -0500, Alberto Chavez wrote:
> > > > > Hello,
> > > > >
> > > > >
> > > > > > I am trying to run a simpler case than the ssh-pbs-coaster test case, and
> > > > > > I'm still getting the same error.
> > > > > > Now I am running only the ssh test case
> > > > > > (/tests/providers/ssh/001-catsn-ssh.swift)
> > > > > >
> > > > > >
> > > > > > The command line is:
> > > > > > swift -config cf -tc.file tc.template.data -sites.file sites.template.xml 001-catsn-ssh.swift
> > > > >
> > > > >
> > > > > The output:
> > > > > Swift svn swift-r4861 (swift modified locally) cog-r3183
> > > > >
> > > > >
> > > > > RunID: 20110809-1336-ohte788a
> > > > > Progress: time: Tue, 09 Aug 2011 13:36:42 -0500
> > > > > > Exception in cat:
> > > > > > Arguments: [data.txt]
> > > > > > Host: ssh
> > > > > > Directory: 001-catsn-ssh-20110809-1336-ohte788a/jobs/m/cat-mq74h7ek
> > > > > > - - -
> > > > > >
> > > > > >
> > > > > > Caused by: null
> > > > > > Caused by:
> > > > > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase
> > > > > > Caused by:
> > > > > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't read key due to cryptography problems: java.security.NoSuchAlgorithmException: Unsupported passphrase algorithm: AES-128-CBC
> > > > > > Progress: time: Tue, 09 Aug 2011 13:36:43 -0500 Selecting site:8 Submitting:1 Failed:1
> > > > > > Exception in cat:
> > > > > > Arguments: [data.txt]
> > > > > > Host: ssh
> > > > > > Directory: 001-catsn-ssh-20110809-1336-ohte788a/jobs/n/cat-nq74h7ek
> > > > > > - - -
> > > > > >
> > > > > >
> > > > > > Caused by: null
> > > > > > Caused by:
> > > > > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase
> > > > > > Caused by:
> > > > > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't read key due to cryptography problems: java.security.NoSuchAlgorithmException: Unsupported passphrase algorithm: AES-128-CBC
> > > > > > Progress: time: Tue, 09 Aug 2011 13:36:44 -0500 Selecting site:7 Submitting:1 Failed:2
> > > > > > Exception in cat:
> > > > > > Arguments: [data.txt]
> > > > > > Host: ssh
> > > > > > Directory: 001-catsn-ssh-20110809-1336-ohte788a/jobs/o/cat-oq74h7ek
> > > > > > - - -
> > > > > >
> > > > > >
> > > > > > Caused by: null
> > > > > > Caused by:
> > > > > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase
> > > > > > Caused by:
> > > > > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't read key due to cryptography problems: java.security.NoSuchAlgorithmException: Unsupported passphrase algorithm: AES-128-CBC
> > > > > > "error_log.log" 105L, 5770C
> > > > >
> > > > >
> > > > > My auth.defaults reads:
> > > > >
> > > > >
> > > > > > login1.beagle.ci.uchicago.edu.type=key
> > > > > > login1.beagle.ci.uchicago.edu.username=achavez
> > > > > > login1.beagle.ci.uchicago.edu.key=/home/Alberto/.ssh/identity
> > > > > >
> > > > > >
> > > > > > login1.pads.ci.uchicago.edu.type=key
> > > > > > login1.pads.ci.uchicago.edu.username=achavez
> > > > > > login1.pads.ci.uchicago.edu.key=/home/Alberto/.ssh/identity
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > and it has been set to 600. I omitted the passphrase line, but it is
> > > > > > there, and the passphrase is right because I just verified it in two
> > > > > > ways:
> > > > > > 1) by logging in to pads and beagle without providing a password
> > > > > > 2) by "changing" the password, where the "new" password is the same as the
> > > > > > "old" one.
> > > > >
> > > > > > sites.template.xml:
> > > > > >
> > > > > > <config>
> > > > > > <pool handle="ssh">
> > > > > > <execution provider="ssh" url="login1.pads.ci.uchicago.edu" jobmanager="ssh"/>
> > > > > > <filesystem provider="ssh" url="login1.pads.ci.uchicago.edu" />
> > > > > > <profile key="jobThrottle" namespace="karajan">0</profile>
> > > > > > <workdirectory>/home/achavez/swiftwork</workdirectory>
> > > > > > </pool>
> > > > > > </config>
> > > > >
> > > > >
> > > > > config file:
> > > > >
> > > > > wrapperlog.always.transfer=true
> > > > > sitedir.keep=true
> > > > > execution.retries=0
> > > > > lazy.errors=true
> > > > > status.mode=provider
> > > > > use.provider.staging=true
> > > > > provider.staging.pin.swiftfiles=false
> > > > > foreach.max.threads=10
> > > > > provenance.log=true
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > I also tried a simpler SwiftScript:
> > > > >
> > > > >
> > > > > type filemsg;
> > > > >
> > > > >
> > > > > app (filemsg output) hello(string s)
> > > > > {
> > > > > echo s stdout=@filename(output);
> > > > > }
> > > > >
> > > > >
> > > > > filemsg myfile<"dogcatdinosaur.out">;
> > > > > myfile = hello("dog,cat,dinosaur");
> > > > >
> > > > >
> > > > > and I get the following output:
> > > > >
> > > > >
> > > > > Swift svn swift-r4861 (swift modified locally) cog-r3183
> > > > >
> > > > >
> > > > > RunID: 20110809-1343-2es2hel2
> > > > > Progress: time: Tue, 09 Aug 2011 13:43:25 -0500
> > > > > > Exception in echo:
> > > > > > Arguments: [dog,cat,dinosaur]
> > > > > > Host: ssh
> > > > > > Directory: hello_swift-20110809-1343-2es2hel2/jobs/0/echo-0oldh7ek
> > > > > > - - -
> > > > > >
> > > > > >
> > > > > > Caused by: null
> > > > > > Caused by:
> > > > > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase
> > > > > > Caused by:
> > > > > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't read key due to cryptography problems: java.security.NoSuchAlgorithmException: Unsupported passphrase algorithm: AES-128-CBC
> > > > > > Final status: time: Tue, 09 Aug 2011 13:43:26 -0500 Failed:1
> > > > > > The following errors have occurred:
> > > > > > 1. Can't read key due to cryptography problems: java.security.NoSuchAlgorithmException: Unsupported passphrase algorithm: AES-128-CBC
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > any thoughts on this?
> > > > > _______________________________________________
> > > > > Swift-devel mailing list
> > > > > Swift-devel at ci.uchicago.edu
> > > >
> > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> > > >
> > > >
> > >
> > >
> > >
> > >
> >
> >
> 
> 
> 
> 
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
> 

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



