From linjiao at caltech.edu Sun Feb 2 13:50:38 2014 From: linjiao at caltech.edu (Jiao Lin) Date: Sun, 2 Feb 2014 11:50:38 -0800 Subject: [Swift-user] how to run swift script against a local cluster Message-ID: <6E65A8FC-4B25-40D3-8194-52119B00F35E@caltech.edu> Hello, We just learned about swift and are trying to use swift to run computations on a local cluster. I was trying to follow the note at http://rmcgibbo.github.io/blog/2013/06/03/setting-up-swift/ When trying to run a simple echo command at a local cluster using coaster provider, I got an error: $ swift echo.swift Swift 0.94.1 swift-r7114 cog-r3803 RunID: 20140202-1140-yyytkso6 Progress: time: Sun, 02 Feb 2014 11:40:58 -0800 Execution failed: Exception in echo: Arguments: [hello] Host: fram Directory: echo-20140202-1140-yyytkso6/jobs/0/echo-0dqo8vll Caused by: Could not submit job Caused by: Could not start coaster service Caused by: java.lang.NullPointerException at org.globus.cog.abstraction.impl.execution.coaster.AutoCA.ensureCACertsExist(AutoCA.java:143) at org.globus.cog.abstraction.impl.execution.coaster.AutoCA.createProxy(AutoCA.java:128) at org.globus.cog.abstraction.impl.execution.coaster.ServiceManager.setupGSIProxy(ServiceManager.java:238) at org.globus.cog.abstraction.impl.execution.coaster.ServiceManager.startService(ServiceManager.java:194) at org.globus.cog.abstraction.impl.execution.coaster.ServiceManager.reserveService(ServiceManager.java:132) at org.globus.cog.abstraction.impl.execution.coaster.ServiceManager.reserveService(ServiceManager.java:151) at org.globus.cog.abstraction.impl.execution.coaster.JobSubmissionTaskHandler.getChannel(JobSubmissionTaskHandler.java:119) at org.globus.cog.abstraction.impl.execution.coaster.JobSubmissionTaskHandler.submit(JobSubmissionTaskHandler.java:105) at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:45) at org.globus.cog.karajan.scheduler.submitQueue.NonBlockingSubmit.run(NonBlockingSubmit.java:97) at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:679) echo, echo.swift, line 11 It seems it failed when globus is trying to get certificates? Is it necessary to install some kind of globus service on the cluster? I wonder what kind of requirements are there for the cluster? The cluster has java 1.7 installed: java version "1.7.0_51" Java(TM) SE Runtime Environment (build 1.7.0_51-b13) Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode) Your help is much appreciated. Jiao -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at mcs.anl.gov Sun Feb 2 15:31:55 2014 From: wilde at mcs.anl.gov (Wilde, Michael J.) Date: Sun, 2 Feb 2014 21:31:55 +0000 Subject: [Swift-user] how to run swift script against a local cluster In-Reply-To: <6E65A8FC-4B25-40D3-8194-52119B00F35E@caltech.edu> References: <6E65A8FC-4B25-40D3-8194-52119B00F35E@caltech.edu> Message-ID: <85C85E44DD880E498CEA5A501B27954BEA475C@DITKA.anl.gov> Hi Jiao, While I look into the instructions you followed from the blog post below, could you try the tutorial examples at http://swift-lang.org/docs? We try to keep these examples up to date and tested. They should either work on your cluster or be readily adaptable to it. To help us resolve your problem, please send the full log file (scriptname-date-*.log) to swift-support at ci.uchicago.edu and describe your cluster to us (scheduler, mounted filesystems, etc). One possible issue I see in the blog post is this line in the example sites.xml file: /scratch/{env.USER}/.swiftwork That directory needs to be shared (i.e. 
mounted) on both the host on which you are running the swift command (e.g. the login host) as well as on every cluster node on which a Swift app might run.

The ssl-cl provider should automatically generate internal certificates for you (no separate Globus install needed), but perhaps they can't be accessed by the worker nodes.

While you debug this, there's also a new, un-announced tutorial facility at http://swift-lang.org/tryswift which lets you try tutorial examples from the web on cloud nodes, and lets you experiment with small Swift scripts with no other setup.

- Mike
--
Michael Wilde
Mathematics and Computer Science          Computation Institute
Argonne National Laboratory               The University of Chicago

From ketan at mcs.anl.gov Mon Feb 3 11:44:06 2014
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Mon, 3 Feb 2014 11:44:06 -0600
Subject: [Swift-user] empty array in Swift
Message-ID:

Hi,

I was working on a problem where a user can pass a dynamic number of arguments to an app, including zero.

A possible Swift script is as follows:

type file;

app (file _out, file _err) anapp(file _exec, int _i, string _args[]){
  sh @_exec _i _args stdout=@_out stderr=@_err;
}
string args[]=["any", "num", "args"]; # generated via a wrapper script

file exec<"./echo.sh">;
foreach i in [0:9:1]{
  file out ;
  file err ;
  (out,err) = anapp(exec, i, args);
}

However, it seems that I cannot have an array with no elements:

string args[]=[];

In this case Swift complains:
Compile error in assignment at line 8: You cannot assign value of type [int] to a variable of type string[int]

It works when I say:

string args[]=[""]

However, that results in an array with one element.

So, is there a way in Swift to have an array with zero elements?

Thanks,
Ketan

From wilde at mcs.anl.gov Mon Feb 3 12:22:36 2014
From: wilde at mcs.anl.gov (Wilde, Michael J.)
Date: Mon, 3 Feb 2014 18:22:36 +0000
Subject: [Swift-user] empty array in Swift
In-Reply-To:
References:
Message-ID: <85C85E44DD880E498CEA5A501B27954BEA49B9@DITKA.anl.gov>

The following should declare an array with no elements:

string args[];

- Mike
--
Michael Wilde
Mathematics and Computer Science          Computation Institute
Argonne National Laboratory               The University of Chicago

From ketan at mcs.anl.gov Mon Feb 3 12:31:32 2014
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Mon, 3 Feb 2014 12:31:32 -0600
Subject: [Swift-user] empty array in Swift
In-Reply-To: <85C85E44DD880E498CEA5A501B27954BEA49B9@DITKA.anl.gov>
References: <85C85E44DD880E498CEA5A501B27954BEA49B9@DITKA.anl.gov>
Message-ID:

Thanks Mike. This worked.

On Mon, Feb 3, 2014 at 12:22 PM, Wilde, Michael J. wrote:
> The following should declare an array with no elements:
>
> string args[];
>
> - Mike

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From hategan at mcs.anl.gov Mon Feb 3 12:57:34 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 03 Feb 2014 10:57:34 -0800 Subject: [Swift-user] empty array in Swift In-Reply-To: <85C85E44DD880E498CEA5A501B27954BEA49B9@DITKA.anl.gov> References: <85C85E44DD880E498CEA5A501B27954BEA49B9@DITKA.anl.gov> Message-ID: <1391453854.26062.1.camel@echo> Still a bug IMO. type1 [type2] = []; is a valid statement. Mihael On Mon, 2014-02-03 at 18:22 +0000, Wilde, Michael J. wrote: > The following following should declare an array with no elements: > > string args[]; > > - Mike > -- > Michael Wilde > Mathematics and Computer Science Computation Institute > Argonne National Laboratory The University of Chicago > > ________________________________ > From: swift-user-bounces at ci.uchicago.edu [swift-user-bounces at ci.uchicago.edu] on behalf of Ketan Maheshwari [ketan at mcs.anl.gov] > Sent: Monday, February 03, 2014 11:44 AM > To: Swift User > Subject: [Swift-user] empty array in Swift > > Hi, > > I was working on a problem where user can pass a dynamic number of arguments to an app, including zero. > > A possible Swift script as follows: > > type file; > > app (file _out, file _err) anapp(file _exec, int _i, string _args[]){ > sh @_exec _i _args stdout=@_out stderr=@_err; > } > string args[]=["any", "num", "args"]; # generated via a wrapper script > > file exec<"./echo.sh">; > foreach i in [0:9:1]{ > file out ; > file err ; > (out,err) = anapp(exec, i, args); > } > > However, it seems that I cannot have an array with no elements: > > string args[]=[]; > > In this case Swift complains: > Compile error in assignment at line 8: You cannot assign value of type [int] to a variable of type string[int] > > Which works when I say: > > string args[]=[""] > > However, it results in an array with one element. > > So, is there a way in Swift to have an array with zero elements? 
>
> Thanks,
> Ketan
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user

From gmgall at lncc.br Thu Feb 6 07:09:25 2014
From: gmgall at lncc.br (Guilherme Gall)
Date: Thu, 06 Feb 2014 11:09:25 -0200
Subject: [Swift-user] Swift stopping because of problematic foreach iteration
Message-ID: <52F38985.1090002@lncc.br>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi everyone,

I'm using Swift to automate and parallelize the potential distribution modeling of some species.

The script follows:

- ------------------------------------------------------------------------
type request_file;
type environment_layers;
type occurrences_file;
type serialized_model;
type projected_map;

type modeling_results {
    serialized_model model;
    projected_map map;
}

request_file requests[] ;
environment_layers env_layers[] ;

occurrences_file occurrences ;
modeling_results results[];

app (request_file out[]) generate_requests(occurrences_file i) {
    generate_requests @filename(i);
}

app (modeling_results out) do_modeling(request_file r, environment_layers e[], occurrences_file o) {
    om_console @filename(r);
}

requests = generate_requests(occurrences);

foreach request,index in requests {
    results[index] = do_modeling(request, env_layers, occurrences);
}
- ------------------------------------------------------------------------

The app do_modeling consumes a set of environment files, a file with occurrence points from several species, and a request file with parameters for the chosen modeling algorithm (these request files are generated by the generate_requests app).

The results of the modeling are 2 files per species: one XML file and a bitmap file.

The script works perfectly if the om_console command ends without errors for **all** species. But if one of the species makes the om_console command generate an error, the output files aren't created and the script stops, even if there are other species that would not generate errors when modeled individually (outside Swift).

The error reported by Swift in these cases follows:

- ------------------------------------------------------------------------
RunID: 20140206-0924-7dacf5h9
(input): found 20 files
Progress: time: Qui, 06 Fev 2014 09:24:34 -0200
Progress: time: Qui, 06 Fev 2014 09:24:36 -0200 Checking status:1
Progress: time: Qui, 06 Fev 2014 09:24:38 -0200 Stage in:11 Submitting:1 Finished successfully:1
Progress: time: Qui, 06 Fev 2014 09:24:40 -0200 Active:11 Checking status:1 Finished successfully:1
Progress: time: Qui, 06 Fev 2014 09:24:42 -0200 Active:11 Checking status:1 Finished successfully:1
Progress: time: Qui, 06 Fev 2014 09:24:45 -0200 Active:11 Checking status:1 Finished successfully:1
Execution failed:
File not found: /var/tmp/gmgall/swiftwork/workflow-openmodeller-20140206-0924-7dacf5h9/shared/output_Abarema arenaria.xml
- ------------------------------------------------------------------------

The problem is that the script stops at arbitrary points because of just one problematic iteration of the foreach loop at the end of the script.

How can I address this issue so as not to leave behind species that would otherwise be modeled?

Thanks in advance, - -- Guilherme Gall CSR/LNCC GPG Public Key ID: A65ED0D5 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBAgAGBQJS84mFAAoJEG9WBlOmXtDVViMH/3WNaBG0OUPNJ97Fhc1odRwf 4JZPzuHKiVpvBh16lOcmph2kHehhzdlVbOGmIDRfBMCKWKDajh3k5DVxz08A17wZ kfUnfq5DR+Ein6wr/XWDZ3WJPZBzdnHdtOY5VlIK+1K/71VVDorzQW5ReVAg2QDw cFhvIhfvgepvo9IXyR6D91AaxUXdf44X4yi88XOLdNFjIyR6kxUBmCgf45RWSaeV AwAfADQe+s82Q8qXPBE7iChdD1hmcj9V6Usfi43G1wPYZfKC6xTIFwmiS5nlHLFG Ck1fo/dKMHxzR12zh72lG4yf6X2nhnTORWGA+LZa6Sd6lWlRl0YlifPHYrXnz3M= =+obS -----END PGP SIGNATURE----- From lgadelha at lncc.br Thu Feb 6 10:36:14 2014 From: lgadelha at lncc.br (Luiz Gadelha) Date: Thu, 06 Feb 2014 10:36:14 -0600 Subject: [Swift-user] Swift stopping because of problematic foreach iteration In-Reply-To: <52F38985.1090002@lncc.br> References: <52F38985.1090002@lncc.br> Message-ID: <52F3B9FE.4090304@lncc.br> Hi Guilherme, It seems to be an openmodeller problem, it's returning exit status 0 even when it reports an error. Swift must be interpreting that the execution was successful but then it reports an error because it's not able to find the files that should have been generated by a successful om_console execution. It seems also that in addition to the presence points we are giving as input, the Maxent model requires some absence points as well, it should fail anyway since it doesn't have enough information. $ /usr/bin/time -f exit_status:%x om_console request_Abarema\ brachystachya.txt [Info] openModeller version 1.4.0 [Error] Cannot create model without any presence or absence point. [Info] Exception occurred: Cannot create model without any presence or absence point. exit_status:0 Regards, Luiz On 02/06/2014 07:09 AM, Guilherme Gall wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi everyone, > > I'm using Swift to automate and parallelize the potential distribution > modeling of some species'. 
> > Thanks in advance, > - -- > Guilherme Gall > CSR/LNCC > GPG Public Key ID: A65ED0D5 > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.14 (GNU/Linux) > Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ > > iQEcBAEBAgAGBQJS84mFAAoJEG9WBlOmXtDVViMH/3WNaBG0OUPNJ97Fhc1odRwf > 4JZPzuHKiVpvBh16lOcmph2kHehhzdlVbOGmIDRfBMCKWKDajh3k5DVxz08A17wZ > kfUnfq5DR+Ein6wr/XWDZ3WJPZBzdnHdtOY5VlIK+1K/71VVDorzQW5ReVAg2QDw > cFhvIhfvgepvo9IXyR6D91AaxUXdf44X4yi88XOLdNFjIyR6kxUBmCgf45RWSaeV > AwAfADQe+s82Q8qXPBE7iChdD1hmcj9V6Usfi43G1wPYZfKC6xTIFwmiS5nlHLFG > Ck1fo/dKMHxzR12zh72lG4yf6X2nhnTORWGA+LZa6Sd6lWlRl0YlifPHYrXnz3M= > =+obS > -----END PGP SIGNATURE----- > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > -- Luiz Gadelha http://www.lncc.br/~lgadelha From gmgall at lncc.br Thu Feb 6 11:58:48 2014 From: gmgall at lncc.br (Guilherme Gall) Date: Thu, 06 Feb 2014 15:58:48 -0200 Subject: [Swift-user] Swift stopping because of problematic foreach iteration In-Reply-To: <52F3B9FE.4090304@lncc.br> References: <52F38985.1090002@lncc.br> <52F3B9FE.4090304@lncc.br> Message-ID: <52F3CD58.9060401@lncc.br> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Em 06-02-2014 14:36, Luiz Gadelha escreveu: > Hi Guilherme, > > It seems to be an openmodeller problem, it's returning exit status > 0 even when it reports an error. Swift must be interpreting that > the execution was successful but then it reports an error because > it's not able to find the files that should have been generated by > a successful om_console execution. It seems also that in addition > to the presence points we are giving as input, the Maxent model > requires some absence points as well, it should fail anyway since > it doesn't have enough information. 
>
> $ /usr/bin/time -f exit_status:%x om_console request_Abarema\ brachystachya.txt
> [Info] openModeller version 1.4.0
> [Error] Cannot create model without any presence or absence point.
> [Info] Exception occurred: Cannot create model without any presence or absence point.
>
> exit_status:0
>
> Regards,
>
> Luiz

Hi Luiz,

At first I thought that the problem would be the exit status too. To be sure, I created a wrapper script that exits with status 0 on success and with status 1 on error. The script follows:

- ------------------------------------------------------------------------
#!/bin/bash

if om_console "$1" 2>&1 | grep -q '\[Error\]'; then
    exit 1
else
    exit 0
fi
- ------------------------------------------------------------------------

I tested the wrapper and it worked fine: 0 in case of success and 1 in case of failure.

Then I replaced the direct call to om_console with my wrapper script in the Swift script. But the problem persists. Swift stops with the same "File not found" error when one iteration doesn't generate the output files.

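A variant of the wrapper above could be sketched as follows, under a few assumptions. It captures the command's output instead of piping it into grep -q (which can end the pipeline early once a match is found), prints that output so it still reaches the logs, and treats either a non-zero exit status or an "[Error]" line as failure. The function name run_checked is hypothetical; the "[Error]" marker is taken from the om_console output quoted above. This only makes the exit status trustworthy and is not claimed to change how Swift reacts to a failed iteration.

```shell
#!/bin/bash

# Hypothetical helper (the name is not from the thread): run a command,
# echo its combined stdout/stderr, and report failure if the command
# exits non-zero or prints a line containing "[Error]".
run_checked() {
    local output status
    output=$("$@" 2>&1)   # capture stdout and stderr together
    status=$?             # exit status of the wrapped command
    printf '%s\n' "$output"
    if [ "$status" -ne 0 ]; then
        return "$status"
    fi
    # om_console was shown exiting with 0 while logging "[Error]",
    # so the text has to be checked as well.
    case "$output" in
        *"[Error]"*) return 1 ;;
    esac
    return 0
}
```

Assuming the function is saved in a wrapper script, it could be called as run_checked om_console "$1", with the script's own exit status set from the function's return value.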
Regards, - -- Guilherme Gall CSR/LNCC GPG Public Key ID: A65ED0D5 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBAgAGBQJS881YAAoJEG9WBlOmXtDVuYMH/0h8ddUytwJuQE4LfyVknGHA g+CMFoUaV8JyZiW1BqLB0xI7DgD2OqcNtBNNUPkvp2N68Wy66SzNzlK0c/lO5edj QDGdy5sctG3mvLffg9eIyKw3IMZJgDZQHhuDHw8+7D8dB4TVnbTRD8AVMXIWDL/4 9MzKbx0YCFeE6MsWgiOrduGLPk5qpFneatwjg2Z+GKYSJEFRNRNPN64kBP6eKPfN nYYgXHqIdApIwPbKSIgZQbJpSUhelp9ebqFNrRH69rfyow6qXIexs67GIeDofhVu uFJmpQXwL68S4YSPljw7J0ZmC/B1v2djDDn6OpdogT4W+AYzVwyOqw/V+uIwJLQ= =kkDn -----END PGP SIGNATURE----- From hategan at mcs.anl.gov Thu Feb 6 12:27:42 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 06 Feb 2014 10:27:42 -0800 Subject: [Swift-user] Swift stopping because of problematic foreach iteration In-Reply-To: <52F38985.1090002@lncc.br> References: <52F38985.1090002@lncc.br> Message-ID: <1391711262.9826.0.camel@echo> On Thu, 2014-02-06 at 11:09 -0200, Guilherme Gall wrote: > How can I address this issue to not leave behind species that would be > modeled otherwise? Hi, If you are trying to have swift run everything it can despite any errors in some of the parts of the script, you can set "lazy.errors" to "true" in swift.properties. Mihael From gmgall at lncc.br Fri Feb 7 10:34:58 2014 From: gmgall at lncc.br (Guilherme Gall) Date: Fri, 07 Feb 2014 14:34:58 -0200 Subject: [Swift-user] Swift stopping because of problematic foreach iteration In-Reply-To: <1391711262.9826.0.camel@echo> References: <52F38985.1090002@lncc.br> <1391711262.9826.0.camel@echo> Message-ID: <52F50B32.90608@lncc.br> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Em 06-02-2014 16:27, Mihael Hategan escreveu: > On Thu, 2014-02-06 at 11:09 -0200, Guilherme Gall wrote: > >> How can I address this issue to not leave behind species that >> would be modeled otherwise? 
> > Hi, > > If you are trying to have swift run everything it can despite any > errors in some of the parts of the script, you can set > "lazy.errors" to "true" in swift.properties. > > Mihael > Hi Mihael, It is exactly that I'm trying to do. Setting lazy.errors to true did the trick. Now every iteration that doesn't generate an error is executed. Thank you for your help. Regards, - -- Guilherme Gall CSR/LNCC GPG Public Key ID: A65ED0D5 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBAgAGBQJS9QsyAAoJEG9WBlOmXtDVCz8H/2kwkvm0sZMVZB1L2CfaswhG ljIW1ocMVKuGf6ztBLqNU408QT1qdegHunWmZlPMt8EqYc+3rjkBqVaAqWTTSwOH RAc86Y8+JrIMyzEXKPqq1sOTp2Rfdoxc140mzY2R6Y6ZKXHU3E348UmdK2GNOqdS SIU/KgoTm4DA/ULrRZDkjMlAolMa1dd6DB3tBHfhxF0q0pCJOR8HokmVVmzrXFLR EKjr4uCrHvAYkb21zOHq78n/Wy9PQ3+qAgMa4tD1FE3upNfaR5ZSAm1c+zUxqbgj ymiFoOe97Jk2brLz1z/3IGx73NUowv/+puMu4P9U2rKCJu1isnP6udFakVtoTHQ= =c/Hd -----END PGP SIGNATURE----- From ketan at mcs.anl.gov Mon Feb 10 20:23:07 2014 From: ketan at mcs.anl.gov (Ketan Maheshwari) Date: Mon, 10 Feb 2014 20:23:07 -0600 Subject: [Swift-user] Block task failed on Tukey/Cobalt Message-ID: Hi, Trying some Swift test for Swift-Galaxy demo on Tukey. Using local:cobalt coaster provider. 
I am getting the following error: $ swift -sites.file sites.xml -tc.file tc -config cf script.swift Swift 0.94 swift-r6637 cog-r3742 RunID: 20140211-0216-ryo7slj7 Progress: time: Tue, 11 Feb 2014 02:16:53 +0000 Progress: time: Tue, 11 Feb 2014 02:17:23 +0000 Submitted:3 Progress: time: Tue, 11 Feb 2014 02:17:53 +0000 Submitted:3 Progress: time: Tue, 11 Feb 2014 02:17:58 +0000 Submitted:2 Active:1 Execution failed: Exception in sh: Arguments: [gpfs/mira-home/ketan/galaxy-dist/database/files/000/dataset_1.dat, 0] Host: tukey Directory: script-20140211-0216-ryo7slj7/jobs/u/sh-uv25u9ml Caused by: Block task failed: anapp, script.swift, line 13 Tried many different options in swift.properties to no avail. Also tried the ATPESC 2013 tutorial setup but scripts fail with same pattern/error messages. Attaching the tarball with config, sites file. Thanks for any suggestions. Ketan -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: swift-gal.NuMG.tgz Type: application/x-gzip Size: 13507 bytes Desc: not available URL: From wozniak at mcs.anl.gov Tue Feb 11 09:27:37 2014 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Tue, 11 Feb 2014 09:27:37 -0600 Subject: [Swift-user] Block task failed on Tukey/Cobalt In-Reply-To: References: Message-ID: <52FA4169.3030608@mcs.anl.gov> Has anyone else run successfully on Tukey? Can you inspect/post the Swift-generated Cobalt submit script? You may also want to inspect/post the Cobalt-generated *.submit.stdout/*.submit.stderr logs. On 02/10/2014 08:23 PM, Ketan Maheshwari wrote: > Hi, > > Trying some Swift test for Swift-Galaxy demo on Tukey. Using > local:cobalt coaster provider. 
I am getting the following error: > > $ swift -sites.file sites.xml -tc.file tc -config cf script.swift > Swift 0.94 swift-r6637 cog-r3742 > > RunID: 20140211-0216-ryo7slj7 > Progress: time: Tue, 11 Feb 2014 02:16:53 +0000 > Progress: time: Tue, 11 Feb 2014 02:17:23 +0000 Submitted:3 > Progress: time: Tue, 11 Feb 2014 02:17:53 +0000 Submitted:3 > Progress: time: Tue, 11 Feb 2014 02:17:58 +0000 Submitted:2 Active:1 > Execution failed: > Exception in sh: > Arguments: > [gpfs/mira-home/ketan/galaxy-dist/database/files/000/dataset_1.dat, 0] > Host: tukey > Directory: script-20140211-0216-ryo7slj7/jobs/u/sh-uv25u9ml > Caused by: > Block task failed: > > > anapp, script.swift, line 13 > > Tried many different options in swift.properties to no avail. > > Also tried the ATPESC 2013 tutorial setup but scripts fail with same > pattern/error messages. > > Attaching the tarball with config, sites file. > > Thanks for any suggestions. > > Ketan > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user -- Justin M Wozniak -------------- next part -------------- An HTML attachment was scrubbed... URL: From ketan at mcs.anl.gov Tue Feb 11 10:12:16 2014 From: ketan at mcs.anl.gov (Ketan Maheshwari) Date: Tue, 11 Feb 2014 10:12:16 -0600 Subject: [Swift-user] Block task failed on Tukey/Cobalt In-Reply-To: <52FA4169.3030608@mcs.anl.gov> References: <52FA4169.3030608@mcs.anl.gov> Message-ID: Swift Tutorial scripts worked on Tukey as of last August during the ATPESC tutorials. I find that empty Cobalt submit script files are created. On Tue, Feb 11, 2014 at 9:27 AM, Justin M Wozniak wrote: > > Has anyone else run successfully on Tukey? > > Can you inspect/post the Swift-generated Cobalt submit script? You may > also want to inspect/post the Cobalt-generated > *.submit.stdout/*.submit.stderr logs. 
From davidkelly at uchicago.edu Tue Feb 11 10:51:21 2014
From: davidkelly at uchicago.edu (David Kelly)
Date: Tue, 11 Feb 2014 10:51:21 -0600
Subject: [Swift-user] Block task failed on Tukey/Cobalt
In-Reply-To:
References: <52FA4169.3030608@mcs.anl.gov>
Message-ID:

The Cobalt provider submits jobs via the command line rather than a submit script, so I believe empty submit scripts are normal.
From the log it looks like jobs are getting submitted and getting a job number. Are you running from a filesystem that is shared on worker nodes?
From ketan at mcs.anl.gov Tue Feb 11 11:12:28 2014
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Tue, 11 Feb 2014 11:12:28 -0600
Subject: [Swift-user] Block task failed on Tukey/Cobalt
In-Reply-To:
References: <52FA4169.3030608@mcs.anl.gov>
Message-ID:

This was fixed after changing the default project from the expired atpesc to ExM.

Thanks!
From wozniak at mcs.anl.gov Tue Feb 11 11:23:45 2014
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Tue, 11 Feb 2014 11:23:45 -0600
Subject: [Swift-user] Block task failed on Tukey/Cobalt
In-Reply-To:
References: <52FA4169.3030608@mcs.anl.gov>
Message-ID: <52FA5CA1.6020203@mcs.anl.gov>

Let's add a bugzilla entry to make sure this error is reported clearly.
--
Justin M Wozniak

From ketan at mcs.anl.gov Tue Feb 11 12:35:15 2014
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Tue, 11 Feb 2014 12:35:15 -0600
Subject: [Swift-user] Block task failed on Tukey/Cobalt
In-Reply-To: <52FA5CA1.6020203@mcs.anl.gov>
References: <52FA4169.3030608@mcs.anl.gov> <52FA5CA1.6020203@mcs.anl.gov>
Message-ID:

There is bug 1185 filed already:
https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=1185
From hategan at mcs.anl.gov Tue Feb 11 16:21:47 2014
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 11 Feb 2014 14:21:47 -0800
Subject: [Swift-user] Block task failed on Tukey/Cobalt
In-Reply-To:
References: <52FA4169.3030608@mcs.anl.gov> <52FA5CA1.6020203@mcs.anl.gov>
Message-ID: <1392157307.21435.0.camel@echo>

That's for PBS which is only marginally related to the Cobalt provider.

Mihael
From ketan at mcs.anl.gov Tue Feb 11 16:38:09 2014
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Tue, 11 Feb 2014 16:38:09 -0600
Subject: [Swift-user] Block task failed on Tukey/Cobalt
In-Reply-To: <1392157307.21435.0.camel@echo>
References: <52FA4169.3030608@mcs.anl.gov> <52FA5CA1.6020203@mcs.anl.gov> <1392157307.21435.0.camel@echo>
Message-ID:

Filed bug 1199 for Cobalt:
https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=1199
From davidkelly at uchicago.edu Wed Feb 12 15:45:04 2014
From: davidkelly at uchicago.edu (David Kelly)
Date: Wed, 12 Feb 2014 15:45:04 -0600
Subject: [Swift-user] Block task failed on Tukey/Cobalt
In-Reply-To:
References: <52FA4169.3030608@mcs.anl.gov> <52FA5CA1.6020203@mcs.anl.gov> <1392157307.21435.0.camel@echo>
Message-ID:

In 0.95, when a scheduler submit command fails with a non-zero exit code, the task will fail and you should see output from the submit command. When I try to run with an invalid project on Tukey using 0.95, it displays this:

Progress: Wed, 12 Feb 2014 21:20:59+0000
Progress: Wed, 12 Feb 2014 21:21:00+0000 Submitted:1
Could not submit job (cqsub reported an exit code of 1).
The allocation for ATPESC2013 on tukey has expired.
Projects available: ATPESC2013 ExM
For assistance, contact support at alcf.anl.gov
Filter /soft/cobalt/scripts/clusterbank-account failed
...
From Matthew.Shaxted at som.com Thu Feb 13 12:23:24 2014
From: Matthew.Shaxted at som.com (Matthew Shaxted)
Date: Thu, 13 Feb 2014 13:23:24 -0500
Subject: [Swift-user] Coaster Service Startup Time on OSDC
Message-ID:

Dear Swift User Group:

I am working with a 60-node cluster running on OSDC, and when I try to start the Swift coaster-service for these nodes, it takes about 30 seconds (or more) per node to start successfully.

This is an issue for me as it limits how often I want to shut down the coaster-service - for this 60-node cluster it could take up to 30 min to start up again.

Is this coaster startup behavior normal? Is there anything I can do to make the coaster-service start faster?

Thanks,
Matthew

MATTHEW SHAXTED
SKIDMORE, OWINGS & MERRILL LLP
224 South Michigan Ave.
Chicago, IL 60604
TEL: 312.360.4368
FAX: 312.360.4545
matthew.shaxted at som.com

From davidkelly at uchicago.edu Thu Feb 13 12:52:41 2014
From: davidkelly at uchicago.edu (David Kelly)
Date: Thu, 13 Feb 2014 12:52:41 -0600
Subject: [Swift-user] Coaster Service Startup Time on OSDC
In-Reply-To:
References:
Message-ID:

Hi Matthew,

Could you please explain a little more about how you're starting the coaster-service and workers? Are you using the start-coaster-service script? If you are, could you please send the coaster-service.conf file you're using? Which version of Swift is this?

There may be some things we can do to speed up the process. We just need a better understanding of how things are set up and where the delays are coming from. Thanks!

Regards,
David

From Matthew.Shaxted at som.com Thu Feb 13 13:04:54 2014
From: Matthew.Shaxted at som.com (Matthew Shaxted)
Date: Thu, 13 Feb 2014 14:04:54 -0500
Subject: [Swift-user] Coaster Service Startup Time on OSDC
In-Reply-To:
References:
Message-ID:

Sure thing David,

I have a setup.sh script that exports the WORKER_HOSTS IP addresses from a text file (see attached). All nodes are running CentOS.

The conf file I am using is also attached. I'm using swift-0.94.1.
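
The attached setup.sh was scrubbed from the archive, so its contents are not known. As a rough sketch of the idea described above, the helper below reads one address per line from a text file and exports the result as `WORKER_HOSTS`. The function name `load_worker_hosts` is hypothetical, and the space-separated format of `WORKER_HOSTS` is an assumption rather than something the thread confirms.

```shell
# Hypothetical stand-in for the scrubbed setup.sh (not the original script).
# Usage: load_worker_hosts hosts.txt
load_worker_hosts() {
    hosts_file=$1
    # grep drops blank lines; paste joins the remaining lines with single spaces.
    # Assumption: WORKER_HOSTS is expected to be a space-separated list.
    WORKER_HOSTS=$(grep -v '^[[:space:]]*$' "$hosts_file" | paste -s -d ' ' -)
    export WORKER_HOSTS
}
```

Sourcing a file that defines this function and then calling `load_worker_hosts hosts.txt` would leave `WORKER_HOSTS` set in the current shell before the coaster service is started.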

Many thanks,
Matthew

-------------- next part --------------
A non-text attachment was scrubbed...
Name: coaster-service.conf
Type: application/octet-stream
Size: 1594 bytes
Desc: coaster-service.conf
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: hosts.txt
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: setup.sh
Type: application/octet-stream
Size: 133 bytes
Desc: setup.sh
URL:

From davidkelly at uchicago.edu Mon Feb 17 14:14:29 2014
From: davidkelly at uchicago.edu (David Kelly)
Date: Mon, 17 Feb 2014 14:14:29 -0600
Subject: [Swift-user] Coaster Service Startup Time on OSDC
In-Reply-To:
References:
Message-ID:

Hi Matthew,

I set up a test on OSDC with 10 nodes and noticed something strange. When I SSH from the Sullivan head node to one of my CentOS instances, it takes much longer than it should:

    dkelly@kg14-compute-1:~$ time ssh root@172.16.1.236 ls
    Warning: Permanently added '172.16.1.236' (RSA) to the list of known hosts.
    anaconda-ks.cfg
    install.log
    install.log.syslog

    real 0m25.197s

Are you running start-coaster-service from the Sullivan head node? For each node in your node list, there are three SSH commands (one to create a directory structure, one to scp the worker.pl script there, and one to launch worker.pl). This is done serially in the 0.94 branch, so I can see how this would take very long. I will make some changes to speed up start-coaster-service, but in the meantime, here are a few suggestions:

1. The SSH slowness seems to occur only from the Sullivan head node to the VMs. SSH connections from one VM to another are pretty quick. Are you able to run Swift and start-coaster-service on a VM?

2. Is a persistent coasters setup needed here? OSDC has an option to launch instances as a cluster, which makes a PBS scheduler available. You could set this up and avoid the need to start and manage workers yourself.

Let me know what you think. I have some example OSDC/PBS configs if you decide to go that route.
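
The two observations above (slow individual SSH connections, and per-node SSH steps run one after another) can be explored with a couple of small shell helpers. This is only a sketch and not the start-coaster-service code: `run_on_hosts`, `timestamp_lines`, and the `SSH_CMD` override are hypothetical names introduced here. The first helper issues the same command to every host in the background so the connections overlap instead of queuing; the second prefixes each line of input with the current time, which is one way to see where a verbose `ssh -vvv` session stalls.

```shell
# Hypothetical helpers, not part of Swift.

# Run one command on every host concurrently.
# Usage: run_on_hosts "remote command" host1 host2 ...
# SSH_CMD may be set to a stub for dry runs; it defaults to ssh.
run_on_hosts() {
    remote_cmd=$1
    shift
    for host in "$@"; do
        # Each connection is started in the background, so a slow handshake
        # on one host does not hold up the others. stdin is detached so the
        # background ssh processes never compete for terminal input.
        ${SSH_CMD:-ssh} "$host" "$remote_cmd" < /dev/null &
    done
    # Block until every background connection has finished.
    wait
}

# Prefix each line read from stdin with the wall-clock time (HH:MM:SS).
timestamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%H:%M:%S')" "$line"
    done
}

# Example invocations (not executed here):
#   run_on_hosts 'hostname' $WORKER_HOSTS
#   ssh -vvv root@172.16.1.236 true 2>&1 | timestamp_lines
```

Since `ssh` writes its debug output to stderr, the `2>&1` in the second example is what routes it into the timestamping loop. Large gaps between consecutive timestamps would point at the stage of the connection that is slow.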

Thanks,
David

From Matthew.Shaxted at som.com Mon Feb 17 15:34:19 2014
From: Matthew.Shaxted at som.com (Matthew Shaxted)
Date: Mon, 17 Feb 2014 16:34:19 -0500
Subject: [Swift-user] Coaster Service Startup Time on OSDC
In-Reply-To: <85C85E44DD880E498CEA5A501B27954BEA78ED@DITKA.anl.gov>
References: , <85C85E44DD880E498CEA5A501B27954BEA78ED@DITKA.anl.gov>
Message-ID:

Hi David,

Indeed, I am running start-coaster-service from the head node.

The first recommendation is a good one. I will try it out and let you know how it works.

I am also very interested in the cluster launch/PBS scheduler approach, although I have never used it before. An example OSDC/PBS config would be really helpful.

Thanks,
Matthew

From: Wilde, Michael J. [mailto:wilde at anl.gov]
Sent: Monday, February 17, 2014 2:36 PM
To: David Kelly; Matthew Shaxted
Cc: swift-user at ci.uchicago.edu
Subject: RE: [Swift-user] Coaster Service Startup Time on OSDC

Good find, David. Did you file a ticket on the slowness with OSDC Support?

When you run ssh -vvv, does the timing of the log output suggest where the problem is?

Can you run it through some tool like typescript, or a "screen" log, that will timestamp the records, and send those to OSDC Support?

- Mike

--
Michael Wilde
Mathematics and Computer Science, Argonne National Laboratory
Computation Institute, The University of Chicago

From davidkelly at uchicago.edu Mon Feb 17 19:21:06 2014
From: davidkelly at uchicago.edu (David Kelly)
Date: Mon, 17 Feb 2014 19:21:06 -0600
Subject: [Swift-user] Coaster Service Startup Time on OSDC
In-Reply-To:
References: <85C85E44DD880E498CEA5A501B27954BEA78ED@DITKA.anl.gov>
Message-ID:

Sounds good, Matthew. Let me know how that works for you. I filed a ticket with OSDC support about the SSH issue, so hopefully they can offer some help there.

I added an entry to our site guide documentation about how to run on OSDC in cluster mode. It is at http://swiftlang.org/guides/release-0.94/siteguide/siteguide.html#_open_science_data_grid. The only potential issue is that it seems to work only with the standard Ubuntu images.

On Mon, Feb 17, 2014 at 3:34 PM, Matthew Shaxted wrote:
> Hi David,
>
> Indeed I am running start-coaster-service from the head node.
>
> The first recommendation is a good one, I will try this out and let you
> know how it works.
>
> I am also very interested in the cluster launch/PBS scheduler approach,
> although I have never used it before. An example OSDC/PBS config would be
> really helpful.
> > > > Thanks, > > Matthew > > > > > > MATTHEW SHAXTED > > > > SKIDMORE, OWINGS & MERRILL LLP > > 224 South Michigan Ave. > > Chicago, IL 60604 > > TEL: 312.360.4368 > > FAX: 312.360.4545 > > matthew.shaxted at som.com > > > > [image: cid:image9d6458.png at 2965c709.c87949ac] > > WWW.SOM.COM > > > > The information contained in this communication may be confidential, is > intended only for the use of the recipient(s) named above, and may be > legally privileged. If the reader of this message is not the intended > recipient, you are hereby notified that any dissemination, distribution, or > copying of this communication, or any of its contents, is strictly > prohibited and may be unlawful. If you have received this communication in > error, please return it to the sender immediately and delete the original > message and any copy of it from your computer system. If you have any > questions concerning this message, please contact the sender. > > > > [image: cid:image93798a.gif at f078f826.ddd94773] > > > > *From:* Wilde, Michael J. [mailto:wilde at anl.gov] > *Sent:* Monday, February 17, 2014 2:36 PM > *To:* David Kelly; Matthew Shaxted > *Cc:* swift-user at ci.uchicago.edu > *Subject:* RE: [Swift-user] Coaster Service Startup Time on OSDC > > > > Good find, David. Did you file a ticket on the slowness with OSDC Support? > > When you run ssh -vvv, does the timing of log output suggest where the > problem is? > > Can you run into some tool like typescript, or a "screen" log, that will > timestamp the records, and send those to OSDC Support? 
> > > > - Mike > > -- > > Michael Wilde > > Mathematics and Computer Science Computation Institute > > Argonne National Laboratory The University of Chicago > > > ------------------------------ > > *From:* swift-user-bounces at ci.uchicago.edu [ > swift-user-bounces at ci.uchicago.edu] on behalf of David Kelly [ > davidkelly at uchicago.edu] > *Sent:* Monday, February 17, 2014 2:14 PM > *To:* Matthew Shaxted > *Cc:* swift-user at ci.uchicago.edu > *Subject:* Re: [Swift-user] Coaster Service Startup Time on OSDC > > Hi Matthew, > > > > I set up a test on OSDC with 10 nodes. I did notice something strange > there. When I try to SSH from the Sullivan head node to one of my CentOS > instances, it takes much longer than it should: > > > > dkelly at kg14-compute-1:~$ time ssh root at 172.16.1.236 ls > Warning: Permanently added '172.16.1.236' (RSA) to the list of known hosts. > anaconda-ks.cfg > install.log > install.log.syslog > > > real 0m25.197s > > > > Are you running start-coaster-service from the Sullivan head node? For > each node in your node list, there are three SSH commands (one to create a > directory structure, one to scp the worker.pl script there, and one to > launch worker.pl). This is done serially in the 0.94 branch. I can see > how this would take very long. I will make some changes to speed up > start-coaster-service, but in the meantime, here are a few suggestions: > > > > 1. The SSH slowness seems to only be from the Sullivan head node to the > VMs. SSH connections from one VM to another VM is pretty quick. Are you > able to run Swift and start-coaster-service on a VM? > > > > 2. Is a persistent coasters setup needed here? OSDC has an option to > launch instances as a cluster, which makes available a PBS scheduler. You > could set this up and avoid the need to start and manage workers yourself. > > > > Let me know what you think. I have some example OSDC/PBS configs if you > decide to go that route. 
> > > > Thanks, > > David > > > > On Thu, Feb 13, 2014 at 1:04 PM, Matthew Shaxted > wrote: > > Sure thing David, > > > > I have a setup.sh script that exports WORKER_HOSTS IP addresses from a txt > file (see attached) - all nodes are running centOS. > > > > The conf file I am using is also attached. I'm using swift-0.94.1. > > > > Many thanks, > > Matthew > > > > > > MATTHEW SHAXTED > > > > SKIDMORE, OWINGS & MERRILL LLP > > 224 South Michigan Ave. > > Chicago, IL 60604 > > TEL: 312.360.4368 > > FAX: 312.360.4545 > > matthew.shaxted at som.com > > > > [image: cid:image9d6458.png at 2965c709.c87949ac] > > WWW.SOM.COM > > > > The information contained in this communication may be confidential, is > intended only for the use of the recipient(s) named above, and may be > legally privileged. If the reader of this message is not the intended > recipient, you are hereby notified that any dissemination, distribution, or > copying of this communication, or any of its contents, is strictly > prohibited and may be unlawful. If you have received this communication in > error, please return it to the sender immediately and delete the original > message and any copy of it from your computer system. If you have any > questions concerning this message, please contact the sender. > > > > [image: cid:image93798a.gif at f078f826.ddd94773] > > > > *From:* David Kelly [mailto:davidkelly at uchicago.edu] > *Sent:* Thursday, February 13, 2014 12:53 PM > *To:* Matthew Shaxted > *Cc:* swift-user at ci.uchicago.edu > *Subject:* Re: [Swift-user] Coaster Service Startup Time on OSDC > > > > Hi Matthew, > > > > Could you please explain a little more about how you're starting the > coaster-service and workers? Are you using the start-coaster-service > script? If you are, could you please send the coaster-service.conf file > you're using? Which version of Swift is this? 
> > > > There may be some things we can do to speed up the process - just need to > get a better understanding of how things are set up and where the delays > are coming from. Thanks! > > > > Regards, > > David > > > > On Thu, Feb 13, 2014 at 12:23 PM, Matthew Shaxted > wrote: > > Dear Swift User Group: > > > > I working with a 60-node cluster running on OSDC now, and when I try to > start the Swift coaster-service for these nodes, it takes about 30 seconds > (or more) per node to successfully start. > > > > This is an issue for me as it limits how often I want to shut down the > coaster-service - for this 60 node cluster it could take up to 30 min to > start up again. > > > > Is this starting coaster behavior normal? Is there anything I can do to > make the coaster-service start faster? > > > > Thanks, > Matthew > > > > > > MATTHEW SHAXTED > > > > SKIDMORE, OWINGS & MERRILL LLP > > 224 South Michigan Ave. > > Chicago, IL 60604 > > TEL: 312.360.4368 > > FAX: 312.360.4545 > > matthew.shaxted at som.com > > > > [image: cid:image9d6458.png at 2965c709.c87949ac] > > WWW.SOM.COM > > > > The information contained in this communication may be confidential, is > intended only for the use of the recipient(s) named above, and may be > legally privileged. If the reader of this message is not the intended > recipient, you are hereby notified that any dissemination, distribution, or > copying of this communication, or any of its contents, is strictly > prohibited and may be unlawful. If you have received this communication in > error, please return it to the sender immediately and delete the original > message and any copy of it from your computer system. If you have any > questions concerning this message, please contact the sender. 
> > > > [image: cid:image93798a.gif at f078f826.ddd94773] > > > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 6643 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 3047 bytes Desc: not available URL: From Matthew.Shaxted at som.com Mon Feb 17 19:54:01 2014 From: Matthew.Shaxted at som.com (Matthew Shaxted) Date: Mon, 17 Feb 2014 20:54:01 -0500 Subject: [Swift-user] Coaster Service Startup Time on OSDC In-Reply-To: References: <85C85E44DD880E498CEA5A501B27954BEA78ED@DITKA.anl.gov> Message-ID: Thanks David, I'm going to test the pbs sites.xml approach and see how it works. Indeed the time to start coasters is really quite irritating, especially since my coasters are now not staying in a persistent state. I have a 12 hr job running on all my OSDC cores now (actually just requested more) so I will test it after this finishes. Now that I'm looking at my below sites.xml file though, it seems only my localhost pool is not 'coaster-persistent' so this is most likely what is causing the issue. On a slightly different note sites.xml file note, my start-coaster-service is rewriting the sites.xml file each time. Is there a simple way to prevent this from happening, or it does need to happen, can I at least add my localhost pool into the file automatically. My current OSDC sites file looks like below and I have been manually adding in the localhost pool everytime I restart the coasters: passive 1 1000 10000 /tmp/mshaxted/swiftwork .23 10000 /tmp/mshaxted/swiftwork MATTHEW SHAXTED SKIDMORE, OWINGS & MERRILL LLP 224 South Michigan Ave. 
Chicago, IL 60604
TEL: 312.360.4368
FAX: 312.360.4545
matthew.shaxted at som.com
WWW.SOM.COM

The information contained in this communication may be confidential, is intended only for the use of the recipient(s) named above, and may be legally privileged. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited and may be unlawful. If you have received this communication in error, please return it to the sender immediately and delete the original message and any copy of it from your computer system. If you have any questions concerning this message, please contact the sender.

From: David Kelly [mailto:davidkelly at uchicago.edu]
Sent: Monday, February 17, 2014 7:21 PM
To: Matthew Shaxted
Cc: Wilde, Michael J.; swift-user at ci.uchicago.edu
Subject: Re: [Swift-user] Coaster Service Startup Time on OSDC

Sounds good, Matthew. Let me know how that works for you. I filed a ticket with OSDC support about the SSH issue, so hopefully they can offer some help there.

I added an entry to our site guide documentation about how to run on OSDC in cluster mode. It is at http://swiftlang.org/guides/release-0.94/siteguide/siteguide.html#_open_science_data_grid. The only potential issue is that it seems to work only with the standard Ubuntu images.

On Mon, Feb 17, 2014 at 3:34 PM, Matthew Shaxted wrote:

Hi David,

Indeed I am running start-coaster-service from the head node.

The first recommendation is a good one; I will try this out and let you know how it works.

I am also very interested in the cluster launch/PBS scheduler approach, although I have never used it before. An example OSDC/PBS config would be really helpful.

Thanks,
Matthew

From: Wilde, Michael J. [mailto:wilde at anl.gov]
Sent: Monday, February 17, 2014 2:36 PM
To: David Kelly; Matthew Shaxted
Cc: swift-user at ci.uchicago.edu
Subject: RE: [Swift-user] Coaster Service Startup Time on OSDC

Good find, David. Did you file a ticket on the slowness with OSDC Support?

When you run ssh -vvv, does the timing of log output suggest where the problem is?

Can you run it in some tool like typescript, or with a "screen" log, that will timestamp the records, and send those to OSDC Support?

- Mike
--
Michael Wilde
Mathematics and Computer Science    Computation Institute
Argonne National Laboratory         The University of Chicago

________________________________
From: swift-user-bounces at ci.uchicago.edu [swift-user-bounces at ci.uchicago.edu] on behalf of David Kelly [davidkelly at uchicago.edu]
Sent: Monday, February 17, 2014 2:14 PM
To: Matthew Shaxted
Cc: swift-user at ci.uchicago.edu
Subject: Re: [Swift-user] Coaster Service Startup Time on OSDC

Hi Matthew,

I set up a test on OSDC with 10 nodes. I did notice something strange there. When I try to SSH from the Sullivan head node to one of my CentOS instances, it takes much longer than it should:

    dkelly at kg14-compute-1:~$ time ssh root at 172.16.1.236 ls
    Warning: Permanently added '172.16.1.236' (RSA) to the list of known hosts.
    anaconda-ks.cfg
    install.log
    install.log.syslog

    real 0m25.197s

Are you running start-coaster-service from the Sullivan head node? For each node in your node list, there are three SSH commands (one to create a directory structure, one to scp the worker.pl script there, and one to launch worker.pl). This is done serially in the 0.94 branch. I can see how this would take very long. I will make some changes to speed up start-coaster-service, but in the meantime, here are a few suggestions:

1. The SSH slowness seems to only be from the Sullivan head node to the VMs. SSH connections from one VM to another VM are pretty quick. Are you able to run Swift and start-coaster-service on a VM?

2. Is a persistent coasters setup needed here? OSDC has an option to launch instances as a cluster, which makes available a PBS scheduler. You could set this up and avoid the need to start and manage workers yourself.

Let me know what you think. I have some example OSDC/PBS configs if you decide to go that route.

Thanks,
David

On Thu, Feb 13, 2014 at 1:04 PM, Matthew Shaxted wrote:

Sure thing David,

I have a setup.sh script that exports WORKER_HOSTS IP addresses from a txt file (see attached) - all nodes are running CentOS.

The conf file I am using is also attached. I'm using swift-0.94.1.

Many thanks,
Matthew

From: David Kelly [mailto:davidkelly at uchicago.edu]
Sent: Thursday, February 13, 2014 12:53 PM
To: Matthew Shaxted
Cc: swift-user at ci.uchicago.edu
Subject: Re: [Swift-user] Coaster Service Startup Time on OSDC

Hi Matthew,

Could you please explain a little more about how you're starting the coaster-service and workers? Are you using the start-coaster-service script? If you are, could you please send the coaster-service.conf file you're using? Which version of Swift is this?

There may be some things we can do to speed up the process - just need to get a better understanding of how things are set up and where the delays are coming from. Thanks!

Regards,
David

On Thu, Feb 13, 2014 at 12:23 PM, Matthew Shaxted wrote:

Dear Swift User Group:

I am working with a 60-node cluster running on OSDC now, and when I try to start the Swift coaster-service for these nodes, it takes about 30 seconds (or more) per node to successfully start.

This is an issue for me as it limits how often I want to shut down the coaster-service - for this 60 node cluster it could take up to 30 min to start up again.

Is this starting coaster behavior normal? Is there anything I can do to make the coaster-service start faster?

Thanks,
Matthew

_______________________________________________
Swift-user mailing list
Swift-user at ci.uchicago.edu
https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 6643 bytes
Desc: image001.png
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.png
Type: image/png
Size: 3047 bytes
Desc: image002.png
URL:
From davidkelly at uchicago.edu Mon Feb 17 20:30:10 2014
From: davidkelly at uchicago.edu (David Kelly)
Date: Mon, 17 Feb 2014 20:30:10 -0600
Subject: [Swift-user] Coaster Service Startup Time on OSDC
In-Reply-To:
References: <85C85E44DD880E498CEA5A501B27954BEA78ED@DITKA.anl.gov>
Message-ID:

Hi Matthew,

That sounds good. I prefer the scheduler option too when I run there. Let me know if there's anything that needs clarification when you get a chance to test it.

There is a way to change the sites.xml file that gets generated by start-coaster-service.
You can add additional pools to the etc/sites/persistent-coasters file that is contained in the Swift installation directory.

On Mon, Feb 17, 2014 at 7:54 PM, Matthew Shaxted wrote:

> On a slightly different sites.xml note, start-coaster-service is rewriting
> the sites.xml file each time. Is there a simple way to prevent this from
> happening or, if it does need to happen, can I at least add my localhost
> pool to the file automatically? My current OSDC sites file looks like the
> one below, and I have been manually adding the localhost pool every time I
> restart the coasters:
>
> handle="persistent-coasters">
> url="http://localhost:42860"
> jobmanager="local:local"/>
> key="workerManager">passive
> key="jobsPerNode">1
> namespace="karajan">1000
> key="initialScore">10000
> />
> /tmp/mshaxted/swiftwork
>
> namespace="karajan">.23
> key="initialScore">10000
> /tmp/mshaxted/swiftwork
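
For readers following the startup-time discussion: David describes three SSH commands per node (create a directory, scp worker.pl, launch worker.pl), issued serially, which is consistent with the roughly 30 minutes Matthew reports for 60 nodes at about 30 seconds each. The sketch below only illustrates the general idea of running those per-node steps in the background, one job per host. It is not the change David made to start-coaster-service. The names start_one_worker, start_all_workers, run, DRY_RUN and WORKER_DIR are hypothetical, the second IP address is made up, and the real worker.pl takes arguments that are omitted here. DRY_RUN defaults to 1, so the script only prints the commands it would run.

```shell
#!/bin/bash
# Hypothetical sketch, not part of start-coaster-service.
# Assumption: passwordless SSH to each host and a worker.pl in the current directory.

DRY_RUN=${DRY_RUN:-1}            # 1 = print commands instead of running them
WORKER_DIR=/tmp/swiftwork-demo   # assumed remote scratch directory

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [[ "$DRY_RUN" == 1 ]]; then
        echo "$*"
    else
        "$@"
    fi
}

# The three per-node steps described in the thread.
start_one_worker() {
    local host=$1
    run ssh "$host" mkdir -p "$WORKER_DIR"
    run scp worker.pl "$host:$WORKER_DIR/"
    # Real worker.pl arguments (service URL and so on) are omitted here.
    run ssh "$host" "perl $WORKER_DIR/worker.pl"
}

# Launch every host as a background job, then wait for all of them.
start_all_workers() {
    local host
    for host in "$@"; do
        start_one_worker "$host" &
    done
    wait
}

# 172.16.1.236 appears in the thread; 172.16.1.237 is an invented example.
start_all_workers 172.16.1.236 172.16.1.237
```

With the steps overlapped like this, total startup time should be closer to the slowest single node than to the sum over all nodes, although a head node with slow SSH may still limit how many connections it is sensible to open at once.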
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 6643 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.png
Type: image/png
Size: 3047 bytes
Desc: not available
URL:
From ketan at mcs.anl.gov Mon Feb 17 21:05:34 2014
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Mon, 17 Feb 2014 21:05:34 -0600
Subject: [Swift-user] _swiftwrap command not found
Message-ID:

Hi,

Running Swift on my mac I get the following error:

    Host: localhost
    Directory: c-ray-20140217-2054-ux0ohil4/jobs/2/cray-2rvnelml
    stderr.txt: /tmp/swift.work/c-ray-20140217-2054-ux0ohil4/shared/_swiftwrap: line 518: : command not found
    stdout.txt:
    Caused by: Application /Users/ketan/c-rayapp/c-ray-1.1/c-ray-mt failed with an exit code of 127
    render, c-ray.swift, line 43

Looking into the line 518 of _swiftwrap:

    "$TIMECMD" "${TIMEARGS[@]}" "$EXEC" "${CMDARGS[@]}" 1>>"$STDOUT" 2>>"$STDERR" <"$STDIN"

And a few lines prior:

    if [[ "$OSTYPE" == *darwin* ]]; then
        TIMECMD=
        TIMEARGS=

Possibly these lines are causing the error, wondering if this is tested.

Attached is a compressed runlog.

Any suggestions?

Thanks,
Ketan

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: c-ray-20140217-2059-3tqqjov4.log.tgz
Type: application/x-gzip
Size: 117742 bytes
Desc: not available
URL:
From davidkelly at uchicago.edu Mon Feb 17 21:20:27 2014
From: davidkelly at uchicago.edu (David Kelly)
Date: Mon, 17 Feb 2014 21:20:27 -0600
Subject: [Swift-user] _swiftwrap command not found
In-Reply-To:
References:
Message-ID:

Hey Ketan,

This was a bug related to adding the time command to _swiftwrap that caused failures on some versions of OS X. It should be fixed in 0.95.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From ketan at mcs.anl.gov Mon Feb 17 22:36:38 2014
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Mon, 17 Feb 2014 22:36:38 -0600
Subject: [Swift-user] _swiftwrap command not found
In-Reply-To:
References:
Message-ID:

Thanks David! Picked some bits from 0.95 _swiftwrap and it worked.
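
For anyone hitting the same error on 0.94.x: when TIMECMD is empty, bash expands "$TIMECMD" to an empty first word and tries to execute a command with an empty name, which produces ": command not found" and exit code 127, matching the log in this thread. The sketch below shows one way to guard against that by building the command line as an array and adding the timing prefix only when it is set. It is an illustration under assumptions, not the actual 0.95 fix. run_app is a hypothetical helper, and only TIMECMD and TIMEARGS are taken from the quoted _swiftwrap lines.

```shell
#!/bin/bash
# Sketch only: run_app is a hypothetical helper, not code from _swiftwrap.

run_app() {
    local exec_path=$1; shift
    local -a cmd=()

    # Add the timing prefix only when a timing command is configured.
    # An empty "$TIMECMD" as the first word would make bash try to run
    # a command with an empty name and fail with exit code 127.
    if [[ -n "${TIMECMD:-}" ]]; then
        cmd+=("$TIMECMD" "${TIMEARGS[@]}")
    fi
    cmd+=("$exec_path" "$@")

    "${cmd[@]}"
}

# Mimic the darwin branch, where no timing command is configured.
TIMECMD=
TIMEARGS=()
run_app echo hello   # prints: hello
```

Declaring TIMEARGS as an empty array rather than an empty string also matters, because a plain `TIMEARGS=` leaves one empty element that "${TIMEARGS[@]}" would pass along as an extra empty argument.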
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From Matthew.Shaxted at som.com Wed Feb 19 12:37:37 2014
From: Matthew.Shaxted at som.com (Matthew Shaxted)
Date: Wed, 19 Feb 2014 13:37:37 -0500
Subject: [Swift-user] Coaster Service Startup Time on OSDC
In-Reply-To:
References: <85C85E44DD880E498CEA5A501B27954BEA78ED@DITKA.anl.gov>
Message-ID:

Hi David,

I tried out the cluster and it does seem to make start-coaster-service much faster. However, I am getting VM heap space issues when starting Swift on the configuration I am using, which makes it difficult to use the cluster head node. Also, many of my workflows are set up to read from the glusterfs, so I may just stick with the slow coaster start for now.

The bigger issue is that even when these start, they do not remain persistent for some reason (could be my localhost coasters). Do I need to explicitly declare my localhost coasters as persistent in the sites.xml file?

I have been getting strange issues when I try to scale my runs (10-12 hrs each) on the glusterfs configuration: they seem to fail midway through for some reason, although they work with smaller runs, so I'm going to put some time into understanding why.

Thanks,
Matthew

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 6643 bytes
Desc: image001.png
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.png Type: image/png Size: 3047 bytes Desc: image002.png URL: From Matthew.Shaxted at som.com Wed Feb 19 12:50:47 2014 From: Matthew.Shaxted at som.com (Matthew Shaxted) Date: Wed, 19 Feb 2014 13:50:47 -0500 Subject: [Swift-user] Coaster Service Startup Time on OSDC References: <85C85E44DD880E498CEA5A501B27954BEA78ED@DITKA.anl.gov> Message-ID: I suppose the VM heap space issue on the cluster head node is due to 32bit ubuntu OS... From: Matthew Shaxted Sent: Wednesday, February 19, 2014 12:38 PM To: 'David Kelly' Cc: Wilde, Michael J.; swift-user at ci.uchicago.edu Subject: RE: [Swift-user] Coaster Service Startup Time on OSDC Hi David, I tried out the cluster and it does seem to make the start-coaster-service much faster - however, I am getting VM heap space issues when starting swift on the configuration I am using so it is making it difficult to use the cluster head node. Also, I have many of my workflows setup to read from the glusterfs so I may just stick with the slow coaster start for now. The bigger issue is that even when these start, they do not remain persistent for some reason (could be my localhost coasters). Do I need to explicitly express my localhost coasters as persistent in the sites.xml file? I have been getting strange issues when I try to scale my runs (10-12 hrs each) on the glusterfs configuration - they seem to be failing midway through for some reason although they work with smaller runs, so I'm going to put sometime into understanding why this is now. Thanks, Matthew MATTHEW SHAXTED SKIDMORE, OWINGS & MERRILL LLP 224 South Michigan Ave. Chicago, IL 60604 TEL: 312.360.4368 FAX: 312.360.4545 matthew.shaxted at som.com [cid:image001.png at 01CF2D71.35CF0DB0] WWW.SOM.COM The information contained in this communication may be confidential, is intended only for the use of the recipient(s) named above, and may be legally privileged. 
If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited and may be unlawful. If you have received this communication in error, please return it to the sender immediately and delete the original message and any copy of it from your computer system. If you have any questions concerning this message, please contact the sender. [cid:image002.png at 01CF2D71.35CF0DB0] From: David Kelly [mailto:davidkelly at uchicago.edu] Sent: Monday, February 17, 2014 7:21 PM To: Matthew Shaxted Cc: Wilde, Michael J.; swift-user at ci.uchicago.edu Subject: Re: [Swift-user] Coaster Service Startup Time on OSDC Sounds good, Matthew. Let me know how that works for you. I filed a ticket with osdc support about the SSH issue, so hopefully they can offer some help there. I added an entry to our site guide documentation about how to run on OSDC in cluster mode. It is at http://swiftlang.org/guides/release-0.94/siteguide/siteguide.html#_open_science_data_grid. The only potential issue is, it seems to only work with the standard Ubuntu images. On Mon, Feb 17, 2014 at 3:34 PM, Matthew Shaxted > wrote: Hi David, Indeed I am running start-coaster-service from the head node. The first recommendation is a good one, I will try this out and let you know how it works. I am also very interested in the cluster launch/PBS scheduler approach, although I have never used it before. An example OSDC/PBS config would be really helpful. Thanks, Matthew MATTHEW SHAXTED SKIDMORE, OWINGS & MERRILL LLP 224 South Michigan Ave. Chicago, IL 60604 TEL: 312.360.4368 FAX: 312.360.4545 matthew.shaxted at som.com [cid:image001.png at 01CF2D71.35CF0DB0] WWW.SOM.COM The information contained in this communication may be confidential, is intended only for the use of the recipient(s) named above, and may be legally privileged. 
If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited and may be unlawful. If you have received this communication in error, please return it to the sender immediately and delete the original message and any copy of it from your computer system. If you have any questions concerning this message, please contact the sender. [cid:image002.png at 01CF2D71.35CF0DB0] From: Wilde, Michael J. [mailto:wilde at anl.gov] Sent: Monday, February 17, 2014 2:36 PM To: David Kelly; Matthew Shaxted Cc: swift-user at ci.uchicago.edu Subject: RE: [Swift-user] Coaster Service Startup Time on OSDC Good find, David. Did you file a ticket on the slowness with OSDC Support? When you run ssh -vvv, does the timing of log output suggest where the problem is? Can you run into some tool like typescript, or a "screen" log, that will timestamp the records, and send those to OSDC Support? - Mike -- Michael Wilde Mathematics and Computer Science Computation Institute Argonne National Laboratory The University of Chicago ________________________________ From: swift-user-bounces at ci.uchicago.edu [swift-user-bounces at ci.uchicago.edu] on behalf of David Kelly [davidkelly at uchicago.edu] Sent: Monday, February 17, 2014 2:14 PM To: Matthew Shaxted Cc: swift-user at ci.uchicago.edu Subject: Re: [Swift-user] Coaster Service Startup Time on OSDC Hi Matthew, I set up a test on OSDC with 10 nodes. I did notice something strange there. When I try to SSH from the Sullivan head node to one of my CentOS instances, it takes much longer than it should: dkelly at kg14-compute-1:~$ time ssh root at 172.16.1.236 ls Warning: Permanently added '172.16.1.236' (RSA) to the list of known hosts. anaconda-ks.cfg install.log install.log.syslog real 0m25.197s Are you running start-coaster-service from the Sullivan head node? 
For each node in your node list, there are three SSH commands (one to create a directory structure, one to scp the worker.pl script there, and one to launch worker.pl). This is done serially in the 0.94 branch. I can see how this would take very long. I will make some changes to speed up start-coaster-service, but in the meantime, here are a few suggestions: 1. The SSH slowness seems to only be from the Sullivan head node to the VMs. SSH connections from one VM to another VM is pretty quick. Are you able to run Swift and start-coaster-service on a VM? 2. Is a persistent coasters setup needed here? OSDC has an option to launch instances as a cluster, which makes available a PBS scheduler. You could set this up and avoid the need to start and manage workers yourself. Let me know what you think. I have some example OSDC/PBS configs if you decide to go that route. Thanks, David On Thu, Feb 13, 2014 at 1:04 PM, Matthew Shaxted > wrote: Sure thing David, I have a setup.sh script that exports WORKER_HOSTS IP addresses from a txt file (see attached) - all nodes are running centOS. The conf file I am using is also attached. I'm using swift-0.94.1. Many thanks, Matthew MATTHEW SHAXTED SKIDMORE, OWINGS & MERRILL LLP 224 South Michigan Ave. Chicago, IL 60604 TEL: 312.360.4368 FAX: 312.360.4545 matthew.shaxted at som.com [cid:image001.png at 01CF2D71.35CF0DB0] WWW.SOM.COM The information contained in this communication may be confidential, is intended only for the use of the recipient(s) named above, and may be legally privileged. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited and may be unlawful. If you have received this communication in error, please return it to the sender immediately and delete the original message and any copy of it from your computer system. 
If you have any questions concerning this message, please contact the sender. [cid:image002.png at 01CF2D71.35CF0DB0] From: David Kelly [mailto:davidkelly at uchicago.edu] Sent: Thursday, February 13, 2014 12:53 PM To: Matthew Shaxted Cc: swift-user at ci.uchicago.edu Subject: Re: [Swift-user] Coaster Service Startup Time on OSDC Hi Matthew, Could you please explain a little more about how you're starting the coaster-service and workers? Are you using the start-coaster-service script? If you are, could you please send the coaster-service.conf file you're using? Which version of Swift is this? There may be some things we can do to speed up the process - just need to get a better understanding of how things are set up and where the delays are coming from. Thanks! Regards, David On Thu, Feb 13, 2014 at 12:23 PM, Matthew Shaxted > wrote: Dear Swift User Group: I working with a 60-node cluster running on OSDC now, and when I try to start the Swift coaster-service for these nodes, it takes about 30 seconds (or more) per node to successfully start. This is an issue for me as it limits how often I want to shut down the coaster-service - for this 60 node cluster it could take up to 30 min to start up again. Is this starting coaster behavior normal? Is there anything I can do to make the coaster-service start faster? Thanks, Matthew MATTHEW SHAXTED SKIDMORE, OWINGS & MERRILL LLP 224 South Michigan Ave. Chicago, IL 60604 TEL: 312.360.4368 FAX: 312.360.4545 matthew.shaxted at som.com [cid:image001.png at 01CF2D71.35CF0DB0] WWW.SOM.COM The information contained in this communication may be confidential, is intended only for the use of the recipient(s) named above, and may be legally privileged. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited and may be unlawful. 
If you have received this communication in error, please return it to the sender immediately and delete the original message and any copy of it from your computer system. If you have any questions concerning this message, please contact the sender. [cid:image002.png at 01CF2D71.35CF0DB0] _______________________________________________ Swift-user mailing list Swift-user at ci.uchicago.edu https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 6643 bytes Desc: image001.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 3047 bytes Desc: image002.png URL: From bronevetsky1 at llnl.gov Thu Feb 27 18:31:40 2014 From: bronevetsky1 at llnl.gov (Bronevetsky, Greg) Date: Fri, 28 Feb 2014 00:31:40 +0000 Subject: [Swift-user] array append Message-ID: <8635C0D1735D2C4BA6E571FD97486FF172E6E868@PRDEXMBX-05.the-lab.llnl.gov> I'm trying to use the array append function in Swift 0.94.1, based on Section 2.3 of the user guide (http://swift-lang.org/guides/release-0.94/userguide/userguide.html) Given the following program: int[auto] array; foreach i in [1:100] { append(array, i * 2); } I run swift on it and get this following error: > swift test.swift Could not start execution Compile error in foreach statement at line 3 Compile error in procedure invocation at line 4 Procedure append is not declared. How should I be using the append function to avoid this error? Thank you! Greg Bronevetsky Lawrence Livermore National Lab (925) 424-5756 bronevetsky at llnl.gov http://greg.bronevetsky.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at mcs.anl.gov Fri Feb 28 08:10:12 2014 From: wilde at mcs.anl.gov (Wilde, Michael J.) 
Date: Fri, 28 Feb 2014 14:10:12 +0000 Subject: [Swift-user] array append Message-ID: Greg, thanks for reporting this. We can reproduce it and have created a bug ticket: https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=1207 It seems that this feature was not in the test suite. Will post when we have a fix. - Mike From: Bronevetsky, Greg Date: Thursday, February 27, 2014 6:31 PM To: Swift User Subject: [Swift-user] array append I'm trying to use the array append function in Swift 0.94.1, based on Section 2.3 of the user guide (http://swift-lang.org/guides/release-0.94/userguide/userguide.html) Given the following program: int[auto] array; foreach i in [1:100] { append(array, i * 2); } I run swift on it and get the following error: > swift test.swift Could not start execution Compile error in foreach statement at line 3 Compile error in procedure invocation at line 4 Procedure append is not declared. How should I be using the append function to avoid this error? Thank you! Greg Bronevetsky Lawrence Livermore National Lab (925) 424-5756 bronevetsky at llnl.gov http://greg.bronevetsky.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From bronevetsky1 at llnl.gov Wed Feb 26 12:38:07 2014 From: bronevetsky1 at llnl.gov (Bronevetsky, Greg) Date: Wed, 26 Feb 2014 18:38:07 +0000 Subject: [Swift-user] Swift on MOAB Message-ID: <8635C0D1735D2C4BA6E571FD97486FF172E6CD33@PRDEXMBX-05.the-lab.llnl.gov> I'm running on clusters that use MOAB as the main scheduler, with SLURM on each cluster. Jobs need to be submitted to MOAB. Is there a way for Swift to use MOAB as the scheduler? Thanks! Greg Bronevetsky Lawrence Livermore National Lab (925) 424-5756 bronevetsky at llnl.gov http://greg.bronevetsky.com -------------- next part -------------- An HTML attachment was scrubbed...
URL: From yadudoc1729 at gmail.com Fri Feb 28 16:19:49 2014 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Fri, 28 Feb 2014 16:19:49 -0600 Subject: [Swift-user] array append In-Reply-To: References: Message-ID: Hi Greg, Array appends are done using the append operator <<. I'll correct the userguide to use the right syntax. The following corrected snippet should work for you: int[auto] array; foreach i in [1:100] { array << (i*2); } Thanks for reporting the bug. -Yadu On Fri, Feb 28, 2014 at 8:10 AM, Wilde, Michael J. wrote: > Greg, thanks for reporting this. > > We can reproduce it and have created a bug ticket: > https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=1207 > > It seems that this feature was not in the test suite. Will post when we > have a fix. > > - Mike > > > From: Bronevetsky, Greg > Date: Thursday, February 27, 2014 6:31 PM > To: Swift User > Subject: [Swift-user] array append > > I'm trying to use the array append function in Swift 0.94.1, based on > Section 2.3 of the user guide ( > http://swift-lang.org/guides/release-0.94/userguide/userguide.html) > > > > Given the following program: > > int[auto] array; > > > > foreach i in [1:100] { > > append(array, i * 2); > > } > > > > I run swift on it and get the following error: > > > swift test.swift > > Could not start execution > > Compile error in foreach statement at line 3 > > Compile error in procedure invocation at line 4 > > Procedure append is not declared. > > > > How should I be using the append function to avoid this error? Thank you! > > > > Greg Bronevetsky > > Lawrence Livermore National Lab > > (925) 424-5756 > > bronevetsky at llnl.gov > > http://greg.bronevetsky.com > > > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > -- Yadu Nand B -------------- next part -------------- An HTML attachment was scrubbed...
URL: From yadudoc1729 at gmail.com Fri Feb 28 19:34:41 2014 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Fri, 28 Feb 2014 19:34:41 -0600 Subject: [Swift-user] Swift on MOAB In-Reply-To: <8635C0D1735D2C4BA6E571FD97486FF172E6CD33@PRDEXMBX-05.the-lab.llnl.gov> References: <8635C0D1735D2C4BA6E571FD97486FF172E6CD33@PRDEXMBX-05.the-lab.llnl.gov> Message-ID: Hi Greg, I went through this page: https://computing.llnl.gov/tutorials/moab/ which mentions that PBS submit scripts are compatible with Moab. So the Swift PBS provider should work for you. If there are cluster-specific details that differ, we could patch the existing PBS provider to work with your cluster. Thanks, Yadu On Wed, Feb 26, 2014 at 12:38 PM, Bronevetsky, Greg wrote: > I'm running on clusters that use MOAB as the main scheduler, with SLURM > on each cluster. Jobs need to be submitted to MOAB. Is there a way for > Swift to use MOAB as the scheduler? Thanks! > > > > Greg Bronevetsky > > Lawrence Livermore National Lab > > (925) 424-5756 > > bronevetsky at llnl.gov > > http://greg.bronevetsky.com > > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > -- Yadu Nand B -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at anl.gov Mon Feb 17 14:36:12 2014 From: wilde at anl.gov (Wilde, Michael J.) Date: Mon, 17 Feb 2014 20:36:12 -0000 Subject: [Swift-user] Coaster Service Startup Time on OSDC In-Reply-To: References: , Message-ID: <85C85E44DD880E498CEA5A501B27954BEA78ED@DITKA.anl.gov> Good find, David. Did you file a ticket on the slowness with OSDC Support? When you run ssh -vvv, does the timing of log output suggest where the problem is? Can you run into some tool like typescript, or a "screen" log, that will timestamp the records, and send those to OSDC Support?
- Mike -- Michael Wilde Mathematics and Computer Science Computation Institute Argonne National Laboratory The University of Chicago ________________________________ From: swift-user-bounces at ci.uchicago.edu [swift-user-bounces at ci.uchicago.edu] on behalf of David Kelly [davidkelly at uchicago.edu] Sent: Monday, February 17, 2014 2:14 PM To: Matthew Shaxted Cc: swift-user at ci.uchicago.edu Subject: Re: [Swift-user] Coaster Service Startup Time on OSDC Hi Matthew, I set up a test on OSDC with 10 nodes. I did notice something strange there. When I try to SSH from the Sullivan head node to one of my CentOS instances, it takes much longer than it should: dkelly at kg14-compute-1:~$ time ssh root at 172.16.1.236 ls Warning: Permanently added '172.16.1.236' (RSA) to the list of known hosts. anaconda-ks.cfg install.log install.log.syslog real 0m25.197s Are you running start-coaster-service from the Sullivan head node? For each node in your node list, there are three SSH commands (one to create a directory structure, one to scp the worker.pl script there, and one to launch worker.pl). This is done serially in the 0.94 branch. I can see how this would take very long. I will make some changes to speed up start-coaster-service, but in the meantime, here are a few suggestions: 1. The SSH slowness seems to only be from the Sullivan head node to the VMs. SSH connections from one VM to another VM are pretty quick. Are you able to run Swift and start-coaster-service on a VM? 2. Is a persistent coasters setup needed here? OSDC has an option to launch instances as a cluster, which makes available a PBS scheduler. You could set this up and avoid the need to start and manage workers yourself. Let me know what you think. I have some example OSDC/PBS configs if you decide to go that route. Thanks, David On Thu, Feb 13, 2014 at 1:04 PM, Matthew Shaxted wrote: Sure thing David, I have a setup.sh script that exports WORKER_HOSTS IP addresses from a txt file (see attached) -
all nodes are running centOS. The conf file I am using is also attached. I'm using swift-0.94.1. Many thanks, Matthew MATTHEW SHAXTED SKIDMORE, OWINGS & MERRILL LLP 224 South Michigan Ave. Chicago, IL 60604 TEL: 312.360.4368 FAX: 312.360.4545 matthew.shaxted at som.com [cid:image9d6458.png at 2965c709.c87949ac] WWW.SOM.COM The information contained in this communication may be confidential, is intended only for the use of the recipient(s) named above, and may be legally privileged. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited and may be unlawful. If you have received this communication in error, please return it to the sender immediately and delete the original message and any copy of it from your computer system. If you have any questions concerning this message, please contact the sender. [cid:image93798a.gif at f078f826.ddd94773] From: David Kelly [mailto:davidkelly at uchicago.edu] Sent: Thursday, February 13, 2014 12:53 PM To: Matthew Shaxted Cc: swift-user at ci.uchicago.edu Subject: Re: [Swift-user] Coaster Service Startup Time on OSDC Hi Matthew, Could you please explain a little more about how you're starting the coaster-service and workers? Are you using the start-coaster-service script? If you are, could you please send the coaster-service.conf file you're using? Which version of Swift is this? There may be some things we can do to speed up the process - just need to get a better understanding of how things are set up and where the delays are coming from. Thanks! Regards, David On Thu, Feb 13, 2014 at 12:23 PM, Matthew Shaxted wrote: Dear Swift User Group: I am working with a 60-node cluster running on OSDC now, and when I try to start the Swift coaster-service for these nodes, it takes about 30 seconds (or more) per node to successfully start.
This is an issue for me as it limits how often I want to shut down the coaster-service - for this 60 node cluster it could take up to 30 min to start up again. Is this starting coaster behavior normal? Is there anything I can do to make the coaster-service start faster? Thanks, Matthew MATTHEW SHAXTED SKIDMORE, OWINGS & MERRILL LLP 224 South Michigan Ave. Chicago, IL 60604 TEL: 312.360.4368 FAX: 312.360.4545 matthew.shaxted at som.com [cid:image9d6458.png at 2965c709.c87949ac] WWW.SOM.COM The information contained in this communication may be confidential, is intended only for the use of the recipient(s) named above, and may be legally privileged. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited and may be unlawful. If you have received this communication in error, please return it to the sender immediately and delete the original message and any copy of it from your computer system. If you have any questions concerning this message, please contact the sender. [cid:image93798a.gif at f078f826.ddd94773] _______________________________________________ Swift-user mailing list Swift-user at ci.uchicago.edu https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 3047 bytes Desc: image002.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 6643 bytes Desc: image001.png URL: From wilde at anl.gov Mon Feb 17 20:09:00 2014 From: wilde at anl.gov (Wilde, Michael J.) 
Date: Tue, 18 Feb 2014 02:09:00 -0000 Subject: [Swift-user] Coaster Service Startup Time on OSDC In-Reply-To: References: <85C85E44DD880E498CEA5A501B27954BEA78ED@DITKA.anl.gov> , Message-ID: <85C85E44DD880E498CEA5A501B27954BEA7B88@DITKA.anl.gov> Maybe we can set up an ssh master-channel to each node once, and then it will go faster? Maybe replace the node image with a different OS? - Mike -- Michael Wilde Mathematics and Computer Science Computation Institute Argonne National Laboratory The University of Chicago ________________________________ From: swift-user-bounces at ci.uchicago.edu [swift-user-bounces at ci.uchicago.edu] on behalf of Matthew Shaxted [Matthew.Shaxted at som.com] Sent: Monday, February 17, 2014 7:54 PM To: David Kelly Cc: Wilde, Michael J.; swift-user at ci.uchicago.edu Subject: Re: [Swift-user] Coaster Service Startup Time on OSDC Thanks David, I'm going to test the pbs sites.xml approach and see how it works. Indeed the time to start coasters is really quite irritating, especially since my coasters are now not staying in a persistent state. I have a 12 hr job running on all my OSDC cores now (actually just requested more) so I will test it after this finishes. Now that I'm looking at my sites.xml file below, though, it seems only my localhost pool is not "coaster-persistent", so this is most likely what is causing the issue. On a slightly different sites.xml note, my start-coaster-service is rewriting the sites.xml file each time. Is there a simple way to prevent this from happening, or, if it does need to happen, can I at least add my localhost pool to the file automatically? My current OSDC sites file looks like below and I have been manually adding in the localhost pool every time I restart the coasters: passive 1 1000 10000 /tmp/mshaxted/swiftwork .23 10000 /tmp/mshaxted/swiftwork MATTHEW SHAXTED SKIDMORE, OWINGS & MERRILL LLP 224 South Michigan Ave.
Chicago, IL 60604 TEL: 312.360.4368 FAX: 312.360.4545 matthew.shaxted at som.com [cid:image9d6458.png at 2965c709.c87949ac] WWW.SOM.COM The information contained in this communication may be confidential, is intended only for the use of the recipient(s) named above, and may be legally privileged. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited and may be unlawful. If you have received this communication in error, please return it to the sender immediately and delete the original message and any copy of it from your computer system. If you have any questions concerning this message, please contact the sender. [cid:image93798a.gif at f078f826.ddd94773] From: David Kelly [mailto:davidkelly at uchicago.edu] Sent: Monday, February 17, 2014 7:21 PM To: Matthew Shaxted Cc: Wilde, Michael J.; swift-user at ci.uchicago.edu Subject: Re: [Swift-user] Coaster Service Startup Time on OSDC Sounds good, Matthew. Let me know how that works for you. I filed a ticket with osdc support about the SSH issue, so hopefully they can offer some help there. I added an entry to our site guide documentation about how to run on OSDC in cluster mode. It is at http://swiftlang.org/guides/release-0.94/siteguide/siteguide.html#_open_science_data_grid. The only potential issue is, it seems to only work with the standard Ubuntu images. On Mon, Feb 17, 2014 at 3:34 PM, Matthew Shaxted > wrote: Hi David, Indeed I am running start-coaster-service from the head node. The first recommendation is a good one, I will try this out and let you know how it works. I am also very interested in the cluster launch/PBS scheduler approach, although I have never used it before. An example OSDC/PBS config would be really helpful. Thanks, Matthew MATTHEW SHAXTED SKIDMORE, OWINGS & MERRILL LLP 224 South Michigan Ave. 
Chicago, IL 60604 TEL: 312.360.4368 FAX: 312.360.4545 matthew.shaxted at som.com [cid:image9d6458.png at 2965c709.c87949ac] WWW.SOM.COM The information contained in this communication may be confidential, is intended only for the use of the recipient(s) named above, and may be legally privileged. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited and may be unlawful. If you have received this communication in error, please return it to the sender immediately and delete the original message and any copy of it from your computer system. If you have any questions concerning this message, please contact the sender. [cid:image93798a.gif at f078f826.ddd94773] From: Wilde, Michael J. [mailto:wilde at anl.gov] Sent: Monday, February 17, 2014 2:36 PM To: David Kelly; Matthew Shaxted Cc: swift-user at ci.uchicago.edu Subject: RE: [Swift-user] Coaster Service Startup Time on OSDC Good find, David. Did you file a ticket on the slowness with OSDC Support? When you run ssh -vvv, does the timing of log output suggest where the problem is? Can you run into some tool like typescript, or a "screen" log, that will timestamp the records, and send those to OSDC Support? - Mike -- Michael Wilde Mathematics and Computer Science Computation Institute Argonne National Laboratory The University of Chicago ________________________________ From: swift-user-bounces at ci.uchicago.edu [swift-user-bounces at ci.uchicago.edu] on behalf of David Kelly [davidkelly at uchicago.edu] Sent: Monday, February 17, 2014 2:14 PM To: Matthew Shaxted Cc: swift-user at ci.uchicago.edu Subject: Re: [Swift-user] Coaster Service Startup Time on OSDC Hi Matthew, I set up a test on OSDC with 10 nodes. I did notice something strange there. 
When I try to SSH from the Sullivan head node to one of my CentOS instances, it takes much longer than it should: dkelly at kg14-compute-1:~$ time ssh root at 172.16.1.236 ls Warning: Permanently added '172.16.1.236' (RSA) to the list of known hosts. anaconda-ks.cfg install.log install.log.syslog real 0m25.197s Are you running start-coaster-service from the Sullivan head node? For each node in your node list, there are three SSH commands (one to create a directory structure, one to scp the worker.pl script there, and one to launch worker.pl). This is done serially in the 0.94 branch. I can see how this would take very long. I will make some changes to speed up start-coaster-service, but in the meantime, here are a few suggestions: 1. The SSH slowness seems to only be from the Sullivan head node to the VMs. SSH connections from one VM to another VM are pretty quick. Are you able to run Swift and start-coaster-service on a VM? 2. Is a persistent coasters setup needed here? OSDC has an option to launch instances as a cluster, which makes available a PBS scheduler. You could set this up and avoid the need to start and manage workers yourself. Let me know what you think. I have some example OSDC/PBS configs if you decide to go that route. Thanks, David On Thu, Feb 13, 2014 at 1:04 PM, Matthew Shaxted wrote: Sure thing David, I have a setup.sh script that exports WORKER_HOSTS IP addresses from a txt file (see attached) - all nodes are running centOS. The conf file I am using is also attached. I'm using swift-0.94.1. Many thanks, Matthew MATTHEW SHAXTED SKIDMORE, OWINGS & MERRILL LLP 224 South Michigan Ave. Chicago, IL 60604 TEL: 312.360.4368 FAX: 312.360.4545 matthew.shaxted at som.com [cid:image9d6458.png at 2965c709.c87949ac] WWW.SOM.COM The information contained in this communication may be confidential, is intended only for the use of the recipient(s) named above, and may be legally privileged.
If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited and may be unlawful. If you have received this communication in error, please return it to the sender immediately and delete the original message and any copy of it from your computer system. If you have any questions concerning this message, please contact the sender. [cid:image93798a.gif at f078f826.ddd94773] From: David Kelly [mailto:davidkelly at uchicago.edu] Sent: Thursday, February 13, 2014 12:53 PM To: Matthew Shaxted Cc: swift-user at ci.uchicago.edu Subject: Re: [Swift-user] Coaster Service Startup Time on OSDC Hi Matthew, Could you please explain a little more about how you're starting the coaster-service and workers? Are you using the start-coaster-service script? If you are, could you please send the coaster-service.conf file you're using? Which version of Swift is this? There may be some things we can do to speed up the process - just need to get a better understanding of how things are set up and where the delays are coming from. Thanks! Regards, David On Thu, Feb 13, 2014 at 12:23 PM, Matthew Shaxted > wrote: Dear Swift User Group: I working with a 60-node cluster running on OSDC now, and when I try to start the Swift coaster-service for these nodes, it takes about 30 seconds (or more) per node to successfully start. This is an issue for me as it limits how often I want to shut down the coaster-service - for this 60 node cluster it could take up to 30 min to start up again. Is this starting coaster behavior normal? Is there anything I can do to make the coaster-service start faster? Thanks, Matthew MATTHEW SHAXTED SKIDMORE, OWINGS & MERRILL LLP 224 South Michigan Ave. 
Chicago, IL 60604
TEL: 312.360.4368
FAX: 312.360.4545
matthew.shaxted at som.com
WWW.SOM.COM

_______________________________________________
Swift-user mailing list
Swift-user at ci.uchicago.edu
https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user

From anamarija at uchicago.edu Thu Feb 6 14:21:52 2014
From: anamarija at uchicago.edu (Ana Marija Sokovic)
Date: Thu, 06 Feb 2014 20:21:52 -0000
Subject: [Swift-user] Fwd: indicator that the swift job failed
References: <21C752B8-C6C4-478F-9FC1-B9A4F2444328@uchicago.edu>
Message-ID: <66827251-3FBF-431D-BB9B-8DA9811B327A@uchicago.edu>

>
> Hi,
>
> I am making a parser for log files generated by Swift 0.94. Can you please tell me what would indicate that a job failed?
> For example, here:
>
> 2013-10-25 04:25:16,114+0000 INFO swift FAILURE jobid=sh-fn5ho5hl - Failure file found
> 2013-10-25 04:25:16,120+0000 INFO swift END_FAILURE thread=0-1-44-2 tr=sh
> 2013-10-25 04:25:16,179+0000 INFO swift FAILURE jobid=sh-in5ho5hl - Failure file found
> 2013-10-25 04:25:16,186+0000 INFO swift END_FAILURE thread=0-1-93-2 tr=sh
>
> For my purpose, is END_FAILURE or FAILURE more relevant, and can you please explain the difference between the two?
>
> Best,
> Ana Marija Sokovic
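
For readers attempting the same kind of parser, here is a minimal sketch in Python. It assumes the log lines look exactly like the four quoted in the message above (timestamp, INFO, swift, event name, then details). It collects both FAILURE and END_FAILURE records and does not presume which of the two is the authoritative indicator, since the excerpt alone does not settle that. What the excerpt does show is that FAILURE lines carry a jobid, while END_FAILURE lines carry thread and tr. The name parse_failure_events and both regular expressions are illustrative and are not part of Swift.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt quoted above:
#   <date> <time>,<ms><tz> INFO swift <EVENT> <details>
# END_FAILURE is listed first so the longer event name is tried first.
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{4})\s+"
    r"INFO\s+swift\s+"
    r"(?P<event>END_FAILURE|FAILURE)\s+"
    r"(?P<rest>.*)$"
)

# key=value pairs such as jobid=sh-fn5ho5hl, thread=0-1-44-2, tr=sh
KEY_VALUE_RE = re.compile(r"(\w+)=(\S+)")


def parse_failure_events(lines):
    """Return one dict per FAILURE or END_FAILURE line; skip all other lines."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        events.append({
            "timestamp": match.group("timestamp"),
            "event": match.group("event"),
            "fields": dict(KEY_VALUE_RE.findall(match.group("rest"))),
        })
    return events


# The four lines from the message, without the "> " quoting prefix.
sample_log = [
    "2013-10-25 04:25:16,114+0000 INFO swift FAILURE jobid=sh-fn5ho5hl - Failure file found",
    "2013-10-25 04:25:16,120+0000 INFO swift END_FAILURE thread=0-1-44-2 tr=sh",
    "2013-10-25 04:25:16,179+0000 INFO swift FAILURE jobid=sh-in5ho5hl - Failure file found",
    "2013-10-25 04:25:16,186+0000 INFO swift END_FAILURE thread=0-1-93-2 tr=sh",
]

events = parse_failure_events(sample_log)
counts = Counter(event["event"] for event in events)

for event in events:
    print(event["timestamp"], event["event"], event["fields"])
print(dict(counts))
```

Keeping the two event types separate, rather than merging them into a single failure count, leaves the choice of indicator open until the difference between them is confirmed on the list.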