From wozniak at mcs.anl.gov Tue Mar 1 10:38:04 2011
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Tue, 1 Mar 2011 10:38:04 -0600 (CST)
Subject: [Swift-devel] A suggested web content strategy for Swift
In-Reply-To: <1201387603.107616.1298910243273.JavaMail.root@zimbra.anl.gov>
References: <1201387603.107616.1298910243273.JavaMail.root@zimbra.anl.gov>
Message-ID:

On Mon, 28 Feb 2011, Michael Wilde wrote:

> David wrote:
>
>>
>> $ google docs get --title "Userguide 0.92"
>>
>>
>> http://code.google.com/p/googlecl
>
> This would be a big plus for making Google sites/docs usable as our
> document editor, without losing the ability to do proper svn
> management.

If this works, great, but I'm currently thinking that svn versioning for
old manuals is not as important as having a good up-to-date manual. My
train of thought is that if we are going to edit XML, we actually should
edit HTML+CSS. Content management is the next logical step from there.

I've taken a look at the ExM Google site and done some editing for the
JETS manual. Pasting formatted content into the site does not work
cleanly (on Firefox/Windows), and the controls report the wrong formatting
(font sizes, etc.). Versioning seems limited to going back in time for
the given document. Even so, I think we could get the current manual in
there and looking decent in two or three days' work. I'm curious what a
completed, high-quality, publicly available site might look like. Does
anyone know of one? And if Google Docs is any indication, the service
will probably improve over time.

I strongly feel we should make a firm decision at the concall this week
and go from there.

Justin

--
Justin M Wozniak

From hategan at mcs.anl.gov Tue Mar 1 11:19:08 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 01 Mar 2011 09:19:08 -0800
Subject: [Swift-devel] A suggested web content strategy for Swift
In-Reply-To:
References: <1201387603.107616.1298910243273.JavaMail.root@zimbra.anl.gov>
Message-ID: <1298999948.22040.6.camel@blabla2.none>

For what it's worth, I think that if folks don't want docbook (which I'd
be more inclined to drop if there were any other reasonable equivalent
system that produces good html output), then html in SVN still mostly
meets our needs. The only problem I see down the road is that there will
be as many ways to write a table as there are editing sessions. So we
would be shifting complexity from learning docbook to managing large
html. But that's not necessarily true.

Mihael

On Tue, 2011-03-01 at 10:38 -0600, Justin M Wozniak wrote:
> On Mon, 28 Feb 2011, Michael Wilde wrote:
>
> > David wrote:
> >
> >>
> >> $ google docs get --title "Userguide 0.92"
> >>
> >>
> >> http://code.google.com/p/googlecl
> >
> > This would be a big plus for making Google sites/docs usable as our
> > document editor, without losing the ability to do proper svn
> > management.
>
> If this works, great, but I'm currently thinking that svn versioning for
> old manuals is not as important as having a good up-to-date manual. My
> train of thought is that if we are going to edit XML, we actually should
> edit HTML+CSS. Content management is the next logical step from there.
>
> I've taken a look at the ExM Google site and done some editing for the
> JETS manual. Pasting formatted content into the site does not work
> cleanly (on Firefox/Windows), and the controls report the wrong formatting
> (font sizes, etc.). Versioning seems limited to going back in time for
> the given document. Even so, I think we could get the current manual in
> there and looking decent in two or three days' work.
I'm curious what a > completed high-quality publically available site might look like, does > anyone know of one? And if Google Docs is any indication, the service > will probably improve over time. > > I strongly feel we should make a firm decision at the concall this week > and go from there. > > Justin > From wilde at mcs.anl.gov Tue Mar 1 11:54:37 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 1 Mar 2011 11:54:37 -0600 (CST) Subject: [Swift-devel] A suggested web content strategy for Swift In-Reply-To: <1298999948.22040.6.camel@blabla2.none> Message-ID: <1149171708.113482.1299002077848.JavaMail.root@zimbra.anl.gov> ----- Original Message ----- > For what it's worth, I think that if folks don't want docbook (which > I'd > more inclined to drop if there was any other reasonable equivalent > system that produces good html output), Did you look at the reStructured text and Sphinx alternative I described, which is used by Python and many other projects? - Mike > then html in SVN still mostly > meets our needs. The only problem I see down the road is that there > will > be as many ways to write a table as there are editiing sessions. So we > would be shifting complexity from learning docbook to managing large > html. But that's not necessarily true. > > Mihael > > On Tue, 2011-03-01 at 10:38 -0600, Justin M Wozniak wrote: > > On Mon, 28 Feb 2011, Michael Wilde wrote: > > > > > David wrote: > > > > > >> > > >> $ google docs get --title "Userguide 0.92" > > >> > > >> > > >> http://code.google.com/p/googlecl > > > > > > This would be a big plus for making Google sites/docs usable as > > > our > > > document editor, without loosing the ability to do proper svn > > > management. > > > > If this works, great, but I'm currently thinking that svn versioning > > for > > old manuals is not as important as having a good up-to-date manual. > > My > > train of thought is that if we are going to edit XML actually should > > edit > > HTML+CSS. 
Content management is the next logical step from there. > > > > I've taken a look at the ExM Google site and done some editing for > > the > > JETS manual. Pasting formatted content into the site does not work > > cleanly (on Firefox/Windows), and the controls report the wrong > > formatting > > (font sizes, etc.). Versioning seems limited to going back in time > > for > > the given document. Even so, I think we could get the current manual > > in > > there and looking decent in two or three days' work. I'm curious > > what a > > completed high-quality publically available site might look like, > > does > > anyone know of one? And if Google Docs is any indication, the > > service > > will probably improve over time. > > > > I strongly feel we should make a firm decision at the concall this > > week > > and go from there. > > > > Justin > > -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hategan at mcs.anl.gov Tue Mar 1 12:09:10 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 01 Mar 2011 10:09:10 -0800 Subject: [Swift-devel] A suggested web content strategy for Swift In-Reply-To: <1149171708.113482.1299002077848.JavaMail.root@zimbra.anl.gov> References: <1149171708.113482.1299002077848.JavaMail.root@zimbra.anl.gov> Message-ID: <1299002950.23068.14.camel@blabla2.none> On Tue, 2011-03-01 at 11:54 -0600, Michael Wilde wrote: > > ----- Original Message ----- > > For what it's worth, I think that if folks don't want docbook (which > > I'd > > more inclined to drop if there was any other reasonable equivalent > > system that produces good html output), > > Did you look at the reStructured text and Sphinx alternative I described, which is used by Python and many other projects? 

I believe that expressing tables like this:

+--------------+----------+-----------+-----------+
| row 1, col 1 | column 2 | column 3  | column 4  |
+--------------+----------+-----------+-----------+
| row 2        | Use the command ``ls | more``.   |
+--------------+----------+-----------+-----------+
| row 3        |          |           |           |
+--------------+----------+-----------+-----------+

... is silly. You are drawing tables in text. Imagine the amount of work
required to add or remove a column.

The reason one would use this is if it were desirable for the
documentation to look presentable in a plain text source file.

From wilde at mcs.anl.gov Tue Mar 1 18:00:55 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 1 Mar 2011 18:00:55 -0600 (CST)
Subject: [Swift-devel] A suggested web content strategy for Swift
In-Reply-To: <1299002950.23068.14.camel@blabla2.none>
Message-ID: <1516053419.116000.1299024055345.JavaMail.root@zimbra.anl.gov>

Yeah, that looks tedious. But there is also this form, which works for
most layouts:

Simple table:

=====  =====  ======
   Inputs     Output
------------  ------
  A      B    A or B
=====  =====  ======
False  False  False
True   False  True
False  True   True
True   True   True
=====  =====  ======

I don't think tables will make or break it for us.

I like the simple style of most of the other text constructs.

- Mike

----- Original Message -----
> On Tue, 2011-03-01 at 11:54 -0600, Michael Wilde wrote:
> >
> > ----- Original Message -----
> > > For what it's worth, I think that if folks don't want docbook
> > > (which I'd more inclined to drop if there was any other reasonable
> > > equivalent system that produces good html output),
> >
> > Did you look at the reStructured text and Sphinx alternative I
> > described, which is used by Python and many other projects?
>
> I believe that expressing tables like this:
> +--------------+----------+-----------+-----------+
> | row 1, col 1 | column 2 | column 3  | column 4  |
> +--------------+----------+-----------+-----------+
> | row 2        | Use the command ``ls | more``.   |
> +--------------+----------+-----------+-----------+
> | row 3        |          |           |           |
> +--------------+----------+-----------+-----------+
>
> ... is silly. You are drawing tables in text. Imagine the amount of work
> required to add or remove a column.
>
> The reason one would use this is if it would be desirable for the
> documentation to look presentable in a plain text source file.

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wilde at mcs.anl.gov Tue Mar 1 21:34:36 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 1 Mar 2011 21:34:36 -0600 (CST)
Subject: asciidoc: Re: [Swift-devel] A suggested web content strategy for Swift
In-Reply-To: <1516053419.116000.1299024055345.JavaMail.root@zimbra.anl.gov>
Message-ID: <1916319493.116314.1299036876182.JavaMail.root@zimbra.anl.gov>

And, in defense of docbook, there is "asciidoc" - like reStructured text
but for docbook:

http://www.methods.co.nz/asciidoc/index.html

and

http://kaczanowscy.pl/tomek/2010-09/a-perfect-environment-for-docbook

The asciidoc user guide, presumably done using asciidoc, has some
resemblance to our current User Guide:

http://www.methods.co.nz/asciidoc/userguide.html

This might let us make a smoother evolution from our current docbook
markup to a kinder gentler markup.

Justin, I am with you on deciding this soon. We may need some more
experiments to do that.

- Mike

----- Original Message -----
> Yeah, that looks tedious.
But there is also this form which works for > most layouts: > > Simple table: > > ===== ===== ====== > Inputs Output > ------------ ------ > A B A or B > ===== ===== ====== > False False False > True False True > False True True > True True True > ===== ===== ====== > > I dont think tables will make or break it for us. > > I like the simple style of most of the other text constructs. > > - Mike > > ----- Original Message ----- > > On Tue, 2011-03-01 at 11:54 -0600, Michael Wilde wrote: > > > > > > ----- Original Message ----- > > > > For what it's worth, I think that if folks don't want docbook > > > > (which > > > > I'd > > > > more inclined to drop if there was any other reasonable > > > > equivalent > > > > system that produces good html output), > > > > > > Did you look at the reStructured text and Sphinx alternative I > > > described, which is used by Python and many other projects? > > > > I believe that expressing tables like this: > > +--------------+----------+-----------+-----------+ > > | row 1, col 1 | column 2 | column 3 | column 4 | > > +--------------+----------+-----------+-----------+ > > | row 2 | Use the command ``ls | more``. | > > +--------------+----------+-----------+-----------+ > > | row 3 | | | | > > +--------------+----------+-----------+-----------+ > > > > ... is silly. You are drawing tables in text. Imagine the amount of > > work > > required to add or remove a column. > > > > The reason one would use this is if it would be desirable for the > > documentation to look presentable in a plain text source file. 
> > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From iraicu at cs.iit.edu Wed Mar 2 10:47:17 2011 From: iraicu at cs.iit.edu (Ioan Raicu) Date: Wed, 02 Mar 2011 10:47:17 -0600 Subject: [Swift-devel] Call for Extended Abstracts: Cloud Computing and its Applications (CCA) 2011 Message-ID: <4D6E7495.5000105@cs.iit.edu> --------------------------------------------------------------------------------- *** Call for Extended Abstracts *** The 3rd Workshop on Cloud Computing and its Applications (CCA) 2011 In conjunction with GlobusWorld 2011, April 12th-13th, Argonne, Illinois http://www.cca11.org/ --------------------------------------------------------------------------------- Dramatic growth in data and equally rapid decline in the cost of highly integrated clusters has spurred the emergence of the data center as the platform of choice for a growing class of data-intensive applications. To encourage conversations between those developing applications, algorithms, software, and hardware for such "cloud" platforms, we are convening the third workshop on Cloud Computing and Its Applications (CCA11). CCA11 will provide reception and poster session on April 12th (Call for Extended Abstract - due March 15th), as well as a full day of distinguished invited talks on April 13th on cloud computing, data intensive scalable computing, and related topics. CCA11 will be held at Argonne National Laboratory (Bldg. 240 Conference Center) in Argonne Illinois, just 25 miles west of Chicago, Illinois. 

TENTATIVE PROGRAM
---------------------------------------------------------------------------------
The tentative program for CCA11 is (the latest updates can be found at
http://cca11.org/agenda/):

Tuesday, April 12
  5:00pm - 7:00pm    Reception and Posters

Wednesday, April 13
  8:15a - 9:00a      Continental Breakfast
  9:00a - 10:00a     Keynote
  10:30a - 12:00p    Session 1 - Data Center and Cloud Networking
                       -Data Center Networks
                       -Steven Carter, Cisco Virtual Networks
  1:00p - 2:30p      Session 2 - Cloud Software
                       -An Introduction to Open Stack
                       -Ian Foster, ANL and Univ. of Chicago, Globus Online
                       -Best CCA11 Poster Talk
  3:00p - 4:30p      Session 3 - Cloud Applications
                       -Matt Arrott, UCSD, Ocean Observatories Initiative
                       -Alex Szalay, Johns Hopkins University, Simulation and Large Data
                       -Robert Grossman, University of Chicago, Open Science Data Cloud
  4:30p - 5:30p      Panel and Discussion

EXTENDED ABSTRACT TOPICS
---------------------------------------------------------------------------------
* compute and storage cloud architectures and implementations
* map-reduce and its generalizations
* programming models and tools
* novel data-intensive computing applications
* data intensive scalable computing
* distributed data intensive computing
* content distribution systems for large data
* data management within and across data centers
* models, frameworks and systems for cloud security

IMPORTANT DATES
---------------------------------------------------------------------------------
Extended Abstract submission:   March 15th, 2011
Acceptance notification:        March 22nd, 2011
Final extended abstracts due:   April 1st, 2011
Workshop date:                  April 12th-13th, 2011

EXTENDED ABSTRACT SUBMISSION
---------------------------------------------------------------------------------
Authors are invited to submit extended abstracts of not more than 2 pages
of double column text using single spaced 10 point size on 8.5 x 11 inch
pages (including all text, figures, and references); please use the ACM
8.5 x 11 manuscript templates
from http://www.acm.org/sigs/publications/proceedings-templates. A 150
word abstract and the final 2 page extended abstract (PDF format) must be
submitted online at https://cmt.research.microsoft.com/CCA2011/ before the
deadline of March 15th, 2011 at 11:59PM PST. The extended abstracts will
be reviewed, and accepted abstracts will be published online at the CCA11
website (http://www.cca11.org/). Notifications of the paper decisions will
be sent out by March 22nd, 2011. Submission implies the willingness of at
least one of the authors to register and present the abstract in the
poster session on April 12, 2011. One extended abstract will be chosen for
a 30-minute presentation slot in the final program, which consists of
invited leading researchers in Cloud Computing. For more information about
the poster session, please visit http://www.cca11.org/, or send questions
to Ioan Raicu (iraicu at cs.iit.edu).

WORKSHOP CHAIRS
---------------------------------------------------------------------------------
* Ian Foster, University of Chicago & Argonne National Laboratory
* Bob Grossman, University of Chicago

--
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor, Illinois Institute of Technology (IIT)
Guest Research Faculty, Argonne National Laboratory (ANL)
=================================================================
Data-Intensive Distributed Systems Laboratory, CS/IIT
Distributed Systems Laboratory, MCS/ANL
=================================================================
Cel: 1-847-722-0876
Office: 1-312-567-5704
Email: iraicu at cs.iit.edu
Web: http://www.cs.iit.edu/~iraicu/
Web: http://datasys.cs.iit.edu/
=================================================================
=================================================================
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From skenny at uchicago.edu Wed Mar 2 17:05:14 2011 From: skenny at uchicago.edu (Sarah Kenny) Date: Wed, 2 Mar 2011 15:05:14 -0800 Subject: asciidoc: Re: [Swift-devel] A suggested web content strategy for Swift In-Reply-To: <1916319493.116314.1299036876182.JavaMail.root@zimbra.anl.gov> References: <1516053419.116000.1299024055345.JavaMail.root@zimbra.anl.gov> <1916319493.116314.1299036876182.JavaMail.root@zimbra.anl.gov> Message-ID: i thought it worth mentioning that google sites has a java tool for exporting your entire site (or subset if you like): http://code.google.com/p/google-sites-liberation/ i just tried it and it worked pretty smoothly...should make it easy to dump into svn i would think... On Tue, Mar 1, 2011 at 7:34 PM, Michael Wilde wrote: > And, in defense of docbook, there is "asciidoc" - like reStructured text > but for docbook: > > http://www.methods.co.nz/asciidoc/index.html > > and > > http://kaczanowscy.pl/tomek/2010-09/a-perfect-environment-for-docbook > > The asciidoc user guide, presumably done using asciidoc, has some > resemblance to our current User Guide: > > http://www.methods.co.nz/asciidoc/userguide.html > > This might let us make a smoother evolution from our current docbook markup > to a kinder gentler markup. > > Justin, I am with you on deciding this soon. We may need some more > experiments to do that. > > - Mike > > > ----- Original Message ----- > > Yeah, that looks tedious. But there is also this form which works for > > most layouts: > > > > Simple table: > > > > ===== ===== ====== > > Inputs Output > > ------------ ------ > > A B A or B > > ===== ===== ====== > > False False False > > True False True > > False True True > > True True True > > ===== ===== ====== > > > > I dont think tables will make or break it for us. > > > > I like the simple style of most of the other text constructs. 
> > > > - Mike > > > > ----- Original Message ----- > > > On Tue, 2011-03-01 at 11:54 -0600, Michael Wilde wrote: > > > > > > > > ----- Original Message ----- > > > > > For what it's worth, I think that if folks don't want docbook > > > > > (which > > > > > I'd > > > > > more inclined to drop if there was any other reasonable > > > > > equivalent > > > > > system that produces good html output), > > > > > > > > Did you look at the reStructured text and Sphinx alternative I > > > > described, which is used by Python and many other projects? > > > > > > I believe that expressing tables like this: > > > +--------------+----------+-----------+-----------+ > > > | row 1, col 1 | column 2 | column 3 | column 4 | > > > +--------------+----------+-----------+-----------+ > > > | row 2 | Use the command ``ls | more``. | > > > +--------------+----------+-----------+-----------+ > > > | row 3 | | | | > > > +--------------+----------+-----------+-----------+ > > > > > > ... is silly. You are drawing tables in text. Imagine the amount of > > > work > > > required to add or remove a column. > > > > > > The reason one would use this is if it would be desirable for the > > > documentation to look presentable in a plain text source file. > > > > -- > > Michael Wilde > > Computation Institute, University of Chicago > > Mathematics and Computer Science Division > > Argonne National Laboratory > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From skenny at uchicago.edu Wed Mar 2 19:50:27 2011
From: skenny at uchicago.edu (Sarah Kenny)
Date: Wed, 2 Mar 2011 17:50:27 -0800
Subject: [Swift-devel] google sites doc
Message-ID:

i put the new 'production' site here:

https://sites.google.com/site/swiftguide/

i just copied over the existing site which i've renamed the 'development'
site:

https://sites.google.com/site/swiftparallelscripting/

i believe the proper permissions were inherited on the copy, but let me
know if you have any trouble accessing.

~sk

From ketancmaheshwari at gmail.com Wed Mar 2 20:07:41 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Wed, 2 Mar 2011 20:07:41 -0600
Subject: asciidoc: Re: [Swift-devel] A suggested web content strategy for Swift
In-Reply-To: <1916319493.116314.1299036876182.JavaMail.root@zimbra.anl.gov>
References: <1516053419.116000.1299024055345.JavaMail.root@zimbra.anl.gov> <1916319493.116314.1299036876182.JavaMail.root@zimbra.anl.gov>
Message-ID:

Hello,

Check out the following:

http://diveintopython3.org

I think they have a beautiful layout, and if you look at one of the
chapters, the code and its associated description have a nice highlight
feature. Additionally, the text in blue boxes that appears on the side
gives it more elegance, e.g.

http://diveintopython3.org/strings.html

I am looking into how the pages are generated; meanwhile I thought I would
post it here in case someone already knows the strategy used to generate
them.

From the footer links, it seems the pages are available in multiple
languages. This kinda indicates they are autogenerated.
Regards, Ketan On Tue, Mar 1, 2011 at 9:34 PM, Michael Wilde wrote: > And, in defense of docbook, there is "asciidoc" - like reStructured text > but for docbook: > > http://www.methods.co.nz/asciidoc/index.html > > and > > http://kaczanowscy.pl/tomek/2010-09/a-perfect-environment-for-docbook > > The asciidoc user guide, presumably done using asciidoc, has some > resemblance to our current User Guide: > > http://www.methods.co.nz/asciidoc/userguide.html > > This might let us make a smoother evolution from our current docbook markup > to a kinder gentler markup. > > Justin, I am with you on deciding this soon. We may need some more > experiments to do that. > > - Mike > > > ----- Original Message ----- > > Yeah, that looks tedious. But there is also this form which works for > > most layouts: > > > > Simple table: > > > > ===== ===== ====== > > Inputs Output > > ------------ ------ > > A B A or B > > ===== ===== ====== > > False False False > > True False True > > False True True > > True True True > > ===== ===== ====== > > > > I dont think tables will make or break it for us. > > > > I like the simple style of most of the other text constructs. > > > > - Mike > > > > ----- Original Message ----- > > > On Tue, 2011-03-01 at 11:54 -0600, Michael Wilde wrote: > > > > > > > > ----- Original Message ----- > > > > > For what it's worth, I think that if folks don't want docbook > > > > > (which > > > > > I'd > > > > > more inclined to drop if there was any other reasonable > > > > > equivalent > > > > > system that produces good html output), > > > > > > > > Did you look at the reStructured text and Sphinx alternative I > > > > described, which is used by Python and many other projects? > > > > > > I believe that expressing tables like this: > > > +--------------+----------+-----------+-----------+ > > > | row 1, col 1 | column 2 | column 3 | column 4 | > > > +--------------+----------+-----------+-----------+ > > > | row 2 | Use the command ``ls | more``. 
| > > > +--------------+----------+-----------+-----------+ > > > | row 3 | | | | > > > +--------------+----------+-----------+-----------+ > > > > > > ... is silly. You are drawing tables in text. Imagine the amount of > > > work > > > required to add or remove a column. > > > > > > The reason one would use this is if it would be desirable for the > > > documentation to look presentable in a plain text source file. > > > > -- > > Michael Wilde > > Computation Institute, University of Chicago > > Mathematics and Computer Science Division > > Argonne National Laboratory > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ketancmaheshwari at gmail.com Wed Mar 2 20:17:16 2011 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Wed, 2 Mar 2011 20:17:16 -0600 Subject: asciidoc: Re: [Swift-devel] A suggested web content strategy for Swift In-Reply-To: References: <1516053419.116000.1299024055345.JavaMail.root@zimbra.anl.gov> <1916319493.116314.1299036876182.JavaMail.root@zimbra.anl.gov> Message-ID: Hello again, This page: http://diveintopython3.org/about.html indicates jquery is used along with some javascript programs for syntax highlight. Regards, Ketan On Wed, Mar 2, 2011 at 8:07 PM, Ketan Maheshwari wrote: > Hello, > > Check out the following: > > http://diveintopython3.org > > I thought they have a beautiful layout and if you see one of the chapters > the code and its associated description has a nice highlight feature. > Additionally the text into blue boxes that appears on the side gives it more > elegance. e.g. 
> > http://diveintopython3.org/strings.html > > I am looking into how they are generated; meanwhile I thought to push it > here just in case someone knows already the strategy used to generate these > pages. > > From the footer links, seems the pages are available in multiple languages. > This kinda indicates they are autogenerated. > > Regards, > Ketan > > On Tue, Mar 1, 2011 at 9:34 PM, Michael Wilde wrote: > >> And, in defense of docbook, there is "asciidoc" - like reStructured text >> but for docbook: >> >> http://www.methods.co.nz/asciidoc/index.html >> >> and >> >> http://kaczanowscy.pl/tomek/2010-09/a-perfect-environment-for-docbook >> >> The asciidoc user guide, presumably done using asciidoc, has some >> resemblance to our current User Guide: >> >> http://www.methods.co.nz/asciidoc/userguide.html >> >> This might let us make a smoother evolution from our current docbook >> markup to a kinder gentler markup. >> >> Justin, I am with you on deciding this soon. We may need some more >> experiments to do that. >> >> - Mike >> >> >> ----- Original Message ----- >> > Yeah, that looks tedious. But there is also this form which works for >> > most layouts: >> > >> > Simple table: >> > >> > ===== ===== ====== >> > Inputs Output >> > ------------ ------ >> > A B A or B >> > ===== ===== ====== >> > False False False >> > True False True >> > False True True >> > True True True >> > ===== ===== ====== >> > >> > I dont think tables will make or break it for us. >> > >> > I like the simple style of most of the other text constructs. 
>> > >> > - Mike >> > >> > ----- Original Message ----- >> > > On Tue, 2011-03-01 at 11:54 -0600, Michael Wilde wrote: >> > > > >> > > > ----- Original Message ----- >> > > > > For what it's worth, I think that if folks don't want docbook >> > > > > (which >> > > > > I'd >> > > > > more inclined to drop if there was any other reasonable >> > > > > equivalent >> > > > > system that produces good html output), >> > > > >> > > > Did you look at the reStructured text and Sphinx alternative I >> > > > described, which is used by Python and many other projects? >> > > >> > > I believe that expressing tables like this: >> > > +--------------+----------+-----------+-----------+ >> > > | row 1, col 1 | column 2 | column 3 | column 4 | >> > > +--------------+----------+-----------+-----------+ >> > > | row 2 | Use the command ``ls | more``. | >> > > +--------------+----------+-----------+-----------+ >> > > | row 3 | | | | >> > > +--------------+----------+-----------+-----------+ >> > > >> > > ... is silly. You are drawing tables in text. Imagine the amount of >> > > work >> > > required to add or remove a column. >> > > >> > > The reason one would use this is if it would be desirable for the >> > > documentation to look presentable in a plain text source file. >> > >> > -- >> > Michael Wilde >> > Computation Institute, University of Chicago >> > Mathematics and Computer Science Division >> > Argonne National Laboratory >> >> -- >> Michael Wilde >> Computation Institute, University of Chicago >> Mathematics and Computer Science Division >> Argonne National Laboratory >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From benc at hawaga.org.uk Thu Mar 3 00:50:04 2011
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 3 Mar 2011 06:50:04 +0000 (GMT)
Subject: asciidoc: Re: [Swift-devel] A suggested web content strategy for Swift
In-Reply-To:
References: <1516053419.116000.1299024055345.JavaMail.root@zimbra.anl.gov> <1916319493.116314.1299036876182.JavaMail.root@zimbra.anl.gov>
Message-ID:

> Check out the following:
>
> http://diveintopython3.org
>
> I thought they have a beautiful layout and if you see one of the chapters
> the code and its associated description has a nice highlight feature.

> I am looking into how they are generated; meanwhile I thought to push it
> here just in case someone knows already the strategy used to generate these
> pages.

Looks like HTML in their source code repository. It's not immediately
apparent what they are using to convert that into PDF.

--

From ketancmaheshwari at gmail.com Thu Mar 3 09:28:39 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Thu, 3 Mar 2011 09:28:39 -0600
Subject: asciidoc: Re: [Swift-devel] A suggested web content strategy for Swift
In-Reply-To:
References: <1516053419.116000.1299024055345.JavaMail.root@zimbra.anl.gov> <1916319493.116314.1299036876182.JavaMail.root@zimbra.anl.gov>
Message-ID:

Ben,

They are using the Prince utility to convert HTML to PDF:

http://www.princexml.com/overview/
http://www.princexml.com/doc/6.0/python/

It is indeed an automated system built around Python scripts hosted in a
Mercurial repository. I checked out the bundle and could see several
text-processing Python scripts being used to publish from it.

Regards,
Ketan

On Thu, Mar 3, 2011 at 12:50 AM, Ben Clifford wrote:

>
> > Check out the following:
> >
> > http://diveintopython3.org
> >
> > I thought they have a beautiful layout and if you see one of the chapters
> > the code and its associated description has a nice highlight feature.
> > I am looking into how they are generated; meanwhile I thought to push it
> > here just in case someone knows already the strategy used to generate
> > these pages.
>
> Looks like HTML in their source code repository. It's not immediately
> apparent what they are using to convert that into PDF.
>
> --
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From wilde at mcs.anl.gov Fri Mar 4 13:41:32 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Fri, 4 Mar 2011 13:41:32 -0600 (CST)
Subject: [Swift-devel] Early access to swift 0.92 release and documentation
In-Reply-To:
Message-ID: <223397951.149021.1299267692874.JavaMail.root@zimbra.anl.gov>

Neil, Wei,

Below is the pointer that Sarah sent to another user group on where to find the 0.92 release and some still-underway starter-roadmap documentation to guide you to the download and tutorial.

If you can try using this as your "first" Swift experience and send feedback to us at swift-user, we're very eager to make this work well as the "new user roadmap" for Swift.

Please let us know how it goes, and we will all listen to swift-user to help you in any way needed.

Regards,

Mike

----- Original Message -----
From: "Sarah Kenny"
To: "Michael Wilde"
Cc: ...
Sent: Tuesday, March 1, 2011 6:23:25 PM
Subject: Re: Status of swift 0.92 release

in the mean time, john, you're welcome to give our new 'experimental' site a try

https://sites.google.com/site/swiftparallelscripting/home

i put the latest .92 binary up there for download and it would be good to know how this content feels for a user.

~sk

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From wilde at mcs.anl.gov Fri Mar 4 15:26:37 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Fri, 4 Mar 2011 15:26:37 -0600 (CST)
Subject: [Swift-devel] google sites doc
In-Reply-To:
Message-ID: <936065867.149748.1299273997449.JavaMail.root@zimbra.anl.gov>

Sarah, I'm going to give Ketan edit access to the development site, to work on "Cookbook" pages for eventual merger into the doc set (e.g., on using OSG).

For now I will just add him explicitly instead of trying to grant access through the Google Group.

- Mike

----- Original Message -----

i put the new 'production' site here: https://sites.google.com/site/swiftguide/

i just copied over the existing site which i've renamed the 'development' site: https://sites.google.com/site/swiftparallelscripting/

i believe the proper permissions were inherited on the copy, but let me know if you have any trouble accessing.

~sk

_______________________________________________
Swift-devel mailing list
Swift-devel at ci.uchicago.edu
http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ketancmaheshwari at gmail.com Fri Mar 4 15:32:51 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Fri, 4 Mar 2011 15:32:51 -0600
Subject: [Swift-devel] google sites doc
In-Reply-To: <936065867.149748.1299273997449.JavaMail.root@zimbra.anl.gov>
References: <936065867.149748.1299273997449.JavaMail.root@zimbra.anl.gov>
Message-ID:

Mike,

Yes, now I can edit.

Ketan

On Fri, Mar 4, 2011 at 3:26 PM, Michael Wilde wrote:

> Sarah, I'm going to give Ketan edit access to the development site, to work
> on "Cookbook" pages for eventual merger into the doc set (e.g., on using
> OSG).
>
> For now I will just add him explicitly instead of trying to grant access
> through the Google Group.
>
> - Mike
>
>
> ------------------------------
>
> i put the new 'production' site here:
> https://sites.google.com/site/swiftguide/
>
> i just copied over the existing site which i've renamed the 'development'
> site: https://sites.google.com/site/swiftparallelscripting/
>
> i believe the proper permissions were inherited on the copy, but let me
> know if you have any trouble accessing.
>
> ~sk
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
>
>
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From wilde at mcs.anl.gov Sat Mar 5 08:37:27 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sat, 5 Mar 2011 08:37:27 -0600 (CST)
Subject: [Swift-devel] Work on generalizing OSG scripts for a future Swift release
In-Reply-To:
Message-ID: <608238971.151509.1299335847561.JavaMail.root@zimbra.anl.gov>

Ketan, this message is to follow up on the longer-term task of making Swift's OSG execution capability end-user ready.

I'm cc'ing swift-devel so that I and others on the team can help you on various aspects of this.

You can find the versions that I had started to work on, on the CI network here:

/home/wilde/swift/lab/osg/allantools/{pool_coaster,site_gen}

site_gen has tools to generate the sites.xml file based on OSG configuration services (ReSS on Condor).

pool_coaster has the script that starts coaster workers on all the sites, using what we call the "Queue-N" algorithm.
There is a README file in pool_coaster.

Basically the way it works is:

you cd to site_gen and run gen_goodsites.sh

Then you cd to pool_coaster and run start_services.sh

But I think there are several other manual steps involved, as described in the README.

Before you start this work, you should test a few other Swift mechanisms for running simpler manual coaster pools. You can do that while you wait for your OSG cert to be approved and then for your OSG Engage VO registration to be approved and to propagate to all the OSG sites. That will take about a week, during which it's good to learn (and help us document) what end-users need to know about how the coaster mechanism works in manually-run pools (and in its automated mode as well).

A paper that Mihael has in progress on coasters is in svn at:
URL: https://svn.ci.uchicago.edu/svn/swift/2010CloudCom-coasters

We will need to help you understand the scripts above. And the core scripts in both directories are written in Ruby (and some in Karajan), so there is a lot to learn! But you can start looking at them and send me any questions about them (hopefully Allan and others can get involved as well, depending on their time availability).

We need to find out from Justin, Sarah, and Tim (who has done similar things in SwiftR) what the latest scripts we have for starting coasters in various configurations are, and which are heading towards end-user readiness.

- Mike

----- Original Message -----

Congratulations Allan.

Mike discussed your strategy for obtaining up to 2K OSG processors the other day, which I thought is awesome.

Would you be presenting your work somewhere? It would be nice to attend.

Cheers,
Ketan

On Fri, Mar 4, 2011 at 10:35 PM, Michael Wilde < wilde at mcs.anl.gov > wrote:

Hi All,

Allan's just-completed MS thesis (advised by Ian and Dan) is on a workflow that includes hundreds of thousands of small tasks among many large ones, and is a good candidate to study and perform in ExM.
It's on the ExM web Documents page and at:

https://sites.google.com/site/exmproject/documents/AEspinosa.MSThesis.2011.0304_main.pdf

I'm sure Allan and Dan would be happy to discuss it with us.

Congrats, Allan, on completing your thesis!

- Mike

_______________________________________________
ExM mailing list
ExM at lists.mcs.anl.gov
https://lists.mcs.anl.gov/mailman/listinfo/exm

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From wilde at mcs.anl.gov Sun Mar 6 10:06:36 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sun, 6 Mar 2011 10:06:36 -0600 (CST)
Subject: [Swift-devel] Fwd: [Swift-user] Errors compiling swift-0.92
In-Reply-To:
Message-ID: <806005505.152697.1299427596007.JavaMail.root@zimbra.anl.gov>

Mihael, thanks for debugging this with Andriy. I updated the download page.

All, just fyi: In doing so I noticed that we have a more recent svn on PADS (1.6) than on the other CI hosts (1.4). The www/ dir in /ci/www/projects/swift was checked out with the more recent svn, so it can't be updated with "svn update" from the many hosts of ours that still run svn 1.4 (check with svn --version).

- Mike

----- Forwarded Message -----
From: "Andriy Fedorov"
To: "Mihael Hategan"
Cc: swift-user at ci.uchicago.edu
Sent: Sunday, March 6, 2011 8:01:31 AM
Subject: Re: [Swift-user] Errors compiling swift-0.92

On Sat, Mar 5, 2011 at 20:54, Mihael Hategan wrote:

> But now that I look at the instructions, I think they are wrong. The
> correct cog branch is 4.1.8, not 4.1.7.
>

Yes, this indeed was the problem. The instructions on the Swift web page need an update.

Thank you for resolving this!
Andrey
_______________________________________________
Swift-user mailing list
Swift-user at ci.uchicago.edu
http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From bugzilla-daemon at mcs.anl.gov Sun Mar 6 10:11:28 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Sun, 6 Mar 2011 10:11:28 -0600 (CST)
Subject: [Swift-devel] [Bug 261] New: update.sh script (for pushing web content live) gives errors
Message-ID:

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=261

Summary: update.sh script (for pushing web content live) gives errors
Product: Swift
Version: 0.92
Platform: PC
OS/Version: Mac OS
Status: NEW
Severity: normal
Priority: P2
Component: Documentation
AssignedTo: skenny at uchicago.edu
ReportedBy: wilde at mcs.anl.gov

I get the following errors from this script, but the specific file I was pushing made it to the live content anyway.

If we keep using this script, we should make it run cleanly.

- Mike

---

$ s pads1
Last login: Sat Mar 5 08:09:31 2011 from c-24-14-89-232.hsd1.il.comcast.net
login1$ cd /ci/www/projects/swift/
login1$ svn up
U downloads/index.php
Updated to revision 4169.
login1$ ./update.sh
chmod: changing permissions of `./update.sh': Operation not permitted
chmod: changing permissions of `updatenodocs.sh': Operation not permitted
--------- Updating www... ----------
At revision 4169.
find: ./guides/trunk/historical: Permission denied
find: ./guides/trunk/formatting: Permission denied
find: ./guides/trunk/plot-tour: Permission denied
find: ./guides/release-0.91/historical: Permission denied
find: ./guides/release-0.91/formatting: Permission denied
find: ./guides/release-0.91/plot-tour: Permission denied
--------- Updating docs...
---------- --------- Updating guide: guides/trunk ---------- /ci/www/projects/swift/guides/trunk /ci/www/projects/swift ln: accessing `formatting/docbook': Permission denied ln: accessing `formatting/fop': Permission denied Skipped '.' ./update.sh: line 23: ./buildguides.sh: Permission denied /ci/www/projects/swift/guides/trunk/userguide /ci/www/projects/swift/guides/trunk /ci/www/projects/swift chmod: changing permissions of `appmodel.php': Operation not permitted chmod: changing permissions of `buildoptions.php': Operation not permitted chmod: changing permissions of `cdm.php': Operation not permitted chmod: changing permissions of `clustering.php': Operation not permitted chmod: changing permissions of `coasters.php': Operation not permitted chmod: changing permissions of `commands.php': Operation not permitted chmod: changing permissions of `engineconfiguration.php': Operation not permitted chmod: changing permissions of `extending.php': Operation not permitted chmod: changing permissions of `functions.php': Operation not permitted chmod: changing permissions of `index.php': Operation not permitted chmod: changing permissions of `kickstart.php': Operation not permitted chmod: changing permissions of `language.php': Operation not permitted chmod: changing permissions of `localhowtos.php': Operation not permitted chmod: changing permissions of `mappers.php': Operation not permitted chmod: changing permissions of `overview.php': Operation not permitted chmod: changing permissions of `procedures.php': Operation not permitted chmod: changing permissions of `profiles.php': Operation not permitted chmod: changing permissions of `reliability.php': Operation not permitted chmod: changing permissions of `sitecatalog.php': Operation not permitted chmod: changing permissions of `techoverview.php': Operation not permitted chmod: changing permissions of `transformationcatalog.php': Operation not permitted chmod: changing permissions of `userguide-rotated.jpeg': Operation not 
permitted chmod: changing permissions of `userguide-shane.jpeg': Operation not permitted chmod: changing permissions of `swift-site-model.png': Operation not permitted chmod: changing permissions of `type-hierarchy.png': Operation not permitted chmod: changing permissions of `appmodel.php': Operation not permitted chmod: changing permissions of `buildoptions.php': Operation not permitted chmod: changing permissions of `cdm.php': Operation not permitted chmod: changing permissions of `clustering.php': Operation not permitted chmod: changing permissions of `coasters.php': Operation not permitted chmod: changing permissions of `commands.php': Operation not permitted chmod: changing permissions of `engineconfiguration.php': Operation not permitted chmod: changing permissions of `extending.php': Operation not permitted chmod: changing permissions of `functions.php': Operation not permitted chmod: changing permissions of `index.php': Operation not permitted chmod: changing permissions of `kickstart.php': Operation not permitted chmod: changing permissions of `language.php': Operation not permitted chmod: changing permissions of `localhowtos.php': Operation not permitted chmod: changing permissions of `mappers.php': Operation not permitted chmod: changing permissions of `overview.php': Operation not permitted chmod: changing permissions of `procedures.php': Operation not permitted chmod: changing permissions of `profiles.php': Operation not permitted chmod: changing permissions of `reliability.php': Operation not permitted chmod: changing permissions of `sitecatalog.php': Operation not permitted chmod: changing permissions of `techoverview.php': Operation not permitted chmod: changing permissions of `transformationcatalog.php': Operation not permitted chmod: cannot access `*.pdf': No such file or directory /ci/www/projects/swift/guides/trunk /ci/www/projects/swift chmod: changing permissions of `userguide': Operation not permitted chmod: changing permissions of 
`userguide/procedures.php': Operation not permitted chmod: changing permissions of `userguide/commands.php': Operation not permitted chmod: changing permissions of `userguide/localhowtos.php': Operation not permitted chmod: changing permissions of `userguide/profiles.php': Operation not permitted chmod: changing permissions of `userguide/appmodel.php': Operation not permitted chmod: changing permissions of `userguide/coasters.php': Operation not permitted chmod: changing permissions of `userguide/swift-site-model.png': Operation not permitted chmod: changing permissions of `userguide/mappers.php': Operation not permitted chmod: changing permissions of `userguide/overview.php': Operation not permitted chmod: changing permissions of `userguide/kickstart.php': Operation not permitted chmod: changing permissions of `userguide/sitecatalog.php': Operation not permitted chmod: changing permissions of `userguide/functions.php': Operation not permitted chmod: changing permissions of `userguide/cdm.php': Operation not permitted chmod: changing permissions of `userguide/extending.php': Operation not permitted chmod: changing permissions of `userguide/language.php': Operation not permitted chmod: changing permissions of `userguide/buildoptions.php': Operation not permitted chmod: changing permissions of `userguide/index.php': Operation not permitted chmod: changing permissions of `userguide/reliability.php': Operation not permitted chmod: changing permissions of `userguide/techoverview.php': Operation not permitted chmod: changing permissions of `userguide/type-hierarchy.png': Operation not permitted chmod: changing permissions of `userguide/engineconfiguration.php': Operation not permitted chmod: changing permissions of `userguide/clustering.php': Operation not permitted chmod: changing permissions of `userguide/userguide-shane.jpeg': Operation not permitted chmod: changing permissions of `userguide/userguide-rotated.jpeg': Operation not permitted chmod: changing permissions 
of `userguide/transformationcatalog.php': Operation not permitted chmod: changing permissions of `userguide': Operation not permitted /ci/www/projects/swift --------- Updating guide: guides/release-0.91 ---------- /ci/www/projects/swift/guides/release-0.91 /ci/www/projects/swift ln: accessing `formatting/docbook': Permission denied ln: accessing `formatting/fop': Permission denied Skipped '.' ./update.sh: line 23: ./buildguides.sh: Permission denied /ci/www/projects/swift/guides/release-0.91/userguide /ci/www/projects/swift/guides/release-0.91 /ci/www/projects/swift chmod: changing permissions of `appmodel.php': Operation not permitted chmod: changing permissions of `buildoptions.php': Operation not permitted chmod: changing permissions of `clustering.php': Operation not permitted chmod: changing permissions of `coasters.php': Operation not permitted chmod: changing permissions of `commands.php': Operation not permitted chmod: changing permissions of `engineconfiguration.php': Operation not permitted chmod: changing permissions of `extending.php': Operation not permitted chmod: changing permissions of `functions.php': Operation not permitted chmod: changing permissions of `index.php': Operation not permitted chmod: changing permissions of `kickstart.php': Operation not permitted chmod: changing permissions of `language.php': Operation not permitted chmod: changing permissions of `localhowtos.php': Operation not permitted chmod: changing permissions of `mappers.php': Operation not permitted chmod: changing permissions of `overview.php': Operation not permitted chmod: changing permissions of `procedures.php': Operation not permitted chmod: changing permissions of `profiles.php': Operation not permitted chmod: changing permissions of `reliability.php': Operation not permitted chmod: changing permissions of `sitecatalog.php': Operation not permitted chmod: changing permissions of `techoverview.php': Operation not permitted chmod: changing permissions of 
`transformationcatalog.php': Operation not permitted chmod: changing permissions of `userguide-rotated.jpeg': Operation not permitted chmod: changing permissions of `userguide-shane.jpeg': Operation not permitted chmod: changing permissions of `swift-site-model.png': Operation not permitted chmod: changing permissions of `type-hierarchy.png': Operation not permitted chmod: changing permissions of `appmodel.php': Operation not permitted chmod: changing permissions of `buildoptions.php': Operation not permitted chmod: changing permissions of `clustering.php': Operation not permitted chmod: changing permissions of `coasters.php': Operation not permitted chmod: changing permissions of `commands.php': Operation not permitted chmod: changing permissions of `engineconfiguration.php': Operation not permitted chmod: changing permissions of `extending.php': Operation not permitted chmod: changing permissions of `functions.php': Operation not permitted chmod: changing permissions of `index.php': Operation not permitted chmod: changing permissions of `kickstart.php': Operation not permitted chmod: changing permissions of `language.php': Operation not permitted chmod: changing permissions of `localhowtos.php': Operation not permitted chmod: changing permissions of `mappers.php': Operation not permitted chmod: changing permissions of `overview.php': Operation not permitted chmod: changing permissions of `procedures.php': Operation not permitted chmod: changing permissions of `profiles.php': Operation not permitted chmod: changing permissions of `reliability.php': Operation not permitted chmod: changing permissions of `sitecatalog.php': Operation not permitted chmod: changing permissions of `techoverview.php': Operation not permitted chmod: changing permissions of `transformationcatalog.php': Operation not permitted chmod: cannot access `*.pdf': No such file or directory /ci/www/projects/swift/guides/release-0.91 /ci/www/projects/swift chmod: changing permissions of `userguide': 
Operation not permitted chmod: changing permissions of `userguide/procedures.php': Operation not permitted chmod: changing permissions of `userguide/commands.php': Operation not permitted chmod: changing permissions of `userguide/localhowtos.php': Operation not permitted chmod: changing permissions of `userguide/profiles.php': Operation not permitted chmod: changing permissions of `userguide/appmodel.php': Operation not permitted chmod: changing permissions of `userguide/coasters.php': Operation not permitted chmod: changing permissions of `userguide/swift-site-model.png': Operation not permitted chmod: changing permissions of `userguide/mappers.php': Operation not permitted chmod: changing permissions of `userguide/overview.php': Operation not permitted chmod: changing permissions of `userguide/kickstart.php': Operation not permitted chmod: changing permissions of `userguide/sitecatalog.php': Operation not permitted chmod: changing permissions of `userguide/functions.php': Operation not permitted chmod: changing permissions of `userguide/extending.php': Operation not permitted chmod: changing permissions of `userguide/language.php': Operation not permitted chmod: changing permissions of `userguide/buildoptions.php': Operation not permitted chmod: changing permissions of `userguide/index.php': Operation not permitted chmod: changing permissions of `userguide/reliability.php': Operation not permitted chmod: changing permissions of `userguide/techoverview.php': Operation not permitted chmod: changing permissions of `userguide/type-hierarchy.png': Operation not permitted chmod: changing permissions of `userguide/engineconfiguration.php': Operation not permitted chmod: changing permissions of `userguide/clustering.php': Operation not permitted chmod: changing permissions of `userguide/userguide-shane.jpeg': Operation not permitted chmod: changing permissions of `userguide/userguide-rotated.jpeg': Operation not permitted chmod: changing permissions of 
`userguide/transformationcatalog.php': Operation not permitted
chmod: changing permissions of `userguide': Operation not permitted
/ci/www/projects/swift
--------- All done ----------
login1$
login1$

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the reporter.

From wilde at mcs.anl.gov Sun Mar 6 12:18:46 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sun, 6 Mar 2011 12:18:46 -0600 (CST)
Subject: [Swift-devel] Plan for managing Swift docs
In-Reply-To: <782300081.147363.1299253048180.JavaMail.root@zimbra.anl.gov>
Message-ID: <708342452.152917.1299435526644.JavaMail.root@zimbra.anl.gov>

Hi All,

Here's my proposal for Swift documentation. Please comment. We'll discuss this for a few days, make changes people feel are necessary, and then go with it and see how it works. I want to nail this down by this Fri Mar 11 if not sooner.

I'm proposing, after a lot of thought and analysis, that we *use Google tools* for the next 6 months to aggressively grow and improve our documentation; then we assess whether to stay with Google or consider going back to a markup-language-based approach.

Barring strong feedback that "this won't work", it's the direction I've decided to go in.

Below are first the motivations and requirements that I've based this recommendation on. After that are the elements that I think we need to work into a plan.

Justin, Sarah, Mihael, Ketan, David - this email is long but intended primarily for you. Sarah, David and Ketan, in particular, you should read this thoroughly and comment, as you will need to be the prime movers to get this going.

*** Motivation

The plan is based on two principles:

- More comprehensive and useful documentation is the highest current priority for making Swift successful.

- We need so much documentation work now (relative to our team size and available time) that *ease of creating and improving the content* and getting it to users should be given higher priority (for some period) than the cost or effectiveness of maintaining the content.

*** High level Plan

Based on these driving factors, I'm proposing this:

- we move *all* our site and documents to Google for the next 6 months

- for 6 months we work hard on improving the scope and effectiveness of the documentation

- after 6 months, we evaluate the process and the results, and decide if we stay with Google or a similar online content management system, or we revert back to an svn-controlled markup process. If we revert, we select a markup language and toolchain based on an evaluation, and we absorb the cost of pushing the content back into markup language at that time.

*** Requirements

Here are the requirements I feel our documentation approach needs to address, and my view of their priorities.

--- HIGH

H: environment for content writing that encourages writing and continuous and collaborative improvement. (a good process for preview, multiple editors, comments, improvement)

H: easy enough to make doc fixes so that fixes get made as soon as we spot problems that impede our users (or us)

H: keeping the docs in sync with the trunk and releases. Ability to associate (and get) the docs for any release branch

H: reasonable and easy page saving/reviewing/copying/releasing process (ideally easy to automate)

H: a content writing and management guide to encourage writing and maintain quality and conventions

--- Medium

M: ability to review documents and changes as they evolve so that people can make improvements aggressively, knowing that the community (both developers and users) will be notified of their changes and will review them for quality assurance.

M: nice clean crisp look that is both aesthetic and readable

M: doc style should stay uniform when many people write and edit

M: ability to change the style throughout, when needed - ideally from a small number of style definitions

M: code testing: how to ensure that code examples in the docs are correct?
(M because we can do this manually, and will always need to, to some extent)

M: If we post documents from multiple sources (e.g., Google vs svn) we need to be able to maintain reasonable linkages between them that won't break.

M: Page navigation within the doc set should be pleasant and effective for the user, and use good web principles. Page naming should be reasonable.

--- low

L: multiple high-quality renderings: as PDFs, standalone HTML, and text. This will be more important later, but is not at the moment.

L: style definitions that anyone can improve

L: ability to get the docs for any *trunk* revision.

L: ability to render HTML as single page or a page hierarchy (higher prio as doc set grows)

L: index terms: can we make an index?

L: interactive viewing capabilities with highlighting a la DiveIntoPython

L: we should track and publish for both us and our users a list of changes to both the software and the documentation (maybe M?)

---

*** Detailed plan elements

We need to develop a plan. Here are some elements it needs to address, and some specific issues related to Google use that we need to decide on (which will take some experimentation and testing). Sarah, I'm hoping you can take these on (as you've already gotten a great head start on Google evaluation), but let's make it a team-wide shared effort. Using Google should facilitate that.

o Determine document structure. Site, Quickstart, Tutorial, User Guide, Cookbook, and Case Studies are what I think we need for now, roughly in that priority order. The Cookbook holds things that should go into the User Guide eventually. For each of those we should next outline the substructure.
How/where can we maintain such an outline? Can it be simply the page structure of the "development" site?

o Determine document style: headers, section numbering, colors and emphasis, paragraph styles. I feel that the Userguide format established by Ben is a great starting point. I have no desire to change it much for now. Eventually it can be made a bit crisper and perhaps more consistent.

o Move the content we have now into this structure and style. Move all end-user docs out of the Swift wiki. (In fact, let's eventually move that wiki entirely into Google and decommission it).

o Enhance, specifically, the tutorial, and integrate the user-driven elements of the cookbook into the docset.

*** Open issues (many!):

These are in no particular order. A next step is to walk through these and make recommendations and decisions.

---

(Note related to style: Sarah, the more I looked at the brain-mapping site you sent, the more I like it and see the possibilities here. Can you determine how they did some of the things they do? Very nice clean look; very readable text styles.)

determine doc style elements and how to enforce. How to provide a swift-specific style drop-down for editing.

timing and steps for the transition

Things we can do to make the URL more transparent and CI-branded? How to make it look like our URL is same as now? Or at least swift-branded?

site style improvements (e.g., logo stripe; page width (min? fixed?))

automation of pdf production? Quality of PDF?

indexing? (low prio)

searching extensively *within* the site - higher prio

google sites vs google docs? Push docs into the site? Docs better for maintaining a style? Better for printing?

how to do and maintain graphical figures.

where to keep templates for uniformity (swift writers guide)

how to do backups?

table of contents management.

link management conventions (both within and outside of site)

how to track comments for public revision control? (e.g., posting to svn log?)

style for reviewer comments?
(both internal and user-based)

facility for user comments and process for addressing?

what site, page, and doc naming should we use?

can we and should we use google more as an editor but route revisions into - automation tool ala the CL tools David pointed out to the list

Can we get good unified search within the site? (that stays within the site)

Use of Google Groups for access control: didn't work for Swift but is working for ExM: not sure why.

I've asked Ketan to start writing and moving pages from the Swift Wiki to the Google Site "cookbook" area. Sarah, Ketan, can you make a structure for adding and maintaining the "cookbook" pages? The main cookbooks I'm thinking of at the moment are (1) a guide for the various coaster configs and (2) a guide for OSG users: how to get their certs and start running Swift on OSG sites using the new scripts we are working on.

Like I said up front: comments welcome. But barring feedback that "this won't work", it's the direction I've decided to go in.

- Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From hategan at mcs.anl.gov Sun Mar 6 13:29:47 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 06 Mar 2011 11:29:47 -0800
Subject: [Swift-devel] Plan for managing Swift docs
In-Reply-To: <708342452.152917.1299435526644.JavaMail.root@zimbra.anl.gov>
References: <708342452.152917.1299435526644.JavaMail.root@zimbra.anl.gov>
Message-ID: <1299439787.13564.2.camel@blabla2.none>

I agree with the fact that documentation is important at this point. I also agree that we should try new things. It will allow us to make a more educated decision in the future.

I may have suspicions about what's the right (or wrong) way to go, but those suspicions may be wrong. So I support your plan.

Mihael

On Sun, 2011-03-06 at 12:18 -0600, Michael Wilde wrote:
> Hi All,
>
> Here's my proposal for Swift documentation. Please comment.
We'll discuss this for a few days, make changes people feel are necessary, and then go with it and see how it works. I want to nail this down by this Fri Mar 11 if not sooner. > > Im proposing after a lot of thought and analysis that we *use Google tools* for the next 6 months to aggressively grow and improve our documentation; then we assess whether to stay with Google or consider going back to a markup-language-based approach. > > Barring strong feedback that "this wont work" its the direction I've decide to go in. > > Below are first the motivations and requirements that Ive based this recommendation on. After that are the elements that I thing we need to work into a plan. > > Justin, Sarah, Mihael, Keatan, David - this email is long but intended primarily for you. Sarah, David and Ketan, in particular, you should read this throughly and comment, as you will need to be the prime movers to get this going. > > *** Motivation > > The plan is based on two principles: > > - More comprehensive and useful documentation is the highest current priority for making Swift successful. > > - We need so much documentation work now (relative to our team size and available time) that *ease of creating and improving the content* and getting it to users should be given higher priority (for some period) than the cost or effectiveness of maintaining the content. > > *** High level Plan > > Based on these driving factors, I'm proposing this: > > - we move *all* our site and documents to Google for the next 6 months > > - for 6 months we work hard on improving the scope and effectiveness of the documentation > > - after 6 months, we evaluate the process and the results, and decide if we stay with Google or a similar online content management system, or we revert back to an svn-controlled markup process. If we revert, we select a markup language and toolchain based on an evaluation, and we absorb the cost of pushing the content back into markup language at that time. 
> > > *** Requirements > > Here are the requirements I feel our documentation approach needs to address, and my view of their priorities. > > --- HIGH > > H: environment for content writing that encourages writing and continuous and collaborative improvement. (a good process for preview, multiple editors, comments, improvement) > > H: easy enough to make doc fixes so that fixes get made as soon as we spot problems that impede our users (or us) > > H: keeping the docs in sync with the trunk and releases. Ability to associate (and get) the docs for any release branch > > H: reasonable and easy page saving/reviewing/copying/releasing process (ideally easy to automate) > > H: a content writing and management guide to encourage writing and maintain quality and conventions > > > --- Medium > > M: ability to review documents and changes as they evolve so that people can make improvements aggressively, knowing their changes will be notified to and reviewed by the community (both developers and users) for quality assurance. > > M: nice clean crisp look that is both aesthetic and readable > > M: doc style should stay uniform when many people write and edit > > M: ability to change the style throughout, when needed - ideally from a small number of style definitions > > M: code testing: how to ensure that code examples in the docs are correct? > (M because we can do this manually, and will always need to, to some extent) > > M: If we post documents from multiple sources (eg Google vs svn) we need to be able to maintain reasonable linkages between them that wont break. > > M: Page navigation within the doc set should be pleasant and effective for the user, and use good web principles. Page naming should be reasonable. > > --- low > > L: multiple high-quality renderings: as PDFs, standalone HTML, and text. This will be more important later, but is not at the moment. > > L: style definitions that anyone can improve > > L: ability to get the docs for any *trunk* revision. 
> > L: ability to render HTML as single page or a page hierarchy (higher prio as doc set grows) > > L: index terms: can we make an index? > > L: interactive viewing capabilities with highlighting ala DiveIntoPython > > L: we should track and publish for both us and our users a list of changes to both the software and the documentation (maybe M?) > > --- > > > *** Detailed plan elements > > We need to develop a plan. Here are some elements it needs to address, and some specific issues related to Google use that we need to decide on (which will take some experimentation and testing). > > Sarah, Im hoping you can take these on (as youve already gotten a great headstart on Google evaluation), but lets make it a team-wide shared effort. Using Google should facilitate that. > > o Determine document structure. Site, Quickstart, Tutorial, User Guide, Cookbook, and Case Studies, for now is what I think we need. Roughly in that priority order. Where Cookbook is things that should go into the User Guide eventually. For each of those we should next outline the substructure. How/where can we maintain such an outline? Can it be simply the page structure of the "development" site? > > o Determine document style: headers, section numbering, colors and emphasis, paragraph styles. I feel that the Userguide format established by Ben is a great starting point. I have no desire to change it much for now. Eventually it can be made a bit crisper and perhaps more consistent. > > o Move the content we have now into this structure and style. Move all end-user docs out of the SWFT wiki. (In fact, lets eventually move that wiki entirely into google and decommission it). > > o Enhance, specifically, the tutorial, and integrate the user-driven elements of the cookbook into the docset. > > *** Open issues (many!): > > These are in no particular order. A nextstep is to walk through these and make reommendations and decisions. 
> > --- > > (Note related to style: Sarah, the more I looked at the brain-mapping site you sent, the more I like it and see the possibilities here. Can you determine how they did some of the things they do? Very nice clean look; very readable text styles.) > > determine doc style elements and how to enforce. How to provide a swift-specific style drop-down for editing. > > timing and steps for the transition > > Things we can do to make the URL more transparent and CI-branded? > How to make it look like our URL is same as now? Or at least swift-branded? > > site style improvements (eg logo stripe; page width (min? fixed?) > > automation of pdf production? Quality of PDF? > > indexing? (low prio) > > searching extensively *within* the site - higher prio > > google sites vs google docs? Push docs into the site? Docs better for maintaining a style? Better for printing? > > how to do and maintain graphical figures. > > where to keep templates for uniformity (swift writers guide) > > how to do backups? > > table of contents management. > > link management conventions (both within and outside of site) > > how to track comments for public revision control? (eg, posting to svn log?) > > style for reviewer comments? (both internal and user-based) > facility for user comments and process for addressing? > > what site, page, and doc naming should we use? > > can we and should we use google more as an editor but route revisions into - automation tool ala the CL tools David pointed out to the list > > Can we get good unified search within the site? (that stays within the site) > > Use of Google Groups for access control: didnt work for Swift buti is working for ExM : not sure why. > > Ive asked Ketan to start writing and moving pages from the Swift Wiki to the Google Site "cookbook" area. Sarah, Ketan, can you make a structure for adding and maintaining the "cookbook" pages? 
The main cookbooks Im thinking of at the moment is (1) a guide for the various coaster configs and (2) a guide for OSG users: how to get their certs and start running Swift on OSG sites using the new scripts we are working on. > > Like I said up front: comments welcome. But barring feedback that "this wont work" its the direction I've decide to go in. > > - Mike > > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From hategan at mcs.anl.gov Sun Mar 6 15:46:44 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 06 Mar 2011 13:46:44 -0800 Subject: [Swift-devel] Re: Workflow waiting on condition hang In-Reply-To: <4D602DF3.6000306@gmail.com> References: <4D5D8F6A.20906@gmail.com> <1297978769.20789.2.camel@blabla2.none> <4D602DF3.6000306@gmail.com> Message-ID: <1299448004.16332.2.camel@blabla2.none> Given that this does not seem to be a java deadlock, I added a hang checker to swift. If nothing is being executed inside karajan and no jobs are running in any ten second interval, it will dump future and thread information to the log file. This is in swift trunk r4171. Can you give that a try and report back the details? Mihael On Sat, 2011-02-19 at 14:54 -0600, Jonathan Monette wrote: > Yes. It always seems to hang at the same place. > > Attached is my montage script. It hangs in the mFitBatch function at > the mConcatFit app call. All other files have been created up to that > step but that app never runs. > > On 2/17/11 3:39 PM, Mihael Hategan wrote: > > On Thu, 2011-02-17 at 15:13 -0600, Jonathan Monette wrote: > >> Hello, > >> My workflow seems to be hanging. This is trunk swift-r4107 and > >> cog-r3051. Attached is a compressed log file and the jstack output for > >> my workflow. 
The jstack file says it is waiting for a condition and my
> >> workflow hangs.
> > There's lots of stuff waiting because that's what they do when they
> > don't have anything else to do. So I don't see a problem there.
> >
> > There are no jobs going to the coaster service, so clearly things aren't
> > progressing.
> >
> > So now the question is: does this happen every time you run it or just
> > some times?
> >
> > Also, please send the swift script.
> >
> > Mihael
> >
> >

From wilde at mcs.anl.gov Sun Mar 6 16:46:35 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sun, 6 Mar 2011 16:46:35 -0600 (CST)
Subject: [Swift-devel] How to get CoG svn commit email notifications?
Message-ID: <2119651847.153133.1299451595684.JavaMail.root@zimbra.anl.gov>

Is this the right list to subscribe to:
https://lists.sourceforge.net/lists/listinfo/cogkit-svn ?

I don't see anything in the archives there after 2008.

If not, can you point me to the right place, or subscribe me?

Thanks,
- Mike

From wilde at mcs.anl.gov Sun Mar 6 17:01:49 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sun, 6 Mar 2011 17:01:49 -0600 (CST)
Subject: [Swift-devel] Scripts for persistent and passive coaster configurations
In-Reply-To: <821325397.153141.1299451878340.JavaMail.root@zimbra.anl.gov>
Message-ID: <816172851.153151.1299452509608.JavaMail.root@zimbra.anl.gov>

Hi All,

I'd like Ketan to learn about this aspect of Coasters and to help automate and document the techniques in the process.

Justin, David, can you point out to this list the latest tools, READMEs, etc you have on the topic?

Ketan, my scripts for doing this are on the CI net under:

/home/wilde/swift/lab/pecos (for "persistent coasters")

I don't recall what state those scripts were in, but I will help you figure them out.
Some background:

Persistent coaster pools ("sites") are useful for:

- running on ad-hoc collections of hosts that you can ssh to
- running on OSG under a "pilot factory job" mechanism
- running on clouds
- running on machines where Globus is not available and the local scheduler is not supported by or does not work well under Swift (eg: Eureka with a deficient Cobalt)
- running on large machines like BG/P or Cray where you want to run many tests within a longer-running fixed partition

"Passive" coaster mode means that the coaster service does not start workers, but rather listens on a port for workers to connect to it.

"Persistent" coasters means that the service, running outside of the Swift JVM (as a separate JVM) is started manually and has a lifetime that persists independent of any Swift clients that connect to it.

Thus the coaster service has two relevant TCP ports:

- a service port by which Swift reaches it (specified in the pool entry)
- a worker port to which workers connect in passive mode

At the moment, the coaster-service command lacks a "-passive" flag to put it in passive mode. Instead, the config script that launches the service needs to run a Swift script using a pool that specifies passive. Once this is done, the coaster service emits its passive port # on stdout/err.

- Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From hategan at mcs.anl.gov Sun Mar 6 17:04:09 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 06 Mar 2011 15:04:09 -0800
Subject: [Swift-devel] How to get CoG svn commit email notifications?
In-Reply-To: <2119651847.153133.1299451595684.JavaMail.root@zimbra.anl.gov>
References: <2119651847.153133.1299451595684.JavaMail.root@zimbra.anl.gov>
Message-ID: <1299452649.21818.0.camel@blabla2.none>

Justin has been trying to get this sorted out with the sourceforge folks. Unsuccessfully so far.
I can subscribe you, but commit messages aren't sent. On Sun, 2011-03-06 at 16:46 -0600, Michael Wilde wrote: > Is this the right list to subscribed to: > https://lists.sourceforge.net/lists/listinfo/cogkit-svn ? > > I dont see anything in the archives there after 2008. > > If not, can you point me to the write place, or subscribe me? > > Thanks, > - Mike > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From wozniak at mcs.anl.gov Sun Mar 6 18:13:46 2011 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Sun, 6 Mar 2011 18:13:46 -0600 (Central Standard Time) Subject: [Swift-devel] How to get CoG svn commit email notifications? In-Reply-To: <1299452649.21818.0.camel@blabla2.none> References: <2119651847.153133.1299451595684.JavaMail.root@zimbra.anl.gov> <1299452649.21818.0.camel@blabla2.none> Message-ID: Right- I can mail them again. On Sun, 6 Mar 2011, Mihael Hategan wrote: > Justin has been trying to get this sorted out with the sourceforge > folks. Unsuccessfully so far. I can subscribe you, but commit messages > aren't sent. > > On Sun, 2011-03-06 at 16:46 -0600, Michael Wilde wrote: >> Is this the right list to subscribed to: >> https://lists.sourceforge.net/lists/listinfo/cogkit-svn ? >> >> I dont see anything in the archives there after 2008. >> >> If not, can you point me to the write place, or subscribe me? 
>> >> Thanks, >> - Mike >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > -- Justin M Wozniak From hockyg at uchicago.edu Mon Mar 7 11:31:41 2011 From: hockyg at uchicago.edu (Glen Hocky) Date: Mon, 7 Mar 2011 12:31:41 -0500 Subject: [Swift-devel] problem with max num jobs, array entries? Message-ID: hey Mike, devs i was wondering if you could help me track something down. i may not have noticed this before because of the way I was running my jobs but i'm having a problem running more than ~100 jobs w/ my swift script (with pbs or pbs+coasters). it just hangs with "Progress: " "Progress: " "Progress: " in the swift log it just stalls at this point ... 2011-03-07 10:26:53,280-0600 INFO SetFieldValue Set: force=FALSE 2011-03-07 10:26:53,284-0600 INFO VDLFunction FUNCTION: arg() 2011-03-07 10:26:53,285-0600 INFO VDLFunction FUNCTION: toint() 2011-03-07 10:26:53,285-0600 INFO SetFieldValue Set: printfreq=500 2011-03-07 10:26:53,285-0600 INFO VDLFunction FUNCTION: arg() 2011-03-07 10:26:53,285-0600 INFO VDLFunction FUNCTION: toint() 2011-03-07 10:26:53,285-0600 INFO SetFieldValue Set: nmodels=5 2011-03-07 10:26:53,286-0600 INFO VDLFunction FUNCTION: arg() 2011-03-07 10:26:53,286-0600 INFO VDLFunction FUNCTION: toint() 2011-03-07 10:26:53,286-0600 INFO SetFieldValue Set: nsub=20 whereas when i decrease the number of total jobs it goes to ... 
2011-03-07 11:04:34,001-0600 INFO SetFieldValue Set: nmodels=4
2011-03-07 11:04:34,001-0600 INFO SetFieldValue Set: temperature=0.9
2011-03-07 11:04:34,002-0600 INFO SetFieldValue Set: rundir=/home/hockyg/reichman/glassy_dynamics/code/runs/overlaps/replica_exchange/code/swift/run_beagle
2011-03-07 11:04:34,001-0600 INFO SetFieldValue Set: label=1
2011-03-07 11:04:34,001-0600 INFO SetFieldValue Set: radii=unnamed SwiftScript value.$[]/1
2011-03-07 11:04:34,002-0600 INFO SetFieldValue Set: nsub=24
2011-03-07 11:04:39,581-0600 INFO AbstractDataNode Found data modelIn.$[]/1.[0][3][19].inputstructure
2011-03-07 11:04:39,581-0600 INFO AbstractDataNode Found data modelIn.$[]/1.[0][4][17].inputstructure
2011-03-07 11:04:39,582-0600 INFO AbstractDataNode Found data modelIn.$[]/1.[0][4][18].inputstructure
2011-03-07 11:04:39,582-0600 INFO AbstractDataNode Found data modelIn.$[]/1.[0][4][19].inputstructure
2011-03-07 11:04:39,582-0600 INFO AbstractDataNode Found data modelIn.$[]/1.[0][0][20].inputstructure

any ideas of where to look to troubleshoot this?

From hategan at mcs.anl.gov Mon Mar 7 12:34:51 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 07 Mar 2011 10:34:51 -0800
Subject: [Swift-devel] problem with max num jobs, array entries?
In-Reply-To:
References:
Message-ID: <1299522891.11120.2.camel@blabla2.none>

Usually, hangs occur for two major reasons:

1. a java deadlock. You can find out whether this is the case by getting a stack dump of the swift process with jstack.
2. a swift/karajan deadlock. I added some code yesterday to trunk to detect these and dump the situation to the log.

Apart from that, there is also the possibility of a silent OOM, but I doubt it is the case with 100 jobs.

Mihael

On Mon, 2011-03-07 at 12:31 -0500, Glen Hocky wrote:
> hey Mike, devs
>
> i was wondering if you could help me track something down.
i may not > have noticed this before because of the way I was running my jobs but > i'm having a problem running more than ~100 jobs w/ my swift script > (with pbs or pbs+coasters). it just hangs with > "Progress: " > "Progress: " > "Progress: " > > > in the swift log it just stalls at this point > ... > 2011-03-07 10:26:53,280-0600 INFO SetFieldValue Set: force=FALSE > 2011-03-07 10:26:53,284-0600 INFO VDLFunction FUNCTION: arg() > 2011-03-07 10:26:53,285-0600 INFO VDLFunction FUNCTION: toint() > 2011-03-07 10:26:53,285-0600 INFO SetFieldValue Set: printfreq=500 > 2011-03-07 10:26:53,285-0600 INFO VDLFunction FUNCTION: arg() > 2011-03-07 10:26:53,285-0600 INFO VDLFunction FUNCTION: toint() > 2011-03-07 10:26:53,285-0600 INFO SetFieldValue Set: nmodels=5 > 2011-03-07 10:26:53,286-0600 INFO VDLFunction FUNCTION: arg() > 2011-03-07 10:26:53,286-0600 INFO VDLFunction FUNCTION: toint() > 2011-03-07 10:26:53,286-0600 INFO SetFieldValue Set: nsub=20 > > > > > whereas when i decrease the number of total jobs it goes to > ... 
> 2011-03-07 11:04:34,001-0600 INFO SetFieldValue Set: nmodels=4 > 2011-03-07 11:04:34,001-0600 INFO SetFieldValue Set: temperature=0.9 > 2011-03-07 11:04:34,002-0600 INFO SetFieldValue Set: > rundir=/home/hockyg/reichman/glassy_dynamics/code/runs/overlaps/replica_exchange/code/swift/run_beagle > 2011-03-07 11:04:34,001-0600 INFO SetFieldValue Set: label=1 > 2011-03-07 11:04:34,001-0600 INFO SetFieldValue Set: radii=unnamed > SwiftScript value.$[]/1 > 2011-03-07 11:04:34,002-0600 INFO SetFieldValue Set: nsub=24 > 2011-03-07 11:04:39,581-0600 INFO AbstractDataNode Found data > modelIn.$[]/1.[0][3][19].inputstructure > 2011-03-07 11:04:39,581-0600 INFO AbstractDataNode Found data > modelIn.$[]/1.[0][4][17].inputstructure > 2011-03-07 11:04:39,582-0600 INFO AbstractDataNode Found data > modelIn.$[]/1.[0][4][18].inputstructure > 2011-03-07 11:04:39,582-0600 INFO AbstractDataNode Found data > modelIn.$[]/1.[0][4][19].inputstructure > 2011-03-07 11:04:39,582-0600 INFO AbstractDataNode Found data > modelIn.$[]/1.[0][0][20].inputstructure > > > any ideas of where to look to troubleshoot this? > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From bugzilla-daemon at mcs.anl.gov Mon Mar 7 14:19:24 2011 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 7 Mar 2011 14:19:24 -0600 (CST) Subject: [Swift-devel] [Bug 257] Race condition in Swiftscript execution In-Reply-To: References: Message-ID: <20110307201924.E4E7A2E657@wind.mcs.anl.gov> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=257 --- Comment #1 from Tim Armstrong 2011-03-07 14:19:24 --- I captured some additional information from the latest Swift trunk version, in particular a java thread dump. If you look at the thread dump, there is a Java thread deadlock which is preventing the script from proceeding. 
This deadlock only seems to occur once in every 4 or 5 runs of the script See: http://people.cs.uchicago.edu/~tga/SwiftR-deadlock.tgz -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the assignee of the bug. From tim.g.armstrong at gmail.com Mon Mar 7 14:24:22 2011 From: tim.g.armstrong at gmail.com (Tim Armstrong) Date: Mon, 7 Mar 2011 14:24:22 -0600 Subject: [Swift-devel] Deadlock with current Swift trunk Message-ID: Hi All, I just wanted to send an email to the list to draw attention to a bug report I posted on the bugzilla. I spent some time today to get some additional information on the issue and have confirmed that the problem is a Java thread deadlock. https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=257 Unfortunately this bug seems to be triggered quite frequently by the script used by SwiftR, so this is forcing us to use the pre-fast branch Swift trunk/ Cheers, Tim Found one Java-level deadlock: ============================= "pool-1-thread-7": waiting to lock monitor 0x00007f8a1cc49450 (object 0x00007f8a2e3d7288, a org.griphyn.vdl.karajan.DSHandleFutureWrapper), which is held by "pool-1-thread-1" "pool-1-thread-1": waiting to lock monitor 0x00007f8a186c72e0 (object 0x00007f8a2e20d0a0, a org.griphyn.vdl.mapping.RootDataNode), which is held by "pool-1-thread-7" Java stack information for the threads listed above: =================================================== "pool-1-thread-7": at org.griphyn.vdl.karajan.DSHandleFutureWrapper.notifyListeners(DSHandleFutureWrapper.java:65) - waiting to lock <0x00007f8a2e3d7288> (a org.griphyn.vdl.karajan.DSHandleFutureWrapper) at org.griphyn.vdl.karajan.DSHandleFutureWrapper.handleClosed(DSHandleFutureWrapper.java:116) at org.griphyn.vdl.mapping.AbstractDataNode.notifyListeners(AbstractDataNode.java:583) - locked <0x00007f8a2e20d0a0> (a org.griphyn.vdl.mapping.RootDataNode) at 
org.griphyn.vdl.mapping.AbstractDataNode.closeShallow(AbstractDataNode.java:396) - locked <0x00007f8a2e20d0a0> (a org.griphyn.vdl.mapping.RootDataNode) at org.griphyn.vdl.mapping.AbstractDataNode.setValue(AbstractDataNode.java:346) at org.griphyn.vdl.mapping.RootDataNode.setValue(RootDataNode.java:218) at org.griphyn.vdl.karajan.lib.SetFieldValue.deepCopy(SetFieldValue.java:88) at org.griphyn.vdl.karajan.lib.SetFieldValue.function(SetFieldValue.java:50) - locked <0x00007f8a2e20d0a0> (a org.griphyn.vdl.mapping.RootDataNode) at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:68) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.Argument.post(Argument.java:48) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) at 
org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) at org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) "pool-1-thread-1": at org.griphyn.vdl.karajan.lib.SwiftArg.unwrap(SwiftArg.java:52) - waiting to lock <0x00007f8a2e20d0a0> (a org.griphyn.vdl.mapping.RootDataNode) at org.griphyn.vdl.karajan.lib.SwiftArg$Vargs.asArray(SwiftArg.java:177) at org.griphyn.vdl.karajan.lib.swiftscript.Misc.swiftscript_strcat(Misc.java:77) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.globus.cog.karajan.workflow.nodes.functions.FunctionsCollection.function(FunctionsCollection.java:82) at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:27) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.futureModified(AbstractSequentialWithArguments.java:210) at org.griphyn.vdl.karajan.DSHandleFutureWrapper.notifyListeners(DSHandleFutureWrapper.java:71) - locked <0x00007f8a2e3d7288> (a org.griphyn.vdl.karajan.DSHandleFutureWrapper) at 
org.griphyn.vdl.karajan.DSHandleFutureWrapper.addModificationAction(DSHandleFutureWrapper.java:60) - locked <0x00007f8a2e3d7288> (a org.griphyn.vdl.karajan.DSHandleFutureWrapper) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:199) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) at org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Found 1 deadlock. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hategan at mcs.anl.gov Mon Mar 7 15:14:30 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 07 Mar 2011 13:14:30 -0800 Subject: [Swift-devel] Deadlock with current Swift trunk In-Reply-To: References: Message-ID: <1299532470.11120.3.camel@blabla2.none> Yes. I saw it. I'll try to fix it as soon as I can. Thanks for the report. Mihael On Mon, 2011-03-07 at 14:24 -0600, Tim Armstrong wrote: > Hi All, > I just wanted to send an email to the list to draw attention to a > bug report I posted on the bugzilla. I spent some time today to get > some additional information on the issue and have confirmed that the > problem is a Java thread deadlock. > https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=257 > > Unfortunately this bug seems to be triggered quite frequently by the > script used by SwiftR, so this is forcing us to use the pre-fast > branch Swift trunk/ > > Cheers, > Tim > > > Found one Java-level deadlock: > ============================= > "pool-1-thread-7": > waiting to lock monitor 0x00007f8a1cc49450 (object > 0x00007f8a2e3d7288, a org.griphyn.vdl.karajan.DSHandleFutureWrapper), > which is held by "pool-1-thread-1" > "pool-1-thread-1": > waiting to lock monitor 0x00007f8a186c72e0 (object > 0x00007f8a2e20d0a0, a org.griphyn.vdl.mapping.RootDataNode), > which is held by "pool-1-thread-7" > > Java stack information for the threads listed above: > =================================================== > "pool-1-thread-7": > at > org.griphyn.vdl.karajan.DSHandleFutureWrapper.notifyListeners(DSHandleFutureWrapper.java:65) > - waiting to lock <0x00007f8a2e3d7288> (a > org.griphyn.vdl.karajan.DSHandleFutureWrapper) > at > org.griphyn.vdl.karajan.DSHandleFutureWrapper.handleClosed(DSHandleFutureWrapper.java:116) > at > org.griphyn.vdl.mapping.AbstractDataNode.notifyListeners(AbstractDataNode.java:583) > - locked <0x00007f8a2e20d0a0> (a > org.griphyn.vdl.mapping.RootDataNode) > at > 
org.griphyn.vdl.mapping.AbstractDataNode.closeShallow(AbstractDataNode.java:396) > - locked <0x00007f8a2e20d0a0> (a > org.griphyn.vdl.mapping.RootDataNode) > at > org.griphyn.vdl.mapping.AbstractDataNode.setValue(AbstractDataNode.java:346) > at > org.griphyn.vdl.mapping.RootDataNode.setValue(RootDataNode.java:218) > at > org.griphyn.vdl.karajan.lib.SetFieldValue.deepCopy(SetFieldValue.java:88) > at > org.griphyn.vdl.karajan.lib.SetFieldValue.function(SetFieldValue.java:50) > - locked <0x00007f8a2e20d0a0> (a > org.griphyn.vdl.mapping.RootDataNode) > at > org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:68) > at > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > at > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > at > org.globus.cog.karajan.workflow.nodes.functions.Argument.post(Argument.java:48) > at > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > at > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > at > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > at > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > at > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > at > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > at > org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) > at > org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) > at > 
org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > at > org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) > at > org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) > at > org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > at java.util.concurrent.Executors > $RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask > $Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at java.util.concurrent.ThreadPoolExecutor > $Worker.runTask(ThreadPoolExecutor.java:886) > at java.util.concurrent.ThreadPoolExecutor > $Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:619) > "pool-1-thread-1": > at org.griphyn.vdl.karajan.lib.SwiftArg.unwrap(SwiftArg.java:52) > - waiting to lock <0x00007f8a2e20d0a0> (a > org.griphyn.vdl.mapping.RootDataNode) > at org.griphyn.vdl.karajan.lib.SwiftArg > $Vargs.asArray(SwiftArg.java:177) > at > org.griphyn.vdl.karajan.lib.swiftscript.Misc.swiftscript_strcat(Misc.java:77) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.globus.cog.karajan.workflow.nodes.functions.FunctionsCollection.function(FunctionsCollection.java:82) > at > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:27) > at > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.futureModified(AbstractSequentialWithArguments.java:210) > at > 
org.griphyn.vdl.karajan.DSHandleFutureWrapper.notifyListeners(DSHandleFutureWrapper.java:71) > - locked <0x00007f8a2e3d7288> (a > org.griphyn.vdl.karajan.DSHandleFutureWrapper) > at > org.griphyn.vdl.karajan.DSHandleFutureWrapper.addModificationAction(DSHandleFutureWrapper.java:60) > - locked <0x00007f8a2e3d7288> (a > org.griphyn.vdl.karajan.DSHandleFutureWrapper) > at > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:199) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > at > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > at > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > at > org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) > at > org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) > at > org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > at > org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) > at > org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) > at > org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > at java.util.concurrent.Executors > $RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask > $Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at java.util.concurrent.ThreadPoolExecutor > $Worker.runTask(ThreadPoolExecutor.java:886) > at java.util.concurrent.ThreadPoolExecutor > $Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:619) > > Found 1 deadlock. 
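The dump above is a lock-order inversion: "pool-1-thread-7" holds the RootDataNode monitor and waits for the DSHandleFutureWrapper monitor, while "pool-1-thread-1" holds the wrapper monitor and waits for the node. The sketch below illustrates one common remedy, which is to snapshot the listener list while holding the lock and call the listeners only after releasing it. The names NotifyOutsideLock, Node, Wrapper, and Listener are hypothetical stand-ins, not Swift's classes, and this is not claimed to be how the problem was actually fixed in Swift.

```java
import java.util.ArrayList;
import java.util.List;

public class NotifyOutsideLock {
    /** Hypothetical callback, standing in for a "data closed" notification. */
    interface Listener {
        void closed(Node n);
    }

    /** Hypothetical stand-in for a data node that notifies listeners when closed. */
    static class Node {
        private final List<Listener> listeners = new ArrayList<>();
        private boolean closed;

        synchronized void addListener(Listener l) {
            listeners.add(l);
        }

        synchronized boolean isClosed() {
            return closed;
        }

        void close() {
            List<Listener> snapshot;
            synchronized (this) {
                closed = true;
                snapshot = new ArrayList<>(listeners);
            }
            // The node monitor is released here, so a listener can take its
            // own lock and call back into the node without forming a cycle.
            for (Listener l : snapshot) {
                l.closed(this);
            }
        }
    }

    /** Hypothetical stand-in for a future wrapper that has its own monitor. */
    static class Wrapper implements Listener {
        private int notifications;

        @Override
        public synchronized void closed(Node n) {
            notifications++;
        }

        // Takes the wrapper lock first and the node lock second.
        synchronized boolean poll(Node n) {
            return n.isClosed();
        }

        synchronized int notifications() {
            return notifications;
        }
    }

    /** Runs close() and poll() concurrently; true if both threads finish. */
    public static boolean runOnce() {
        Node node = new Node();
        Wrapper wrapper = new Wrapper();
        node.addListener(wrapper);

        Thread closer = new Thread(node::close);
        Thread poller = new Thread(() -> wrapper.poll(node));
        closer.setDaemon(true);
        poller.setDaemon(true);
        closer.start();
        poller.start();
        try {
            closer.join(2000);
            poller.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !closer.isAlive() && !poller.isAlive()
                && wrapper.notifications() == 1;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            if (!runOnce()) {
                System.out.println("threads did not finish on run " + i);
                return;
            }
        }
        System.out.println("all runs finished");
    }
}
```

If close() instead called the listeners while still inside its synchronized block, the closer thread would hold the node and wait for the wrapper while the poller held the wrapper and waited for the node, which is the same cycle the dump reports.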
> > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From hategan at mcs.anl.gov Mon Mar 7 18:26:50 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 07 Mar 2011 16:26:50 -0800 Subject: [Swift-devel] Deadlock with current Swift trunk In-Reply-To: <1299532470.11120.3.camel@blabla2.none> References: <1299532470.11120.3.camel@blabla2.none> Message-ID: <1299544010.20558.1.camel@blabla2.none> Should be fixed in swift trunk r4173. As a side note, I'm still a little uncertain about the necessity to lock on the root of a piece of data for every operation on that data. We may want to think about that a bit harder. On Mon, 2011-03-07 at 13:14 -0800, Mihael Hategan wrote: > Yes. I saw it. I'll try to fix it as soon as I can. Thanks for the > report. > > Mihael > > On Mon, 2011-03-07 at 14:24 -0600, Tim Armstrong wrote: > > Hi All, > > I just wanted to send an email to the list to draw attention to a > > bug report I posted on the bugzilla. I spent some time today to get > > some additional information on the issue and have confirmed that the > > problem is a Java thread deadlock. 
From ketancmaheshwari at gmail.com Tue Mar 8 09:42:36 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Tue, 8 Mar 2011 09:42:36 -0600
Subject: [Swift-devel] Plan for managing Swift docs
In-Reply-To: <708342452.152917.1299435526644.JavaMail.root@zimbra.anl.gov>
References: <782300081.147363.1299253048180.JavaMail.root@zimbra.anl.gov> <708342452.152917.1299435526644.JavaMail.root@zimbra.anl.gov>
Message-ID:

Mike,

The plan looks very good. My comments and reactions are below:

1. "M: code testing: how to ensure that code examples in the docs are correct? (M because we can do this manually, and will always need to, to some extent)"

This should be the highest priority. I have seen users come back again and again in search of code snippets, templates, and examples to copy and paste into their own environments. We should make sure, as far as we can, that users do not hit any kind of error in that code. Ideally, the first error a user encounters should come at a point of no return for her (in a light-hearted sense). Most users are not very patient and tend to jump into the code right after (or even before) skimming the text. We want to make sure the code behaves well for them.

2. Scripts are dry, black and white, and rather classical in a world of visual and graphical workflows. Let's not make our documents that way too. I am not saying we should be funky, but we could adopt a lighter style of documentation, for instance by using direct, active speech, among other elements of writing.

That is all I have to say at the moment.
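One possible starting point for the code-testing item, sketched under assumptions: if the doc sources are DocBook XML, code examples usually sit in `programlisting` elements, so a small tool can pull them out and hand each one to a checker. The class name DocSnippets and the sample document are hypothetical, the listing bodies are placeholder text rather than real SwiftScript, and actually running each snippet through Swift is left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DocSnippets {
    /**
     * Returns the trimmed text of every programlisting element,
     * in document order.
     */
    public static List<String> extract(String docbookXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(docbookXml)));
        } catch (Exception e) {
            // Malformed XML is reported as a bad argument to keep callers simple.
            throw new IllegalArgumentException("could not parse document", e);
        }
        NodeList listings = doc.getElementsByTagName("programlisting");
        List<String> snippets = new ArrayList<>();
        for (int i = 0; i < listings.getLength(); i++) {
            snippets.add(listings.item(i).getTextContent().trim());
        }
        return snippets;
    }

    public static void main(String[] args) {
        // Placeholder document; the listing bodies are not real SwiftScript.
        String sample = "<section>"
                + "<para>Some prose.</para>"
                + "<programlisting>first example</programlisting>"
                + "<para>More prose.</para>"
                + "<programlisting>second example</programlisting>"
                + "</section>";
        for (String snippet : extract(sample)) {
            // A real checker would write the snippet to a file and run it.
            System.out.println("snippet: " + snippet);
        }
    }
}
```

A checker built on this would still need a convention for marking snippets that are deliberately incomplete, so that fragments are not reported as failures.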
Mike, I am working on OSG Swift + Coaster experiments on CI nodes and will soon start documenting them in the cookbook.

Best Regards,
Ketan

On Sun, Mar 6, 2011 at 12:18 PM, Michael Wilde wrote:

> Hi All,
>
> Here's my proposal for Swift documentation. Please comment. We'll discuss
> this for a few days, make changes people feel are necessary, and then go
> with it and see how it works. I want to nail this down by this Fri Mar 11 if
> not sooner.
>
> I'm proposing, after a lot of thought and analysis, that we *use Google tools*
> for the next 6 months to aggressively grow and improve our documentation;
> then we assess whether to stay with Google or consider going back to a
> markup-language-based approach.
>
> Barring strong feedback that "this won't work", it's the direction I've
> decided to go in.
>
> Below are first the motivations and requirements that I've based this
> recommendation on. After that are the elements that I think we need to work
> into a plan.
>
> Justin, Sarah, Mihael, Ketan, David - this email is long but intended
> primarily for you. Sarah, David and Ketan, in particular, you should read
> this thoroughly and comment, as you will need to be the prime movers to get
> this going.
>
> *** Motivation
>
> The plan is based on two principles:
>
> - More comprehensive and useful documentation is the highest current
> priority for making Swift successful.
>
> - We need so much documentation work now (relative to our team size and
> available time) that *ease of creating and improving the content* and
> getting it to users should be given higher priority (for some period) than
> the cost or effectiveness of maintaining the content.
>
> *** High level Plan
>
> Based on these driving factors, I'm proposing this:
>
> - we move *all* our site and documents to Google for the next 6 months
>
> - for 6 months we work hard on improving the scope and effectiveness of the
> documentation
>
> - after 6 months, we evaluate the process and the results, and decide if we
> stay with Google or a similar online content management system, or we revert
> back to an svn-controlled markup process. If we revert, we select a markup
> language and toolchain based on an evaluation, and we absorb the cost of
> pushing the content back into markup language at that time.
>
>
> *** Requirements
>
> Here are the requirements I feel our documentation approach needs to
> address, and my view of their priorities.
>
> --- HIGH
>
> H: environment for content writing that encourages writing and continuous
> and collaborative improvement. (a good process for preview, multiple
> editors, comments, improvement)
>
> H: easy enough to make doc fixes so that fixes get made as soon as we spot
> problems that impede our users (or us)
>
> H: keeping the docs in sync with the trunk and releases. Ability to
> associate (and get) the docs for any release branch
>
> H: reasonable and easy page saving/reviewing/copying/releasing process
> (ideally easy to automate)
>
> H: a content writing and management guide to encourage writing and maintain
> quality and conventions
>
>
> --- Medium
>
> M: ability to review documents and changes as they evolve so that people
> can make improvements aggressively, knowing their changes will be notified
> to and reviewed by the community (both developers and users) for quality
> assurance.
>
> M: nice clean crisp look that is both aesthetic and readable
>
> M: doc style should stay uniform when many people write and edit
>
> M: ability to change the style throughout, when needed - ideally from a
> small number of style definitions
>
> M: code testing: how to ensure that code examples in the docs are correct?
> (M because we can do this manually, and will always need to, to some
> extent)
>
> M: If we post documents from multiple sources (e.g. Google vs svn) we need to
> be able to maintain reasonable linkages between them that won't break.
>
> M: Page navigation within the doc set should be pleasant and effective for
> the user, and use good web principles. Page naming should be reasonable.
>
> --- low
>
> L: multiple high-quality renderings: as PDFs, standalone HTML, and text.
> This will be more important later, but is not at the moment.
>
> L: style definitions that anyone can improve
>
> L: ability to get the docs for any *trunk* revision.
>
> L: ability to render HTML as single page or a page hierarchy (higher prio
> as doc set grows)
>
> L: index terms: can we make an index?
>
> L: interactive viewing capabilities with highlighting a la DiveIntoPython
>
> L: we should track and publish for both us and our users a list of changes
> to both the software and the documentation (maybe M?)
>
> ---
>
>
> *** Detailed plan elements
>
> We need to develop a plan. Here are some elements it needs to address, and
> some specific issues related to Google use that we need to decide on (which
> will take some experimentation and testing).
>
> Sarah, I'm hoping you can take these on (as you've already gotten a great
> head start on Google evaluation), but let's make it a team-wide shared effort.
> Using Google should facilitate that.
>
> o Determine document structure. Site, Quickstart, Tutorial, User Guide,
> Cookbook, and Case Studies are what I think we need for now, roughly in
> that priority order. The Cookbook holds things that should eventually go
> into the User Guide. For each of those we should next outline the substructure.
> How/where can we maintain such an outline? Can it be simply the page
> structure of the "development" site?
>
> o Determine document style: headers, section numbering, colors and
> emphasis, paragraph styles.
> I feel that the Userguide format established by
> Ben is a great starting point. I have no desire to change it much for now.
> Eventually it can be made a bit crisper and perhaps more consistent.
>
> o Move the content we have now into this structure and style. Move all
> end-user docs out of the SWFT wiki. (In fact, let's eventually move that wiki
> entirely into Google and decommission it.)
>
> o Enhance, specifically, the tutorial, and integrate the user-driven
> elements of the cookbook into the docset.
>
> *** Open issues (many!):
>
> These are in no particular order. A next step is to walk through these and
> make recommendations and decisions.
>
> ---
>
> (Note related to style: Sarah, the more I looked at the brain-mapping site
> you sent, the more I like it and see the possibilities here. Can you
> determine how they did some of the things they do? Very nice clean look;
> very readable text styles.)
>
> determine doc style elements and how to enforce. How to provide a
> swift-specific style drop-down for editing.
>
> timing and steps for the transition
>
> Things we can do to make the URL more transparent and CI-branded?
> How to make it look like our URL is the same as now? Or at least swift-branded?
>
> site style improvements (e.g. logo stripe; page width (min? fixed?))
>
> automation of pdf production? Quality of PDF?
>
> indexing? (low prio)
>
> searching extensively *within* the site - higher prio
>
> google sites vs google docs? Push docs into the site? Docs better for
> maintaining a style? Better for printing?
>
> how to do and maintain graphical figures.
>
> where to keep templates for uniformity (swift writers guide)
>
> how to do backups?
>
> table of contents management.
>
> link management conventions (both within and outside of site)
>
> how to track comments for public revision control? (e.g. posting to svn
> log?)
>
> style for reviewer comments? (both internal and user-based)
> facility for user comments and process for addressing?
>
> what site, page, and doc naming should we use?
>
> can we and should we use google more as an editor but route revisions into
> - automation tool a la the CL tools David pointed out to the list
>
> Can we get good unified search within the site? (that stays within the
> site)
>
> Use of Google Groups for access control: didn't work for Swift but is
> working for ExM; not sure why.
>
> I've asked Ketan to start writing and moving pages from the Swift Wiki to
> the Google Site "cookbook" area. Sarah, Ketan, can you make a structure for
> adding and maintaining the "cookbook" pages? The main cookbooks I'm thinking
> of at the moment are (1) a guide for the various coaster configs and (2) a
> guide for OSG users: how to get their certs and start running Swift on OSG
> sites using the new scripts we are working on.
>
> Like I said up front: comments welcome. But barring feedback that "this
> won't work", it's the direction I've decided to go in.
>
> - Mike
>
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory

From ketancmaheshwari at gmail.com Tue Mar 8 11:54:41 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Tue, 8 Mar 2011 11:54:41 -0600
Subject: [Swift-devel] Homepage of the Go programming language
Message-ID:

Hello,

Here is the homepage of the Google-supported (sponsored?) Go language: http://golang.org/

What I liked about this homepage is the "Go playground", where one can try the language right in the browser without installing anything. The code snippets are also editable, so one can tweak them.

It appears to be a web service that sends the code to their servers, which compile it, run it, and send the results back.
Perhaps we could learn a lesson or two.

Regards,
Ketan

From hategan at mcs.anl.gov Tue Mar 8 11:58:07 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 08 Mar 2011 09:58:07 -0800
Subject: [Swift-devel] Homepage of the Go programming language
In-Reply-To:
References:
Message-ID: <1299607087.16515.0.camel@blabla2.none>

On Tue, 2011-03-08 at 11:54 -0600, Ketan Maheshwari wrote:
> Hello,
>
> Here is the homepage of the Google-supported (sponsored?) Go language:
> http://golang.org/
>
> What I liked about this homepage is the "Go playground", where one
> can try the language right in the browser without installing anything. The
> code snippets are also editable, so one can tweak them.
>
> It appears to be a web service that sends the code to their servers,
> which compile it, run it, and send the results back.
>
> Perhaps we could learn a lesson or two.

And what specifically would be that lesson or two?

From benc at hawaga.org.uk Tue Mar 8 12:02:00 2011
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 8 Mar 2011 18:02:00 +0000 (GMT)
Subject: [Swift-devel] Homepage of the Go programming language
In-Reply-To:
References:
Message-ID:

> What I liked about this homepage is the "Go playground", where one can try
> the language right in the browser without installing anything. The code
> snippets are also editable, so one can tweak them.

tryhaskell.org is pretty interesting in that respect too. However, a big selling point of Swift is its expressivity for scripting external applications, and when I thought about it previously, it did not seem like a good fit.
--

From ketancmaheshwari at gmail.com Tue Mar 8 12:06:17 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Tue, 8 Mar 2011 12:06:17 -0600
Subject: [Swift-devel] Homepage of the Go programming language
In-Reply-To: <1299607087.16515.0.camel@blabla2.none>
References: <1299607087.16515.0.camel@blabla2.none>
Message-ID:

> And what specifically would be that lesson or two?

Implementing a similar web service on one of our clusters with a Swift installation which, in addition to compiling and running the code, produces plots (say, using gnuplot scripts) and responds with performance metrics in the form of nice graphs.

From hategan at mcs.anl.gov Tue Mar 8 12:10:18 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 08 Mar 2011 10:10:18 -0800
Subject: [Swift-devel] Homepage of the Go programming language
In-Reply-To:
References: <1299607087.16515.0.camel@blabla2.none>
Message-ID: <1299607818.16989.1.camel@blabla2.none>

On Tue, 2011-03-08 at 12:06 -0600, Ketan Maheshwari wrote:
> And what specifically would be that lesson or two?
>
> Implementing a similar web service on one of our clusters with a Swift
> installation which, in addition to compiling and running the code, produces
> plots (say, using gnuplot scripts) and responds with performance metrics in
> the form of nice graphs.

How much time do you think this project would take, and what would be the benefit to the users besides the cool factor?

From ketancmaheshwari at gmail.com Tue Mar 8 12:40:29 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Tue, 8 Mar 2011 12:40:29 -0600
Subject: [Swift-devel] Homepage of the Go programming language
In-Reply-To:
References:
Message-ID:

Ben,

tryhaskell was copied (as its author admits) from tryruby.org, which seems richer.
Ketan

On Tue, Mar 8, 2011 at 12:02 PM, Ben Clifford wrote:

> > What I liked about this homepage is the "Go playground", where one can
> > try the language right in the browser without installing anything. The code
> > snippets are also editable, so one can tweak them.
>
> tryhaskell.org is pretty interesting in that respect too. However, a big
> selling point of Swift is its expressivity for scripting external
> applications, and when I thought about it previously, it did not seem like a
> good fit.
>
> --

From skenny at uchicago.edu Tue Mar 8 15:41:19 2011
From: skenny at uchicago.edu (Sarah Kenny)
Date: Tue, 8 Mar 2011 13:41:19 -0800
Subject: [Swift-devel] Plan for managing Swift docs
In-Reply-To:
References: <782300081.147363.1299253048180.JavaMail.root@zimbra.anl.gov> <708342452.152917.1299435526644.JavaMail.root@zimbra.anl.gov>
Message-ID:

On Tue, Mar 8, 2011 at 7:42 AM, Ketan Maheshwari wrote:

> Mike,
>
> The plan looks very good. My comments and reactions are below:
>
> 1. "M: code testing: how to ensure that code examples in the docs are
> correct? (M because we can do this manually, and will always need to, to
> some extent)"
> This should be the highest priority. I have seen users come back again and
> again in search of code snippets, templates, and examples to copy and paste
> into their own environments. We should make sure, as far as we can, that
> users do not hit any kind of error in that code. Ideally, the first error a
> user encounters should come at a point of no return for her (in a
> light-hearted sense). Most users are not very patient and tend to jump into
> the code right after (or even before) skimming the text. We want to make
> sure the code behaves well for them.

i agree with this pretty strongly as well...code snippets to copy and paste are the first thing most users want to get from a site.
if they're ADD like me they copy/paste/run first and read further when things break :)

as far as the other points in the email...i'm wondering if we want to discuss via skype and try to get it down to a concrete list and divide up the work? given how many varying issues we've touched on here i'd feel a little better if we could get them into action items that we can put into bugzilla as enhancements and assign them...does this seem reasonable? i'm worried that back and forth via email might be a little less effective (though i'm happy to put all my comments in email if others prefer that).

> On Sun, Mar 6, 2011 at 12:18 PM, Michael Wilde wrote:
>
>> Hi All,
>>
>> Here's my proposal for Swift documentation. Please comment. We'll discuss
>> this for a few days, make changes people feel are necessary, and then go
>> with it and see how it works. I want to nail this down by this Fri Mar 11 if
>> not sooner.
>>
>> I'm proposing, after a lot of thought and analysis, that we *use Google
>> tools* for the next 6 months to aggressively grow and improve our
>> documentation; then we assess whether to stay with Google or consider going
>> back to a markup-language-based approach.
>>
>> Barring strong feedback that "this won't work", it's the direction I've
>> decided to go in.
>>
>> Below are first the motivations and requirements that I've based this
>> recommendation on. After that are the elements that I think we need to work
>> into a plan.
>>
>> Justin, Sarah, Mihael, Ketan, David - this email is long but intended
>> primarily for you. Sarah, David and Ketan, in particular, you should read
>> this thoroughly and comment, as you will need to be the prime movers to get
>> this going.
>>
>> *** Motivation
>>
>> The plan is based on two principles:
>>
>> - More comprehensive and useful documentation is the highest current
>> priority for making Swift successful.
>> >> - We need so much documentation work now (relative to our team size and >> available time) that *ease of creating and improving the content* and >> getting it to users should be given higher priority (for some period) than >> the cost or effectiveness of maintaining the content. >> >> *** High level Plan >> >> Based on these driving factors, I'm proposing this: >> >> - we move *all* our site and documents to Google for the next 6 months >> >> - for 6 months we work hard on improving the scope and effectiveness of >> the documentation >> >> - after 6 months, we evaluate the process and the results, and decide if >> we stay with Google or a similar online content management system, or we >> revert back to an svn-controlled markup process. If we revert, we select a >> markup language and toolchain based on an evaluation, and we absorb the cost >> of pushing the content back into markup language at that time. >> >> >> *** Requirements >> >> Here are the requirements I feel our documentation approach needs to >> address, and my view of their priorities. >> >> --- HIGH >> >> H: environment for content writing that encourages writing and continuous >> and collaborative improvement. (a good process for preview, multiple >> editors, comments, improvement) >> >> H: easy enough to make doc fixes so that fixes get made as soon as we spot >> problems that impede our users (or us) >> >> H: keeping the docs in sync with the trunk and releases. 
Ability to >> associate (and get) the docs for any release branch >> >> H: reasonable and easy page saving/reviewing/copying/releasing process >> (ideally easy to automate) >> >> H: a content writing and management guide to encourage writing and >> maintain quality and conventions >> >> >> --- Medium >> >> M: ability to review documents and changes as they evolve so that people >> can make improvements aggressively, knowing their changes will be notified >> to and reviewed by the community (both developers and users) for quality >> assurance. >> >> M: nice clean crisp look that is both aesthetic and readable >> >> M: doc style should stay uniform when many people write and edit >> >> M: ability to change the style throughout, when needed - ideally from a >> small number of style definitions >> >> M: code testing: how to ensure that code examples in the docs are correct? >> (M because we can do this manually, and will always need to, to some >> extent) >> >> M: If we post documents from multiple sources (eg Google vs svn) we need >> to be able to maintain reasonable linkages between them that wont break. >> >> M: Page navigation within the doc set should be pleasant and effective for >> the user, and use good web principles. Page naming should be reasonable. >> >> --- low >> >> L: multiple high-quality renderings: as PDFs, standalone HTML, and text. >> This will be more important later, but is not at the moment. >> >> L: style definitions that anyone can improve >> >> L: ability to get the docs for any *trunk* revision. >> >> L: ability to render HTML as single page or a page hierarchy (higher prio >> as doc set grows) >> >> L: index terms: can we make an index? >> >> L: interactive viewing capabilities with highlighting ala DiveIntoPython >> >> L: we should track and publish for both us and our users a list of changes >> to both the software and the documentation (maybe M?) >> >> --- >> >> >> *** Detailed plan elements >> >> We need to develop a plan. 
Here are some elements it needs to address, and >> some specific issues related to Google use that we need to decide on (which >> will take some experimentation and testing). >> >> Sarah, I'm hoping you can take these on (as you've already gotten a great >> head start on Google evaluation), but let's make it a team-wide shared effort. >> Using Google should facilitate that. >> >> o Determine document structure. Site, Quickstart, Tutorial, User Guide, >> Cookbook, and Case Studies, for now is what I think we need. Roughly in >> that priority order. Where Cookbook is things that should go into the User >> Guide eventually. For each of those we should next outline the substructure. >> How/where can we maintain such an outline? Can it be simply the page >> structure of the "development" site? >> >> o Determine document style: headers, section numbering, colors and >> emphasis, paragraph styles. I feel that the Userguide format established by >> Ben is a great starting point. I have no desire to change it much for now. >> Eventually it can be made a bit crisper and perhaps more consistent. >> >> o Move the content we have now into this structure and style. Move all >> end-user docs out of the Swift wiki. (In fact, let's eventually move that wiki >> entirely into google and decommission it). >> >> o Enhance, specifically, the tutorial, and integrate the user-driven >> elements of the cookbook into the docset. >> >> *** Open issues (many!): >> >> These are in no particular order. A next step is to walk through these and >> make recommendations and decisions. >> >> --- >> >> (Note related to style: Sarah, the more I looked at the brain-mapping site >> you sent, the more I like it and see the possibilities here. Can you >> determine how they did some of the things they do? Very nice clean look; >> very readable text styles.) >> >> determine doc style elements and how to enforce. How to provide a >> swift-specific style drop-down for editing. 
>> >> timing and steps for the transition >> >> Things we can do to make the URL more transparent and CI-branded? >> How to make it look like our URL is same as now? Or at least >> swift-branded? >> >> site style improvements (eg logo stripe; page width (min? fixed?)) >> >> automation of pdf production? Quality of PDF? >> >> indexing? (low prio) >> >> searching extensively *within* the site - higher prio >> >> google sites vs google docs? Push docs into the site? Docs better for >> maintaining a style? Better for printing? >> >> how to do and maintain graphical figures. >> >> where to keep templates for uniformity (swift writers guide) >> >> how to do backups? >> >> table of contents management. >> >> link management conventions (both within and outside of site) >> >> how to track comments for public revision control? (eg, posting to svn >> log?) >> >> style for reviewer comments? (both internal and user-based) >> facility for user comments and process for addressing? >> >> what site, page, and doc naming should we use? >> >> can we and should we use google more as an editor but route revisions into >> - automation tool ala the CL tools David pointed out to the list >> >> Can we get good unified search within the site? (that stays within the >> site) >> >> Use of Google Groups for access control: didn't work for Swift but is >> working for ExM : not sure why. >> >> I've asked Ketan to start writing and moving pages from the Swift Wiki to >> the Google Site "cookbook" area. Sarah, Ketan, can you make a structure for >> adding and maintaining the "cookbook" pages? The main cookbooks I'm thinking >> of at the moment are (1) a guide for the various coaster configs and (2) a >> guide for OSG users: how to get their certs and start running Swift on OSG >> sites using the new scripts we are working on. >> >> Like I said up front: comments welcome. But barring feedback that "this >> won't work" it's the direction I've decided to go in. 
>> >> - Mike >> >> >> -- >> Michael Wilde >> Computation Institute, University of Chicago >> Mathematics and Computer Science Division >> Argonne National Laboratory >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> From wilde at mcs.anl.gov Tue Mar 8 16:21:01 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 8 Mar 2011 16:21:01 -0600 (CST) Subject: [Swift-devel] Plan for managing Swift docs In-Reply-To: Message-ID: <1536232120.167165.1299622861082.JavaMail.root@zimbra.anl.gov> > as far as the other points in the email...i'm wondering if we want to > discuss via skype and try to get it down to a concrete list and divide > up the work? given how many varying issues we've touched on here i'd > feel a little better if we could get them into action items that we > can put into bugzilla as enhancements and assign them...does this seem > reasonable? i'm worried back and forth via email might be a little > less effective (though i'm happy to put all my comments in email if > others prefer that). I'm at a meeting all this week and can't do much email or voice calls. I think, Sarah, that you and Ketan meeting and proposing a plan back to the group would be great. Maybe after you meet, a bit of email before you turn the plan into bugzilla trackable items would be good. Then using bugz for this (and more of our work tracking) would be great. I'll pipe in as time permits. - Mike From dsk at ci.uchicago.edu Tue Mar 8 17:24:09 2011 From: dsk at ci.uchicago.edu (Daniel S. 
Katz) Date: Tue, 8 Mar 2011 15:24:09 -0800 Subject: [Swift-devel] Plan for managing Swift docs In-Reply-To: References: <782300081.147363.1299253048180.JavaMail.root@zimbra.anl.gov> <708342452.152917.1299435526644.JavaMail.root@zimbra.anl.gov> Message-ID: <4E5595F2-BFCA-4C33-9499-4EFBC8CBD882@ci.uchicago.edu> On Mar 8, 2011, at 1:41 PM, Sarah Kenny wrote: > On Tue, Mar 8, 2011 at 7:42 AM, Ketan Maheshwari wrote: > Mike, > > The plan looks very good. My comments and reactions below: > > 1. "M: code testing: how to ensure that code examples in the docs are correct? (M because we can do this manually, and will always need to, to some extent)" > This should be the Highest priority at all costs. I have seen users come again and again in search of code snippets, templates, examples to copy and paste into their own environments. What we want to do here is to make sure the users do not encounter any kind of error as far as we can. Ideally the first error a user encounters should be at a point of no return for her (in a lighter sense). Most users are not very patient and will tend to jump into code right away after (or even before) first skimming through the text. We want to make sure the code behaves nice to them. > > i agree with this pretty strongly as well...code snippets to copy and paste are the first thing most users want to get from a site. if they're ADD like me they copy/paste/run first and read further when things break :) Can we build a site that has code fragments/examples that can be downloaded, and then autogenerate the docs from there as well to keep things consistent? > > as far as the other points in the email...i'm wondering if we want to discuss via skype and try to get it down to a concrete list and divide up the work? given how many varying issues we've touched on here i'd feel a little better if we could get them into action items that we can put into bugzilla as enhancements and assign them...does this seem reasonable? i'm worried back and forth via email might be a little less effective (though i'm happy to put all my comments in email if others prefer that). -- Daniel S. Katz University of Chicago (773) 834-7186 (voice) (773) 834-3700 (fax) d.katz at ieee.org or dsk at ci.uchicago.edu http://www.ci.uchicago.edu/~dsk/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From wilde at mcs.anl.gov Tue Mar 8 18:49:52 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 8 Mar 2011 18:49:52 -0600 (CST) Subject: [Swift-devel] Plan for managing Swift docs In-Reply-To: <4E5595F2-BFCA-4C33-9499-4EFBC8CBD882@ci.uchicago.edu> Message-ID: <1301394704.167764.1299631792459.JavaMail.root@zimbra.anl.gov> Dan asked: > Can we build a site that has code fragments/examples that can be > downloaded, and then autogenerate the docs from there as well to keep > things consistent? This is one of those things that is easier to do using the markup-based approaches than with the Google sites/docs approach. Perhaps we can embed some kind of html includes in the google docs (and have these URLs pull from tag-based URLs that pull from tested example docs)? Or, we just defer this goal for 6 months and do this step by hand. I.e., we test the examples, but we manually paste them into the google content till we are ready to automate this step. - Mike From wilde at mcs.anl.gov Wed Mar 9 11:37:10 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 9 Mar 2011 11:37:10 -0600 (CST) Subject: [Swift-devel] Fwd: [GSoC] REMINDER: Project ideas due this *Friday* In-Reply-To: <4D7657A7.1090901@cs.uchicago.edu> Message-ID: <1326824555.169710.1299692230763.JavaMail.root@zimbra.anl.gov> I've moved many of the projects from last year's GSOC list to this year's. I added 2 for interaction with Globus Online. I'd like to trim down, make sure we have all the best projects here, and prune the list down to projects that are good for students, interesting but feasible, and useful to the project. Please either post ideas or comments here and/or edit the list on the GSOC wiki page below. 
- Mike ----- Forwarded Message ----- From: "Borja Sotomayor" To: "John Bresnahan" , keahey at mcs.anl.gov, tfreeman at mcs.anl.gov, kettimut at mcs.anl.gov, ranantha at mcs.anl.gov, mlink at mcs.anl.gov, trhowe at uchicago.edu, "Alejandro Lorca" , jlvazquez at fdi.ucm.es, bester at mcs.anl.gov, "Ian Gable" , childers at mcs.anl.gov, wtan at mcs.anl.gov, jiazhang at cs.niu.edu, wilde at mcs.anl.gov, "Mattias Lidman" , "Justin Wozniak" , "Ravi Madduri" , "Stuart Martin" Sent: Tuesday, March 8, 2011 10:21:59 AM Subject: [GSoC] REMINDER: Project ideas due this *Friday* Hi GSoC mentors (2008, 2009, 2010), Our application to participate in GSoC 2011 is due this Friday at 5pm and, so far, there is only *one* idea in the idea list. Our application has always had a strong list of project ideas (numbering in the dozens; last year we had 55), and a weak list of ideas will hurt our chances of getting in or (even if we get in) of getting a large number of student slots. So please remember to add your project ideas here: http://dev.globus.org/wiki/Google_Summer_of_Code_2011_Ideas Don't forget that you can re-list project ideas from last year if no student worked on them last year (there should still be a substantial number of new ideas, though) Cheers! 
-- Borja Sotomayor Scientific Writer, Computation Institute Lecturer, Department of Computer Science University of Chicago http://people.cs.uchicago.edu/~borja/ Community Manager, OpenNebula project http://www.opennebula.org/ -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hategan at mcs.anl.gov Wed Mar 9 14:33:05 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 09 Mar 2011 12:33:05 -0800 Subject: [Swift-devel] Error 521 provider-staging files to PADS nodes In-Reply-To: <1296068803.17931.14.camel@blabla2.none> References: <1112858620.81735.1295465835538.JavaMail.root@zimbra.anl.gov> <1295471524.6134.0.camel@blabla2.none> <1295818334.4211.1.camel@blabla2.none> <1295862088.29849.6.camel@blabla2.none> <1295916851.9232.2.camel@blabla2.none> <1296068803.17931.14.camel@blabla2.none> Message-ID: <1299702785.7087.3.camel@blabla2.none> Funny thing. I reran this and I don't see the dramatic performance drop when using multiple nodes any more. Mihael On Wed, 2011-01-26 at 11:06 -0800, Mihael Hategan wrote: > First we ignore wpn, or rather said we calculate the total worker > throughput (for all 4 concurrent jobs per worker). In any event that > stays a constant, so when I say 1 worker I mean 1 worker with 4 > concurrent jobs. I'm doing that to remove the job life-cycle latencies > from the picture, and keep I/O at maximum. > > That said, here's the summary: > a. 1 worker (clearly on one node): 80 MB/s in/80 out aggregate > b. 2 workers on the same node: 80 MB/s in/80 out aggregate > c. 2 workers on different nodes: 20 MB/s in/20 out aggregate > > I ran these a sufficiently large number of times to not believe that the > difference can be attributed to statistical variation. > > If what you say were true (job scheduled along other jobs on the same > node), then I believe that (a) would also have 20 MB/s. 
> > Mihael > > On Wed, 2011-01-26 at 11:02 -0600, Allan Espinosa wrote: > > Shouldn't we use ppn=4 to guarantee different nodes? > > > > It might be the case that the 3 other cores got assigned to other jobs > > by PBS. > > > > -Allan (mobile) > > On Jan 24, 2011 6:55 PM, "Mihael Hategan" wrote: > > > > > > And then here's the funny thing: > > > 2 workers, 4 wpn. > > > When running with ppn=2 (so both on the same node): > > > [IN]: Total transferred: 7.99 GB, current rate: 13.07 MB/s, average > > > rate: 85.23 MB/s > > > [OUT] Total transferred: 8 GB, current rate: 42 B/s, average rate: > > 85.38 > > > MB/s > > > > > > Same situation, but with ppn=1 (so the two are on different nodes): > > > [IN]: Total transferred: 5.83 GB, current rate: 20.79 MB/s, average > > > rate: 20.31 MB/s > > > [OUT] Total transferred: 5.97 GB, current rate: 32.01 MB/s, average > > > rate: 20.8 MB/s > > > > > > This, to me, looks odd because it's the opposite of what I'm > > expecting. > > > The service itself should see no difference between the two, and I > > > suspect it doesn't. But something else is going on. Any ideas? > > > > > > Mihael > > > > > > > > > On Mon, 2011-01-24 at 01:41 -0800, Mihael Hategan wrote: > > > > Play with buffer sizes and ye shall be rewarded. > > > > > > > > Turns out that setting TCP buffer sizes to obscene numbers, like > > 2M, > > > > gives you quite a bit: 70MB/s in + 70MB/s out on average. Those > > pads > > > > nodes must have some fast disks (though maybe it's just the > > cache). > > > > > > > > This is with 1 worker and 4wpn. I'm assuming that with many > > workers, the > > > > fact that each worker connection has its separate buffer will > > > > essentially achieve a similar effect. But then there should be an > > option > > > > for setting the buffer size. > > > > > > > > The numbers are attached. This all goes from head node local disk > > to > > > > worker node local disk directly, so there is no nfs. 
I'd be > > curious to > > > > know how that compares, but I am done for the day. > > > > > > > > Mihael > > > > > > > > On Sun, 2011-01-23 at 13:32 -0800, Mihael Hategan wrote: > > > > > I'm trying to run tests on pads. The queues aren't quite empty. > > In the > > > > > mean time, I committed a bit of a patch to trunk to measure > > aggregate > > > > > traffic on TCP channels (those are only used by the workers). > > You can > > > > > enable it by setting the "tcp.channel.log.io.performance" system > > > > > property to "true". > > > > > > > > > > Mihael > > > > > > > > > > On Wed, 2011-01-19 at 13:12 -0800, Mihael Hategan wrote: > > > > > > might be due to one of the recent patches. > > > > > > > > > > > > you could try to set IOBLOCKSZ to 1 in worker.pl and rerun. > > > > > > > > > > > > On Wed, 2011-01-19 at 13:37 -0600, Michael Wilde wrote: > > > > > > > An interesting observation on the returned output files: > > there are exactly 33 files in the output dir from this run: the same > > as the number of jobs Swift reports as Finished successfully. But of > > those 33, the last 4 are only of partial length, and one of the 4 is > > length zero (see below). > > > > > > > > > > > > > > Its surprising and perhaps a bug that the jobs are reported > > finished before the output file is fully written??? > > > > > > > > > > > > > > Also this 3-partial plus 1-zero file looks to me like one > > worker staging op hung (the oldest of the 4 incomplete output files) > > and then perhaps 3 were cut short when the coaster service data > > protocol froze? 
> > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > login1$ pwd > > > > > > > /scratch/local/wilde/lab > > > > > > > login1$ cd outdir > > > > > > > login1$ ls -lt | grep 10:48 > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 f.0023.out > > > > > > > -rw-r--r-- 1 wilde ci-users 2686976 Jan 19 10:48 f.0125.out > > > > > > > -rw-r--r-- 1 wilde ci-users 2621440 Jan 19 10:48 f.0167.out > > > > > > > -rw-r--r-- 1 wilde ci-users 0 Jan 19 10:48 f.0259.out > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 f.0336.out > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 f.0380.out > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 f.0015.out > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 f.0204.out > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 f.0379.out > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 f.0066.out > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 f.0221.out > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 f.0281.out > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 f.0403.out > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 f.0142.out > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 f.0187.out > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 f.0067.out > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 f.0081.out > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 f.0134.out > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 f.0136.out > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 f.0146.out > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 f.0254.out > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 f.0362.out > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 f.0312.out > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 f.0370.out > > > > > > > -rw-r--r-- 1 wilde ci-users 
3010301 Jan 19 10:48 f.0389.out > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 f.0027.out > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 f.0094.out > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 f.0183.out > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 f.0363.out > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 f.0016.out > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 f.0025.out > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 f.0429.out > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 f.0239.out > > > > > > > login1$ ls -lt | grep 10:48 | wc -l > > > > > > > 33 > > > > > > > login1$ > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > Mihael, > > > > > > > > > > > > > > > > The following test on pads failed/hung with an error 521 > > from > > > > > > > > worker.pl: > > > > > > > > > > > > > > > > --- > > > > > > > > sub getFileCBDataInIndirect { > > > > > > > > ... 
> > > > > > > > elsif ($timeout) { > > > > > > > > queueCmd((nullCB(), "JOBSTATUS", $jobid, FAILED, "521", > > "Timeout > > > > > > > > staging in file")); > > > > > > > > delete($JOBDATA{$jobid}); > > > > > > > > --- > > > > > > > > > > > > > > > > single foreach loop, doing 1,000 "mv" commands > > > > > > > > > > > > > > > > throttle was 200 jobs to this coaster pool (1 4-node > > 32-core PBS job): > > > > > > > > > > > > > > > > > > > > > > > > > jobmanager="local:pbs"/> > > > > > > > > > key="workersPerNode">8 > > > > > > > > 3500 > > > > > > > > 1 > > > > > > > > > key="nodeGranularity">4 > > > > > > > > 4 > > > > > > > > short > > > > > > > > > key="jobThrottle">2.0 > > > > > > > > > key="initialScore">10000 > > > > > > > > > > > > > > > > > > /scratch/local/wilde/test/swiftwork > > > > > > > > > key="stagingMethod">file > > > > > > > > /scratch/local/wilde/swiftscratch > > > > > > > > > > > > > > > > > > > > > > > > Ran 33 jobs - 1 job over 1 "wave" of 32 and then one or > > more workers > > > > > > > > timed out. Note that the hang may have happened earlier, > > as no new > > > > > > > > jobs were starting as the jobs in the first wave were > > finishing. > > > > > > > > > > > > > > > > time swift -tc.file tc -sites.file pbscoasters.xml -config > > cf.ps > > > > > > > > mvn.swift -n=1000 >& out & > > > > > > > > > > > > > > > > > > > > > > > > The log is in ~wilde/mvn-20110119-0956-s3s8h9h2.log on CI > > net. 
> > > > > > > > > > > > > > > > Swift stdout showed the following after waiting a while > > for a 4-node > > > > > > > > PADS coaster allocation to start: > > > > > > > > > > > > > > > > Progress: Selecting site:799 Submitted:201 > > > > > > > > Progress: Selecting site:799 Submitted:201 > > > > > > > > Progress: Selecting site:799 Submitted:200 Active:1 > > > > > > > > Progress: Selecting site:798 Submitted:177 Active:24 > > Finished > > > > > > > > successfully:1 > > > > > > > > Progress: Selecting site:796 Submitted:172 Active:28 > > Finished > > > > > > > > successfully:4 > > > > > > > > Progress: Selecting site:792 Submitted:176 Active:24 > > Finished > > > > > > > > successfully:8 > > > > > > > > Progress: Selecting site:788 Submitted:180 Active:20 > > Finished > > > > > > > > successfully:12 > > > > > > > > Progress: Selecting site:784 Submitted:184 Active:16 > > Finished > > > > > > > > successfully:16 > > > > > > > > Progress: Selecting site:780 Submitted:188 Active:12 > > Finished > > > > > > > > successfully:20 > > > > > > > > Progress: Selecting site:777 Submitted:191 Active:9 > > Finished > > > > > > > > successfully:23 > > > > > > > > Progress: Selecting site:773 Submitted:195 Active:5 > > Finished > > > > > > > > successfully:27 > > > > > > > > Progress: Selecting site:770 Submitted:197 Active:3 > > Finished > > > > > > > > successfully:30 > > > > > > > > Progress: Selecting site:767 Submitted:200 Finished > > successfully:33 > > > > > > > > Progress: Selecting site:766 Submitted:201 Finished > > successfully:33 > > > > > > > > Progress: Selecting site:766 Submitted:201 Finished > > successfully:33 > > > > > > > > Progress: Selecting site:766 Submitted:201 Finished > > successfully:33 > > > > > > > > Progress: Selecting site:766 Submitted:201 Finished > > successfully:33 > > > > > > > > Progress: Selecting site:766 Submitted:201 Finished > > successfully:33 > > > > > > > > Progress: Selecting site:766 Submitted:201 Finished > > successfully:33 > > > 
> > > > > Progress: Selecting site:766 Submitted:200 Active:1 > > Finished > > > > > > > > successfully:33 > > > > > > > > Execution failed: > > > > > > > > Job failed with an exit code of 521 > > > > > > > > login1$ > > > > > > > > login1$ > > > > > > > > login1$ pwd > > > > > > > > /scratch/local/wilde/lab > > > > > > > > login1$ ls -lt | head > > > > > > > > total 51408 > > > > > > > > -rw-r--r-- 1 wilde ci-users 5043350 Jan 19 10:51 > > > > > > > > mvn-20110119-0956-s3s8h9h2.log > > > > > > > > > > > > > > > > (copied to ~wilde) > > > > > > > > > > > > > > > > script was: > > > > > > > > > > > > > > > > login1$ cat mvn.swift > > > > > > > > type file; > > > > > > > > > > > > > > > > app (file o) mv (file i) > > > > > > > > { > > > > > > > > mv @i @o; > > > > > > > > } > > > > > > > > > > > > > > > > file out[] > > > > > > > prefix="f.",suffix=".out">; > > > > > > > > foreach j in [1:@toint(@arg("n","1"))] { > > > > > > > > file data<"data.txt">; > > > > > > > > out[j] = mv(data); > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > data.txt was 3MB > > > > > > > > > > > > > > > > A look at the outdir gives a clue to where things hung: > > The files of > > > > > > > > <= ~3MB from time 10:48 are from this job. 
Files from > > 10:39 and > > > > > > > > earlier are from other manual runs executed on login1, > > Note that 3 of > > > > > > > > the 3MB output files have length 0 or <3MB, and were > > likely in transit > > > > > > > > back from the worker: > > > > > > > > > > > > > > > > -rw-r--r-- 1 wilde ci-users 2686976 Jan 19 10:48 > > f.0125.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 2621440 Jan 19 10:48 > > f.0167.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 0 Jan 19 10:48 f.0259.out > > > > > > > > > > > > > > > > > > > > > > > > login1$ pwd > > > > > > > > /scratch/local/wilde/lab > > > > > > > > login1$ cd outdir > > > > > > > > login1$ ls -lt | head -40 > > > > > > > > total 2772188 > > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 > > f.0023.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 2686976 Jan 19 10:48 > > f.0125.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 2621440 Jan 19 10:48 > > f.0167.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 0 Jan 19 10:48 f.0259.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 > > f.0336.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 > > f.0380.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 > > f.0015.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 > > f.0204.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 > > f.0379.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 > > f.0066.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 > > f.0221.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 > > f.0281.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 > > f.0403.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 > > f.0142.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 > > f.0187.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 > > f.0067.out > > > > > > > > -rw-r--r-- 1 
wilde ci-users 3010301 Jan 19 10:48 > > f.0081.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 > > f.0134.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 > > f.0136.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 > > f.0146.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 > > f.0254.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 > > f.0362.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 > > f.0312.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 > > f.0370.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 > > f.0389.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 > > f.0027.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 > > f.0094.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 > > f.0183.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 > > f.0363.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 > > f.0016.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 > > f.0025.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 > > f.0429.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 3010301 Jan 19 10:48 > > f.0239.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 30103010 Jan 19 10:39 > > f.0024.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 30103010 Jan 19 10:39 > > f.0037.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 30103010 Jan 19 10:39 > > f.0001.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 30103010 Jan 19 10:39 > > f.0042.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 30103010 Jan 19 10:39 > > f.0033.out > > > > > > > > -rw-r--r-- 1 wilde ci-users 30103010 Jan 19 10:39 > > f.0051.out > > > > > > > > l > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > Michael Wilde > > > > > > > > Computation Institute, 
University of Chicago > > > > > > > > Mathematics and Computer Science Division > > > > > > > > Argonne National Laboratory > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > Swift-devel mailing list > > > > > > > > Swift-devel at ci.uchicago.edu > > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

From skenny at uchicago.edu Wed Mar 9 17:04:12 2011
From: skenny at uchicago.edu (Sarah Kenny)
Date: Wed, 9 Mar 2011 15:04:12 -0800
Subject: [Swift-devel] Plan for managing Swift docs
In-Reply-To: <1536232120.167165.1299622861082.JavaMail.root@zimbra.anl.gov>
References: <1536232120.167165.1299622861082.JavaMail.root@zimbra.anl.gov>
Message-ID:

hi all, here's a list of tasks, based on mike's email, that ketan and i discussed and think can be fairly easily assigned as action items.
we determined which ones we think we can/should work on and i'm taking a stab at assigning the others. feel free to edit as you see fit and then ideally we can get them into bugzilla.

*Google docs/sites for Swift User doc*

1 KETAN move all our current doc to google
2 KETAN determine method for save/review/push to live site (google's java app for dumping to file)
3 JUSTIN keeping the docs in sync with the trunk and releases. Ability to associate (and get) the docs for any release branch
4 KETAN content writing and management guide to encourage writing and maintain quality and conventions
5 KETAN clear/uniform look and feel with ability to change the style throughout, when needed (example: https://sites.google.com/a/brain-connectivity-toolbox.net/bct/Home)
6 KETAN code testing: how to ensure that code examples in the docs are correct?
7 DAVID maintenance of external links (eg Google vs svn)
8 DAVID ability to (automatically) render PDFs, standalone HTML, and text
9 SKENNY ability to index terms
10 SKENNY automated system for tracking and publishing changes
11 SKENNY Define document structure & substructure: Site, Quickstart, Tutorial, User Guide, Cookbook (should eventually go into user guide), and Case Studies.
12 SKENNY migrate all of the SWFT wiki content (including docbooks which should go in user doc). (eventually decommission).
13 MIHAEL determine timeline for transition
14 SKENNY make URL more transparent/CI-branded
15 DAVID site style improvements (eg logo stripe; page width, etc)
16 KETAN determine best editor: google sites vs google docs
17 KETAN table of contents management.
18 DAVID how to track/address comments for public revision control? (eg, posting to svn log? Automated tool?)
19 SKENNY determine site, page, and doc naming conventions
20 DAVID Use of Google Groups for access control

~sk

On Tue, Mar 8, 2011 at 2:21 PM, Michael Wilde wrote:
> > as far as the other points in the email...i'm wondering if we want to
> > discuss via skype and try to get it down to a concrete list and divide
> > up the work? given how many varying issues we've touched on here i'd
> > feel a little better if we could get them into action items that we
> > can put into bugzilla as enhancements and assign them...does this seem
> > reasonable? i'm worried back and forth via email might be a little
> > less effective (though i'm happy to put all my comments in email if
> > others prefer that).
>
> I'm at a meeting all this week and can't do much email or voice calls.
>
> I think, Sarah, that you and Ketan meeting and proposing a plan back to the
> group would be great.
>
> Maybe after you meet, a bit of email before you turn the plan into bugzilla
> trackable items would be good. Then using bugz for this (and more of our
> work tracking) would be great.
>
> I'll pipe in as time permits.
>
> - Mike

From wilde at mcs.anl.gov Thu Mar 10 01:35:23 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 10 Mar 2011 01:35:23 -0600 (CST)
Subject: [Swift-devel] Plan for managing Swift docs
In-Reply-To:
Message-ID: <1953554851.172751.1299742523011.JavaMail.root@zimbra.anl.gov>

Sarah, All, this looks very good, and comprehensive. You might need to break it into chunks with some kind of rough timetable. A few notes below.

----- Original Message -----
> hi all, here's a list of tasks, based on mike's email, that ketan and
> i discussed and think can be fairly easily assigned as action items.
> we determined which ones we think we can/should work on and i'm taking
> a stab at assigning the others. feel free to edit as you see fit and
> then ideally we can get them into bugzilla.
>
> Google docs/sites for Swift User doc
>
> 1 KETAN move all our current doc to google
> 2 KETAN determine method for save/review/push to live site (google's
> java app for dumping to file)
> 3 JUSTIN keeping the docs in sync with the trunk and releases. Ability
> to associate (and get) the docs for any release branch
>
> 4 KETAN content writing and management guide to encourage writing and
> maintain quality and conventions
> 5 KETAN clear/uniform look and feel with ability to change the style
> throughout, when needed (example:
> https://sites.google.com/a/brain-connectivity-toolbox.net/bct/Home )
> 6 KETAN code testing: how to ensure that code examples in the docs are
> correct?
> 7 DAVID maintenance of external links (eg Google vs svn)

??? vs svn or vs ci.uchicago?

> 8 DAVID ability to (automatically) render PDFs, standalone HTML, and
> text

Did you discuss how to decide between Google Sites vs Docs for eg the User Guide?

> 9 SKENNY ability to index terms

Low prio, right? Perhaps for now search-within-site is more important than index?

> 10 SKENNY automated system for tracking and publishing changes

Let's make sure we keep it as simple as we can. Maybe this item is done simply by using good conventions for develop-review-launch?

> 11 SKENNY Define document structure & substructure: Site, Quickstart,
> Tutorial, User Guide, Cookbook (should eventually go into user guide),
> and Case Studies.
>
> 12 SKENNY migrate all of the SWFT wiki content (including docbooks
> which should go in user doc). (eventually decommission).

Can be done gradually.

> 13 MIHAEL determine timeline for transition

??? transition of all the above? Seems an odd role for Mihael? I'd have thought you'd do a timeline for this, as Mihael seems least involved in much of these tasks?

> 14 SKENNY make URL more transparent/CI-branded

Then maybe I didn't understand 7 above. Note, Google Sites has some info on this, and CI support can perhaps help. Also, it's low prio.
> 15 DAVID site style improvements (eg logo stripe; page width, etc)
> 16 KETAN determine best editor: google sites vs google docs

OK, cool. Bears on my comment to 8 above (rendering). I see lots of nice examples in Google Docs documentation that suggest nice page-numbered PDFs are possible from Docs "documents". I think there may be some css involved in this.

>
> 17 KETAN table of contents management.
> 18 DAVID how to track/address
> comments for public revision control? (eg, posting to svn log?
> Automated tool?)

Seems Sites has public comment boxes. Not sure we want these unless they can be moderated.

> 19 SKENNY determine site, page, and doc naming conventions
> 20 DAVID Use of Google Groups for access control

Sounds great. A good thing to do next is to filter out the essential from the lower prios.

Nice!

- Mike

> ~sk
>
>
> On Tue, Mar 8, 2011 at 2:21 PM, Michael Wilde < wilde at mcs.anl.gov > wrote:
>
> > as far as the other points in the email...i'm wondering if we want to
> > discuss via skype and try to get it down to a concrete list and divide
> > up the work? given how many varying issues we've touched on here i'd
> > feel a little better if we could get them into action items that we
> > can put into bugzilla as enhancements and assign them...does this seem
> > reasonable? i'm worried back and forth via email might be a little
> > less effective (though i'm happy to put all my comments in email if
> > others prefer that).
>
> I'm at a meeting all this week and can't do much email or voice calls.
>
> I think, Sarah, that you and Ketan meeting and proposing a plan back
> to the group would be great.
>
> Maybe after you meet, a bit of email before you turn the plan into
> bugzilla trackable items would be good. Then using bugz for this (and
> more of our work tracking) would be great.
>
> I'll pipe in as time permits.
>
> - Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From hategan at mcs.anl.gov Thu Mar 10 01:45:49 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 09 Mar 2011 23:45:49 -0800
Subject: [Swift-devel] Plan for managing Swift docs
In-Reply-To: <1953554851.172751.1299742523011.JavaMail.root@zimbra.anl.gov>
References: <1953554851.172751.1299742523011.JavaMail.root@zimbra.anl.gov>
Message-ID: <1299743149.23454.3.camel@blabla2.none>

On Thu, 2011-03-10 at 01:35 -0600, Michael Wilde wrote:
> > 13 MIHAEL determine timeline for transition
>
> ??? transition of all the above? Seems an odd role for Mihael? I'd
> have thought you'd do a timeline for this, as Mihael seems least
> involved in much of these tasks?

I think that's exactly why that is. Sarah couldn't find anything else respectable for me to do in this process :)

From wilde at mcs.anl.gov Thu Mar 10 02:04:01 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 10 Mar 2011 02:04:01 -0600 (CST)
Subject: [Swift-devel] Plan for managing Swift docs
In-Reply-To: <1299743149.23454.3.camel@blabla2.none>
Message-ID: <1498450580.172772.1299744241627.JavaMail.root@zimbra.anl.gov>

----- Original Message -----
> On Thu, 2011-03-10 at 01:35 -0600, Michael Wilde wrote:
> > > 13 MIHAEL determine timeline for transition
> >
> > ??? transition of all the above? Seems an odd role for Mihael? I'd
> > have thought you'd do a timeline for this, as Mihael seems least
> > involved in much of these tasks?
>
> I think that's exactly why that is.
> Sarah couldn't find anything else
> respectable for me to do in this process :)

Except write (lots of Coaster) documentation ;)

- Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From hategan at mcs.anl.gov Thu Mar 10 02:20:35 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 10 Mar 2011 00:20:35 -0800
Subject: [Swift-devel] Plan for managing Swift docs
In-Reply-To: <1498450580.172772.1299744241627.JavaMail.root@zimbra.anl.gov>
References: <1498450580.172772.1299744241627.JavaMail.root@zimbra.anl.gov>
Message-ID: <1299745235.24741.2.camel@blabla2.none>

On Thu, 2011-03-10 at 02:04 -0600, Michael Wilde wrote:
> ----- Original Message -----
> > On Thu, 2011-03-10 at 01:35 -0600, Michael Wilde wrote:
> > > > 13 MIHAEL determine timeline for transition
> > >
> > > ??? transition of all the above? Seems an odd role for Mihael? I'd
> > > have thought you'd do a timeline for this, as Mihael seems least
> > > involved in much of these tasks?
> >
> > I think that's exactly why that is. Sarah couldn't find anything else
> > respectable for me to do in this process :)
>
> Except write (lots of Coaster) documentation ;)

Maybe that's not respectable :)

From wilde at mcs.anl.gov Thu Mar 10 08:27:46 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 10 Mar 2011 08:27:46 -0600 (CST)
Subject: [Swift-devel] Proposal for coaster service options
Message-ID: <533054956.173062.1299767266419.JavaMail.root@zimbra.anl.gov>

Mihael,

I want to work with Ketan to make the various coaster startup scripts that we've developed suitable and documented for end user use.
As a start, can you consider something like the following enhancement to coaster-service:

-L Go into passive mode
-s service port
-S file to write service port number to
   (forces dynamic port selection starting with -s port number)
-w passive worker connection port
-W file to write worker port to
   (forces dynamic port selection starting with -s port number)

This would make the startup scripts much simpler and cleaner: we would not have to run a dummy swift script to force the service to go into passive mode, and we would not have to scrape the standard output for the passive port number, and we could control whether we use a fixed port or a dynamically selected port.

- Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wilde at mcs.anl.gov Thu Mar 10 11:55:01 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 10 Mar 2011 11:55:01 -0600 (CST)
Subject: [Swift-devel] Re: [Swift-user] Swift Coasters on OSG
In-Reply-To:
Message-ID: <2087247844.175507.1299779701095.JavaMail.root@zimbra.anl.gov>

Hi Ketan,

We should try to talk this week briefly about approaches, etc. But I'm delighted that you are plowing ahead!

One suggestion: try each coaster config manually and fully understand its behavior (and how error messages are manifested). That might help you isolate what's happening in the OSG case. What we're doing in these OSG scripts is actually one of the *harder* coaster configs; I thought you might get a better feel by trying some of the simpler configs manually.

We need to work out diagrams of each coaster config, so you can see how everything is configured and connects.
Look through Mihael's paper, and this web page: http://wiki.cogkit.org/wiki/Coasters

You need to make sure you understand these concepts:

- persistent service vs transient service
- embedded service vs external service (embedded is always transient)
- passive service vs active service

Regards,
Mike

----- Original Message -----

Hello,

I have been trying the Swift Coasters on OSG as an exercise from Allan's scripts on CI endpoints (communicado). The coaster services seem to start well. However, I get an error when I submit the Swift workflow. Following are the details of what I am doing:

0. source /opt/osg-1.2.16/setup.sh
   ok.

1. ./mk_catalog.rb whitelist extenci
   contents of whitelist:
   SPRACE_osg-ce.sprace.org.br
   EOF
   Generates empty {worker, slave}.swift, tc.data, {gt2_osg, condor_osg, coaster_osg}.xml

2. ./start_services.sh 2
   Seems to start service normally as per service-*.log files

3. swift -config swift.properties -sites.file coaster_osg.xml slave.swift
   To configure services to run in passive mode, no errors

4. swift -config swift.properties -sites.file condor_osg.xml worker.swift
   Request coaster jobs: no errors

5. swift -config swift.properties -sites.file coaster_osg.xml sleep.swift
   Submit workflow sleep.swift. I get error message: No service contacts available.

Any clues on this error?

Regards,
Ketan

_______________________________________________
Swift-user mailing list
Swift-user at ci.uchicago.edu
http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory
From hategan at mcs.anl.gov Thu Mar 10 11:59:37 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 10 Mar 2011 09:59:37 -0800
Subject: [Swift-devel] Re: Proposal for coaster service options
In-Reply-To: <533054956.173062.1299767266419.JavaMail.root@zimbra.anl.gov>
References: <533054956.173062.1299767266419.JavaMail.root@zimbra.anl.gov>
Message-ID: <1299779977.28297.0.camel@blabla2.none>

Sounds good. These make sense.

Mihael

On Thu, 2011-03-10 at 08:27 -0600, Michael Wilde wrote:
> Mihael,
>
> I want to work with Ketan to make the various coaster startup scripts that we've developed suitable and documented for end user use.
>
> As a start, can you consider something like the following enhancement to coaster-service:
>
> -L Go into passive mode
> -s service port
> -S file to write service port number to
>    (forces dynamic port selection starting with -s port number)
> -w passive worker connection port
> -W file to write worker port to
>    (forces dynamic port selection starting with -s port number)
>
> This would make the startup scripts much simpler and cleaner: we would not have to run a dummy swift script to force the service to go into passive mode, and we would not have to scrape the standard output for the passive port number, and we could control whether we use a fixed port or a dynamically selected port.
>
> - Mike
>

From hategan at mcs.anl.gov Thu Mar 10 12:38:10 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 10 Mar 2011 10:38:10 -0800
Subject: [Swift-devel] coaster IO buffer management
Message-ID: <1299782290.29597.4.camel@blabla2.none>

cog trunk r3057 contains automatic management of the TCP buffer sizes used between the coaster service and its workers. There is a (now hardcoded) range of shared buffer sizes and the code divides that up for all the worker sockets. As new sockets come or go, the size is adjusted accordingly. This is done to ensure that:

1. If few workers are present, they get a sufficiently large buffer size to improve I/O performance.
2.
If many workers are present, the code attempts to limit the amount of memory consumed by TCP buffers. This should theoretically be OK, since it is assumed that many parallel transfers on small TCP buffers will still result in good IO performance because latencies are now parallelized.

Mihael

From ketancmaheshwari at gmail.com Thu Mar 10 15:22:57 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Thu, 10 Mar 2011 15:22:57 -0600
Subject: [Swift-devel] Markmail
Message-ID:

Hello,

While using ANTLR, Taverna and its mailing list, I stumbled upon the Markmail service:

http://www.markmail.org, http://taverna.markmail.org/, http://antlr.markmail.org/

Relevance to us is that among other services, they provide a mailing list indexing service that we could use for the swift-user mailing list. Interesting features are search, sort, filter, and customized views of the mails in the archives.

There is a feedback page through which we can indicate our interest in adding the swift-user list for indexing.

I thought it would be nice to have swift mailing lists up there as well.

Regards,
Ketan

From hategan at mcs.anl.gov Thu Mar 10 15:37:39 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 10 Mar 2011 13:37:39 -0800
Subject: [Swift-devel] Markmail
In-Reply-To:
References:
Message-ID: <1299793059.3127.1.camel@blabla2.none>

I think we get this for free courtesy of google. One can search for:

site:mail.ci.uchicago.edu/pipermail/swift-user problem

Perhaps we should add a form to our site that automatically adds the site: prefix.
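
Such a form would only need to prefix the user's terms with the site: operator and hand the result to the search engine. A minimal sketch in Java, assuming the usual https://www.google.com/search?q= query URL; the class and method names (SiteSearch, buildUrl) are made up for illustration and are not part of any Swift code:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SiteSearch {
    // Archive path taken from the search example above.
    static final String ARCHIVE = "mail.ci.uchicago.edu/pipermail/swift-user";

    // Prefix the terms with the site: operator and URL-encode the whole query.
    // Assumption: the search engine accepts the query in a "q" parameter.
    static String buildUrl(String terms) {
        String query = "site:" + ARCHIVE + " " + terms;
        return "https://www.google.com/search?q="
                + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Use the command-line words as search terms, or "problem" by default.
        String terms = args.length > 0 ? String.join(" ", args) : "problem";
        System.out.println(buildUrl(terms));
    }
}
```

This needs Java 10 or later for the URLEncoder.encode(String, Charset) overload. A web form would do the same concatenation before redirecting the browser.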
On Thu, 2011-03-10 at 15:22 -0600, Ketan Maheshwari wrote:
> Hello,
>
> While using ANTLR, Taverna and its mailing list, I stumbled upon the
> Markmail service:
>
> http://www.markmail.org, http://taverna.markmail.org/, http://antlr.markmail.org/
>
> Relevance to us is that among other services, they provide a mailing
> list indexing service that we could use for swift-user mailing list.
> Interesting features are search, sort, filter, and customized views of
> the mails in the archives.
>
> There is a feedback page through which we can indicate our interest in
> adding swift-user list for indexing.
>
> I thought it would be nice to have swift mailing lists up there as
> well.
>
> Regards,
> Ketan

From jon.monette at gmail.com Thu Mar 10 15:56:25 2011
From: jon.monette at gmail.com (Jonathan Monette)
Date: Thu, 10 Mar 2011 15:56:25 -0600
Subject: [Swift-devel] Re: Workflow waiting on condition hang
In-Reply-To: <4D743355.4050206@gmail.com>
References: <4D5D8F6A.20906@gmail.com> <1297978769.20789.2.camel@blabla2.none> <4D602DF3.6000306@gmail.com> <1299448004.16332.2.camel@blabla2.none> <4D743355.4050206@gmail.com>
Message-ID: <4D794909.3010803@gmail.com>

Ok. My mapping error has magically disappeared. I updated to swift-r4175 and cog-r3057 from swift-r4171 and cog-r3056. I am not sure what happened but have run my code 4-5 times with no mapping error.

I still have the condition hang. However, the output that is reported every 10s if no events happen is very helpful. I am going through my scripts to track down exactly which variable is hanging and whether it hangs at the same place.

On 3/6/11 7:22 PM, Jonathan Monette wrote:
> I will test this out. Currently I have another problem that
> appeared. It was reported in thread "[Swift-devel] Error in Swift
> mapping".
> I have not put together a simple script that re-creates
> this problem as I just went through my first wave of midterms. I will
> see if I can put together a script this week.
>
> On 3/6/11 3:46 PM, Mihael Hategan wrote:
>> Given that this does not seem to be a java deadlock, I added a hang
>> checker to swift. If nothing is being executed inside karajan and no
>> jobs are running in any ten second interval, it will dump future and
>> thread information to the log file.
>>
>> This is in swift trunk r4171.
>>
>> Can you give that a try and report back the details?
>>
>> Mihael
>>
>> On Sat, 2011-02-19 at 14:54 -0600, Jonathan Monette wrote:
>>> Yes. It always seems to hang at the same place.
>>>
>>> Attached is my montage script. It hangs in the mFitBatch function at
>>> the mConcatFit app call. All other files have been created up to that
>>> step but that app never runs.
>>>
>>> On 2/17/11 3:39 PM, Mihael Hategan wrote:
>>>> On Thu, 2011-02-17 at 15:13 -0600, Jonathan Monette wrote:
>>>>> Hello,
>>>>> My workflow seems to be hanging. This is trunk swift-r4107 and
>>>>> cog-r3051. Attached is a compressed log file and the jstack output for
>>>>> my workflow. The jstack file says it is waiting for a condition and my
>>>>> workflow hangs.
>>>> There's lots of stuff waiting because that's what they do when they
>>>> don't have anything else to do. So I don't see a problem there.
>>>>
>>>> There are no jobs going to the coaster service, so clearly things aren't
>>>> progressing.
>>>>
>>>> So now the question is: does this happen every time you run it or just
>>>> some times?
>>>>
>>>> Also, please send the swift script.
>>>>
>>>> Mihael

From ketancmaheshwari at gmail.com Fri Mar 11 13:03:31 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Fri, 11 Mar 2011 13:03:31 -0600
Subject: [Swift-devel] logo
Message-ID:

Do we have a swift logo somewhere?
I found the swift banner on the homepage but no logo.

Ketan

From tim.g.armstrong at gmail.com Fri Mar 11 13:26:04 2011
From: tim.g.armstrong at gmail.com (Tim Armstrong)
Date: Fri, 11 Mar 2011 13:26:04 -0600
Subject: [Swift-devel] Executing a swift task from Java application
Message-ID:

Hi swift-devel,

I'm looking currently at using swift to manage files (staging in and out, etc) and execute tasks on behalf of another application.

Is it possible to programmatically construct and execute Swift app tasks? Would it make sense to just use the coasters module on its own without the Swift runtime?

I've had a look through the swift code base, and it hasn't become obvious to me yet how I would achieve that, so I thought I should ask some people who know their way around the code better. Is there a class or set of classes I should start looking at to understand how App tasks are instantiated and run?

- Tim

From aespinosa at cs.uchicago.edu Fri Mar 11 13:31:07 2011
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Fri, 11 Mar 2011 13:31:07 -0600
Subject: [Swift-devel] Executing a swift task from Java application
In-Reply-To:
References:
Message-ID:

Hi Tim,

I use Ruby or Python's templating engines to generate some of my app tasks that cannot be wrapped around Swift's current constructs. Templating engines are also useful to generate the sites.xml and tc.data to take input from a resource monitoring service like OSG's ReSS.

2011/3/11 Tim Armstrong :
> Hi swift-devel,
> I'm looking currently at using swift to manage files (staging in and out,
> etc) and execute tasks on behalf of another application.
>
> Is it possible to programmatically construct and execute Swift app tasks?
> Would it make sense to just use the coasters module on its own without the
> Swift runtime?
>
> I've had a look through the swift code base, and it hasn't become obvious to
> me yet how I would achieve that, so I thought I should ask some people who
> know their way around the code better. Is there a class or set of classes I
> should look at start looking at to understand how App tasks are instantiated
> and run?

vdl-int.k in libexec should be a good start. If you want something lower-level and more fine-grained, try playing around with direct Karajan scripts.

>
> - Tim
>
>

-Allan

From hategan at mcs.anl.gov Fri Mar 11 15:30:00 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 11 Mar 2011 13:30:00 -0800
Subject: [Swift-devel] Executing a swift task from Java application
In-Reply-To:
References:
Message-ID: <1299879000.28860.9.camel@blabla2.none>

On Fri, 2011-03-11 at 13:26 -0600, Tim Armstrong wrote:
> Hi swift-devel,
> I'm looking currently at using swift to manage files (staging in and
> out, etc) and execute tasks on behalf of another application.
>
> Is it possible to programmatically construct and execute Swift app
> tasks?

Yes. Swift runs on top of something that can do that.

Here's an example:

http://cogkit.svn.sourceforge.net/viewvc/cogkit/trunk/current/src/cog/modules/abstraction-common/src/org/globus/cog/abstraction/tools/execution/JobSubmission.java?revision=2969&view=markup

Essentially you have a Task which has a Specification (which contains the executable and all the details), a provider (which is the mechanism used to submit the job - coasters for example), and a service (which may have a job manager).

Swift profiles (such as coaster parameters) are put directly into the specification attributes.

> Would it make sense to just use the coasters module on its own
> without the Swift runtime?

That I can't answer, but I can imagine certain problems for which that would be the case. You won't have scheduling, throttling, retries, or any of the high-level management that swift does, though.
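
The Task / Specification / provider / service relationship described above can be sketched roughly as follows. This is only an illustration of the shape of the model: the class names (SpecSketch, ServiceSketch, TaskSketch, TaskModelDemo), the attribute key "exampleParam", and the contact and job manager values are all hypothetical placeholders, and none of this is the CoG kit API. The linked JobSubmission.java is the place to look for the real classes and calls.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in types for illustration only; these are NOT the CoG kit classes.

/** Builds one example task and prints a one-line description of it. */
final class TaskModelDemo {
    static TaskSketch example() {
        SpecSketch spec = new SpecSketch("/bin/echo");
        spec.arguments.add("hello");
        // Placeholder key/value standing in for a Swift profile entry.
        spec.attributes.put("exampleParam", "1");
        // Placeholder contact and job manager values.
        ServiceSketch service = new ServiceSketch("localhost", "pbs");
        return new TaskSketch(spec, "coaster", service);
    }

    public static void main(String[] args) {
        System.out.println(example().describe());
    }
}

/** What to run: the executable, its arguments, and free-form attributes. */
final class SpecSketch {
    final String executable;
    final List<String> arguments = new ArrayList<>();
    // Profile entries (such as coaster parameters) would be carried here.
    final Map<String, String> attributes = new LinkedHashMap<>();

    SpecSketch(String executable) {
        this.executable = executable;
    }
}

/** Where to run: a contact string plus an optional job manager. */
final class ServiceSketch {
    final String contact;
    final String jobManager; // null when the service has no job manager

    ServiceSketch(String contact, String jobManager) {
        this.contact = contact;
        this.jobManager = jobManager;
    }
}

/** A task ties a specification to a provider (the submission mechanism) and a service. */
final class TaskSketch {
    final SpecSketch spec;
    final String provider;
    final ServiceSketch service;

    TaskSketch(SpecSketch spec, String provider, ServiceSketch service) {
        this.spec = spec;
        this.provider = provider;
        this.service = service;
    }

    /** Renders the task as a single line, purely for inspection. */
    String describe() {
        String manager = service.jobManager == null ? "none" : service.jobManager;
        return provider + "://" + service.contact + " [" + manager + "] "
                + spec.executable + " " + String.join(" ", spec.arguments)
                + " " + spec.attributes;
    }
}
```

Note that the sketch only models the data; submission, status callbacks, and everything swift adds on top (scheduling, throttling, retries) are deliberately left out.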
Mihael From iraicu at cs.iit.edu Fri Mar 11 16:02:30 2011 From: iraicu at cs.iit.edu (Ioan Raicu) Date: Fri, 11 Mar 2011 16:02:30 -0600 Subject: [Swift-devel] Executing a swift task from Java application In-Reply-To: References: Message-ID: <4D7A9BF6.4080602@cs.iit.edu> Perhaps what you want is something like Falkon? Falkon is a stand-alone system, that probably gives you most of the functionality of coasters. If you are already familiar with the Swift code-base, perhaps its best to figure out how to use coasters separately. Ioan -- ================================================================= Ioan Raicu, Ph.D. Assistant Professor, Illinois Institute of Technology (IIT) Guest Research Faculty, Argonne National Laboratory (ANL) ================================================================= Data-Intensive Distributed Systems Laboratory, CS/IIT Distributed Systems Laboratory, MCS/ANL ================================================================= Cel: 1-847-722-0876 Office: 1-312-567-5704 Email: iraicu at cs.iit.edu Web: http://www.cs.iit.edu/~iraicu/ Web: http://datasys.cs.iit.edu/ ================================================================= ================================================================= On 3/11/2011 1:26 PM, Tim Armstrong wrote: > Hi swift-devel, > I'm looking currently at using swift to manage files (staging in and > out, etc) and execute tasks on behalf of another application. > > Is it possible to programmatically construct and execute Swift app > tasks? Would it make sense to just use the coasters module on its own > without the Swift runtime? > > I've had a look through the swift code base, and it hasn't become > obvious to me yet how I would achieve that, so I thought I should ask > some people who know their way around the code better. Is there a > class or set of classes I should look at start looking at to > understand how App tasks are instantiated and run? 
> > - Tim > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.g.armstrong at gmail.com Fri Mar 11 16:02:53 2011 From: tim.g.armstrong at gmail.com (Tim Armstrong) Date: Fri, 11 Mar 2011 16:02:53 -0600 Subject: [Swift-devel] Executing a swift task from Java application In-Reply-To: <1299879000.28860.9.camel@blabla2.none> References: <1299879000.28860.9.camel@blabla2.none> Message-ID: Cool, thanks, that helps a lot - this is about the level of abstraction that it would be good to work with - Tim On Fri, Mar 11, 2011 at 3:30 PM, Mihael Hategan wrote: > On Fri, 2011-03-11 at 13:26 -0600, Tim Armstrong wrote: > > Hi swift-devel, > > I'm looking currently at using swift to manage files (staging in and > > out, etc) and execute tasks on behalf of another application. > > > > Is it possible to programmatically construct and execute Swift app > > tasks? > > Yes. Swift runs on top of something that can do that. > > Here's an example: > > http://cogkit.svn.sourceforge.net/viewvc/cogkit/trunk/current/src/cog/modules/abstraction-common/src/org/globus/cog/abstraction/tools/execution/JobSubmission.java?revision=2969&view=markup > > Essentially you have a Task which has a Specification (which contains > the executable and all the details), a provider (which is the mechanism > used to submit the job - coasters for example), and a service (which may > have a job manager). > > Swift profiles (such as coaster parameters) are directly put into the > specification attributes. > > > Would it make sense to just use the coasters module on its own > > without the Swift runtime? > > That I can't answer, but I can imagine certain problems for which that > would be the case. You won't have scheduling, throttling, retries, and > any of the high level management that swift does though. 
> > Mihael > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at mcs.anl.gov Sat Mar 12 16:04:59 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sat, 12 Mar 2011 16:04:59 -0600 (CST) Subject: [Swift-devel] logo In-Reply-To: Message-ID: <443783718.183204.1299967499077.JavaMail.root@zimbra.anl.gov> Attached is what I think is a vector version in PDF and same image as PNG. I think its from the attached Illustrator file, not sure. The PDF seems to have the word swift and the birds in vector format but the shadow looks like a bitmap. Also, the background color is red rather than the maroon used on the web. Just use as-is and we'll find an artist to touch up the image for us (not worth researcher time to mess with). - Mike ----- Original Message ----- Do we have swift logo somewhere? I found swift banner on the homepage but no logo. Ketan _______________________________________________ Swift-devel mailing list Swift-devel at ci.uchicago.edu http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: swift_logo0101.ai Type: application/illustrator Size: 74271 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: SwiftLogo_Red.pdf Type: application/pdf Size: 42748 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: SwiftLogo_Red.png Type: image/png Size: 72206 bytes Desc: not available URL:

From wilde at mcs.anl.gov Mon Mar 14 11:07:10 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 14 Mar 2011 11:07:10 -0500 (CDT)
Subject: [Swift-devel] Fwd: [Swift-user] Swift is stuck with 5K jobs
In-Reply-To:
Message-ID: <517817969.186036.1300118830021.JavaMail.root@zimbra.anl.gov>

We should document this along with an easy way to set the limit.

- Mike

----- Forwarded Message -----
From: "Allan Espinosa"
To: "Andriy Fedorov"
Cc: swift-user at ci.uchicago.edu
Sent: Monday, March 14, 2011 10:15:57 AM
Subject: Re: [Swift-user] Swift is stuck with 5K jobs

Hello Andriy,

The default package may have a small max heap limit. Usually, I apply this patch whenever I get a new version of Swift:

--- old/bin/swift 2010-10-12 12:18:47.000000000 -0500
+++ new/bin/swift 2010-10-12 12:18:37.000000000 -0500
@@ -9,7 +9,7 @@
 CYGWIN=
 CPDELIM=":"
-HEAPMAX=256M
+HEAPMAX=4096M
 if echo `uname` | grep -i "cygwin"; then
 	CYGWIN="yes"

Works well with 800K jobs.

-Allan

2011/3/14 Andriy Fedorov :
> Hi,
>
> I am using swift with coasters on NCSA Abe. I use binary build of
> swift 0.92. My script should generate about 5K individual jobs. When I
> try to run it, I have
>
> Swift svn swift-r4157 cog-r3056
>
> RunID: 20110314-0951-f3c45zja
> Progress:
> Exception in thread "Timer-0" java.lang.OutOfMemoryError: Java heap space
>
> Exception in thread "SIGINT handler"
> Exception in thread "SIGINT handler" Exception in thread "SIGTERM handler"
>
> After this error, I am not able to terminate the script, and no jobs
> get scheduled to pbs apparently.
>
> Am I hitting some limit? Is 5K jobs too much?
>
> How do I terminate swift now not to waste cycles of the head node?
>
> Thanks
> --
> Andriy Fedorov, Ph.D.
>
> Research Fellow
> Brigham and Women's Hospital
> Harvard Medical School
> 75 Francis Street
> Boston, MA 02115 USA
> fedorov at bwh.harvard.edu
> (617) 525-6258 (office)
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>
>

--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago
_______________________________________________
Swift-user mailing list
Swift-user at ci.uchicago.edu
http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From bugzilla-daemon at mcs.anl.gov Mon Mar 14 11:34:44 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Mon, 14 Mar 2011 11:34:44 -0500 (CDT)
Subject: [Swift-devel] [Bug 261] update.sh script (for pushing web content live) gives errors
In-Reply-To:
References:
Message-ID: <20110314163444.6D3D81C073@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=261

Justin Wozniak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wozniak at mcs.anl.gov
         AssignedTo|skenny at uchicago.edu      |wozniak at mcs.anl.gov

--- Comment #1 from Justin Wozniak 2011-03-14 11:34:44 ---
I probably caused this, I'll take a look.

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the reporter.
From wilde at mcs.anl.gov Tue Mar 15 12:54:30 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 15 Mar 2011 12:54:30 -0500 (CDT)
Subject: [Swift-devel] Plan for managing Swift docs
In-Reply-To:
Message-ID: <815705898.191345.1300211670943.JavaMail.root@zimbra.anl.gov>

Sounds good - but related to this, something very important: can you create as soon as possible a page we can all share that specifies the (evolving) doc conventions, the most important part of which is the names of the sites to use for various stages of the process (i.e., development, review, live) and what page naming strategy and structure to use.

I've lost track of what is going where and don't know where to turn to find things, nor where I can edit prototypes or contribute to docs in progress.

Ketan tells me that he doesn't seem to have the right access to edit / create in the development site(s)?

And he too doesn't know the answers to what is where - so we need more coordination and communication to get things going so that everyone can contribute - which is the major reason we are moving to online doc tools.

Keep in mind that we can use email to *discuss* things, but decisions and spec info need to be in a well-known, easy-to-find place that we all can share.

Lastly, this really needs to all stay on swift-devel so that we can coordinate as a team, especially since many people don't work on Swift every day.

Thanks,

Mike

----- Original Message -----

i have not done so...if no one else (and you haven't already) you want to go ahead and create it and share that info?

~sk

On Fri, Mar 11, 2011 at 2:44 PM, David Kelly < dk0966 at cs.ship.edu > wrote:

Has anyone already created a generic swift google account? If so, could they please send me the username and password for the purpose of setting up automation via google command line tools?

Thanks,
David

On Thu, Mar 10, 2011 at 2:35 AM, Michael Wilde < wilde at mcs.anl.gov > wrote:

Sarah, All, this looks very good, and comprehensive.
You might need to break it into chunks with some kind of rough timetable.

A few notes below.

----- Original Message -----
> hi all, here's a list of tasks, based on mike's email, that ketan and
> i discussed and think can be fairly easily assigned as action items.
> we determined which ones we think we can/should work on and i'm taking
> a stab at assigning the others. feel free to edit as you see fit and
> then ideally we can get them into bugzilla.
>
> Google docs/sites for Swift User doc
>
> 1 KETAN move all our current doc to google
> 2 KETAN determine method for save/review/push to live site (google's
> java app for dumping to file)
> 3 JUSTIN keeping the docs in sync with the trunk and releases. Ability
> to associate (and get) the docs for any release branch
>
> 4 KETAN content writing and management guide to encourage writing and
> maintain quality and conventions
> 5 KETAN clear/uniform look and feel with ability to change the style
> throughout, when needed (example:
> https://sites.google.com/a/brain-connectivity-toolbox.net/bct/Home )
> 6 KETAN code testing: how to ensure that code examples in the docs are
> correct?
> 7 DAVID maintenance of external links (e.g. Google vs svn)

??? vs svn or vs ci.uchicago?

> 8 DAVID ability to (automatically) render PDFs, standalone HTML, and
> text

Did you discuss how to decide between Google Sites vs Docs for e.g. the User Guide?

> 9 SKENNY ability to index terms

Low prio, right? Perhaps for now search-within-site is more important than index?

> 10 SKENNY automated system for tracking and publishing changes

Let's make sure we keep it as simple as we can. Maybe this item is done simply by using good conventions for develop-review-launch?

> 11 SKENNY Define document structure & substructure: Site, Quickstart,
> Tutorial, User Guide, Cookbook (should eventually go into user guide),
> and Case Studies.
>
> 12 SKENNY migrate all of the SWFT wiki content (including docbooks
> which should go in user doc). (eventually decommission).

Can be done gradually.

> 13 MIHAEL determine timeline for transition

??? transition of all the above? Seems an odd role for Mihael? I'd have thought you'd do a timeline for this, as Mihael seems least involved in most of these tasks?

> 14 SKENNY make URL more transparent/CI-branded

Then maybe I didn't understand 7 above. Note, Google Sites has some info on this, and CI support can perhaps help. Also, it's low prio.

> 15 DAVID site style improvements (e.g. logo stripe, page width, etc.)
> 16 KETAN determine best editor: google sites vs google docs

OK, cool. Bears on my comment to 8 above (rendering).

I see lots of nice examples in Google Docs documentation that suggest nice page-numbered PDFs are possible from Docs "documents". I think there may be some css involved in this.

>
> 17 KETAN table of contents management.
> 18 DAVID how to track/address
> comments for public revision control? (e.g., posting to svn log?
> Automated tool?)

Seems Sites has public comment boxes. Not sure we want these unless they can be moderated.

> 19 SKENNY determine site, page, and doc naming conventions
> 20 DAVID Use of Google Groups for access control

Sounds great. A good thing to do next is to filter out the essential from the lower prios.

Nice!

- Mike

> ~sk
>
>
> On Tue, Mar 8, 2011 at 2:21 PM, Michael Wilde < wilde at mcs.anl.gov > wrote:
>
> > as far as the other points in the email...i'm wondering if we want to
> > discuss via skype and try to get it down to a concrete list and divide
> > up the work? given how many varying issues we've touched on here i'd
> > feel a little better if we could get them into action items that we
> > can put into bugzilla as enhancements and assign them...does this seem
> > reasonable? i'm worried back and forth via email might be a little
> > less effective (though i'm happy to put all my comments in email if
> > others prefer that).
>
> I'm at a meeting all this week and can't do much email or voice calls.
> > I think, Sarah, that you and Ketan meeting and proposing a plan back > to the group would be great. > > Maybe after you meet, a bit of email before you turn the plan into > bugzilla trackable items would be good. Then using bugz for this (and > more of our work tracking) would be great. > > I'll pipe in as time permits. > > - Mike -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory _______________________________________________ Swift-devel mailing list Swift-devel at ci.uchicago.edu http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: From skenny at uchicago.edu Tue Mar 15 13:16:08 2011 From: skenny at uchicago.edu (Sarah Kenny) Date: Tue, 15 Mar 2011 11:16:08 -0700 Subject: [Swift-devel] Plan for managing Swift docs In-Reply-To: <815705898.191345.1300211670943.JavaMail.root@zimbra.anl.gov> References: <815705898.191345.1300211670943.JavaMail.root@zimbra.anl.gov> Message-ID: right, so, my list of tasks was an attempt at distilling mike's initial email into action items and was meant as a rough first pass that i was hoping others would contribute to or help to clarify/refine...it seems that i didn't fully understand/correctly interpret some of the items mike had in mind. mike, it sounds like you're saying it's best to move that list to a shared doc so that everyone can edit (?) if so i can do that...are we thinking i should create a google doc for this or a page on the site? otherwise i can try to reply to the inline comments. 
however, in either case i may need some help clarifying what was initially intended :) On Tue, Mar 15, 2011 at 10:54 AM, Michael Wilde wrote: > Sounds good - but related to this, something very important: can you create > as soon as possible a page we can all share that specifies the (evolving) > doc conventions, the most important part of which is the names of the sites > to use for various stages of the process (ie: development, review, live) and > what page naming strategy and structure to use. > > Ive lost track of what is going where and dont know where to turn to find > things, nor where I can edit prototypes or contribute to docs in progress. > > Ketan tells me that he doesnt seem to have the right access to edit / > create in the development site(s)? > > And he too doesnt know the answers to what is where - so we need more > coordination and communication to get things going so that everyone can > contribute - which is the major reason we are moving to online doc tools. > > Keep in mind that we can use email to *discuss* things, but decisions and > spec info need to be in a well known, easy to find place that we all can > share. > > Lastly, this really needs to all stay on swift-devel so that we can > coordinate as a team, especially since many people dont work on Swift every > day. > > > Thanks, > > Mike > > > > ------------------------------ > > i have not done so...if no one else (and you haven't already) you want to > go ahead and create it and share that info? > > ~sk > > On Fri, Mar 11, 2011 at 2:44 PM, David Kelly wrote: > >> Has anyone already created a generic swift google account? If so, could >> they please send me the username and password for the purpose of setting up >> automation via google command line tools? >> >> Thanks, >> David >> >> On Thu, Mar 10, 2011 at 2:35 AM, Michael Wilde wrote: >> >>> Sarah, All, this looks very good, and comprehensive. >>> >>> You might need to break it into chunks with some kind of rough timetable. 

From wilde at mcs.anl.gov Tue Mar 15 13:26:55 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 15 Mar 2011 13:26:55 -0500 (CDT)
Subject: [Swift-devel] Plan for managing Swift docs
In-Reply-To:
Message-ID: <100221705.191579.1300213615260.JavaMail.root@zimbra.anl.gov>

----- Original Message -----
> right, so, my list of tasks was an attempt at distilling mike's
> initial email into action items and was meant as a rough first pass
> that i was hoping others would contribute to or help to
> clarify/refine...

Exactly - and it was an excellent start! I think this is going really well, and I'm just trying to keep it moving and do some tuning.

> it seems that i didn't fully understand/correctly
> interpret some of the items mike had in mind.

No problem - the list is long, and writing docs is a non-trivial effort that, like code, takes much discussion and coordination. Plus what's in my mind is just a starting point and certainly needs much clarification, discussion and improvement.

> mike, it sounds like you're saying it's best to move that list to a
> shared doc so that everyone can edit (?)

Right - edit, track, revise the plan, etc.

> if so i can do that...are we
> thinking i should create a google doc for this or a page on the site?

Either. It's the kind of thing we would have previously kept in the SWFT wiki. We can either do that, or, as I think better, migrate the SWFT wiki to a fresh SWFT site that we use for the exact same purpose. But do a better separation between internal group process docs and user-facing prelim docs.

Also, I think we need 2 such internal wikis - one for swift-devel as an open source project, and one for our team, for inward-facing issues related to our jobs at Argonne and CI.

So let's make a swift-devel web (or a sub dir for this on the main swift site) and a SWFT web for private group coordination.

But all that said, the *main* thing I was asking for in the prior message was to put on the planning page the info that Swift writers need to know where to write what, and how it goes from draft to review to live: mainly just site names and structure for now.

Thanks,

Mike

> otherwise i can try to reply to the inline comments.
> >
> > - Mike
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From ketancmaheshwari at gmail.com Tue Mar 15 15:21:35 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Tue, 15 Mar 2011 15:21:35 -0500
Subject: [Swift-devel] site level rights to google sites
Message-ID:

Hi,

Could you kindly grant me site level rights to swift google site. It
seems I have only page level rights which is not enabling all the site
level options for me. Perhaps generation of ToC is also part of that. I
could not figure out yet how ToC works on Google Sites from the
menu/tools available to me.

Ketan

From skenny at uchicago.edu Tue Mar 15 15:29:23 2011
From: skenny at uchicago.edu (Sarah Kenny)
Date: Tue, 15 Mar 2011 13:29:23 -0700
Subject: [Swift-devel] site level rights to google sites
In-Reply-To:
References:
Message-ID:

done

On Tue, Mar 15, 2011 at 1:21 PM, Ketan Maheshwari <
ketancmaheshwari at gmail.com> wrote:

> Hi,
>
> Could you kindly grant me site level rights to swift google site. It seems
> I have only page level rights which is not enabling all the site level
> options for me. Perhaps generation of ToC is also part of that. I could not
> figure out yet how ToC works on Google Sites from the menu/tools available to
> me.
>
> Ketan
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>

From wilde at mcs.anl.gov Tue Mar 15 16:25:52 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 15 Mar 2011 16:25:52 -0500 (CDT)
Subject: [Swift-devel] Which swift doc site to edit?
In-Reply-To:
Message-ID: <1372398904.193634.1300224352921.JavaMail.root@zimbra.anl.gov>

Ketan, I just added you as owner to this other site:

https://sites.google.com/site/swiftguide/

Sarah, that's the one I was referring to in that prior email this
afternoon. I didn't have the site URL at my fingertips, but I recalled
that we had one site for development and one to show users. I'm not sure
I have that right, and that's what I wanted to see documented so that
people like Ketan who I am asking to write things know where to put
things.

- Mike

----- Original Message -----

Mike,

I have access to this site only:
https://sites.google.com/site/swiftparallelscripting/

Ketan

On Tue, Mar 15, 2011 at 3:18 PM, Ketan Maheshwari <
ketancmaheshwari at gmail.com > wrote:

Mike,

If you can start a page somewhere, I can start adding to it with
instructions for running the swift scripts (which are in that /scripts
dir)

https://sites.google.com/site/swiftparallelscripting/home/cookbook

I think we can add contents under the Swift on Beagle header here on
cookbook.

Ketan

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From skenny at uchicago.edu Tue Mar 15 16:33:48 2011
From: skenny at uchicago.edu (Sarah Kenny)
Date: Tue, 15 Mar 2011 14:33:48 -0700
Subject: [Swift-devel] Re: Which swift doc site to edit?
Message-ID:

here is how i set it up initially and i was intending to keep with this
plan: one for development and the other for production.

i'm also in the process of creating a site just for developers...but
others should let me know if we want to do it differently
~sk

---------- Forwarded message ----------
From: Sarah Kenny
Date: Wed, Mar 2, 2011 at 5:50 PM
Subject: google sites doc
To: Swift Devel

i put the new 'production' site here:
https://sites.google.com/site/swiftguide/

i just copied over the existing site which i've renamed the
'development' site: https://sites.google.com/site/swiftparallelscripting/

i believe the proper permissions were inherited on the copy, but let me
know if you have any trouble accessing.

~sk

From skenny at uchicago.edu Tue Mar 15 16:39:14 2011
From: skenny at uchicago.edu (Sarah Kenny)
Date: Tue, 15 Mar 2011 14:39:14 -0700
Subject: [Swift-devel] Re: Which swift doc site to edit?
In-Reply-To:
References:
Message-ID:

looks like wilde, ketan and david have ownership of both the development
and production sites. not sure about justin and mihael, i think i need
their gmail usernames to add them since i don't see them on there (?)

~sk

On Tue, Mar 15, 2011 at 2:33 PM, Sarah Kenny wrote:

> here is how i set it up initially and i was intending to keep with this
> plan: one for development and the other for production.
>
> i'm also in the process of creating a site just for developers...but others
> should let me know if we want to do it differently
> ~sk
>
> ---------- Forwarded message ----------
> From: Sarah Kenny
> Date: Wed, Mar 2, 2011 at 5:50 PM
> Subject: google sites doc
> To: Swift Devel
>
> i put the new 'production' site here:
> https://sites.google.com/site/swiftguide/
>
> i just copied over the existing site which i've renamed the 'development'
> site: https://sites.google.com/site/swiftparallelscripting/
>
> i believe the proper permissions were inherited on the copy, but let me
> know if you have any trouble accessing.
>
> ~sk

From wozniak at mcs.anl.gov Tue Mar 15 16:40:52 2011
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Tue, 15 Mar 2011 16:40:52 -0500 (CDT)
Subject: [Swift-devel] Re: Which swift doc site to edit?
In-Reply-To:
References:
Message-ID:

I am wozniak.mcs at gmail.com

On Tue, 15 Mar 2011, Sarah Kenny wrote:

> looks like wilde, ketan and david have ownership of both the development and
> production sites. not sure about justin and mihael, i think i need their
> gmail usernames to add them since i don't see them on there (?)
>
> ~sk
>
> On Tue, Mar 15, 2011 at 2:33 PM, Sarah Kenny wrote:
>
>> here is how i set it up initially and i was intending to keep with this
>> plan: one for development and the other for production.
>>
>> i'm also in the process of creating a site just for developers...but others
>> should let me know if we want to do it differently
>> ~sk
>>
>> ---------- Forwarded message ----------
>> From: Sarah Kenny
>> Date: Wed, Mar 2, 2011 at 5:50 PM
>> Subject: google sites doc
>> To: Swift Devel
>>
>> i put the new 'production' site here:
>> https://sites.google.com/site/swiftguide/
>>
>> i just copied over the existing site which i've renamed the 'development'
>> site: https://sites.google.com/site/swiftparallelscripting/
>>
>> i believe the proper permissions were inherited on the copy, but let me
>> know if you have any trouble accessing.
>>
>> ~sk
>

--
Justin M Wozniak

From wilde at mcs.anl.gov Tue Mar 15 16:41:12 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 15 Mar 2011 16:41:12 -0500 (CDT)
Subject: [Swift-devel] Re: Which swift doc site to edit?
In-Reply-To:
Message-ID: <288812277.193753.1300225272439.JavaMail.root@zimbra.anl.gov>

right, and I also think you might want to try to find out why our google
group is not working for this. If we had that working, then all we'd
need to do for any access control is add the whole developer group as
one entity.

- Mike

----- Original Message -----

looks like wilde, ketan and david have ownership of both the development
and production sites. not sure about justin and mihael, i think i need
their gmail usernames to add them since i don't see them on there (?)

~sk

On Tue, Mar 15, 2011 at 2:33 PM, Sarah Kenny < skenny at uchicago.edu > wrote:

here is how i set it up initially and i was intending to keep with this
plan: one for development and the other for production.

i'm also in the process of creating a site just for developers...but
others should let me know if we want to do it differently
~sk

---------- Forwarded message ----------
From: Sarah Kenny < skenny at uchicago.edu >
Date: Wed, Mar 2, 2011 at 5:50 PM
Subject: google sites doc
To: Swift Devel < swift-devel at ci.uchicago.edu >

i put the new 'production' site here:
https://sites.google.com/site/swiftguide/

i just copied over the existing site which i've renamed the
'development' site: https://sites.google.com/site/swiftparallelscripting/

i believe the proper permissions were inherited on the copy, but let me
know if you have any trouble accessing.

~sk

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wilde at mcs.anl.gov Tue Mar 15 16:46:30 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 15 Mar 2011 16:46:30 -0500 (CDT)
Subject: [Swift-devel] Re: Which swift doc site to edit?
In-Reply-To:
Message-ID: <1234861832.193785.1300225590506.JavaMail.root@zimbra.anl.gov>

I just changed the settings of the Swift Research Google group to make
all members "owners" and managers of the group.

So now, Sarah, I think you can both get everyone's gmail addrs from the
group, and also see if you can set all the necessary site access rights
via the group.

Long term this will save a lot of work. It seems to work for another
project we work on, but for some reason not for the Swift group - I'm
missing a setting somewhere, I presume.

- Mike

----- Original Message -----

looks like wilde, ketan and david have ownership of both the development
and production sites. not sure about justin and mihael, i think i need
their gmail usernames to add them since i don't see them on there (?)

~sk

On Tue, Mar 15, 2011 at 2:33 PM, Sarah Kenny < skenny at uchicago.edu > wrote:

here is how i set it up initially and i was intending to keep with this
plan: one for development and the other for production.

i'm also in the process of creating a site just for developers...but
others should let me know if we want to do it differently
~sk

---------- Forwarded message ----------
From: Sarah Kenny < skenny at uchicago.edu >
Date: Wed, Mar 2, 2011 at 5:50 PM
Subject: google sites doc
To: Swift Devel < swift-devel at ci.uchicago.edu >

i put the new 'production' site here:
https://sites.google.com/site/swiftguide/

i just copied over the existing site which i've renamed the
'development' site: https://sites.google.com/site/swiftparallelscripting/

i believe the proper permissions were inherited on the copy, but let me
know if you have any trouble accessing.

~sk

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wilde at mcs.anl.gov Tue Mar 15 16:50:29 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 15 Mar 2011 16:50:29 -0500 (CDT)
Subject: [Swift-devel] Re: Which swift doc site to edit?
In-Reply-To:
Message-ID: <427236510.193815.1300225829468.JavaMail.root@zimbra.anl.gov>

Great, that sounds good, Sarah - thanks.

I think Ketan was also wondering how you created the TOC on the current
User Guide example. Was that based on the tags that were created (and
retained) when I pasted the contents from the docbook-generated live
site to the Google sites prototype?

Did any ideas get discussed on whether Google Sites or Documents is best
for the Users Guide?

- Mike

----- Original Message -----
> here is how i set it up initially and i was intending to keep with
> this plan: one for development and the other for production.
>
> i'm also in the process of creating a site just for developers...but
> others should let me know if we want to do it differently
> ~sk
>
> ---------- Forwarded message ----------
> From: Sarah Kenny < skenny at uchicago.edu >
> Date: Wed, Mar 2, 2011 at 5:50 PM
> Subject: google sites doc
> To: Swift Devel < swift-devel at ci.uchicago.edu >
>
> i put the new 'production' site here:
> https://sites.google.com/site/swiftguide/
>
> i just copied over the existing site which i've renamed the
> 'development' site:
> https://sites.google.com/site/swiftparallelscripting/
>
> i believe the proper permissions were inherited on the copy, but let
> me know if you have any trouble accessing.
>
> ~sk

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From hategan at mcs.anl.gov Tue Mar 15 17:16:21 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 15 Mar 2011 15:16:21 -0700
Subject: [Swift-devel] Re: Which swift doc site to edit?
In-Reply-To:
References:
Message-ID: <1300227381.10328.0.camel@blabla2.none>

On Tue, 2011-03-15 at 14:39 -0700, Sarah Kenny wrote:
> looks like wilde, ketan and david have ownership of both the
> development and production sites. not sure about justin and mihael, i
> think i need their gmail usernames to add them since i don't see them
> on there (?)

hategan at mcs.anl.gov

>
> ~sk
>
> On Tue, Mar 15, 2011 at 2:33 PM, Sarah Kenny
> wrote:
> here is how i set it up initially and i was intending to keep
> with this plan: one for development and the other for
> production.
>
> i'm also in the process of creating a site just for
> developers...but others should let me know if we want to do it
> differently
> ~sk
>
> ---------- Forwarded message ----------
> From: Sarah Kenny
> Date: Wed, Mar 2, 2011 at 5:50 PM
> Subject: google sites doc
> To: Swift Devel
>
> i put the new 'production' site here:
> https://sites.google.com/site/swiftguide/
>
> i just copied over the existing site which i've renamed the
> 'development' site:
> https://sites.google.com/site/swiftparallelscripting/
>
> i believe the proper permissions were inherited on the copy,
> but let me know if you have any trouble accessing.
>
> ~sk
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

From skenny at uchicago.edu Tue Mar 15 18:31:32 2011
From: skenny at uchicago.edu (Sarah Kenny)
Date: Tue, 15 Mar 2011 16:31:32 -0700
Subject: [Swift-devel] Re: Which swift doc site to edit?
In-Reply-To: <427236510.193815.1300225829468.JavaMail.root@zimbra.anl.gov>
References: <427236510.193815.1300225829468.JavaMail.root@zimbra.anl.gov>
Message-ID:

On Tue, Mar 15, 2011 at 2:50 PM, Michael Wilde wrote:

> Great, that sounds good, Sarah - thanks.
>
> I think Ketan was also wondering how you created the TOC on the current
> User Guide example. Was that based on the tags that were created (and
> retained) when I pasted the contents from the docbook-generated live site to
> the Google sites prototype?
>

https://sites.google.com/a/googleapps.com/edu-training-center/Training-Home/module-5-sites/chapter-5/2-6

>
> Did any ideas get discussed on whether Google Sites or Documents is best
> for the Users Guide?
>

my personal preference is for Sites unless anyone has a compelling
reason to use Docs instead (?)
>
> - Mike
>
> ----- Original Message -----
> > here is how i set it up initially and i was intending to keep with
> > this plan: one for development and the other for production.
> >
> > i'm also in the process of creating a site just for developers...but
> > others should let me know if we want to do it differently
> > ~sk
> >
> > ---------- Forwarded message ----------
> > From: Sarah Kenny < skenny at uchicago.edu >
> > Date: Wed, Mar 2, 2011 at 5:50 PM
> > Subject: google sites doc
> > To: Swift Devel < swift-devel at ci.uchicago.edu >
> >
> > i put the new 'production' site here:
> > https://sites.google.com/site/swiftguide/
> >
> > i just copied over the existing site which i've renamed the
> > 'development' site:
> > https://sites.google.com/site/swiftparallelscripting/
> >
> > i believe the proper permissions were inherited on the copy, but let
> > me know if you have any trouble accessing.
> >
> > ~sk
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory

From skenny at uchicago.edu Wed Mar 16 02:02:00 2011
From: skenny at uchicago.edu (Sarah Kenny)
Date: Wed, 16 Mar 2011 00:02:00 -0700
Subject: [Swift-devel] Plan for managing Swift docs
In-Reply-To: <100221705.191579.1300213615260.JavaMail.root@zimbra.anl.gov>
References: <100221705.191579.1300213615260.JavaMail.root@zimbra.anl.gov>
Message-ID:

hi all, i started the documentation planning page here:
https://sites.google.com/site/swiftdevel/

i attempted to give everyone full access using
swift-research-group at googlegroups.com

please let me know if you're able to edit, etc.

~sk

On Tue, Mar 15, 2011 at 11:26 AM, Michael Wilde wrote:

> ----- Original Message -----
> > right, so, my list of tasks was an attempt at distilling mike's
> > initial email into action items and was meant as a rough first pass
> > that i was hoping others would contribute to or help to
> > clarify/refine...
>
> Exactly - and it was an excellent start! I think this is going real well,
> and I'm just trying to keep it moving and do some tuning.
>
> > it seems that i didn't fully understand/correctly
> > interpret some of the items mike had in mind.
>
> No problem - the list is long and writing docs is a non-trivial effort that
> like code takes much discussion and coordination. Plus what's in my mind is
> just a starting point and certainly needs much clarification, discussion and
> improvement.
>
> > mike, it sounds like you're saying it's best to move that list to a
> > shared doc so that everyone can edit (?)
>
> Right - edit, track, revise the plan, etc.
>
> > if so i can do that...are we
> > thinking i should create a google doc for this or a page on the site?
>
> Either. It's the kind of thing we would have previously kept in the SWFT
> wiki. We can either do that, or, as I think better, migrate the SWFT wiki to
> a fresh SWFT site that we use for the exact same purpose. But do a better
> separation between internal group process docs and user-facing prelim docs.
>
> Also, we need I think 2 such internal wikis - one for swift-devel as an
> open source project, and one for our team, for inwards-facing issues related
> to our jobs at Argonne and CI. So let's make a swift-devel web (or a sub dir
> for this on the main swift site) and a SWFT web for private group
> coordination.
>
> But all that said, the *main* thing I was asking for in the prior message
> was to put on the planning page the info that Swift writers need to know
> where to write what, and how it goes from draft to review to live: mainly
> just site names and structure for now.
>
> Thanks,
>
> Mike
>
> > otherwise i can try to reply to the inline comments. however, in
> > either case i may need some help clarifying what was initially
> > intended :)
> >
> > On Tue, Mar 15, 2011 at 10:54 AM, Michael Wilde < wilde at mcs.anl.gov >
> > wrote:
> >
> > Sounds good - but related to this, something very important: can you
> > create as soon as possible a page we can all share that specifies the
> > (evolving) doc conventions, the most important part of which is the
> > names of the sites to use for various stages of the process (ie:
> > development, review, live) and what page naming strategy and structure
> > to use.
> >
> > I've lost track of what is going where and don't know where to turn to
> > find things, nor where I can edit prototypes or contribute to docs in
> > progress.
> >
> > Ketan tells me that he doesn't seem to have the right access to edit /
> > create in the development site(s)?
> >
> > And he too doesn't know the answers to what is where - so we need more
> > coordination and communication to get things going so that everyone
> > can contribute - which is the major reason we are moving to online doc
> > tools.
> >
> > Keep in mind that we can use email to *discuss* things, but decisions
> > and spec info need to be in a well known, easy to find place that we
> > all can share.
> >
> > Lastly, this really needs to all stay on swift-devel so that we can
> > coordinate as a team, especially since many people don't work on Swift
> > every day.
> >
> > Thanks,
> >
> > Mike
> >
> > i have not done so...if no one else (and you haven't already) you want
> > to go ahead and create it and share that info?
> >
> > ~sk
> >
> > On Fri, Mar 11, 2011 at 2:44 PM, David Kelly < dk0966 at cs.ship.edu >
> > wrote:
> >
> > Has anyone already created a generic swift google account? If so,
> > could they please send me the username and password for the purpose of
> > setting up automation via google command line tools?
> >
> > Thanks,
> > David

From ketancmaheshwari at gmail.com Wed Mar 16 19:19:00 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Wed, 16 Mar 2011 19:19:00 -0500
Subject: [Swift-devel] Re: Which swift doc site to edit?
In-Reply-To:
References: <427236510.193815.1300225829468.JavaMail.root@zimbra.anl.gov>
Message-ID:

> On Tue, Mar 15, 2011 at 2:50 PM, Michael Wilde wrote:
>
>> Great, that sounds good, Sarah - thanks.
>>
>> I think Ketan was also wondering how you created the TOC on the current
>> User Guide example. Was that based on the tags that were created (and
>> retained) when I pasted the contents from the docbook-generated live site to
>> the Google sites prototype?
>
> https://sites.google.com/a/googleapps.com/edu-training-center/Training-Home/module-5-sites/chapter-5/2-6

Sorry I missed this link earlier. So, indeed it is possible to
autogenerate the ToC. However, there is one thing here: The ToC
automatically adds numbering to sections and subsections in *ToC* but
not at the actual sections, which means if we put numbers manually in
front of sections, they will show up *twice*, as I just tested. I am
trying to see some tweaks to get past this: any insights are welcome.

Ketan

From wilde at mcs.anl.gov Wed Mar 16 23:25:04 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 16 Mar 2011 23:25:04 -0500 (CDT)
Subject: [Swift-devel] Meeting notes - 0.93 release plans
Message-ID: <907204846.199548.1300335904703.JavaMail.root@zimbra.anl.gov>

Notes from our 3/16 meeting including tentative plans and features for
the 0.93 release are at:

https://sites.google.com/site/swiftdevel/release-plans

- Mike

From wilde at mcs.anl.gov Wed Mar 16 23:29:19 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 16 Mar 2011 23:29:19 -0500 (CDT)
Subject: [Swift-devel] Made Google swift-devel site world-readable
Message-ID: <18525257.199553.1300336159306.JavaMail.root@zimbra.anl.gov>

I'm assuming that's what we want. If not, please suggest an access policy.

- Mike

From skenny at uchicago.edu Wed Mar 16 23:40:57 2011
From: skenny at uchicago.edu (Sarah Kenny)
Date: Wed, 16 Mar 2011 21:40:57 -0700
Subject: [Swift-devel] svn commits and changelog
Message-ID:

just wanted to send out a reminder to all the developers that when
committing code to svn, if you want your commit comment to make it into
the changelog for the next release it should begin with "note:"

this has been discussed in the past and came up again today but i
thought it was worth mentioning on the list.

~sk

From yadudoc1729 at gmail.com Thu Mar 17 16:07:42 2011
From: yadudoc1729 at gmail.com (Yadu Nand)
Date: Fri, 18 Mar 2011 02:37:42 +0530
Subject: [Swift-devel] Clarifications regarding a GSoC project idea [swift]
Message-ID:

Hi,

I am interested in working on the project on Implementing efficient
Map-Reduce models using the Swift parallel scripting language as
mentioned on the ideas page[1]. I fairly understand the map-reduce
concept and have done a toy erlang implementation [2].

As of now, I have gotten swift compiled and running. I am also going through the papers on swift [3] and map-reduce [4] [5]. Any help or directions to get a clear picture of the problem at hand would be greatly appreciated. Secondly, I am working on a wrapper for RBUDP[6] on XIO (for college project, as part of a team). If there are any projects that could be done around profiling GridFTP over UDT and RBUDP separately and performance comparisons, I would be interested in that as well. Implementing RBUDP driver for XIO was mentioned here [7] I am not quite sure where exactly I should be mailing this, so if this gets posted to the wrong mailing list please excuse my clumsiness. [1] http://dev.globus.org/wiki/Google_Summer_of_Code_2011_Ideas [2] https://github.com/yadudoc/erlang/blob/master/mapred.erl [3] http://www.ci.uchicago.edu/swift/papers/SwiftParallelScripting.pdf [4] http://labs.google.com/papers/mapreduce-osdi04.pdf [5] Hadoop: The Definitive Guide , O'Reilly Media (Chapter 6) [6] http://www.evl.uic.edu/cavern/papers/cluster2002.pdf [7] http://dev.globus.org/wiki/Project_Ideas -- Thanks and Regards, Yadu Nand B (+91 94477 80725) ( http://humanint.posterous.com ) From wilde at mcs.anl.gov Thu Mar 17 16:42:42 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 17 Mar 2011 16:42:42 -0500 (CDT) Subject: [Swift-devel] Re: Clarifications regarding a GSoC project idea [swift] In-Reply-To: Message-ID: <411915138.202778.1300398162897.JavaMail.root@zimbra.anl.gov> Hi Yadu, A good source of ideas for how to do map reduce in Swift might be the work that Ed Walker did to implement map reduce in his parallel shell: http://portal.acm.org/citation.cfm?id=1645175 and http://sites.google.com/site/ewalker544/research-2/dataflowshell (where you can download Ed's parallel bash) I think there are two main (and somewhat separate) aspects here: - how to work both with and without name/value pairs: Swift has no intrinsic name/value concept, and one can find good use cases both 
  with and without keys. Note that the separate project to add associative
  arrays to Swift is one way to integrate the concept of keys.

- how to do reduction trees (especially for non-key-based workflows) in a
  manner that reduces the amount of data and avoids the requirement of
  sending the output of every map operation back to a single site for
  reduction.

I should also mention that this project is one of the more research-oriented
and less focused projects on our list. There are several more concrete
projects that you may also find interesting. So if you find this one
fascinating, by all means keep thinking about it. But if you want something
more concrete, I can discuss a few other possibilities with you.

- Mike

----- Original Message -----
> Hi,
>
> I am interested in working on the project on Implementing efficient
> Map-Reduce models using the Swift parallel scripting language as
> mentioned on the ideas page[1]. I fairly understand the map-reduce
> concept and have done a toy erlang implementation [2].
>
> As of now, I have gotten swift compiled and running. I am also going
> through the papers on swift [3] and map-reduce [4] [5]. Any help or
> directions to get a clear picture of the problem at hand would be
> greatly appreciated.
>
> Secondly, I am working on a wrapper for RBUDP[6] on XIO (for college
> project, as part of a team). If there are any projects that could be done
> around profiling GridFTP over UDT and RBUDP separately and
> performance comparisons, I would be interested in that as well.
> Implementing RBUDP driver for XIO was mentioned here [7]
>
> I am not quite sure where exactly I should be mailing this, so if this
> gets posted to the wrong mailing list please excuse my clumsiness.
> > [1] http://dev.globus.org/wiki/Google_Summer_of_Code_2011_Ideas > [2] https://github.com/yadudoc/erlang/blob/master/mapred.erl > [3] http://www.ci.uchicago.edu/swift/papers/SwiftParallelScripting.pdf > [4] http://labs.google.com/papers/mapreduce-osdi04.pdf > [5] Hadoop: The Definitive Guide , O'Reilly Media (Chapter 6) > [6] http://www.evl.uic.edu/cavern/papers/cluster2002.pdf > [7] http://dev.globus.org/wiki/Project_Ideas > > -- > Thanks and Regards, > Yadu Nand B > (+91 94477 80725) > ( http://humanint.posterous.com ) -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From ketancmaheshwari at gmail.com Thu Mar 17 18:28:42 2011 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Thu, 17 Mar 2011 18:28:42 -0500 Subject: [Swift-devel] Swift error: Application not available Message-ID: Hello, I am trying Swift running on beagle talking to its PBS resources. However, it seems I am getting inconsistent error messages. I have following files(all attached): a swift source, tc(transaction), sites.xml, cf(config), a binary First, I try the following commandline: swift -config cf -tc.file tc -sites.file sites.xml catsnsleep.swift -n=1 And get the following error: The application "catnap" is not available in your tc.data catalog Caused by: org.globus.cog.karajan.scheduler.NoSuchResourceException Final status: Failed:1 The following errors have occurred: 1. 
The application "catnap" is not available in your tc.data catalog ==== However, the application *is* available in tc since the following altered commandline (intended for local execution) works: swift -tc.file tc catsnsleep.swift -n=1 Swift svn swift-r4143 (swift modified locally) cog-r3056 (cog modified locally) RunID: 20110317-1709-9akduibd Progress: Progress: Checking status:1 Final status: Finished successfully:1 ==== Finally when I just skip the config and use only the sites.xml as in the following commandline: swift -tc.file tc -sites.file sites.xml catsnsleep.swift -n=1 I get the following: Execution failed: The application "catnap" is not available in your tc.data catalog ==== May be, I am missing something or doing something wrong. However, from an end user/ beginning user point of view it seems the error messages are not consistent/informative. Regards, Ketan -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- ================= Ketan's Test Data ================= -------------- next part -------------- A non-text attachment was scrubbed... Name: catsnsleep.swift Type: application/octet-stream Size: 290 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: cf Type: application/octet-stream Size: 297 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: sites.xml Type: text/xml Size: 959 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: tc Type: application/octet-stream Size: 196 bytes Desc: not available URL: From ketancmaheshwari at gmail.com Thu Mar 17 18:31:22 2011 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Thu, 17 Mar 2011 18:31:22 -0500 Subject: [Swift-devel] Re: Swift error: Application not available In-Reply-To: References: Message-ID: Attaching the missing catnap.sh On Thu, Mar 17, 2011 at 6:28 PM, Ketan Maheshwari < ketancmaheshwari at gmail.com> wrote: > Hello, > > I am trying Swift running on beagle talking to its PBS resources. However, > it seems I am getting inconsistent error messages. > > I have following files(all attached): a swift source, tc(transaction), > sites.xml, cf(config), a binary > > First, I try the following commandline: > > swift -config cf -tc.file tc -sites.file sites.xml catsnsleep.swift -n=1 > > And get the following error: > > The application "catnap" is not available in your tc.data catalog > Caused by: org.globus.cog.karajan.scheduler.NoSuchResourceException > Final status: Failed:1 > The following errors have occurred: > 1. The application "catnap" is not available in your tc.data catalog > > ==== > > However, the application *is* available in tc since the following altered > commandline (intended for local execution) works: > > swift -tc.file tc catsnsleep.swift -n=1 > > Swift svn swift-r4143 (swift modified locally) cog-r3056 (cog modified > locally) > > RunID: 20110317-1709-9akduibd > Progress: > Progress: Checking status:1 > Final status: Finished successfully:1 > ==== > > Finally when I just skip the config and use only the sites.xml as in the > following commandline: > > swift -tc.file tc -sites.file sites.xml catsnsleep.swift -n=1 > > I get the following: > > Execution failed: > The application "catnap" is not available in your tc.data catalog > ==== > > May be, I am missing something or doing something wrong. However, from an > end user/ beginning user point of view it seems the error messages are not > consistent/informative. 
> > > Regards, > Ketan > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: catnap.sh Type: application/x-sh Size: 32 bytes Desc: not available URL: From wilde at mcs.anl.gov Thu Mar 17 18:45:59 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 17 Mar 2011 18:45:59 -0500 (CDT) Subject: [Swift-devel] Swift error: Application not available In-Reply-To: Message-ID: <1533631830.203209.1300405559788.JavaMail.root@zimbra.anl.gov> Ketan, catnap is defined in your tc to be on site "localhost", but your sites file only includes the site "pbs". The message should say " 1. The application "catnap" is not available in your tc.data catalog on any site provided in sites.xml" or something to that effect. - Mike ----- Original Message ----- Hello, I am trying Swift running on beagle talking to its PBS resources. However, it seems I am getting inconsistent error messages. I have following files(all attached): a swift source, tc(transaction), sites.xml, cf(config), a binary First, I try the following commandline: swift -config cf -tc.file tc -sites.file sites.xml catsnsleep.swift -n=1 And get the following error: The application "catnap" is not available in your tc.data catalog Caused by: org.globus.cog.karajan.scheduler.NoSuchResourceException Final status: Failed:1 The following errors have occurred: 1. 
The application "catnap" is not available in your tc.data catalog ==== However, the application *is* available in tc since the following altered commandline (intended for local execution) works: swift -tc.file tc catsnsleep.swift -n=1 Swift svn swift-r4143 (swift modified locally) cog-r3056 (cog modified locally) RunID: 20110317-1709-9akduibd Progress: Progress: Checking status:1 Final status: Finished successfully:1 ==== Finally when I just skip the config and use only the sites.xml as in the following commandline: swift -tc.file tc -sites.file sites.xml catsnsleep.swift -n=1 I get the following: Execution failed: The application "catnap" is not available in your tc.data catalog ==== May be, I am missing something or doing something wrong. However, from an end user/ beginning user point of view it seems the error messages are not consistent/informative. Regards, Ketan _______________________________________________ Swift-devel mailing list Swift-devel at ci.uchicago.edu http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at mcs.anl.gov Thu Mar 17 23:43:53 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 17 Mar 2011 23:43:53 -0500 (CDT) Subject: [Swift-devel] Re: Proposal for coaster service options In-Reply-To: <1299779977.28297.0.camel@blabla2.none> Message-ID: <2090735166.203724.1300423433175.JavaMail.root@zimbra.anl.gov> > > One other item that came up in yesterday's meeting was the set of > > command line features to add to coaster-service (and to swift itself > > which we didnt mention) to put the integrated coaster service into > > passive mode and to make it save port numbers in a file for > > integration with scripts. > > > > That might be a good task to do soon if its easy/feasible. > > Yes. 
Seems like a quick and useful thing to have. Though doesn't the
> sites.xml scheme work in this case?

By "this case" do you mean the case where the coaster service is running in
the Swift JVM? I.e. from jobmanager=local:something in the coaster pool
entry? I think we may need to check on that case - I can't recall how it
behaves.

- Mike

----- Original Message -----
> Sounds good. These make sense.
>
> Mihael
>
> On Thu, 2011-03-10 at 08:27 -0600, Michael Wilde wrote:
> > Mihael,
> >
> > I want to work with Ketan to make the various coaster startup scripts
> > that we've developed suitable and documented for end user use.
> >
> > As a start, can you consider something like the following
> > enhancement to coaster-service:
> >
> > -L Go into passive mode
> > -s service port
> > -S file to write service port number to
> >    (forces dynamic port selection starting with -s port number)
> > -w passive worker connection port
> > -W file to write worker port to
> >    (forces dynamic port selection starting with -s port number)
> >
> > This would make the startup scripts much simpler and cleaner: we
> > would not have to run a dummy swift script to force the service to
> > go into passive mode, and we would not have to scrape the standard
> > output for the passive port number, and we could control whether we
> > use a fixed port or a dynamically selected port.
> > > > - Mike > > -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hategan at mcs.anl.gov Fri Mar 18 00:33:11 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 17 Mar 2011 22:33:11 -0700 Subject: [Swift-devel] Re: Proposal for coaster service options In-Reply-To: <2090735166.203724.1300423433175.JavaMail.root@zimbra.anl.gov> References: <2090735166.203724.1300423433175.JavaMail.root@zimbra.anl.gov> Message-ID: <1300426391.6444.1.camel@blabla2.none> On Thu, 2011-03-17 at 23:43 -0500, Michael Wilde wrote: > > > One other item that came up in yesterday's meeting was the set of > > > command line features to add to coaster-service (and to swift itself > > > which we didnt mention) to put the integrated coaster service into > > > passive mode and to make it save port numbers in a file for > > > integration with scripts. > > > > > > That might be a good task to do soon if its easy/feasible. > > > > Yes. Seems like a quick and useful thing to have. Though doesn't the > > sites.xml scheme work in this case? > > By "this case" do you mean the case where the coaster service is > running in the Swift JVM? I.e. from jobmanager=local:something in the > coaster pool entry? I think I'm misunderstanding the issue. Are you referring to having the standalone service configured for passive mode? 
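
[Editorial sketch] The port-file idea discussed in this thread (having
coaster-service write its service or worker port number to a file, so that
startup scripts no longer scrape standard output) could be consumed by a
wrapper script roughly as follows. This is only a sketch under stated
assumptions: the -S/-W options are a proposal in this thread, not a confirmed
feature, and the helper name read_port_file, the file name used in the usage
note, and the plain-decimal file format are all hypothetical.

```python
import time
from pathlib import Path


def read_port_file(path, timeout=30.0, poll=0.2):
    """Wait until `path` contains a valid TCP port number and return it.

    Hypothetical helper: assumes the service writes its port as plain
    decimal text (e.g. "50123") to the given file once it is listening.
    """
    deadline = time.monotonic() + timeout
    port_file = Path(path)
    while True:
        try:
            text = port_file.read_text().strip()
        except FileNotFoundError:
            text = ""  # file not created yet; keep polling
        # Accept only ASCII digits in the valid TCP port range.
        if text.isascii() and text.isdigit() and 0 < int(text) < 65536:
            return int(text)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no valid port in {path} after {timeout}s")
        time.sleep(poll)
```

A wrapper would start the service in the background, call something like
read_port_file("service.port"), and then substitute the returned number into
the sites file. Polling with a timeout is used because the file may not exist,
or may still be empty, at the moment the wrapper first looks for it.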
From wilde at mcs.anl.gov Fri Mar 18 09:58:47 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 18 Mar 2011 09:58:47 -0500 (CDT) Subject: [Swift-devel] Re: Proposal for coaster service options In-Reply-To: <1300426391.6444.1.camel@blabla2.none> Message-ID: <707163068.205329.1300460327403.JavaMail.root@zimbra.anl.gov> ----- Original Message ----- > On Thu, 2011-03-17 at 23:43 -0500, Michael Wilde wrote: > > > > One other item that came up in yesterday's meeting was the set > > > > of > > > > command line features to add to coaster-service (and to swift > > > > itself > > > > which we didnt mention) to put the integrated coaster service > > > > into > > > > passive mode and to make it save port numbers in a file for > > > > integration with scripts. > > > > > > > > That might be a good task to do soon if its easy/feasible. > > > > > > Yes. Seems like a quick and useful thing to have. Though doesn't > > > the > > > sites.xml scheme work in this case? > > > > By "this case" do you mean the case where the coaster service is > > running in the Swift JVM? I.e. from jobmanager=local:something in > > the > > coaster pool entry? > > I think I'm misunderstanding the issue. > > Are you referring to having the standalone service configured for > passive mode? Yes. The original mail I sent, proposing new command line options, was referring entirely to the coaster-service command. In a later email, I realized that some of those issues might apply to the coaster service when running within the swift command's jvm as well. - it seems that some or all of port management options (for setting and reporting port numbers) may apply to swift as well - its likely that the option to set passive *does not* apply, as it already works. I think I was confused on the various combinations when I brought that up. 
Since currently we get the standalone service to enter passive mode by running a swift script that has passive mode set in the sites entry for that service, I realized on reflection that setting the passive option when the coaster service is running with the swift command JVM *must* be working correctly. It would be good to verify and create tests for this, but that is my current assumption. Related to all this: I think that to do this job fully, we need to complete the set of wrapper commands that make manually run coasters an end-user-ready feature. And then to create scripts in the test framework to verify that they work. That will take more discussion, specification work, and devel time. But I feel we need to now get this feature completed and working, as there is user need for it. Mihael, if you can get the changes into coaster-service and the swift command, I think others can get the wrapper scripts done and tested. There is I think a prototype for this command support somewhere (Justin, you reminded me of these a few days ago: can you point out where they are?) - Mike From wilde at mcs.anl.gov Fri Mar 18 09:59:46 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 18 Mar 2011 09:59:46 -0500 (CDT) Subject: [Swift-devel] Re: Proposal for coaster service options In-Reply-To: <707163068.205329.1300460327403.JavaMail.root@zimbra.anl.gov> Message-ID: <970913622.205347.1300460386760.JavaMail.root@zimbra.anl.gov> ----- Forwarded Message ----- From: "Justin M Wozniak" To: "Michael Wilde" Sent: Friday, March 18, 2011 9:51:13 AM Subject: Re: WHere are manual coaster startup scripts? 
They're in: https://svn.ci.uchicago.edu/svn/vdl2/usertools/persistent-coasters ----- Original Message ----- > ----- Original Message ----- > > On Thu, 2011-03-17 at 23:43 -0500, Michael Wilde wrote: > > > > > One other item that came up in yesterday's meeting was the set > > > > > of > > > > > command line features to add to coaster-service (and to swift > > > > > itself > > > > > which we didnt mention) to put the integrated coaster service > > > > > into > > > > > passive mode and to make it save port numbers in a file for > > > > > integration with scripts. > > > > > > > > > > That might be a good task to do soon if its easy/feasible. > > > > > > > > Yes. Seems like a quick and useful thing to have. Though doesn't > > > > the > > > > sites.xml scheme work in this case? > > > > > > By "this case" do you mean the case where the coaster service is > > > running in the Swift JVM? I.e. from jobmanager=local:something in > > > the > > > coaster pool entry? > > > > I think I'm misunderstanding the issue. > > > > Are you referring to having the standalone service configured for > > passive mode? > > Yes. The original mail I sent, proposing new command line options, was > referring entirely to the coaster-service command. > > In a later email, I realized that some of those issues might apply to > the coaster service when running within the swift command's jvm as > well. > > - it seems that some or all of port management options (for setting > and reporting port numbers) may apply to swift as well > > - its likely that the option to set passive *does not* apply, as it > already works. I think I was confused on the various combinations when > I brought that up. Since currently we get the standalone service to > enter passive mode by running a swift script that has passive mode set > in the sites entry for that service, I realized on reflection that > setting the passive option when the coaster service is running with > the swift command JVM *must* be working correctly. 
It would be good to
> verify and create tests for this, but that is my current assumption.
>
> Related to all this: I think that to do this job fully, we need to
> complete the set of wrapper commands that make manually run coasters
> an end-user-ready feature. And then to create scripts in the test
> framework to verify that they work. That will take more discussion,
> specification work, and devel time. But I feel we need to now get this
> feature completed and working, as there is user need for it.
>
> Mihael, if you can get the changes into coaster-service and the swift
> command, I think others can get the wrapper scripts done and tested.
>
> There is I think a prototype for this command support somewhere
> (Justin, you reminded me of these a few days ago: can you point out
> where they are?)
>
> - Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wilde at mcs.anl.gov  Sat Mar 19 07:01:05 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sat, 19 Mar 2011 07:01:05 -0500 (CDT)
Subject: [Swift-devel] Re: GSoC project (Globus Online data provider for Swift)
In-Reply-To: <1214710791.185960.1300118359243.JavaMail.root@zimbra.anl.gov>
Message-ID: <1110428141.209011.1300536065881.JavaMail.root@zimbra.anl.gov>

Claudiu,

Here are two papers on Swift that will appear in the Parallel Computing
journal later this year. I think they will help you get a feel for the
language:

http://www.ci.uchicago.edu/swift/papers/SwiftParallelScripting.pdf
https://sites.google.com/site/swiftguide/papers/SwiftLanguage.ParCo.to-appear.2011.0316.pdf

To learn about data providers in Swift, you should download the code for the
latest release (0.92) from SVN (instructions are on the Swift download page)
and look at the Java class structure for the local data provider and the
GridFTP data provider.
You'll need to study the structure of the interfaces and abstract classes
involved in writing a provider, but for the most part you can copy an
existing provider as a base for the Globus Online data provider. You'll
likely need to use a Java interface for REST.

You can find some (but I think not extensive) documentation on the Globus
Java CoG class structure for writing a provider in the CoG Kit documentation
wiki.

Mihael, can you provide some guidance on this, which I can copy to the GSoC
page that describes this project, for Claudiu and other GSoC students that
may be interested in this project?

Thanks,

Mike

----- Original Message -----
> Hi Claudiu,
>
> You can learn more about (and try) Swift at www.ci.uchicago.edu/swift.
>
> For this project, you will also need to:
> - understand how Globus Online (GO) works
> - learn how to write programmatic interfaces to GO (REST)
> - learn how to write "data providers" for Swift using the Java CoG kit
>   (will involve learning about the CoG kit, finding the existing
>   providers, and adding a new one)
>
> Regards,
>
> Mike
>
>
> ----- Original Message -----
> > Hello,
> > My name is Claudiu Constantin Ghioc, and I study Computer Science at the
> > Polytechnic University of Bucharest, Romania. I have been searching for
> > an interesting project for this year's Google Summer of Code, and I came
> > across the Swift project: Develop data management provider (driver) for
> > Globus Online. I was wondering if you could send me more details about
> > it. How can I get to know more about the task and how can I get more
> > acquainted with Swift?
> > Thank you!
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From yadudoc1729 at gmail.com  Sun Mar 20 16:49:43 2011
From: yadudoc1729 at gmail.com (Yadu Nand)
Date: Mon, 21 Mar 2011 03:19:43 +0530
Subject: [Swift-devel] Running swift after fresh install giving errors
Message-ID:

Hi,

I am fairly new to swift and I'm getting the following error when I try to
run swift:

Error: SWIFT_HOME is not set, and all attempts at guessing it failed

I am using the development version and the build was reported to be
successful. Googling for this didn't help much, and setting SWIFT_HOME
or swift.home manually is advised against here [1]. Nevertheless, I tried
that as well, to no avail.

Any suggestion to fix this will be greatly appreciated.

The details of my system are:
Ubuntu 10.10
sun Java version "1.6.0_24"

[1] http://www.ci.uchicago.edu/swift/guides/userguide.php
--
Thanks and Regards,
Yadu Nand B
(+91 94477 80725)
( http://humanint.posterous.com )

From wilde at mcs.anl.gov  Sun Mar 20 20:14:31 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sun, 20 Mar 2011 20:14:31 -0500 (CDT)
Subject: [Swift-devel] Running swift after fresh install giving errors
In-Reply-To:
Message-ID: <1090317580.1548.1300670071089.JavaMail.root@zimbra.anl.gov>

Yadu, can you verify that you are running the swift command from the bin/
directory that was generated when you built swift?
Since it seems you built it from the svn trunk source, you should be doing
something like this to run it (assuming your src tree starts at
$HOME/swift/src/):

PATH=$HOME/swift/src/cog/modules/swift/dist/swift-svn/bin::$PATH
swift mytest.swift

The swift command expects to find the jar files it requires at
swift/bin/../lib, as you can see from these lines in the bin/swift shell
script:

---
SWIFT_HOME=`dirname $MY_PATH`"/.."
if [ ! -f "$SWIFT_HOME/lib/cog-swift-svn.jar" ] && [ ! -f "$SWIFT_HOME/lib/cog.jar" ] ; then
    echo "Error: SWIFT_HOME is not set, and all attempts at guessing it failed"
---

- Mike

----- Original Message -----
> Hi,
>
> I am fairly new to swift and I'm getting the following error when I
> try to
> run swift :
> Error: SWIFT_HOME is not set, and all attempts at guessing it failed
>
> I am using the development version and the build was reported to be
> successful. Googling for this didn't help much and setting SWIFT_HOME
> or swift.home manually is advised against here [1]. Nevertheless I
> tries
> that as well to no use.
>
> Any suggestion to fix this will be greatly appreciated.
>
> The details of my system are :
> Ubuntu 10.10
> sun Java version "1.6.0_24"
>
>
> [1] http://www.ci.uchicago.edu/swift/guides/userguide.php
> --
> Thanks and Regards,
> Yadu Nand B
> (+91 94477 80725)
> ( http://humanint.posterous.com )
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wilde at mcs.anl.gov  Mon Mar 21 11:23:42 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 21 Mar 2011 11:23:42 -0500 (CDT)
Subject: Fwd: [Swift-devel] Running swift after fresh install giving errors
In-Reply-To:
Message-ID: <45698470.4047.1300724622626.JavaMail.root@zimbra.anl.gov>

Cc'ing the list to confirm that this worked.
We should add this to the user guide. Ketan, please add this to your
doc-to-do list or bugzilla.

- Mike

----- Forwarded Message -----
From: "Yadu Nand"
To: "Michael Wilde"
Sent: Monday, March 21, 2011 6:32:53 AM
Subject: Re: [Swift-devel] Running swift after fresh install giving errors

Hi Michael,

> PATH=$HOME/swift/src/cog/modules/swift/dist/swift-svn/bin::$PATH

Thanks, that fixed it.
I was adding $HOME/bin/cog/module/swift/bin to the PATH.

yadu at yadu-laptop:~/bin/cog/modules/swift/examples$ swift first.swift
Swift svn swift-r4187 cog-r3062

RunID: 20110321-1653-akxbkrv9
Progress: time:14
Final status: time:316 Finished successfully:1

--
Thanks and Regards,
Yadu Nand B
(+91 94477 80725)
( http://humanint.posterous.com )

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wilde at mcs.anl.gov  Mon Mar 21 12:14:44 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 21 Mar 2011 12:14:44 -0500 (CDT)
Subject: [Swift-devel] "Timer was cancelled" error on Beagle on Swift script termination
In-Reply-To: <1846396096.4600.1300727051671.JavaMail.root@zimbra.anl.gov>
Message-ID: <1599436007.4663.1300727684327.JavaMail.root@zimbra.anl.gov>

I had not run Swift on beagle for the past two weeks. When I started again
last night, I began getting some kind of coaster service timeout error when
the script terminates. The script seems to run OK up to termination.

The error is:

Canceling job 19876.sdb
Got exception in send
java.lang.IllegalStateException: Timer was cancelled

Is anyone seeing the same thing? Any idea why?
This is running swift at:

login1$ which swift
/home/wilde/swift/rev/swift-r4143+cog-r3056+pbscoast/bin/swift
login1$ swift -version
Swift svn swift-r4143 (swift modified locally) cog-r3056 (cog modified locally)

The log for this example is on CI net in
/home/wilde/swift/lab/catsn-20110320-2257-tpoi32je.log

Run command and output:

login1$ swift -config cf -tc.file tc -sites.file beagle-coaster.xml catsn.swift -n=10
Swift svn swift-r4143 (swift modified locally) cog-r3056 (cog modified locally)

RunID: 20110320-2257-tpoi32je
Progress:
Progress: Submitted:9 Active:1
Trying to release previously released contact: 3
Final status: Finished successfully:10
Got exception in send
java.lang.IllegalStateException: Timer was cancelled
    at java.util.Timer.scheduleImpl(Timer.java:564)
    at java.util.Timer.schedule(Timer.java:449)
    at org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:155)
    at org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:149)
    at org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel.actualSend(AbstractPipedChannel.java:89)
    at org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel$Sender.run(AbstractPipedChannel.java:115)
Canceling job 19876.sdb
Got exception in send
java.lang.IllegalStateException: Timer was cancelled
    at java.util.Timer.scheduleImpl(Timer.java:564)
    at java.util.Timer.schedule(Timer.java:449)
    at org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:155)
    at org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:149)
    at org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel.actualSend(AbstractPipedChannel.java:89)
    at org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel$Sender.run(AbstractPipedChannel.java:115)
login1$

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wilde at mcs.anl.gov  Mon Mar 21 12:37:23 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 21 Mar 2011 12:37:23 -0500 (CDT)
Subject: [Swift-devel] Please send Swift documentation enhancement needs to this list
In-Reply-To:
Message-ID: <1508842266.4780.1300729043772.JavaMail.root@zimbra.anl.gov>

was: Re: [Swift-devel] Running swift after fresh install giving errors

Ketan,

Sounds good. I think for now the CI Twiki Swift Cookbook page is one of the
main semi-public collections of missing documentation needed. You should scan
bugzilla as well.

Does anyone else have a list of Swift documentation enhancements needed?

I also have a few "developers diaries" which I used to keep about Swift things
that I found confusing and which needed further documentation. Sarah, I think
I passed these to you at one point. Were you by any chance able to extract any
documentation "to do" items from those notes? Do you have anything else to
pass to Ketan?

Thanks,

Mike

----- Original Message -----

Mike,

Done. Tonight I am going to give a big push to the user guide documentation.
Please keep me updated of more such ideas and suggestions if any.

Ketan

On Mon, Mar 21, 2011 at 11:23 AM, Michael Wilde < wilde at mcs.anl.gov > wrote:

Cc'ing the list to confirm that this worked.

We should add this to the user guide. Ketan, please add this to your
doc-to-do list or bugzilla.

- Mike

----- Forwarded Message -----
From: "Yadu Nand" < yadudoc1729 at gmail.com >
To: "Michael Wilde" < wilde at mcs.anl.gov >
Sent: Monday, March 21, 2011 6:32:53 AM
Subject: Re: [Swift-devel] Running swift after fresh install giving errors

Hi Michael,

> PATH=$HOME/swift/src/cog/modules/swift/dist/swift-svn/bin::$PATH

Thanks, that fixed it.
I was adding $HOME/bin/cog/module/swift/bin to the PATH.
yadu at yadu-laptop:~/bin/cog/modules/swift/examples$ swift first.swift Swift svn swift-r4187 cog-r3062 RunID: 20110321-1653-akxbkrv9 Progress: time:14 Final status: time:316 Finished successfully:1 -- Thanks and Regards, Yadu Nand B (+91 94477 80725 ) ( http://humanint.posterous.com ) -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory _______________________________________________ Swift-devel mailing list Swift-devel at ci.uchicago.edu http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: From ketancmaheshwari at gmail.com Mon Mar 21 13:16:34 2011 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Mon, 21 Mar 2011 13:16:34 -0500 Subject: [Swift-devel] "Timer was cancelled" error on Beagle on Swift script termination In-Reply-To: <1599436007.4663.1300727684327.JavaMail.root@zimbra.anl.gov> References: <1846396096.4600.1300727051671.JavaMail.root@zimbra.anl.gov> <1599436007.4663.1300727684327.JavaMail.root@zimbra.anl.gov> Message-ID: Hi, I tried on my home and am getting the same error stack as well. 
==== [ketan at login2:catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster.xml catsn.swift -n=1 Swift svn swift-r4143 (swift modified locally) cog-r3056 (cog modified locally) RunID: 20110321-1213-4abfllee Progress: Progress: Submitting:1 Progress: Active:1 Final status: Finished successfully:1 Got exception in send java.lang.IllegalStateException: Timer was cancelled at java.util.Timer.scheduleImpl(Timer.java:564) at java.util.Timer.schedule(Timer.java:449) at org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:155) at org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:149) at org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel.actualSend(AbstractPipedChannel.java:89) at org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel$Sender.run(AbstractPipedChannel.java:115) Got exception in send java.lang.IllegalStateException: Timer was cancelled at java.util.Timer.scheduleImpl(Timer.java:564) at java.util.Timer.schedule(Timer.java:449) at org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:155) at org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:149) at org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel.actualSend(AbstractPipedChannel.java:89) at org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel$Sender.run(AbstractPipedChannel.java:115) java.lang.IllegalStateException: Timer was cancelled at java.util.Timer.scheduleImpl(Timer.java:564) at java.util.Timer.schedule(Timer.java:449) at org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:155) at org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:149) at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Sender.run(AbstractStreamKarajanChannel.java:253) Canceling job 19904.sdb 
java.lang.IllegalStateException: Timer was cancelled at java.util.Timer.scheduleImpl(Timer.java:564) at java.util.Timer.schedule(Timer.java:449) at org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:155) at org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:149) at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Sender.run(AbstractStreamKarajanChannel.java:253) Canceling job 19904.sdb ==== On Mon, Mar 21, 2011 at 12:14 PM, Michael Wilde wrote: > I had not run Swift on beagle for the past two weeks. When I started again > last night, Im now getting some kind of coaster service timeout error when > the script terminates. The script seems to run OK up to termination. > > The error is: > > Canceling job 19876.sdb > Got exception in send > java.lang.IllegalStateException: Timer was cancelled > > Is anyone seeing the same thing? Any idea why? > > This is running swift at: > > login1$ which swift > /home/wilde/swift/rev/swift-r4143+cog-r3056+pbscoast/bin/swift > login1$ swift -version > Swift svn swift-r4143 (swift modified locally) cog-r3056 (cog modified > locally) > > > The log for this example is on CI net in > /home/wilde/swift/lab/catsn-20110320-2257-tpoi32je.log > > > Run command and output: > > login1$ swift -config cf -tc.file tc -sites.file beagle-coaster.xml > catsn.swift -n=10 > Swift svn swift-r4143 (swift modified locally) cog-r3056 (cog modified > locally) > > RunID: 20110320-2257-tpoi32je > Progress: > Progress: Submitted:9 Active:1 > Trying to release previously released contact: 3 > Final status: Finished successfully:10 > Got exception in send > java.lang.IllegalStateException: Timer was cancelled > at java.util.Timer.scheduleImpl(Timer.java:564) > at java.util.Timer.schedule(Timer.java:449) > at > org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:155) > at > 
org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:149) > at > org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel.actualSend(AbstractPipedChannel.java:89) > at > org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel$Sender.run(AbstractPipedChannel.java:115) > Canceling job 19876.sdb > Got exception in send > java.lang.IllegalStateException: Timer was cancelled > at java.util.Timer.scheduleImpl(Timer.java:564) > at java.util.Timer.schedule(Timer.java:449) > at > org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:155) > at > org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:149) > at > org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel.actualSend(AbstractPipedChannel.java:89) > at > org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel$Sender.run(AbstractPipedChannel.java:115) > login1$ > > > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at mcs.anl.gov Mon Mar 21 13:23:56 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 21 Mar 2011 13:23:56 -0500 (CDT) Subject: [Swift-devel] "Timer was cancelled" error on Beagle on Swift script termination In-Reply-To: Message-ID: <458231440.5018.1300731836884.JavaMail.root@zimbra.anl.gov> Ketan, I recall that you ran the same example last week, and did not report this error at that time. Is your impressions the same as mine: that something changed either in Beagle or in the Swift revision we are using there, to cause this? 
Im somewhat suspicious that a recent change in Beagle's PBS behavior is causing this error in Swift. Mihael, all, do you have any thoughts on what that might be? - Mike -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: From wozniak at mcs.anl.gov Mon Mar 21 13:30:49 2011 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Mon, 21 Mar 2011 13:30:49 -0500 (CDT) Subject: [Swift-devel] "Timer was cancelled" error on Beagle on Swift script termination In-Reply-To: <458231440.5018.1300731836884.JavaMail.root@zimbra.anl.gov> References: <458231440.5018.1300731836884.JavaMail.root@zimbra.anl.gov> Message-ID: PBS is a possibility- did you update from trunk in the last 2 hours? On Mon, 21 Mar 2011, Michael Wilde wrote: > Ketan, I recall that you ran the same example last week, and did not > report this error at that time. > > > Is your impressions the same as mine: that something changed either in > Beagle or in the Swift revision we are using there, to cause this? > > > Im somewhat suspicious that a recent change in Beagle's PBS behavior is > causing this error in Swift. > > > Mihael, all, do you have any thoughts on what that might be? > > > - Mike -- Justin M Wozniak From ketancmaheshwari at
gmail.com Mon Mar 21 13:40:05 2011 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Mon, 21 Mar 2011 13:40:05 -0500 Subject: [Swift-devel] "Timer was cancelled" error on Beagle on Swift script termination In-Reply-To: <458231440.5018.1300731836884.JavaMail.root@zimbra.anl.gov> References: <458231440.5018.1300731836884.JavaMail.root@zimbra.anl.gov> Message-ID: Mike, On Mon, Mar 21, 2011 at 1:23 PM, Michael Wilde wrote: > Ketan, I recall that you ran the same example last week, and did not report > this error at that time. > > Is your impressions the same as mine: that something changed either in > Beagle or in the Swift revision we are using there, to cause this? > > That is right. I did not have any such error last week. -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at mcs.anl.gov Mon Mar 21 13:49:52 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 21 Mar 2011 13:49:52 -0500 (CDT) Subject: [Swift-devel] "Timer was cancelled" error on Beagle on Swift script termination In-Reply-To: Message-ID: <1427714115.5399.1300733392682.JavaMail.root@zimbra.anl.gov> No, I first noticed this last night, and the code Im running hasn't been updated (from 0.92 branch) in the past 2 weeks as far as I know. - Mike ----- Original Message ----- > PBS is a possibility- did you update from trunk in the last 2 hours? -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From skenny at uchicago.edu Mon Mar 21 14:27:31 2011 From: skenny at uchicago.edu (Sarah Kenny) Date: Mon, 21 Mar 2011 12:27:31 -0700 Subject: [Swift-devel] Re: Please send Swift documentation enhancement needs to this list In-Reply-To: <1508842266.4780.1300729043772.JavaMail.root@zimbra.anl.gov> References: <1508842266.4780.1300729043772.JavaMail.root@zimbra.anl.gov> Message-ID: On Mon, Mar 21, 2011 at 10:37 AM, Michael Wilde wrote: > was: Re: [Swift-devel] Running swift after fresh install giving errors > > Ketan, > > Sounds good. I think for now the CI Twiki Swift Cookbook page is one of the > main semi-public collections of missing documentation needed. You should > scan bugzilla as well. > > Does anyone else have a list of Swift documentation enhancements needed? > you mean besides https://sites.google.com/site/swiftdevel/ ? > > I also have few "developers diaries" which I used to keep about Swift > things that I found confusing and which needed further documentation. > Sarah, I think I passed these to you at one point. Were you able to by any > chance extract any documentation "to do" items from those notes? > i wasn't able to make much out of this document (attached)...maybe ketan will have more luck (?) > > ------------------------------ > > Mike, > > Done. Tonight I am going to give a big push to the user guide > documentation. > > Please keep me updated of more such ideas and suggestions if any. > > Ketan > > On Mon, Mar 21, 2011 at 11:23 AM, Michael Wilde wrote: > >> Cc'ing the list to confirm that this worked. >> >> We should add this to the user guide. Ketan, please add this to your >> doc-to-do list or bugzilla.
>> >> - Mike >> >> ----- Forwarded Message ----- >> From: "Yadu Nand" >> To: "Michael Wilde" >> Sent: Monday, March 21, 2011 6:32:53 AM >> Subject: Re: [Swift-devel] Running swift after fresh install giving errors >> >> Hi Michael, >> >> > PATH=$HOME/swift/src/cog/modules/swift/dist/swift-svn/bin::$PATH >> Thanks, that fixed it. I was adding $HOME/bin/cog/module/swift/bin to the >> PATH. >> >> yadu at yadu-laptop:~/bin/cog/modules/swift/examples$ swift first.swift >> Swift svn swift-r4187 cog-r3062 >> >> RunID: 20110321-1653-akxbkrv9 >> Progress: time:14 >> Final status: time:316 Finished successfully:1 >> >> -- >> Thanks and Regards, >> Yadu Nand B >> (+91 94477 80725) >> ( http://humanint.posterous.com ) >> >> -- >> Michael Wilde >> Computation Institute, University of Chicago >> Mathematics and Computer Science Division >> Argonne National Laboratory >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > > > > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Notes.SwiftUsage.doc Type: application/msword Size: 212480 bytes Desc: not available URL: From jon.monette at gmail.com Mon Mar 21 14:36:44 2011 From: jon.monette at gmail.com (Jonathan Monette) Date: Mon, 21 Mar 2011 14:36:44 -0500 Subject: [Swift-devel] Re: Please send Swift documentation enhancement needs to this list In-Reply-To: References: <1508842266.4780.1300729043772.JavaMail.root@zimbra.anl.gov> Message-ID: How hard would it be to document all the valid keys and values that can go into the sites.xml or is there already such documentation available? Like the line 3600 or . 
It would be nice to be able to download a doc or go to a website and find all the profile keys and jobmanagers that sites.xml can take, and what values each of them accepts. I know work has been put into developing sample sites.xml files for different machines, but a webpage or a downloadable doc would be useful. -- Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction. - Albert Einstein -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ketancmaheshwari at gmail.com Mon Mar 21 14:51:51 2011 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Mon, 21 Mar 2011 14:51:51 -0500 Subject: [Swift-devel] Re: Please send Swift documentation enhancement needs to this list In-Reply-To: References: <1508842266.4780.1300729043772.JavaMail.root@zimbra.anl.gov> Message-ID: On Mon, Mar 21, 2011 at 2:36 PM, Jonathan Monette wrote: > How hard would it be to document all the valid keys and values that can go > into the sites.xml or is there already such documentation available? Like > the line 3600 > or . It would be > nice to download a doc or go to a website and can find all the profile keys > and jobmanagers that the sites.xml can take and what the valid values can > take. I know work has been put into developing sample sites.xml files for > different machines but a webpage or a downloadable doc would be useful. > > I agree, this should be documented. I am adding this to my doc-todo. Will start a page/section and add as much info as I can find. Ketan -------------- next part -------------- An HTML attachment was scrubbed... URL: From ketancmaheshwari at gmail.com Mon Mar 21 14:55:54 2011 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Mon, 21 Mar 2011 14:55:54 -0500 Subject: [Swift-devel] Re: Please send Swift documentation enhancement needs to this list In-Reply-To: References: <1508842266.4780.1300729043772.JavaMail.root@zimbra.anl.gov> Message-ID: > i wasn't able to make much out of this document (attached)...maybe ketan > will have more luck (?) > This seems quite like a historical document. I will see how much I can extract. Ketan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hategan at mcs.anl.gov Mon Mar 21 15:44:37 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 21 Mar 2011 13:44:37 -0700 Subject: [Swift-devel] "Timer was cancelled" error on Beagle on Swift script termination In-Reply-To: <458231440.5018.1300731836884.JavaMail.root@zimbra.anl.gov> References: <458231440.5018.1300731836884.JavaMail.root@zimbra.anl.gov> Message-ID: <1300740277.15551.1.camel@blabla2.none> On Mon, 2011-03-21 at 13:23 -0500, Michael Wilde wrote: > > Mihael, all, do you have any thoughts on what that might be? Yes, but what I think it is wouldn't (in any obvious way I can see) be influenced by changes in the environment. If you look at the logs, is there anything else suspicious going on? Mihael From skenny at uchicago.edu Mon Mar 21 15:49:26 2011 From: skenny at uchicago.edu (Sarah Kenny) Date: Mon, 21 Mar 2011 13:49:26 -0700 Subject: [Swift-devel] latest trunk Message-ID: hmmm....so i just (a moment ago) built from trunk and on a quick test run (defaulting to localhost) i seem to be getting an infinite stream of something like this: No events in 10s. Badness Registered futures: int[] totalperms Closed, 10 elements, 0 listeners ---- Waiting threads: 0-4 ---- No events in 10s. Badness Registered futures: int[] totalperms Closed, 10 elements, 0 listeners ---- Waiting threads: 0-4 ---- it's a simple test script that runs fine in .92, anyone else experiencing this? -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From glen842 at uchicago.edu Mon Mar 21 16:30:17 2011
From: glen842 at uchicago.edu (Glen Hocky)
Date: Mon, 21 Mar 2011 17:30:17 -0400
Subject: [Swift-devel] "Timer was cancelled" error on Beagle on Swift script termination
In-Reply-To: <1300740277.15551.1.camel@blabla2.none>
References: <458231440.5018.1300731836884.JavaMail.root@zimbra.anl.gov> <1300740277.15551.1.camel@blabla2.none>
Message-ID:

I got rid of the problem by setting my project as per Ti's instructions

projects --available
projects --set PROJECT

On Mon, Mar 21, 2011 at 4:44 PM, Mihael Hategan wrote:

> On Mon, 2011-03-21 at 13:23 -0500, Michael Wilde wrote:
> >
> >
> > Mihael, all, do you have any thoughts on what that might be?
>
> Yes, but what I think it is wouldn't (in any obvious way I can see) be
> influenced by changes in the environment.
>
> If you look at the logs, is there anything else suspicious going on?
>
> Mihael
>
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>

From wilde at mcs.anl.gov Mon Mar 21 16:57:08 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 21 Mar 2011 16:57:08 -0500 (CDT)
Subject: [Swift-devel] latest trunk
In-Reply-To:
Message-ID: <668521915.7021.1300744628531.JavaMail.root@zimbra.anl.gov>

Can you post the script?

----- Original Message -----

hmmm....so i just (a moment ago) built from trunk and on a quick test run (defaulting to localhost) i seem to be getting an infinite stream of something like this:

No events in 10s.
Badness

Registered futures:
int[] totalperms Closed, 10 elements, 0 listeners
----

Waiting threads:
0-4
----

No events in 10s.
Badness

Registered futures:
int[] totalperms Closed, 10 elements, 0 listeners
----

Waiting threads:
0-4
----

it's a simple test script that runs fine in .92, anyone else experiencing this?
_______________________________________________
Swift-devel mailing list
Swift-devel at ci.uchicago.edu
http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From skenny at uchicago.edu Mon Mar 21 17:02:00 2011
From: skenny at uchicago.edu (Sarah Kenny)
Date: Mon, 21 Mar 2011 15:02:00 -0700
Subject: [Swift-devel] latest trunk
In-Reply-To: <668521915.7021.1300744628531.JavaMail.root@zimbra.anl.gov>
References: <668521915.7021.1300744628531.JavaMail.root@zimbra.anl.gov>
Message-ID:

type file;
type Rscript;
type mxModel;

app (mxModel min) mxModelProcessor(file covMatrix, Rscript mxModProc, int modnum, float weight, string cond)
{
    RInvoke @filename(mxModProc) @filename(covMatrix) modnum weight cond;
}

file covMatrix;
Rscript mxScript;

int totalperms[] = [1:10];
float initweight = .5;
foreach perm in totalperms{
    mxModel modmin;
    modmin = mxModelProcessor(covMatrix, mxScript, perm, initweight, "speech");
}

the R script it calls is actually just a sleep job

On Mon, Mar 21, 2011 at 2:57 PM, Michael Wilde wrote:

> Can you post the script?
>
> ------------------------------
>
> hmmm....so i just (a moment ago) built from trunk and on a quick test run
> (defaulting to localhost) i seem to be getting an infinite stream of
> something like this:
>
> No events in 10s.
> Badness
>
> Registered futures:
> int[] totalperms Closed, 10 elements, 0 listeners
> ----
>
> Waiting threads:
> 0-4
> ----
>
> No events in 10s.
> Badness
>
> Registered futures:
> int[] totalperms Closed, 10 elements, 0 listeners
> ----
>
> Waiting threads:
> 0-4
> ----
>
> it's a simple test script that runs fine in .92, anyone else experiencing
> this?
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
>
>
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>
>

From wilde at mcs.anl.gov Mon Mar 21 17:05:10 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 21 Mar 2011 17:05:10 -0500 (CDT)
Subject: [Swift-devel] "Timer was cancelled" error on Beagle on Swift script termination
In-Reply-To:
Message-ID: <230649012.7071.1300745110081.JavaMail.root@zimbra.anl.gov>

That's interesting. Last night the projects command was failing for me. It's working now. But I set a project explicitly in the sites.xml pool entry.

I've got a job in the queue to see if the fix to the projects command also fixed swift, or if it's the fact that my project was set in sites.xml rather than via the default project mechanism.

- Mike

----- Original Message -----

I got rid of the problem by setting my project as per Ti's instructions

projects --available
projects --set PROJECT

On Mon, Mar 21, 2011 at 4:44 PM, Mihael Hategan < hategan at mcs.anl.gov > wrote:

On Mon, 2011-03-21 at 13:23 -0500, Michael Wilde wrote:
>
> Mihael, all, do you have any thoughts on what that might be?

Yes, but what I think it is wouldn't (in any obvious way I can see) be influenced by changes in the environment.

If you look at the logs, is there anything else suspicious going on?

Mihael

_______________________________________________
Swift-devel mailing list
Swift-devel at ci.uchicago.edu
http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From wilde at mcs.anl.gov Mon Mar 21 17:27:21 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 21 Mar 2011 17:27:21 -0500 (CDT)
Subject: [Swift-devel] latest trunk
In-Reply-To:
Message-ID: <1689800707.7159.1300746441630.JavaMail.root@zimbra.anl.gov>

Sarah,

I won't be able to test and help debug this today, but I notice that your app function declares but does not set the file return value "min". Was (or is) the exact same script working under 0.92 in this manner?

If you think this should work under trunk, I would first package up a version that anyone can run (i.e., include a mock RInvoke script, tc, sites, and properties file).

Then try to create a trunk version as of about a week or two ago, and see if you can find the trunk rev that broke it.

If you can reproduce the error on latest trunk, posting the log and all the specs/files needed to run it would help Justin or Mihael try to find the problem.

But first make sure the script is logically correct w.r.t. the min return value. Looks wrong to me, but I may be missing something obvious.

- Mike

----- Original Message -----

type file;
type Rscript;
type mxModel;

app (mxModel min) mxModelProcessor(file covMatrix, Rscript mxModProc, int modnum, float weight, string cond)
{
    RInvoke @filename(mxModProc) @filename(covMatrix) modnum weight cond;
}

file covMatrix;
Rscript mxScript;

int totalperms[] = [1:10];
float initweight = .5;
foreach perm in totalperms{
    mxModel modmin;
    modmin = mxModelProcessor(covMatrix, mxScript, perm, initweight, "speech");
}

the R script it calls is actually just a sleep job

On Mon, Mar 21, 2011 at 2:57 PM, Michael Wilde < wilde at mcs.anl.gov > wrote:

Can you post the script?

hmmm....so i just (a moment ago) built from trunk and on a quick test run (defaulting to localhost) i seem to be getting an infinite stream of something like this:

No events in 10s.
Badness

Registered futures:
int[] totalperms Closed, 10 elements, 0 listeners
----

Waiting threads:
0-4
----

No events in 10s.
Badness Registered futures: int[] totalperms Closed, 10 elements, 0 listeners ---- Waiting threads: 0-4 ---- it's a simple test script that runs fine in .92, anyone else experiencing this? _______________________________________________ Swift-devel mailing list Swift-devel at ci.uchicago.edu http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: From skenny at uchicago.edu Mon Mar 21 17:29:53 2011 From: skenny at uchicago.edu (Sarah Kenny) Date: Mon, 21 Mar 2011 15:29:53 -0700 Subject: [Swift-devel] latest trunk In-Reply-To: <1689800707.7159.1300746441630.JavaMail.root@zimbra.anl.gov> References: <1689800707.7159.1300746441630.JavaMail.root@zimbra.anl.gov> Message-ID: it works fine under .92 and is something i often use for testing, but no worries i can debug more, i just thought maybe there was a recent commit that would jump out at someone as the potential cause of this behavior. ~sk On Mon, Mar 21, 2011 at 3:27 PM, Michael Wilde wrote: > Sarah, > > I wont be able to test and help debug this today, but I notice that your > app function declares but does not set the file return value "min". Was (or > is) the exact same script working under 0.92 in this manner? > > If you think this should work under trunk, I would first package up a > version that anyone can run (ie include a mock RInvoke script, tc, sites, > and properties file. > > The try to create a trunk version as of about a week or 2 ago, and see if > you find find the trunk rev that broke it. > > If you can reproduce the error on latest trunk, posting the log and all the > specs/files needed to run it would help Justin or Mihael try to find the > problem. 
> > But first make sure the script is logically correct w.r.t the min return > value. Looks wrong to me, but I may be missing something obvious. > > - Mike > > ------------------------------ > > type file; > type Rscript; > type mxModel; > > app (mxModel min) mxModelProcessor(file covMatrix, Rscript mxModProc, int > modnum, float weight, string cond) > { > RInvoke @filename(mxModProc) @filename(covMatrix) modnum weight > cond; > } > > file covMatrix; > Rscript mxScript; > > int totalperms[] = [1:10]; > float initweight = .5; > foreach perm in totalperms{ > mxModel modmin file=@strcat("./results/speech_",perm,".rdata")>; > modmin = mxModelProcessor(covMatrix, mxScript, perm, initweight, > "speech"); > } > > the R script it calls is actually just a sleep job > > On Mon, Mar 21, 2011 at 2:57 PM, Michael Wilde wrote: > >> Can you post the script? >> >> ------------------------------ >> >> hmmm....so i just (a moment ago) built from trunk and on a quick test run >> (defaulting to localhost) i seem to be getting an infinite stream of >> something like this: >> >> No events in 10s. >> Badness >> >> Registered futures: >> int[] totalperms Closed, 10 elements, 0 listeners >> ---- >> >> Waiting threads: >> 0-4 >> ---- >> >> No events in 10s. >> Badness >> >> Registered futures: >> int[] totalperms Closed, 10 elements, 0 listeners >> ---- >> >> Waiting threads: >> 0-4 >> ---- >> >> it's a simple test script that runs fine in .92, anyone else experiencing >> this? 
>> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> >> >> >> >> -- >> Michael Wilde >> Computation Institute, University of Chicago >> Mathematics and Computer Science Division >> Argonne National Laboratory >> >> > > > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From skenny at uchicago.edu Mon Mar 21 17:37:11 2011 From: skenny at uchicago.edu (Sarah Kenny) Date: Mon, 21 Mar 2011 15:37:11 -0700 Subject: [Swift-devel] latest trunk In-Reply-To: References: <1689800707.7159.1300746441630.JavaMail.root@zimbra.anl.gov> Message-ID: oh, problem was bad tc file...now i have to figure out why swift decided not to tell me that :P On Mon, Mar 21, 2011 at 3:29 PM, Sarah Kenny wrote: > it works fine under .92 and is something i often use for testing, but no > worries i can debug more, i just thought maybe there was a recent commit > that would jump out at someone as the potential cause of this behavior. > > ~sk > > > On Mon, Mar 21, 2011 at 3:27 PM, Michael Wilde wrote: > >> Sarah, >> >> I wont be able to test and help debug this today, but I notice that your >> app function declares but does not set the file return value "min". Was (or >> is) the exact same script working under 0.92 in this manner? >> >> If you think this should work under trunk, I would first package up a >> version that anyone can run (ie include a mock RInvoke script, tc, sites, >> and properties file. >> >> The try to create a trunk version as of about a week or 2 ago, and see if >> you find find the trunk rev that broke it. >> >> If you can reproduce the error on latest trunk, posting the log and all >> the specs/files needed to run it would help Justin or Mihael try to find the >> problem. 
>> >> But first make sure the script is logically correct w.r.t the min return >> value. Looks wrong to me, but I may be missing something obvious. >> >> - Mike >> >> ------------------------------ >> >> type file; >> type Rscript; >> type mxModel; >> >> app (mxModel min) mxModelProcessor(file covMatrix, Rscript mxModProc, int >> modnum, float weight, string cond) >> { >> RInvoke @filename(mxModProc) @filename(covMatrix) modnum weight >> cond; >> } >> >> file covMatrix; >> Rscript mxScript; >> >> int totalperms[] = [1:10]; >> float initweight = .5; >> foreach perm in totalperms{ >> mxModel modmin> file=@strcat("./results/speech_",perm,".rdata")>; >> modmin = mxModelProcessor(covMatrix, mxScript, perm, initweight, >> "speech"); >> } >> >> the R script it calls is actually just a sleep job >> >> On Mon, Mar 21, 2011 at 2:57 PM, Michael Wilde wrote: >> >>> Can you post the script? >>> >>> ------------------------------ >>> >>> hmmm....so i just (a moment ago) built from trunk and on a quick test run >>> (defaulting to localhost) i seem to be getting an infinite stream of >>> something like this: >>> >>> No events in 10s. >>> Badness >>> >>> Registered futures: >>> int[] totalperms Closed, 10 elements, 0 listeners >>> ---- >>> >>> Waiting threads: >>> 0-4 >>> ---- >>> >>> No events in 10s. >>> Badness >>> >>> Registered futures: >>> int[] totalperms Closed, 10 elements, 0 listeners >>> ---- >>> >>> Waiting threads: >>> 0-4 >>> ---- >>> >>> it's a simple test script that runs fine in .92, anyone else experiencing >>> this? 
>>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> >>> >>> >>> >>> -- >>> Michael Wilde >>> Computation Institute, University of Chicago >>> Mathematics and Computer Science Division >>> Argonne National Laboratory >>> >>> >> >> >> >> -- >> Michael Wilde >> Computation Institute, University of Chicago >> Mathematics and Computer Science Division >> Argonne National Laboratory >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Mon Mar 21 18:33:14 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 21 Mar 2011 16:33:14 -0700 Subject: [Swift-devel] "Timer was cancelled" error on Beagle on Swift script termination In-Reply-To: References: <458231440.5018.1300731836884.JavaMail.root@zimbra.anl.gov> <1300740277.15551.1.camel@blabla2.none> Message-ID: <1300750394.16266.2.camel@blabla2.none> So it is related to PBS. Please send me logs with this problem. The timer thing shouldn't be happening. On Mon, 2011-03-21 at 17:30 -0400, Glen Hocky wrote: > I got rid of the problem by setting my project as per Ti's > instructions > > > projects --available > projects --set PROJECT > > On Mon, Mar 21, 2011 at 4:44 PM, Mihael Hategan > wrote: > On Mon, 2011-03-21 at 13:23 -0500, Michael Wilde wrote: > > > > > > Mihael, all, do you have any thoughts on what that might be? > > > Yes, but what I think it is wouldn't (in any obvious way I can > see) be > influenced by changes in the environment. > > If you look at the logs, is there anything else suspicious > going on? 
>
> Mihael
>
>
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
>

From ketancmaheshwari at gmail.com Mon Mar 21 21:07:26 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Mon, 21 Mar 2011 21:07:26 -0500
Subject: [Swift-devel] Swift FAQ
Message-ID:

Does a Swift FAQ exist?

I would like to include that as a part of documentation if it does not already exist somewhere.

Let me know your opinions.

Ketan

From wilde at mcs.anl.gov Mon Mar 21 21:56:50 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 21 Mar 2011 21:56:50 -0500 (CDT)
Subject: [Swift-devel] Swift FAQ
In-Reply-To:
Message-ID: <851266178.7750.1300762610454.JavaMail.root@zimbra.anl.gov>

Ketan, we have never had an FAQ. I think it would be good to have one, but making a useful one will take a while and some organized effort.

The Globus Online team just went through the effort of soliciting FAQs and I think many people contributed. We could try something similar.

We might want to organize it, at one level, by the different classes of users:

- Considering using Swift?
- First time Swift usage
- Novice Swift user
- Advanced Swift User

I think we should first devote some solid effort to improving our current documentation, but an FAQ effort near the tail end of that would, I agree, be very useful.

- Mike

----- Original Message -----

Does a Swift FAQ exist?

I would like to include that as a part of documentation if it does not already exist somewhere.

Let me know your opinions.
Ketan

_______________________________________________
Swift-devel mailing list
Swift-devel at ci.uchicago.edu
http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wilde at mcs.anl.gov Mon Mar 21 22:13:06 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 21 Mar 2011 22:13:06 -0500 (CDT)
Subject: [Swift-devel] latest trunk
In-Reply-To:
Message-ID: <2042488827.7772.1300763586294.JavaMail.root@zimbra.anl.gov>

OK. And you can ignore my (incorrect) comment on "min" not being set; I realized that it's mapped by the caller and created by the RInvoke app, without being seen on the command line. Only modnum is passed and is used to create the proper output file name.

(Probably a good FAQ topic ;)

- Mike

----- Original Message -----

oh, problem was bad tc file...now i have to figure out why swift decided not to tell me that :P

On Mon, Mar 21, 2011 at 3:29 PM, Sarah Kenny < skenny at uchicago.edu > wrote:

it works fine under .92 and is something i often use for testing, but no worries i can debug more, i just thought maybe there was a recent commit that would jump out at someone as the potential cause of this behavior.

~sk

On Mon, Mar 21, 2011 at 3:27 PM, Michael Wilde < wilde at mcs.anl.gov > wrote:

Sarah,

I won't be able to test and help debug this today, but I notice that your app function declares but does not set the file return value "min". Was (or is) the exact same script working under 0.92 in this manner?

If you think this should work under trunk, I would first package up a version that anyone can run (i.e., include a mock RInvoke script, tc, sites, and properties file).

Then try to create a trunk version as of about a week or two ago, and see if you can find the trunk rev that broke it.
If you can reproduce the error on latest trunk, posting the log and all the specs/files needed to run it would help Justin or Mihael try to find the problem. But first make sure the script is logically correct w.r.t the min return value. Looks wrong to me, but I may be missing something obvious. - Mike type file; type Rscript; type mxModel; app (mxModel min) mxModelProcessor(file covMatrix, Rscript mxModProc, int modnum, float weight, string cond) { RInvoke @filename(mxModProc) @filename(covMatrix) modnum weight cond; } file covMatrix; Rscript mxScript; int totalperms[] = [1:10]; float initweight = .5; foreach perm in totalperms{ mxModel modmin; modmin = mxModelProcessor(covMatrix, mxScript, perm, initweight, "speech"); } the R script it calls is actually just a sleep job On Mon, Mar 21, 2011 at 2:57 PM, Michael Wilde < wilde at mcs.anl.gov > wrote: Can you post the script? hmmm....so i just (a moment ago) built from trunk and on a quick test run (defaulting to localhost) i seem to be getting an infinite stream of something like this: No events in 10s. Badness Registered futures: int[] totalperms Closed, 10 elements, 0 listeners ---- Waiting threads: 0-4 ---- No events in 10s. Badness Registered futures: int[] totalperms Closed, 10 elements, 0 listeners ---- Waiting threads: 0-4 ---- it's a simple test script that runs fine in .92, anyone else experiencing this? 
_______________________________________________ Swift-devel mailing list Swift-devel at ci.uchicago.edu http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Tue Mar 22 00:31:04 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 21 Mar 2011 22:31:04 -0700 Subject: [Swift-devel] latest trunk In-Reply-To: References: <1689800707.7159.1300746441630.JavaMail.root@zimbra.anl.gov> Message-ID: <1300771864.19763.2.camel@blabla2.none> When the scheduler sees that no app is available on any site it should complain and fail. Theoretically. Logs may help. However, if not, it would be helpful to be able to reproduce the problem (as in me being able to do so). Mihael On Mon, 2011-03-21 at 15:37 -0700, Sarah Kenny wrote: > oh, problem was bad tc file...now i have to figure out why swift > decided not to tell me that :P > > On Mon, Mar 21, 2011 at 3:29 PM, Sarah Kenny > wrote: > it works fine under .92 and is something i often use for > testing, but no worries i can debug more, i just thought maybe > there was a recent commit that would jump out at someone as > the potential cause of this behavior. > > ~sk > > > > On Mon, Mar 21, 2011 at 3:27 PM, Michael Wilde > wrote: > Sarah, > > > I wont be able to test and help debug this today, but > I notice that your app function declares but does not > set the file return value "min". Was (or is) the > exact same script working under 0.92 in this manner? 
> > > If you think this should work under trunk, I would > first package up a version that anyone can run (ie > include a mock RInvoke script, tc, sites, and > properties file. > > > The try to create a trunk version as of about a week > or 2 ago, and see if you find find the trunk rev that > broke it. > > > If you can reproduce the error on latest trunk, > posting the log and all the specs/files needed to run > it would help Justin or Mihael try to find the > problem. > > > But first make sure the script is logically correct > w.r.t the min return value. Looks wrong to me, but I > may be missing something obvious. > > > - Mike > > > ______________________________________________________ > > type file; > type Rscript; > type mxModel; > > app (mxModel min) mxModelProcessor(file > covMatrix, Rscript mxModProc, int modnum, > float weight, string cond) > { > RInvoke @filename(mxModProc) > @filename(covMatrix) modnum weight cond; > } > > file > covMatrix; > Rscript > mxScript; > > int totalperms[] = [1:10]; > float initweight = .5; > foreach perm in totalperms{ > mxModel modmin file=@strcat("./results/speech_",perm,".rdata")>; > modmin = mxModelProcessor(covMatrix, > mxScript, perm, initweight, "speech"); > } > > the R script it calls is actually just a sleep > job > > On Mon, Mar 21, 2011 at 2:57 PM, Michael Wilde > wrote: > Can you post the script? > > > ______________________________________ > > hmmm....so i just (a moment > ago) built from trunk and on a > quick test run (defaulting to > localhost) i seem to be > getting an infinite stream of > something like this: > > No events in 10s. > Badness > > Registered futures: > int[] totalperms Closed, 10 > elements, 0 listeners > ---- > > Waiting threads: > 0-4 > ---- > > No events in 10s. > Badness > > Registered futures: > int[] totalperms Closed, 10 > elements, 0 listeners > ---- > > Waiting threads: > 0-4 > ---- > > it's a simple test script that > runs fine in .92, anyone else > experiencing this? 
> > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > -- > Michael Wilde > Computation Institute, University of > Chicago > Mathematics and Computer Science > Division > Argonne National Laboratory > > > > > > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From wozniak at mcs.anl.gov Tue Mar 22 13:52:03 2011 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Tue, 22 Mar 2011 13:52:03 -0500 (CDT) Subject: [Swift-devel] Renamed workersPerNode to jobsPerNode Message-ID: The point of this commit is to rename the workersPerNode setting to jobsPerNode. Not sure what happened to the subject. Justin -- Justin M Wozniak ---------- Forwarded message ---------- Date: Tue, 22 Mar 2011 13:47:41 From: noreply at svn.ci.uchicago.edu To: swift-commit at ci.uchicago.edu Subject: utf-8?B?W1N3aWZ0LWNvbW1pdF0gcjQyMDQgLSBpbiB0cnVuazogZG9jcyBldGMvc2l0ZXMgZXRj L3NpdGVzL2NuYXJpLWFiZQlldGMvc2l0ZXMvY25hcmktcXVlZW5iZWUgZXRjL3NpdGVzL2NuYXJp LXJhbmdlcglldGMvc2l0ZXMvbG9jYWwtcGJzLWNvYXN0ZXJzIGV0Yy9zaXRlcy9wYWRzLWxvY2Fs LXBicy1jb2FzdGVycwlldGMvc2l0ZXMvcGFkcy1yZW1vdGUtcGJzLWNvYXN0ZXJzLXNzaAlldGMv c2l0ZXMvdGVyYXBvcnQtbG9jYWwtcGJzLWNvYXN0ZXJzCWV0Yy9zaXRlcy90ZXJhcG9ydC1yZW1v dGUtcGJzLWNvYXN0ZXJzLXNzaCBsaWJleGVjCXRlc3RzL2NkbS9wcyB0ZXN0cy9jZG0vcHMvcGlu bmVkIHRlc3RzL21waQl0ZXN0cy9wcm92aWRlcnMvbG9jYWwtY29iYWx0L2ludHJlcGlkCXRlc3Rz L3Byb3ZpZGVycy9sb2NhbC1jb2JhbHQvc3VydmV5b3IJdGVzdHMvcHJvdmlkZXJzL2xvY2FsLXBi cy1jb2FzdGVycwl0ZXN0cy9wcm92aWRlcnMvbG9jYWwtc2dlLWNvYXN0ZXJzIHRlc3RzL3Byb3Zp ZGVycy9zc2gtcGJzLWNvYXN0ZXJz?= Author: wozniak Date: 2011-03-22 13:47:41 -0500 (Tue, 22 Mar 2011) New Revision: 4204 Added: 
trunk/etc/sites/cnari-abe/ trunk/etc/sites/cnari-queenbee/ trunk/etc/sites/cnari-ranger/ trunk/etc/sites/local-pbs-coasters/ trunk/etc/sites/pads-local-pbs-coasters/ trunk/etc/sites/pads-remote-pbs-coasters-ssh/ trunk/etc/sites/teraport-local-pbs-coasters/ trunk/etc/sites/teraport-remote-pbs-coasters-ssh/ Modified: trunk/docs/userguide.xml trunk/etc/sites/cnari-abe/sites.xml trunk/etc/sites/cnari-queenbee/sites.xml trunk/etc/sites/cnari-ranger/sites.xml trunk/etc/sites/intrepid trunk/etc/sites/local-pbs-coasters/sites.xml trunk/etc/sites/pads-local-pbs-coasters/sites.xml trunk/etc/sites/pads-remote-pbs-coasters-ssh/sites.xml trunk/etc/sites/ssh-pbs-coasters trunk/etc/sites/surveyor trunk/etc/sites/teraport-local-pbs-coasters/sites.xml trunk/etc/sites/teraport-remote-pbs-coasters-ssh/sites.xml trunk/libexec/vdl-sc.k trunk/tests/cdm/ps/pinned/sites.template.xml trunk/tests/cdm/ps/sites.template.xml trunk/tests/mpi/sites.template.xml trunk/tests/providers/local-cobalt/intrepid/sites.template.xml trunk/tests/providers/local-cobalt/surveyor/sites.template.xml trunk/tests/providers/local-pbs-coasters/sites.template.xml trunk/tests/providers/local-sge-coasters/sites.template.xml trunk/tests/providers/ssh-pbs-coasters/sites.template.xml Log: Rename workersPerNode to jobsPerNode From skenny at uchicago.edu Tue Mar 22 14:08:59 2011 From: skenny at uchicago.edu (Sarah Kenny) Date: Tue, 22 Mar 2011 12:08:59 -0700 Subject: [Swift-devel] Renamed workersPerNode to jobsPerNode In-Reply-To: References: Message-ID: i know you've already committed but this is something you *might* want to have prepended with "note:" (?) just a thought :) On Tue, Mar 22, 2011 at 11:52 AM, Justin M Wozniak wrote: > > The point of this commit is to rename the workersPerNode setting to > jobsPerNode. Not sure what happened to the subject. 
> Justin > > -- > Justin M Wozniak > > ---------- Forwarded message ---------- > Date: Tue, 22 Mar 2011 13:47:41 > From: noreply at svn.ci.uchicago.edu > To: swift-commit at ci.uchicago.edu > Subject: > > utf-8?B?W1N3aWZ0LWNvbW1pdF0gcjQyMDQgLSBpbiB0cnVuazogZG9jcyBldGMvc2l0ZXMgZXRj > > L3NpdGVzL2NuYXJpLWFiZQlldGMvc2l0ZXMvY25hcmktcXVlZW5iZWUgZXRjL3NpdGVzL2NuYXJp > > LXJhbmdlcglldGMvc2l0ZXMvbG9jYWwtcGJzLWNvYXN0ZXJzIGV0Yy9zaXRlcy9wYWRzLWxvY2Fs > > LXBicy1jb2FzdGVycwlldGMvc2l0ZXMvcGFkcy1yZW1vdGUtcGJzLWNvYXN0ZXJzLXNzaAlldGMv > > c2l0ZXMvdGVyYXBvcnQtbG9jYWwtcGJzLWNvYXN0ZXJzCWV0Yy9zaXRlcy90ZXJhcG9ydC1yZW1v > > dGUtcGJzLWNvYXN0ZXJzLXNzaCBsaWJleGVjCXRlc3RzL2NkbS9wcyB0ZXN0cy9jZG0vcHMvcGlu > > bmVkIHRlc3RzL21waQl0ZXN0cy9wcm92aWRlcnMvbG9jYWwtY29iYWx0L2ludHJlcGlkCXRlc3Rz > > L3Byb3ZpZGVycy9sb2NhbC1jb2JhbHQvc3VydmV5b3IJdGVzdHMvcHJvdmlkZXJzL2xvY2FsLXBi > > cy1jb2FzdGVycwl0ZXN0cy9wcm92aWRlcnMvbG9jYWwtc2dlLWNvYXN0ZXJzIHRlc3RzL3Byb3Zp > ZGVycy9zc2gtcGJzLWNvYXN0ZXJz?= > > Author: wozniak > Date: 2011-03-22 13:47:41 -0500 (Tue, 22 Mar 2011) > New Revision: 4204 > > Added: > trunk/etc/sites/cnari-abe/ > trunk/etc/sites/cnari-queenbee/ > trunk/etc/sites/cnari-ranger/ > trunk/etc/sites/local-pbs-coasters/ > trunk/etc/sites/pads-local-pbs-coasters/ > trunk/etc/sites/pads-remote-pbs-coasters-ssh/ > trunk/etc/sites/teraport-local-pbs-coasters/ > trunk/etc/sites/teraport-remote-pbs-coasters-ssh/ > Modified: > trunk/docs/userguide.xml > trunk/etc/sites/cnari-abe/sites.xml > trunk/etc/sites/cnari-queenbee/sites.xml > trunk/etc/sites/cnari-ranger/sites.xml > trunk/etc/sites/intrepid > trunk/etc/sites/local-pbs-coasters/sites.xml > trunk/etc/sites/pads-local-pbs-coasters/sites.xml > trunk/etc/sites/pads-remote-pbs-coasters-ssh/sites.xml > trunk/etc/sites/ssh-pbs-coasters > trunk/etc/sites/surveyor > trunk/etc/sites/teraport-local-pbs-coasters/sites.xml > trunk/etc/sites/teraport-remote-pbs-coasters-ssh/sites.xml > trunk/libexec/vdl-sc.k > trunk/tests/cdm/ps/pinned/sites.template.xml > 
trunk/tests/cdm/ps/sites.template.xml
> trunk/tests/mpi/sites.template.xml
> trunk/tests/providers/local-cobalt/intrepid/sites.template.xml
> trunk/tests/providers/local-cobalt/surveyor/sites.template.xml
> trunk/tests/providers/local-pbs-coasters/sites.template.xml
> trunk/tests/providers/local-sge-coasters/sites.template.xml
> trunk/tests/providers/ssh-pbs-coasters/sites.template.xml
> Log:
> Rename workersPerNode to jobsPerNode
>
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>

From wozniak at mcs.anl.gov Tue Mar 22 14:18:42 2011
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Tue, 22 Mar 2011 14:18:42 -0500 (CDT)
Subject: [Swift-devel] Renamed workersPerNode to jobsPerNode
In-Reply-To:
References:
Message-ID:

It's there, you simply have to translate back from "ZQlldGMvc2l0"

Kidding.

On Tue, 22 Mar 2011, Sarah Kenny wrote:

> i know you've already committed but this is something you *might* want to
> have prepended with "note:" (?)
>
> just a thought :)
>
> On Tue, Mar 22, 2011 at 11:52 AM, Justin M Wozniak wrote:
>
>>
>> The point of this commit is to rename the workersPerNode setting to
>> jobsPerNode. Not sure what happened to the subject.
>> Justin >> >> -- >> Justin M Wozniak >> >> ---------- Forwarded message ---------- >> Date: Tue, 22 Mar 2011 13:47:41 >> From: noreply at svn.ci.uchicago.edu >> To: swift-commit at ci.uchicago.edu >> Subject: >> >> utf-8?B?W1N3aWZ0LWNvbW1pdF0gcjQyMDQgLSBpbiB0cnVuazogZG9jcyBldGMvc2l0ZXMgZXRj >> >> L3NpdGVzL2NuYXJpLWFiZQlldGMvc2l0ZXMvY25hcmktcXVlZW5iZWUgZXRjL3NpdGVzL2NuYXJp >> >> LXJhbmdlcglldGMvc2l0ZXMvbG9jYWwtcGJzLWNvYXN0ZXJzIGV0Yy9zaXRlcy9wYWRzLWxvY2Fs >> >> LXBicy1jb2FzdGVycwlldGMvc2l0ZXMvcGFkcy1yZW1vdGUtcGJzLWNvYXN0ZXJzLXNzaAlldGMv >> >> c2l0ZXMvdGVyYXBvcnQtbG9jYWwtcGJzLWNvYXN0ZXJzCWV0Yy9zaXRlcy90ZXJhcG9ydC1yZW1v >> >> dGUtcGJzLWNvYXN0ZXJzLXNzaCBsaWJleGVjCXRlc3RzL2NkbS9wcyB0ZXN0cy9jZG0vcHMvcGlu >> >> bmVkIHRlc3RzL21waQl0ZXN0cy9wcm92aWRlcnMvbG9jYWwtY29iYWx0L2ludHJlcGlkCXRlc3Rz >> >> L3Byb3ZpZGVycy9sb2NhbC1jb2JhbHQvc3VydmV5b3IJdGVzdHMvcHJvdmlkZXJzL2xvY2FsLXBi >> >> cy1jb2FzdGVycwl0ZXN0cy9wcm92aWRlcnMvbG9jYWwtc2dlLWNvYXN0ZXJzIHRlc3RzL3Byb3Zp >> ZGVycy9zc2gtcGJzLWNvYXN0ZXJz?= >> >> Author: wozniak >> Date: 2011-03-22 13:47:41 -0500 (Tue, 22 Mar 2011) >> New Revision: 4204 >> >> Added: >> trunk/etc/sites/cnari-abe/ >> trunk/etc/sites/cnari-queenbee/ >> trunk/etc/sites/cnari-ranger/ >> trunk/etc/sites/local-pbs-coasters/ >> trunk/etc/sites/pads-local-pbs-coasters/ >> trunk/etc/sites/pads-remote-pbs-coasters-ssh/ >> trunk/etc/sites/teraport-local-pbs-coasters/ >> trunk/etc/sites/teraport-remote-pbs-coasters-ssh/ >> Modified: >> trunk/docs/userguide.xml >> trunk/etc/sites/cnari-abe/sites.xml >> trunk/etc/sites/cnari-queenbee/sites.xml >> trunk/etc/sites/cnari-ranger/sites.xml >> trunk/etc/sites/intrepid >> trunk/etc/sites/local-pbs-coasters/sites.xml >> trunk/etc/sites/pads-local-pbs-coasters/sites.xml >> trunk/etc/sites/pads-remote-pbs-coasters-ssh/sites.xml >> trunk/etc/sites/ssh-pbs-coasters >> trunk/etc/sites/surveyor >> trunk/etc/sites/teraport-local-pbs-coasters/sites.xml >> trunk/etc/sites/teraport-remote-pbs-coasters-ssh/sites.xml >> trunk/libexec/vdl-sc.k >> 
trunk/tests/cdm/ps/pinned/sites.template.xml
>> trunk/tests/cdm/ps/sites.template.xml
>> trunk/tests/mpi/sites.template.xml
>> trunk/tests/providers/local-cobalt/intrepid/sites.template.xml
>> trunk/tests/providers/local-cobalt/surveyor/sites.template.xml
>> trunk/tests/providers/local-pbs-coasters/sites.template.xml
>> trunk/tests/providers/local-sge-coasters/sites.template.xml
>> trunk/tests/providers/ssh-pbs-coasters/sites.template.xml
>> Log:
>> Rename workersPerNode to jobsPerNode
>>
>>
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>
>

--
Justin M Wozniak

From wilde at mcs.anl.gov Tue Mar 22 14:22:23 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 22 Mar 2011 14:22:23 -0500 (CDT)
Subject: [Swift-devel] Renamed workersPerNode to jobsPerNode
In-Reply-To:
Message-ID: <1269859568.14935.1300821743041.JavaMail.root@zimbra.anl.gov>

Any chance we can retain the "workerspernode" tag in parallel for a while to prevent breakage to current user scripts (when they upgrade from 0.92 to 0.93)?

We can change the docs to reflect the new name (which is much better than the old), but keep the old one in a deprecated state for a while for backwards compatibility. I feel that, wherever practical, that should be how we proceed.

I could see similar changes in many places. My favorite, e.g., is to change "tc.data" to "apps.txt" or just "apps" or "applist", and -tc.file to -applist. But we should still accept tc etc. for at least a few more releases.

- Mike

----- Original Message -----
> The point of this commit is to rename the workersPerNode setting to
> jobsPerNode. Not sure what happened to the subject.
> Justin > > -- > Justin M Wozniak > > ---------- Forwarded message ---------- > Date: Tue, 22 Mar 2011 13:47:41 > From: noreply at svn.ci.uchicago.edu > To: swift-commit at ci.uchicago.edu > Subject: > utf-8?B?W1N3aWZ0LWNvbW1pdF0gcjQyMDQgLSBpbiB0cnVuazogZG9jcyBldGMvc2l0ZXMgZXRj > L3NpdGVzL2NuYXJpLWFiZQlldGMvc2l0ZXMvY25hcmktcXVlZW5iZWUgZXRjL3NpdGVzL2NuYXJp > LXJhbmdlcglldGMvc2l0ZXMvbG9jYWwtcGJzLWNvYXN0ZXJzIGV0Yy9zaXRlcy9wYWRzLWxvY2Fs > LXBicy1jb2FzdGVycwlldGMvc2l0ZXMvcGFkcy1yZW1vdGUtcGJzLWNvYXN0ZXJzLXNzaAlldGMv > c2l0ZXMvdGVyYXBvcnQtbG9jYWwtcGJzLWNvYXN0ZXJzCWV0Yy9zaXRlcy90ZXJhcG9ydC1yZW1v > dGUtcGJzLWNvYXN0ZXJzLXNzaCBsaWJleGVjCXRlc3RzL2NkbS9wcyB0ZXN0cy9jZG0vcHMvcGlu > bmVkIHRlc3RzL21waQl0ZXN0cy9wcm92aWRlcnMvbG9jYWwtY29iYWx0L2ludHJlcGlkCXRlc3Rz > L3Byb3ZpZGVycy9sb2NhbC1jb2JhbHQvc3VydmV5b3IJdGVzdHMvcHJvdmlkZXJzL2xvY2FsLXBi > cy1jb2FzdGVycwl0ZXN0cy9wcm92aWRlcnMvbG9jYWwtc2dlLWNvYXN0ZXJzIHRlc3RzL3Byb3Zp > ZGVycy9zc2gtcGJzLWNvYXN0ZXJz?= > > Author: wozniak > Date: 2011-03-22 13:47:41 -0500 (Tue, 22 Mar 2011) > New Revision: 4204 > > Added: > trunk/etc/sites/cnari-abe/ > trunk/etc/sites/cnari-queenbee/ > trunk/etc/sites/cnari-ranger/ > trunk/etc/sites/local-pbs-coasters/ > trunk/etc/sites/pads-local-pbs-coasters/ > trunk/etc/sites/pads-remote-pbs-coasters-ssh/ > trunk/etc/sites/teraport-local-pbs-coasters/ > trunk/etc/sites/teraport-remote-pbs-coasters-ssh/ > Modified: > trunk/docs/userguide.xml > trunk/etc/sites/cnari-abe/sites.xml > trunk/etc/sites/cnari-queenbee/sites.xml > trunk/etc/sites/cnari-ranger/sites.xml > trunk/etc/sites/intrepid > trunk/etc/sites/local-pbs-coasters/sites.xml > trunk/etc/sites/pads-local-pbs-coasters/sites.xml > trunk/etc/sites/pads-remote-pbs-coasters-ssh/sites.xml > trunk/etc/sites/ssh-pbs-coasters > trunk/etc/sites/surveyor > trunk/etc/sites/teraport-local-pbs-coasters/sites.xml > trunk/etc/sites/teraport-remote-pbs-coasters-ssh/sites.xml > trunk/libexec/vdl-sc.k > trunk/tests/cdm/ps/pinned/sites.template.xml > 
trunk/tests/cdm/ps/sites.template.xml
> trunk/tests/mpi/sites.template.xml
> trunk/tests/providers/local-cobalt/intrepid/sites.template.xml
> trunk/tests/providers/local-cobalt/surveyor/sites.template.xml
> trunk/tests/providers/local-pbs-coasters/sites.template.xml
> trunk/tests/providers/local-sge-coasters/sites.template.xml
> trunk/tests/providers/ssh-pbs-coasters/sites.template.xml
> Log:
> Rename workersPerNode to jobsPerNode
>
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wozniak at mcs.anl.gov Tue Mar 22 15:02:42 2011
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Tue, 22 Mar 2011 15:02:42 -0500 (CDT)
Subject: [Swift-devel] Renamed workersPerNode to jobsPerNode
In-Reply-To: <1269859568.14935.1300821743041.JavaMail.root@zimbra.anl.gov>
References: <1269859568.14935.1300821743041.JavaMail.root@zimbra.anl.gov>
Message-ID:

workersPerNode is back in as an alias with a warning message.

On Tue, 22 Mar 2011, Michael Wilde wrote:

> Any chance we can retain the "workerspernode" tag in parallel for a
> while to prevent breakage to current user scripts (when they upgrade
> from 0.92 to 0.93)?
>
> We can change the docs to reflect the new name (which is much better
> than the old), but keep the old one in deprecated state for a while for
> backwards compatibility. I feel that, wherever practical, that should be
> how we proceed.
>
> I could see similar changes in may places. My favorite, eg, is to
> change "tc.data" to "apps.txt" or just "apps" or "applist" and -tc.file
> to -applist. But still to accept tc etc for at least a few more
> releases.
>
> - Mike
>
> ----- Original Message -----
>> The point of this commit is to rename the workersPerNode setting to
>> jobsPerNode.
Not sure what happened to the subject. >> Justin >> >> -- >> Justin M Wozniak >> >> ---------- Forwarded message ---------- >> Date: Tue, 22 Mar 2011 13:47:41 >> From: noreply at svn.ci.uchicago.edu >> To: swift-commit at ci.uchicago.edu >> Subject: >> utf-8?B?W1N3aWZ0LWNvbW1pdF0gcjQyMDQgLSBpbiB0cnVuazogZG9jcyBldGMvc2l0ZXMgZXRj >> L3NpdGVzL2NuYXJpLWFiZQlldGMvc2l0ZXMvY25hcmktcXVlZW5iZWUgZXRjL3NpdGVzL2NuYXJp >> LXJhbmdlcglldGMvc2l0ZXMvbG9jYWwtcGJzLWNvYXN0ZXJzIGV0Yy9zaXRlcy9wYWRzLWxvY2Fs >> LXBicy1jb2FzdGVycwlldGMvc2l0ZXMvcGFkcy1yZW1vdGUtcGJzLWNvYXN0ZXJzLXNzaAlldGMv >> c2l0ZXMvdGVyYXBvcnQtbG9jYWwtcGJzLWNvYXN0ZXJzCWV0Yy9zaXRlcy90ZXJhcG9ydC1yZW1v >> dGUtcGJzLWNvYXN0ZXJzLXNzaCBsaWJleGVjCXRlc3RzL2NkbS9wcyB0ZXN0cy9jZG0vcHMvcGlu >> bmVkIHRlc3RzL21waQl0ZXN0cy9wcm92aWRlcnMvbG9jYWwtY29iYWx0L2ludHJlcGlkCXRlc3Rz >> L3Byb3ZpZGVycy9sb2NhbC1jb2JhbHQvc3VydmV5b3IJdGVzdHMvcHJvdmlkZXJzL2xvY2FsLXBi >> cy1jb2FzdGVycwl0ZXN0cy9wcm92aWRlcnMvbG9jYWwtc2dlLWNvYXN0ZXJzIHRlc3RzL3Byb3Zp >> ZGVycy9zc2gtcGJzLWNvYXN0ZXJz?= >> >> Author: wozniak >> Date: 2011-03-22 13:47:41 -0500 (Tue, 22 Mar 2011) >> New Revision: 4204 >> >> Added: >> trunk/etc/sites/cnari-abe/ >> trunk/etc/sites/cnari-queenbee/ >> trunk/etc/sites/cnari-ranger/ >> trunk/etc/sites/local-pbs-coasters/ >> trunk/etc/sites/pads-local-pbs-coasters/ >> trunk/etc/sites/pads-remote-pbs-coasters-ssh/ >> trunk/etc/sites/teraport-local-pbs-coasters/ >> trunk/etc/sites/teraport-remote-pbs-coasters-ssh/ >> Modified: >> trunk/docs/userguide.xml >> trunk/etc/sites/cnari-abe/sites.xml >> trunk/etc/sites/cnari-queenbee/sites.xml >> trunk/etc/sites/cnari-ranger/sites.xml >> trunk/etc/sites/intrepid >> trunk/etc/sites/local-pbs-coasters/sites.xml >> trunk/etc/sites/pads-local-pbs-coasters/sites.xml >> trunk/etc/sites/pads-remote-pbs-coasters-ssh/sites.xml >> trunk/etc/sites/ssh-pbs-coasters >> trunk/etc/sites/surveyor >> trunk/etc/sites/teraport-local-pbs-coasters/sites.xml >> trunk/etc/sites/teraport-remote-pbs-coasters-ssh/sites.xml >> 
trunk/libexec/vdl-sc.k
>> trunk/tests/cdm/ps/pinned/sites.template.xml
>> trunk/tests/cdm/ps/sites.template.xml
>> trunk/tests/mpi/sites.template.xml
>> trunk/tests/providers/local-cobalt/intrepid/sites.template.xml
>> trunk/tests/providers/local-cobalt/surveyor/sites.template.xml
>> trunk/tests/providers/local-pbs-coasters/sites.template.xml
>> trunk/tests/providers/local-sge-coasters/sites.template.xml
>> trunk/tests/providers/ssh-pbs-coasters/sites.template.xml
>> Log:
>> Rename workersPerNode to jobsPerNode
>>
>>
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
>

--
Justin M Wozniak

From yadudoc1729 at gmail.com Tue Mar 22 15:27:53 2011
From: yadudoc1729 at gmail.com (Yadu Nand)
Date: Wed, 23 Mar 2011 01:57:53 +0530
Subject: [Swift-devel] Swiftscript Iteration constructs, Clarification [GSoC 2011]
Message-ID:

Hi,

This is regarding the idea of adding map, fold, and scan to succeed the current iterative programming constructs (foreach, iterate).

Currently foreach has the following structure:

    file source[ ];
    file destination[ ];
    foreach value, index in source {
        destination[index] = func( value );
    }

map could do the same using the structure:

    destination = map ( func, source )

Is this what is required?

Furthermore, the Swift userguide[1] mentions VDL.g and XDTM.xsd, but those files are either missing or have (more likely) been renamed. Instead I have been looking at what I believe are the renamed versions:

    resources/swiftscript.g
    resources/swiftscript.xsd

Any inputs are welcome and appreciated.
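
The equivalence asked about above can be sketched outside SwiftScript. This is only an illustration in Python, not Swift syntax or a proposed implementation: `func` is a hypothetical stand-in for the per-element function, and plain lists stand in for the `file` arrays.

```python
from typing import Callable, List


def func(value: str) -> str:
    """Hypothetical stand-in for the per-element function in the message."""
    return value.upper()


def with_foreach(source: List[str]) -> List[str]:
    # Mirrors the foreach form: each result is assigned by index.
    destination: List[str] = [""] * len(source)
    for index, value in enumerate(source):
        destination[index] = func(value)
    return destination


def with_map(f: Callable[[str], str], source: List[str]) -> List[str]:
    # Mirrors `destination = map ( func, source )`: no explicit index.
    return list(map(f, source))


if __name__ == "__main__":
    files = ["a.dat", "b.dat", "c.dat"]
    print(with_foreach(files))
    print(with_map(func, files))
```

Both forms produce the same array; the map form simply leaves the index bookkeeping implicit, which is the property the message is asking about.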
[1] http://www.ci.uchicago.edu/swift/guides/userguide.pdf

--
Thanks and Regards,
Yadu Nand B

From bugzilla-daemon at mcs.anl.gov Tue Mar 22 16:36:16 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Tue, 22 Mar 2011 16:36:16 -0500 (CDT)
Subject: [Swift-devel] [Bug 31] error message should not refer to java exception classes
In-Reply-To:
References:
Message-ID: <20110322213616.B67612E379@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=31

skenny changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED

--- Comment #5 from skenny 2011-03-22 16:36:16 ---
full stack trace will now print when the following setting is used in log4j:

log4j.logger.org.griphyn.vdl.karajan.functions.ProcessBulkErrors=DEBUG

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
You are watching the reporter.

From hategan at mcs.anl.gov Wed Mar 23 00:49:34 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 22 Mar 2011 22:49:34 -0700
Subject: [Swift-devel] Renamed workersPerNode to jobsPerNode
In-Reply-To:
References:
Message-ID: <1300859374.26068.1.camel@blabla2.none>

Odd. However, I am not getting that commit message at all.

Mihael

On Tue, 2011-03-22 at 13:52 -0500, Justin M Wozniak wrote:
> The point of this commit is to rename the workersPerNode setting to
> jobsPerNode. Not sure what happened to the subject.
> Justin
>

From hategan at mcs.anl.gov Wed Mar 23 00:50:33 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 22 Mar 2011 22:50:33 -0700
Subject: [Swift-devel] Renamed workersPerNode to jobsPerNode
In-Reply-To: <1269859568.14935.1300821743041.JavaMail.root@zimbra.anl.gov>
References: <1269859568.14935.1300821743041.JavaMail.root@zimbra.anl.gov>
Message-ID: <1300859433.26068.2.camel@blabla2.none>

Good point.
Backwards compatibility, while a pain in the behind, it is a matter of courtesy to the users. On Tue, 2011-03-22 at 14:22 -0500, Michael Wilde wrote: > Any chance we can retain the "workerspernode" tag in parallel for a while to prevent breakage to current user scripts (when they upgrade from 0.92 to 0.93)? > > We can change the docs to reflect the new name (which is much better than the old), but keep the old one in deprecated state for a while for backwards compatibility. I feel that, wherever practical, that should be how we proceed. > > I could see similar changes in may places. My favorite, eg, is to change "tc.data" to "apps.txt" or just "apps" or "applist" and -tc.file to -applist. But still to accept tc etc for at least a few more releases. > > - Mike > > ----- Original Message ----- > > The point of this commit is to rename the workersPerNode setting to > > jobsPerNode. Not sure what happened to the subject. > > Justin > > > > -- > > Justin M Wozniak > > > > ---------- Forwarded message ---------- > > Date: Tue, 22 Mar 2011 13:47:41 > > From: noreply at svn.ci.uchicago.edu > > To: swift-commit at ci.uchicago.edu > > Subject: > > utf-8?B?W1N3aWZ0LWNvbW1pdF0gcjQyMDQgLSBpbiB0cnVuazogZG9jcyBldGMvc2l0ZXMgZXRj > > L3NpdGVzL2NuYXJpLWFiZQlldGMvc2l0ZXMvY25hcmktcXVlZW5iZWUgZXRjL3NpdGVzL2NuYXJp > > LXJhbmdlcglldGMvc2l0ZXMvbG9jYWwtcGJzLWNvYXN0ZXJzIGV0Yy9zaXRlcy9wYWRzLWxvY2Fs > > LXBicy1jb2FzdGVycwlldGMvc2l0ZXMvcGFkcy1yZW1vdGUtcGJzLWNvYXN0ZXJzLXNzaAlldGMv > > c2l0ZXMvdGVyYXBvcnQtbG9jYWwtcGJzLWNvYXN0ZXJzCWV0Yy9zaXRlcy90ZXJhcG9ydC1yZW1v > > dGUtcGJzLWNvYXN0ZXJzLXNzaCBsaWJleGVjCXRlc3RzL2NkbS9wcyB0ZXN0cy9jZG0vcHMvcGlu > > bmVkIHRlc3RzL21waQl0ZXN0cy9wcm92aWRlcnMvbG9jYWwtY29iYWx0L2ludHJlcGlkCXRlc3Rz > > L3Byb3ZpZGVycy9sb2NhbC1jb2JhbHQvc3VydmV5b3IJdGVzdHMvcHJvdmlkZXJzL2xvY2FsLXBi > > cy1jb2FzdGVycwl0ZXN0cy9wcm92aWRlcnMvbG9jYWwtc2dlLWNvYXN0ZXJzIHRlc3RzL3Byb3Zp > > ZGVycy9zc2gtcGJzLWNvYXN0ZXJz?= > > > > Author: wozniak > > Date: 2011-03-22 13:47:41 -0500 (Tue, 22 Mar 2011) > > 
New Revision: 4204 > > > > Added: > > trunk/etc/sites/cnari-abe/ > > trunk/etc/sites/cnari-queenbee/ > > trunk/etc/sites/cnari-ranger/ > > trunk/etc/sites/local-pbs-coasters/ > > trunk/etc/sites/pads-local-pbs-coasters/ > > trunk/etc/sites/pads-remote-pbs-coasters-ssh/ > > trunk/etc/sites/teraport-local-pbs-coasters/ > > trunk/etc/sites/teraport-remote-pbs-coasters-ssh/ > > Modified: > > trunk/docs/userguide.xml > > trunk/etc/sites/cnari-abe/sites.xml > > trunk/etc/sites/cnari-queenbee/sites.xml > > trunk/etc/sites/cnari-ranger/sites.xml > > trunk/etc/sites/intrepid > > trunk/etc/sites/local-pbs-coasters/sites.xml > > trunk/etc/sites/pads-local-pbs-coasters/sites.xml > > trunk/etc/sites/pads-remote-pbs-coasters-ssh/sites.xml > > trunk/etc/sites/ssh-pbs-coasters > > trunk/etc/sites/surveyor > > trunk/etc/sites/teraport-local-pbs-coasters/sites.xml > > trunk/etc/sites/teraport-remote-pbs-coasters-ssh/sites.xml > > trunk/libexec/vdl-sc.k > > trunk/tests/cdm/ps/pinned/sites.template.xml > > trunk/tests/cdm/ps/sites.template.xml > > trunk/tests/mpi/sites.template.xml > > trunk/tests/providers/local-cobalt/intrepid/sites.template.xml > > trunk/tests/providers/local-cobalt/surveyor/sites.template.xml > > trunk/tests/providers/local-pbs-coasters/sites.template.xml > > trunk/tests/providers/local-sge-coasters/sites.template.xml > > trunk/tests/providers/ssh-pbs-coasters/sites.template.xml > > Log: > > Rename workersPerNode to jobsPerNode > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From wilde at mcs.anl.gov Wed Mar 23 12:03:40 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 23 Mar 2011 12:03:40 -0500 (CDT) Subject: [Swift-devel] Ciel - a data flow programming model similar to Swift's Message-ID: <1322208305.18759.1300899820600.JavaMail.root@zimbra.anl.gov> This seems important for us to understand and see how we compare / 
relate to:

http://www.cl.cam.ac.uk/~dgm36/publications/2011-murray2011ciel.pdf

- Mike

From jon.monette at gmail.com Wed Mar 23 16:31:15 2011
From: jon.monette at gmail.com (Jonathan Monette)
Date: Wed, 23 Mar 2011 16:31:15 -0500
Subject: [Swift-devel] hanging problem
Message-ID:

I think I have figured out my problem. Here is the function it hangs in.

( Table fits_tbl ) mFitBatch( Image diff_imgs[], Table diff_tbl )
{
    Status stats[] ;
    Table status_tbl = create_status_table( diff_tbl );
    foreach img, i in stats
    {
        stats[ i ] = mFitplane ( diff_imgs[i] );
    }
    fits_tbl = mConcatFit( status_tbl, stats );
}

The app mConcatFit never executes. Here is the app.

app ( Table fits_tbl ) mConcatFit( Table status_tbl, Status stats[] )
{
    mConcatFit @status_tbl @fits_tbl @dirname( stats[0] );
}

The problem is @dirname(stats[0]) returns an org.griphyn.vdl.karajan.VDL2FutureException. When I do a tracef( "%s\n", @dirname(stats[0]) ) that is what gets printed to the screen. This causes the script to hang. When I do a tracef( "%q\n", stats ) this is the output.
[org.griphyn.vdl.mapping.DataNode identifier dataset:20110323-1620-9e0zjb30:720000000602 type Status with no value at dataset=stats path=[0] (not closed),org.griphyn.vdl.mapping.DataNode identifier dataset:20110323-1620-9e0zjb30:720000000601 type Status with no value at dataset=stats path=[1] (not closed),org.griphyn.vdl.mapping.DataNode identifier dataset:20110323-1620-9e0zjb30:720000000599 type Status with no value at dataset=stats path=[2] (not closed),org.griphyn.vdl.mapping.DataNode identifier dataset:20110323-1620-9e0zjb30:720000000598 type Status with no value at dataset=stats path=[3] (not closed),org.griphyn.vdl.mapping.DataNode identifier dataset:20110323-1620-9e0zjb30:720000000606 type Status with no value at dataset=stats path=[4] (not closed),org.griphyn.vdl.mapping.DataNode identifier dataset:20110323-1620-9e0zjb30:720000000605 type Status with no value at dataset=stats path=[5] (not closed),org.griphyn.vdl.mapping.DataNode identifier dataset:20110323-1620-9e0zjb30:720000000604 type Status with no value at dataset=stats path=[6] (not closed),org.griphyn.vdl.mapping.DataNode identifier dataset:20110323-1620-9e0zjb30:720000000603 type Status with no value at dataset=stats path=[7] (not closed),org.griphyn.vdl.mapping.DataNode identifier dataset:20110323-1620-9e0zjb30:720000000608 type Status with no value at dataset=stats path=[8] (not closed),org.griphyn.vdl.mapping.DataNode identifier dataset:20110323-1620-9e0zjb30:720000000607 type Status with no value at dataset=stats path=[9] (not closed),org.griphyn.vdl.mapping.DataNode identifier dataset:20110323-1620-9e0zjb30:720000000600 type Status with no value at dataset=stats path=[10] (not closed),org.griphyn.vdl.mapping.DataNode identifier dataset:20110323-1620-9e0zjb30:720000000596 type Status with no value at dataset=stats path=[11] (not closed),org.griphyn.vdl.mapping.DataNode identifier dataset:20110323-1620-9e0zjb30:720000000597 type Status with no value at dataset=stats path=[12] (not 
closed),org.griphyn.vdl.mapping.DataNode identifier dataset:20110323-1620-9e0zjb30:720000000594 type Status with no value at dataset=stats path=[13] (not closed),org.griphyn.vdl.mapping.DataNode identifier dataset:20110323-1620-9e0zjb30:720000000595 type Status with no value at dataset=stats path=[14] (not closed),org.griphyn.vdl.mapping.DataNode identifier dataset:20110323-1620-9e0zjb30:720000000592 type Status with no value at dataset=stats path=[15] (not closed),org.griphyn.vdl.mapping.DataNode identifier dataset:20110323-1620-9e0zjb30:720000000593 type Status with no value at dataset=stats path=[16] (not closed)] However, stats is closed with no listeners. Status[] stats Closed, no listeners How can the array be closed but all of its values not be? Attached is the log for one of the runs that hangs. -- Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction. - Albert Einstein -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: m101_montage-20110310-2035-dmxf29be.log
Type: application/octet-stream
Size: 195772 bytes
Desc: not available
URL:

From bugzilla-daemon at mcs.anl.gov Wed Mar 23 17:29:15 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Wed, 23 Mar 2011 17:29:15 -0500 (CDT)
Subject: [Swift-devel] [Bug 271] New: java.lang.nullPointerException on manual worker.pl start
Message-ID:

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=271

           Summary: java.lang.nullPointerException on manual worker.pl start
           Product: Swift
           Version: trunk
          Platform: Other
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: General
        AssignedTo: hategan at mcs.anl.gov
        ReportedBy: ketan at mcs.anl.gov
                CC: robertx at mail.com

After starting the coaster service, when worker.pl is started immediately (manually) without starting a dummy Swift run, a NullPointerException is thrown. The detailed command lines are below:

# Service
[ketan at octopus:swift-svn]$ ./bin/coaster-service -nosec
Local contacts: [http://140.221.9.110:35362]
Started local service: http://140.221.9.110:35362
Started coaster service: http://140.221.9.110:1984
Started coaster service: http://140.221.9.110:1984
===

# Worker
[ketan at crush:bin]$ worker.pl http://140.221.9.110:35362 mcs /tmp/
Failed to process data: Failed to register (service returned error: java.lang.NullPointerException) at ./worker.pl line 751.
===

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.
From bugzilla-daemon at mcs.anl.gov Thu Mar 24 15:45:24 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Thu, 24 Mar 2011 15:45:24 -0500 (CDT)
Subject: [Swift-devel] [Bug 273] New: resume is currently broken (trunk)
Message-ID:

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=273

           Summary: resume is currently broken (trunk)
           Product: Swift
           Version: trunk
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: General
        AssignedTo: hategan at mcs.anl.gov
        ReportedBy: aespinosa at cs.uchicago.edu

See related thread in
http://mail.ci.uchicago.edu/pipermail/swift-devel/2011-February/007472.html

Still broken in swift-r4208 cog-r3073

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

From wilde at mcs.anl.gov Thu Mar 24 19:28:38 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 24 Mar 2011 19:28:38 -0500 (CDT)
Subject: [Swift-devel] Fwd: [Swift-user] determining unmapped paths
In-Reply-To:
Message-ID: <655120943.25600.1301012918101.JavaMail.root@zimbra.anl.gov>

Sarah, this is a perfect example of a messaging deficiency to fix. Can you add to bugzilla?
Thanks,
Mike

----- Forwarded Message -----
From: "Allan Espinosa"
To: "Swift-User"
Sent: Thursday, March 24, 2011 4:28:51 PM
Subject: [Swift-user] determining unmapped paths

I'm trying to figure out what in my workflow is causing this problem:

2011-03-24 16:23:50,485-0500 WARN FlowNode Ex098
java.lang.IllegalStateException: mapper.existing() returned a path [3] that it cannot subsequently map
    at org.griphyn.vdl.mapping.RootDataNode.checkInputs(RootDataNode.java:129)
    at org.griphyn.vdl.mapping.RootArrayDataNode.checkInputs(RootArrayDataNode.java:67)
    at org.griphyn.vdl.mapping.RootArrayDataNode.innerInit(RootArrayDataNode.java:53)
    at org.griphyn.vdl.mapping.RootArrayDataNode.handleClosed(RootArrayDataNode.java:80)
    at org.griphyn.vdl.mapping.AbstractDataNode.notifyListeners(AbstractDataNode.java:583)
    at org.griphyn.vdl.mapping.AbstractDataNode.closeShallow(AbstractDataNode.java:396)
    at org.griphyn.vdl.mapping.ArrayDataNode.closeDeep(ArrayDataNode.java:51)
    at org.griphyn.vdl.karajan.lib.PartialCloseDataset.function(PartialCloseDataset.java:79)
    at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:68)
    at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29)
    at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20)
    at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197)
    at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104)
    at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)

It doesn't specify which data object it crashes on, so I'm quite clueless at this point. I'm using the latest trunk. Is there any particular log4j class I should enable to debug this?

Thanks,
-Allan

--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago
_______________________________________________
Swift-user mailing list
Swift-user at ci.uchicago.edu
http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From hockyg at uchicago.edu Fri Mar 25 18:48:11 2011
From: hockyg at uchicago.edu (Glen Hocky)
Date: Fri, 25 Mar 2011 19:48:11 -0400
Subject: [Swift-devel] bad use of "not defined" error message
Message-ID:

Hey everyone,

Here's a suggestion for an improved compile error message in an app().

I had something like

app(outStruct o) run(params){
    runApp @filename(o.param1)
}

When I removed o.param1, rather than telling me param1 wasn't a member of outStruct, it instead said

Could not start execution. Type outStruct is not defined. in application run at line 24

(where I have replaced all my real names with the fake names used above)
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From aespinosa at cs.uchicago.edu Fri Mar 25 18:49:08 2011
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Fri, 25 Mar 2011 18:49:08 -0500
Subject: [Swift-devel] cog 4.1.8 + release-0.92 branch build failure
Message-ID:

delete.jar:
     [echo] [karajan]: DELETE.JAR (cog-karajan-0.36-dev.jar)

compile:
     [echo] [karajan]: COMPILE
    [mkdir] Created dir: /autonfs/home/aespinosa/swift/cogkit/modules/karajan/build
    [javac] Compiling 493 source files to /autonfs/home/aespinosa/swift/cogkit/modules/karajan/build
    [javac] /autonfs/home/aespinosa/swift/cogkit/modules/karajan/src/org/globus/cog/karajan/workflow/nodes/grid/AbstractGridNode.java:189: setStack(org.globus.cog.abstraction.interfaces.Task,org.globus.cog.karajan.stack.VariableStack) is already defined in org.globus.cog.karajan.workflow.nodes.grid.AbstractGridNode
    [javac] protected final void setStack(Task task, VariableStack stack) {
    [javac] ^
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.
    [javac] 1 error

-Allan

From aespinosa at cs.uchicago.edu Fri Mar 25 19:42:30 2011
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Fri, 25 Mar 2011 19:42:30 -0500
Subject: [Swift-devel] hang checker fun
Message-ID:

This has occurred 70 times already. What I expect is for the app with SgtDim sub to run and close the future.

2011-03-25 19:40:12,217-0500 WARN HangChecker No events in 10s.
2011-03-25 19:40:12,217-0500 WARN HangChecker
Registered futures:
Rupture[] rups Closed, 1 elements, 0 listeners
Variation vars - Closed, no listeners
SgtDim sub - Open, 1 listeners
string site Closed, no listeners
Variation[] vars Closed, 72 elements, 0 listeners
----

Waiting threads:
0-13
0-13-0-7
0-13-0-8-1-1
----

--
Allan M.
Espinosa
PhD student, Computer Science
University of Chicago

From ketancmaheshwari at gmail.com Fri Mar 25 20:32:34 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Fri, 25 Mar 2011 20:32:34 -0500
Subject: [Swift-devel] hang checker fun
In-Reply-To:
References:
Message-ID:

I did not understand this message.

On Fri, Mar 25, 2011 at 7:42 PM, Allan Espinosa wrote:

> this has been occurring for 70 times already. What i expect is for
> the app with SgtDim sub to run and close the future.
>
> 2011-03-25 19:40:12,217-0500 WARN HangChecker No events in 10s.
> 2011-03-25 19:40:12,217-0500 WARN HangChecker
> Registered futures:
> Rupture[] rups Closed, 1 elements, 0 listeners
> Variation vars - Closed, no listeners
> SgtDim sub - Open, 1 listeners
> string site Closed, no listeners
> Variation[] vars Closed, 72 elements, 0 listeners
> ----
>
> Waiting threads:
> 0-13
> 0-13-0-7
> 0-13-0-8-1-1
> ----
>
>
> --
> Allan M. Espinosa
> PhD student, Computer Science
> University of Chicago
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>

From aespinosa at cs.uchicago.edu Fri Mar 25 20:36:57 2011
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Fri, 25 Mar 2011 20:36:57 -0500
Subject: [Swift-devel] hang checker fun
In-Reply-To:
References:
Message-ID:

Hi Ketan,

Mihael added the HangChecker to shed a bit of light when a workflow is 'hanging'. Theoretically it should start executing the job with the open future, but somehow it is still waiting for something.

Btw, the state of the related job is 'Initializing shared directory'.

This is with the latest trunk.
jstack doesn't report anything peculiar (like deadlocks): [aespinosa at communicado cybershake]$ jstack 20444 2011-03-25 20:36:02 Full thread dump Java HotSpot(TM) 64-Bit Server VM (17.0-b16 mixed mode): "Attach Listener" daemon prio=10 tid=0x0000000055b81000 nid=0x67e1 runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE "NBS7" daemon prio=10 tid=0x0000000055b1c800 nid=0x500e waiting on condition [0x0000000042f55000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00002aab61ed0160> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) "NBS6" daemon prio=10 tid=0x0000000055b1b800 nid=0x500d waiting on condition [0x0000000042e54000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00002aab61ed0160> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) "NBS5" daemon prio=10 tid=0x0000000055b1a800 nid=0x500b waiting on condition [0x0000000042d53000] java.lang.Thread.State: WAITING 
(parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00002aab61ed0160> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) "NBS4" daemon prio=10 tid=0x0000000055b18800 nid=0x5009 waiting on condition [0x0000000042c52000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00002aab61ed0160> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) "NBS3" daemon prio=10 tid=0x0000000055b20800 nid=0x5008 waiting on condition [0x0000000042b51000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00002aab61ed0160> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) "NBS2" daemon prio=10 tid=0x0000000055a72800 nid=0x5007 waiting on condition [0x0000000042a50000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00002aab61ed0160> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) "NBS1" daemon prio=10 tid=0x0000000055bfb000 nid=0x5006 waiting on condition [0x0000000040e7f000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00002aab61ed0160> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) "Scheduler" prio=10 tid=0x00002aabb8810000 nid=0x5005 in Object.wait() [0x0000000040774000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aab62337030> (a 
org.griphyn.vdl.karajan.VDSAdaptiveScheduler) at org.globus.cog.karajan.scheduler.LateBindingScheduler.sleep(LateBindingScheduler.java:304) at org.globus.cog.karajan.scheduler.LateBindingScheduler.run(LateBindingScheduler.java:257) - locked <0x00002aab62337030> (a org.griphyn.vdl.karajan.VDSAdaptiveScheduler) "Progress ticker" daemon prio=10 tid=0x00002aabc0331000 nid=0x4ffe waiting on condition [0x0000000040673000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.griphyn.vdl.karajan.lib.RuntimeStats$ProgressTicker.run(RuntimeStats.java:138) "Restart Log Sync" daemon prio=10 tid=0x00002aabc00f2800 nid=0x4ffd in Object.wait() [0x000000004294f000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aab61ed0000> (a org.globus.cog.karajan.workflow.nodes.restartLog.SyncThread) at java.lang.Object.wait(Object.java:485) at org.globus.cog.karajan.workflow.nodes.restartLog.SyncThread.run(SyncThread.java:47) - locked <0x00002aab61ed0000> (a org.globus.cog.karajan.workflow.nodes.restartLog.SyncThread) "Overloaded Host Monitor" daemon prio=10 tid=0x00002aabc015e000 nid=0x4ffc waiting on condition [0x000000004284e000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.globus.cog.karajan.scheduler.OverloadedHostMonitor.run(OverloadedHostMonitor.java:47) "Timer-0" daemon prio=10 tid=0x0000000055b3b800 nid=0x4ffa in Object.wait() [0x000000004274d000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aab61ec0260> (a java.util.TaskQueue) at java.util.TimerThread.mainLoop(Timer.java:509) - locked <0x00002aab61ec0260> (a java.util.TaskQueue) at java.util.TimerThread.run(Timer.java:462) "NBS0" daemon prio=10 tid=0x00002aabc01a8000 nid=0x4ff9 waiting on condition [0x000000004264c000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - 
parking to wait for <0x00002aab61ed0160> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) "pool-1-thread-8" prio=10 tid=0x0000000055ae7000 nid=0x4ff8 waiting on condition [0x000000004254b000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00002aab62385f10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) "pool-1-thread-7" prio=10 tid=0x0000000055b69000 nid=0x4ff7 waiting on condition [0x000000004244a000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00002aab62385f10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) "pool-1-thread-6" prio=10 tid=0x0000000055eec800 nid=0x4ff6 waiting on condition [0x0000000042349000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00002aab62385f10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) "pool-1-thread-5" prio=10 tid=0x0000000055b05800 nid=0x4ff5 waiting on condition [0x0000000042248000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00002aab62385f10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) "pool-1-thread-4" prio=10 tid=0x00002aabc029d000 nid=0x4ff4 waiting on condition [0x0000000042147000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00002aab62385f10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at 
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) "pool-1-thread-3" prio=10 tid=0x00002aabc0106800 nid=0x4ff3 waiting on condition [0x0000000042046000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00002aab62385f10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) "pool-1-thread-2" prio=10 tid=0x00002aabc02a0000 nid=0x4ff2 waiting on condition [0x0000000041d71000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00002aab62385f10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at 
java.lang.Thread.run(Thread.java:619) "pool-1-thread-1" prio=10 tid=0x00002aabc00fb000 nid=0x4ff1 waiting on condition [0x0000000041c70000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00002aab62385f10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) "Hang checker" prio=10 tid=0x00002aabc0228800 nid=0x4ff0 in Object.wait() [0x0000000041b6f000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aab61ed0718> (a java.util.TaskQueue) at java.util.TimerThread.mainLoop(Timer.java:509) - locked <0x00002aab61ed0718> (a java.util.TaskQueue) at java.util.TimerThread.run(Timer.java:462) "Low Memory Detector" daemon prio=10 tid=0x00000000558b5000 nid=0x4fe8 runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE "CompilerThread1" daemon prio=10 tid=0x00000000558b3000 nid=0x4fe7 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE "CompilerThread0" daemon prio=10 tid=0x00000000558ad800 nid=0x4fe6 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE "Signal Dispatcher" daemon prio=10 tid=0x00000000558ab800 nid=0x4fe5 runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE "Finalizer" daemon prio=10 tid=0x0000000055886800 nid=0x4fe4 in Object.wait() [0x0000000040d7e000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aab61ec8770> (a 
java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118) - locked <0x00002aab61ec8770> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) "Reference Handler" daemon prio=10 tid=0x0000000055884800 nid=0x4fe3 in Object.wait() [0x0000000040c7d000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aab61ed0498> (a java.lang.ref.Reference$Lock) at java.lang.Object.wait(Object.java:485) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116) - locked <0x00002aab61ed0498> (a java.lang.ref.Reference$Lock) "main" prio=10 tid=0x0000000055823000 nid=0x4fdd in Object.wait() [0x0000000041f45000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aab61ec8850> (a org.griphyn.vdl.karajan.VDL2ExecutionContext) at java.lang.Object.wait(Object.java:485) at org.globus.cog.karajan.workflow.ExecutionContext.waitFor(ExecutionContext.java:218) - locked <0x00002aab61ec8850> (a org.griphyn.vdl.karajan.VDL2ExecutionContext) at org.griphyn.vdl.karajan.Loader.main(Loader.java:199) "VM Thread" prio=10 tid=0x0000000055880800 nid=0x4fe2 runnable "GC task thread#0 (ParallelGC)" prio=10 tid=0x0000000055836000 nid=0x4fde runnable "GC task thread#1 (ParallelGC)" prio=10 tid=0x0000000055838000 nid=0x4fdf runnable "GC task thread#2 (ParallelGC)" prio=10 tid=0x000000005583a000 nid=0x4fe0 runnable "GC task thread#3 (ParallelGC)" prio=10 tid=0x000000005583b800 nid=0x4fe1 runnable "VM Periodic Task Thread" prio=10 tid=0x00000000558c0000 nid=0x4fe9 waiting on condition JNI global references: 1621 2011/3/25 Ketan Maheshwari : > I did not understand this message. > > On Fri, Mar 25, 2011 at 7:42 PM, Allan Espinosa > wrote: >> >> this has been occurring for 70 times already. 
What i expect is for
>> the app with SgtDim sub to run and close the future.
>>
>> 2011-03-25 19:40:12,217-0500 WARN HangChecker No events in 10s.
>> 2011-03-25 19:40:12,217-0500 WARN HangChecker
>> Registered futures:
>> Rupture[] rups Closed, 1 elements, 0 listeners
>> Variation vars - Closed, no listeners
>> SgtDim sub - Open, 1 listeners
>> string site Closed, no listeners
>> Variation[] vars Closed, 72 elements, 0 listeners
>> ----
>>
>> Waiting threads:
>> 0-13
>> 0-13-0-7
>> 0-13-0-8-1-1
>> ----
>>
>>

From ketancmaheshwari at gmail.com Sat Mar 26 00:28:04 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Sat, 26 Mar 2011 00:28:04 -0500
Subject: [Swift-devel] "Timer was cancelled" error on Beagle on Swift script termination
In-Reply-To: <1300750394.16266.2.camel@blabla2.none>
References: <458231440.5018.1300731836884.JavaMail.root@zimbra.anl.gov> <1300740277.15551.1.camel@blabla2.none> <1300750394.16266.2.camel@blabla2.none>
Message-ID:

Hi,

The timer error: "Timer was cancelled" persists on beagle. Does anybody been able to resolve it so far?
The stack I got today is quite similar to the previous ones:

=====
Got exception in send
java.lang.IllegalStateException: Timer was cancelled
    at java.util.Timer.scheduleImpl(Timer.java:564)
    at java.util.Timer.schedule(Timer.java:449)
    at org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:155)
    at org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:149)
    at org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel.actualSend(AbstractPipedChannel.java:89)
    at org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel$Sender.run(AbstractPipedChannel.java:115)
Got exception in send
java.lang.IllegalStateException: Timer was cancelled
    at java.util.Timer.scheduleImpl(Timer.java:564)
    at java.util.Timer.schedule(Timer.java:449)
    at org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:155)
    at org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:149)
    at org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel.actualSend(AbstractPipedChannel.java:89)
    at org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel$Sender.run(AbstractPipedChannel.java:115)
java.lang.IllegalStateException: Timer was cancelled
    at java.util.Timer.scheduleImpl(Timer.java:564)
    at java.util.Timer.schedule(Timer.java:449)
    at org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:155)
    at org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:149)
    at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Sender.run(AbstractStreamKarajanChannel.java:253)
Canceling job 21995.sdb
=====

Ketan

On Mon, Mar 21, 2011 at 6:33 PM, Mihael Hategan wrote:
> So it is related to PBS. Please send me logs with this problem. The
> timer thing shouldn't be happening.
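
For context on the exception in the traces above, here is a minimal sketch that uses only the standard java.util.Timer API. It shows the general behaviour that a Timer rejects new tasks with an IllegalStateException once cancel() has been called on it. The class and method names (TimerCancelledDemo, scheduleAfterCancelFails) are made up for illustration, and nothing here is taken from the CoG/Karajan source. Whether this is what happens on Beagle is an assumption, not something the thread confirms.

```java
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical demo class; not part of Swift or the CoG kit.
public class TimerCancelledDemo {

    // Returns true if scheduling on a cancelled Timer is rejected
    // with IllegalStateException, which is what the JDK documents.
    static boolean scheduleAfterCancelFails() {
        Timer timer = new Timer(true); // daemon thread, so the JVM can exit
        timer.cancel();                // e.g. done during shutdown
        try {
            // A late caller (a reply-timeout check, say) tries to schedule.
            timer.schedule(new TimerTask() {
                @Override
                public void run() {
                    // never reached: the timer is already cancelled
                }
            }, 1000L);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("schedule after cancel rejected: "
                + scheduleAfterCancelFails());
    }
}
```

If the real cause is of this kind, the usual remedies are to stop scheduling before the timer is cancelled or to catch the exception on the shutdown path. Which of these applies here is left open.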
> > On Mon, 2011-03-21 at 17:30 -0400, Glen Hocky wrote: > > I got rid of the problem by setting my project as per Ti's > > instructions > > > > > > projects --available > > projects --set PROJECT > > > > On Mon, Mar 21, 2011 at 4:44 PM, Mihael Hategan > > wrote: > > On Mon, 2011-03-21 at 13:23 -0500, Michael Wilde wrote: > > > > > > > > > > Mihael, all, do you have any thoughts on what that might be? > > > > > > Yes, but what I think it is wouldn't (in any obvious way I can > > see) be > > influenced by changes in the environment. > > > > If you look at the logs, is there anything else suspicious > > going on? > > > > Mihael > > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Sat Mar 26 04:26:00 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 26 Mar 2011 02:26:00 -0700 Subject: [Swift-devel] "Timer was cancelled" error on Beagle on Swift script termination In-Reply-To: References: <458231440.5018.1300731836884.JavaMail.root@zimbra.anl.gov> <1300740277.15551.1.camel@blabla2.none> <1300750394.16266.2.camel@blabla2.none> Message-ID: <1301131560.2627.0.camel@blabla2.none> I will look at it on Sunday or Monday. Mihael On Sat, 2011-03-26 at 00:28 -0500, Ketan Maheshwari wrote: > Hi, > > The timer error: "Timer was cancelled" persists on beagle. Does > anybody been able to resolve it so far? 
> > > The stack I got today is quite similar to the previous ones: > > > ===== > Got exception in send > java.lang.IllegalStateException: Timer was cancelled > at java.util.Timer.scheduleImpl(Timer.java:564) > at java.util.Timer.schedule(Timer.java:449) > at > org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:155) > at > org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:149) > at > org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel.actualSend(AbstractPipedChannel.java:89) > at > org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel > $Sender.run(AbstractPipedChannel.java:115) > Got exception in send > java.lang.IllegalStateException: Timer was cancelled > at java.util.Timer.scheduleImpl(Timer.java:564) > at java.util.Timer.schedule(Timer.java:449) > at > org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:155) > at > org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:149) > at > org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel.actualSend(AbstractPipedChannel.java:89) > at > org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel > $Sender.run(AbstractPipedChannel.java:115) > java.lang.IllegalStateException: Timer was cancelled > at java.util.Timer.scheduleImpl(Timer.java:564) > at java.util.Timer.schedule(Timer.java:449) > at > org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:155) > at > org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:149) > at > org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Sender.run(AbstractStreamKarajanChannel.java:253) > Canceling job 21995.sdb > > ===== > > > > Ketan > > > On Mon, Mar 21, 2011 at 6:33 PM, Mihael Hategan > wrote: > So it is related to PBS. Please send me logs with this > problem. 
The > timer thing shouldn't be happening. > > > On Mon, 2011-03-21 at 17:30 -0400, Glen Hocky wrote: > > I got rid of the problem by setting my project as per Ti's > > instructions > > > > > > projects --available > > projects --set PROJECT > > > > On Mon, Mar 21, 2011 at 4:44 PM, Mihael Hategan > > > wrote: > > On Mon, 2011-03-21 at 13:23 -0500, Michael Wilde > wrote: > > > > > > > > > > Mihael, all, do you have any thoughts on what that > might be? > > > > > > Yes, but what I think it is wouldn't (in any obvious > way I can > > see) be > > influenced by changes in the environment. > > > > If you look at the logs, is there anything else > suspicious > > going on? > > > > Mihael > > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From wilde at mcs.anl.gov Sat Mar 26 06:44:47 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sat, 26 Mar 2011 06:44:47 -0500 (CDT) Subject: [Swift-devel] Swift hang checker In-Reply-To: <1299448004.16332.2.camel@blabla2.none> Message-ID: <2101943594.30155.1301139887859.JavaMail.root@zimbra.anl.gov> was: [Swift-devel] Re: Workflow waiting on condition hang I missed this when it was announced Mar 6 (email below). Sounds very useful. We should add a User Guide entry for this, with a few Swift deadlock examples and show users how to use the information to identify and correct the deadlock. 
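
As a possible starting point for such a User Guide entry, the idea behind a hang checker can be sketched in plain Java. This is not Swift's implementation. The class HangCheckSketch, its methods, and the report layout are assumptions made for illustration, loosely modeled on the "Registered futures" output quoted in this thread. The sketch counts events and, when a check interval passes with no new events, lists every registered future as Open or Closed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; names and output format are assumptions.
public class HangCheckSketch {
    // Variable name -> future, kept in registration order.
    private final Map<String, CompletableFuture<?>> futures = new LinkedHashMap<>();
    private final AtomicLong events = new AtomicLong();
    private long lastSeen = -1; // event count observed at the previous check

    void register(String name, CompletableFuture<?> future) {
        futures.put(name, future);
    }

    // Called whenever something makes progress (a job starts, a value is set).
    void recordEvent() {
        events.incrementAndGet();
    }

    // Returns a report if no event happened since the previous check,
    // otherwise null. A real checker would call this from a periodic timer.
    String check() {
        long now = events.get();
        boolean idle = (now == lastSeen);
        lastSeen = now;
        if (!idle) {
            return null;
        }
        StringBuilder report = new StringBuilder("Registered futures:\n");
        for (Map.Entry<String, CompletableFuture<?>> e : futures.entrySet()) {
            report.append(e.getKey())
                  .append(" - ")
                  .append(e.getValue().isDone() ? "Closed" : "Open")
                  .append('\n');
        }
        return report.toString();
    }

    // Small scenario: one closed future, one that nobody ever completes.
    static String demo() {
        HangCheckSketch hc = new HangCheckSketch();
        hc.register("site", CompletableFuture.completedFuture("example-site"));
        hc.register("sub", new CompletableFuture<String>());
        hc.recordEvent();
        hc.check();        // first interval saw an event, so no report
        return hc.check(); // second interval saw none, so a report is built
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

In this scenario the future named "sub" is never completed, so it is the one reported as Open. That is the kind of entry a user would look for when reading real hang-checker output.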
How close to the Swift source code can we make the hang-checker messages, so that the user can relate it to Swift functions, expressions, and ideally source code lines? Ketan, please add this to the list of "cookbook" entries to merge into the User Guide, and I will file it in bugzilla. - Mike ----- Forwarded Message ----- From: "Mihael Hategan" To: "Jonathan Monette" Cc: "Swift Devel" Sent: Sunday, March 6, 2011 3:46:44 PM Subject: [Swift-devel] Re: Workflow waiting on condition hang Given that this does not seem to be a java deadlock, I added a hang checker to swift. If nothing is being executed inside karajan and no jobs are running in any ten second interval, it will dump future and thread information to the log file. This is in swift trunk r4171. Can you give that a try and report back the details? Mihael On Sat, 2011-02-19 at 14:54 -0600, Jonathan Monette wrote: > Yes. It always seems to hang at the same place. > > Attached is my montage script. It hangs in the mFitBatch function at > the mConcatFit app call. All other files have been created up to that > step but that app never runs. > > On 2/17/11 3:39 PM, Mihael Hategan wrote: > > On Thu, 2011-02-17 at 15:13 -0600, Jonathan Monette wrote: > >> Hello, > >> My workflow seems to be hanging. This is trunk swift-r4107 and > >> cog-r3051. Attached is a compressed log file and the jstack output for > >> my workflow. The jstack file says it is waiting for a condition and my > >> workflow hangs. > > There's lots of stuff waiting because that's what they do when they > > don't have anything else to do. So I don't see a problem there. > > > > There are no jobs going to the coaster service, so clearly things aren't > > progressing. > > > > So now the question is: does this happen every time you run it or just > > some times? > > > > Also, please send the swift script. 
> > > > Mihael > > > > _______________________________________________ Swift-devel mailing list Swift-devel at ci.uchicago.edu http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wilde at mcs.anl.gov Sat Mar 26 07:02:42 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sat, 26 Mar 2011 07:02:42 -0500 (CDT) Subject: [Swift-devel] Swift hang checker In-Reply-To: <951162664.30157.1301140931538.JavaMail.root@zimbra.anl.gov> Message-ID: <1221026352.30159.1301140962395.JavaMail.root@zimbra.anl.gov> As I was filing the bug I realized I mis-took Allan's posting of jstack output for the Hang Checker. The current Hang Checker output is actually *very* nice and useful already: Registered futures: Rupture[] rups Closed, 1 elements, 0 listeners Variation vars - Closed, no listeners SgtDim sub - Open, 1 listeners string site Closed, no listeners Variation[] vars Closed, 72 elements, 0 listeners Is it possible (and sensible) to add to this a dump or summary of the current Swift threads and the function call or expression they are running? Eg, from the output above, would one conclude that there is only one function hanging at the moment in this code: SgtDim sub - Open, 1 listeners Would knowing what expression (and line of code) is waiting on the variable "sub" be helpful? And possible to print? - Mike ----- Original Message ----- > was: [Swift-devel] Re: Workflow waiting on condition hang > > I missed this when it was announced Mar 6 (email below). Sounds very > useful. > > We should add a User Guide entry for this, with a few Swift deadlock > examples and show users how to use the information to identify and > correct the deadlock. > > How close to the Swift source code can we make the hang-checker > messages, so that the user can relate it to Swift functions, > expressions, and ideally source code lines? 
> > Ketan, please add this to the list of "cookbook" entries to merge into > the User Guide, and I will file it in bugzilla. > > - Mike > > > > ----- Forwarded Message ----- > From: "Mihael Hategan" > To: "Jonathan Monette" > Cc: "Swift Devel" > Sent: Sunday, March 6, 2011 3:46:44 PM > Subject: [Swift-devel] Re: Workflow waiting on condition hang > > Given that this does not seem to be a java deadlock, I added a hang > checker to swift. If nothing is being executed inside karajan and no > jobs are running in any ten second interval, it will dump future and > thread information to the log file. > > This is in swift trunk r4171. > > Can you give that a try and report back the details? > > Mihael > > On Sat, 2011-02-19 at 14:54 -0600, Jonathan Monette wrote: > > Yes. It always seems to hang at the same place. > > > > Attached is my montage script. It hangs in the mFitBatch function at > > the mConcatFit app call. All other files have been created up to > > that > > step but that app never runs. > > > > On 2/17/11 3:39 PM, Mihael Hategan wrote: > > > On Thu, 2011-02-17 at 15:13 -0600, Jonathan Monette wrote: > > >> Hello, > > >> My workflow seems to be hanging. This is trunk swift-r4107 > > >> and > > >> cog-r3051. Attached is a compressed log file and the jstack > > >> output for > > >> my workflow. The jstack file says it is waiting for a condition > > >> and my > > >> workflow hangs. > > > There's lots of stuff waiting because that's what they do when > > > they > > > don't have anything else to do. So I don't see a problem there. > > > > > > There are no jobs going to the coaster service, so clearly things > > > aren't > > > progressing. > > > > > > So now the question is: does this happen every time you run it or > > > just > > > some times? > > > > > > Also, please send the swift script. 
> > > > > > Mihael > > > > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From bugzilla-daemon at mcs.anl.gov Sat Mar 26 07:03:46 2011 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Sat, 26 Mar 2011 07:03:46 -0500 (CDT) Subject: [Swift-devel] [Bug 275] New: Document the Swift Hang Checker and improve its messages Message-ID: https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=275 Summary: Document the Swift Hang Checker and improve its messages Product: Swift Version: 0.93 Platform: All OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Documentation AssignedTo: ketan at mcs.anl.gov ReportedBy: wilde at mcs.anl.gov was: [Swift-devel] Re: Workflow waiting on condition hang I missed this when it was announced Mar 6 (email below). Sounds very useful. We should add a User Guide entry for this, with a few Swift deadlock examples and show users how to use the information to identify and correct the deadlock. How close to the Swift source code can we make the hang-checker messages, so that the user can relate it to Swift functions, expressions, and ideally source code lines? Ketan, please add this to the list of "cookbook" entries to merge into the User Guide, and I will file it in bugzilla. 
- Mike The current Hang Checker output is actually *very* nice and useful already: Registered futures: Rupture[] rups Closed, 1 elements, 0 listeners Variation vars - Closed, no listeners SgtDim sub - Open, 1 listeners string site Closed, no listeners Variation[] vars Closed, 72 elements, 0 listeners Is it possible (and sensible) to add to this a dump or summary of the current Swift threads and the function call or expression they are running? Eg, from the output above, would one conclude that there is only one function hanging at the moment in this code: SgtDim sub - Open, 1 listeners Would knowing what expression (and line of code) is waiting on the variable "sub" be helpful? And possible to print? - Mike ----- Forwarded Message ----- From: "Mihael Hategan" To: "Jonathan Monette" Cc: "Swift Devel" Sent: Sunday, March 6, 2011 3:46:44 PM Subject: [Swift-devel] Re: Workflow waiting on condition hang Given that this does not seem to be a java deadlock, I added a hang checker to swift. If nothing is being executed inside karajan and no jobs are running in any ten second interval, it will dump future and thread information to the log file. This is in swift trunk r4171. Can you give that a try and report back the details? Mihael On Sat, 2011-02-19 at 14:54 -0600, Jonathan Monette wrote: > Yes. It always seems to hang at the same place. > > Attached is my montage script. It hangs in the mFitBatch function at > the mConcatFit app call. All other files have been created up to that > step but that app never runs. > > On 2/17/11 3:39 PM, Mihael Hategan wrote: > > On Thu, 2011-02-17 at 15:13 -0600, Jonathan Monette wrote: > >> Hello, > >> My workflow seems to be hanging. This is trunk swift-r4107 and > >> cog-r3051. Attached is a compressed log file and the jstack output for > >> my workflow. The jstack file says it is waiting for a condition and my > >> workflow hangs. > > There's lots of stuff waiting because that's what they do when they > > don't have anything else to do. 
So I don't see a problem there.
> >
> > There are no jobs going to the coaster service, so clearly things aren't
> > progressing.
> >
> > So now the question is: does this happen every time you run it or just
> > some times?
> >
> > Also, please send the swift script.
> >
> > Mihael
> >
> >

============

Here is an example of its current output:

----- Forwarded Message -----
From: "Allan Espinosa"
To: "swift-devel"
Sent: Friday, March 25, 2011 7:42:30 PM
Subject: [Swift-devel] hang checker fun

This has occurred 70 times already. What I expect is for
the app with SgtDim sub to run and close the future.

2011-03-25 19:40:12,217-0500 WARN HangChecker No events in 10s.
2011-03-25 19:40:12,217-0500 WARN HangChecker
Registered futures:
Rupture[] rups Closed, 1 elements, 0 listeners
Variation vars - Closed, no listeners
SgtDim sub - Open, 1 listeners
string site Closed, no listeners
Variation[] vars Closed, 72 elements, 0 listeners
----

Waiting threads:
0-13
0-13-0-7
0-13-0-8-1-1
----

--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

----------------
and more:

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the reporter.
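Bug 275 asks how users can turn the hang checker dump into a diagnosis. As a rough illustration only, the "Registered futures" section quoted above could be scanned for futures that are still open. This sketch assumes one future per log line in the shape "<type> <name> [-] <Open|Closed>, <detail>", as in Allan's excerpt; the function name open_futures is hypothetical and is not part of Swift.

```python
import re

# Assumed line shape, copied from the excerpt in the bug report:
#   "SgtDim sub - Open, 1 listeners"
#   "Rupture[] rups Closed, 1 elements, 0 listeners"
# The optional " - " separator and the field order are assumptions.
FUTURE_LINE = re.compile(
    r"^\s*(?P<type>\S+)\s+(?P<name>\S+)\s+(?:-\s+)?"
    r"(?P<state>Open|Closed),\s*(?P<detail>.*)$"
)


def open_futures(log_text):
    """Return (type, name, detail) for every future reported as Open."""
    found = []
    for line in log_text.splitlines():
        match = FUTURE_LINE.match(line)
        if match and match.group("state") == "Open":
            found.append(
                (match.group("type"), match.group("name"),
                 match.group("detail").strip())
            )
    return found


# Sample input: the dump from the forwarded message, one entry per line.
SAMPLE = """\
2011-03-25 19:40:12,217-0500 WARN HangChecker No events in 10s.
2011-03-25 19:40:12,217-0500 WARN HangChecker
Registered futures:
Rupture[] rups Closed, 1 elements, 0 listeners
Variation vars - Closed, no listeners
SgtDim sub - Open, 1 listeners
string site Closed, no listeners
Variation[] vars Closed, 72 elements, 0 listeners
----
"""

if __name__ == "__main__":
    for ftype, name, detail in open_futures(SAMPLE):
        print(f"{ftype} {name}: still open ({detail})")
```

On the sample dump this singles out "SgtDim sub", the one future that is open and has a listener waiting on it, which is the variable Mike's question about source lines refers to. It does not map the future back to a script line; that information is not in the dump.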
From ketancmaheshwari at gmail.com Sat Mar 26 10:07:50 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Sat, 26 Mar 2011 10:07:50 -0500
Subject: [Swift-devel] Swift hang checker
In-Reply-To: <2101943594.30155.1301139887859.JavaMail.root@zimbra.anl.gov>
References: <1299448004.16332.2.camel@blabla2.none>
 <2101943594.30155.1301139887859.JavaMail.root@zimbra.anl.gov>
Message-ID:

On Sat, Mar 26, 2011 at 6:44 AM, Michael Wilde wrote:

> was: [Swift-devel] Re: Workflow waiting on condition hang
>
> I missed this when it was announced Mar 6 (email below). Sounds very
> useful.
>
> We should add a User Guide entry for this, with a few Swift deadlock
> examples and show users how to use the information to identify and correct
> the deadlock.
>
> How close to the Swift source code can we make the hang-checker messages,
> so that the user can relate it to Swift functions, expressions, and ideally
> source code lines?
>
> Ketan, please add this to the list of "cookbook" entries to merge into the
> User Guide, and I will file it in bugzilla.
>

I have added this to my list. I will take care of this.

>
> - Mike
>
>
>

From wozniak at mcs.anl.gov Sat Mar 26 10:47:04 2011
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Sat, 26 Mar 2011 10:47:04 -0500 (Central Daylight Time)
Subject: [Swift-devel] cog 4.1.8 + release-0.92 branch build failure
In-Reply-To:
References:
Message-ID:

Sorry about that; there was a minor merge error. Please update cog and try again.
Thanks for the report

Justin

On Fri, 25 Mar 2011, Allan Espinosa wrote:

> delete.jar:
> [echo] [karajan]: DELETE.JAR (cog-karajan-0.36-dev.jar)
>
> compile:
> [echo] [karajan]: COMPILE
> [mkdir] Created dir: /autonfs/home/aespinosa/swift/cogkit/modules/karajan/build
> [javac] Compiling 493 source files to /autonfs/home/aespinosa/swift/cogkit/modules/karajan/build
> [javac] /autonfs/home/aespinosa/swift/cogkit/modules/karajan/src/org/globus/cog/karajan/workflow/nodes/grid/AbstractGridNode.java:189:
> setStack(org.globus.cog.abstraction.interfaces.Task,org.globus.cog.karajan.stack.VariableStack)
> is already defined in org.globus.cog.karajan.workflow.nodes.grid.AbstractGridNode
> [javac] protected final void setStack(Task task, VariableStack stack) {
> [javac] ^
> [javac] Note: Some input files use or override a deprecated API.
> [javac] Note: Recompile with -Xlint:deprecation for details.
> [javac] Note: Some input files use unchecked or unsafe operations.
> [javac] Note: Recompile with -Xlint:unchecked for details.
> [javac] 1 error

--
Justin M Wozniak

From bugzilla-daemon at mcs.anl.gov Sun Mar 27 12:52:47 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Sun, 27 Mar 2011 12:52:47 -0500 (CDT)
Subject: [Swift-devel] [Bug 277] New: Swift gives misleading error message when sites file is missing <config> tag
Message-ID:

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=277

Summary: Swift gives misleading error message when sites file is missing <config> tag
Product: Swift
Version: 0.93
Platform: All
OS/Version: All
Status: NEW
Severity: blocker
Priority: P2
Component: General
AssignedTo: skenny at uchicago.edu
ReportedBy: wilde at mcs.anl.gov

From email to a user:

User was getting error msg:

2011-03-27 11:06:56,449-0500 DEBUG VDL2ExecutionContext task:resources @ vdl-sc.k, line: 111: Unexpected argument localhost(coaster)

The immediate problem was that you were missing the outer XML tag brackets of:

<config>
...
</config>

...around your <pool> tag.
A sites file needs to start and end with the config tag. All the pools should be inside this tag.

The error message given by Swift for this was terribly misleading. I will file in bugzilla.

From hategan at mcs.anl.gov Sun Mar 27 15:40:05 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 27 Mar 2011 13:40:05 -0700
Subject: [Swift-devel] "Timer was cancelled" error on Beagle on Swift script termination
In-Reply-To: <1301131560.2627.0.camel@blabla2.none>
References: <458231440.5018.1300731836884.JavaMail.root@zimbra.anl.gov>
 <1300740277.15551.1.camel@blabla2.none>
 <1300750394.16266.2.camel@blabla2.none>
 <1301131560.2627.0.camel@blabla2.none>
Message-ID: <1301258405.1584.0.camel@blabla2.none>

Ok. I need logs. Logs please. Logs!!!

On Sat, 2011-03-26 at 02:26 -0700, Mihael Hategan wrote:
> I will look at it on Sunday or Monday.
>
> Mihael
>
> On Sat, 2011-03-26 at 00:28 -0500, Ketan Maheshwari wrote:
> > Hi,
> >
> > The timer error: "Timer was cancelled" persists on Beagle. Has
> > anybody been able to resolve it so far?
> >
> >
> > The stack I got today is quite similar to the previous ones:
> >
> >
> > =====
> > Got exception in send
> > java.lang.IllegalStateException: Timer was cancelled
> >     at java.util.Timer.scheduleImpl(Timer.java:564)
> >     at java.util.Timer.schedule(Timer.java:449)
> >     at org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:155)
> >     at org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:149)
> >     at org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel.actualSend(AbstractPipedChannel.java:89)
> >     at org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel$Sender.run(AbstractPipedChannel.java:115)
> > Got exception in send
> > java.lang.IllegalStateException: Timer was cancelled
> >     at java.util.Timer.scheduleImpl(Timer.java:564)
> >     at java.util.Timer.schedule(Timer.java:449)
> >     at org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:155)
> >     at org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:149)
> >     at org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel.actualSend(AbstractPipedChannel.java:89)
> >     at org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel$Sender.run(AbstractPipedChannel.java:115)
> > java.lang.IllegalStateException: Timer was cancelled
> >     at java.util.Timer.scheduleImpl(Timer.java:564)
> >     at java.util.Timer.schedule(Timer.java:449)
> >     at org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:155)
> >     at org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:149)
> >     at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Sender.run(AbstractStreamKarajanChannel.java:253)
> > Canceling job 21995.sdb
> >
> > =====
> >
> >
> >
> > Ketan
> >
> >
> > On Mon, Mar 21, 2011 at 6:33 PM, Mihael Hategan
> >
wrote: > > So it is related to PBS. Please send me logs with this > > problem. The > > timer thing shouldn't be happening. > > > > > > On Mon, 2011-03-21 at 17:30 -0400, Glen Hocky wrote: > > > I got rid of the problem by setting my project as per Ti's > > > instructions > > > > > > > > > projects --available > > > projects --set PROJECT > > > > > > On Mon, Mar 21, 2011 at 4:44 PM, Mihael Hategan > > > > > wrote: > > > On Mon, 2011-03-21 at 13:23 -0500, Michael Wilde > > wrote: > > > > > > > > > > > > > > Mihael, all, do you have any thoughts on what that > > might be? > > > > > > > > > Yes, but what I think it is wouldn't (in any obvious > > way I can > > > see) be > > > influenced by changes in the environment. > > > > > > If you look at the logs, is there anything else > > suspicious > > > going on? > > > > > > Mihael > > > > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From hategan at mcs.anl.gov Sun Mar 27 15:41:53 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 27 Mar 2011 13:41:53 -0700 Subject: [Swift-devel] hang checker fun In-Reply-To: References: Message-ID: <1301258513.1584.1.camel@blabla2.none> May I see the script? On Fri, 2011-03-25 at 19:42 -0500, Allan Espinosa wrote: > this has been occurring for 70 times already. 
What i expect is for > the app with SgtDim sub to run and close the future. > > 2011-03-25 19:40:12,217-0500 WARN HangChecker No events in 10s. > 2011-03-25 19:40:12,217-0500 WARN HangChecker > Registered futures: > Rupture[] rups Closed, 1 elements, 0 listeners > Variation vars - Closed, no listeners > SgtDim sub - Open, 1 listeners > string site Closed, no listeners > Variation[] vars Closed, 72 elements, 0 listeners > ---- > > Waiting threads: > 0-13 > 0-13-0-7 > 0-13-0-8-1-1 > ---- > > From hategan at mcs.anl.gov Sun Mar 27 15:45:48 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 27 Mar 2011 13:45:48 -0700 Subject: [Swift-devel] hanging problem In-Reply-To: References: Message-ID: <1301258748.1584.3.camel@blabla2.none> On Wed, 2011-03-23 at 16:31 -0500, Jonathan Monette wrote: > How can the array be closed but all of its values not be? The array being closed simply means that its size is known, but not necessarily that its elements have all been computed. I'll look at the log, but I'd also like the entire script. Mihael From wilde at mcs.anl.gov Sun Mar 27 15:49:42 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sun, 27 Mar 2011 15:49:42 -0500 (CDT) Subject: [Swift-devel] "Timer was cancelled" error on Beagle on Swift script termination In-Reply-To: <1301258405.1584.0.camel@blabla2.none> Message-ID: <809261295.31771.1301258982970.JavaMail.root@zimbra.anl.gov> On CI net, in /home/wilde/swift/lab, you can find "logs. Logs please. Logs!!!" 
:) login1$ grep -i 'timer.*cancel' catsn*.log catsn-20110320-2218-tjaeo8md.log:java.lang.IllegalStateException: Timer was cancelled catsn-20110320-2218-tjaeo8md.log:java.lang.IllegalStateException: Timer was cancelled catsn-20110320-2218-tjaeo8md.log:java.lang.IllegalStateException: Timer was cancelled catsn-20110320-2231-o3tft91h.log:java.lang.IllegalStateException: Timer was cancelled catsn-20110320-2231-o3tft91h.log:java.lang.IllegalStateException: Timer was cancelled catsn-20110320-2257-tpoi32je.log:java.lang.IllegalStateException: Timer was cancelled catsn-20110320-2257-tpoi32je.log:java.lang.IllegalStateException: Timer was cancelled catsn-20110320-2258-02gmt2c7.log:java.lang.IllegalStateException: Timer was cancelled catsn-20110320-2258-02gmt2c7.log:java.lang.IllegalStateException: Timer was cancelled catsn-20110320-2258-02gmt2c7.log:java.lang.IllegalStateException: Timer was cancelled catsn-20110320-2258-02gmt2c7.log:java.lang.IllegalStateException: Timer was cancelled catsn-20110320-2258-2ued34uf.log:java.lang.IllegalStateException: Timer was cancelled catsn-20110320-2258-2ued34uf.log:java.lang.IllegalStateException: Timer was cancelled catsn-20110320-2258-2ued34uf.log:java.lang.IllegalStateException: Timer was cancelled catsn-20110320-2258-2ued34uf.log:java.lang.IllegalStateException: Timer was cancelled catsn-20110321-1600-3j1159na.log:java.lang.IllegalStateException: Timer was cancelled catsn-20110321-1600-3j1159na.log:java.lang.IllegalStateException: Timer was cancelled catsn-20110324-1026-x6yxif12.log:java.lang.IllegalStateException: Timer was cancelled catsn-20110324-1026-x6yxif12.log:java.lang.IllegalStateException: Timer was cancelled catsn-20110324-1026-x6yxif12.log:java.lang.IllegalStateException: Timer was cancelled catsn-20110324-1026-x6yxif12.log:java.lang.IllegalStateException: Timer was cancelled login1$ ----- Original Message ----- > Ok. I need logs. Logs please. Logs!!! 
> > On Sat, 2011-03-26 at 02:26 -0700, Mihael Hategan wrote: > > I will look at it on Sunday or Monday. > > > > Mihael > > > > On Sat, 2011-03-26 at 00:28 -0500, Ketan Maheshwari wrote: > > > Hi, > > > > > > The timer error: "Timer was cancelled" persists on beagle. Does > > > anybody been able to resolve it so far? > > > > > > > > > The stack I got today is quite similar to the previous ones: > > > > > > > > > ===== > > > Got exception in send > > > java.lang.IllegalStateException: Timer was cancelled > > > at java.util.Timer.scheduleImpl(Timer.java:564) > > > at java.util.Timer.schedule(Timer.java:449) > > > at > > > org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:155) > > > at > > > org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:149) > > > at > > > org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel.actualSend(AbstractPipedChannel.java:89) > > > at > > > org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel > > > $Sender.run(AbstractPipedChannel.java:115) > > > Got exception in send > > > java.lang.IllegalStateException: Timer was cancelled > > > at java.util.Timer.scheduleImpl(Timer.java:564) > > > at java.util.Timer.schedule(Timer.java:449) > > > at > > > org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:155) > > > at > > > org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:149) > > > at > > > org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel.actualSend(AbstractPipedChannel.java:89) > > > at > > > org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel > > > $Sender.run(AbstractPipedChannel.java:115) > > > java.lang.IllegalStateException: Timer was cancelled > > > at java.util.Timer.scheduleImpl(Timer.java:564) > > > at java.util.Timer.schedule(Timer.java:449) > > > at > > > 
org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:155) > > > at > > > org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:149) > > > at > > > org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Sender.run(AbstractStreamKarajanChannel.java:253) > > > Canceling job 21995.sdb > > > > > > ===== > > > > > > > > > > > > Ketan > > > > > > > > > On Mon, Mar 21, 2011 at 6:33 PM, Mihael Hategan > > > > > > wrote: > > > So it is related to PBS. Please send me logs with this > > > problem. The > > > timer thing shouldn't be happening. > > > > > > > > > On Mon, 2011-03-21 at 17:30 -0400, Glen Hocky wrote: > > > > I got rid of the problem by setting my project as per > > > > Ti's > > > > instructions > > > > > > > > > > > > projects --available > > > > projects --set PROJECT > > > > > > > > On Mon, Mar 21, 2011 at 4:44 PM, Mihael Hategan > > > > > > > wrote: > > > > On Mon, 2011-03-21 at 13:23 -0500, Michael Wilde > > > wrote: > > > > > > > > > > > > > > > > > > Mihael, all, do you have any thoughts on what > > > > > that > > > might be? > > > > > > > > > > > > Yes, but what I think it is wouldn't (in any > > > > obvious > > > way I can > > > > see) be > > > > influenced by changes in the environment. > > > > > > > > If you look at the logs, is there anything else > > > suspicious > > > > going on? 
> > > > > > > > Mihael > > > > > > > > > > > > > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From jon.monette at gmail.com Sun Mar 27 15:54:35 2011 From: jon.monette at gmail.com (Jonathan Monette) Date: Sun, 27 Mar 2011 15:54:35 -0500 Subject: [Swift-devel] hanging problem In-Reply-To: <1301258748.1584.3.camel@blabla2.none> References: <1301258748.1584.3.camel@blabla2.none> Message-ID: here is my entire script On Sun, Mar 27, 2011 at 3:45 PM, Mihael Hategan wrote: > On Wed, 2011-03-23 at 16:31 -0500, Jonathan Monette wrote: > > > How can the array be closed but all of its values not be? > > The array being closed simply means that its size is known, but not > necessarily that its elements have all been computed. > > I'll look at the log, but I'd also like the entire script. > > Mihael > > > -- Any intelligent fool can make things bigger and more complex... 
It takes a touch of genius - and a lot of courage to move in the opposite direction.
- Albert Einstein

-------------- next part --------------
A non-text attachment was scrubbed...
Name: montage.swift
Type: application/octet-stream
Size: 5766 bytes
Desc: not available
URL:

From wilde at mcs.anl.gov Sun Mar 27 15:55:41 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sun, 27 Mar 2011 15:55:41 -0500 (CDT)
Subject: [Swift-devel] Fwd: Proposal for coaster service options
In-Reply-To: <707163068.205329.1300460327403.JavaMail.root@zimbra.anl.gov>
Message-ID: <1770867299.31773.1301259341372.JavaMail.root@zimbra.anl.gov>

Mihael, I want to also bring this development task back to your attention. It may need some further discussion on swift-devel.

Can you send an update about any issues? We should make sure that the changes you make fit into and support the evolving scripts for coaster execution, so that your effort on this is directly useful to the end user tools. Justin and Ketan have both looked at those tools lately and may want to comment.

- Mike

----- Forwarded Message -----
From: "Michael Wilde"
To: "Mihael Hategan"
Cc: "Swift Devel"
Sent: Friday, March 18, 2011 9:58:47 AM
Subject: Re: Proposal for coaster service options

----- Original Message -----
> On Thu, 2011-03-17 at 23:43 -0500, Michael Wilde wrote:
> > > > One other item that came up in yesterday's meeting was the set of
> > > > command line features to add to coaster-service (and to swift itself
> > > > which we didn't mention) to put the integrated coaster service into
> > > > passive mode and to make it save port numbers in a file for
> > > > integration with scripts.
> > > >
> > > > That might be a good task to do soon if it's easy/feasible.
> > >
> > > Yes. Seems like a quick and useful thing to have. Though doesn't the
> > > sites.xml scheme work in this case?
> > > > By "this case" do you mean the case where the coaster service is > > running in the Swift JVM? I.e. from jobmanager=local:something in > > the > > coaster pool entry? > > I think I'm misunderstanding the issue. > > Are you referring to having the standalone service configured for > passive mode? Yes. The original mail I sent, proposing new command line options, was referring entirely to the coaster-service command. In a later email, I realized that some of those issues might apply to the coaster service when running within the swift command's jvm as well. - it seems that some or all of port management options (for setting and reporting port numbers) may apply to swift as well - its likely that the option to set passive *does not* apply, as it already works. I think I was confused on the various combinations when I brought that up. Since currently we get the standalone service to enter passive mode by running a swift script that has passive mode set in the sites entry for that service, I realized on reflection that setting the passive option when the coaster service is running with the swift command JVM *must* be working correctly. It would be good to verify and create tests for this, but that is my current assumption. Related to all this: I think that to do this job fully, we need to complete the set of wrapper commands that make manually run coasters an end-user-ready feature. And then to create scripts in the test framework to verify that they work. That will take more discussion, specification work, and devel time. But I feel we need to now get this feature completed and working, as there is user need for it. Mihael, if you can get the changes into coaster-service and the swift command, I think others can get the wrapper scripts done and tested. There is I think a prototype for this command support somewhere (Justin, you reminded me of these a few days ago: can you point out where they are?) 
- Mike -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wilde at mcs.anl.gov Sun Mar 27 15:56:35 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sun, 27 Mar 2011 15:56:35 -0500 (CDT) Subject: [Swift-devel] Fwd: Proposal for coaster service options In-Reply-To: <970913622.205347.1300460386760.JavaMail.root@zimbra.anl.gov> Message-ID: <1896142109.31775.1301259395486.JavaMail.root@zimbra.anl.gov> Related to my just-prior post: ----- Forwarded Message ----- From: "Michael Wilde" To: "Mihael Hategan" Cc: "Swift Devel" Sent: Friday, March 18, 2011 9:59:46 AM Subject: Re: Proposal for coaster service options ----- Forwarded Message ----- From: "Justin M Wozniak" To: "Michael Wilde" Sent: Friday, March 18, 2011 9:51:13 AM Subject: Re: WHere are manual coaster startup scripts? They're in: https://svn.ci.uchicago.edu/svn/vdl2/usertools/persistent-coasters ----- Original Message ----- > ----- Original Message ----- > > On Thu, 2011-03-17 at 23:43 -0500, Michael Wilde wrote: > > > > > One other item that came up in yesterday's meeting was the set > > > > > of > > > > > command line features to add to coaster-service (and to swift > > > > > itself > > > > > which we didnt mention) to put the integrated coaster service > > > > > into > > > > > passive mode and to make it save port numbers in a file for > > > > > integration with scripts. > > > > > > > > > > That might be a good task to do soon if its easy/feasible. > > > > > > > > Yes. Seems like a quick and useful thing to have. Though doesn't > > > > the > > > > sites.xml scheme work in this case? > > > > > > By "this case" do you mean the case where the coaster service is > > > running in the Swift JVM? I.e. from jobmanager=local:something in > > > the > > > coaster pool entry? > > > > I think I'm misunderstanding the issue. > > > > Are you referring to having the standalone service configured for > > passive mode? > > Yes. 
The original mail I sent, proposing new command line options, was > referring entirely to the coaster-service command. > > In a later email, I realized that some of those issues might apply to > the coaster service when running within the swift command's jvm as > well. > > - it seems that some or all of port management options (for setting > and reporting port numbers) may apply to swift as well > > - its likely that the option to set passive *does not* apply, as it > already works. I think I was confused on the various combinations when > I brought that up. Since currently we get the standalone service to > enter passive mode by running a swift script that has passive mode set > in the sites entry for that service, I realized on reflection that > setting the passive option when the coaster service is running with > the swift command JVM *must* be working correctly. It would be good to > verify and create tests for this, but that is my current assumption. > > Related to all this: I think that to do this job fully, we need to > complete the set of wrapper commands that make manually run coasters > an end-user-ready feature. And then to create scripts in the test > framework to verify that they work. That will take more discussion, > specification work, and devel time. But I feel we need to now get this > feature completed and working, as there is user need for it. > > Mihael, if you can get the changes into coaster-service and the swift > command, I think others can get the wrapper scripts done and tested. > > There is I think a prototype for this command support somewhere > (Justin, you reminded me of these a few days ago: can you point out > where they are?) 
> > - Mike -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hategan at mcs.anl.gov Sun Mar 27 15:58:12 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 27 Mar 2011 13:58:12 -0700 Subject: [Swift-devel] "Timer was cancelled" error on Beagle on Swift script termination In-Reply-To: <809261295.31771.1301258982970.JavaMail.root@zimbra.anl.gov> References: <809261295.31771.1301258982970.JavaMail.root@zimbra.anl.gov> Message-ID: <1301259492.17525.2.camel@blabla2.none> Thank you good sir! It seems that the problem occurs when the system is shutting down. During JVM finalization it turns out that timers cannot be used. When they are, the "Timer was cancelled" error appears. I'll see how this can be fixed. On Sun, 2011-03-27 at 15:49 -0500, Michael Wilde wrote: > On CI net, in /home/wilde/swift/lab, you can find "logs. Logs please. Logs!!!" 
:) > > login1$ grep -i 'timer.*cancel' catsn*.log > catsn-20110320-2218-tjaeo8md.log:java.lang.IllegalStateException: Timer was cancelled > catsn-20110320-2218-tjaeo8md.log:java.lang.IllegalStateException: Timer was cancelled > catsn-20110320-2218-tjaeo8md.log:java.lang.IllegalStateException: Timer was cancelled > catsn-20110320-2231-o3tft91h.log:java.lang.IllegalStateException: Timer was cancelled > catsn-20110320-2231-o3tft91h.log:java.lang.IllegalStateException: Timer was cancelled > catsn-20110320-2257-tpoi32je.log:java.lang.IllegalStateException: Timer was cancelled > catsn-20110320-2257-tpoi32je.log:java.lang.IllegalStateException: Timer was cancelled > catsn-20110320-2258-02gmt2c7.log:java.lang.IllegalStateException: Timer was cancelled > catsn-20110320-2258-02gmt2c7.log:java.lang.IllegalStateException: Timer was cancelled > catsn-20110320-2258-02gmt2c7.log:java.lang.IllegalStateException: Timer was cancelled > catsn-20110320-2258-02gmt2c7.log:java.lang.IllegalStateException: Timer was cancelled > catsn-20110320-2258-2ued34uf.log:java.lang.IllegalStateException: Timer was cancelled > catsn-20110320-2258-2ued34uf.log:java.lang.IllegalStateException: Timer was cancelled > catsn-20110320-2258-2ued34uf.log:java.lang.IllegalStateException: Timer was cancelled > catsn-20110320-2258-2ued34uf.log:java.lang.IllegalStateException: Timer was cancelled > catsn-20110321-1600-3j1159na.log:java.lang.IllegalStateException: Timer was cancelled > catsn-20110321-1600-3j1159na.log:java.lang.IllegalStateException: Timer was cancelled > catsn-20110324-1026-x6yxif12.log:java.lang.IllegalStateException: Timer was cancelled > catsn-20110324-1026-x6yxif12.log:java.lang.IllegalStateException: Timer was cancelled > catsn-20110324-1026-x6yxif12.log:java.lang.IllegalStateException: Timer was cancelled > catsn-20110324-1026-x6yxif12.log:java.lang.IllegalStateException: Timer was cancelled > login1$ > > > ----- Original Message ----- > > Ok. I need logs. Logs please. Logs!!! 
> > > > On Sat, 2011-03-26 at 02:26 -0700, Mihael Hategan wrote: > > > I will look at it on Sunday or Monday. > > > > > > Mihael > > > > > > On Sat, 2011-03-26 at 00:28 -0500, Ketan Maheshwari wrote: > > > > Hi, > > > > > > > > The timer error: "Timer was cancelled" persists on beagle. Does > > > > anybody been able to resolve it so far? > > > > > > > > > > > > The stack I got today is quite similar to the previous ones: > > > > > > > > > > > > ===== > > > > Got exception in send > > > > java.lang.IllegalStateException: Timer was cancelled > > > > at java.util.Timer.scheduleImpl(Timer.java:564) > > > > at java.util.Timer.schedule(Timer.java:449) > > > > at > > > > org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:155) > > > > at > > > > org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:149) > > > > at > > > > org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel.actualSend(AbstractPipedChannel.java:89) > > > > at > > > > org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel > > > > $Sender.run(AbstractPipedChannel.java:115) > > > > Got exception in send > > > > java.lang.IllegalStateException: Timer was cancelled > > > > at java.util.Timer.scheduleImpl(Timer.java:564) > > > > at java.util.Timer.schedule(Timer.java:449) > > > > at > > > > org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:155) > > > > at > > > > org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:149) > > > > at > > > > org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel.actualSend(AbstractPipedChannel.java:89) > > > > at > > > > org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel > > > > $Sender.run(AbstractPipedChannel.java:115) > > > > java.lang.IllegalStateException: Timer was cancelled > > > > at java.util.Timer.scheduleImpl(Timer.java:564) > > > > at 
java.util.Timer.schedule(Timer.java:449) > > > > at > > > > org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:155) > > > > at > > > > org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:149) > > > > at > > > > org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Sender.run(AbstractStreamKarajanChannel.java:253) > > > > Canceling job 21995.sdb > > > > > > > > ===== > > > > > > > > > > > > > > > > Ketan > > > > > > > > > > > > On Mon, Mar 21, 2011 at 6:33 PM, Mihael Hategan > > > > > > > > wrote: > > > > So it is related to PBS. Please send me logs with this > > > > problem. The > > > > timer thing shouldn't be happening. > > > > > > > > > > > > On Mon, 2011-03-21 at 17:30 -0400, Glen Hocky wrote: > > > > > I got rid of the problem by setting my project as per > > > > > Ti's > > > > > instructions > > > > > > > > > > > > > > > projects --available > > > > > projects --set PROJECT > > > > > > > > > > On Mon, Mar 21, 2011 at 4:44 PM, Mihael Hategan > > > > > > > > > wrote: > > > > > On Mon, 2011-03-21 at 13:23 -0500, Michael Wilde > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > Mihael, all, do you have any thoughts on what > > > > > > that > > > > might be? > > > > > > > > > > > > > > > Yes, but what I think it is wouldn't (in any > > > > > obvious > > > > way I can > > > > > see) be > > > > > influenced by changes in the environment. > > > > > > > > > > If you look at the logs, is there anything else > > > > suspicious > > > > > going on? 
> > > > > > > > > > Mihael > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > Swift-devel mailing list > > > > > Swift-devel at ci.uchicago.edu > > > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From hategan at mcs.anl.gov Sun Mar 27 15:59:05 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 27 Mar 2011 13:59:05 -0700 Subject: [Swift-devel] Re: Fwd: Proposal for coaster service options In-Reply-To: <1770867299.31773.1301259341372.JavaMail.root@zimbra.anl.gov> References: <1770867299.31773.1301259341372.JavaMail.root@zimbra.anl.gov> Message-ID: <1301259545.17525.3.camel@blabla2.none> Yes. I'll do that once I deal with this new set of bugs/hangs. I think by the end of this week if not sooner. Mihael On Sun, 2011-03-27 at 15:55 -0500, Michael Wilde wrote: > Mihael, I want to also bring this development task back to your attention. It may need some further discussion on swift-devel. > > Can you send an update about any issues? 
> > We should make sure that the changes you make fit into and support the evolving scripts for coaster execution, so that your effort on this is directly useful to the end user tools. > > Justin and Ketan have both looked at those tools lately and may want to comment. > > - Mike > > ----- Forwarded Message ----- > From: "Michael Wilde" > To: "Mihael Hategan" > Cc: "Swift Devel" > Sent: Friday, March 18, 2011 9:58:47 AM > Subject: Re: Proposal for coaster service options > > ----- Original Message ----- > > On Thu, 2011-03-17 at 23:43 -0500, Michael Wilde wrote: > > > > > One other item that came up in yesterday's meeting was the set > > > > > of > > > > > command line features to add to coaster-service (and to swift > > > > > itself > > > > > which we didnt mention) to put the integrated coaster service > > > > > into > > > > > passive mode and to make it save port numbers in a file for > > > > > integration with scripts. > > > > > > > > > > That might be a good task to do soon if its easy/feasible. > > > > > > > > Yes. Seems like a quick and useful thing to have. Though doesn't > > > > the > > > > sites.xml scheme work in this case? > > > > > > By "this case" do you mean the case where the coaster service is > > > running in the Swift JVM? I.e. from jobmanager=local:something in > > > the > > > coaster pool entry? > > > > I think I'm misunderstanding the issue. > > > > Are you referring to having the standalone service configured for > > passive mode? > > Yes. The original mail I sent, proposing new command line options, was referring entirely to the coaster-service command. > > In a later email, I realized that some of those issues might apply to the coaster service when running within the swift command's jvm as well. > > - it seems that some or all of port management options (for setting and reporting port numbers) may apply to swift as well > > - its likely that the option to set passive *does not* apply, as it already works. 
I think I was confused on the various combinations when I brought that up. Since currently we get the standalone service to enter passive mode by running a swift script that has passive mode set in the sites entry for that service, I realized on reflection that setting the passive option when the coaster service is running with the swift command JVM *must* be working correctly. It would be good to verify and create tests for this, but that is my current assumption. > > Related to all this: I think that to do this job fully, we need to complete the set of wrapper commands that make manually run coasters an end-user-ready feature. And then to create scripts in the test framework to verify that they work. That will take more discussion, specification work, and devel time. But I feel we need to now get this feature completed and working, as there is user need for it. > > Mihael, if you can get the changes into coaster-service and the swift command, I think others can get the wrapper scripts done and tested. > > There is I think a prototype for this command support somewhere (Justin, you reminded me of these a few days ago: can you point out where they are?) > > - Mike > From hategan at mcs.anl.gov Sun Mar 27 16:23:58 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 27 Mar 2011 14:23:58 -0700 Subject: [Swift-devel] "Timer was cancelled" error on Beagle on Swift script termination In-Reply-To: <1301259492.17525.2.camel@blabla2.none> References: <809261295.31771.1301258982970.JavaMail.root@zimbra.anl.gov> <1301259492.17525.2.camel@blabla2.none> Message-ID: <1301261038.12361.0.camel@blabla2.none> Actually I'm not so sure any more. My java Timer does not seem to have a scheduleImpl method. What version of java is this? Mihael On Sun, 2011-03-27 at 13:58 -0700, Mihael Hategan wrote: > Thank you good sir! > > It seems that the problem occurs when the system is shutting down. > During JVM finalization it turns out that timers cannot be used. 
When > they are, the "Timer was cancelled" error appears. > > I'll see how this can be fixed. > > On Sun, 2011-03-27 at 15:49 -0500, Michael Wilde wrote: > > On CI net, in /home/wilde/swift/lab, you can find "logs. Logs please. Logs!!!" :) > > > > login1$ grep -i 'timer.*cancel' catsn*.log > > catsn-20110320-2218-tjaeo8md.log:java.lang.IllegalStateException: Timer was cancelled > > catsn-20110320-2218-tjaeo8md.log:java.lang.IllegalStateException: Timer was cancelled > > catsn-20110320-2218-tjaeo8md.log:java.lang.IllegalStateException: Timer was cancelled > > catsn-20110320-2231-o3tft91h.log:java.lang.IllegalStateException: Timer was cancelled > > catsn-20110320-2231-o3tft91h.log:java.lang.IllegalStateException: Timer was cancelled > > catsn-20110320-2257-tpoi32je.log:java.lang.IllegalStateException: Timer was cancelled > > catsn-20110320-2257-tpoi32je.log:java.lang.IllegalStateException: Timer was cancelled > > catsn-20110320-2258-02gmt2c7.log:java.lang.IllegalStateException: Timer was cancelled > > catsn-20110320-2258-02gmt2c7.log:java.lang.IllegalStateException: Timer was cancelled > > catsn-20110320-2258-02gmt2c7.log:java.lang.IllegalStateException: Timer was cancelled > > catsn-20110320-2258-02gmt2c7.log:java.lang.IllegalStateException: Timer was cancelled > > catsn-20110320-2258-2ued34uf.log:java.lang.IllegalStateException: Timer was cancelled > > catsn-20110320-2258-2ued34uf.log:java.lang.IllegalStateException: Timer was cancelled > > catsn-20110320-2258-2ued34uf.log:java.lang.IllegalStateException: Timer was cancelled > > catsn-20110320-2258-2ued34uf.log:java.lang.IllegalStateException: Timer was cancelled > > catsn-20110321-1600-3j1159na.log:java.lang.IllegalStateException: Timer was cancelled > > catsn-20110321-1600-3j1159na.log:java.lang.IllegalStateException: Timer was cancelled > > catsn-20110324-1026-x6yxif12.log:java.lang.IllegalStateException: Timer was cancelled > > catsn-20110324-1026-x6yxif12.log:java.lang.IllegalStateException: Timer was 
cancelled > > catsn-20110324-1026-x6yxif12.log:java.lang.IllegalStateException: Timer was cancelled > > catsn-20110324-1026-x6yxif12.log:java.lang.IllegalStateException: Timer was cancelled > > login1$ > > > > > > ----- Original Message ----- > > > Ok. I need logs. Logs please. Logs!!! > > > > > > On Sat, 2011-03-26 at 02:26 -0700, Mihael Hategan wrote: > > > > I will look at it on Sunday or Monday. > > > > > > > > Mihael > > > > > > > > On Sat, 2011-03-26 at 00:28 -0500, Ketan Maheshwari wrote: > > > > > Hi, > > > > > > > > > > The timer error: "Timer was cancelled" persists on beagle. Does > > > > > anybody been able to resolve it so far? > > > > > > > > > > > > > > > The stack I got today is quite similar to the previous ones: > > > > > > > > > > > > > > > ===== > > > > > Got exception in send > > > > > java.lang.IllegalStateException: Timer was cancelled > > > > > at java.util.Timer.scheduleImpl(Timer.java:564) > > > > > at java.util.Timer.schedule(Timer.java:449) > > > > > at > > > > > org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:155) > > > > > at > > > > > org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:149) > > > > > at > > > > > org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel.actualSend(AbstractPipedChannel.java:89) > > > > > at > > > > > org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel > > > > > $Sender.run(AbstractPipedChannel.java:115) > > > > > Got exception in send > > > > > java.lang.IllegalStateException: Timer was cancelled > > > > > at java.util.Timer.scheduleImpl(Timer.java:564) > > > > > at java.util.Timer.schedule(Timer.java:449) > > > > > at > > > > > org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:155) > > > > > at > > > > > org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:149) > > > > > at > > > > > 
org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel.actualSend(AbstractPipedChannel.java:89) > > > > > at > > > > > org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel > > > > > $Sender.run(AbstractPipedChannel.java:115) > > > > > java.lang.IllegalStateException: Timer was cancelled > > > > > at java.util.Timer.scheduleImpl(Timer.java:564) > > > > > at java.util.Timer.schedule(Timer.java:449) > > > > > at > > > > > org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:155) > > > > > at > > > > > org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:149) > > > > > at > > > > > org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Sender.run(AbstractStreamKarajanChannel.java:253) > > > > > Canceling job 21995.sdb > > > > > > > > > > ===== > > > > > > > > > > > > > > > > > > > > Ketan > > > > > > > > > > > > > > > On Mon, Mar 21, 2011 at 6:33 PM, Mihael Hategan > > > > > > > > > > wrote: > > > > > So it is related to PBS. Please send me logs with this > > > > > problem. The > > > > > timer thing shouldn't be happening. > > > > > > > > > > > > > > > On Mon, 2011-03-21 at 17:30 -0400, Glen Hocky wrote: > > > > > > I got rid of the problem by setting my project as per > > > > > > Ti's > > > > > > instructions > > > > > > > > > > > > > > > > > > projects --available > > > > > > projects --set PROJECT > > > > > > > > > > > > On Mon, Mar 21, 2011 at 4:44 PM, Mihael Hategan > > > > > > > > > > > wrote: > > > > > > On Mon, 2011-03-21 at 13:23 -0500, Michael Wilde > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > Mihael, all, do you have any thoughts on what > > > > > > > that > > > > > might be? > > > > > > > > > > > > > > > > > > Yes, but what I think it is wouldn't (in any > > > > > > obvious > > > > > way I can > > > > > > see) be > > > > > > influenced by changes in the environment. 
> > > > > > > > > > > > If you look at the logs, is there anything else > > > > > suspicious > > > > > > going on? > > > > > > > > > > > > Mihael > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > Swift-devel mailing list > > > > > > Swift-devel at ci.uchicago.edu > > > > > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > Swift-devel mailing list > > > > > Swift-devel at ci.uchicago.edu > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > _______________________________________________ > > > > > Swift-devel mailing list > > > > > Swift-devel at ci.uchicago.edu > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From aespinosa at cs.uchicago.edu Sun Mar 27 16:55:01 2011 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Sun, 27 Mar 2011 16:55:01 -0500 Subject: [Swift-devel] hang checker fun In-Reply-To: <1301258513.1584.1.camel@blabla2.none> References: <1301258513.1584.1.camel@blabla2.none> Message-ID: Here it is. 
The get_app() calls are simple wrappers to readData() type offset { int off; int size; } type offset_file; (offset _off[]) mkoffset(int _size, int _group_size) { offset_file file ; file = mkoffset_file(_size, _group_size); _off = readData(file); } app (offset_file _off) mkoffset_file(int _size, int _group_size) { mkoffset _size _group_size; } /* TODO: data management zip jobs */ /* Main program */ int run_id = 664; int agg_size = 80; int loc_size = 20; string datadir = "gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/Results"; Station site = get_site(run_id); Sgt sgt_var ; Rupture rups[] = get_ruptures(run_id, site); foreach rup in rups { string loc_sub = @strcat(datadir, "/", site.name, "/", rup.source, "/", rup.index); Sgt sub ; string var_str[] = get_variations( site, rup, "gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations" ); Variation vars[] ; sub = extract(sgt_var, site, vars[rup.size-1]); string seis_str[]; string peak_str[]; foreach var,i in vars { seis_str[i] = @strcat(loc_sub, "/Seismogram_", site.name, "_", rup.source, "_", rup.index, "_", i, ".grm"); peak_str[i] = @strcat(loc_sub, "/PeakVals_", site.name, "_", rup.source, "_", rup.index, "_", i, ".bsa"); } Seismogram seis[] ; PeakValue peak[] ; if(rup.size <= loc_size) { /* * Not worth to transfer the data. Execute on TeraGrid instead. * Also execute on localhost. */ foreach var,i in vars { (seis[i], peak[i]) = seispeak_local(sub, var, site); } } else {if(rup.size <= agg_size) { /* Execute on a single resource */ (seis, peak) = seispeak_agg(sub, vars, site, rup.size); } else { /*offset offs[] = mkoffset(rup.size, agg_size);*/ /*for i in offs {*/ /*(seis, peak) = seispeak_agg(sub, vars[i.off:i.off+off.size],*/ /*off.size);*/ /*}*/ }} } 2011/3/27 Mihael Hategan : > May I see the script? > > On Fri, 2011-03-25 at 19:42 -0500, Allan Espinosa wrote: >> this has been occurring for 70 times already. 
What i expect is for >> the app with SgtDim sub to run and close the future. >> >> 2011-03-25 19:40:12,217-0500 WARN HangChecker No events in 10s. >> 2011-03-25 19:40:12,217-0500 WARN HangChecker >> Registered futures: >> Rupture[] rups Closed, 1 elements, 0 listeners >> Variation vars - Closed, no listeners >> SgtDim sub - Open, 1 listeners >> string site Closed, no listeners >> Variation[] vars Closed, 72 elements, 0 listeners >> ---- >> >> Waiting threads: >> 0-13 >> 0-13-0-7 >> 0-13-0-8-1-1 >> ---- >> >> > > > > -- Allan M. Espinosa PhD student, Computer Science University of Chicago From hategan at mcs.anl.gov Sun Mar 27 17:35:43 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 27 Mar 2011 15:35:43 -0700 Subject: [Swift-devel] hang checker fun In-Reply-To: References: <1301258513.1584.1.camel@blabla2.none> Message-ID: <1301265343.32276.0.camel@blabla2.none> I don't believe you. There is no SgtDim data type in that script. Mihael On Sun, 2011-03-27 at 16:55 -0500, Allan Espinosa wrote: > Here it is. 
The get_app() calls are simple wrappers to readData() > > type offset { > int off; > int size; > } > type offset_file; > (offset _off[]) mkoffset(int _size, int _group_size) { > offset_file file ; > file = mkoffset_file(_size, _group_size); > _off = readData(file); > } > app (offset_file _off) mkoffset_file(int _size, int _group_size) { > mkoffset _size _group_size; > } > > /* TODO: data management zip jobs */ > > /* Main program */ > int run_id = 664; > int agg_size = 80; > int loc_size = 20; > string datadir = > "gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/Results"; > > Station site = get_site(run_id); > > Sgt sgt_var l="gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/SgtFiles">; > Rupture rups[] = get_ruptures(run_id, site); > > foreach rup in rups { > string loc_sub = @strcat(datadir, "/", site.name, "/", rup.source, > "/", rup.index); > Sgt sub r=rup.index>; > string var_str[] = get_variations( site, rup, > "gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations" > ); > Variation vars[] ; > > sub = extract(sgt_var, site, vars[rup.size-1]); > > string seis_str[]; > string peak_str[]; > > foreach var,i in vars { > seis_str[i] = @strcat(loc_sub, "/Seismogram_", site.name, "_", rup.source, > "_", rup.index, "_", i, ".grm"); > peak_str[i] = @strcat(loc_sub, "/PeakVals_", site.name, "_", rup.source, > "_", rup.index, "_", i, ".bsa"); > } > > Seismogram seis[] ; > PeakValue peak[] ; > > if(rup.size <= loc_size) { > /* > * Not worth to transfer the data. Execute on TeraGrid instead. > * Also execute on localhost. 
> */ > foreach var,i in vars { > (seis[i], peak[i]) = seispeak_local(sub, var, site); > } > } else {if(rup.size <= agg_size) { > /* Execute on a single resource */ > (seis, peak) = seispeak_agg(sub, vars, site, rup.size); > } else { > /*offset offs[] = mkoffset(rup.size, agg_size);*/ > /*for i in offs {*/ > /*(seis, peak) = seispeak_agg(sub, vars[i.off:i.off+off.size],*/ > /*off.size);*/ > /*}*/ > }} > } > > > 2011/3/27 Mihael Hategan : > > May I see the script? > > > > On Fri, 2011-03-25 at 19:42 -0500, Allan Espinosa wrote: > >> this has been occurring for 70 times already. What i expect is for > >> the app with SgtDim sub to run and close the future. > >> > >> 2011-03-25 19:40:12,217-0500 WARN HangChecker No events in 10s. > >> 2011-03-25 19:40:12,217-0500 WARN HangChecker > >> Registered futures: > >> Rupture[] rups Closed, 1 elements, 0 listeners > >> Variation vars - Closed, no listeners > >> SgtDim sub - Open, 1 listeners > >> string site Closed, no listeners > >> Variation[] vars Closed, 72 elements, 0 listeners > >> ---- > >> > >> Waiting threads: > >> 0-13 > >> 0-13-0-7 > >> 0-13-0-8-1-1 > >> ---- > >> > >> > > > > > > > > > > > From aespinosa at cs.uchicago.edu Sun Mar 27 17:56:23 2011 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Sun, 27 Mar 2011 17:56:23 -0500 Subject: [Swift-devel] hang checker fun In-Reply-To: <1301265343.32276.0.camel@blabla2.none> References: <1301258513.1584.1.camel@blabla2.none> <1301265343.32276.0.camel@blabla2.none> Message-ID: oops. trimmed the first part. 
Thanks type SgtDim; type Variation; type Seismogram; type PeakValue; type Station { string name; float lat; float lon; int erf; int variation_scenario; } type Sgt { SgtDim x; SgtDim y; } type Rupture { int source; int index; int size; } /* some constants used by the apps*/ global int num_time_steps = 3000; global string spectra_period1 = "all"; global float filter_highhz = 5.0; global float simulation_timeskip = 0.1; app (Sgt _ext) extract(Sgt _sgt, Station _stat, Variation _var) { extract @strcat("stat=", _stat.name) "extract_sgt=1" @strcat("slon=", _stat.lon) @strcat("slat=", _stat.lat) @strcat("rupmodfile=", @filename(_var)) @strcat("sgt_xfile=", @filename(_sgt.x)) @strcat("sgt_yfile=", @filename(_sgt.y)) @strcat("extract_sgt_xfile=", @filename(_ext.x)) @strcat("extract_sgt_yfile=", @filename(_ext.y)); } app (Seismogram _seis, PeakValue _peak) seispeak(Sgt _sgt, Variation _var, Station _stat) { seispeak /* Args of seismogram synthesis */ @strcat("stat=", _stat.name) "extract_sgt=0" @strcat("slon=", _stat.lon) @strcat("slat=", _stat.lat) "outputBinary=1" "mergeOutput=1" @strcat("ntout=", num_time_steps) @strcat("rupmodfile=", @filename(_var)) @strcat("sgt_xfile=", @filename(_sgt.x)) @strcat("sgt_yfile=", @filename(_sgt.y)) @strcat("seis_file=", @filename(_seis)) /* Args of peak ground acceleration */ "simulation_out_pointsX=2" "simulation_out_pointsY=1" "surfseis_rspectra_seismogram_units=cmpersec" "surfseis_rspectra_output_units=cmpersec2" "surfseis_rspectra_output_type=aa" "surfseis_rspectra_apply_byteswap=no" @strcat("simulation_out_timesamples=", num_time_steps) @strcat("simulation_out_timeskip=", simulation_timeskip) @strcat("surfseis_rspectra_period=", spectra_period1) @strcat(" surfseis_rspectra_apply_filter_highHZ=", filter_highhz) @strcat("in=", @filename(_seis)) @strcat("out=", @filename(_peak)); } app (Seismogram _seis, PeakValue _peak) seispeak_local(Sgt _sgt, Variation _var, Station _stat) { seispeak_local /* Args of seismogram synthesis */ 
@strcat("stat=", _stat.name) "extract_sgt=0" @strcat("slon=", _stat.lon) @strcat("slat=", _stat.lat) "outputBinary=1" "mergeOutput=1" @strcat("ntout=", num_time_steps) @strcat("rupmodfile=", @filename(_var)) @strcat("sgt_xfile=", @filename(_sgt.x)) @strcat("sgt_yfile=", @filename(_sgt.y)) @strcat("seis_file=", @filename(_seis)) /* Args of peak ground acceleration */ "simulation_out_pointsX=2" "simulation_out_pointsY=1" "surfseis_rspectra_seismogram_units=cmpersec" "surfseis_rspectra_output_units=cmpersec2" "surfseis_rspectra_output_type=aa" "surfseis_rspectra_apply_byteswap=no" @strcat("simulation_out_timesamples=", num_time_steps) @strcat("simulation_out_timeskip=", simulation_timeskip) @strcat("surfseis_rspectra_period=", spectra_period1) @strcat(" surfseis_rspectra_apply_filter_highHZ=", filter_highhz) @strcat("in=", @filename(_seis)) @strcat("out=", @filename(_peak)); } app (Seismogram _seis[], PeakValue _peak[]) seispeak_agg(Sgt _sgt, Variation _var[], Station _stat, int n) { seispeak_agg /* System args */ _stat.name _stat.lon _stat.lat num_time_steps num_time_steps simulation_timeskip spectra_period1 filter_highhz @filename(_sgt.x) @filename(_sgt.y) n @filenames(_var) @filenames(_seis) @filenames(_peak); } // Auxillary functions for the mappers type StationFile; app (StationFile _stat) getsite_file(int _run_id) { getsite _run_id stdout=@filename(_stat); } (Station _stat) get_site(int _run_id) { StationFile file<"/var/tmp/site_tmp">; /*file = getsite_file(_run_id);*/ _stat = readData(file); } type RuptureFile; app (RuptureFile _rup) getrupture_file(int _run_id) { getrupture _run_id stdout=@filename(_rup); } (Rupture _rup[]) get_ruptures(int _run_id, Station _site) { /*RuptureFile file;*/ RuptureFile file<"LGU/rup_tmp">; /*file = getrupture_file(_run_id);*/ _rup = readData(file); } type VariationFile; app (VariationFile _var) getvariation_file(Station _site, Rupture _rup, string _loc) { variation_mapper "-e" _site.erf "-v" _site.variation_scenario "-l" _loc 
"-s" _rup.source "-r" _rup.index stdout=@_var; } (string _vars[]) get_variations(Station _site, Rupture _rup, string _loc){ string fname = @strcat(_rup.source, "_", _rup.index); VariationFile file; /*file = getvariation_file(_site, _rup, _loc);*/ _vars = readData(file); } type offset { int off; int size; } type offset_file; (offset _off[]) mkoffset(int _size, int _group_size) { offset_file file ; file = mkoffset_file(_size, _group_size); _off = readData(file); } app (offset_file _off) mkoffset_file(int _size, int _group_size) { mkoffset _size _group_size; } /* TODO: data management zip jobs */ /* Main program */ int run_id = 664; int agg_size = 80; int loc_size = 20; string datadir = "gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/Results"; Station site = get_site(run_id); Sgt sgt_var ; Rupture rups[] = get_ruptures(run_id, site); foreach rup in rups { string loc_sub = @strcat(datadir, "/", site.name, "/", rup.source, "/", rup.index); Sgt sub ; string var_str[] = get_variations( site, rup, "gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations" ); Variation vars[] ; sub = extract(sgt_var, site, vars[rup.size-1]); string seis_str[]; string peak_str[]; foreach var,i in vars { seis_str[i] = @strcat(loc_sub, "/Seismogram_", site.name, "_", rup.source, "_", rup.index, "_", i, ".grm"); peak_str[i] = @strcat(loc_sub, "/PeakVals_", site.name, "_", rup.source, "_", rup.index, "_", i, ".bsa"); } Seismogram seis[] ; PeakValue peak[] ; if(rup.size <= loc_size) { /* * Not worth to transfer the data. Execute on TeraGrid instead. * Also execute on localhost. 
*/ foreach var,i in vars { (seis[i], peak[i]) = seispeak_local(sub, var, site); } } else {if(rup.size <= agg_size) { /* Execute on a single resource */ (seis, peak) = seispeak_agg(sub, vars, site, rup.size); } else { /*offset offs[] = mkoffset(rup.size, agg_size);*/ /*for i in offs {*/ /*(seis, peak) = seispeak_agg(sub, vars[i.off:i.off+off.size],*/ /*off.size);*/ /*}*/ }} } 2011/3/27 Mihael Hategan : > I don't believe you. There is no SgtDim data type in that script. > > Mihael > > On Sun, 2011-03-27 at 16:55 -0500, Allan Espinosa wrote: >> Here it is. The get_app() calls are simple wrappers to readData() >> >> type offset { >> int off; >> int size; >> } >> type offset_file; >> (offset _off[]) mkoffset(int _size, int _group_size) { >> offset_file file ; >> file = mkoffset_file(_size, _group_size); >> _off = readData(file); >> } >> app (offset_file _off) mkoffset_file(int _size, int _group_size) { >> mkoffset _size _group_size; >> } >> >> /* TODO: data management zip jobs */ >> >> /* Main program */ >> int run_id = 664; >> int agg_size = 80; >> int loc_size = 20; >> string datadir = >> "gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/Results"; >> >> Station site = get_site(run_id); >> >> Sgt sgt_var > l="gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/SgtFiles">; >> Rupture rups[] = get_ruptures(run_id, site); >> >> foreach rup in rups { >> string loc_sub = @strcat(datadir, "/", site.name, "/", rup.source, >> "/", rup.index); >> Sgt sub > r=rup.index>; >> string var_str[] = get_variations( site, rup, >> "gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations" >> ); >> Variation vars[] ; >> >> sub = extract(sgt_var, site, vars[rup.size-1]); >> >> string seis_str[]; >> string peak_str[]; >> >> foreach var,i in vars { >> seis_str[i] = @strcat(loc_sub, "/Seismogram_", site.name, "_", rup.source, >> 
"_", rup.index, "_", i, ".grm"); >> peak_str[i] = @strcat(loc_sub, "/PeakVals_", site.name, "_", rup.source, >> "_", rup.index, "_", i, ".bsa"); >> } >> >> Seismogram seis[] ; >> PeakValue peak[] ; >> >> if(rup.size <= loc_size) { >> /* >> * Not worth to transfer the data. Execute on TeraGrid instead. >> * Also execute on localhost. >> */ >> foreach var,i in vars { >> (seis[i], peak[i]) = seispeak_local(sub, var, site); >> } >> } else {if(rup.size <= agg_size) { >> /* Execute on a single resource */ >> (seis, peak) = seispeak_agg(sub, vars, site, rup.size); >> } else { >> /*offset offs[] = mkoffset(rup.size, agg_size);*/ >> /*for i in offs {*/ >> /*(seis, peak) = seispeak_agg(sub, vars[i.off:i.off+off.size],*/ >> /*off.size);*/ >> /*}*/ >> }} >> } >> >> >> 2011/3/27 Mihael Hategan : >> > May I see the script? >> > >> > On Fri, 2011-03-25 at 19:42 -0500, Allan Espinosa wrote: >> >> this has been occurring for 70 times already. What i expect is for >> >> the app with SgtDim sub to run and close the future. >> >> >> >> 2011-03-25 19:40:12,217-0500 WARN HangChecker No events in 10s. 
>> >> 2011-03-25 19:40:12,217-0500 WARN HangChecker >> >> Registered futures: >> >> Rupture[] rups Closed, 1 elements, 0 listeners >> >> Variation vars - Closed, no listeners >> >> SgtDim sub - Open, 1 listeners >> >> string site Closed, no listeners >> >> Variation[] vars Closed, 72 elements, 0 listeners >> >> ---- >> >> >> >> Waiting threads: >> >> 0-13 >> >> 0-13-0-7 >> >> 0-13-0-8-1-1 >> >> ---- >> >> >> >> >> > >> > >> > >> > >> From hategan at mcs.anl.gov Sun Mar 27 18:26:41 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 27 Mar 2011 16:26:41 -0700 Subject: [Swift-devel] hang checker fun In-Reply-To: References: <1301258513.1584.1.camel@blabla2.none> <1301265343.32276.0.camel@blabla2.none> Message-ID: <1301268401.32276.1.camel@blabla2.none> May I also see the log? It looks like there's something weird around line 186. Mihael On Sun, 2011-03-27 at 17:56 -0500, Allan Espinosa wrote: > oops. trimmed the first part. > > Thanks > > type SgtDim; > type Variation; > type Seismogram; > type PeakValue; > > type Station { > string name; > float lat; > float lon; > int erf; > int variation_scenario; > } > > type Sgt { > SgtDim x; > SgtDim y; > } > > type Rupture { > int source; > int index; > int size; > } > > /* some constants used by the apps*/ > global int num_time_steps = 3000; > global string spectra_period1 = "all"; > global float filter_highhz = 5.0; > global float simulation_timeskip = 0.1; > > app (Sgt _ext) extract(Sgt _sgt, Station _stat, Variation _var) { > extract @strcat("stat=", _stat.name) "extract_sgt=1" > @strcat("slon=", _stat.lon) @strcat("slat=", _stat.lat) > > @strcat("rupmodfile=", @filename(_var)) > @strcat("sgt_xfile=", @filename(_sgt.x)) > @strcat("sgt_yfile=", @filename(_sgt.y)) > @strcat("extract_sgt_xfile=", @filename(_ext.x)) > @strcat("extract_sgt_yfile=", @filename(_ext.y)); > } > > app (Seismogram _seis, PeakValue _peak) > seispeak(Sgt _sgt, Variation _var, Station _stat) { > seispeak > /* Args of seismogram synthesis 
*/ > @strcat("stat=", _stat.name) "extract_sgt=0" > @strcat("slon=", _stat.lon) @strcat("slat=", _stat.lat) > "outputBinary=1" "mergeOutput=1" @strcat("ntout=", num_time_steps) > > @strcat("rupmodfile=", @filename(_var)) > @strcat("sgt_xfile=", @filename(_sgt.x)) > @strcat("sgt_yfile=", @filename(_sgt.y)) > @strcat("seis_file=", @filename(_seis)) > > /* Args of peak ground acceleration */ > "simulation_out_pointsX=2" "simulation_out_pointsY=1" > "surfseis_rspectra_seismogram_units=cmpersec" > "surfseis_rspectra_output_units=cmpersec2" > "surfseis_rspectra_output_type=aa" > "surfseis_rspectra_apply_byteswap=no" > > @strcat("simulation_out_timesamples=", num_time_steps) > @strcat("simulation_out_timeskip=", simulation_timeskip) > @strcat("surfseis_rspectra_period=", spectra_period1) > @strcat(" surfseis_rspectra_apply_filter_highHZ=", filter_highhz) > @strcat("in=", @filename(_seis)) > @strcat("out=", @filename(_peak)); > } > > app (Seismogram _seis, PeakValue _peak) > seispeak_local(Sgt _sgt, Variation _var, Station _stat) { > seispeak_local > /* Args of seismogram synthesis */ > @strcat("stat=", _stat.name) "extract_sgt=0" > @strcat("slon=", _stat.lon) @strcat("slat=", _stat.lat) > "outputBinary=1" "mergeOutput=1" @strcat("ntout=", num_time_steps) > > @strcat("rupmodfile=", @filename(_var)) > @strcat("sgt_xfile=", @filename(_sgt.x)) > @strcat("sgt_yfile=", @filename(_sgt.y)) > @strcat("seis_file=", @filename(_seis)) > > /* Args of peak ground acceleration */ > "simulation_out_pointsX=2" "simulation_out_pointsY=1" > "surfseis_rspectra_seismogram_units=cmpersec" > "surfseis_rspectra_output_units=cmpersec2" > "surfseis_rspectra_output_type=aa" > "surfseis_rspectra_apply_byteswap=no" > > @strcat("simulation_out_timesamples=", num_time_steps) > @strcat("simulation_out_timeskip=", simulation_timeskip) > @strcat("surfseis_rspectra_period=", spectra_period1) > @strcat(" surfseis_rspectra_apply_filter_highHZ=", filter_highhz) > @strcat("in=", @filename(_seis)) > 
@strcat("out=", @filename(_peak)); > } > > app (Seismogram _seis[], PeakValue _peak[]) > seispeak_agg(Sgt _sgt, Variation _var[], Station _stat, int n) { > seispeak_agg > /* System args */ > _stat.name _stat.lon _stat.lat num_time_steps > num_time_steps simulation_timeskip spectra_period1 filter_highhz > > @filename(_sgt.x) @filename(_sgt.y) > > n @filenames(_var) @filenames(_seis) @filenames(_peak); > } > > // Auxillary functions for the mappers > type StationFile; > app (StationFile _stat) getsite_file(int _run_id) { > getsite _run_id stdout=@filename(_stat); > } > (Station _stat) get_site(int _run_id) { > StationFile file<"/var/tmp/site_tmp">; > /*file = getsite_file(_run_id);*/ > _stat = readData(file); > } > > type RuptureFile; > app (RuptureFile _rup) getrupture_file(int _run_id) { > getrupture _run_id stdout=@filename(_rup); > } > (Rupture _rup[]) get_ruptures(int _run_id, Station _site) { > /*RuptureFile file "/rup_tmp")>;*/ > RuptureFile file<"LGU/rup_tmp">; > /*file = getrupture_file(_run_id);*/ > _rup = readData(file); > } > > type VariationFile; > app (VariationFile _var) getvariation_file(Station _site, Rupture _rup, > string _loc) { > variation_mapper "-e" _site.erf "-v" _site.variation_scenario > "-l" _loc "-s" _rup.source "-r" _rup.index stdout=@_var; > } > (string _vars[]) get_variations(Station _site, Rupture _rup, string _loc){ > string fname = @strcat(_rup.source, "_", _rup.index); > VariationFile file file=@strcat(_site.name, "/varlist/", _rup.source, "/", fname, ".txt")>; > /*file = getvariation_file(_site, _rup, _loc);*/ > _vars = readData(file); > } > > type offset { > int off; > int size; > } > type offset_file; > (offset _off[]) mkoffset(int _size, int _group_size) { > offset_file file ; > file = mkoffset_file(_size, _group_size); > _off = readData(file); > } > app (offset_file _off) mkoffset_file(int _size, int _group_size) { > mkoffset _size _group_size; > } > > /* TODO: data management zip jobs */ > > /* Main program */ > int run_id = 
664; > int agg_size = 80; > int loc_size = 20; > string datadir = > "gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/Results"; > > Station site = get_site(run_id); > > Sgt sgt_var l="gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/SgtFiles">; > Rupture rups[] = get_ruptures(run_id, site); > > foreach rup in rups { > string loc_sub = @strcat(datadir, "/", site.name, "/", rup.source, > "/", rup.index); > Sgt sub r=rup.index>; > string var_str[] = get_variations( site, rup, > "gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations" > ); > Variation vars[] ; > > sub = extract(sgt_var, site, vars[rup.size-1]); > > string seis_str[]; > string peak_str[]; > > foreach var,i in vars { > seis_str[i] = @strcat(loc_sub, "/Seismogram_", site.name, "_", rup.source, > "_", rup.index, "_", i, ".grm"); > peak_str[i] = @strcat(loc_sub, "/PeakVals_", site.name, "_", rup.source, > "_", rup.index, "_", i, ".bsa"); > } > > Seismogram seis[] ; > PeakValue peak[] ; > > if(rup.size <= loc_size) { > /* > * Not worth to transfer the data. Execute on TeraGrid instead. > * Also execute on localhost. > */ > foreach var,i in vars { > (seis[i], peak[i]) = seispeak_local(sub, var, site); > } > } else {if(rup.size <= agg_size) { > /* Execute on a single resource */ > (seis, peak) = seispeak_agg(sub, vars, site, rup.size); > } else { > /*offset offs[] = mkoffset(rup.size, agg_size);*/ > /*for i in offs {*/ > /*(seis, peak) = seispeak_agg(sub, vars[i.off:i.off+off.size],*/ > /*off.size);*/ > /*}*/ > }} > } > > > 2011/3/27 Mihael Hategan : > > I don't believe you. There is no SgtDim data type in that script. > > > > Mihael > > > > On Sun, 2011-03-27 at 16:55 -0500, Allan Espinosa wrote: > >> Here it is. 
The get_app() calls are simple wrappers to readData() > >> > >> type offset { > >> int off; > >> int size; > >> } > >> type offset_file; > >> (offset _off[]) mkoffset(int _size, int _group_size) { > >> offset_file file ; > >> file = mkoffset_file(_size, _group_size); > >> _off = readData(file); > >> } > >> app (offset_file _off) mkoffset_file(int _size, int _group_size) { > >> mkoffset _size _group_size; > >> } > >> > >> /* TODO: data management zip jobs */ > >> > >> /* Main program */ > >> int run_id = 664; > >> int agg_size = 80; > >> int loc_size = 20; > >> string datadir = > >> "gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/Results"; > >> > >> Station site = get_site(run_id); > >> > >> Sgt sgt_var >> l="gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/SgtFiles">; > >> Rupture rups[] = get_ruptures(run_id, site); > >> > >> foreach rup in rups { > >> string loc_sub = @strcat(datadir, "/", site.name, "/", rup.source, > >> "/", rup.index); > >> Sgt sub >> r=rup.index>; > >> string var_str[] = get_variations( site, rup, > >> "gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations" > >> ); > >> Variation vars[] ; > >> > >> sub = extract(sgt_var, site, vars[rup.size-1]); > >> > >> string seis_str[]; > >> string peak_str[]; > >> > >> foreach var,i in vars { > >> seis_str[i] = @strcat(loc_sub, "/Seismogram_", site.name, "_", rup.source, > >> "_", rup.index, "_", i, ".grm"); > >> peak_str[i] = @strcat(loc_sub, "/PeakVals_", site.name, "_", rup.source, > >> "_", rup.index, "_", i, ".bsa"); > >> } > >> > >> Seismogram seis[] ; > >> PeakValue peak[] ; > >> > >> if(rup.size <= loc_size) { > >> /* > >> * Not worth to transfer the data. Execute on TeraGrid instead. > >> * Also execute on localhost. 
> >> */ > >> foreach var,i in vars { > >> (seis[i], peak[i]) = seispeak_local(sub, var, site); > >> } > >> } else {if(rup.size <= agg_size) { > >> /* Execute on a single resource */ > >> (seis, peak) = seispeak_agg(sub, vars, site, rup.size); > >> } else { > >> /*offset offs[] = mkoffset(rup.size, agg_size);*/ > >> /*for i in offs {*/ > >> /*(seis, peak) = seispeak_agg(sub, vars[i.off:i.off+off.size],*/ > >> /*off.size);*/ > >> /*}*/ > >> }} > >> } > >> > >> > >> 2011/3/27 Mihael Hategan : > >> > May I see the script? > >> > > >> > On Fri, 2011-03-25 at 19:42 -0500, Allan Espinosa wrote: > >> >> this has been occurring for 70 times already. What i expect is for > >> >> the app with SgtDim sub to run and close the future. > >> >> > >> >> 2011-03-25 19:40:12,217-0500 WARN HangChecker No events in 10s. > >> >> 2011-03-25 19:40:12,217-0500 WARN HangChecker > >> >> Registered futures: > >> >> Rupture[] rups Closed, 1 elements, 0 listeners > >> >> Variation vars - Closed, no listeners > >> >> SgtDim sub - Open, 1 listeners > >> >> string site Closed, no listeners > >> >> Variation[] vars Closed, 72 elements, 0 listeners > >> >> ---- > >> >> > >> >> Waiting threads: > >> >> 0-13 > >> >> 0-13-0-7 > >> >> 0-13-0-8-1-1 > >> >> ---- > >> >> > >> >> > >> > > >> > > >> > > >> > > >> From aespinosa at cs.uchicago.edu Sun Mar 27 18:29:30 2011 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Sun, 27 Mar 2011 18:29:30 -0500 Subject: [Swift-devel] hang checker fun In-Reply-To: <1301268401.32276.1.camel@blabla2.none> References: <1301258513.1584.1.camel@blabla2.none> <1301265343.32276.0.camel@blabla2.none> <1301268401.32276.1.camel@blabla2.none> Message-ID: Here you go. (see attached) -Allan 2011/3/27 Mihael Hategan : > May I also see the log? It looks like there's something weird around > line 186. > > Mihael > > On Sun, 2011-03-27 at 17:56 -0500, Allan Espinosa wrote: >> oops. trimmed the first part. 
>> >> Thanks >> >> type SgtDim; >> type Variation; >> type Seismogram; >> type PeakValue; >> >> type Station { >> ? string name; >> ? float lat; >> ? float lon; >> ? int erf; >> ? int variation_scenario; >> } >> >> type Sgt { >> ? SgtDim x; >> ? SgtDim y; >> } >> >> type Rupture { >> ? int source; >> ? int index; >> ? int size; >> } >> >> /* some constants used by the apps*/ >> global int num_time_steps = 3000; >> global string spectra_period1 = "all"; >> global float filter_highhz = 5.0; >> global float simulation_timeskip = 0.1; >> >> app (Sgt _ext) extract(Sgt _sgt, Station _stat, Variation _var) { >> ? extract @strcat("stat=", _stat.name) "extract_sgt=1" >> ? ? ? @strcat("slon=", _stat.lon) @strcat("slat=", _stat.lat) >> >> ? ? ? @strcat("rupmodfile=", @filename(_var)) >> ? ? ? @strcat("sgt_xfile=", @filename(_sgt.x)) >> ? ? ? @strcat("sgt_yfile=", @filename(_sgt.y)) >> ? ? ? @strcat("extract_sgt_xfile=", @filename(_ext.x)) >> ? ? ? @strcat("extract_sgt_yfile=", @filename(_ext.y)); >> } >> >> app (Seismogram _seis, PeakValue _peak) >> ? ? seispeak(Sgt _sgt, Variation _var, Station _stat) { >> ? seispeak >> ? ? ? /* Args of seismogram synthesis ? ? */ >> ? ? ? @strcat("stat=", _stat.name) "extract_sgt=0" >> ? ? ? @strcat("slon=", _stat.lon) @strcat("slat=", _stat.lat) >> ? ? ? "outputBinary=1" "mergeOutput=1" @strcat("ntout=", num_time_steps) >> >> ? ? ? @strcat("rupmodfile=", @filename(_var)) >> ? ? ? @strcat("sgt_xfile=", @filename(_sgt.x)) >> ? ? ? @strcat("sgt_yfile=", @filename(_sgt.y)) >> ? ? ? @strcat("seis_file=", @filename(_seis)) >> >> ? ? ? /* Args of peak ground acceleration */ >> ? ? ? "simulation_out_pointsX=2" "simulation_out_pointsY=1" >> ? ? ? "surfseis_rspectra_seismogram_units=cmpersec" >> ? ? ? "surfseis_rspectra_output_units=cmpersec2" >> ? ? ? "surfseis_rspectra_output_type=aa" >> ? ? ? "surfseis_rspectra_apply_byteswap=no" >> >> ? ? ? @strcat("simulation_out_timesamples=", num_time_steps) >> ? ? ? 
@strcat("simulation_out_timeskip=", simulation_timeskip) >> ? ? ? @strcat("surfseis_rspectra_period=", spectra_period1) >> ? ? ? @strcat(" surfseis_rspectra_apply_filter_highHZ=", filter_highhz) >> ? ? ? @strcat("in=", @filename(_seis)) >> ? ? ? @strcat("out=", @filename(_peak)); >> } >> >> app (Seismogram _seis, PeakValue _peak) >> ? ? seispeak_local(Sgt _sgt, Variation _var, Station _stat) { >> ? seispeak_local >> ? ? ? /* Args of seismogram synthesis ? ? */ >> ? ? ? @strcat("stat=", _stat.name) "extract_sgt=0" >> ? ? ? @strcat("slon=", _stat.lon) @strcat("slat=", _stat.lat) >> ? ? ? "outputBinary=1" "mergeOutput=1" @strcat("ntout=", num_time_steps) >> >> ? ? ? @strcat("rupmodfile=", @filename(_var)) >> ? ? ? @strcat("sgt_xfile=", @filename(_sgt.x)) >> ? ? ? @strcat("sgt_yfile=", @filename(_sgt.y)) >> ? ? ? @strcat("seis_file=", @filename(_seis)) >> >> ? ? ? /* Args of peak ground acceleration */ >> ? ? ? "simulation_out_pointsX=2" "simulation_out_pointsY=1" >> ? ? ? "surfseis_rspectra_seismogram_units=cmpersec" >> ? ? ? "surfseis_rspectra_output_units=cmpersec2" >> ? ? ? "surfseis_rspectra_output_type=aa" >> ? ? ? "surfseis_rspectra_apply_byteswap=no" >> >> ? ? ? @strcat("simulation_out_timesamples=", num_time_steps) >> ? ? ? @strcat("simulation_out_timeskip=", simulation_timeskip) >> ? ? ? @strcat("surfseis_rspectra_period=", spectra_period1) >> ? ? ? @strcat(" surfseis_rspectra_apply_filter_highHZ=", filter_highhz) >> ? ? ? @strcat("in=", @filename(_seis)) >> ? ? ? @strcat("out=", @filename(_peak)); >> } >> >> app (Seismogram _seis[], PeakValue _peak[]) >> ? ? seispeak_agg(Sgt _sgt, Variation _var[], Station _stat, int n) { >> ? seispeak_agg >> ? ? ? /* System args */ >> ? ? ? _stat.name _stat.lon _stat.lat num_time_steps >> ? ? ? num_time_steps simulation_timeskip spectra_period1 filter_highhz >> >> ? ? ? @filename(_sgt.x) @filename(_sgt.y) >> >> ? ? ? 
n @filenames(_var) @filenames(_seis) @filenames(_peak); >> } >> >> // Auxillary functions for the mappers >> type StationFile; >> app (StationFile _stat) getsite_file(int _run_id) { >> ? getsite _run_id stdout=@filename(_stat); >> } >> (Station _stat) get_site(int _run_id) { >> ? StationFile file<"/var/tmp/site_tmp">; >> ? /*file = getsite_file(_run_id);*/ >> ? _stat = readData(file); >> } >> >> type RuptureFile; >> app (RuptureFile _rup) getrupture_file(int _run_id) { >> ? getrupture _run_id stdout=@filename(_rup); >> } >> (Rupture _rup[]) get_ruptures(int _run_id, Station _site) { >> ? /*RuptureFile file> "/rup_tmp")>;*/ >> ? RuptureFile file<"LGU/rup_tmp">; >> ? /*file = getrupture_file(_run_id);*/ >> ? _rup = readData(file); >> } >> >> type VariationFile; >> app (VariationFile _var) getvariation_file(Station _site, Rupture _rup, >> ? ? string _loc) { >> ? variation_mapper "-e" _site.erf "-v" _site.variation_scenario >> ? ? ? "-l" _loc "-s" _rup.source "-r" _rup.index stdout=@_var; >> } >> (string _vars[]) get_variations(Station _site, Rupture _rup, string _loc){ >> ? string fname = @strcat(_rup.source, "_", _rup.index); >> ? VariationFile file> ? ? ? file=@strcat(_site.name, "/varlist/", _rup.source, "/", fname, ".txt")>; >> ? /*file = getvariation_file(_site, _rup, _loc);*/ >> ? _vars = readData(file); >> } >> >> type offset { >> ? int off; >> ? int size; >> } >> type offset_file; >> (offset _off[]) mkoffset(int _size, int _group_size) { >> ? ?offset_file file ; >> ? ?file = mkoffset_file(_size, _group_size); >> ? ?_off = readData(file); >> } >> app (offset_file _off) mkoffset_file(int _size, int _group_size) { >> ? mkoffset _size _group_size; >> } >> >> /* TODO: data management zip jobs */ >> >> /* Main program */ >> int run_id = 664; >> int agg_size = 80; >> int loc_size = 20; >> string datadir = >> ? "gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/Results"; >> >> Station site = get_site(run_id); >> >> Sgt sgt_var > ? 
l="gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/SgtFiles">; >> Rupture rups[] = get_ruptures(run_id, site); >> >> foreach rup in rups { >> ? string loc_sub = @strcat(datadir, "/", site.name, "/", rup.source, >> "/", rup.index); >> ? Sgt sub > ? ? ? r=rup.index>; >> ? string var_str[] = get_variations( site, rup, >> ? ? ? "gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations" >> ); >> ? Variation vars[] ; >> >> ? sub = extract(sgt_var, ?site, vars[rup.size-1]); >> >> ? string seis_str[]; >> ? string peak_str[]; >> >> ? foreach var,i in vars { >> ? ? seis_str[i] = @strcat(loc_sub, "/Seismogram_", site.name, "_", rup.source, >> ? ? ? ? ? ? ? ? ? ? ? ? ? "_", rup.index, "_", i, ".grm"); >> ? ? peak_str[i] = @strcat(loc_sub, "/PeakVals_", site.name, "_", rup.source, >> ? ? ? ? ? ? ? ? ? ? ? ? ? "_", rup.index, "_", i, ".bsa"); >> ? } >> >> ? Seismogram seis[] ; >> ? PeakValue ?peak[] ; >> >> ? if(rup.size <= loc_size) { >> ? ? /* >> ? ? ?* Not worth to transfer the data. Execute on TeraGrid instead. >> ? ? ?* Also execute on localhost. >> ? ? ?*/ >> ? ? foreach var,i in vars { >> ? ? ? (seis[i], peak[i]) = seispeak_local(sub, var, site); >> ? ? } >> ? } else {if(rup.size <= agg_size) { >> ? ? /* Execute on a single resource */ >> ? ? (seis, peak) = seispeak_agg(sub, vars, site, rup.size); >> ? } else { >> ? ? /*offset offs[] = mkoffset(rup.size, agg_size);*/ >> ? ? /*for i in offs {*/ >> ? ? ? /*(seis, peak) = seispeak_agg(sub, vars[i.off:i.off+off.size],*/ >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? /*off.size);*/ >> ? ? /*}*/ >> ? }} >> } >> >> >> 2011/3/27 Mihael Hategan : >> > I don't believe you. There is no SgtDim data type in that script. >> > >> > Mihael >> > >> > On Sun, 2011-03-27 at 16:55 -0500, Allan Espinosa wrote: >> >> Here it is. ?The get_app() calls are simple wrappers to readData() >> >> >> >> type offset { >> >> ? int off; >> >> ? 
int size; >> >> } >> >> type offset_file; >> >> (offset _off[]) mkoffset(int _size, int _group_size) { >> >> ? ?offset_file file ; >> >> ? ?file = mkoffset_file(_size, _group_size); >> >> ? ?_off = readData(file); >> >> } >> >> app (offset_file _off) mkoffset_file(int _size, int _group_size) { >> >> ? mkoffset _size _group_size; >> >> } >> >> >> >> /* TODO: data management zip jobs */ >> >> >> >> /* Main program */ >> >> int run_id = 664; >> >> int agg_size = 80; >> >> int loc_size = 20; >> >> string datadir = >> >> ? "gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/Results"; >> >> >> >> Station site = get_site(run_id); >> >> >> >> Sgt sgt_var > >> ? l="gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/SgtFiles">; >> >> Rupture rups[] = get_ruptures(run_id, site); >> >> >> >> foreach rup in rups { >> >> ? string loc_sub = @strcat(datadir, "/", site.name, "/", rup.source, >> >> "/", rup.index); >> >> ? Sgt sub > >> ? ? ? r=rup.index>; >> >> ? string var_str[] = get_variations( site, rup, >> >> ? ? ? "gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations" >> >> ); >> >> ? Variation vars[] ; >> >> >> >> ? sub = extract(sgt_var, ?site, vars[rup.size-1]); >> >> >> >> ? string seis_str[]; >> >> ? string peak_str[]; >> >> >> >> ? foreach var,i in vars { >> >> ? ? seis_str[i] = @strcat(loc_sub, "/Seismogram_", site.name, "_", rup.source, >> >> ? ? ? ? ? ? ? ? ? ? ? ? ? "_", rup.index, "_", i, ".grm"); >> >> ? ? peak_str[i] = @strcat(loc_sub, "/PeakVals_", site.name, "_", rup.source, >> >> ? ? ? ? ? ? ? ? ? ? ? ? ? "_", rup.index, "_", i, ".bsa"); >> >> ? } >> >> >> >> ? Seismogram seis[] ; >> >> ? PeakValue ?peak[] ; >> >> >> >> ? if(rup.size <= loc_size) { >> >> ? ? /* >> >> ? ? ?* Not worth to transfer the data. Execute on TeraGrid instead. >> >> ? ? ?* Also execute on localhost. >> >> ? ? ?*/ >> >> ? ? foreach var,i in vars { >> >> ? ? ? 
(seis[i], peak[i]) = seispeak_local(sub, var, site); >> >> ? ? } >> >> ? } else {if(rup.size <= agg_size) { >> >> ? ? /* Execute on a single resource */ >> >> ? ? (seis, peak) = seispeak_agg(sub, vars, site, rup.size); >> >> ? } else { >> >> ? ? /*offset offs[] = mkoffset(rup.size, agg_size);*/ >> >> ? ? /*for i in offs {*/ >> >> ? ? ? /*(seis, peak) = seispeak_agg(sub, vars[i.off:i.off+off.size],*/ >> >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? /*off.size);*/ >> >> ? ? /*}*/ >> >> ? }} >> >> } >> >> >> >> >> >> 2011/3/27 Mihael Hategan : >> >> > May I see the script? >> >> > >> >> > On Fri, 2011-03-25 at 19:42 -0500, Allan Espinosa wrote: >> >> >> this has been occurring for 70 times already. ?What i expect is for >> >> >> the app with SgtDim sub to run and close the future. >> >> >> >> >> >> 2011-03-25 19:40:12,217-0500 WARN ?HangChecker No events in 10s. >> >> >> 2011-03-25 19:40:12,217-0500 WARN ?HangChecker >> >> >> Registered futures: >> >> >> Rupture[] rups ?Closed, 1 elements, 0 listeners >> >> >> Variation vars - Closed, no listeners >> >> >> SgtDim sub - Open, 1 listeners >> >> >> string site ?Closed, no listeners >> >> >> Variation[] vars ?Closed, 72 elements, 0 listeners >> >> >> ---- >> >> >> >> >> >> Waiting threads: >> >> >> 0-13 >> >> >> 0-13-0-7 >> >> >> 0-13-0-8-1-1 >> >> >> ---- >> >> >> >> >> >> >> >> > >> >> > >> >> > >> >> > >> >> > > > > -------------- next part -------------- A non-text attachment was scrubbed... Name: postproc-20110325-1935-ibad1ek0.log.bz2 Type: application/x-bzip2 Size: 68741 bytes Desc: not available URL: From hategan at mcs.anl.gov Sun Mar 27 18:41:30 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 27 Mar 2011 16:41:30 -0700 Subject: [Swift-devel] hang checker fun In-Reply-To: References: <1301258513.1584.1.camel@blabla2.none> <1301265343.32276.0.camel@blabla2.none> <1301268401.32276.1.camel@blabla2.none> Message-ID: <1301269290.32276.3.camel@blabla2.none> Ok. 
The fact that the hang checker kicks in doesn't mean that there is necessarily a hang. What I see from the log is that extract is trying to run and is probably just queued. I will try to change the hang checker to not kick in if there is at least one job running. Mihael On Sun, 2011-03-27 at 18:29 -0500, Allan Espinosa wrote: > Here you go. (see attached) > > -Allan > > 2011/3/27 Mihael Hategan : > > May I also see the log? It looks like there's something weird around > > line 186. > > > > Mihael > > > > On Sun, 2011-03-27 at 17:56 -0500, Allan Espinosa wrote: > >> oops. trimmed the first part. > >> > >> Thanks > >> > >> type SgtDim; > >> type Variation; > >> type Seismogram; > >> type PeakValue; > >> > >> type Station { > >> string name; > >> float lat; > >> float lon; > >> int erf; > >> int variation_scenario; > >> } > >> > >> type Sgt { > >> SgtDim x; > >> SgtDim y; > >> } > >> > >> type Rupture { > >> int source; > >> int index; > >> int size; > >> } > >> > >> /* some constants used by the apps*/ > >> global int num_time_steps = 3000; > >> global string spectra_period1 = "all"; > >> global float filter_highhz = 5.0; > >> global float simulation_timeskip = 0.1; > >> > >> app (Sgt _ext) extract(Sgt _sgt, Station _stat, Variation _var) { > >> extract @strcat("stat=", _stat.name) "extract_sgt=1" > >> @strcat("slon=", _stat.lon) @strcat("slat=", _stat.lat) > >> > >> @strcat("rupmodfile=", @filename(_var)) > >> @strcat("sgt_xfile=", @filename(_sgt.x)) > >> @strcat("sgt_yfile=", @filename(_sgt.y)) > >> @strcat("extract_sgt_xfile=", @filename(_ext.x)) > >> @strcat("extract_sgt_yfile=", @filename(_ext.y)); > >> } > >> > >> app (Seismogram _seis, PeakValue _peak) > >> seispeak(Sgt _sgt, Variation _var, Station _stat) { > >> seispeak > >> /* Args of seismogram synthesis */ > >> @strcat("stat=", _stat.name) "extract_sgt=0" > >> @strcat("slon=", _stat.lon) @strcat("slat=", _stat.lat) > >> "outputBinary=1" "mergeOutput=1" @strcat("ntout=", num_time_steps) > >> > >> 
@strcat("rupmodfile=", @filename(_var)) > >> @strcat("sgt_xfile=", @filename(_sgt.x)) > >> @strcat("sgt_yfile=", @filename(_sgt.y)) > >> @strcat("seis_file=", @filename(_seis)) > >> > >> /* Args of peak ground acceleration */ > >> "simulation_out_pointsX=2" "simulation_out_pointsY=1" > >> "surfseis_rspectra_seismogram_units=cmpersec" > >> "surfseis_rspectra_output_units=cmpersec2" > >> "surfseis_rspectra_output_type=aa" > >> "surfseis_rspectra_apply_byteswap=no" > >> > >> @strcat("simulation_out_timesamples=", num_time_steps) > >> @strcat("simulation_out_timeskip=", simulation_timeskip) > >> @strcat("surfseis_rspectra_period=", spectra_period1) > >> @strcat(" surfseis_rspectra_apply_filter_highHZ=", filter_highhz) > >> @strcat("in=", @filename(_seis)) > >> @strcat("out=", @filename(_peak)); > >> } > >> > >> app (Seismogram _seis, PeakValue _peak) > >> seispeak_local(Sgt _sgt, Variation _var, Station _stat) { > >> seispeak_local > >> /* Args of seismogram synthesis */ > >> @strcat("stat=", _stat.name) "extract_sgt=0" > >> @strcat("slon=", _stat.lon) @strcat("slat=", _stat.lat) > >> "outputBinary=1" "mergeOutput=1" @strcat("ntout=", num_time_steps) > >> > >> @strcat("rupmodfile=", @filename(_var)) > >> @strcat("sgt_xfile=", @filename(_sgt.x)) > >> @strcat("sgt_yfile=", @filename(_sgt.y)) > >> @strcat("seis_file=", @filename(_seis)) > >> > >> /* Args of peak ground acceleration */ > >> "simulation_out_pointsX=2" "simulation_out_pointsY=1" > >> "surfseis_rspectra_seismogram_units=cmpersec" > >> "surfseis_rspectra_output_units=cmpersec2" > >> "surfseis_rspectra_output_type=aa" > >> "surfseis_rspectra_apply_byteswap=no" > >> > >> @strcat("simulation_out_timesamples=", num_time_steps) > >> @strcat("simulation_out_timeskip=", simulation_timeskip) > >> @strcat("surfseis_rspectra_period=", spectra_period1) > >> @strcat(" surfseis_rspectra_apply_filter_highHZ=", filter_highhz) > >> @strcat("in=", @filename(_seis)) > >> @strcat("out=", @filename(_peak)); > >> } > >> > >> app 
(Seismogram _seis[], PeakValue _peak[]) > >> seispeak_agg(Sgt _sgt, Variation _var[], Station _stat, int n) { > >> seispeak_agg > >> /* System args */ > >> _stat.name _stat.lon _stat.lat num_time_steps > >> num_time_steps simulation_timeskip spectra_period1 filter_highhz > >> > >> @filename(_sgt.x) @filename(_sgt.y) > >> > >> n @filenames(_var) @filenames(_seis) @filenames(_peak); > >> } > >> > >> // Auxillary functions for the mappers > >> type StationFile; > >> app (StationFile _stat) getsite_file(int _run_id) { > >> getsite _run_id stdout=@filename(_stat); > >> } > >> (Station _stat) get_site(int _run_id) { > >> StationFile file<"/var/tmp/site_tmp">; > >> /*file = getsite_file(_run_id);*/ > >> _stat = readData(file); > >> } > >> > >> type RuptureFile; > >> app (RuptureFile _rup) getrupture_file(int _run_id) { > >> getrupture _run_id stdout=@filename(_rup); > >> } > >> (Rupture _rup[]) get_ruptures(int _run_id, Station _site) { > >> /*RuptureFile file >> "/rup_tmp")>;*/ > >> RuptureFile file<"LGU/rup_tmp">; > >> /*file = getrupture_file(_run_id);*/ > >> _rup = readData(file); > >> } > >> > >> type VariationFile; > >> app (VariationFile _var) getvariation_file(Station _site, Rupture _rup, > >> string _loc) { > >> variation_mapper "-e" _site.erf "-v" _site.variation_scenario > >> "-l" _loc "-s" _rup.source "-r" _rup.index stdout=@_var; > >> } > >> (string _vars[]) get_variations(Station _site, Rupture _rup, string _loc){ > >> string fname = @strcat(_rup.source, "_", _rup.index); > >> VariationFile file >> file=@strcat(_site.name, "/varlist/", _rup.source, "/", fname, ".txt")>; > >> /*file = getvariation_file(_site, _rup, _loc);*/ > >> _vars = readData(file); > >> } > >> > >> type offset { > >> int off; > >> int size; > >> } > >> type offset_file; > >> (offset _off[]) mkoffset(int _size, int _group_size) { > >> offset_file file ; > >> file = mkoffset_file(_size, _group_size); > >> _off = readData(file); > >> } > >> app (offset_file _off) mkoffset_file(int _size, int 
_group_size) { > >> mkoffset _size _group_size; > >> } > >> > >> /* TODO: data management zip jobs */ > >> > >> /* Main program */ > >> int run_id = 664; > >> int agg_size = 80; > >> int loc_size = 20; > >> string datadir = > >> "gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/Results"; > >> > >> Station site = get_site(run_id); > >> > >> Sgt sgt_var >> l="gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/SgtFiles">; > >> Rupture rups[] = get_ruptures(run_id, site); > >> > >> foreach rup in rups { > >> string loc_sub = @strcat(datadir, "/", site.name, "/", rup.source, > >> "/", rup.index); > >> Sgt sub >> r=rup.index>; > >> string var_str[] = get_variations( site, rup, > >> "gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations" > >> ); > >> Variation vars[] ; > >> > >> sub = extract(sgt_var, site, vars[rup.size-1]); > >> > >> string seis_str[]; > >> string peak_str[]; > >> > >> foreach var,i in vars { > >> seis_str[i] = @strcat(loc_sub, "/Seismogram_", site.name, "_", rup.source, > >> "_", rup.index, "_", i, ".grm"); > >> peak_str[i] = @strcat(loc_sub, "/PeakVals_", site.name, "_", rup.source, > >> "_", rup.index, "_", i, ".bsa"); > >> } > >> > >> Seismogram seis[] ; > >> PeakValue peak[] ; > >> > >> if(rup.size <= loc_size) { > >> /* > >> * Not worth to transfer the data. Execute on TeraGrid instead. > >> * Also execute on localhost. > >> */ > >> foreach var,i in vars { > >> (seis[i], peak[i]) = seispeak_local(sub, var, site); > >> } > >> } else {if(rup.size <= agg_size) { > >> /* Execute on a single resource */ > >> (seis, peak) = seispeak_agg(sub, vars, site, rup.size); > >> } else { > >> /*offset offs[] = mkoffset(rup.size, agg_size);*/ > >> /*for i in offs {*/ > >> /*(seis, peak) = seispeak_agg(sub, vars[i.off:i.off+off.size],*/ > >> /*off.size);*/ > >> /*}*/ > >> }} > >> } > >> > >> > >> 2011/3/27 Mihael Hategan : > >> > I don't believe you. 
There is no SgtDim data type in that script. > >> > > >> > Mihael > >> > > >> > On Sun, 2011-03-27 at 16:55 -0500, Allan Espinosa wrote: > >> >> Here it is. The get_app() calls are simple wrappers to readData() > >> >> > >> >> type offset { > >> >> int off; > >> >> int size; > >> >> } > >> >> type offset_file; > >> >> (offset _off[]) mkoffset(int _size, int _group_size) { > >> >> offset_file file ; > >> >> file = mkoffset_file(_size, _group_size); > >> >> _off = readData(file); > >> >> } > >> >> app (offset_file _off) mkoffset_file(int _size, int _group_size) { > >> >> mkoffset _size _group_size; > >> >> } > >> >> > >> >> /* TODO: data management zip jobs */ > >> >> > >> >> /* Main program */ > >> >> int run_id = 664; > >> >> int agg_size = 80; > >> >> int loc_size = 20; > >> >> string datadir = > >> >> "gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/Results"; > >> >> > >> >> Station site = get_site(run_id); > >> >> > >> >> Sgt sgt_var >> >> l="gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/SgtFiles">; > >> >> Rupture rups[] = get_ruptures(run_id, site); > >> >> > >> >> foreach rup in rups { > >> >> string loc_sub = @strcat(datadir, "/", site.name, "/", rup.source, > >> >> "/", rup.index); > >> >> Sgt sub >> >> r=rup.index>; > >> >> string var_str[] = get_variations( site, rup, > >> >> "gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations" > >> >> ); > >> >> Variation vars[] ; > >> >> > >> >> sub = extract(sgt_var, site, vars[rup.size-1]); > >> >> > >> >> string seis_str[]; > >> >> string peak_str[]; > >> >> > >> >> foreach var,i in vars { > >> >> seis_str[i] = @strcat(loc_sub, "/Seismogram_", site.name, "_", rup.source, > >> >> "_", rup.index, "_", i, ".grm"); > >> >> peak_str[i] = @strcat(loc_sub, "/PeakVals_", site.name, "_", rup.source, > >> >> "_", rup.index, "_", i, ".bsa"); > >> >> } > >> >> > >> >> Seismogram seis[] ; > >> >> PeakValue 
peak[] ; > >> >> > >> >> if(rup.size <= loc_size) { > >> >> /* > >> >> * Not worth to transfer the data. Execute on TeraGrid instead. > >> >> * Also execute on localhost. > >> >> */ > >> >> foreach var,i in vars { > >> >> (seis[i], peak[i]) = seispeak_local(sub, var, site); > >> >> } > >> >> } else {if(rup.size <= agg_size) { > >> >> /* Execute on a single resource */ > >> >> (seis, peak) = seispeak_agg(sub, vars, site, rup.size); > >> >> } else { > >> >> /*offset offs[] = mkoffset(rup.size, agg_size);*/ > >> >> /*for i in offs {*/ > >> >> /*(seis, peak) = seispeak_agg(sub, vars[i.off:i.off+off.size],*/ > >> >> /*off.size);*/ > >> >> /*}*/ > >> >> }} > >> >> } > >> >> > >> >> > >> >> 2011/3/27 Mihael Hategan : > >> >> > May I see the script? > >> >> > > >> >> > On Fri, 2011-03-25 at 19:42 -0500, Allan Espinosa wrote: > >> >> >> this has been occurring for 70 times already. What i expect is for > >> >> >> the app with SgtDim sub to run and close the future. > >> >> >> > >> >> >> 2011-03-25 19:40:12,217-0500 WARN HangChecker No events in 10s. 
> >> >> >> 2011-03-25 19:40:12,217-0500 WARN HangChecker > >> >> >> Registered futures: > >> >> >> Rupture[] rups Closed, 1 elements, 0 listeners > >> >> >> Variation vars - Closed, no listeners > >> >> >> SgtDim sub - Open, 1 listeners > >> >> >> string site Closed, no listeners > >> >> >> Variation[] vars Closed, 72 elements, 0 listeners > >> >> >> ---- > >> >> >> > >> >> >> Waiting threads: > >> >> >> 0-13 > >> >> >> 0-13-0-7 > >> >> >> 0-13-0-8-1-1 > >> >> >> ---- > >> >> >> > >> >> >> > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > > > > > > > From hategan at mcs.anl.gov Sun Mar 27 18:54:12 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 27 Mar 2011 16:54:12 -0700 Subject: [Swift-devel] hang checker fun In-Reply-To: <1301269290.32276.3.camel@blabla2.none> References: <1301258513.1584.1.camel@blabla2.none> <1301265343.32276.0.camel@blabla2.none> <1301268401.32276.1.camel@blabla2.none> <1301269290.32276.3.camel@blabla2.none> Message-ID: <1301270052.32276.4.camel@blabla2.none> Maybe. Maybe not. It may be that the job itself doesn't get queued. On Sun, 2011-03-27 at 16:41 -0700, Mihael Hategan wrote: > Ok. The fact that the hang checker kicks in doesn't mean that there is > necessarily a hang. What I see from the log is that extract is trying to > run and is probably just queued. > > I will try to change the hang checker to not kick in if there is at > least one job running. > > Mihael > > On Sun, 2011-03-27 at 18:29 -0500, Allan Espinosa wrote: > > Here you go. (see attached) > > > > -Allan > > > > 2011/3/27 Mihael Hategan : > > > May I also see the log? It looks like there's something weird around > > > line 186. > > > > > > Mihael > > > > > > On Sun, 2011-03-27 at 17:56 -0500, Allan Espinosa wrote: > > >> oops. trimmed the first part. 
> > >> > > >> Thanks > > >> > > >> type SgtDim; > > >> type Variation; > > >> type Seismogram; > > >> type PeakValue; > > >> > > >> type Station { > > >> string name; > > >> float lat; > > >> float lon; > > >> int erf; > > >> int variation_scenario; > > >> } > > >> > > >> type Sgt { > > >> SgtDim x; > > >> SgtDim y; > > >> } > > >> > > >> type Rupture { > > >> int source; > > >> int index; > > >> int size; > > >> } > > >> > > >> /* some constants used by the apps*/ > > >> global int num_time_steps = 3000; > > >> global string spectra_period1 = "all"; > > >> global float filter_highhz = 5.0; > > >> global float simulation_timeskip = 0.1; > > >> > > >> app (Sgt _ext) extract(Sgt _sgt, Station _stat, Variation _var) { > > >> extract @strcat("stat=", _stat.name) "extract_sgt=1" > > >> @strcat("slon=", _stat.lon) @strcat("slat=", _stat.lat) > > >> > > >> @strcat("rupmodfile=", @filename(_var)) > > >> @strcat("sgt_xfile=", @filename(_sgt.x)) > > >> @strcat("sgt_yfile=", @filename(_sgt.y)) > > >> @strcat("extract_sgt_xfile=", @filename(_ext.x)) > > >> @strcat("extract_sgt_yfile=", @filename(_ext.y)); > > >> } > > >> > > >> app (Seismogram _seis, PeakValue _peak) > > >> seispeak(Sgt _sgt, Variation _var, Station _stat) { > > >> seispeak > > >> /* Args of seismogram synthesis */ > > >> @strcat("stat=", _stat.name) "extract_sgt=0" > > >> @strcat("slon=", _stat.lon) @strcat("slat=", _stat.lat) > > >> "outputBinary=1" "mergeOutput=1" @strcat("ntout=", num_time_steps) > > >> > > >> @strcat("rupmodfile=", @filename(_var)) > > >> @strcat("sgt_xfile=", @filename(_sgt.x)) > > >> @strcat("sgt_yfile=", @filename(_sgt.y)) > > >> @strcat("seis_file=", @filename(_seis)) > > >> > > >> /* Args of peak ground acceleration */ > > >> "simulation_out_pointsX=2" "simulation_out_pointsY=1" > > >> "surfseis_rspectra_seismogram_units=cmpersec" > > >> "surfseis_rspectra_output_units=cmpersec2" > > >> "surfseis_rspectra_output_type=aa" > > >> "surfseis_rspectra_apply_byteswap=no" > > >> > > >> 
@strcat("simulation_out_timesamples=", num_time_steps) > > >> @strcat("simulation_out_timeskip=", simulation_timeskip) > > >> @strcat("surfseis_rspectra_period=", spectra_period1) > > >> @strcat(" surfseis_rspectra_apply_filter_highHZ=", filter_highhz) > > >> @strcat("in=", @filename(_seis)) > > >> @strcat("out=", @filename(_peak)); > > >> } > > >> > > >> app (Seismogram _seis, PeakValue _peak) > > >> seispeak_local(Sgt _sgt, Variation _var, Station _stat) { > > >> seispeak_local > > >> /* Args of seismogram synthesis */ > > >> @strcat("stat=", _stat.name) "extract_sgt=0" > > >> @strcat("slon=", _stat.lon) @strcat("slat=", _stat.lat) > > >> "outputBinary=1" "mergeOutput=1" @strcat("ntout=", num_time_steps) > > >> > > >> @strcat("rupmodfile=", @filename(_var)) > > >> @strcat("sgt_xfile=", @filename(_sgt.x)) > > >> @strcat("sgt_yfile=", @filename(_sgt.y)) > > >> @strcat("seis_file=", @filename(_seis)) > > >> > > >> /* Args of peak ground acceleration */ > > >> "simulation_out_pointsX=2" "simulation_out_pointsY=1" > > >> "surfseis_rspectra_seismogram_units=cmpersec" > > >> "surfseis_rspectra_output_units=cmpersec2" > > >> "surfseis_rspectra_output_type=aa" > > >> "surfseis_rspectra_apply_byteswap=no" > > >> > > >> @strcat("simulation_out_timesamples=", num_time_steps) > > >> @strcat("simulation_out_timeskip=", simulation_timeskip) > > >> @strcat("surfseis_rspectra_period=", spectra_period1) > > >> @strcat(" surfseis_rspectra_apply_filter_highHZ=", filter_highhz) > > >> @strcat("in=", @filename(_seis)) > > >> @strcat("out=", @filename(_peak)); > > >> } > > >> > > >> app (Seismogram _seis[], PeakValue _peak[]) > > >> seispeak_agg(Sgt _sgt, Variation _var[], Station _stat, int n) { > > >> seispeak_agg > > >> /* System args */ > > >> _stat.name _stat.lon _stat.lat num_time_steps > > >> num_time_steps simulation_timeskip spectra_period1 filter_highhz > > >> > > >> @filename(_sgt.x) @filename(_sgt.y) > > >> > > >> n @filenames(_var) @filenames(_seis) @filenames(_peak); > > 
>> } > > >> > > >> // Auxillary functions for the mappers > > >> type StationFile; > > >> app (StationFile _stat) getsite_file(int _run_id) { > > >> getsite _run_id stdout=@filename(_stat); > > >> } > > >> (Station _stat) get_site(int _run_id) { > > >> StationFile file<"/var/tmp/site_tmp">; > > >> /*file = getsite_file(_run_id);*/ > > >> _stat = readData(file); > > >> } > > >> > > >> type RuptureFile; > > >> app (RuptureFile _rup) getrupture_file(int _run_id) { > > >> getrupture _run_id stdout=@filename(_rup); > > >> } > > >> (Rupture _rup[]) get_ruptures(int _run_id, Station _site) { > > >> /*RuptureFile file > >> "/rup_tmp")>;*/ > > >> RuptureFile file<"LGU/rup_tmp">; > > >> /*file = getrupture_file(_run_id);*/ > > >> _rup = readData(file); > > >> } > > >> > > >> type VariationFile; > > >> app (VariationFile _var) getvariation_file(Station _site, Rupture _rup, > > >> string _loc) { > > >> variation_mapper "-e" _site.erf "-v" _site.variation_scenario > > >> "-l" _loc "-s" _rup.source "-r" _rup.index stdout=@_var; > > >> } > > >> (string _vars[]) get_variations(Station _site, Rupture _rup, string _loc){ > > >> string fname = @strcat(_rup.source, "_", _rup.index); > > >> VariationFile file > >> file=@strcat(_site.name, "/varlist/", _rup.source, "/", fname, ".txt")>; > > >> /*file = getvariation_file(_site, _rup, _loc);*/ > > >> _vars = readData(file); > > >> } > > >> > > >> type offset { > > >> int off; > > >> int size; > > >> } > > >> type offset_file; > > >> (offset _off[]) mkoffset(int _size, int _group_size) { > > >> offset_file file ; > > >> file = mkoffset_file(_size, _group_size); > > >> _off = readData(file); > > >> } > > >> app (offset_file _off) mkoffset_file(int _size, int _group_size) { > > >> mkoffset _size _group_size; > > >> } > > >> > > >> /* TODO: data management zip jobs */ > > >> > > >> /* Main program */ > > >> int run_id = 664; > > >> int agg_size = 80; > > >> int loc_size = 20; > > >> string datadir = > > >> 
"gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/Results"; > > >> > > >> Station site = get_site(run_id); > > >> > > >> Sgt sgt_var > >> l="gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/SgtFiles">; > > >> Rupture rups[] = get_ruptures(run_id, site); > > >> > > >> foreach rup in rups { > > >> string loc_sub = @strcat(datadir, "/", site.name, "/", rup.source, > > >> "/", rup.index); > > >> Sgt sub > >> r=rup.index>; > > >> string var_str[] = get_variations( site, rup, > > >> "gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations" > > >> ); > > >> Variation vars[] ; > > >> > > >> sub = extract(sgt_var, site, vars[rup.size-1]); > > >> > > >> string seis_str[]; > > >> string peak_str[]; > > >> > > >> foreach var,i in vars { > > >> seis_str[i] = @strcat(loc_sub, "/Seismogram_", site.name, "_", rup.source, > > >> "_", rup.index, "_", i, ".grm"); > > >> peak_str[i] = @strcat(loc_sub, "/PeakVals_", site.name, "_", rup.source, > > >> "_", rup.index, "_", i, ".bsa"); > > >> } > > >> > > >> Seismogram seis[] ; > > >> PeakValue peak[] ; > > >> > > >> if(rup.size <= loc_size) { > > >> /* > > >> * Not worth to transfer the data. Execute on TeraGrid instead. > > >> * Also execute on localhost. > > >> */ > > >> foreach var,i in vars { > > >> (seis[i], peak[i]) = seispeak_local(sub, var, site); > > >> } > > >> } else {if(rup.size <= agg_size) { > > >> /* Execute on a single resource */ > > >> (seis, peak) = seispeak_agg(sub, vars, site, rup.size); > > >> } else { > > >> /*offset offs[] = mkoffset(rup.size, agg_size);*/ > > >> /*for i in offs {*/ > > >> /*(seis, peak) = seispeak_agg(sub, vars[i.off:i.off+off.size],*/ > > >> /*off.size);*/ > > >> /*}*/ > > >> }} > > >> } > > >> > > >> > > >> 2011/3/27 Mihael Hategan : > > >> > I don't believe you. There is no SgtDim data type in that script. 
> > >> > > > >> > Mihael > > >> > > > >> > On Sun, 2011-03-27 at 16:55 -0500, Allan Espinosa wrote: > > >> >> Here it is. The get_app() calls are simple wrappers to readData() > > >> >> > > >> >> type offset { > > >> >> int off; > > >> >> int size; > > >> >> } > > >> >> type offset_file; > > >> >> (offset _off[]) mkoffset(int _size, int _group_size) { > > >> >> offset_file file ; > > >> >> file = mkoffset_file(_size, _group_size); > > >> >> _off = readData(file); > > >> >> } > > >> >> app (offset_file _off) mkoffset_file(int _size, int _group_size) { > > >> >> mkoffset _size _group_size; > > >> >> } > > >> >> > > >> >> /* TODO: data management zip jobs */ > > >> >> > > >> >> /* Main program */ > > >> >> int run_id = 664; > > >> >> int agg_size = 80; > > >> >> int loc_size = 20; > > >> >> string datadir = > > >> >> "gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/Results"; > > >> >> > > >> >> Station site = get_site(run_id); > > >> >> > > >> >> Sgt sgt_var > >> >> l="gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/SgtFiles">; > > >> >> Rupture rups[] = get_ruptures(run_id, site); > > >> >> > > >> >> foreach rup in rups { > > >> >> string loc_sub = @strcat(datadir, "/", site.name, "/", rup.source, > > >> >> "/", rup.index); > > >> >> Sgt sub > >> >> r=rup.index>; > > >> >> string var_str[] = get_variations( site, rup, > > >> >> "gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations" > > >> >> ); > > >> >> Variation vars[] ; > > >> >> > > >> >> sub = extract(sgt_var, site, vars[rup.size-1]); > > >> >> > > >> >> string seis_str[]; > > >> >> string peak_str[]; > > >> >> > > >> >> foreach var,i in vars { > > >> >> seis_str[i] = @strcat(loc_sub, "/Seismogram_", site.name, "_", rup.source, > > >> >> "_", rup.index, "_", i, ".grm"); > > >> >> peak_str[i] = @strcat(loc_sub, "/PeakVals_", site.name, "_", rup.source, > > >> >> "_", rup.index, "_", i, ".bsa"); 
> > >> >> } > > >> >> > > >> >> Seismogram seis[] ; > > >> >> PeakValue peak[] ; > > >> >> > > >> >> if(rup.size <= loc_size) { > > >> >> /* > > >> >> * Not worth to transfer the data. Execute on TeraGrid instead. > > >> >> * Also execute on localhost. > > >> >> */ > > >> >> foreach var,i in vars { > > >> >> (seis[i], peak[i]) = seispeak_local(sub, var, site); > > >> >> } > > >> >> } else {if(rup.size <= agg_size) { > > >> >> /* Execute on a single resource */ > > >> >> (seis, peak) = seispeak_agg(sub, vars, site, rup.size); > > >> >> } else { > > >> >> /*offset offs[] = mkoffset(rup.size, agg_size);*/ > > >> >> /*for i in offs {*/ > > >> >> /*(seis, peak) = seispeak_agg(sub, vars[i.off:i.off+off.size],*/ > > >> >> /*off.size);*/ > > >> >> /*}*/ > > >> >> }} > > >> >> } > > >> >> > > >> >> > > >> >> 2011/3/27 Mihael Hategan : > > >> >> > May I see the script? > > >> >> > > > >> >> > On Fri, 2011-03-25 at 19:42 -0500, Allan Espinosa wrote: > > >> >> >> this has been occurring for 70 times already. What i expect is for > > >> >> >> the app with SgtDim sub to run and close the future. > > >> >> >> > > >> >> >> 2011-03-25 19:40:12,217-0500 WARN HangChecker No events in 10s. 
> > >> >> >> 2011-03-25 19:40:12,217-0500 WARN HangChecker > > >> >> >> Registered futures: > > >> >> >> Rupture[] rups Closed, 1 elements, 0 listeners > > >> >> >> Variation vars - Closed, no listeners > > >> >> >> SgtDim sub - Open, 1 listeners > > >> >> >> string site Closed, no listeners > > >> >> >> Variation[] vars Closed, 72 elements, 0 listeners > > >> >> >> ---- > > >> >> >> > > >> >> >> Waiting threads: > > >> >> >> 0-13 > > >> >> >> 0-13-0-7 > > >> >> >> 0-13-0-8-1-1 > > >> >> >> ---- > > >> >> >> > > >> >> >> > > >> >> > > > >> >> > > > >> >> > > > >> >> > > > >> >> > > > > > > > > > > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From hategan at mcs.anl.gov Sun Mar 27 19:11:35 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 27 Mar 2011 17:11:35 -0700 Subject: [Swift-devel] hang checker fun In-Reply-To: <1301270052.32276.4.camel@blabla2.none> References: <1301258513.1584.1.camel@blabla2.none> <1301265343.32276.0.camel@blabla2.none> <1301268401.32276.1.camel@blabla2.none> <1301269290.32276.3.camel@blabla2.none> <1301270052.32276.4.camel@blabla2.none> Message-ID: <1301271095.32276.11.camel@blabla2.none> That would be the maybe not. The hang checker already checks if jobs are running. It seems that the submission fails in execute2 for unknown reasons and is retried 3 times. Then it hangs quietly instead of logging "END_FAILURE". Which is very odd. Can you point me to your exact installation of swift and configuration files? Mihael On Sun, 2011-03-27 at 16:54 -0700, Mihael Hategan wrote: > Maybe. Maybe not. It may be that the job itself doesn't get queued. > > On Sun, 2011-03-27 at 16:41 -0700, Mihael Hategan wrote: > > Ok. The fact that the hang checker kicks in doesn't mean that there is > > necessarily a hang. What I see from the log is that extract is trying to > > run and is probably just queued. 
From hategan at mcs.anl.gov Sun Mar 27 19:41:07 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 27 Mar 2011 17:41:07 -0700 Subject: [Swift-devel] hanging problem In-Reply-To: References: <1301258748.1584.3.camel@blabla2.none> Message-ID: <1301272867.32276.17.camel@blabla2.none> You have: foreach img, i in stats { stats[ i ] = mFitplane ( diff_imgs[i] ); } What is it that you are trying to do there?
Mihael On Sun, 2011-03-27 at 15:54 -0500, Jonathan Monette wrote: > here is my entire script > > On Sun, Mar 27, 2011 at 3:45 PM, Mihael Hategan > wrote: > On Wed, 2011-03-23 at 16:31 -0500, Jonathan Monette wrote: > > > How can the array be closed but all of its values not be? > > > The array being closed simply means that its size is known, > but not > necessarily that its elements have all been computed. > > I'll look at the log, but I'd also like the entire script. > > Mihael > > > > > > -- > Any intelligent fool can make things bigger and more complex... It > takes a touch of genius - and a lot of courage to move in the opposite > direction. > - Albert Einstein > > From jon.monette at gmail.com Sun Mar 27 19:43:37 2011 From: jon.monette at gmail.com (Jonathan Monette) Date: Sun, 27 Mar 2011 19:43:37 -0500 Subject: [Swift-devel] hanging problem In-Reply-To: <1301272867.32276.17.camel@blabla2.none> References: <1301258748.1584.3.camel@blabla2.none> <1301272867.32276.17.camel@blabla2.none> Message-ID: stats is an array mapped to several files in a directory of metadata generated by mFitplane. I need to pass all these files and another metadata file to mConcatFit which is the app after this foreach loop. I need to wait for the foreach loop to be complete before mConcatFit can run. On Sun, Mar 27, 2011 at 7:41 PM, Mihael Hategan wrote: > You have: > foreach img, i in stats > { > stats[ i ] = mFitplane ( diff_imgs[i] ); > } > > What is it that you are trying to do there? > > Mihael > > On Sun, 2011-03-27 at 15:54 -0500, Jonathan Monette wrote: > > here is my entire script > > > > On Sun, Mar 27, 2011 at 3:45 PM, Mihael Hategan > > wrote: > > On Wed, 2011-03-23 at 16:31 -0500, Jonathan Monette wrote: > > > > > How can the array be closed but all of its values not be? > > > > > > The array being closed simply means that its size is known, > > but not > > necessarily that its elements have all been computed. 
-- Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction. - Albert Einstein From hategan at mcs.anl.gov Sun Mar 27 20:00:47 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 27 Mar 2011 18:00:47 -0700 Subject: [Swift-devel] hanging problem In-Reply-To: References: <1301258748.1584.3.camel@blabla2.none> <1301272867.32276.17.camel@blabla2.none> Message-ID: <1301274047.16149.14.camel@blabla2.none> Well, you seem to be iterating over an array that you are trying to build inside the iteration, and you are not doing a fold. It's somewhat coincidental that it works, probably because stats is mapped by a static mapper and you don't actually use the value ("img"). Though I see what you are trying to do. And it should either work or fail nicely. So I'll see if I can make a simple test case out of this. Mihael On Sun, 2011-03-27 at 19:43 -0500, Jonathan Monette wrote: > stats is an array mapped to several files in a directory of metadata > generated by mFitplane. I need to pass all these files and another > metadata file to mConcatFit which is the app after this foreach loop. > I need to wait for the foreach loop to be complete before mConcatFit > can run. > > On Sun, Mar 27, 2011 at 7:41 PM, Mihael Hategan > wrote: > You have: > foreach img, i in stats > { > stats[ i ] = mFitplane ( diff_imgs[i] ); > } > > > What is it that you are trying to do there?
> > Mihael > > > On Sun, 2011-03-27 at 15:54 -0500, Jonathan Monette wrote: > > here is my entire script > > > > On Sun, Mar 27, 2011 at 3:45 PM, Mihael Hategan > > > wrote: > > On Wed, 2011-03-23 at 16:31 -0500, Jonathan Monette > wrote: > > > > > How can the array be closed but all of its values > not be? > > > > > > The array being closed simply means that its size is > known, > > but not > > necessarily that its elements have all been > computed. > > > > I'll look at the log, but I'd also like the entire > script. > > > > Mihael > > > > > > > > > > > > -- > > Any intelligent fool can make things bigger and more > complex... It > > takes a touch of genius - and a lot of courage to move in > the opposite > > direction. > > - Albert Einstein > > > > > > > > > > > -- > Any intelligent fool can make things bigger and more complex... It > takes a touch of genius - and a lot of courage to move in the opposite > direction. > - Albert Einstein > > From dsk at ci.uchicago.edu Sun Mar 27 20:02:56 2011 From: dsk at ci.uchicago.edu (Daniel S. Katz) Date: Sun, 27 Mar 2011 20:02:56 -0500 Subject: [Swift-devel] hanging problem In-Reply-To: <1301274047.16149.14.camel@blabla2.none> References: <1301258748.1584.3.camel@blabla2.none> <1301272867.32276.17.camel@blabla2.none> <1301274047.16149.14.camel@blabla2.none> Message-ID: Can you suggest how this should be done? Dan On Mar 27, 2011, at 8:00 PM, Mihael Hategan wrote: > Well, you seem to be iterating over an array that you are trying to > build inside the iteration, and you are not doing a fold. It's somewhat > coincidental that it works, probably because stats is mapped by a static > mapper and you don't actually use the value ("img"). > > Though I see what you are trying to do. And it should either work or > fail nicely. So I'll see if I can make a simple test case out of this. 
> > Mihael > > On Sun, 2011-03-27 at 19:43 -0500, Jonathan Monette wrote: >> stats is an array mapped to several files in a directory of metadata >> generated by mFitplane. I need to pass all these files and another >> metadata file to mConcatFit which is the app after this foreach loop. >> I need to wait for the foreach loop to be complete before mConcatFit >> can run. >> >> On Sun, Mar 27, 2011 at 7:41 PM, Mihael Hategan >> wrote: >> You have: >> foreach img, i in stats >> { >> stats[ i ] = mFitplane ( diff_imgs[i] ); >> } >> >> >> What is it that you are trying to do there? >> >> Mihael >> >> >> On Sun, 2011-03-27 at 15:54 -0500, Jonathan Monette wrote: >>> here is my entire script >>> >>> On Sun, Mar 27, 2011 at 3:45 PM, Mihael Hategan >> >>> wrote: >>> On Wed, 2011-03-23 at 16:31 -0500, Jonathan Monette >> wrote: >>> >>>> How can the array be closed but all of its values >> not be? >>> >>> >>> The array being closed simply means that its size is >> known, >>> but not >>> necessarily that its elements have all been >> computed. >>> >>> I'll look at the log, but I'd also like the entire >> script. >>> >>> Mihael >>> >>> >>> >>> >>> >>> -- >>> Any intelligent fool can make things bigger and more >> complex... It >>> takes a touch of genius - and a lot of courage to move in >> the opposite >>> direction. >>> - Albert Einstein >>> >>> >> >> >> >> >> >> >> -- >> Any intelligent fool can make things bigger and more complex... It >> takes a touch of genius - and a lot of courage to move in the opposite >> direction. >> - Albert Einstein >> >> > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Daniel S. 
Katz University of Chicago (773) 834-7186 (voice) (773) 834-3700 (fax) d.katz at ieee.org or dsk at ci.uchicago.edu http://www.ci.uchicago.edu/~dsk/ From jon.monette at gmail.com Sun Mar 27 20:05:51 2011 From: jon.monette at gmail.com (Jonathan Monette) Date: Sun, 27 Mar 2011 20:05:51 -0500 Subject: [Swift-devel] hanging problem In-Reply-To: References: <1301258748.1584.3.camel@blabla2.none> <1301272867.32276.17.camel@blabla2.none> <1301274047.16149.14.camel@blabla2.none> Message-ID: Yea. I use the foreach loop because I need to iterate through the diff_imgs array and run an app on each of the entries. The loop runs and completes and all the files in stats are mapped and have data they are just not being closed. So is there a better way of accomplishing what I am doing in Swift? On Sun, Mar 27, 2011 at 8:02 PM, Daniel S. Katz wrote: > Can you suggest how this should be done? > > Dan > > > On Mar 27, 2011, at 8:00 PM, Mihael Hategan wrote: > > > Well, you seem to be iterating over an array that you are trying to > > build inside the iteration, and you are not doing a fold. It's somewhat > > coincidental that it works, probably because stats is mapped by a static > > mapper and you don't actually use the value ("img"). > > > > Though I see what you are trying to do. And it should either work or > > fail nicely. So I'll see if I can make a simple test case out of this. > > > > Mihael > > > > On Sun, 2011-03-27 at 19:43 -0500, Jonathan Monette wrote: > >> stats is an array mapped to several files in a directory of metadata > >> generated by mFitplane. I need to pass all these files and another > >> metadata file to mConcatFit which is the app after this foreach loop. > >> I need to wait for the foreach loop to be complete before mConcatFit > >> can run. 
> >> > >> On Sun, Mar 27, 2011 at 7:41 PM, Mihael Hategan > >> wrote: > >> You have: > >> foreach img, i in stats > >> { > >> stats[ i ] = mFitplane ( diff_imgs[i] ); > >> } > >> > >> > >> What is it that you are trying to do there? > >> > >> Mihael > >> > >> > >> On Sun, 2011-03-27 at 15:54 -0500, Jonathan Monette wrote: > >>> here is my entire script > >>> > >>> On Sun, Mar 27, 2011 at 3:45 PM, Mihael Hategan > >> > >>> wrote: > >>> On Wed, 2011-03-23 at 16:31 -0500, Jonathan Monette > >> wrote: > >>> > >>>> How can the array be closed but all of its values > >> not be? > >>> > >>> > >>> The array being closed simply means that its size is > >> known, > >>> but not > >>> necessarily that its elements have all been > >> computed. > >>> > >>> I'll look at the log, but I'd also like the entire > >> script. > >>> > >>> Mihael > >>> > >>> > >>> > >>> > >>> > >>> -- > >>> Any intelligent fool can make things bigger and more > >> complex... It > >>> takes a touch of genius - and a lot of courage to move in > >> the opposite > >>> direction. > >>> - Albert Einstein > >>> > >>> > >> > >> > >> > >> > >> > >> > >> -- > >> Any intelligent fool can make things bigger and more complex... It > >> takes a touch of genius - and a lot of courage to move in the opposite > >> direction. > >> - Albert Einstein > >> > >> > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -- > Daniel S. Katz > University of Chicago > (773) 834-7186 (voice) > (773) 834-3700 (fax) > d.katz at ieee.org or dsk at ci.uchicago.edu > http://www.ci.uchicago.edu/~dsk/ > > > > -- Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction. - Albert Einstein -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From jon.monette at gmail.com Sun Mar 27 20:10:26 2011
From: jon.monette at gmail.com (Jonathan Monette)
Date: Sun, 27 Mar 2011 20:10:26 -0500
Subject: [Swift-devel] hanging problem
In-Reply-To:
References: <1301258748.1584.3.camel@blabla2.none> <1301272867.32276.17.camel@blabla2.none> <1301274047.16149.14.camel@blabla2.none>
Message-ID:

This makes sense syntactically, but I am guessing the internal logic
does not like it. So is this something that Swift shouldn't do, or
something that Swift should do but is just broken?

On Sun, Mar 27, 2011 at 8:05 PM, Jonathan Monette wrote:
> Yea. I use the foreach loop because I need to iterate through the
> diff_imgs array and run an app on each of the entries. The loop runs and
> completes, and all the files in stats are mapped and have data; they are
> just not being closed. So is there a better way of accomplishing what I
> am doing in Swift?

From hategan at mcs.anl.gov Sun Mar 27 20:11:48 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 27 Mar 2011 18:11:48 -0700
Subject: [Swift-devel] hanging problem
In-Reply-To:
References: <1301258748.1584.3.camel@blabla2.none> <1301272867.32276.17.camel@blabla2.none> <1301274047.16149.14.camel@blabla2.none>
Message-ID: <1301274708.7591.4.camel@blabla2.none>

If you are trying to map some operation from a collection a to a
collection b, you would iterate over the source collection instead of
the destination collection. Essentially you want to say "for each
element in a, let b[i] = f(a[i])" instead of "for each element in b, let
b[i] = f(a[i])". In the latter case, unless you have somehow expressed
that b has a certain number of elements, you would get an error or a
no-op (since b starts with no elements). However, in Swift, expressing
that b has a certain number of elements is hidden in the semantics of
specific mappers and is therefore neither clear nor guaranteed.

Mihael

On Sun, 2011-03-27 at 20:02 -0500, Daniel S. Katz wrote:
> Can you suggest how this should be done?
>
> Dan

From ketancmaheshwari at gmail.com Sun Mar 27 20:12:54 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Sun, 27 Mar 2011 20:12:54 -0500
Subject: [Swift-devel] "Timer was cancelled" error on Beagle on Swift script termination
In-Reply-To: <1301261038.12361.0.camel@blabla2.none>
References: <809261295.31771.1301258982970.JavaMail.root@zimbra.anl.gov> <1301259492.17525.2.camel@blabla2.none> <1301261038.12361.0.camel@blabla2.none>
Message-ID:

On Sun, Mar 27, 2011 at 4:23 PM, Mihael Hategan wrote:
> Actually I'm not so sure any more.
>
> My java Timer does not seem to have a scheduleImpl method. What version
> of java is this?

On beagle it is java 1.6.0:

[ketan at login2:~]$ java -version
java version "1.6.0"
Java(TM) SE Runtime Environment (build pxa6460sr9-20101125_01(SR9))
IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux amd64-64 jvmxa6460sr9-20101124_69295 (JIT enabled, AOT enabled)
J9VM - 20101124_069295
JIT - r9_20101028_17488ifx2
GC - 20101027_AA)
JCL - 20101119_01

===
--Ketan

From hategan at mcs.anl.gov Sun Mar 27 20:13:52 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 27 Mar 2011 18:13:52 -0700
Subject: [Swift-devel] hanging problem
In-Reply-To:
References: <1301258748.1584.3.camel@blabla2.none> <1301272867.32276.17.camel@blabla2.none> <1301274047.16149.14.camel@blabla2.none>
Message-ID: <1301274832.7591.6.camel@blabla2.none>

Let me be clear. I don't think that's your problem. That is simply a
matter of, perhaps, style. I found it unclear. It doesn't say what it
does.
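
The style point about iterating over the source collection rather than the destination can be sketched outside Swift. The following is only a Python analogy, not Swift code: the names f, map_over_source and map_over_destination are made up for illustration, and a dict stands in for the array b. Swift's actual closing and mapper semantics are not modelled here.

```python
def f(x):
    # Hypothetical stand-in for an app call such as mFitplane.
    return x * 2


def map_over_source(a):
    """For each element in a, let b[i] = f(a[i])."""
    b = {}
    for i, value in enumerate(a):
        b[i] = f(value)
    return b


def map_over_destination(a, b):
    """For each element in b, let b[i] = f(a[i]).

    This only does anything if b already has its indices declared.
    """
    for i in list(b):
        b[i] = f(a[i])
    return b


a = [1, 2, 3]
print(map_over_source(a))                                     # {0: 2, 1: 4, 2: 6}
print(map_over_destination(a, {}))                            # {} (a no-op)
print(map_over_destination(a, {0: None, 1: None, 2: None}))   # {0: 2, 1: 4, 2: 6}
```

The last call succeeds only because the destination's indices were declared up front. That roughly corresponds to a mapper having fixed the size of b in advance, which is why the original loop over stats appeared to work.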
On Sun, 2011-03-27 at 20:10 -0500, Jonathan Monette wrote:
> This makes sense syntactically, but I am guessing the internal logic
> does not like it. So is this something that Swift shouldn't do, or
> something that Swift should do but is just broken?

From jon.monette at gmail.com Sun Mar 27 20:16:48 2011
From: jon.monette at gmail.com (Jonathan Monette)
Date: Sun, 27 Mar 2011 20:16:48 -0500
Subject: [Swift-devel] hanging problem
In-Reply-To: <1301274832.7591.6.camel@blabla2.none>
References: <1301258748.1584.3.camel@blabla2.none> <1301272867.32276.17.camel@blabla2.none> <1301274047.16149.14.camel@blabla2.none> <1301274832.7591.6.camel@blabla2.none>
Message-ID:

Ok. I understand.

On Sun, Mar 27, 2011 at 8:13 PM, Mihael Hategan wrote:
> Let me be clear. I don't think that's your problem. That is simply a
> matter of, perhaps, style. I found it unclear. It doesn't say what it
> does.

From jon.monette at gmail.com Sun Mar 27 20:59:01 2011
From: jon.monette at gmail.com (Jonathan Monette)
Date: Sun, 27 Mar 2011 20:59:01 -0500
Subject: [Swift-devel] hanging problem
In-Reply-To:
References: <1301258748.1584.3.camel@blabla2.none> <1301272867.32276.17.camel@blabla2.none> <1301274047.16149.14.camel@blabla2.none> <1301274832.7591.6.camel@blabla2.none>
Message-ID:

By simply changing

foreach img, i in stats

to

foreach img, i in diff_imgs

I get the output below.

RunID: 20110327-2014-gnzxh45g
(input): found 10 files
Progress: time:0
original callback URI is http://169.254.95.119:44421
callback URI has been overridden to http://192.5.86.6:44421
Adjusting buffer size to 786432
Adjusting buffer size to 524288
Adjusting buffer size to 314368
Progress: time:6284 Submitted:9 Active:1
Adjusting buffer size to 224256
Progress: time:7346 Submitted:2 Active:8
Adjusting buffer size to 156672
Progress: time:8894 Active:2 Checking status:1 Finished successfully:7
Progress: time:9907 Submitted:16 Active:2 Finished successfully:12
[org.griphyn.vdl.mapping.DataNode identifier dataset:20110327-2014-u3tv8xt3:720000000630 type Status with no value at dataset=stats path=[0] (not closed),
org.griphyn.vdl.mapping.DataNode identifier dataset:20110327-2014-u3tv8xt3:720000000636 type Status with no value at dataset=stats path=[1] (not closed),
org.griphyn.vdl.mapping.DataNode identifier dataset:20110327-2014-u3tv8xt3:720000000621 type Status with no value at dataset=stats path=[2] (not closed),
org.griphyn.vdl.mapping.DataNode identifier dataset:20110327-2014-u3tv8xt3:720000000600 type Status with no value at dataset=stats path=[3] (not closed),
org.griphyn.vdl.mapping.DataNode identifier dataset:20110327-2014-u3tv8xt3:720000000606 type Status with no value at dataset=stats path=[4] (not closed),
org.griphyn.vdl.mapping.DataNode identifier dataset:20110327-2014-u3tv8xt3:720000000609 type Status with no value at dataset=stats path=[5] (not closed),
org.griphyn.vdl.mapping.DataNode identifier dataset:20110327-2014-u3tv8xt3:720000000603 type Status with no value at dataset=stats path=[6] (not closed),
org.griphyn.vdl.mapping.DataNode identifier dataset:20110327-2014-u3tv8xt3:720000000618 type Status with no value at dataset=stats path=[7] (not closed),
org.griphyn.vdl.mapping.DataNode identifier dataset:20110327-2014-u3tv8xt3:720000000627 type Status with no value at dataset=stats path=[8] (not closed),
org.griphyn.vdl.mapping.DataNode identifier dataset:20110327-2014-u3tv8xt3:720000000639 type Status with no value at dataset=stats path=[9] (not closed),
org.griphyn.vdl.mapping.DataNode identifier dataset:20110327-2014-u3tv8xt3:720000000643 type Status with no value at dataset=stats path=[10] (not closed),
org.griphyn.vdl.mapping.DataNode identifier dataset:20110327-2014-u3tv8xt3:720000000597 type Status with no value at dataset=stats path=[11] (not closed),
org.griphyn.vdl.mapping.DataNode identifier dataset:20110327-2014-u3tv8xt3:720000000642 type Status with no value at dataset=stats path=[12] (not closed),
org.griphyn.vdl.mapping.DataNode identifier dataset:20110327-2014-u3tv8xt3:720000000624 type Status with no value at dataset=stats path=[13] (not closed),
org.griphyn.vdl.mapping.DataNode identifier dataset:20110327-2014-u3tv8xt3:720000000633 type Status with no value at dataset=stats path=[14] (not closed),
org.griphyn.vdl.mapping.DataNode identifier dataset:20110327-2014-u3tv8xt3:720000000612 type Status with no value at dataset=stats path=[15] (not closed),
org.griphyn.vdl.mapping.DataNode identifier dataset:20110327-2014-u3tv8xt3:720000000615 type Status with no value at dataset=stats path=[16] (not closed)]
Progress: time:11194 Submitted:24 Active:9 Checking status:1 Finished successfully:30
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000006.000009.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000000.000005.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000006.000007.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000002.000007.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000007.000008.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000001.000008.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000000.000006.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000001.000002.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000000.000004.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000006.000007.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000004.000005.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000007.000008.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000002.000007.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000003.000004.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000001.000008.fits.
Progress: time:12197 Stage in:1 Submitted:10 Active:6 Finished successfully:46 Failed but can retry:1
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000001.000002.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000000.000004.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000000.000006.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000006.000007.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000007.000008.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000004.000005.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000006.000008.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000002.000007.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000001.000008.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000003.000004.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000001.000002.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000000.000004.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000000.000006.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000002.000008.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000004.000005.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000006.000008.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000008.000009.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000002.000008.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000003.000004.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000000.000003.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000000.000009.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000006.000009.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000005.000006.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000000.000005.fits.
Progress: time:13207 Submitted:1 Active:6 Checking status:1 Finished successfully:47 Failed but can retry:9
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000006.000008.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000002.000008.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000008.000009.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000006.000009.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000005.000006.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000000.000003.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000000.000009.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000000.000005.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000008.000009.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000000.000003.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000000.000009.fits.
The cache already contains pads:m101_montage-20110327-2014-gnzxh45g/shared/stat_dir/stat.diff.000005.000006.fits.
No events in 10s.
Badness Registered futures: Image diff_img - Closed, no listeners Image diff_img - Closed, no listeners Image proj_img - Closed, no listeners Table corrections_tbl - Open, 1 listeners Image diff_img - Closed, no listeners Image[] corrected_images Open, 1 listeners Table fits_images_tbl - Open, 1 listeners Table difference_tbl - Closed, no listeners Image diff_img - Closed, no listeners Image diff_img - Closed, no listeners Image proj_img - Closed, no listeners Image proj_img - Closed, no listeners Image proj_img - Closed, no listeners BackgroundStruct[] back_struct Open, 0 elements, 1 listeners Status stats - Closed, no listeners Status[] stats Closed, no listeners Image[] projected_images Closed, no listeners Image proj_img - Closed, no listeners Image diff_img - Closed, no listeners Image[] difference_images Closed, no listeners Image diff_img - Closed, no listeners Image diff_img - Closed, no listeners Image corrected_mos - Open, 1 listeners Table status_tbl - Closed, no listeners Image diff_img - Closed, no listeners Table back_list - Open, 1 listeners Image proj_img - Closed, no listeners Image diff_img - Closed, no listeners Image proj_img - Closed, no listeners DiffStruct[] diffs Closed, 17 elements, 0 listeners Image diff_img - Closed, no listeners Image proj_img - Closed, no listeners string swift#mapper#17028 Closed, no listeners Image diff_img - Closed, no listeners Image diff_img - Closed, no listeners Image diff_img - Closed, no listeners Image proj_img - Closed, no listeners Image diff_img - Closed, no listeners Image diff_img - Closed, no listeners Image diff_img - Closed, no listeners Image proj_img - Closed, no listeners Image proj_1 - Closed, no listeners Table images_tbl - Closed, no listeners ---- Waiting threads: 0-8-1 0-6-3 0-9 0-8-2 0-10 0-8-0 0-7 0-5-2 ---- Progress: time:30007 Finished successfully:47 Failed but can retry:17 No events in 10s. 
Badness Registered futures: Image diff_img - Closed, no listeners Image diff_img - Closed, no listeners Image proj_img - Closed, no listeners Table corrections_tbl - Open, 1 listeners Image diff_img - Closed, no listeners Image[] corrected_images Open, 1 listeners Table fits_images_tbl - Open, 1 listeners Table difference_tbl - Closed, no listeners Image diff_img - Closed, no listeners Image diff_img - Closed, no listeners Image proj_img - Closed, no listeners Image proj_img - Closed, no listeners Image proj_img - Closed, no listeners BackgroundStruct[] back_struct Open, 0 elements, 1 listeners Status stats - Closed, no listeners Status[] stats Closed, no listeners Image[] projected_images Closed, no listeners Image proj_img - Closed, no listeners Image diff_img - Closed, no listeners Image[] difference_images Closed, no listeners Image diff_img - Closed, no listeners Image diff_img - Closed, no listeners Image corrected_mos - Open, 1 listeners Table status_tbl - Closed, no listeners Image diff_img - Closed, no listeners Table back_list - Open, 1 listeners Image proj_img - Closed, no listeners Image diff_img - Closed, no listeners Image proj_img - Closed, no listeners DiffStruct[] diffs Closed, 17 elements, 0 listeners Image diff_img - Closed, no listeners Image proj_img - Closed, no listeners string swift#mapper#17028 Closed, no listeners Image diff_img - Closed, no listeners Image diff_img - Closed, no listeners Image diff_img - Closed, no listeners Image proj_img - Closed, no listeners Image diff_img - Closed, no listeners Image diff_img - Closed, no listeners Image diff_img - Closed, no listeners Image proj_img - Closed, no listeners Image proj_1 - Closed, no listeners Table images_tbl - Closed, no listeners ---- Waiting threads: 0-8-1 0-6-3 0-9 0-8-2 0-10 0-8-0 0-7 0-5-2 ---- What does it mean that something is in the cache? On Sun, Mar 27, 2011 at 8:16 PM, Jonathan Monette wrote: > Ok. I understand. 
> > > On Sun, Mar 27, 2011 at 8:13 PM, Mihael Hategan wrote: > >> Let me be clear. I don't think that's your problem. That is simply a >> matter of, perhaps, style. I found it unclear. It doesn't say what it >> does. >> >> On Sun, 2011-03-27 at 20:10 -0500, Jonathan Monette wrote: >> > This makes sense syntactically but I am guessing the internal logic >> > does not like this so is this something that Swift shouldn't do or >> > something that Swift should do but is just broken? >> > >> > On Sun, Mar 27, 2011 at 8:05 PM, Jonathan Monette >> > wrote: >> > Yea. I use the foreach loop because I need to iterate through >> > the diff_imgs array and run an app on each of the entries. >> > The loop runs and completes and all the files in stats are >> > mapped and have data they are just not being closed. So is >> > there a better way of accomplishing what I am doing in Swift? >> > >> > >> > >> > >> > On Sun, Mar 27, 2011 at 8:02 PM, Daniel S. Katz >> > wrote: >> > Can you suggest how this should be done? >> > >> > Dan >> > >> > >> > >> > On Mar 27, 2011, at 8:00 PM, Mihael Hategan wrote: >> > >> > > Well, you seem to be iterating over an array that >> > you are trying to >> > > build inside the iteration, and you are not doing a >> > fold. It's somewhat >> > > coincidental that it works, probably because stats >> > is mapped by a static >> > > mapper and you don't actually use the value ("img"). >> > > >> > > Though I see what you are trying to do. And it >> > should either work or >> > > fail nicely. So I'll see if I can make a simple test >> > case out of this. >> > > >> > > Mihael >> > > >> > > On Sun, 2011-03-27 at 19:43 -0500, Jonathan Monette >> > wrote: >> > >> stats is an array mapped to several files in a >> > directory of metadata >> > >> generated by mFitplane. I need to pass all these >> > files and another >> > >> metadata file to mConcatFit which is the app after >> > this foreach loop. 
>> > >> I need to wait for the foreach loop to be complete >> > before mConcatFit >> > >> can run. >> > >> >> > >> On Sun, Mar 27, 2011 at 7:41 PM, Mihael Hategan >> > >> > >> wrote: >> > >> You have: >> > >> foreach img, i in stats >> > >> { >> > >> stats[ i ] = mFitplane >> > ( diff_imgs[i] ); >> > >> } >> > >> >> > >> >> > >> What is it that you are trying to do there? >> > >> >> > >> Mihael >> > >> >> > >> >> > >> On Sun, 2011-03-27 at 15:54 -0500, Jonathan >> > Monette wrote: >> > >>> here is my entire script >> > >>> >> > >>> On Sun, Mar 27, 2011 at 3:45 PM, Mihael Hategan >> > >> >> > >>> wrote: >> > >>> On Wed, 2011-03-23 at 16:31 -0500, Jonathan >> > Monette >> > >> wrote: >> > >>> >> > >>>> How can the array be closed but all of its values >> > >> not be? >> > >>> >> > >>> >> > >>> The array being closed simply means that >> > its size is >> > >> known, >> > >>> but not >> > >>> necessarily that its elements have all been >> > >> computed. >> > >>> >> > >>> I'll look at the log, but I'd also like the >> > entire >> > >> script. >> > >>> >> > >>> Mihael >> > >>> >> > >>> >> > >>> >> > >>> >> > >>> >> > >>> -- >> > >>> Any intelligent fool can make things bigger and >> > more >> > >> complex... It >> > >>> takes a touch of genius - and a lot of courage to >> > move in >> > >> the opposite >> > >>> direction. >> > >>> - Albert Einstein >> > >>> >> > >>> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> -- >> > >> Any intelligent fool can make things bigger and >> > more complex... It >> > >> takes a touch of genius - and a lot of courage to >> > move in the opposite >> > >> direction. >> > >> - Albert Einstein >> > >> >> > >> >> > > >> > > >> > >> > > _______________________________________________ >> > > Swift-devel mailing list >> > > Swift-devel at ci.uchicago.edu >> > > >> > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > >> > -- >> > Daniel S. 
Katz >> > University of Chicago >> > (773) 834-7186 (voice) >> > (773) 834-3700 (fax) >> > d.katz at ieee.org or dsk at ci.uchicago.edu >> > http://www.ci.uchicago.edu/~dsk/ >> > >> > >> > >> > >> > >> > >> > >> > -- >> > >> > Any intelligent fool can make things bigger and more >> > complex... It takes a touch of genius - and a lot of courage >> > to move in the opposite direction. >> > - Albert Einstein >> > >> > >> > >> > >> > >> > >> > -- >> > Any intelligent fool can make things bigger and more complex... It >> > takes a touch of genius - and a lot of courage to move in the opposite >> > direction. >> > - Albert Einstein >> > >> > >> >> >> > > > -- > Any intelligent fool can make things bigger and more complex... It takes a > touch of genius - and a lot of courage to move in the opposite direction. > - Albert Einstein > > > -- Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction. - Albert Einstein -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Mon Mar 28 02:04:16 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 28 Mar 2011 00:04:16 -0700 Subject: [Swift-devel] "Timer was cancelled" error on Beagle on Swift script termination In-Reply-To: References: <809261295.31771.1301258982970.JavaMail.root@zimbra.anl.gov> <1301259492.17525.2.camel@blabla2.none> <1301261038.12361.0.camel@blabla2.none> Message-ID: <1301295856.21494.1.camel@blabla2.none> The IBM implementation seems to do pretty much the same thing as the Sun one. Which is that they never call .cancel() on a timer. So I don't understand what's happening here. I don't see any piece of code that cancels that timer. Are you all using the same swift installation? Can you try a clean install? 
Mihael On Sun, 2011-03-27 at 20:12 -0500, Ketan Maheshwari wrote: > > > On Sun, Mar 27, 2011 at 4:23 PM, Mihael Hategan > wrote: > Actually I'm not so sure any more. > > My java Timer does not seem to have a scheduleImpl method. > What version > of java is this? > > On beagle it is java 1.6.0: > > [ketan at login2:~]$ java -version > java version "1.6.0" > Java(TM) SE Runtime Environment (build pxa6460sr9-20101125_01(SR9)) > IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux amd64-64 > jvmxa6460sr9-20101124_69295 (JIT enabled, AOT enabled) > J9VM - 20101124_069295 > JIT - r9_20101028_17488ifx2 > GC - 20101027_AA) > JCL - 20101119_01 > > === > > --Ketan > From ketancmaheshwari at gmail.com Mon Mar 28 09:34:45 2011 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Mon, 28 Mar 2011 09:34:45 -0500 Subject: [Swift-devel] "Timer was cancelled" error on Beagle on Swift script termination In-Reply-To: <1301295856.21494.1.camel@blabla2.none> References: <809261295.31771.1301258982970.JavaMail.root@zimbra.anl.gov> <1301259492.17525.2.camel@blabla2.none> <1301261038.12361.0.camel@blabla2.none> <1301295856.21494.1.camel@blabla2.none> Message-ID: Hi Mihael, Mike, I think Mike had built a local Swift with pbs+coaster capabilities for beagle. I am not sure if a clean install from repo has (if yes, I do not know which rev) these capabilities. Ketan On Mon, Mar 28, 2011 at 2:04 AM, Mihael Hategan wrote: > The IBM implementation seems to do pretty much the same thing as the Sun > one. Which is that they never call .cancel() on a timer. > > So I don't understand what's happening here. I don't see any piece of > code that cancels that timer. Are you all using the same swift > installation? Can you try a clean install? > > Mihael > > On Sun, 2011-03-27 at 20:12 -0500, Ketan Maheshwari wrote: > > > > > > On Sun, Mar 27, 2011 at 4:23 PM, Mihael Hategan > > wrote: > > Actually I'm not so sure any more. > > > > My java Timer does not seem to have a scheduleImpl method. 
> > What version > > of java is this? > > > > On beagle it is java 1.6.0: > > > > [ketan at login2:~]$ java -version > > java version "1.6.0" > > Java(TM) SE Runtime Environment (build pxa6460sr9-20101125_01(SR9)) > > IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux amd64-64 > > jvmxa6460sr9-20101124_69295 (JIT enabled, AOT enabled) > > J9VM - 20101124_069295 > > JIT - r9_20101028_17488ifx2 > > GC - 20101027_AA) > > JCL - 20101119_01 > > > > === > > > > --Ketan > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at mcs.anl.gov Mon Mar 28 10:05:21 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 28 Mar 2011 10:05:21 -0500 (CDT) Subject: [Swift-devel] "Timer was cancelled" error on Beagle on Swift script termination In-Reply-To: Message-ID: <800646342.33185.1301324721343.JavaMail.root@zimbra.anl.gov> This was run on an 0.92 release modified to support Beagle. Code was build from ~wilde/swift/src/0.92 We can/should try with plain 0.92, on both Beagle and vanilla linux building from both Beagle Java and Sun Java Those are the variables I can think of to get closer to the root cause. - Mike ----- Original Message ----- > Hi Mihael, Mike, > > I think Mike had built a local Swift with pbs+coaster capabilities for > beagle. I am not sure if a clean install from repo has (if yes, I do > not know which rev) these capabilities. > > Ketan > > > On Mon, Mar 28, 2011 at 2:04 AM, Mihael Hategan < hategan at mcs.anl.gov > > wrote: > > > The IBM implementation seems to do pretty much the same thing as the > Sun > one. Which is that they never call .cancel() on a timer. > > So I don't understand what's happening here. I don't see any piece of > code that cancels that timer. Are you all using the same swift > installation? Can you try a clean install? 
> > Mihael > > > > > On Sun, 2011-03-27 at 20:12 -0500, Ketan Maheshwari wrote: > > > > > > On Sun, Mar 27, 2011 at 4:23 PM, Mihael Hategan < > > hategan at mcs.anl.gov > > > wrote: > > Actually I'm not so sure any more. > > > > My java Timer does not seem to have a scheduleImpl method. > > What version > > of java is this? > > > > On beagle it is java 1.6.0: > > > > [ketan at login2:~]$ java -version > > java version "1.6.0" > > Java(TM) SE Runtime Environment (build pxa6460sr9-20101125_01(SR9)) > > IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux amd64-64 > > jvmxa6460sr9-20101124_69295 (JIT enabled, AOT enabled) > > J9VM - 20101124_069295 > > JIT - r9_20101028_17488ifx2 > > GC - 20101027_AA) > > JCL - 20101119_01 > > > > === > > > > --Ketan > > -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hategan at mcs.anl.gov Mon Mar 28 11:21:37 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 28 Mar 2011 09:21:37 -0700 Subject: [Swift-devel] "Timer was cancelled" error on Beagle on Swift script termination In-Reply-To: <800646342.33185.1301324721343.JavaMail.root@zimbra.anl.gov> References: <800646342.33185.1301324721343.JavaMail.root@zimbra.anl.gov> Message-ID: <1301329297.24400.4.camel@blabla2.none> Ok. So I'm gonna then go with timer thread killed during shutdown. Here's the relevant code in Timer.java:

public void run() {
    try {
        mainLoop();
    } finally {
        // Someone killed this Thread, behave as if Timer cancelled
        synchronized(queue) {
            newTasksMayBeScheduled = false;
            ...

private void sched(TimerTask task, long time, long period) {
    //this is scheduleImpl in the IBM jvm
    if (time < 0)
        throw new IllegalArgumentException("Illegal execution time.");

    synchronized(queue) {
        if (!thread.newTasksMayBeScheduled)
            throw new IllegalStateException("Timer already cancelled.");
        ...

I guess the solution here is to ignore this error during shutdown and simply not have timeouts.
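For illustration only, here is a minimal sketch of that idea, assuming nothing about the actual CoG code: the class name ShutdownSafeScheduler and the helpers trySchedule and noop are made up for this example. It relies only on the documented behavior of java.util.Timer, whose schedule() throws IllegalStateException once the timer has been cancelled or its thread has terminated. The caller logs an informational message and carries on without a timeout instead of letting the exception propagate.

```java
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical helper, not part of CoG or Swift.
class ShutdownSafeScheduler {

    /**
     * Tries to schedule a one-shot task. Returns false, instead of
     * propagating the IllegalStateException that java.util.Timer throws,
     * when the timer no longer accepts new tasks.
     */
    static boolean trySchedule(Timer timer, TimerTask task, long delayMillis) {
        try {
            timer.schedule(task, delayMillis);
            return true;
        } catch (IllegalStateException e) {
            // Timer was cancelled or its thread has died (e.g. at shutdown):
            // note it and continue without a timeout.
            System.out.println("INFO: timeout not scheduled: " + e.getMessage());
            return false;
        }
    }

    /** A task that does nothing; stands in for a real timeout handler. */
    static TimerTask noop() {
        return new TimerTask() {
            @Override
            public void run() {
                // intentionally empty
            }
        };
    }

    public static void main(String[] args) {
        Timer timer = new Timer("timeouts", true); // daemon timer thread
        boolean before = trySchedule(timer, noop(), 60000L);
        // cancel() stands in here for the timer thread being killed at shutdown.
        timer.cancel();
        boolean after = trySchedule(timer, noop(), 60000L);
        System.out.println("scheduled before cancel: " + before
                + ", after cancel: " + after);
    }
}
```

Whether a dropped timeout is acceptable depends on the caller; during shutdown it probably is, since nothing is left to time out.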
Mihael On Mon, 2011-03-28 at 10:05 -0500, Michael Wilde wrote: > This was run on an 0.92 release modified to support Beagle. Code was build from ~wilde/swift/src/0.92 > > We can/should try with plain 0.92, > on both Beagle and vanilla linux > building from both Beagle Java and Sun Java > > Those are the variables I can think of to get closer to the root cause. > > - Mike > > ----- Original Message ----- > > Hi Mihael, Mike, > > > > I think Mike had built a local Swift with pbs+coaster capabilities for > > beagle. I am not sure if a clean install from repo has (if yes, I do > > not know which rev) these capabilities. > > > > Ketan > > > > > > On Mon, Mar 28, 2011 at 2:04 AM, Mihael Hategan < hategan at mcs.anl.gov > > > wrote: > > > > > > The IBM implementation seems to do pretty much the same thing as the > > Sun > > one. Which is that they never call .cancel() on a timer. > > > > So I don't understand what's happening here. I don't see any piece of > > code that cancels that timer. Are you all using the same swift > > installation? Can you try a clean install? > > > > Mihael > > > > > > > > > > On Sun, 2011-03-27 at 20:12 -0500, Ketan Maheshwari wrote: > > > > > > > > > On Sun, Mar 27, 2011 at 4:23 PM, Mihael Hategan < > > > hategan at mcs.anl.gov > > > > wrote: > > > Actually I'm not so sure any more. > > > > > > My java Timer does not seem to have a scheduleImpl method. > > > What version > > > of java is this? 
> > > > > > On beagle it is java 1.6.0: > > > > > > [ketan at login2:~]$ java -version > > > java version "1.6.0" > > > Java(TM) SE Runtime Environment (build pxa6460sr9-20101125_01(SR9)) > > > IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux amd64-64 > > > jvmxa6460sr9-20101124_69295 (JIT enabled, AOT enabled) > > > J9VM - 20101124_069295 > > > JIT - r9_20101028_17488ifx2 > > > GC - 20101027_AA) > > > JCL - 20101119_01 > > > > > > === > > > > > > --Ketan > > > > From hategan at mcs.anl.gov Mon Mar 28 11:25:50 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 28 Mar 2011 09:25:50 -0700 Subject: [Swift-devel] "Timer was cancelled" error on Beagle on Swift script termination In-Reply-To: <1301329297.24400.4.camel@blabla2.none> References: <800646342.33185.1301324721343.JavaMail.root@zimbra.anl.gov> <1301329297.24400.4.camel@blabla2.none> Message-ID: <1301329550.24400.5.camel@blabla2.none> Ok. Fixed in cog 4.1.8 r3078. An info message is now logged instead of the big nasty error. Mihael On Mon, 2011-03-28 at 09:21 -0700, Mihael Hategan wrote: > Ok. So I'm gonna then go with timer thread killed during shutdown. > > Here's the relevant code in Timer.java: > public void run() { > try { > mainLoop(); > } finally { > // Someone killed this Thread, behave as if Timer cancelled > synchronized(queue) { > newTasksMayBeScheduled = false; > ... > private void sched(TimerTask task, long time, long period) { > //this is scheduleImpl in the IBM jvm > if (time < 0) > throw new IllegalArgumentException("Illegal execution > time."); > > synchronized(queue) { > if (!thread.newTasksMayBeScheduled) > throw new IllegalStateException("Timer already > cancelled."); > ... > > I guess the solution here is to ignore this error during shutdown and > simply not have timeouts. > > Mihael > > On Mon, 2011-03-28 at 10:05 -0500, Michael Wilde wrote: > > This was run on an 0.92 release modified to support Beagle. 
Code was build from ~wilde/swift/src/0.92 > > > > We can/should try with plain 0.92, > > on both Beagle and vanilla linux > > building from both Beagle Java and Sun Java > > > > Those are the variables I can think of to get closer to the root cause. > > > > - Mike > > > > ----- Original Message ----- > > > Hi Mihael, Mike, > > > > > > I think Mike had built a local Swift with pbs+coaster capabilities for > > > beagle. I am not sure if a clean install from repo has (if yes, I do > > > not know which rev) these capabilities. > > > > > > Ketan > > > > > > > > > On Mon, Mar 28, 2011 at 2:04 AM, Mihael Hategan < hategan at mcs.anl.gov > > > > wrote: > > > > > > > > > The IBM implementation seems to do pretty much the same thing as the > > > Sun > > > one. Which is that they never call .cancel() on a timer. > > > > > > So I don't understand what's happening here. I don't see any piece of > > > code that cancels that timer. Are you all using the same swift > > > installation? Can you try a clean install? > > > > > > Mihael > > > > > > > > > > > > > > > On Sun, 2011-03-27 at 20:12 -0500, Ketan Maheshwari wrote: > > > > > > > > > > > > On Sun, Mar 27, 2011 at 4:23 PM, Mihael Hategan < > > > > hategan at mcs.anl.gov > > > > > wrote: > > > > Actually I'm not so sure any more. > > > > > > > > My java Timer does not seem to have a scheduleImpl method. > > > > What version > > > > of java is this? 
> > > > > > > > On beagle it is java 1.6.0: > > > > > > > > [ketan at login2:~]$ java -version > > > > java version "1.6.0" > > > > Java(TM) SE Runtime Environment (build pxa6460sr9-20101125_01(SR9)) > > > > IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux amd64-64 > > > > jvmxa6460sr9-20101124_69295 (JIT enabled, AOT enabled) > > > > J9VM - 20101124_069295 > > > > JIT - r9_20101028_17488ifx2 > > > > GC - 20101027_AA) > > > > JCL - 20101119_01 > > > > > > > > === > > > > > > > > --Ketan > > > > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From aespinosa at cs.uchicago.edu Mon Mar 28 11:34:00 2011 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Mon, 28 Mar 2011 11:34:00 -0500 Subject: [Swift-devel] cog 4.1.8 + release-0.92 branch build failure In-Reply-To: References: Message-ID: The build now works for me. Thanks, -Allan 2011/3/26 Justin M Wozniak : > > Sorry about that- there was a minor merge error. Please update cog and try > again. > Thanks for the report > Justin > > On Fri, 25 Mar 2011, Allan Espinosa wrote: > >> delete.jar: >> [echo] [karajan]: DELETE.JAR (cog-karajan-0.36-dev.jar) >> >> compile: >> [echo] [karajan]: COMPILE >> [mkdir] Created dir: >> /autonfs/home/aespinosa/swift/cogkit/modules/karajan/build >> [javac] Compiling 493 source files to >> /autonfs/home/aespinosa/swift/cogkit/modules/karajan/build >> [javac] >> /autonfs/home/aespinosa/swift/cogkit/modules/karajan/src/org/globus/cog/karajan/workflow/nodes/grid/AbstractGridNode.java:189: >> >> setStack(org.globus.cog.abstraction.interfaces.Task,org.globus.cog.karajan.stack.VariableStack) >> is already defined in >> org.globus.cog.karajan.workflow.nodes.grid.AbstractGridNode >> [javac] protected final void setStack(Task task, VariableStack >> stack) { >> [javac] ^ >>
[javac] Note: Some input files use or override a deprecated API. >> [javac] Note: Recompile with -Xlint:deprecation for details. >> [javac] Note: Some input files use unchecked or unsafe operations. >> [javac] Note: Recompile with -Xlint:unchecked for details. >> [javac] 1 error > > -- > Justin M Wozniak From aespinosa at cs.uchicago.edu Mon Mar 28 11:35:41 2011 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Mon, 28 Mar 2011 11:35:41 -0500 Subject: [Swift-devel] hang checker fun In-Reply-To: <1301271095.32276.11.camel@blabla2.none> References: <1301258513.1584.1.camel@blabla2.none> <1301265343.32276.0.camel@blabla2.none> <1301268401.32276.1.camel@blabla2.none> <1301269290.32276.3.camel@blabla2.none> <1301270052.32276.4.camel@blabla2.none> <1301271095.32276.11.camel@blabla2.none> Message-ID: on pads/ ci, the scripts and config files are in ~aespinosa/workflows/cybershake 2011/3/27 Mihael Hategan : > That would be the maybe not. The hang checker already checks if jobs are > running. > > It seems that the submission fails in execute2 for unknown reasons and > is retried 3 times. Then it hangs quietly instead of logging > "END_FAILURE". Which is very odd. Can you point me to your exact > installation of swift and configuration files? > > Mihael > > On Sun, 2011-03-27 at 16:54 -0700, Mihael Hategan wrote: >> Maybe. Maybe not. It may be that the job itself doesn't get queued. >> >> On Sun, 2011-03-27 at 16:41 -0700, Mihael Hategan wrote: >> > Ok. The fact that the hang checker kicks in doesn't mean that there is >> > necessarily a hang. What I see from the log is that extract is trying to >> > run and is probably just queued. >> > >> > I will try to change the hang checker to not kick in if there is at >> > least one job running. >> > >> > Mihael >> > >> > On Sun, 2011-03-27 at 18:29 -0500, Allan Espinosa wrote: >> > > Here you go. (see attached) >> > > >> > > -Allan >> > > >> > > 2011/3/27 Mihael Hategan : >> > > > May I also see the log?
It looks like there's something weird around >> > > > line 186. >> > > > >> > > > Mihael >> > > > >> > > > On Sun, 2011-03-27 at 17:56 -0500, Allan Espinosa wrote: >> > > >> oops. trimmed the first part. >> > > >> >> > > >> Thanks >> > > >> >> > > >> type SgtDim; >> > > >> type Variation; >> > > >> type Seismogram; >> > > >> type PeakValue; >> > > >> >> > > >> type Station { >> > > >> string name; >> > > >> float lat; >> > > >> float lon; >> > > >> int erf; >> > > >> int variation_scenario; >> > > >> } >> > > >> >> > > >> type Sgt { >> > > >> SgtDim x; >> > > >> SgtDim y; >> > > >> } >> > > >> >> > > >> type Rupture { >> > > >> int source; >> > > >> int index; >> > > >> int size; >> > > >> } >> > > >> >> > > >> /* some constants used by the apps*/ >> > > >> global int num_time_steps = 3000; >> > > >> global string spectra_period1 = "all"; >> > > >> global float filter_highhz = 5.0; >> > > >> global float simulation_timeskip = 0.1; >> > > >> >> > > >> app (Sgt _ext) extract(Sgt _sgt, Station _stat, Variation _var) { >> > > >> extract @strcat("stat=", _stat.name) "extract_sgt=1" >> > > >> @strcat("slon=", _stat.lon) @strcat("slat=", _stat.lat) >> > > >> >> > > >> @strcat("rupmodfile=", @filename(_var)) >> > > >> @strcat("sgt_xfile=", @filename(_sgt.x)) >> > > >> @strcat("sgt_yfile=", @filename(_sgt.y)) >> > > >> @strcat("extract_sgt_xfile=", @filename(_ext.x)) >> > > >> @strcat("extract_sgt_yfile=", @filename(_ext.y)); >> > > >> } >> > > >> >> > > >> app (Seismogram _seis, PeakValue _peak) >> > > >> seispeak(Sgt _sgt, Variation _var, Station _stat) { >> > > >> seispeak >> > > >> /* Args of seismogram synthesis */ >> > > >> @strcat("stat=", _stat.name) "extract_sgt=0" >> > > >> @strcat("slon=", _stat.lon) @strcat("slat=", _stat.lat) >> > > >> "outputBinary=1" "mergeOutput=1" @strcat("ntout=", num_time_steps) >> > > >> >> > > >>
@strcat("rupmodfile=", @filename(_var)) >> > > >> @strcat("sgt_xfile=", @filename(_sgt.x)) >> > > >> @strcat("sgt_yfile=", @filename(_sgt.y)) >> > > >> @strcat("seis_file=", @filename(_seis)) >> > > >> >> > > >> /* Args of peak ground acceleration */ >> > > >> "simulation_out_pointsX=2" "simulation_out_pointsY=1" >> > > >> "surfseis_rspectra_seismogram_units=cmpersec" >> > > >> "surfseis_rspectra_output_units=cmpersec2" >> > > >> "surfseis_rspectra_output_type=aa" >> > > >> "surfseis_rspectra_apply_byteswap=no" >> > > >> >> > > >> @strcat("simulation_out_timesamples=", num_time_steps) >> > > >> @strcat("simulation_out_timeskip=", simulation_timeskip) >> > > >> @strcat("surfseis_rspectra_period=", spectra_period1) >> > > >> @strcat(" surfseis_rspectra_apply_filter_highHZ=", filter_highhz) >> > > >> @strcat("in=", @filename(_seis)) >> > > >> @strcat("out=", @filename(_peak)); >> > > >> } >> > > >> >> > > >> app (Seismogram _seis, PeakValue _peak) >> > > >> seispeak_local(Sgt _sgt, Variation _var, Station _stat) { >> > > >> seispeak_local >> > > >> /* Args of seismogram synthesis */ >> > > >> @strcat("stat=", _stat.name) "extract_sgt=0" >> > > >> @strcat("slon=", _stat.lon) @strcat("slat=", _stat.lat) >> > > >> "outputBinary=1" "mergeOutput=1" @strcat("ntout=", num_time_steps) >> > > >> >> > > >> @strcat("rupmodfile=", @filename(_var)) >> > > >> @strcat("sgt_xfile=", @filename(_sgt.x)) >> > > >> @strcat("sgt_yfile=", @filename(_sgt.y)) >> > > >> @strcat("seis_file=", @filename(_seis)) >> > > >> >> > > >> /* Args of peak ground acceleration */ >> > > >> "simulation_out_pointsX=2" "simulation_out_pointsY=1" >> > > >> "surfseis_rspectra_seismogram_units=cmpersec" >> > > >> "surfseis_rspectra_output_units=cmpersec2" >> > > >> "surfseis_rspectra_output_type=aa" >> > > >>
"surfseis_rspectra_apply_byteswap=no" >> > > >> >> > > >> @strcat("simulation_out_timesamples=", num_time_steps) >> > > >> @strcat("simulation_out_timeskip=", simulation_timeskip) >> > > >> @strcat("surfseis_rspectra_period=", spectra_period1) >> > > >> @strcat(" surfseis_rspectra_apply_filter_highHZ=", filter_highhz) >> > > >> @strcat("in=", @filename(_seis)) >> > > >> @strcat("out=", @filename(_peak)); >> > > >> } >> > > >> >> > > >> app (Seismogram _seis[], PeakValue _peak[]) >> > > >> seispeak_agg(Sgt _sgt, Variation _var[], Station _stat, int n) { >> > > >> seispeak_agg >> > > >> /* System args */ >> > > >> _stat.name _stat.lon _stat.lat num_time_steps >> > > >> num_time_steps simulation_timeskip spectra_period1 filter_highhz >> > > >> >> > > >> @filename(_sgt.x) @filename(_sgt.y) >> > > >> >> > > >> n @filenames(_var) @filenames(_seis) @filenames(_peak); >> > > >> } >> > > >> >> > > >> // Auxiliary functions for the mappers >> > > >> type StationFile; >> > > >> app (StationFile _stat) getsite_file(int _run_id) { >> > > >> getsite _run_id stdout=@filename(_stat); >> > > >> } >> > > >> (Station _stat) get_site(int _run_id) { >> > > >> StationFile file<"/var/tmp/site_tmp">; >> > > >> /*file = getsite_file(_run_id);*/ >> > > >> _stat = readData(file); >> > > >> } >> > > >> >> > > >> type RuptureFile; >> > > >> app (RuptureFile _rup) getrupture_file(int _run_id) { >> > > >> getrupture _run_id stdout=@filename(_rup); >> > > >> } >> > > >> (Rupture _rup[]) get_ruptures(int _run_id, Station _site) { >> > > >> /*RuptureFile file> > > >> "/rup_tmp")>;*/ >> > > >> RuptureFile file<"LGU/rup_tmp">; >> > > >> /*file = getrupture_file(_run_id);*/ >> > > >> _rup = readData(file); >> > > >> } >> > > >> >> > > >> type VariationFile; >> > > >> app (VariationFile _var) getvariation_file(Station _site, Rupture _rup, >> > > >> string _loc) { >> > > >>
variation_mapper "-e" _site.erf "-v" _site.variation_scenario >> > > >> "-l" _loc "-s" _rup.source "-r" _rup.index stdout=@_var; >> > > >> } >> > > >> (string _vars[]) get_variations(Station _site, Rupture _rup, string _loc){ >> > > >> string fname = @strcat(_rup.source, "_", _rup.index); >> > > >> VariationFile file> > > >> file=@strcat(_site.name, "/varlist/", _rup.source, "/", fname, ".txt")>; >> > > >> /*file = getvariation_file(_site, _rup, _loc);*/ >> > > >> _vars = readData(file); >> > > >> } >> > > >> >> > > >> type offset { >> > > >> int off; >> > > >> int size; >> > > >> } >> > > >> type offset_file; >> > > >> (offset _off[]) mkoffset(int _size, int _group_size) { >> > > >> offset_file file ; >> > > >> file = mkoffset_file(_size, _group_size); >> > > >> _off = readData(file); >> > > >> } >> > > >> app (offset_file _off) mkoffset_file(int _size, int _group_size) { >> > > >> mkoffset _size _group_size; >> > > >> } >> > > >> >> > > >> /* TODO: data management zip jobs */ >> > > >> >> > > >> /* Main program */ >> > > >> int run_id = 664; >> > > >> int agg_size = 80; >> > > >> int loc_size = 20; >> > > >> string datadir = >> > > >> "gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/Results"; >> > > >> >> > > >> Station site = get_site(run_id); >> > > >> >> > > >> Sgt sgt_var > > > >> l="gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/SgtFiles">; >> > > >> Rupture rups[] = get_ruptures(run_id, site); >> > > >> >> > > >> foreach rup in rups { >> > > >> string loc_sub = @strcat(datadir, "/", site.name, "/", rup.source, >> > > >> "/", rup.index); >> > > >> Sgt sub > > > >> r=rup.index>; >> > > >> string var_str[] = get_variations( site, rup, >> > > >> "gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations" >> > > >> ); >> > > >> Variation vars[] ; >> > > >> >> > > >>
sub = extract(sgt_var, ?site, vars[rup.size-1]); >> > > >> >> > > >> ? string seis_str[]; >> > > >> ? string peak_str[]; >> > > >> >> > > >> ? foreach var,i in vars { >> > > >> ? ? seis_str[i] = @strcat(loc_sub, "/Seismogram_", site.name, "_", rup.source, >> > > >> ? ? ? ? ? ? ? ? ? ? ? ? ? "_", rup.index, "_", i, ".grm"); >> > > >> ? ? peak_str[i] = @strcat(loc_sub, "/PeakVals_", site.name, "_", rup.source, >> > > >> ? ? ? ? ? ? ? ? ? ? ? ? ? "_", rup.index, "_", i, ".bsa"); >> > > >> ? } >> > > >> >> > > >> ? Seismogram seis[] ; >> > > >> ? PeakValue ?peak[] ; >> > > >> >> > > >> ? if(rup.size <= loc_size) { >> > > >> ? ? /* >> > > >> ? ? ?* Not worth to transfer the data. Execute on TeraGrid instead. >> > > >> ? ? ?* Also execute on localhost. >> > > >> ? ? ?*/ >> > > >> ? ? foreach var,i in vars { >> > > >> ? ? ? (seis[i], peak[i]) = seispeak_local(sub, var, site); >> > > >> ? ? } >> > > >> ? } else {if(rup.size <= agg_size) { >> > > >> ? ? /* Execute on a single resource */ >> > > >> ? ? (seis, peak) = seispeak_agg(sub, vars, site, rup.size); >> > > >> ? } else { >> > > >> ? ? /*offset offs[] = mkoffset(rup.size, agg_size);*/ >> > > >> ? ? /*for i in offs {*/ >> > > >> ? ? ? /*(seis, peak) = seispeak_agg(sub, vars[i.off:i.off+off.size],*/ >> > > >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? /*off.size);*/ >> > > >> ? ? /*}*/ >> > > >> ? }} >> > > >> } >> > > >> >> > > >> >> > > >> 2011/3/27 Mihael Hategan : >> > > >> > I don't believe you. There is no SgtDim data type in that script. >> > > >> > >> > > >> > Mihael >> > > >> > >> > > >> > On Sun, 2011-03-27 at 16:55 -0500, Allan Espinosa wrote: >> > > >> >> Here it is. ?The get_app() calls are simple wrappers to readData() >> > > >> >> >> > > >> >> type offset { >> > > >> >> ? int off; >> > > >> >> ? int size; >> > > >> >> } >> > > >> >> type offset_file; >> > > >> >> (offset _off[]) mkoffset(int _size, int _group_size) { >> > > >> >> ? ?offset_file file ; >> > > >> >> ? 
?file = mkoffset_file(_size, _group_size); >> > > >> >> ? ?_off = readData(file); >> > > >> >> } >> > > >> >> app (offset_file _off) mkoffset_file(int _size, int _group_size) { >> > > >> >> ? mkoffset _size _group_size; >> > > >> >> } >> > > >> >> >> > > >> >> /* TODO: data management zip jobs */ >> > > >> >> >> > > >> >> /* Main program */ >> > > >> >> int run_id = 664; >> > > >> >> int agg_size = 80; >> > > >> >> int loc_size = 20; >> > > >> >> string datadir = >> > > >> >> ? "gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/Results"; >> > > >> >> >> > > >> >> Station site = get_site(run_id); >> > > >> >> >> > > >> >> Sgt sgt_var > > > >> >> ? l="gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/SgtFiles">; >> > > >> >> Rupture rups[] = get_ruptures(run_id, site); >> > > >> >> >> > > >> >> foreach rup in rups { >> > > >> >> ? string loc_sub = @strcat(datadir, "/", site.name, "/", rup.source, >> > > >> >> "/", rup.index); >> > > >> >> ? Sgt sub > > > >> >> ? ? ? r=rup.index>; >> > > >> >> ? string var_str[] = get_variations( site, rup, >> > > >> >> ? ? ? "gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations" >> > > >> >> ); >> > > >> >> ? Variation vars[] ; >> > > >> >> >> > > >> >> ? sub = extract(sgt_var, ?site, vars[rup.size-1]); >> > > >> >> >> > > >> >> ? string seis_str[]; >> > > >> >> ? string peak_str[]; >> > > >> >> >> > > >> >> ? foreach var,i in vars { >> > > >> >> ? ? seis_str[i] = @strcat(loc_sub, "/Seismogram_", site.name, "_", rup.source, >> > > >> >> ? ? ? ? ? ? ? ? ? ? ? ? ? "_", rup.index, "_", i, ".grm"); >> > > >> >> ? ? peak_str[i] = @strcat(loc_sub, "/PeakVals_", site.name, "_", rup.source, >> > > >> >> ? ? ? ? ? ? ? ? ? ? ? ? ? "_", rup.index, "_", i, ".bsa"); >> > > >> >> ? } >> > > >> >> >> > > >> >> ? Seismogram seis[] ; >> > > >> >> ? PeakValue ?peak[] ; >> > > >> >> >> > > >> >> ? 
if(rup.size <= loc_size) { >> > > >> >> ? ? /* >> > > >> >> ? ? ?* Not worth to transfer the data. Execute on TeraGrid instead. >> > > >> >> ? ? ?* Also execute on localhost. >> > > >> >> ? ? ?*/ >> > > >> >> ? ? foreach var,i in vars { >> > > >> >> ? ? ? (seis[i], peak[i]) = seispeak_local(sub, var, site); >> > > >> >> ? ? } >> > > >> >> ? } else {if(rup.size <= agg_size) { >> > > >> >> ? ? /* Execute on a single resource */ >> > > >> >> ? ? (seis, peak) = seispeak_agg(sub, vars, site, rup.size); >> > > >> >> ? } else { >> > > >> >> ? ? /*offset offs[] = mkoffset(rup.size, agg_size);*/ >> > > >> >> ? ? /*for i in offs {*/ >> > > >> >> ? ? ? /*(seis, peak) = seispeak_agg(sub, vars[i.off:i.off+off.size],*/ >> > > >> >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? /*off.size);*/ >> > > >> >> ? ? /*}*/ >> > > >> >> ? }} >> > > >> >> } >> > > >> >> >> > > >> >> >> > > >> >> 2011/3/27 Mihael Hategan : >> > > >> >> > May I see the script? >> > > >> >> > >> > > >> >> > On Fri, 2011-03-25 at 19:42 -0500, Allan Espinosa wrote: >> > > >> >> >> this has been occurring for 70 times already. ?What i expect is for >> > > >> >> >> the app with SgtDim sub to run and close the future. >> > > >> >> >> >> > > >> >> >> 2011-03-25 19:40:12,217-0500 WARN ?HangChecker No events in 10s. 
>> > > >> >> >> 2011-03-25 19:40:12,217-0500 WARN  HangChecker >> > > >> >> >> Registered futures: >> > > >> >> >> Rupture[] rups  Closed, 1 elements, 0 listeners >> > > >> >> >> Variation vars - Closed, no listeners >> > > >> >> >> SgtDim sub - Open, 1 listeners >> > > >> >> >> string site  Closed, no listeners >> > > >> >> >> Variation[] vars  Closed, 72 elements, 0 listeners >> > > >> >> >> ---- >> > > >> >> >> >> > > >> >> >> Waiting threads: >> > > >> >> >> 0-13 >> > > >> >> >> 0-13-0-7 >> > > >> >> >> 0-13-0-8-1-1 >> > > >> >> >> --- From ketancmaheshwari at gmail.com Mon Mar 28 12:05:10 2011 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Mon, 28 Mar 2011 12:05:10 -0500 Subject: [Swift-devel] "Timer was cancelled" error on Beagle on Swift script termination In-Reply-To: <1301329550.24400.5.camel@blabla2.none> References: <800646342.33185.1301324721343.JavaMail.root@zimbra.anl.gov> <1301329297.24400.4.camel@blabla2.none> <1301329550.24400.5.camel@blabla2.none> Message-ID: Hi, I checked out r3078. However, on beagle, I am getting another exception: TaskSubmissionException, can only cancel an active task.
Attached is the logfile and following is the exception stacktrace: ==== [ketan at login1:pbs.run]$ sh run.sh Swift svn swift-r4225 cog-r3078 RunID: 20110328-1053-5q1fu8re Progress: time:0 SwiftScript trace: 1y4m-1 SwiftScript trace: 2day-1 SwiftScript trace: 2e5p-1 SwiftScript trace: 1y4m-2 SwiftScript trace: 2eaq-1 SwiftScript trace: 2dhy-1 SwiftScript trace: 1wxp-1 SwiftScript trace: 1j55-1 SwiftScript trace: 1jmt-1 SwiftScript trace: 1wf0-1 Failed to shut down block: Block 0328-541001-000000 (240x99940.000s) org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Can only cancel an active task at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.cancel(AbstractExecutor.java:191) at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.cancel(AbstractJobSubmissionTaskHandler.java:85) at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.cancel(AbstractTaskHandler.java:69) at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:102) at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:91) at org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.cancel(BlockTaskSubmitter.java:45) at org.globus.cog.abstraction.coaster.service.job.manager.Block.forceShutdown(Block.java:308) at org.globus.cog.abstraction.coaster.service.job.manager.Block.shutdown(Block.java:288) at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.cleanDoneBlocks(BlockQueueProcessor.java:186) at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:509) at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109) Failed to shut down block: Block 0328-541001-000001 (240x99940.000s) org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Can only cancel an active task at 
org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.cancel(AbstractExecutor.java:191) at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.cancel(AbstractJobSubmissionTaskHandler.java:85) at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.cancel(AbstractTaskHandler.java:69) at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:102) at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:91) at org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.cancel(BlockTaskSubmitter.java:45) at org.globus.cog.abstraction.coaster.service.job.manager.Block.forceShutdown(Block.java:308) at org.globus.cog.abstraction.coaster.service.job.manager.Block.shutdown(Block.java:288) at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.cleanDoneBlocks(BlockQueueProcessor.java:186) at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:509) at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109) Failed to shut down block: Block 0328-541001-000002 (240x99940.000s) ==== On Mon, Mar 28, 2011 at 11:25 AM, Mihael Hategan wrote: > Ok. Fixed in cog 4.1.8 r3078. An info message is now logged instead of > the big nasty error. > > Mihael > > On Mon, 2011-03-28 at 09:21 -0700, Mihael Hategan wrote: > > Ok. So I'm gonna then go with timer thread killed during shutdown. > > > > Here's the relevant code in Timer.java: > > public void run() { > > try { > > mainLoop(); > > } finally { > > // Someone killed this Thread, behave as if Timer cancelled > > synchronized(queue) { > > newTasksMayBeScheduled = false; > > ... 
> > private void sched(TimerTask task, long time, long period) { > > //this is scheduleImpl in the IBM jvm > > if (time < 0) > > throw new IllegalArgumentException("Illegal execution > > time."); > > > > synchronized(queue) { > > if (!thread.newTasksMayBeScheduled) > > throw new IllegalStateException("Timer already > > cancelled."); > > ... > > > > I guess the solution here is to ignore this error during shutdown and > > simply not have timeouts. > > > > Mihael > > > > On Mon, 2011-03-28 at 10:05 -0500, Michael Wilde wrote: > > > This was run on an 0.92 release modified to support Beagle. Code was > build from ~wilde/swift/src/0.92 > > > > > > We can/should try with plain 0.92, > > > on both Beagle and vanilla linux > > > building from both Beagle Java and Sun Java > > > > > > Those are the variables I can think of to get closer to the root cause. > > > > > > - Mike > > > > > > ----- Original Message ----- > > > > Hi Mihael, Mike, > > > > > > > > I think Mike had built a local Swift with pbs+coaster capabilities > for > > > > beagle. I am not sure if a clean install from repo has (if yes, I do > > > > not know which rev) these capabilities. > > > > > > > > Ketan > > > > > > > > > > > > On Mon, Mar 28, 2011 at 2:04 AM, Mihael Hategan < > hategan at mcs.anl.gov > > > > > wrote: > > > > > > > > > > > > The IBM implementation seems to do pretty much the same thing as the > > > > Sun > > > > one. Which is that they never call .cancel() on a timer. > > > > > > > > So I don't understand what's happening here. I don't see any piece of > > > > code that cancels that timer. Are you all using the same swift > > > > installation? Can you try a clean install? > > > > > > > > Mihael > > > > > > > > > > > > > > > > > > > > On Sun, 2011-03-27 at 20:12 -0500, Ketan Maheshwari wrote: > > > > > > > > > > > > > > > On Sun, Mar 27, 2011 at 4:23 PM, Mihael Hategan < > > > > > hategan at mcs.anl.gov > > > > > > wrote: > > > > > Actually I'm not so sure any more. 
> > > > > > > > > > My java Timer does not seem to have a scheduleImpl method. > > > > > What version > > > > > of java is this? > > > > > > > > > > On beagle it is java 1.6.0: > > > > > > > > > > [ketan at login2:~]$ java -version > > > > > java version "1.6.0" > > > > > Java(TM) SE Runtime Environment (build pxa6460sr9-20101125_01(SR9)) > > > > > IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux amd64-64 > > > > > jvmxa6460sr9-20101124_69295 (JIT enabled, AOT enabled) > > > > > J9VM - 20101124_069295 > > > > > JIT - r9_20101028_17488ifx2 > > > > > GC - 20101027_AA) > > > > > JCL - 20101119_01 > > > > > > > > > > === > > > > > > > > > > --Ketan > > > > > > > > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ftdock-20110328-1053-5q1fu8re.log Type: application/octet-stream Size: 173046 bytes Desc: not available URL: From hategan at mcs.anl.gov Mon Mar 28 12:14:26 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 28 Mar 2011 10:14:26 -0700 Subject: [Swift-devel] "Timer was cancelled" error on Beagle on Swift script termination In-Reply-To: References: <800646342.33185.1301324721343.JavaMail.root@zimbra.anl.gov> <1301329297.24400.4.camel@blabla2.none> <1301329550.24400.5.camel@blabla2.none> Message-ID: <1301332466.26164.1.camel@blabla2.none> Good point. That behavior is pretty silly. One should be able to cancel a non-active task even if just for the reason that the coding involved in making sure that you only cancel an active task is unnecessarily complex. 
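A minimal sketch of the lenient cancel suggested above, assuming a hypothetical task class. None of the names here (TaskState, LenientTask, LenientCancelDemo) come from the CoG abstraction code; they only illustrate treating cancel() on a task that never became active as a valid transition instead of an error.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical task states; illustrative only, not the CoG status constants.
enum TaskState { UNSUBMITTED, ACTIVE, COMPLETED, CANCELED }

// Hypothetical task whose cancel() never throws, whatever state it starts in.
final class LenientTask {
    private final AtomicReference<TaskState> state =
            new AtomicReference<>(TaskState.UNSUBMITTED);

    TaskState getState() {
        return state.get();
    }

    // UNSUBMITTED -> ACTIVE; returns false if the task already left UNSUBMITTED.
    boolean activate() {
        return state.compareAndSet(TaskState.UNSUBMITTED, TaskState.ACTIVE);
    }

    // ACTIVE -> COMPLETED; returns false if the task was not active.
    boolean complete() {
        return state.compareAndSet(TaskState.ACTIVE, TaskState.COMPLETED);
    }

    /**
     * Cancels the task from any non-terminal state.
     * Returns true if this call moved the task to CANCELED, and false if the
     * task had already completed or been canceled (nothing left to do).
     */
    boolean cancel() {
        while (true) {
            TaskState current = state.get();
            if (current == TaskState.COMPLETED || current == TaskState.CANCELED) {
                return false;
            }
            if (state.compareAndSet(current, TaskState.CANCELED)) {
                return true;
            }
            // Lost a race with another state change; re-read and retry.
        }
    }
}

class LenientCancelDemo {
    public static void main(String[] args) {
        // A task that was never submitted, like a block that never got to run.
        LenientTask neverRan = new LenientTask();
        System.out.println(neverRan.cancel());   // true
        System.out.println(neverRan.getState()); // CANCELED
        System.out.println(neverRan.cancel());   // false, already canceled
    }
}
```

With this shape, a caller would not have to check the task state before cancelling, which is the extra coding the message above calls unnecessarily complex.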
On Mon, 2011-03-28 at 12:05 -0500, Ketan Maheshwari wrote: > Hi, > > I checked out r3078. However, on beagle, I am getting another > exception: TaskSubmissionException, can only cancel an active task. > > Attached is the logfile and following is the exception stacktrace: > > ==== > > [ketan at login1:pbs.run]$ sh run.sh > Swift svn swift-r4225 cog-r3078 > > RunID: 20110328-1053-5q1fu8re > Progress: time:0 > SwiftScript trace: 1y4m-1 > SwiftScript trace: 2day-1 > SwiftScript trace: 2e5p-1 > SwiftScript trace: 1y4m-2 > SwiftScript trace: 2eaq-1 > SwiftScript trace: 2dhy-1 > SwiftScript trace: 1wxp-1 > SwiftScript trace: 1j55-1 > SwiftScript trace: 1jmt-1 > SwiftScript trace: 1wf0-1 > Failed to shut down block: Block 0328-541001-000000 (240x99940.000s) > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > Can only cancel an active task > at > org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.cancel(AbstractExecutor.java:191) > at > org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.cancel(AbstractJobSubmissionTaskHandler.java:85) > at > org.globus.cog.abstraction.impl.common.AbstractTaskHandler.cancel(AbstractTaskHandler.java:69) > at > org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:102) > at > org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:91) > at > org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.cancel(BlockTaskSubmitter.java:45) > at > org.globus.cog.abstraction.coaster.service.job.manager.Block.forceShutdown(Block.java:308) > at > org.globus.cog.abstraction.coaster.service.job.manager.Block.shutdown(Block.java:288) > at > org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.cleanDoneBlocks(BlockQueueProcessor.java:186) > at > org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:509) > at > 
org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109) > Failed to shut down block: Block 0328-541001-000001 (240x99940.000s) > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > Can only cancel an active task > at > org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.cancel(AbstractExecutor.java:191) > at > org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.cancel(AbstractJobSubmissionTaskHandler.java:85) > at > org.globus.cog.abstraction.impl.common.AbstractTaskHandler.cancel(AbstractTaskHandler.java:69) > at > org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:102) > at > org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:91) > at > org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.cancel(BlockTaskSubmitter.java:45) > at > org.globus.cog.abstraction.coaster.service.job.manager.Block.forceShutdown(Block.java:308) > at > org.globus.cog.abstraction.coaster.service.job.manager.Block.shutdown(Block.java:288) > at > org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.cleanDoneBlocks(BlockQueueProcessor.java:186) > at > org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:509) > at > org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109) > Failed to shut down block: Block 0328-541001-000002 (240x99940.000s) > ==== > > > > On Mon, Mar 28, 2011 at 11:25 AM, Mihael Hategan > wrote: > Ok. Fixed in cog 4.1.8 r3078. An info message is now logged > instead of > the big nasty error. > > Mihael > > > On Mon, 2011-03-28 at 09:21 -0700, Mihael Hategan wrote: > > Ok. So I'm gonna then go with timer thread killed during > shutdown. 
> > > > Here's the relevant code in Timer.java: > > public void run() { > > try { > > mainLoop(); > > } finally { > > // Someone killed this Thread, behave as if > Timer cancelled > > synchronized(queue) { > > newTasksMayBeScheduled = false; > > ... > > private void sched(TimerTask task, long time, long period) { > > //this is scheduleImpl in the IBM jvm > > if (time < 0) > > throw new IllegalArgumentException("Illegal > execution > > time."); > > > > synchronized(queue) { > > if (!thread.newTasksMayBeScheduled) > > throw new IllegalStateException("Timer > already > > cancelled."); > > ... > > > > I guess the solution here is to ignore this error during > shutdown and > > simply not have timeouts. > > > > Mihael > > > > On Mon, 2011-03-28 at 10:05 -0500, Michael Wilde wrote: > > > This was run on an 0.92 release modified to support > Beagle. Code was build from ~wilde/swift/src/0.92 > > > > > > We can/should try with plain 0.92, > > > on both Beagle and vanilla linux > > > building from both Beagle Java and Sun Java > > > > > > Those are the variables I can think of to get closer to > the root cause. > > > > > > - Mike > > > > > > ----- Original Message ----- > > > > Hi Mihael, Mike, > > > > > > > > I think Mike had built a local Swift with pbs+coaster > capabilities for > > > > beagle. I am not sure if a clean install from repo has > (if yes, I do > > > > not know which rev) these capabilities. > > > > > > > > Ketan > > > > > > > > > > > > On Mon, Mar 28, 2011 at 2:04 AM, Mihael Hategan < > hategan at mcs.anl.gov > > > > > wrote: > > > > > > > > > > > > The IBM implementation seems to do pretty much the same > thing as the > > > > Sun > > > > one. Which is that they never call .cancel() on a timer. > > > > > > > > So I don't understand what's happening here. I don't see > any piece of > > > > code that cancels that timer. Are you all using the same > swift > > > > installation? Can you try a clean install? 
> > > > > > > > Mihael > > > > > > > > > > > > > > > > > > > > On Sun, 2011-03-27 at 20:12 -0500, Ketan Maheshwari > wrote: > > > > > > > > > > > > > > > On Sun, Mar 27, 2011 at 4:23 PM, Mihael Hategan < > > > > > hategan at mcs.anl.gov > > > > > > wrote: > > > > > Actually I'm not so sure any more. > > > > > > > > > > My java Timer does not seem to have a scheduleImpl > method. > > > > > What version > > > > > of java is this? > > > > > > > > > > On beagle it is java 1.6.0: > > > > > > > > > > [ketan at login2:~]$ java -version > > > > > java version "1.6.0" > > > > > Java(TM) SE Runtime Environment (build > pxa6460sr9-20101125_01(SR9)) > > > > > IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux > amd64-64 > > > > > jvmxa6460sr9-20101124_69295 (JIT enabled, AOT enabled) > > > > > J9VM - 20101124_069295 > > > > > JIT - r9_20101028_17488ifx2 > > > > > GC - 20101027_AA) > > > > > JCL - 20101119_01 > > > > > > > > > > === > > > > > > > > > > --Ketan > > > > > > > > > > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From aespinosa at cs.uchicago.edu Mon Mar 28 12:19:29 2011 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Mon, 28 Mar 2011 12:19:29 -0500 Subject: [Swift-devel] resume on merged 0.92 Message-ID: With the merge of trunk to release-0.92, I thinkk the resume feature broke. So I should just use the tarball edition of release-0.92? 
-Allan From ketancmaheshwari at gmail.com Mon Mar 28 13:01:27 2011 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Mon, 28 Mar 2011 13:01:27 -0500 Subject: [Swift-devel] "Timer was cancelled" error on Beagle on Swift script termination In-Reply-To: <1301332466.26164.1.camel@blabla2.none> References: <800646342.33185.1301324721343.JavaMail.root@zimbra.anl.gov> <1301329297.24400.4.camel@blabla2.none> <1301329550.24400.5.camel@blabla2.none> <1301332466.26164.1.camel@blabla2.none> Message-ID: Mihael, Can you give me a helping hand: I do not understand which task needs to be canceled as it is just the beginning of job submission. Ketan On Mon, Mar 28, 2011 at 12:14 PM, Mihael Hategan wrote: > Good point. That behavior is pretty silly. One should be able to cancel > a non-active task even if just for the reason that the coding involved > in making sure that you only cancel an active task is unnecessarily > complex. > > On Mon, 2011-03-28 at 12:05 -0500, Ketan Maheshwari wrote: > > Hi, > > > > I checked out r3078. However, on beagle, I am getting another > > exception: TaskSubmissionException, can only cancel an active task. 
> > > > Attached is the logfile and following is the exception stacktrace: > > > > ==== > > > > [ketan at login1:pbs.run]$ sh run.sh > > Swift svn swift-r4225 cog-r3078 > > > > RunID: 20110328-1053-5q1fu8re > > Progress: time:0 > > SwiftScript trace: 1y4m-1 > > SwiftScript trace: 2day-1 > > SwiftScript trace: 2e5p-1 > > SwiftScript trace: 1y4m-2 > > SwiftScript trace: 2eaq-1 > > SwiftScript trace: 2dhy-1 > > SwiftScript trace: 1wxp-1 > > SwiftScript trace: 1j55-1 > > SwiftScript trace: 1jmt-1 > > SwiftScript trace: 1wf0-1 > > Failed to shut down block: Block 0328-541001-000000 (240x99940.000s) > > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > > Can only cancel an active task > > at > > > org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.cancel(AbstractExecutor.java:191) > > at > > > org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.cancel(AbstractJobSubmissionTaskHandler.java:85) > > at > > > org.globus.cog.abstraction.impl.common.AbstractTaskHandler.cancel(AbstractTaskHandler.java:69) > > at > > > org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:102) > > at > > > org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:91) > > at > > > org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.cancel(BlockTaskSubmitter.java:45) > > at > > > org.globus.cog.abstraction.coaster.service.job.manager.Block.forceShutdown(Block.java:308) > > at > > > org.globus.cog.abstraction.coaster.service.job.manager.Block.shutdown(Block.java:288) > > at > > > org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.cleanDoneBlocks(BlockQueueProcessor.java:186) > > at > > > org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:509) > > at > > > 
org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109) > > Failed to shut down block: Block 0328-541001-000001 (240x99940.000s) > > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > > Can only cancel an active task > > at > > > org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.cancel(AbstractExecutor.java:191) > > at > > > org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.cancel(AbstractJobSubmissionTaskHandler.java:85) > > at > > > org.globus.cog.abstraction.impl.common.AbstractTaskHandler.cancel(AbstractTaskHandler.java:69) > > at > > > org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:102) > > at > > > org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:91) > > at > > > org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.cancel(BlockTaskSubmitter.java:45) > > at > > > org.globus.cog.abstraction.coaster.service.job.manager.Block.forceShutdown(Block.java:308) > > at > > > org.globus.cog.abstraction.coaster.service.job.manager.Block.shutdown(Block.java:288) > > at > > > org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.cleanDoneBlocks(BlockQueueProcessor.java:186) > > at > > > org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:509) > > at > > > org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109) > > Failed to shut down block: Block 0328-541001-000002 (240x99940.000s) > > ==== > > > > > > > > On Mon, Mar 28, 2011 at 11:25 AM, Mihael Hategan > > wrote: > > Ok. Fixed in cog 4.1.8 r3078. An info message is now logged > > instead of > > the big nasty error. > > > > Mihael > > > > > > On Mon, 2011-03-28 at 09:21 -0700, Mihael Hategan wrote: > > > Ok. 
So I'm gonna then go with timer thread killed during > > shutdown. > > > > > > Here's the relevant code in Timer.java: > > > public void run() { > > > try { > > > mainLoop(); > > > } finally { > > > // Someone killed this Thread, behave as if > > Timer cancelled > > > synchronized(queue) { > > > newTasksMayBeScheduled = false; > > > ... > > > private void sched(TimerTask task, long time, long period) { > > > //this is scheduleImpl in the IBM jvm > > > if (time < 0) > > > throw new IllegalArgumentException("Illegal > > execution > > > time."); > > > > > > synchronized(queue) { > > > if (!thread.newTasksMayBeScheduled) > > > throw new IllegalStateException("Timer > > already > > > cancelled."); > > > ... > > > > > > I guess the solution here is to ignore this error during > > shutdown and > > > simply not have timeouts. > > > > > > Mihael > > > > > > On Mon, 2011-03-28 at 10:05 -0500, Michael Wilde wrote: > > > > This was run on an 0.92 release modified to support > > Beagle. Code was build from ~wilde/swift/src/0.92 > > > > > > > > We can/should try with plain 0.92, > > > > on both Beagle and vanilla linux > > > > building from both Beagle Java and Sun Java > > > > > > > > Those are the variables I can think of to get closer to > > the root cause. > > > > > > > > - Mike > > > > > > > > ----- Original Message ----- > > > > > Hi Mihael, Mike, > > > > > > > > > > I think Mike had built a local Swift with pbs+coaster > > capabilities for > > > > > beagle. I am not sure if a clean install from repo has > > (if yes, I do > > > > > not know which rev) these capabilities. > > > > > > > > > > Ketan > > > > > > > > > > > > > > > On Mon, Mar 28, 2011 at 2:04 AM, Mihael Hategan < > > hategan at mcs.anl.gov > > > > > > wrote: > > > > > > > > > > > > > > > The IBM implementation seems to do pretty much the same > > thing as the > > > > > Sun > > > > > one. Which is that they never call .cancel() on a timer. > > > > > > > > > > So I don't understand what's happening here. 
I don't see > > any piece of > > > > > code that cancels that timer. Are you all using the same > > swift > > > > > installation? Can you try a clean install? > > > > > > > > > > Mihael > > > > > > > > > > > > > > > > > > > > > > > > > On Sun, 2011-03-27 at 20:12 -0500, Ketan Maheshwari > > wrote: > > > > > > > > > > > > > > > > > > On Sun, Mar 27, 2011 at 4:23 PM, Mihael Hategan < > > > > > > hategan at mcs.anl.gov > > > > > > > wrote: > > > > > > Actually I'm not so sure any more. > > > > > > > > > > > > My java Timer does not seem to have a scheduleImpl > > method. > > > > > > What version > > > > > > of java is this? > > > > > > > > > > > > On beagle it is java 1.6.0: > > > > > > > > > > > > [ketan at login2:~]$ java -version > > > > > > java version "1.6.0" > > > > > > Java(TM) SE Runtime Environment (build > > pxa6460sr9-20101125_01(SR9)) > > > > > > IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux > > amd64-64 > > > > > > jvmxa6460sr9-20101124_69295 (JIT enabled, AOT enabled) > > > > > > J9VM - 20101124_069295 > > > > > > JIT - r9_20101028_17488ifx2 > > > > > > GC - 20101027_AA) > > > > > > JCL - 20101119_01 > > > > > > > > > > > > === > > > > > > > > > > > > --Ketan > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hategan at mcs.anl.gov Mon Mar 28 13:28:53 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 28 Mar 2011 11:28:53 -0700 Subject: [Swift-devel] "Timer was cancelled" error on Beagle on Swift script termination In-Reply-To: References: <800646342.33185.1301324721343.JavaMail.root@zimbra.anl.gov> <1301329297.24400.4.camel@blabla2.none> <1301329550.24400.5.camel@blabla2.none> <1301332466.26164.1.camel@blabla2.none> Message-ID: <1301336933.28283.3.camel@blabla2.none> On Mon, 2011-03-28 at 13:01 -0500, Ketan Maheshwari wrote: > Mihael, > > Can you give me a helping hand: I do not understand which task needs > to be canceled as it is just the beginning of job submission. Actually it's quite the end: org.globus.cog.abstraction.coaster.service.job.manager.Block.shutdown(Block.java:288) What happens is that there are rather few jobs left and some of the coaster blocks are not needed any more, so the system tries to shut them down. In this particular case it is trying to shut down a block that never got to run. Mihael From aespinosa at cs.uchicago.edu Mon Mar 28 13:31:51 2011 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Mon, 28 Mar 2011 13:31:51 -0500 Subject: using the fake provider (was Re: [Swift-devel] more on job throughput) Message-ID: Hi Mihael, How should I use the fake provider? I want to generate fake workloads while waiting for the trunk fixes. I noticed the provider-fake is not listed in swift-src/dependencies.xml . Are we expected to add this manually? Thanks, -Allan 2010/7/26 Mihael Hategan : > Here's a plot of the number of tasks in the various stages that the > runtime stats track. > > This is with 8192 jobs and the fake provider (which does nothing and > finishes tasks almost immediately, and which I should probably commit > somewhere if anybody else wants to play with this). > > I also attached the scripts used. 
You would need to change RuntimeStats
> to print the stats more often than the 1s default (say something like
> (MIN,MAX)_PERIOD_MS=100).

From hategan at mcs.anl.gov Mon Mar 28 14:15:09 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 28 Mar 2011 12:15:09 -0700
Subject: using the fake provider (was Re: [Swift-devel] more on job throughput)
In-Reply-To:
References:
Message-ID: <1301339709.29163.1.camel@blabla2.none>

On Mon, 2011-03-28 at 13:31 -0500, Allan Espinosa wrote:
> Hi Mihael,
>
> How should I use the fake provider? I want to generate fake workloads
> while waiting for the trunk fixes.
>
> I noticed the provider-fake is not listed in
> swift-src/dependencies.xml . Are we expected to add this manually?

Yes. It acts as a local provider, so you should also remove the local provider from the dependencies (or just delete the jar from lib).

Other than that it should probably work out of the box. You won't be able to run coaster stuff through it though.

Mihael

From aespinosa at cs.uchicago.edu Mon Mar 28 17:25:18 2011
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Mon, 28 Mar 2011 17:25:18 -0500
Subject: [Swift-devel] Re: resume on merged 0.92
In-Reply-To:
References:
Message-ID:

OK, I'm able to describe the problem more clearly now that I have finished a run with the stable branch. It cannot resume jobs from the resumefile. But when the workflow finishes, the resumefile is still removed, as it normally would be.

2011/3/28 Allan Espinosa :
> With the merge of trunk to release-0.92, I think the resume feature
> broke. So I should just use the tarball edition of release-0.92?
> > -Allan > From bugzilla-daemon at mcs.anl.gov Mon Mar 28 17:26:39 2011 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 28 Mar 2011 17:26:39 -0500 (CDT) Subject: [Swift-devel] [Bug 273] resume is currently broken (trunk) In-Reply-To: References: Message-ID: <20110328222639.80F102E318@wind.mcs.anl.gov> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=273 Allan Espinosa changed: What |Removed |Added ---------------------------------------------------------------------------- Version|trunk |0.92 --- Comment #1 from Allan Espinosa 2011-03-28 17:26:39 --- Broken in 0.92 after the merge as well. -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the assignee of the bug. From yadudoc1729 at gmail.com Tue Mar 29 10:54:41 2011 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Tue, 29 Mar 2011 21:24:41 +0530 Subject: [Swift-devel] Adding associative array operators for Swift [GSoC] Message-ID: Hi, I see that "Adding associative array operators to Swift" has been added to the ideas page [1] (recently). I am interested in working on this project, and believe that this will allow further work on implementing map-reduce on Swift (as Mike Wilde had mentioned in a previous mail). I have gone through most of the documentation on Swift and believe I have a fair understanding of the system. I have been doing research for the idea on adding functional iteration constructs to swift and I believe I have learnt a lot about swift from it. Any ideas and suggestions would be greatly appreciated. 
[1] http://dev.globus.org/wiki/Google_Summer_of_Code_2011_Ideas

--
Thanks and Regards,
Yadu Nand B

From wilde at mcs.anl.gov Tue Mar 29 14:34:28 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 29 Mar 2011 14:34:28 -0500 (CDT)
Subject: [Swift-devel] Re: Adding associative array operators for Swift [GSoC]
In-Reply-To:
Message-ID: <2077046954.39787.1301427268311.JavaMail.root@zimbra.anl.gov>

Hi Yadu,

To do the associative array project, you need to learn how Swift is compiled to Karajan, how it adds new Swift-specific runtime functions to Karajan (specifically the Swift-array access primitives), and how to extend those primitives to allow strings (minimally) and other data types (possibly) to be used as array indices.

You'll also need to modify the Swift grammar and parser (ANTLR) to accept non-integer array indices, and the Swift type checking system to extend the language's type conformation semantics.

Lastly, you'll need to make sure the enhanced Swift still passes all its old language and runtime tests, and add new tests for the new syntax and semantics.

And I agree with you: it is certainly *possible* that an associative array construct, because of its key-value nature, would enable new models of map-reduce-like scripting patterns to be implemented using Swift (but that remains to be developed and could probably be done in a variety of ways).

So this would be a nice first-phase project that could then lead to either the map-reduce project or further exploration of other language features (e.g., support for array slices was also recently requested).

Regards,

Mike

----- Original Message -----
> Hi,
>
> I see that "Adding associative array operators to Swift" has been
> added
> to the ideas page [1] (recently). I am interested in working on this
> project,
> and believe that this will allow further work on implementing
> map-reduce
> on Swift (as Mike Wilde had mentioned in a previous mail).
> > I have gone through most of the documentation on Swift and believe I > have a fair understanding of the system. I have been doing research > for the idea on adding functional iteration constructs to swift and I > believe > I have learnt a lot about swift from it. > > Any ideas and suggestions would be greatly appreciated. > > [1] http://dev.globus.org/wiki/Google_Summer_of_Code_2011_Ideas > > -- > Thanks and Regards, > Yadu Nand B -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wilde at mcs.anl.gov Tue Mar 29 17:23:15 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 29 Mar 2011 17:23:15 -0500 (CDT) Subject: [Swift-devel] Swift impact Fwd: [beagle-users] login nodes In-Reply-To: <99BF4946-DC8A-4FE7-8219-53C9576F074C@ci.uchicago.edu> Message-ID: <914746260.40767.1301437395256.JavaMail.root@zimbra.anl.gov> Ketan, we should test running Swift on Beagle *from* a job run under qsub -I and aprun. Ie, put Swift itself on a compute node. As usage gets heavier, we'll need to do this to avoid overtaxing the login hosts (and giving Swift a black eye from the Sysadmin team). Same applies everywhere, notably on TeraGrid hosts. We should document how to do this and add this mode to the test suite. - Mike ----- Forwarded Message ----- From: "Ti Leggett" To: beagle-users at ci.uchicago.edu Sent: Tuesday, March 29, 2011 10:39:28 AM Subject: [beagle-users] login nodes -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 This is a reminder that resource intensive commands should not be run on the login nodes. In fact in all but rare cases the only command that should be run on them is qsub or aprun. IF you need to do resource intensive tasks, you must use sandbox.beagle.ci.uchicago.edu. During the next maintenance period, we will most likely be renaming nodes to properly reflect there use and restrict access and resources on the PBS MOM nodes (currently the login nodes). 
-----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.0.14 (Darwin) iEYEARECAAYFAk2R/TAACgkQ4RgdOxQVi0BFcACgn2r9M+CgLMQIXyZs43qpKHUu lMsAn38eHyxdGl3QIQ+6y7MQeeKFZ6g6 =G7sB -----END PGP SIGNATURE----- _______________________________________________ beagle-users mailing list beagle-users at ci.uchicago.edu http://mail.ci.uchicago.edu/mailman/listinfo/beagle-users -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From yadudoc1729 at gmail.com Wed Mar 30 14:23:34 2011 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Thu, 31 Mar 2011 00:53:34 +0530 Subject: [Swift-devel] Re: Adding associative array operators for Swift [GSoC] In-Reply-To: <2077046954.39787.1301427268311.JavaMail.root@zimbra.anl.gov> References: <2077046954.39787.1301427268311.JavaMail.root@zimbra.anl.gov> Message-ID: Hi Mike, Thanks for the detailed reply. >From what I understand an associative array implementation for swift would need Add key-value bindings and lookups based on keys. Modifying and removing the value associated with a key doesn't fit with the functional style and hence wouldn't be needed ? This project certainly looks very interesting and hopefully I'll get to understand a sizeable portion of the codebase and contribute something meaningful to swift over this summer. I'm filing an application right away ! On Wed, Mar 30, 2011 at 1:04 AM, Michael Wilde wrote: > Hi Yadu, > > To do the associative array project, you need to learn how Swift is compiled and Karajan, how it adds new Swift-specific runtime functions to Karajan (specifically the Swift-array access primitives) and how to extend those primitives to allow strings (minimally) and other data types (possibly) to be used as array indices. > > You'll also need to modify the Swift grammar and parser (ANTLR) to accept non-integer array indices; and the Swift type checking system to extend the language's type conformation semantics. 
> > Lastly you'll need to make sure the enhanced Swift still passes all its old language and runtime tests, and add new tests for the new syntax and semantics. > > And I agree with you: it is certainly *possible* that an associative array construct, because of its key-value nature, would enable new models of map-reduce-like scripting patters to be implemented using Swift (but that remains to be developed and could probably be done in a variety of ways). > > So this would be a nice first-phase project that could then lead to either the map-reduce project or further exploration of other language feature (e.g.: support for array slices was also recently requested...) > > Regards, > > Mike > > > > ----- Original Message ----- >> Hi, >> >> I see that "Adding associative array operators to Swift" has been >> added >> to the ideas page [1] (recently). I am interested in working on this >> project, >> and believe that this will allow further work on implementing >> map-reduce >> on Swift (as Mike Wilde had mentioned in a previous mail). >> >> I have gone through most of the documentation on Swift and believe I >> have a fair understanding of the system. I have been doing research >> for the idea on adding functional iteration constructs to swift and I >> believe >> I have learnt a lot about swift from it. >> >> Any ideas and suggestions would be greatly appreciated. 
>>
>> [1] http://dev.globus.org/wiki/Google_Summer_of_Code_2011_Ideas
>>
>> --
>> Thanks and Regards,
>> Yadu Nand B
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>
>

Thanks and Regards,
Yadu

From wilde at mcs.anl.gov Wed Mar 30 15:30:10 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 30 Mar 2011 15:30:10 -0500 (CDT)
Subject: [Swift-devel] Re: Adding associative array operators for Swift [GSoC]
In-Reply-To:
Message-ID: <1739806492.47376.1301517010014.JavaMail.root@zimbra.anl.gov>

----- Original Message -----
> Hi Mike,
>
> Thanks for the detailed reply.
>
> From what I understand an associative array implementation for swift
> would
> need Add key-value bindings and lookups based on keys. Modifying and
> removing
> the value associated with a key doesn't fit with the functional style
> and hence
> wouldn't be needed ?

Correct: associative arrays would act just like integer-indexed arrays: each array element is write-once and immutable.

>
> This project certainly looks very interesting and hopefully I'll get
> to understand
> a sizeable portion of the codebase and contribute something meaningful
> to
> swift over this summer. I'm filing an application right away !

Great!

- Mike

>
> On Wed, Mar 30, 2011 at 1:04 AM, Michael Wilde
> wrote:
> > Hi Yadu,
> >
> > To do the associative array project, you need to learn how Swift is
> > compiled and Karajan, how it adds new Swift-specific runtime
> > functions to Karajan (specifically the Swift-array access
> > primitives) and how to extend those primitives to allow strings
> > (minimally) and other data types (possibly) to be used as array
> > indices.
> >
> > You'll also need to modify the Swift grammar and parser (ANTLR) to
> > accept non-integer array indices; and the Swift type checking system
> > to extend the language's type conformation semantics.
> > > > Lastly you'll need to make sure the enhanced Swift still passes all > > its old language and runtime tests, and add new tests for the new > > syntax and semantics. > > > > And I agree with you: it is certainly *possible* that an associative > > array construct, because of its key-value nature, would enable new > > models of map-reduce-like scripting patters to be implemented using > > Swift (but that remains to be developed and could probably be done > > in a variety of ways). > > > > So this would be a nice first-phase project that could then lead to > > either the map-reduce project or further exploration of other > > language feature (e.g.: support for array slices was also recently > > requested...) > > > > Regards, > > > > Mike > > > > > > > > ----- Original Message ----- > >> Hi, > >> > >> I see that "Adding associative array operators to Swift" has been > >> added > >> to the ideas page [1] (recently). I am interested in working on > >> this > >> project, > >> and believe that this will allow further work on implementing > >> map-reduce > >> on Swift (as Mike Wilde had mentioned in a previous mail). > >> > >> I have gone through most of the documentation on Swift and believe > >> I > >> have a fair understanding of the system. I have been doing research > >> for the idea on adding functional iteration constructs to swift and > >> I > >> believe > >> I have learnt a lot about swift from it. > >> > >> Any ideas and suggestions would be greatly appreciated. 
> >> > >> [1] http://dev.globus.org/wiki/Google_Summer_of_Code_2011_Ideas > >> > >> -- > >> Thanks and Regards, > >> Yadu Nand B > > > > -- > > Michael Wilde > > Computation Institute, University of Chicago > > Mathematics and Computer Science Division > > Argonne National Laboratory > > > > > > Thanks and Regards, > Yadu -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From bugzilla-daemon at mcs.anl.gov Wed Mar 30 16:49:16 2011 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Wed, 30 Mar 2011 16:49:16 -0500 (CDT) Subject: [Swift-devel] [Bug 277] Swift gives misleading error message when sites file is missing tag In-Reply-To: References: Message-ID: <20110330214916.5334B1C072@wind.mcs.anl.gov> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=277 --- Comment #1 from skenny 2011-03-30 16:49:16 --- maybe run thru "sanity test" via python, etc in swift command (?) -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the reporter. From bugzilla-daemon at mcs.anl.gov Wed Mar 30 17:16:47 2011 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Wed, 30 Mar 2011 17:16:47 -0500 (CDT) Subject: [Swift-devel] [Bug 229] Swift log should capture additional environmental information In-Reply-To: References: Message-ID: <20110330221647.03A2B2B89D@wind.mcs.anl.gov> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=229 skenny changed: What |Removed |Added ---------------------------------------------------------------------------- Component|SwiftScript language |Log processing and plotting AssignedTo|skenny at uchicago.edu |wozniak at mcs.anl.gov -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the reporter. 
From bugzilla-daemon at mcs.anl.gov Wed Mar 30 17:17:29 2011 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Wed, 30 Mar 2011 17:17:29 -0500 (CDT) Subject: [Swift-devel] [Bug 237] swift command argument parsing yields misleading error messages In-Reply-To: References: Message-ID: <20110330221729.092CE2B8A3@wind.mcs.anl.gov> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=237 skenny changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED Component|SwiftScript language |error messages Version|unspecified |0.93 -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the reporter. From bugzilla-daemon at mcs.anl.gov Wed Mar 30 17:19:05 2011 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Wed, 30 Mar 2011 17:19:05 -0500 (CDT) Subject: [Swift-devel] [Bug 231] ssh staging gives error if login scripts write to stdout In-Reply-To: References: Message-ID: <20110330221905.3766F2B8CC@wind.mcs.anl.gov> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=231 skenny changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED Component|SwiftScript language |error messages Version|unspecified |0.93 Severity|normal |minor -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the reporter. 
From bugzilla-daemon at mcs.anl.gov Wed Mar 30 17:20:35 2011 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Wed, 30 Mar 2011 17:20:35 -0500 (CDT) Subject: [Swift-devel] [Bug 255] Extra field in tc file gives java exception in profile parsing In-Reply-To: References: Message-ID: <20110330222035.718B62B916@wind.mcs.anl.gov> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=255 skenny changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED Component|SwiftScript language |error messages Version|unspecified |0.93 OS/Version|Mac OS |All -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the reporter. From bugzilla-daemon at mcs.anl.gov Wed Mar 30 17:32:04 2011 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Wed, 30 Mar 2011 17:32:04 -0500 (CDT) Subject: [Swift-devel] [Bug 231] ssh staging gives error if login scripts write to stdout In-Reply-To: References: Message-ID: <20110330223204.5545B1C072@wind.mcs.anl.gov> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=231 skenny changed: What |Removed |Added ---------------------------------------------------------------------------- Version|0.93 |unspecified -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the reporter. From zhaozhang at uchicago.edu Wed Mar 30 17:47:27 2011 From: zhaozhang at uchicago.edu (Zhao Zhang) Date: Wed, 30 Mar 2011 17:47:27 -0500 Subject: [Swift-devel] duplicated job submission in swift-0.92? Message-ID: <4D93B2FF.20702@uchicago.edu> Hi guys, I am seeing something weird in swfit-0.92. Any idea about this? 
The swift script is very simple:

zzhang at sandbox:~/workplace/Andrey> cat movies.swift
type Pickle {}
type History {}
type Image {}

app (History historyout) movie_graph (int rerun, int epochs, Pickle picklefile)
{
  movie_graph rerun epochs;
}

int arr[];
iterate i
{
  arr[i] = i+1;
}until(i == 1);

int epochs;
epochs = 3;
Pickle picklefile ;
foreach a in arr{
  History historyout ;
  historyout = movie_graph(a, epochs, picklefile);
}

I ran the script with the latest 0.92 version, which is loaded as a module on beagle. Then I saw this:

zzhang at sandbox:~/workplace/Andrey> swift -tc.file ./tc.data movies.swift
Variable epochs defined in scope 99878388 shadows variable of same name in scope 1813605401
Variable picklefile defined in scope 99878388 shadows variable of same name in scope 1813605401
Swift svn swift-r4157 cog-r3056

RunID: 20110330-1636-ev8vm8gb
Progress:
Progress:  Selecting site:3  Active:1
Progress:  Selecting site:3  Checking status:1
Progress:  Selecting site:2  Stage in:1  Finished successfully:1
Progress:  Selecting site:2  Active:1  Finished successfully:1
Progress:  Selecting site:2  Active:1  Finished successfully:1
Progress:  Selecting site:1  Stage in:1  Finished successfully:2
Progress:  Selecting site:1  Active:1  Finished successfully:2
Progress:  Selecting site:1  Checking status:1  Finished successfully:2
The cache already contains localhost:movies-20110330-1636-ev8vm8gb/shared/output/rerun1/histories.pickled-1.

Execution failed:
        The cache already contains localhost:movies-20110330-1636-ev8vm8gb/shared/output/rerun1/histories.pickled-1.

Then I switched to an older version, and it worked well.
zzhang at sandbox:~/workplace/Andrey> swift -tc.file ./tc.data movies.swift Variable epochs defined in scope 212602028 shadows variable of same name in scope 1538939834 Variable picklefile defined in scope 212602028 shadows variable of same name in scope 1538939834 Swift svn swift-r3291 (swift modified locally) cog-r2750 (cog modified locally) RunID: 20110330-1639-gmbyz1qa Progress: Progress: Active:2 Progress: Active:1 Checking status:1 Final status: Finished successfully:2 From ketancmaheshwari at gmail.com Wed Mar 30 17:59:37 2011 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Wed, 30 Mar 2011 17:59:37 -0500 Subject: [Swift-devel] duplicated job submission in swift-0.92? In-Reply-To: <4D93B2FF.20702@uchicago.edu> References: <4D93B2FF.20702@uchicago.edu> Message-ID: Hi Zhao, [Quick spontaneous answer:] I just tried to run a very simple script (attached relevant files zipped) and it worked without any errors for me. May be you can try that and I will try yours to see what we can reproduce. Sorry gmail doesnt like to attach so here is the link: http://www.mcs.anl.gov/~ketan/files/tmp.tgz Regards, Ketan On Wed, Mar 30, 2011 at 5:47 PM, Zhao Zhang wrote: > Hi guys, > > I am seeing something weird in swfit-0.92. Any idea about this? > The swift script is very simple: > > zzhang at sandbox:~/workplace/Andrey> cat movies.swift > type Pickle {} > type History {} > type Image {} > > app (History historyout) movie_graph (int rerun, int epochs, Pickle > picklefile) > { > movie_graph rerun epochs; > } > > int arr[]; > iterate i > { > arr[i] = i+1; > }until(i == 1); > > int epochs; > epochs = 3; > Pickle picklefile ; > foreach a in arr{ > History historyout "/histories.pickled-", a)>; > historyout = movie_graph(a, epochs, picklefile); > } > > > > I ran the script with the latest 0.92 version, which is loaded as a module > on beagle. 
The I saw this: > zzhang at sandbox:~/workplace/Andrey> swift -tc.file ./tc.data movies.swift > Variable epochs defined in scope 99878388 shadows variable of same name in > scope 1813605401 > Variable picklefile defined in scope 99878388 shadows variable of same name > in scope 1813605401 > Swift svn swift-r4157 cog-r3056 > > RunID: 20110330-1636-ev8vm8gb > Progress: > Progress: Selecting site:3 Active:1 > Progress: Selecting site:3 Checking status:1 > Progress: Selecting site:2 Stage in:1 Finished successfully:1 > Progress: Selecting site:2 Active:1 Finished successfully:1 > Progress: Selecting site:2 Active:1 Finished successfully:1 > Progress: Selecting site:1 Stage in:1 Finished successfully:2 > Progress: Selecting site:1 Active:1 Finished successfully:2 > Progress: Selecting site:1 Checking status:1 Finished successfully:2 > The cache already contains > localhost:movies-20110330-1636-ev8vm8gb/shared/output/rerun1/histories.pickled-1. > > Execution failed: > The cache already contains > localhost:movies-20110330-1636-ev8vm8gb/shared/output/rerun1/histories.pickled-1. > > > Then I switched to an older version, it worked well. > zzhang at sandbox:~/workplace/Andrey> swift -tc.file ./tc.data movies.swift > Variable epochs defined in scope 212602028 shadows variable of same name in > scope 1538939834 > Variable picklefile defined in scope 212602028 shadows variable of same > name in scope 1538939834 > Swift svn swift-r3291 (swift modified locally) cog-r2750 (cog modified > locally) > > RunID: 20110330-1639-gmbyz1qa > Progress: > Progress: Active:2 > Progress: Active:1 Checking status:1 > Final status: Finished successfully:2 > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jon.monette at gmail.com Wed Mar 30 18:02:50 2011 From: jon.monette at gmail.com (Jonathan Monette) Date: Wed, 30 Mar 2011 18:02:50 -0500 Subject: [Swift-devel] duplicated job submission in swift-0.92? In-Reply-To: References: <4D93B2FF.20702@uchicago.edu> Message-ID: The cache already contains localhost:movies-20110330-1636-ev8vm8gb/shared/output/rerun1/histories.pickled-1. What does this mean? I have seen this line before in execution and do not know what it means. Why is Swift complaining about this? Does this mean this file has already been staged in? On Wed, Mar 30, 2011 at 5:59 PM, Ketan Maheshwari < ketancmaheshwari at gmail.com> wrote: > Hi Zhao, > > [Quick spontaneous answer:] > > I just tried to run a very simple script (attached relevant files zipped) > and it worked without any errors for me. > > May be you can try that and I will try yours to see what we can reproduce. > > Sorry gmail doesnt like to attach so here is the link: > http://www.mcs.anl.gov/~ketan/files/tmp.tgz > > Regards, > Ketan > > > > On Wed, Mar 30, 2011 at 5:47 PM, Zhao Zhang wrote: > >> Hi guys, >> >> I am seeing something weird in swfit-0.92. Any idea about this? >> The swift script is very simple: >> >> zzhang at sandbox:~/workplace/Andrey> cat movies.swift >> type Pickle {} >> type History {} >> type Image {} >> >> app (History historyout) movie_graph (int rerun, int epochs, Pickle >> picklefile) >> { >> movie_graph rerun epochs; >> } >> >> int arr[]; >> iterate i >> { >> arr[i] = i+1; >> }until(i == 1); >> >> int epochs; >> epochs = 3; >> Pickle picklefile ; >> foreach a in arr{ >> History historyout > "/histories.pickled-", a)>; >> historyout = movie_graph(a, epochs, picklefile); >> } >> >> >> >> I ran the script with the latest 0.92 version, which is loaded as a module >> on beagle. 
The I saw this: >> zzhang at sandbox:~/workplace/Andrey> swift -tc.file ./tc.data movies.swift >> Variable epochs defined in scope 99878388 shadows variable of same name in >> scope 1813605401 >> Variable picklefile defined in scope 99878388 shadows variable of same >> name in scope 1813605401 >> Swift svn swift-r4157 cog-r3056 >> >> RunID: 20110330-1636-ev8vm8gb >> Progress: >> Progress: Selecting site:3 Active:1 >> Progress: Selecting site:3 Checking status:1 >> Progress: Selecting site:2 Stage in:1 Finished successfully:1 >> Progress: Selecting site:2 Active:1 Finished successfully:1 >> Progress: Selecting site:2 Active:1 Finished successfully:1 >> Progress: Selecting site:1 Stage in:1 Finished successfully:2 >> Progress: Selecting site:1 Active:1 Finished successfully:2 >> Progress: Selecting site:1 Checking status:1 Finished successfully:2 >> The cache already contains >> localhost:movies-20110330-1636-ev8vm8gb/shared/output/rerun1/histories.pickled-1. >> >> Execution failed: >> The cache already contains >> localhost:movies-20110330-1636-ev8vm8gb/shared/output/rerun1/histories.pickled-1. >> >> >> Then I switched to an older version, it worked well. 
>> zzhang at sandbox:~/workplace/Andrey> swift -tc.file ./tc.data movies.swift >> Variable epochs defined in scope 212602028 shadows variable of same name >> in scope 1538939834 >> Variable picklefile defined in scope 212602028 shadows variable of same >> name in scope 1538939834 >> Swift svn swift-r3291 (swift modified locally) cog-r2750 (cog modified >> locally) >> >> RunID: 20110330-1639-gmbyz1qa >> Progress: >> Progress: Active:2 >> Progress: Active:1 Checking status:1 >> Final status: Finished successfully:2 >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -- Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction. - Albert Einstein -------------- next part -------------- An HTML attachment was scrubbed... URL: From aespinosa at cs.uchicago.edu Wed Mar 30 18:21:16 2011 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Wed, 30 Mar 2011 18:21:16 -0500 Subject: [Swift-devel] duplicated job submission in swift-0.92? In-Reply-To: <4D93B2FF.20702@uchicago.edu> References: <4D93B2FF.20702@uchicago.edu> Message-ID: I had this error before when two output mapper objects mapped to the same file. $ swift bug_same.swift Swift svn swift-r4208 cog-r3073 RunID: 20110330-1818-ygec7ppa Progress: time:0 The cache already contains localhost:bug_same-20110330-1818-ygec7ppa/shared/foo. The cache already contains localhost:bug_same-20110330-1818-ygec7ppa/shared/foo. Progress: time:1960 Stage in:1 Finished successfully:1 The cache already contains localhost:bug_same-20110330-1818-ygec7ppa/shared/foo. 
[aespinosa at communicado testing]$
[aespinosa at communicado testing]$ cat bug_same.swift
type file;

app (file out) echo(string input) {
  echo input stdout=@filename(out);
}

file a <"foo">;
file b <"foo">;

a = echo("hello world");
b = echo("foo bar");

But I think you should be using other Swift mappers that do auto-numbering of files by default.

-Allan

2011/3/30 Zhao Zhang :
> Hi guys,
>
> I am seeing something weird in swift-0.92. Any idea about this?
> The swift script is very simple:
>
> zzhang at sandbox:~/workplace/Andrey> cat movies.swift
> type Pickle {}
> type History {}
> type Image {}
>
> app (History historyout) movie_graph (int rerun, int epochs, Pickle
> picklefile)
> {
>   movie_graph rerun epochs;
> }
>
> int arr[];
> iterate i
> {
>  arr[i] = i+1;
> }until(i == 1);
>
> int epochs;
> epochs = 3;
> Pickle picklefile ;
> foreach a in arr{
>  History historyout "/histories.pickled-", a)>;
>  historyout = movie_graph(a, epochs, picklefile);
> }
>
>
>
> I ran the script with the latest 0.92 version, which is loaded as a module
> on beagle.
Then I saw this:
> zzhang at sandbox:~/workplace/Andrey> swift -tc.file ./tc.data movies.swift
> Variable epochs defined in scope 99878388 shadows variable of same name in
> scope 1813605401
> Variable picklefile defined in scope 99878388 shadows variable of same name
> in scope 1813605401
> Swift svn swift-r4157 cog-r3056
>
> RunID: 20110330-1636-ev8vm8gb
> Progress:
> Progress:  Selecting site:3  Active:1
> Progress:  Selecting site:3  Checking status:1
> Progress:  Selecting site:2  Stage in:1  Finished successfully:1
> Progress:  Selecting site:2  Active:1  Finished successfully:1
> Progress:  Selecting site:2  Active:1  Finished successfully:1
> Progress:  Selecting site:1  Stage in:1  Finished successfully:2
> Progress:  Selecting site:1  Active:1  Finished successfully:2
> Progress:  Selecting site:1  Checking status:1  Finished successfully:2
> The cache already contains
> localhost:movies-20110330-1636-ev8vm8gb/shared/output/rerun1/histories.pickled-1.
>
> Execution failed:
>        The cache already contains
> localhost:movies-20110330-1636-ev8vm8gb/shared/output/rerun1/histories.pickled-1.
>
>
> Then I switched to an older version, it worked well.
> zzhang at sandbox:~/workplace/Andrey> swift -tc.file ./tc.data movies.swift
> Variable epochs defined in scope 212602028 shadows variable of same name in
> scope 1538939834
> Variable picklefile defined in scope 212602028 shadows variable of same name
> in scope 1538939834
> Swift svn swift-r3291 (swift modified locally) cog-r2750 (cog modified
> locally)
>
> RunID: 20110330-1639-gmbyz1qa
> Progress:
> Progress:  Active:2
> Progress:  Active:1  Checking status:1
> Final status:  Finished successfully:2

From jon.monette at gmail.com Wed Mar 30 18:33:04 2011
From: jon.monette at gmail.com (Jonathan Monette)
Date: Wed, 30 Mar 2011 18:33:04 -0500
Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To:
References: <4D93B2FF.20702@uchicago.edu>
Message-ID:

Ok. I understand this error better.
But shouldn't that be a different error, then? Something like "a and b are mapped to the same file"? I don't know whether Swift can detect this, but judging from the explanation and the error it should, unless this cache message has a deeper meaning.

On Wed, Mar 30, 2011 at 6:21 PM, Allan Espinosa wrote:

> I had this error before when two output mapper objects mapped to the same
> file.
--
Any intelligent fool can make things bigger and more complex...
It takes a touch of genius - and a lot of courage to move in the opposite direction. - Albert Einstein

From aespinosa at cs.uchicago.edu Wed Mar 30 18:37:40 2011
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Wed, 30 Mar 2011 18:37:40 -0500
Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To:
References: <4D93B2FF.20702@uchicago.edu>
Message-ID:

Or maybe local variables are static? Maybe they are mapped to different files but to the same cache object? I have been using local variables in my own workflows, though.

2011/3/30 Jonathan Monette :
> Ok. I understand this error better. But shouldn't that be a different
> error then? Like a and b are mapped to the same file? I don't know if Swift
> can know this but looking at the explanation and error it should unless this
> cache message has a deeper meaning.
From wilde at mcs.anl.gov Wed Mar 30 19:42:35 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 30 Mar 2011 19:42:35 -0500 (CDT)
Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To:
Message-ID: <428994688.48199.1301532155198.JavaMail.root@zimbra.anl.gov>

The most common case for this error occurs when two iterations within a foreach loop map an output file to the same physical file name. When Swift runs and tries to put the output object into its site cache, it sees that a file of the same name is already in the cache, and its semantics do not allow that.

I have not yet stared at this code long enough to see if this explains what is happening here.

I also don't know why it might work under one version and fail under 0.92. If the above situation is occurring, perhaps there is some randomness involved: loop iteration ordering, filename generation randomness or difference, etc.

But I would debug with that in mind: make sure that all *output* file names mapped by the script are unique.
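As a rough illustration of that uniqueness check (a sketch only, not part of Swift): assuming the mapped output file names are available as a list of strings, any name that appears more than once can be found by counting occurrences. The function name `duplicate_outputs` and the sample list are hypothetical; the sample paths merely mimic the one in the error message above.

```python
from collections import Counter


def duplicate_outputs(mapped_names):
    """Return, sorted, every output file name that is mapped more than once."""
    counts = Counter(mapped_names)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical sample data: two mappings collide on the same physical file.
names = [
    "output/rerun1/histories.pickled-1",
    "output/rerun1/histories.pickled-1",
    "output/rerun2/histories.pickled-2",
]

print(duplicate_outputs(names))
# prints ['output/rerun1/histories.pickled-1']
```

An empty result would mean every mapped output name is unique, in which case the cache error would need a different explanation.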
Ideally, one should be able to find the culprit by grepping the Swift log for all the mapped file names and looking for duplicates.

- Mike

----- Original Message -----
> Or maybe local variables are static? Maybe they mapped to different
> files but to the same cache object? But I have been doing local
> variables in my own workflows though.
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From aespinosa at cs.uchicago.edu Wed Mar 30 19:44:25 2011
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Wed, 30 Mar 2011 19:44:25 -0500
Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To: <428994688.48199.1301532155198.JavaMail.root@zimbra.anl.gov>
References: <428994688.48199.1301532155198.JavaMail.root@zimbra.anl.gov>
Message-ID:

Or just use the concurrent mapper to let Swift handle the output naming itself. The resume files can't persist through multiple sessions, though.

2011/3/30 Michael Wilde :
> The most common case for this error occurs when two iterations within a foreach loop map an output file to the same physical file name.
From wilde at mcs.anl.gov Wed Mar 30 19:58:23 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 30 Mar 2011 19:58:23 -0500 (CDT)
Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To:
Message-ID: <1915541056.48255.1301533103829.JavaMail.root@zimbra.anl.gov>

I wouldn't do that quite yet. This is very curious, but I think it's the likely cause of what you are seeing.
The foreach() loop in this case seems to be having double vision :) It's either some Swift subtlety or a bug:

login1$ cat zz1.swift
int arr[];
iterate i
{
  arr[i] = i+1;
  trace(i, arr[i]);
}until(i == 1);

foreach a in arr {
  trace("for", a);
}
login1$ swift zz1.swift
Swift svn swift-r4157 cog-r3056

RunID: 20110331-0053-tkh8yla5
Progress:
SwiftScript trace: 0, 1
SwiftScript trace: for, 1
SwiftScript trace: for, 1
SwiftScript trace: 1, 2
SwiftScript trace: for, 2
SwiftScript trace: for, 2
Final status:
login1$

But, Zhao, you could in the meantime use much simpler code like so:

login1$ cat zz2.swift
foreach a in [0:3] {
  trace("for", a);
}
login1$ swift zz2.swift
Swift svn swift-r4157 cog-r3056

RunID: 20110331-0057-huo8jei0
Progress:
SwiftScript trace: for, 1
SwiftScript trace: for, 3
SwiftScript trace: for, 2
SwiftScript trace: for, 0
Final status:
login1$

I suspect we need to make this more clear in the user guide and tutorials :)

- Mike

----- Original Message -----
> Or just use the concurrent mapper to let swift handle the output
> naming itself. The resume files can't persist through multiple
> sessions though.
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From aespinosa at cs.uchicago.edu Wed Mar 30 20:06:10 2011
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Wed, 30 Mar 2011 20:06:10 -0500
Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To: <1915541056.48255.1301533103829.JavaMail.root@zimbra.anl.gov>
References: <1915541056.48255.1301533103829.JavaMail.root@zimbra.anl.gov>
Message-ID:

Wow, I didn't know we can do that! I treat the docs too canonically :P

2011/3/30 Michael Wilde :
> login1$ cat zz2.swift
>
> foreach a in [0:3] {
>  trace("for", a);
> }
>
> login1$ swift zz2.swift
> Swift svn swift-r4157 cog-r3056
>
> RunID: 20110331-0057-huo8jei0
> Progress:
> SwiftScript trace: for, 1
> SwiftScript trace: for, 3
> SwiftScript trace: for, 2
> SwiftScript trace: for, 0
> Final status:
> login1$
>
> I suspect we need to make this more clear in the user guide and tutorials :)

I agree.

>
> - Mike
>> >> 2011/3/30 Michael Wilde : >> > The most common case for this error occurs when two iterations >> > within a foreach loop map an output file to the same physical file >> > name. When swift runs and tries to put the output object into its >> > site cache, it sees that a file of the name name is already in the >> > cache, and its semantics do not allow that. >> > >> > I have not yet stared at this code long enough to see if this >> > explains what is happening here. >> > >> > I also dont know why it might work under one version and fail under >> > 0.92. If the above situation is occurring, perhaps there is some >> > randomness involved: loop iteration ordering; filename generation >> > randomness or difference, etc. >> > >> > But I would debug with that in mind: make sure that all *output* fie >> > names mapped by the script are unique. Ideally, one should be able >> > to find the culprit by grepping the swift log for all the mapped >> > file names and look for duplicates. >> > >> > - Mike >> > >> > >> > ----- Original Message ----- >> >> Or maybe local variables are static? Maybe they mapped to different >> >> files but to the same cache object? But I have been doing local >> >> variables in my own workflows though. >> >> >> >> 2011/3/30 Jonathan Monette : >> >> > Ok. I understand this error better. But shouldn't that be a >> >> > different >> >> > error then? Like a and b are mapped to the same file? I don't >> >> > know >> >> > if Swift >> >> > can know this but looking at the explanation and error it should >> >> > unless this >> >> > cache message has a deeper meaning. >> >> > >> >> > On Wed, Mar 30, 2011 at 6:21 PM, Allan Espinosa >> >> > >> >> > wrote: >> >> >> >> >> >> I had this error before when two output mapper objects mapped to >> >> >> the same >> >> >> file. 
>> >> >> >> >> >> $ swift bug_same.swift >> >> >> Swift svn swift-r4208 cog-r3073 >> >> >> >> >> >> RunID: 20110330-1818-ygec7ppa >> >> >> Progress: time:0 >> >> >> The cache already contains >> >> >> localhost:bug_same-20110330-1818-ygec7ppa/shared/foo. >> >> >> >> >> >> The cache already contains >> >> >> localhost:bug_same-20110330-1818-ygec7ppa/shared/foo. >> >> >> >> >> >> Progress: time:1960 Stage in:1 Finished successfully:1 >> >> >> The cache already contains >> >> >> localhost:bug_same-20110330-1818-ygec7ppa/shared/foo. >> >> >> >> >> >> [aespinosa at communicado testing]$ >> >> >> [aespinosa at communicado testing]$ cat bug_same.swift >> >> >> type file; >> >> >> >> >> >> app (file out) echo(string input) { >> >> >> ?echo input stdout=@filename(out); >> >> >> } >> >> >> >> >> >> file a <"foo">; >> >> >> file b <"foo">; >> >> >> >> >> >> a = echo("hello world"); >> >> >> b = echo("foo bar"); >> >> >> >> >> >> But i think you should be using other Swift mappers that does >> >> >> auto-numbering of files by default. >> >> >> >> >> >> -Allan >> >> >> >> >> >> 2011/3/30 Zhao Zhang : >> >> >> > Hi guys, >> >> >> > >> >> >> > I am seeing something weird in swfit-0.92. Any idea about >> >> >> > this? >> >> >> > The swift script is very simple: >> >> >> > >> >> >> > zzhang at sandbox:~/workplace/Andrey> cat movies.swift >> >> >> > type Pickle {} >> >> >> > type History {} >> >> >> > type Image {} >> >> >> > >> >> >> > app (History historyout) movie_graph (int rerun, int epochs, >> >> >> > Pickle >> >> >> > picklefile) >> >> >> > { >> >> >> > ? 
>> >> >> >   movie_graph rerun epochs;
>> >> >> > }
>> >> >> >
>> >> >> > int arr[];
>> >> >> > iterate i
>> >> >> > {
>> >> >> >   arr[i] = i+1;
>> >> >> > }until(i == 1);
>> >> >> >
>> >> >> > int epochs;
>> >> >> > epochs = 3;
>> >> >> > Pickle picklefile
>> >> >> > file="for_movies.pickled">;
>> >> >> > foreach a in arr{
>> >> >> >   History historyout
>> >> >> >   file=@strcat("output/rerun", a,
>> >> >> > "/histories.pickled-", a)>;
>> >> >> >   historyout = movie_graph(a, epochs, picklefile);
>> >> >> > }
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > I ran the script with the latest 0.92 version, which is loaded
>> >> >> > as a module on beagle. Then I saw this:
>> >> >> > zzhang at sandbox:~/workplace/Andrey> swift -tc.file ./tc.data movies.swift
>> >> >> > Variable epochs defined in scope 99878388 shadows variable of same name in scope 1813605401
>> >> >> > Variable picklefile defined in scope 99878388 shadows variable of same name in scope 1813605401
>> >> >> > Swift svn swift-r4157 cog-r3056
>> >> >> >
>> >> >> > RunID: 20110330-1636-ev8vm8gb
>> >> >> > Progress:
>> >> >> > Progress: Selecting site:3 Active:1
>> >> >> > Progress: Selecting site:3 Checking status:1
>> >> >> > Progress: Selecting site:2 Stage in:1 Finished successfully:1
>> >> >> > Progress: Selecting site:2 Active:1 Finished successfully:1
>> >> >> > Progress: Selecting site:2 Active:1 Finished successfully:1
>> >> >> > Progress: Selecting site:1 Stage in:1 Finished successfully:2
>> >> >> > Progress: Selecting site:1 Active:1 Finished successfully:2
>> >> >> > Progress: Selecting site:1 Checking status:1 Finished successfully:2
>> >> >> > The cache already contains
>> >> >> > localhost:movies-20110330-1636-ev8vm8gb/shared/output/rerun1/histories.pickled-1.
>> >> >> >
>> >> >> > Execution failed:
>> >> >> >         The cache already contains
>> >> >> > localhost:movies-20110330-1636-ev8vm8gb/shared/output/rerun1/histories.pickled-1.
>> >> >> >
>> >> >> >
>> >> >> > Then I switched to an older version, and it worked well.
>> >> >> > zzhang at sandbox:~/workplace/Andrey> swift -tc.file ./tc.data movies.swift
>> >> >> > Variable epochs defined in scope 212602028 shadows variable of same name in scope 1538939834
>> >> >> > Variable picklefile defined in scope 212602028 shadows variable of same name in scope 1538939834
>> >> >> > Swift svn swift-r3291 (swift modified locally) cog-r2750 (cog modified locally)
>> >> >> >
>> >> >> > RunID: 20110330-1639-gmbyz1qa
>> >> >> > Progress:
>> >> >> > Progress: Active:2
>> >> >> > Progress: Active:1 Checking status:1
>> >> >> > Final status: Finished successfully:2
>> >> >> _______________________________________________
>

From hategan at mcs.anl.gov Wed Mar 30 20:35:00 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 30 Mar 2011 18:35:00 -0700
Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To: <1915541056.48255.1301533103829.JavaMail.root@zimbra.anl.gov>
References: <1915541056.48255.1301533103829.JavaMail.root@zimbra.anl.gov>
Message-ID: <1301535300.20580.0.camel@blabla2.none>

Something odd must have happened there with 0.92. I will go through the commits and check.

On Wed, 2011-03-30 at 19:58 -0500, Michael Wilde wrote:
> I wouldn't do that quite yet. This is very curious, but I think it's the likely cause of what you are seeing.
> The foreach() loop in this case seems to be having double vision :) It's either some Swift subtlety or a bug:
>
> login1$ cat zz1.swift
> int arr[];
>
> iterate i
> {
>   arr[i] = i+1;
>   trace(i, arr[i]);
> }until(i == 1);
>
> foreach a in arr {
>   trace("for", a);
> }
>
> login1$ swift zz1.swift
> Swift svn swift-r4157 cog-r3056
>
> RunID: 20110331-0053-tkh8yla5
> Progress:
> SwiftScript trace: 0, 1
> SwiftScript trace: for, 1
> SwiftScript trace: for, 1
> SwiftScript trace: 1, 2
> SwiftScript trace: for, 2
> SwiftScript trace: for, 2
> Final status:
> login1$
>
>
> But, Zhao, you could in the meantime use much simpler code like so:
>
> login1$ cat zz2.swift
>
> foreach a in [0:3] {
>   trace("for", a);
> }
>
> login1$ swift zz2.swift
> Swift svn swift-r4157 cog-r3056
>
> RunID: 20110331-0057-huo8jei0
> Progress:
> SwiftScript trace: for, 1
> SwiftScript trace: for, 3
> SwiftScript trace: for, 2
> SwiftScript trace: for, 0
> Final status:
> login1$
>
> I suspect we need to make this more clear in the user guide and tutorials :)
>
> - Mike

From wilde at mcs.anl.gov Wed Mar 30 20:37:03 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 30 Mar 2011 20:37:03 -0500 (CDT)
Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To:
Message-ID: <793966660.48334.1301535423996.JavaMail.root@zimbra.anl.gov>

OK, what am I missing?

int arr[];

arr[0]=1;
arr[1]=2;

foreach a in arr {
  trace("for", a);
}

login1$ swift zz3.swift
Swift svn swift-r4157 cog-r3056

RunID: 20110331-0134-bfzhkgaa
Progress:
SwiftScript trace: for, 2
SwiftScript trace: for, 1
SwiftScript trace: for, 1
SwiftScript trace: for, 2
Final status:

When did the foreach loop become the twice-each loop?

I need to try some other revision and hosts with this.

- Mike

----- Original Message -----
> Wow, I didn't know we can do that!
> I treat the docs too canonically :P
>
> 2011/3/30 Michael Wilde :
> >
> > login1$ cat zz2.swift
> >
> > foreach a in [0:3] {
> >   trace("for", a);
> > }
> >
> > login1$ swift zz2.swift
> > Swift svn swift-r4157 cog-r3056
> >
> > RunID: 20110331-0057-huo8jei0
> > Progress:
> > SwiftScript trace: for, 1
> > SwiftScript trace: for, 3
> > SwiftScript trace: for, 2
> > SwiftScript trace: for, 0
> > Final status:
> > login1$
> >
> > I suspect we need to make this more clear in the user guide and
> > tutorials :)
>
> I agree.

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From hategan at mcs.anl.gov Wed Mar 30 20:40:53 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 30 Mar 2011 18:40:53 -0700
Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To: <793966660.48334.1301535423996.JavaMail.root@zimbra.anl.gov>
References: <793966660.48334.1301535423996.JavaMail.root@zimbra.anl.gov>
Message-ID: <1301535653.20729.2.camel@blabla2.none>

On Wed, 2011-03-30 at 20:37 -0500, Michael Wilde wrote:
> OK, what am I missing?

Nothing. That shouldn't be happening.
Here's what (approximately) you should get:
Swift svn swift-r3526 (swift modified locally) cog-r656 (cog modified locally)

RunID: 20110330-1839-y71gls4a
Progress: time:0
[Misc] WARN pool-1-thread-4 - SwiftScript trace: for, 2
[Misc] WARN pool-1-thread-1 - SwiftScript trace: for, 1
Final status: time:54

>
> int arr[];
>
> arr[0]=1;
> arr[1]=2;
>
> foreach a in arr {
>   trace("for", a);
> }
>
> login1$ swift zz3.swift
> Swift svn swift-r4157 cog-r3056
>
> RunID: 20110331-0134-bfzhkgaa
> Progress:
> SwiftScript trace: for, 2
> SwiftScript trace: for, 1
> SwiftScript trace: for, 1
> SwiftScript trace: for, 2
> Final status:
>
> When did the foreach loop become the twice-each loop?
>
> I need to try some other revision and hosts with this.
>
> - Mike

From hategan at mcs.anl.gov Wed Mar 30 20:42:02 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 30 Mar 2011 18:42:02 -0700
Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To: <1301535653.20729.2.camel@blabla2.none>
References: <793966660.48334.1301535423996.JavaMail.root@zimbra.anl.gov> <1301535653.20729.2.camel@blabla2.none>
Message-ID: <1301535722.20879.0.camel@blabla2.none>

Why was trunk merged into the stable branch?

On Wed, 2011-03-30 at 18:40 -0700, Mihael Hategan wrote:
> On Wed, 2011-03-30 at 20:37 -0500, Michael Wilde wrote:
> > OK, what am I missing?
>
> Nothing. That shouldn't be happening.
> > Here's what (approximately) you should get: > Swift svn swift-r3526 (swift modified locally) cog-r656 (cog modified > locally) > > RunID: 20110330-1839-y71gls4a > Progress: time:0 > [Misc] WARN pool-1-thread-4 - SwiftScript trace: for, 2 > [Misc] WARN pool-1-thread-1 - SwiftScript trace: for, 1 > Final status: time:54 > > > > > int arr[]; > > > > arr[0]=1; > > arr[1]=2; > > > > foreach a in arr { > > trace("for", a); > > } > > > > login1$ swift zz3.swift > > Swift svn swift-r4157 cog-r3056 > > > > RunID: 20110331-0134-bfzhkgaa > > Progress: > > SwiftScript trace: for, 2 > > SwiftScript trace: for, 1 > > SwiftScript trace: for, 1 > > SwiftScript trace: for, 2 > > Final status: > > > > When did the foreach loop become the twice-each loop? > > > > I need to try some other revision and hosts with this. > > > > - Mike > > > > > > ----- Original Message ----- > > > Wow, I didn't know we can do that! I treat the docs too canonically :P > > > > > > 2011/3/30 Michael Wilde : > > > > > > > login1$ cat zz2.swift > > > > > > > > foreach a in [0:3] { > > > > trace("for", a); > > > > } > > > > > > > > login1$ swift zz2.swift > > > > Swift svn swift-r4157 cog-r3056 > > > > > > > > RunID: 20110331-0057-huo8jei0 > > > > Progress: > > > > SwiftScript trace: for, 1 > > > > SwiftScript trace: for, 3 > > > > SwiftScript trace: for, 2 > > > > SwiftScript trace: for, 0 > > > > Final status: > > > > login1$ > > > > > > > > I suspect we need to make this more clear in the user guide and > > > > tutorials :) > > > > > > I agree. > > > > > > > > > > > - Mike > > > > > > > > > > > > ----- Original Message ----- > > > >> Or just use the concurrent mapper to let swift handle the output > > > >> naming itself. The resume files can't persist through multiple > > > >> sessions though. 
> > > >> > > > >> 2011/3/30 Michael Wilde : > > > >> > The most common case for this error occurs when two iterations > > > >> > within a foreach loop map an output file to the same physical > > > >> > file > > > >> > name. When swift runs and tries to put the output object into its > > > >> > site cache, it sees that a file of the name name is already in > > > >> > the > > > >> > cache, and its semantics do not allow that. > > > >> > > > > >> > I have not yet stared at this code long enough to see if this > > > >> > explains what is happening here. > > > >> > > > > >> > I also dont know why it might work under one version and fail > > > >> > under > > > >> > 0.92. If the above situation is occurring, perhaps there is some > > > >> > randomness involved: loop iteration ordering; filename generation > > > >> > randomness or difference, etc. > > > >> > > > > >> > But I would debug with that in mind: make sure that all *output* > > > >> > fie > > > >> > names mapped by the script are unique. Ideally, one should be > > > >> > able > > > >> > to find the culprit by grepping the swift log for all the mapped > > > >> > file names and look for duplicates. > > > >> > > > > >> > - Mike > > > >> > > > > >> > > > > >> > ----- Original Message ----- > > > >> >> Or maybe local variables are static? Maybe they mapped to > > > >> >> different > > > >> >> files but to the same cache object? But I have been doing local > > > >> >> variables in my own workflows though. > > > >> >> > > > >> >> 2011/3/30 Jonathan Monette : > > > >> >> > Ok. I understand this error better. But shouldn't that be a > > > >> >> > different > > > >> >> > error then? Like a and b are mapped to the same file? I don't > > > >> >> > know > > > >> >> > if Swift > > > >> >> > can know this but looking at the explanation and error it > > > >> >> > should > > > >> >> > unless this > > > >> >> > cache message has a deeper meaning. 
> > > >> >> > > > > >> >> > On Wed, Mar 30, 2011 at 6:21 PM, Allan Espinosa > > > >> >> > > > > >> >> > wrote: > > > >> >> >> > > > >> >> >> I had this error before when two output mapper objects mapped > > > >> >> >> to > > > >> >> >> the same > > > >> >> >> file. > > > >> >> >> > > > >> >> >> $ swift bug_same.swift > > > >> >> >> Swift svn swift-r4208 cog-r3073 > > > >> >> >> > > > >> >> >> RunID: 20110330-1818-ygec7ppa > > > >> >> >> Progress: time:0 > > > >> >> >> The cache already contains > > > >> >> >> localhost:bug_same-20110330-1818-ygec7ppa/shared/foo. > > > >> >> >> > > > >> >> >> The cache already contains > > > >> >> >> localhost:bug_same-20110330-1818-ygec7ppa/shared/foo. > > > >> >> >> > > > >> >> >> Progress: time:1960 Stage in:1 Finished successfully:1 > > > >> >> >> The cache already contains > > > >> >> >> localhost:bug_same-20110330-1818-ygec7ppa/shared/foo. > > > >> >> >> > > > >> >> >> [aespinosa at communicado testing]$ > > > >> >> >> [aespinosa at communicado testing]$ cat bug_same.swift > > > >> >> >> type file; > > > >> >> >> > > > >> >> >> app (file out) echo(string input) { > > > >> >> >> echo input stdout=@filename(out); > > > >> >> >> } > > > >> >> >> > > > >> >> >> file a <"foo">; > > > >> >> >> file b <"foo">; > > > >> >> >> > > > >> >> >> a = echo("hello world"); > > > >> >> >> b = echo("foo bar"); > > > >> >> >> > > > >> >> >> But i think you should be using other Swift mappers that does > > > >> >> >> auto-numbering of files by default. > > > >> >> >> > > > >> >> >> -Allan > > > >> >> >> > > > >> >> >> 2011/3/30 Zhao Zhang : > > > >> >> >> > Hi guys, > > > >> >> >> > > > > >> >> >> > I am seeing something weird in swfit-0.92. Any idea about > > > >> >> >> > this? 
> > > >> >> >> > The swift script is very simple: > > > >> >> >> > > > > >> >> >> > zzhang at sandbox:~/workplace/Andrey> cat movies.swift > > > >> >> >> > type Pickle {} > > > >> >> >> > type History {} > > > >> >> >> > type Image {} > > > >> >> >> > > > > >> >> >> > app (History historyout) movie_graph (int rerun, int > > > >> >> >> > epochs, > > > >> >> >> > Pickle > > > >> >> >> > picklefile) > > > >> >> >> > { > > > >> >> >> > movie_graph rerun epochs; > > > >> >> >> > } > > > >> >> >> > > > > >> >> >> > int arr[]; > > > >> >> >> > iterate i > > > >> >> >> > { > > > >> >> >> > arr[i] = i+1; > > > >> >> >> > }until(i == 1); > > > >> >> >> > > > > >> >> >> > int epochs; > > > >> >> >> > epochs = 3; > > > >> >> >> > Pickle picklefile > > >> >> >> > file="for_movies.pickled">; > > > >> >> >> > foreach a in arr{ > > > >> >> >> > History historyout > > >> >> >> > file=@strcat("output/rerun", a, > > > >> >> >> > "/histories.pickled-", a)>; > > > >> >> >> > historyout = movie_graph(a, epochs, picklefile); > > > >> >> >> > } > > > >> >> >> > > > > >> >> >> > > > > >> >> >> > > > > >> >> >> > I ran the script with the latest 0.92 version, which is > > > >> >> >> > loaded > > > >> >> >> > as > > > >> >> >> > a > > > >> >> >> > module > > > >> >> >> > on beagle. 
The I saw this: > > > >> >> >> > zzhang at sandbox:~/workplace/Andrey> swift -tc.file ./tc.data > > > >> >> >> > movies.swift > > > >> >> >> > Variable epochs defined in scope 99878388 shadows variable > > > >> >> >> > of > > > >> >> >> > same name > > > >> >> >> > in > > > >> >> >> > scope 1813605401 > > > >> >> >> > Variable picklefile defined in scope 99878388 shadows > > > >> >> >> > variable > > > >> >> >> > of > > > >> >> >> > same > > > >> >> >> > name > > > >> >> >> > in scope 1813605401 > > > >> >> >> > Swift svn swift-r4157 cog-r3056 > > > >> >> >> > > > > >> >> >> > RunID: 20110330-1636-ev8vm8gb > > > >> >> >> > Progress: > > > >> >> >> > Progress: Selecting site:3 Active:1 > > > >> >> >> > Progress: Selecting site:3 Checking status:1 > > > >> >> >> > Progress: Selecting site:2 Stage in:1 Finished > > > >> >> >> > successfully:1 > > > >> >> >> > Progress: Selecting site:2 Active:1 Finished successfully:1 > > > >> >> >> > Progress: Selecting site:2 Active:1 Finished successfully:1 > > > >> >> >> > Progress: Selecting site:1 Stage in:1 Finished > > > >> >> >> > successfully:2 > > > >> >> >> > Progress: Selecting site:1 Active:1 Finished successfully:2 > > > >> >> >> > Progress: Selecting site:1 Checking status:1 Finished > > > >> >> >> > successfully:2 > > > >> >> >> > The cache already contains > > > >> >> >> > > > > >> >> >> > localhost:movies-20110330-1636-ev8vm8gb/shared/output/rerun1/histories.pickled-1. > > > >> >> >> > > > > >> >> >> > Execution failed: > > > >> >> >> > The cache already contains > > > >> >> >> > > > > >> >> >> > localhost:movies-20110330-1636-ev8vm8gb/shared/output/rerun1/histories.pickled-1. > > > >> >> >> > > > > >> >> >> > > > > >> >> >> > Then I switched to an older version, it worked well. 
> > > >> >> >> > zzhang at sandbox:~/workplace/Andrey> swift -tc.file ./tc.data > > > >> >> >> > movies.swift > > > >> >> >> > Variable epochs defined in scope 212602028 shadows variable > > > >> >> >> > of > > > >> >> >> > same name > > > >> >> >> > in > > > >> >> >> > scope 1538939834 > > > >> >> >> > Variable picklefile defined in scope 212602028 shadows > > > >> >> >> > variable > > > >> >> >> > of same > > > >> >> >> > name > > > >> >> >> > in scope 1538939834 > > > >> >> >> > Swift svn swift-r3291 (swift modified locally) cog-r2750 > > > >> >> >> > (cog > > > >> >> >> > modified > > > >> >> >> > locally) > > > >> >> >> > > > > >> >> >> > RunID: 20110330-1639-gmbyz1qa > > > >> >> >> > Progress: > > > >> >> >> > Progress: Active:2 > > > >> >> >> > Progress: Active:1 Checking status:1 > > > >> >> >> > Final status: Finished successfully:2 > > > >> >> >> _______________________________________________ > > > > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From wilde at mcs.anl.gov Wed Mar 30 20:54:05 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 30 Mar 2011 20:54:05 -0500 (CDT) Subject: [Swift-devel] duplicated job submission in swift-0.92? In-Reply-To: <1301535722.20879.0.camel@blabla2.none> Message-ID: <712412218.48354.1301536445657.JavaMail.root@zimbra.anl.gov> Uhhhh - I dunno. I thought Justin was going to merge changes from the stable branch (made after it branched) back to trunk on an as-needed basis, as a first step towards working towards 0.93 in trunk. That seemed a reasonable approach. The reverse is harder to understand if that's what's been happening. I know Justin's been doing some merging. Best to wait till he can explain what he's been doing. - Mike ----- Original Message ----- > Why was trunk merged into the stable branch?
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wilde at mcs.anl.gov Wed Mar 30 21:54:51 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 30 Mar 2011 21:54:51 -0500 (CDT)
Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To: <1301535653.20729.2.camel@blabla2.none>
Message-ID: <1508405414.48475.1301540091963.JavaMail.root@zimbra.anl.gov>

There seems to be something non-deterministic about this script:

com$ cat zz5.swift
int arr[];
int brr[];

arr[0]=1;
arr[1]=2;

brr = [1:2];

trace("arr",arr);
trace("brr",brr);

foreach a in arr {
  trace("for", a);
}

com$

(By the way, I'm seeing the same error on communicado.)

The script above sometimes prints 2, 3, or 4 instances of the trace() inside the foreach. And sometimes it hangs on one of the two trace statements outside the loop. In most cases, it prints all 6 traces, as in the original failing case.

com$ swift zz5.swift
Swift svn swift-r4087 (swift modified locally) cog-r3051

RunID: 20110330-2148-qf5anxr6
Progress: time:2
SwiftScript trace: arr, arr.$[]/2
SwiftScript trace: brr, brr.$[]/2
SwiftScript trace: for, 1
SwiftScript trace: for, 2
Final status: time:12
Time: 1.163, rate: 14087 j/s
com$ swift zz5.swift
Swift svn swift-r4087 (swift modified locally) cog-r3051

RunID: 20110330-2148-kouc9zq3
Progress: time:3
SwiftScript trace: arr, arr.$[]/2
SwiftScript trace: brr, brr.$[]/2
SwiftScript trace: for, 2
SwiftScript trace: for, 2
SwiftScript trace: for, 1
SwiftScript trace: for, 1
Final status: time:16
Time: 1.214, rate: 13495 j/s
com$ swift zz5.swift
Swift svn swift-r4087 (swift modified locally) cog-r3051

RunID: 20110330-2148-lksn2a17
Progress: time:2
SwiftScript trace: arr, arr.$[]/2
SwiftScript trace: brr, brr.$[]/2
SwiftScript trace: for, 1
SwiftScript trace: for, 2
SwiftScript trace: for, 2
SwiftScript trace: for, 1
Final status: time:17
Time: 1.227, rate: 13352 j/s
com$ swift zz5.swift
Swift svn swift-r4087 (swift modified locally) cog-r3051

RunID: 20110330-2148-tl2xtxx6
Progress: time:1
SwiftScript trace: arr, arr.$[]/2
SwiftScript trace: for, 1
SwiftScript trace: for, 1
SwiftScript trace: brr, brr.$[]/2
SwiftScript trace: for, 2
SwiftScript trace: for, 2
Final status: time:14
Time: 1.224, rate: 13385 j/s
com$ swift zz5.swift
Swift svn swift-r4087 (swift modified locally) cog-r3051

RunID: 20110330-2148-mk5aypbg
Progress: time:6
SwiftScript trace: arr, arr.$[]/2
SwiftScript trace: brr, brr.$[]/2
SwiftScript trace: for, 2
SwiftScript trace: for, 1
SwiftScript trace: for, 1
SwiftScript trace: for, 2
Final status: time:17
Time: 1.191, rate: 13756 j/s
com$ swift zz5.swift
Swift svn swift-r4087 (swift modified locally) cog-r3051

RunID: 20110330-2148-hgcbaxga
Progress: time:2
SwiftScript trace: arr, arr.$[]/2
SwiftScript trace: for, 1
SwiftScript trace: for, 2
SwiftScript trace: for, 2
SwiftScript trace: for, 1
com$ swift zz5.swift
Swift svn swift-r4087 (swift modified locally) cog-r3051

RunID: 20110330-2149-oaa0kuy8
Progress:SwiftScript trace: arr, arr.$[]/2
time:9
SwiftScript trace: for, 2
SwiftScript trace: brr, brr.$[]/2
SwiftScript trace: for, 2
SwiftScript trace: for, 1
Final status: time:17
Time: 1.241, rate: 13202 j/s
com$

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From hategan at mcs.anl.gov Wed Mar 30 22:04:29 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 30 Mar 2011 20:04:29 -0700
Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To: <1508405414.48475.1301540091963.JavaMail.root@zimbra.anl.gov>
References: <1508405414.48475.1301540091963.JavaMail.root@zimbra.anl.gov>
Message-ID: <1301540669.23409.2.camel@blabla2.none>

I think at this point we should stop testing 0.92 until we figure out the reason for the merge. Trunk contained a pretty dramatic change to the karajan engine and I would expect badness like that on a merge back to a stable branch. The previous behaviour (double iterations) alone is a sign of badness, and so this new thing doesn't surprise me.

Mihael On Wed, 2011-03-30 at 21:54 -0500, Michael Wilde wrote: > There seems to be something non-deterministic about this script: > > com$ cat zz5.swift > int arr[]; > int brr[]; > > arr[0]=1; > arr[1]=2; > > brr = [1:2]; > > trace("arr",arr); > trace("brr",brr); > > foreach a in arr { > trace("for", a); > } > > com$ > > (By the way, Im seeing the same error on communicado) > > The script above sometimes prints 2, 3, or 4 instances of the trace() inside the foreach. And sometimes it hangs on one of the two trace statements outside the loop. Most cases, it prints all 6 traces, as in the original failing case. > > com$ swift zz5.swift > Swift svn swift-r4087 (swift modified locally) cog-r3051 > > RunID: 20110330-2148-qf5anxr6 > Progress: time:2 > SwiftScript trace: arr, arr.$[]/2 > SwiftScript trace: brr, brr.$[]/2 > SwiftScript trace: for, 1 > SwiftScript trace: for, 2 > Final status: time:12 > Time: 1.163, rate: 14087 j/s > com$ swift zz5.swift > Swift svn swift-r4087 (swift modified locally) cog-r3051 > > RunID: 20110330-2148-kouc9zq3 > Progress: time:3 > SwiftScript trace: arr, arr.$[]/2 > SwiftScript trace: brr, brr.$[]/2 > SwiftScript trace: for, 2 > SwiftScript trace: for, 2 > SwiftScript trace: for, 1 > SwiftScript trace: for, 1 > Final status: time:16 > Time: 1.214, rate: 13495 j/s > com$ swift zz5.swift > Swift svn swift-r4087 (swift modified locally) cog-r3051 > > RunID: 20110330-2148-lksn2a17 > Progress: time:2 > SwiftScript trace: arr, arr.$[]/2 > SwiftScript trace: brr, brr.$[]/2 > SwiftScript trace: for, 1 > SwiftScript trace: for, 2 > SwiftScript trace: for, 2 > SwiftScript trace: for, 1 > Final status: time:17 > Time: 1.227, rate: 13352 j/s > com$ swift zz5.swift > Swift svn swift-r4087 (swift modified locally) cog-r3051 > > RunID: 20110330-2148-tl2xtxx6 > Progress: time:1 > SwiftScript trace: arr, arr.$[]/2 > SwiftScript trace: for, 1 > SwiftScript trace: for, 1 > SwiftScript trace: brr, brr.$[]/2 > SwiftScript trace: for, 2 > SwiftScript trace: 
for, 2 > Final status: time:14 > Time: 1.224, rate: 13385 j/s > com$ swift zz5.swift > Swift svn swift-r4087 (swift modified locally) cog-r3051 > > RunID: 20110330-2148-mk5aypbg > Progress: time:6 > SwiftScript trace: arr, arr.$[]/2 > SwiftScript trace: brr, brr.$[]/2 > SwiftScript trace: for, 2 > SwiftScript trace: for, 1 > SwiftScript trace: for, 1 > SwiftScript trace: for, 2 > Final status: time:17 > Time: 1.191, rate: 13756 j/s > com$ swift zz5.swift > Swift svn swift-r4087 (swift modified locally) cog-r3051 > > RunID: 20110330-2148-hgcbaxga > Progress: time:2 > SwiftScript trace: arr, arr.$[]/2 > SwiftScript trace: for, 1 > SwiftScript trace: for, 2 > SwiftScript trace: for, 2 > SwiftScript trace: for, 1 > com$ swift zz5.swift > Swift svn swift-r4087 (swift modified locally) cog-r3051 > > RunID: 20110330-2149-oaa0kuy8 > Progress:SwiftScript trace: arr, arr.$[]/2 > time:9 > SwiftScript trace: for, 2 > SwiftScript trace: brr, brr.$[]/2 > SwiftScript trace: for, 2 > SwiftScript trace: for, 1 > Final status: time:17 > Time: 1.241, rate: 13202 j/s > com$ > > > ----- Original Message ----- > > On Wed, 2011-03-30 at 20:37 -0500, Michael Wilde wrote: > > > OK, what am I missing? > > > > Nothing. That shouldn't be happening. 
> > > > Here's what (approximately) you should get: > > Swift svn swift-r3526 (swift modified locally) cog-r656 (cog modified > > locally) > > > > RunID: 20110330-1839-y71gls4a > > Progress: time:0 > > [Misc] WARN pool-1-thread-4 - SwiftScript trace: for, 2 > > [Misc] WARN pool-1-thread-1 - SwiftScript trace: for, 1 > > Final status: time:54 > > > > > > > > int arr[]; > > > > > > arr[0]=1; > > > arr[1]=2; > > > > > > foreach a in arr { > > > trace("for", a); > > > } > > > > > > login1$ swift zz3.swift > > > Swift svn swift-r4157 cog-r3056 > > > > > > RunID: 20110331-0134-bfzhkgaa > > > Progress: > > > SwiftScript trace: for, 2 > > > SwiftScript trace: for, 1 > > > SwiftScript trace: for, 1 > > > SwiftScript trace: for, 2 > > > Final status: > > > > > > When did the foreach loop become the twice-each loop? > > > > > > I need to try some other revision and hosts with this. > > > > > > - Mike > > > > > > > > > ----- Original Message ----- > > > > Wow, I didn't know we can do that! I treat the docs too > > > > canonically :P > > > > > > > > 2011/3/30 Michael Wilde : > > > > > > > > > login1$ cat zz2.swift > > > > > > > > > > foreach a in [0:3] { > > > > > trace("for", a); > > > > > } > > > > > > > > > > login1$ swift zz2.swift > > > > > Swift svn swift-r4157 cog-r3056 > > > > > > > > > > RunID: 20110331-0057-huo8jei0 > > > > > Progress: > > > > > SwiftScript trace: for, 1 > > > > > SwiftScript trace: for, 3 > > > > > SwiftScript trace: for, 2 > > > > > SwiftScript trace: for, 0 > > > > > Final status: > > > > > login1$ > > > > > > > > > > I suspect we need to make this more clear in the user guide and > > > > > tutorials :) > > > > > > > > I agree. > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > ----- Original Message ----- > > > > >> Or just use the concurrent mapper to let swift handle the > > > > >> output > > > > >> naming itself. The resume files can't persist through multiple > > > > >> sessions though. 
> > > > >> > > > > >> 2011/3/30 Michael Wilde : > > > > >> > The most common case for this error occurs when two > > > > >> > iterations > > > > >> > within a foreach loop map an output file to the same physical > > > > >> > file > > > > >> > name. When swift runs and tries to put the output object into > > > > >> > its > > > > >> > site cache, it sees that a file of the name name is already > > > > >> > in > > > > >> > the > > > > >> > cache, and its semantics do not allow that. > > > > >> > > > > > >> > I have not yet stared at this code long enough to see if this > > > > >> > explains what is happening here. > > > > >> > > > > > >> > I also dont know why it might work under one version and fail > > > > >> > under > > > > >> > 0.92. If the above situation is occurring, perhaps there is > > > > >> > some > > > > >> > randomness involved: loop iteration ordering; filename > > > > >> > generation > > > > >> > randomness or difference, etc. > > > > >> > > > > > >> > But I would debug with that in mind: make sure that all > > > > >> > *output* > > > > >> > fie > > > > >> > names mapped by the script are unique. Ideally, one should be > > > > >> > able > > > > >> > to find the culprit by grepping the swift log for all the > > > > >> > mapped > > > > >> > file names and look for duplicates. > > > > >> > > > > > >> > - Mike > > > > >> > > > > > >> > > > > > >> > ----- Original Message ----- > > > > >> >> Or maybe local variables are static? Maybe they mapped to > > > > >> >> different > > > > >> >> files but to the same cache object? But I have been doing > > > > >> >> local > > > > >> >> variables in my own workflows though. > > > > >> >> > > > > >> >> 2011/3/30 Jonathan Monette : > > > > >> >> > Ok. I understand this error better. But shouldn't that be > > > > >> >> > a > > > > >> >> > different > > > > >> >> > error then? Like a and b are mapped to the same file? 
I > > > > >> >> > don't > > > > >> >> > know > > > > >> >> > if Swift > > > > >> >> > can know this but looking at the explanation and error it > > > > >> >> > should > > > > >> >> > unless this > > > > >> >> > cache message has a deeper meaning. > > > > >> >> > > > > > >> >> > On Wed, Mar 30, 2011 at 6:21 PM, Allan Espinosa > > > > >> >> > > > > > >> >> > wrote: > > > > >> >> >> > > > > >> >> >> I had this error before when two output mapper objects > > > > >> >> >> mapped > > > > >> >> >> to > > > > >> >> >> the same > > > > >> >> >> file. > > > > >> >> >> > > > > >> >> >> $ swift bug_same.swift > > > > >> >> >> Swift svn swift-r4208 cog-r3073 > > > > >> >> >> > > > > >> >> >> RunID: 20110330-1818-ygec7ppa > > > > >> >> >> Progress: time:0 > > > > >> >> >> The cache already contains > > > > >> >> >> localhost:bug_same-20110330-1818-ygec7ppa/shared/foo. > > > > >> >> >> > > > > >> >> >> The cache already contains > > > > >> >> >> localhost:bug_same-20110330-1818-ygec7ppa/shared/foo. > > > > >> >> >> > > > > >> >> >> Progress: time:1960 Stage in:1 Finished successfully:1 > > > > >> >> >> The cache already contains > > > > >> >> >> localhost:bug_same-20110330-1818-ygec7ppa/shared/foo. > > > > >> >> >> > > > > >> >> >> [aespinosa at communicado testing]$ > > > > >> >> >> [aespinosa at communicado testing]$ cat bug_same.swift > > > > >> >> >> type file; > > > > >> >> >> > > > > >> >> >> app (file out) echo(string input) { > > > > >> >> >> echo input stdout=@filename(out); > > > > >> >> >> } > > > > >> >> >> > > > > >> >> >> file a <"foo">; > > > > >> >> >> file b <"foo">; > > > > >> >> >> > > > > >> >> >> a = echo("hello world"); > > > > >> >> >> b = echo("foo bar"); > > > > >> >> >> > > > > >> >> >> But i think you should be using other Swift mappers that > > > > >> >> >> does > > > > >> >> >> auto-numbering of files by default. 
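The failure discussed in this thread, where two output variables end up mapped to the same physical file, can also be checked for outside Swift. Below is a minimal Python sketch, assuming the mapped names have already been collected as (variable, path) pairs. The helper names find_duplicate_outputs and movie_mappings are hypothetical and are not part of Swift; the path pattern is copied from the @strcat call in movies.swift.

```python
from collections import defaultdict


def find_duplicate_outputs(mappings):
    """Return {path: [variables]} for every path mapped by more than one variable."""
    by_path = defaultdict(list)
    for variable, path in mappings:
        by_path[path].append(variable)
    return {path: names for path, names in by_path.items() if len(names) > 1}


def movie_mappings(arr):
    """Hypothetical helper: build the output names movies.swift would map for each loop value."""
    return [
        (f"historyout[{i}]", f"output/rerun{a}/histories.pickled-{a}")
        for i, a in enumerate(arr)
    ]


if __name__ == "__main__":
    # bug_same.swift: a and b are both mapped to "foo".
    print(find_duplicate_outputs([("a", "foo"), ("b", "foo")]))
    # movies.swift with arr = [1, 2]: every mapped name is unique.
    print(find_duplicate_outputs(movie_mappings([1, 2])))
    # The same loop body executed twice per element: every name collides.
    print(find_duplicate_outputs(movie_mappings([1, 2, 1, 2])))
```

For the names movies.swift builds, a loop over [1, 2] yields no duplicates, so a collision on output/rerun1/histories.pickled-1 would be consistent with the loop body running more than once per element rather than with a naming mistake in the script.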
> > > > >> >> >> > > > > >> >> >> -Allan > > > > >> >> >> > > > > >> >> >> 2011/3/30 Zhao Zhang : > > > > >> >> >> > Hi guys, > > > > >> >> >> > > > > > >> >> >> > I am seeing something weird in swfit-0.92. Any idea > > > > >> >> >> > about > > > > >> >> >> > this? > > > > >> >> >> > The swift script is very simple: > > > > >> >> >> > > > > > >> >> >> > zzhang at sandbox:~/workplace/Andrey> cat movies.swift > > > > >> >> >> > type Pickle {} > > > > >> >> >> > type History {} > > > > >> >> >> > type Image {} > > > > >> >> >> > > > > > >> >> >> > app (History historyout) movie_graph (int rerun, int > > > > >> >> >> > epochs, > > > > >> >> >> > Pickle > > > > >> >> >> > picklefile) > > > > >> >> >> > { > > > > >> >> >> > movie_graph rerun epochs; > > > > >> >> >> > } > > > > >> >> >> > > > > > >> >> >> > int arr[]; > > > > >> >> >> > iterate i > > > > >> >> >> > { > > > > >> >> >> > arr[i] = i+1; > > > > >> >> >> > }until(i == 1); > > > > >> >> >> > > > > > >> >> >> > int epochs; > > > > >> >> >> > epochs = 3; > > > > >> >> >> > Pickle picklefile > > > >> >> >> > file="for_movies.pickled">; > > > > >> >> >> > foreach a in arr{ > > > > >> >> >> > History historyout > > > >> >> >> > file=@strcat("output/rerun", a, > > > > >> >> >> > "/histories.pickled-", a)>; > > > > >> >> >> > historyout = movie_graph(a, epochs, picklefile); > > > > >> >> >> > } > > > > >> >> >> > > > > > >> >> >> > > > > > >> >> >> > > > > > >> >> >> > I ran the script with the latest 0.92 version, which is > > > > >> >> >> > loaded > > > > >> >> >> > as > > > > >> >> >> > a > > > > >> >> >> > module > > > > >> >> >> > on beagle. 
The I saw this: > > > > >> >> >> > zzhang at sandbox:~/workplace/Andrey> swift -tc.file > > > > >> >> >> > ./tc.data > > > > >> >> >> > movies.swift > > > > >> >> >> > Variable epochs defined in scope 99878388 shadows > > > > >> >> >> > variable > > > > >> >> >> > of > > > > >> >> >> > same name > > > > >> >> >> > in > > > > >> >> >> > scope 1813605401 > > > > >> >> >> > Variable picklefile defined in scope 99878388 shadows > > > > >> >> >> > variable > > > > >> >> >> > of > > > > >> >> >> > same > > > > >> >> >> > name > > > > >> >> >> > in scope 1813605401 > > > > >> >> >> > Swift svn swift-r4157 cog-r3056 > > > > >> >> >> > > > > > >> >> >> > RunID: 20110330-1636-ev8vm8gb > > > > >> >> >> > Progress: > > > > >> >> >> > Progress: Selecting site:3 Active:1 > > > > >> >> >> > Progress: Selecting site:3 Checking status:1 > > > > >> >> >> > Progress: Selecting site:2 Stage in:1 Finished > > > > >> >> >> > successfully:1 > > > > >> >> >> > Progress: Selecting site:2 Active:1 Finished > > > > >> >> >> > successfully:1 > > > > >> >> >> > Progress: Selecting site:2 Active:1 Finished > > > > >> >> >> > successfully:1 > > > > >> >> >> > Progress: Selecting site:1 Stage in:1 Finished > > > > >> >> >> > successfully:2 > > > > >> >> >> > Progress: Selecting site:1 Active:1 Finished > > > > >> >> >> > successfully:2 > > > > >> >> >> > Progress: Selecting site:1 Checking status:1 Finished > > > > >> >> >> > successfully:2 > > > > >> >> >> > The cache already contains > > > > >> >> >> > > > > > >> >> >> > localhost:movies-20110330-1636-ev8vm8gb/shared/output/rerun1/histories.pickled-1. > > > > >> >> >> > > > > > >> >> >> > Execution failed: > > > > >> >> >> > The cache already contains > > > > >> >> >> > > > > > >> >> >> > localhost:movies-20110330-1636-ev8vm8gb/shared/output/rerun1/histories.pickled-1. > > > > >> >> >> > > > > > >> >> >> > > > > > >> >> >> > Then I switched to an older version, it worked well. 
> > > > >> >> >> > zzhang at sandbox:~/workplace/Andrey> swift -tc.file
> > > > >> >> >> > ./tc.data
> > > > >> >> >> > movies.swift
> > > > >> >> >> > Variable epochs defined in scope 212602028 shadows
> > > > >> >> >> > variable
> > > > >> >> >> > of
> > > > >> >> >> > same name
> > > > >> >> >> > in
> > > > >> >> >> > scope 1538939834
> > > > >> >> >> > Variable picklefile defined in scope 212602028 shadows
> > > > >> >> >> > variable
> > > > >> >> >> > of same
> > > > >> >> >> > name
> > > > >> >> >> > in scope 1538939834
> > > > >> >> >> > Swift svn swift-r3291 (swift modified locally)
> > > > >> >> >> > cog-r2750
> > > > >> >> >> > (cog
> > > > >> >> >> > modified
> > > > >> >> >> > locally)
> > > > >> >> >> >
> > > > >> >> >> > RunID: 20110330-1639-gmbyz1qa
> > > > >> >> >> > Progress:
> > > > >> >> >> > Progress: Active:2
> > > > >> >> >> > Progress: Active:1 Checking status:1
> > > > >> >> >> > Final status: Finished successfully:2
> > > > >> >> >> _______________________________________________
> > > > > > > > >

From wilde at mcs.anl.gov Wed Mar 30 22:10:09 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 30 Mar 2011 22:10:09 -0500 (CDT)
Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To: <1301540669.23409.2.camel@blabla2.none>
Message-ID: <1948541983.48518.1301541009442.JavaMail.root@zimbra.anl.gov>

Yeah, I'm done for now. Except that I'm a little baffled as to why this is turning up now, as I thought I was running much more complex scripts (generating hundreds of thousands of files) with no sign of behavior like this, on what I think is the same revision. And what I'm running is (I *think*) from before the merge you are talking about. Unless I'm misunderstanding what you discovered in svn.

- Mike

----- Original Message -----
> I think at this point we should stop testing 0.92 until we figure out
> the reason for the merge.
> > Trunk contained a pretty dramatic change to the karajan engine and I > would expect badness like that on a merge back to a stable branch. The > previous behaviour (double iterations) alone is a sign of badness, and > so this new thing doesn't surprise me. > > Mihael > > On Wed, 2011-03-30 at 21:54 -0500, Michael Wilde wrote: > > There seems to be something non-deterministic about this script: > > > > com$ cat zz5.swift > > int arr[]; > > int brr[]; > > > > arr[0]=1; > > arr[1]=2; > > > > brr = [1:2]; > > > > trace("arr",arr); > > trace("brr",brr); > > > > foreach a in arr { > > trace("for", a); > > } > > > > com$ > > > > (By the way, Im seeing the same error on communicado) > > > > The script above sometimes prints 2, 3, or 4 instances of the > > trace() inside the foreach. And sometimes it hangs on one of the two > > trace statements outside the loop. Most cases, it prints all 6 > > traces, as in the original failing case. > > > > com$ swift zz5.swift > > Swift svn swift-r4087 (swift modified locally) cog-r3051 > > > > RunID: 20110330-2148-qf5anxr6 > > Progress: time:2 > > SwiftScript trace: arr, arr.$[]/2 > > SwiftScript trace: brr, brr.$[]/2 > > SwiftScript trace: for, 1 > > SwiftScript trace: for, 2 > > Final status: time:12 > > Time: 1.163, rate: 14087 j/s > > com$ swift zz5.swift > > Swift svn swift-r4087 (swift modified locally) cog-r3051 > > > > RunID: 20110330-2148-kouc9zq3 > > Progress: time:3 > > SwiftScript trace: arr, arr.$[]/2 > > SwiftScript trace: brr, brr.$[]/2 > > SwiftScript trace: for, 2 > > SwiftScript trace: for, 2 > > SwiftScript trace: for, 1 > > SwiftScript trace: for, 1 > > Final status: time:16 > > Time: 1.214, rate: 13495 j/s > > com$ swift zz5.swift > > Swift svn swift-r4087 (swift modified locally) cog-r3051 > > > > RunID: 20110330-2148-lksn2a17 > > Progress: time:2 > > SwiftScript trace: arr, arr.$[]/2 > > SwiftScript trace: brr, brr.$[]/2 > > SwiftScript trace: for, 1 > > SwiftScript trace: for, 2 > > SwiftScript trace: for, 2 > 
> SwiftScript trace: for, 1 > > Final status: time:17 > > Time: 1.227, rate: 13352 j/s > > com$ swift zz5.swift > > Swift svn swift-r4087 (swift modified locally) cog-r3051 > > > > RunID: 20110330-2148-tl2xtxx6 > > Progress: time:1 > > SwiftScript trace: arr, arr.$[]/2 > > SwiftScript trace: for, 1 > > SwiftScript trace: for, 1 > > SwiftScript trace: brr, brr.$[]/2 > > SwiftScript trace: for, 2 > > SwiftScript trace: for, 2 > > Final status: time:14 > > Time: 1.224, rate: 13385 j/s > > com$ swift zz5.swift > > Swift svn swift-r4087 (swift modified locally) cog-r3051 > > > > RunID: 20110330-2148-mk5aypbg > > Progress: time:6 > > SwiftScript trace: arr, arr.$[]/2 > > SwiftScript trace: brr, brr.$[]/2 > > SwiftScript trace: for, 2 > > SwiftScript trace: for, 1 > > SwiftScript trace: for, 1 > > SwiftScript trace: for, 2 > > Final status: time:17 > > Time: 1.191, rate: 13756 j/s > > com$ swift zz5.swift > > Swift svn swift-r4087 (swift modified locally) cog-r3051 > > > > RunID: 20110330-2148-hgcbaxga > > Progress: time:2 > > SwiftScript trace: arr, arr.$[]/2 > > SwiftScript trace: for, 1 > > SwiftScript trace: for, 2 > > SwiftScript trace: for, 2 > > SwiftScript trace: for, 1 > > com$ swift zz5.swift > > Swift svn swift-r4087 (swift modified locally) cog-r3051 > > > > RunID: 20110330-2149-oaa0kuy8 > > Progress:SwiftScript trace: arr, arr.$[]/2 > > time:9 > > SwiftScript trace: for, 2 > > SwiftScript trace: brr, brr.$[]/2 > > SwiftScript trace: for, 2 > > SwiftScript trace: for, 1 > > Final status: time:17 > > Time: 1.241, rate: 13202 j/s > > com$ > > > > > > ----- Original Message ----- > > > On Wed, 2011-03-30 at 20:37 -0500, Michael Wilde wrote: > > > > OK, what am I missing? > > > > > > Nothing. That shouldn't be happening. 
> > > > > > Here's what (approximately) you should get: > > > Swift svn swift-r3526 (swift modified locally) cog-r656 (cog > > > modified > > > locally) > > > > > > RunID: 20110330-1839-y71gls4a > > > Progress: time:0 > > > [Misc] WARN pool-1-thread-4 - SwiftScript trace: for, 2 > > > [Misc] WARN pool-1-thread-1 - SwiftScript trace: for, 1 > > > Final status: time:54 > > > > > > > > > > > int arr[]; > > > > > > > > arr[0]=1; > > > > arr[1]=2; > > > > > > > > foreach a in arr { > > > > trace("for", a); > > > > } > > > > > > > > login1$ swift zz3.swift > > > > Swift svn swift-r4157 cog-r3056 > > > > > > > > RunID: 20110331-0134-bfzhkgaa > > > > Progress: > > > > SwiftScript trace: for, 2 > > > > SwiftScript trace: for, 1 > > > > SwiftScript trace: for, 1 > > > > SwiftScript trace: for, 2 > > > > Final status: > > > > > > > > When did the foreach loop become the twice-each loop? > > > > > > > > I need to try some other revision and hosts with this. > > > > > > > > - Mike > > > > > > > > > > > > ----- Original Message ----- > > > > > Wow, I didn't know we can do that! I treat the docs too > > > > > canonically :P > > > > > > > > > > 2011/3/30 Michael Wilde : > > > > > > > > > > > login1$ cat zz2.swift > > > > > > > > > > > > foreach a in [0:3] { > > > > > > trace("for", a); > > > > > > } > > > > > > > > > > > > login1$ swift zz2.swift > > > > > > Swift svn swift-r4157 cog-r3056 > > > > > > > > > > > > RunID: 20110331-0057-huo8jei0 > > > > > > Progress: > > > > > > SwiftScript trace: for, 1 > > > > > > SwiftScript trace: for, 3 > > > > > > SwiftScript trace: for, 2 > > > > > > SwiftScript trace: for, 0 > > > > > > Final status: > > > > > > login1$ > > > > > > > > > > > > I suspect we need to make this more clear in the user guide > > > > > > and > > > > > > tutorials :) > > > > > > > > > > I agree. 
> > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > >> Or just use the concurrent mapper to let swift handle the > > > > > >> output > > > > > >> naming itself. The resume files can't persist through > > > > > >> multiple > > > > > >> sessions though. > > > > > >> > > > > > >> 2011/3/30 Michael Wilde : > > > > > >> > The most common case for this error occurs when two > > > > > >> > iterations > > > > > >> > within a foreach loop map an output file to the same > > > > > >> > physical > > > > > >> > file > > > > > >> > name. When swift runs and tries to put the output object > > > > > >> > into > > > > > >> > its > > > > > >> > site cache, it sees that a file of the name name is > > > > > >> > already > > > > > >> > in > > > > > >> > the > > > > > >> > cache, and its semantics do not allow that. > > > > > >> > > > > > > >> > I have not yet stared at this code long enough to see if > > > > > >> > this > > > > > >> > explains what is happening here. > > > > > >> > > > > > > >> > I also dont know why it might work under one version and > > > > > >> > fail > > > > > >> > under > > > > > >> > 0.92. If the above situation is occurring, perhaps there > > > > > >> > is > > > > > >> > some > > > > > >> > randomness involved: loop iteration ordering; filename > > > > > >> > generation > > > > > >> > randomness or difference, etc. > > > > > >> > > > > > > >> > But I would debug with that in mind: make sure that all > > > > > >> > *output* > > > > > >> > fie > > > > > >> > names mapped by the script are unique. Ideally, one > > > > > >> > should be > > > > > >> > able > > > > > >> > to find the culprit by grepping the swift log for all the > > > > > >> > mapped > > > > > >> > file names and look for duplicates. > > > > > >> > > > > > > >> > - Mike > > > > > >> > > > > > > >> > > > > > > >> > ----- Original Message ----- > > > > > >> >> Or maybe local variables are static? 
Maybe they mapped > > > > > >> >> to > > > > > >> >> different > > > > > >> >> files but to the same cache object? But I have been > > > > > >> >> doing > > > > > >> >> local > > > > > >> >> variables in my own workflows though. > > > > > >> >> > > > > > >> >> 2011/3/30 Jonathan Monette : > > > > > >> >> > Ok. I understand this error better. But shouldn't that > > > > > >> >> > be > > > > > >> >> > a > > > > > >> >> > different > > > > > >> >> > error then? Like a and b are mapped to the same file? > > > > > >> >> > I > > > > > >> >> > don't > > > > > >> >> > know > > > > > >> >> > if Swift > > > > > >> >> > can know this but looking at the explanation and error > > > > > >> >> > it > > > > > >> >> > should > > > > > >> >> > unless this > > > > > >> >> > cache message has a deeper meaning. > > > > > >> >> > > > > > > >> >> > On Wed, Mar 30, 2011 at 6:21 PM, Allan Espinosa > > > > > >> >> > > > > > > >> >> > wrote: > > > > > >> >> >> > > > > > >> >> >> I had this error before when two output mapper > > > > > >> >> >> objects > > > > > >> >> >> mapped > > > > > >> >> >> to > > > > > >> >> >> the same > > > > > >> >> >> file. > > > > > >> >> >> > > > > > >> >> >> $ swift bug_same.swift > > > > > >> >> >> Swift svn swift-r4208 cog-r3073 > > > > > >> >> >> > > > > > >> >> >> RunID: 20110330-1818-ygec7ppa > > > > > >> >> >> Progress: time:0 > > > > > >> >> >> The cache already contains > > > > > >> >> >> localhost:bug_same-20110330-1818-ygec7ppa/shared/foo. > > > > > >> >> >> > > > > > >> >> >> The cache already contains > > > > > >> >> >> localhost:bug_same-20110330-1818-ygec7ppa/shared/foo. > > > > > >> >> >> > > > > > >> >> >> Progress: time:1960 Stage in:1 Finished > > > > > >> >> >> successfully:1 > > > > > >> >> >> The cache already contains > > > > > >> >> >> localhost:bug_same-20110330-1818-ygec7ppa/shared/foo. 
> > > > > >> >> >> > > > > > >> >> >> [aespinosa at communicado testing]$ > > > > > >> >> >> [aespinosa at communicado testing]$ cat bug_same.swift > > > > > >> >> >> type file; > > > > > >> >> >> > > > > > >> >> >> app (file out) echo(string input) { > > > > > >> >> >> echo input stdout=@filename(out); > > > > > >> >> >> } > > > > > >> >> >> > > > > > >> >> >> file a <"foo">; > > > > > >> >> >> file b <"foo">; > > > > > >> >> >> > > > > > >> >> >> a = echo("hello world"); > > > > > >> >> >> b = echo("foo bar"); > > > > > >> >> >> > > > > > >> >> >> But i think you should be using other Swift mappers > > > > > >> >> >> that > > > > > >> >> >> does > > > > > >> >> >> auto-numbering of files by default. > > > > > >> >> >> > > > > > >> >> >> -Allan > > > > > >> >> >> > > > > > >> >> >> 2011/3/30 Zhao Zhang : > > > > > >> >> >> > Hi guys, > > > > > >> >> >> > > > > > > >> >> >> > I am seeing something weird in swfit-0.92. Any idea > > > > > >> >> >> > about > > > > > >> >> >> > this? > > > > > >> >> >> > The swift script is very simple: > > > > > >> >> >> > > > > > > >> >> >> > zzhang at sandbox:~/workplace/Andrey> cat movies.swift > > > > > >> >> >> > type Pickle {} > > > > > >> >> >> > type History {} > > > > > >> >> >> > type Image {} > > > > > >> >> >> > > > > > > >> >> >> > app (History historyout) movie_graph (int rerun, > > > > > >> >> >> > int > > > > > >> >> >> > epochs, > > > > > >> >> >> > Pickle > > > > > >> >> >> > picklefile) > > > > > >> >> >> > { > > > > > >> >> >> > movie_graph rerun epochs; > > > > > >> >> >> > } > > > > > >> >> >> > > > > > > >> >> >> > int arr[]; > > > > > >> >> >> > iterate i > > > > > >> >> >> > { > > > > > >> >> >> > arr[i] = i+1; > > > > > >> >> >> > }until(i == 1); > > > > > >> >> >> > > > > > > >> >> >> > int epochs; > > > > > >> >> >> > epochs = 3; > > > > > >> >> >> > Pickle picklefile > > > > >> >> >> > file="for_movies.pickled">; > > > > > >> >> >> > foreach a in arr{ > > > > > >> >> >> > History historyout > > > > >> >> 
>> > file=@strcat("output/rerun", a, > > > > > >> >> >> > "/histories.pickled-", a)>; > > > > > >> >> >> > historyout = movie_graph(a, epochs, picklefile); > > > > > >> >> >> > } > > > > > >> >> >> > > > > > > >> >> >> > > > > > > >> >> >> > > > > > > >> >> >> > I ran the script with the latest 0.92 version, > > > > > >> >> >> > which is > > > > > >> >> >> > loaded > > > > > >> >> >> > as > > > > > >> >> >> > a > > > > > >> >> >> > module > > > > > >> >> >> > on beagle. The I saw this: > > > > > >> >> >> > zzhang at sandbox:~/workplace/Andrey> swift -tc.file > > > > > >> >> >> > ./tc.data > > > > > >> >> >> > movies.swift > > > > > >> >> >> > Variable epochs defined in scope 99878388 shadows > > > > > >> >> >> > variable > > > > > >> >> >> > of > > > > > >> >> >> > same name > > > > > >> >> >> > in > > > > > >> >> >> > scope 1813605401 > > > > > >> >> >> > Variable picklefile defined in scope 99878388 > > > > > >> >> >> > shadows > > > > > >> >> >> > variable > > > > > >> >> >> > of > > > > > >> >> >> > same > > > > > >> >> >> > name > > > > > >> >> >> > in scope 1813605401 > > > > > >> >> >> > Swift svn swift-r4157 cog-r3056 > > > > > >> >> >> > > > > > > >> >> >> > RunID: 20110330-1636-ev8vm8gb > > > > > >> >> >> > Progress: > > > > > >> >> >> > Progress: Selecting site:3 Active:1 > > > > > >> >> >> > Progress: Selecting site:3 Checking status:1 > > > > > >> >> >> > Progress: Selecting site:2 Stage in:1 Finished > > > > > >> >> >> > successfully:1 > > > > > >> >> >> > Progress: Selecting site:2 Active:1 Finished > > > > > >> >> >> > successfully:1 > > > > > >> >> >> > Progress: Selecting site:2 Active:1 Finished > > > > > >> >> >> > successfully:1 > > > > > >> >> >> > Progress: Selecting site:1 Stage in:1 Finished > > > > > >> >> >> > successfully:2 > > > > > >> >> >> > Progress: Selecting site:1 Active:1 Finished > > > > > >> >> >> > successfully:2 > > > > > >> >> >> > Progress: Selecting site:1 Checking status:1 > > > > > >> >> >> > Finished > > > > > >> >> >> 
> successfully:2 > > > > > >> >> >> > The cache already contains > > > > > >> >> >> > > > > > > >> >> >> > localhost:movies-20110330-1636-ev8vm8gb/shared/output/rerun1/histories.pickled-1. > > > > > >> >> >> > > > > > > >> >> >> > Execution failed: > > > > > >> >> >> > The cache already contains > > > > > >> >> >> > > > > > > >> >> >> > localhost:movies-20110330-1636-ev8vm8gb/shared/output/rerun1/histories.pickled-1. > > > > > >> >> >> > > > > > > >> >> >> > > > > > > >> >> >> > Then I switched to an older version, it worked > > > > > >> >> >> > well. > > > > > >> >> >> > zzhang at sandbox:~/workplace/Andrey> swift -tc.file > > > > > >> >> >> > ./tc.data > > > > > >> >> >> > movies.swift > > > > > >> >> >> > Variable epochs defined in scope 212602028 shadows > > > > > >> >> >> > variable > > > > > >> >> >> > of > > > > > >> >> >> > same name > > > > > >> >> >> > in > > > > > >> >> >> > scope 1538939834 > > > > > >> >> >> > Variable picklefile defined in scope 212602028 > > > > > >> >> >> > shadows > > > > > >> >> >> > variable > > > > > >> >> >> > of same > > > > > >> >> >> > name > > > > > >> >> >> > in scope 1538939834 > > > > > >> >> >> > Swift svn swift-r3291 (swift modified locally) > > > > > >> >> >> > cog-r2750 > > > > > >> >> >> > (cog > > > > > >> >> >> > modified > > > > > >> >> >> > locally) > > > > > >> >> >> > > > > > > >> >> >> > RunID: 20110330-1639-gmbyz1qa > > > > > >> >> >> > Progress: > > > > > >> >> >> > Progress: Active:2 > > > > > >> >> >> > Progress: Active:1 Checking status:1 > > > > > >> >> >> > Final status: Finished successfully:2 > > > > > >> >> >> _______________________________________________ > > > > > > > > > > > > -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wilde at mcs.anl.gov Wed Mar 30 22:12:02 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 30 Mar 2011 22:12:02 -0500 (CDT) Subject: [Swift-devel] duplicated job submission in 
swift-0.92? In-Reply-To: <878937680.48522.1301541103366.JavaMail.root@zimbra.anl.gov> Message-ID: <357669173.48524.1301541122267.JavaMail.root@zimbra.anl.gov> ----- Original Message ----- > Why was trunk merged into the stable branch? Can you clarify what you mean here? Trunk merged into *what* stable branch, when? Do you mean before 0.92 was generated, or after? Is 0.92 not tagged? - Mike > > On Wed, 2011-03-30 at 18:40 -0700, Mihael Hategan wrote: > > On Wed, 2011-03-30 at 20:37 -0500, Michael Wilde wrote: > > > OK, what am I missing? > > > > Nothing. That shouldn't be happening. > > > > Here's what (approximately) you should get: > > Swift svn swift-r3526 (swift modified locally) cog-r656 (cog > > modified > > locally) > > > > RunID: 20110330-1839-y71gls4a > > Progress: time:0 > > [Misc] WARN pool-1-thread-4 - SwiftScript trace: for, 2 > > [Misc] WARN pool-1-thread-1 - SwiftScript trace: for, 1 > > Final status: time:54 > > > > > > > > int arr[]; > > > > > > arr[0]=1; > > > arr[1]=2; > > > > > > foreach a in arr { > > > trace("for", a); > > > } > > > > > > login1$ swift zz3.swift > > > Swift svn swift-r4157 cog-r3056 > > > > > > RunID: 20110331-0134-bfzhkgaa > > > Progress: > > > SwiftScript trace: for, 2 > > > SwiftScript trace: for, 1 > > > SwiftScript trace: for, 1 > > > SwiftScript trace: for, 2 > > > Final status: > > > > > > When did the foreach loop become the twice-each loop? > > > > > > I need to try some other revision and hosts with this. > > > > > > - Mike > > > > > > > > > ----- Original Message ----- > > > > Wow, I didn't know we can do that! 
I treat the docs too > > > > canonically :P > > > > > > > > 2011/3/30 Michael Wilde : > > > > > > > > > login1$ cat zz2.swift > > > > > > > > > > foreach a in [0:3] { > > > > > trace("for", a); > > > > > } > > > > > > > > > > login1$ swift zz2.swift > > > > > Swift svn swift-r4157 cog-r3056 > > > > > > > > > > RunID: 20110331-0057-huo8jei0 > > > > > Progress: > > > > > SwiftScript trace: for, 1 > > > > > SwiftScript trace: for, 3 > > > > > SwiftScript trace: for, 2 > > > > > SwiftScript trace: for, 0 > > > > > Final status: > > > > > login1$ > > > > > > > > > > I suspect we need to make this more clear in the user guide > > > > > and > > > > > tutorials :) > > > > > > > > I agree. > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > ----- Original Message ----- > > > > >> Or just use the concurrent mapper to let swift handle the > > > > >> output > > > > >> naming itself. The resume files can't persist through > > > > >> multiple > > > > >> sessions though. > > > > >> > > > > >> 2011/3/30 Michael Wilde : > > > > >> > The most common case for this error occurs when two > > > > >> > iterations > > > > >> > within a foreach loop map an output file to the same > > > > >> > physical > > > > >> > file > > > > >> > name. When swift runs and tries to put the output object > > > > >> > into its > > > > >> > site cache, it sees that a file of the name name is already > > > > >> > in > > > > >> > the > > > > >> > cache, and its semantics do not allow that. > > > > >> > > > > > >> > I have not yet stared at this code long enough to see if > > > > >> > this > > > > >> > explains what is happening here. > > > > >> > > > > > >> > I also dont know why it might work under one version and > > > > >> > fail > > > > >> > under > > > > >> > 0.92. If the above situation is occurring, perhaps there is > > > > >> > some > > > > >> > randomness involved: loop iteration ordering; filename > > > > >> > generation > > > > >> > randomness or difference, etc. 
> > > > >> > > > > > >> > But I would debug with that in mind: make sure that all > > > > >> > *output* > > > > >> > fie > > > > >> > names mapped by the script are unique. Ideally, one should > > > > >> > be > > > > >> > able > > > > >> > to find the culprit by grepping the swift log for all the > > > > >> > mapped > > > > >> > file names and look for duplicates. > > > > >> > > > > > >> > - Mike > > > > >> > > > > > >> > > > > > >> > ----- Original Message ----- > > > > >> >> Or maybe local variables are static? Maybe they mapped to > > > > >> >> different > > > > >> >> files but to the same cache object? But I have been doing > > > > >> >> local > > > > >> >> variables in my own workflows though. > > > > >> >> > > > > >> >> 2011/3/30 Jonathan Monette : > > > > >> >> > Ok. I understand this error better. But shouldn't that > > > > >> >> > be a > > > > >> >> > different > > > > >> >> > error then? Like a and b are mapped to the same file? I > > > > >> >> > don't > > > > >> >> > know > > > > >> >> > if Swift > > > > >> >> > can know this but looking at the explanation and error > > > > >> >> > it > > > > >> >> > should > > > > >> >> > unless this > > > > >> >> > cache message has a deeper meaning. > > > > >> >> > > > > > >> >> > On Wed, Mar 30, 2011 at 6:21 PM, Allan Espinosa > > > > >> >> > > > > > >> >> > wrote: > > > > >> >> >> > > > > >> >> >> I had this error before when two output mapper objects > > > > >> >> >> mapped > > > > >> >> >> to > > > > >> >> >> the same > > > > >> >> >> file. > > > > >> >> >> > > > > >> >> >> $ swift bug_same.swift > > > > >> >> >> Swift svn swift-r4208 cog-r3073 > > > > >> >> >> > > > > >> >> >> RunID: 20110330-1818-ygec7ppa > > > > >> >> >> Progress: time:0 > > > > >> >> >> The cache already contains > > > > >> >> >> localhost:bug_same-20110330-1818-ygec7ppa/shared/foo. > > > > >> >> >> > > > > >> >> >> The cache already contains > > > > >> >> >> localhost:bug_same-20110330-1818-ygec7ppa/shared/foo. 
> > > > >> >> >> > > > > >> >> >> Progress: time:1960 Stage in:1 Finished successfully:1 > > > > >> >> >> The cache already contains > > > > >> >> >> localhost:bug_same-20110330-1818-ygec7ppa/shared/foo. > > > > >> >> >> > > > > >> >> >> [aespinosa at communicado testing]$ > > > > >> >> >> [aespinosa at communicado testing]$ cat bug_same.swift > > > > >> >> >> type file; > > > > >> >> >> > > > > >> >> >> app (file out) echo(string input) { > > > > >> >> >> echo input stdout=@filename(out); > > > > >> >> >> } > > > > >> >> >> > > > > >> >> >> file a <"foo">; > > > > >> >> >> file b <"foo">; > > > > >> >> >> > > > > >> >> >> a = echo("hello world"); > > > > >> >> >> b = echo("foo bar"); > > > > >> >> >> > > > > >> >> >> But i think you should be using other Swift mappers > > > > >> >> >> that does > > > > >> >> >> auto-numbering of files by default. > > > > >> >> >> > > > > >> >> >> -Allan > > > > >> >> >> > > > > >> >> >> 2011/3/30 Zhao Zhang : > > > > >> >> >> > Hi guys, > > > > >> >> >> > > > > > >> >> >> > I am seeing something weird in swfit-0.92. Any idea > > > > >> >> >> > about > > > > >> >> >> > this? 
> > > > >> >> >> > The swift script is very simple: > > > > >> >> >> > > > > > >> >> >> > zzhang at sandbox:~/workplace/Andrey> cat movies.swift > > > > >> >> >> > type Pickle {} > > > > >> >> >> > type History {} > > > > >> >> >> > type Image {} > > > > >> >> >> > > > > > >> >> >> > app (History historyout) movie_graph (int rerun, int > > > > >> >> >> > epochs, > > > > >> >> >> > Pickle > > > > >> >> >> > picklefile) > > > > >> >> >> > { > > > > >> >> >> > movie_graph rerun epochs; > > > > >> >> >> > } > > > > >> >> >> > > > > > >> >> >> > int arr[]; > > > > >> >> >> > iterate i > > > > >> >> >> > { > > > > >> >> >> > arr[i] = i+1; > > > > >> >> >> > }until(i == 1); > > > > >> >> >> > > > > > >> >> >> > int epochs; > > > > >> >> >> > epochs = 3; > > > > >> >> >> > Pickle picklefile > > > >> >> >> > file="for_movies.pickled">; > > > > >> >> >> > foreach a in arr{ > > > > >> >> >> > History historyout > > > >> >> >> > file=@strcat("output/rerun", a, > > > > >> >> >> > "/histories.pickled-", a)>; > > > > >> >> >> > historyout = movie_graph(a, epochs, picklefile); > > > > >> >> >> > } > > > > >> >> >> > > > > > >> >> >> > > > > > >> >> >> > > > > > >> >> >> > I ran the script with the latest 0.92 version, which > > > > >> >> >> > is > > > > >> >> >> > loaded > > > > >> >> >> > as > > > > >> >> >> > a > > > > >> >> >> > module > > > > >> >> >> > on beagle. 
The I saw this: > > > > >> >> >> > zzhang at sandbox:~/workplace/Andrey> swift -tc.file > > > > >> >> >> > ./tc.data > > > > >> >> >> > movies.swift > > > > >> >> >> > Variable epochs defined in scope 99878388 shadows > > > > >> >> >> > variable > > > > >> >> >> > of > > > > >> >> >> > same name > > > > >> >> >> > in > > > > >> >> >> > scope 1813605401 > > > > >> >> >> > Variable picklefile defined in scope 99878388 shadows > > > > >> >> >> > variable > > > > >> >> >> > of > > > > >> >> >> > same > > > > >> >> >> > name > > > > >> >> >> > in scope 1813605401 > > > > >> >> >> > Swift svn swift-r4157 cog-r3056 > > > > >> >> >> > > > > > >> >> >> > RunID: 20110330-1636-ev8vm8gb > > > > >> >> >> > Progress: > > > > >> >> >> > Progress: Selecting site:3 Active:1 > > > > >> >> >> > Progress: Selecting site:3 Checking status:1 > > > > >> >> >> > Progress: Selecting site:2 Stage in:1 Finished > > > > >> >> >> > successfully:1 > > > > >> >> >> > Progress: Selecting site:2 Active:1 Finished > > > > >> >> >> > successfully:1 > > > > >> >> >> > Progress: Selecting site:2 Active:1 Finished > > > > >> >> >> > successfully:1 > > > > >> >> >> > Progress: Selecting site:1 Stage in:1 Finished > > > > >> >> >> > successfully:2 > > > > >> >> >> > Progress: Selecting site:1 Active:1 Finished > > > > >> >> >> > successfully:2 > > > > >> >> >> > Progress: Selecting site:1 Checking status:1 Finished > > > > >> >> >> > successfully:2 > > > > >> >> >> > The cache already contains > > > > >> >> >> > > > > > >> >> >> > localhost:movies-20110330-1636-ev8vm8gb/shared/output/rerun1/histories.pickled-1. > > > > >> >> >> > > > > > >> >> >> > Execution failed: > > > > >> >> >> > The cache already contains > > > > >> >> >> > > > > > >> >> >> > localhost:movies-20110330-1636-ev8vm8gb/shared/output/rerun1/histories.pickled-1. > > > > >> >> >> > > > > > >> >> >> > > > > > >> >> >> > Then I switched to an older version, it worked well. 
> > > > >> >> >> > zzhang at sandbox:~/workplace/Andrey> swift -tc.file > > > > >> >> >> > ./tc.data > > > > >> >> >> > movies.swift > > > > >> >> >> > Variable epochs defined in scope 212602028 shadows > > > > >> >> >> > variable > > > > >> >> >> > of > > > > >> >> >> > same name > > > > >> >> >> > in > > > > >> >> >> > scope 1538939834 > > > > >> >> >> > Variable picklefile defined in scope 212602028 > > > > >> >> >> > shadows > > > > >> >> >> > variable > > > > >> >> >> > of same > > > > >> >> >> > name > > > > >> >> >> > in scope 1538939834 > > > > >> >> >> > Swift svn swift-r3291 (swift modified locally) > > > > >> >> >> > cog-r2750 > > > > >> >> >> > (cog > > > > >> >> >> > modified > > > > >> >> >> > locally) > > > > >> >> >> > > > > > >> >> >> > RunID: 20110330-1639-gmbyz1qa > > > > >> >> >> > Progress: > > > > >> >> >> > Progress: Active:2 > > > > >> >> >> > Progress: Active:1 Checking status:1 > > > > >> >> >> > Final status: Finished successfully:2 > > > > >> >> >> _______________________________________________ > > > > > > > > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hategan at mcs.anl.gov Wed Mar 30 22:19:38 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 30 Mar 2011 20:19:38 -0700 Subject: [Swift-devel] duplicated job submission in swift-0.92? In-Reply-To: <1948541983.48518.1301541009442.JavaMail.root@zimbra.anl.gov> References: <1948541983.48518.1301541009442.JavaMail.root@zimbra.anl.gov> Message-ID: <1301541578.23803.7.camel@blabla2.none> On Wed, 2011-03-30 at 22:10 -0500, Michael Wilde wrote: > Yeah, I'm done for now. 
Except that I'm a little baffled as to why > this is turning up now, as I thought I was running much more complex > scripts (generating hundreds of thousands of files) with no sign of > behavior like this, on what I think is the same revision. And what I'm > running is (I *think*) from before the merge you are talking about. > Unless I'm misunderstanding what you discovered in svn. I have a feeling that all the troubles that started appearing last week might be due to this. Jonathan sees "... is already in the cache", which would happen with duplicate iterations. From hategan at mcs.anl.gov Wed Mar 30 22:21:37 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 30 Mar 2011 20:21:37 -0700 Subject: [Swift-devel] duplicated job submission in swift-0.92? In-Reply-To: <357669173.48524.1301541122267.JavaMail.root@zimbra.anl.gov> References: <357669173.48524.1301541122267.JavaMail.root@zimbra.anl.gov> Message-ID: <1301541697.23803.9.camel@blabla2.none> On Wed, 2011-03-30 at 22:12 -0500, Michael Wilde wrote: > > ----- Original Message ----- > > Why was trunk merged into the stable branch? > > Can you clarify what you mean here? Cog trunk was merged into cog branch 4.1.8, which corresponds to swift 0.92. > > Trunk merged into *what* stable branch, when? > > Do you mean before 0.92 was generated, or after? After. The 0.92 branch contains code from trunk that was added to trunk after 0.92 was merged. In particular, 0.92 was, I think, not supposed to contain stuff from the fast branch, but it does. > > Is 0.92 not tagged? It's a branch. From ketancmaheshwari at gmail.com Wed Mar 30 23:19:30 2011 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Wed, 30 Mar 2011 23:19:30 -0500 Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To: <1301541697.23803.9.camel@blabla2.none> References: <357669173.48524.1301541122267.JavaMail.root@zimbra.anl.gov> <1301541697.23803.9.camel@blabla2.none> Message-ID: <03726D6E-F34F-4B99-8902-06C6DCED5122@gmail.com> Hello, Just a note that the installation on Beagle from which Zhao originally reported this issue was installed from a precompiled binary. The following are my notes from the installation: ===== Downloaded precompiled binary. Version: 0.92 Date: March 24, 2011 at 3.00 PM from the url: http://www.ci.uchicago.edu/swift/packages/swift-0.92.tar.gz Created a symlink called default@ to this location ==== These notes can also be found at /soft/swift/0.92/notes.txt on Beagle. Regards, Ketan On Mar 30, 2011, at 10:21 PM, Mihael Hategan wrote: > On Wed, 2011-03-30 at 22:12 -0500, Michael Wilde wrote: >> >> ----- Original Message ----- >>> Why was trunk merged into the stable branch? >> >> Can you clarify what you mean here? > > Cog trunk was merged into cog branch 4.1.8 which corresponds to swift > 0.92. > >> >> Trunk merged into *what* stable branch, when? >> >> Do you mean before 0.92 was generated, or after? > > After. The 0.92 branch contains code from trunk that was added to trunk > after 0.92 was merged. In particular 0.92 was, I think, not supposed to > contain stuff from the fast branch and it does. > >> >> Is 0.92 not tagged? > > It's a branch. > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From wilde at mcs.anl.gov Wed Mar 30 23:32:08 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 30 Mar 2011 23:32:08 -0500 (CDT) Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To: <1948541983.48518.1301541009442.JavaMail.root@zimbra.anl.gov> Message-ID: <2027144891.48574.1301545928952.JavaMail.root@zimbra.anl.gov> It turns out that foreach loops in 0.92 based on either an array constant like [0:9] or on an array returned by readData() work fine, which explains why I didnt see the problem in my large modftdock tests. My outer loop was based on readData and my inner loop was based on an array constant. It would be interesting to learn (and fix) how this eluded the language tests. - Mike ----- Original Message ----- > Yeah, I'm done for now. Except that Im a little baffled as to why this > is turning up now, as I thought I was running much more complex > scripts (generating hundreds of thousands of files) with no sign of > behavior like this, on what I think is the same revision. And what Im > running is (I *think*) from before the merge you are talking about. > Unless I'm misunderstanding what you discovered in svn. > > - Mike > > > ----- Original Message ----- > > I think at this point we should stop testing 0.92 until we figure > > out > > the reason for the merge. > > > > Trunk contained a pretty dramatic change to the karajan engine and I > > would expect badness like that on a merge back to a stable branch. > > The > > previous behaviour (double iterations) alone is a sign of badness, > > and > > so this new thing doesn't surprise me. > > > > Mihael > > > > On Wed, 2011-03-30 at 21:54 -0500, Michael Wilde wrote: > > > There seems to be something non-deterministic about this script: > > > > > > com$ cat zz5.swift > > > int arr[]; > > > int brr[]; > > > > > > arr[0]=1; > > > arr[1]=2; > > > > > > brr = [1:2]; > > > > > > trace("arr",arr); > > > trace("brr",brr); > > > > > > foreach a in arr { > > > trace("for", a); > > > } > > > > > > com$ > > > > > > (By the way, Im seeing the same error on communicado) > > > > > > The script above sometimes prints 2, 3, or 4 instances of the > > > trace() inside the foreach. 
And sometimes it hangs on one of the > > > two > > > trace statements outside the loop. Most cases, it prints all 6 > > > traces, as in the original failing case. > > > > > > com$ swift zz5.swift > > > Swift svn swift-r4087 (swift modified locally) cog-r3051 > > > > > > RunID: 20110330-2148-qf5anxr6 > > > Progress: time:2 > > > SwiftScript trace: arr, arr.$[]/2 > > > SwiftScript trace: brr, brr.$[]/2 > > > SwiftScript trace: for, 1 > > > SwiftScript trace: for, 2 > > > Final status: time:12 > > > Time: 1.163, rate: 14087 j/s > > > com$ swift zz5.swift > > > Swift svn swift-r4087 (swift modified locally) cog-r3051 > > > > > > RunID: 20110330-2148-kouc9zq3 > > > Progress: time:3 > > > SwiftScript trace: arr, arr.$[]/2 > > > SwiftScript trace: brr, brr.$[]/2 > > > SwiftScript trace: for, 2 > > > SwiftScript trace: for, 2 > > > SwiftScript trace: for, 1 > > > SwiftScript trace: for, 1 > > > Final status: time:16 > > > Time: 1.214, rate: 13495 j/s > > > com$ swift zz5.swift > > > Swift svn swift-r4087 (swift modified locally) cog-r3051 > > > > > > RunID: 20110330-2148-lksn2a17 > > > Progress: time:2 > > > SwiftScript trace: arr, arr.$[]/2 > > > SwiftScript trace: brr, brr.$[]/2 > > > SwiftScript trace: for, 1 > > > SwiftScript trace: for, 2 > > > SwiftScript trace: for, 2 > > > SwiftScript trace: for, 1 > > > Final status: time:17 > > > Time: 1.227, rate: 13352 j/s > > > com$ swift zz5.swift > > > Swift svn swift-r4087 (swift modified locally) cog-r3051 > > > > > > RunID: 20110330-2148-tl2xtxx6 > > > Progress: time:1 > > > SwiftScript trace: arr, arr.$[]/2 > > > SwiftScript trace: for, 1 > > > SwiftScript trace: for, 1 > > > SwiftScript trace: brr, brr.$[]/2 > > > SwiftScript trace: for, 2 > > > SwiftScript trace: for, 2 > > > Final status: time:14 > > > Time: 1.224, rate: 13385 j/s > > > com$ swift zz5.swift > > > Swift svn swift-r4087 (swift modified locally) cog-r3051 > > > > > > RunID: 20110330-2148-mk5aypbg > > > Progress: time:6 > > > SwiftScript trace: arr, 
arr.$[]/2 > > > SwiftScript trace: brr, brr.$[]/2 > > > SwiftScript trace: for, 2 > > > SwiftScript trace: for, 1 > > > SwiftScript trace: for, 1 > > > SwiftScript trace: for, 2 > > > Final status: time:17 > > > Time: 1.191, rate: 13756 j/s > > > com$ swift zz5.swift > > > Swift svn swift-r4087 (swift modified locally) cog-r3051 > > > > > > RunID: 20110330-2148-hgcbaxga > > > Progress: time:2 > > > SwiftScript trace: arr, arr.$[]/2 > > > SwiftScript trace: for, 1 > > > SwiftScript trace: for, 2 > > > SwiftScript trace: for, 2 > > > SwiftScript trace: for, 1 > > > com$ swift zz5.swift > > > Swift svn swift-r4087 (swift modified locally) cog-r3051 > > > > > > RunID: 20110330-2149-oaa0kuy8 > > > Progress:SwiftScript trace: arr, arr.$[]/2 > > > time:9 > > > SwiftScript trace: for, 2 > > > SwiftScript trace: brr, brr.$[]/2 > > > SwiftScript trace: for, 2 > > > SwiftScript trace: for, 1 > > > Final status: time:17 > > > Time: 1.241, rate: 13202 j/s > > > com$ > > > > > > > > > ----- Original Message ----- > > > > On Wed, 2011-03-30 at 20:37 -0500, Michael Wilde wrote: > > > > > OK, what am I missing? > > > > > > > > Nothing. That shouldn't be happening. 
> > > > > > > > Here's what (approximately) you should get: > > > > Swift svn swift-r3526 (swift modified locally) cog-r656 (cog > > > > modified > > > > locally) > > > > > > > > RunID: 20110330-1839-y71gls4a > > > > Progress: time:0 > > > > [Misc] WARN pool-1-thread-4 - SwiftScript trace: for, 2 > > > > [Misc] WARN pool-1-thread-1 - SwiftScript trace: for, 1 > > > > Final status: time:54 > > > > > > > > > > > > > > int arr[]; > > > > > > > > > > arr[0]=1; > > > > > arr[1]=2; > > > > > > > > > > foreach a in arr { > > > > > trace("for", a); > > > > > } > > > > > > > > > > login1$ swift zz3.swift > > > > > Swift svn swift-r4157 cog-r3056 > > > > > > > > > > RunID: 20110331-0134-bfzhkgaa > > > > > Progress: > > > > > SwiftScript trace: for, 2 > > > > > SwiftScript trace: for, 1 > > > > > SwiftScript trace: for, 1 > > > > > SwiftScript trace: for, 2 > > > > > Final status: > > > > > > > > > > When did the foreach loop become the twice-each loop? > > > > > > > > > > I need to try some other revision and hosts with this. > > > > > > > > > > - Mike > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > Wow, I didn't know we can do that! I treat the docs too > > > > > > canonically :P > > > > > > > > > > > > 2011/3/30 Michael Wilde : > > > > > > > > > > > > > login1$ cat zz2.swift > > > > > > > > > > > > > > foreach a in [0:3] { > > > > > > > trace("for", a); > > > > > > > } > > > > > > > > > > > > > > login1$ swift zz2.swift > > > > > > > Swift svn swift-r4157 cog-r3056 > > > > > > > > > > > > > > RunID: 20110331-0057-huo8jei0 > > > > > > > Progress: > > > > > > > SwiftScript trace: for, 1 > > > > > > > SwiftScript trace: for, 3 > > > > > > > SwiftScript trace: for, 2 > > > > > > > SwiftScript trace: for, 0 > > > > > > > Final status: > > > > > > > login1$ > > > > > > > > > > > > > > I suspect we need to make this more clear in the user > > > > > > > guide > > > > > > > and > > > > > > > tutorials :) > > > > > > > > > > > > I agree. 
> > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science
Division Argonne National Laboratory From wilde at mcs.anl.gov Wed Mar 30 23:33:50 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 30 Mar 2011 23:33:50 -0500 (CDT) Subject: [Swift-devel] duplicated job submission in swift-0.92? In-Reply-To: <03726D6E-F34F-4B99-8902-06C6DCED5122@gmail.com> Message-ID: <454427407.48576.1301546030400.JavaMail.root@zimbra.anl.gov> Ketan, is the swift module on Beagle, then, compiled without the ppn:N:cray enhancement? If so, it could not be working on Cray compute nodes, could it? - Mike ----- Original Message ----- > Hello, > > Just a note that the installation on beagle from which Zhao originally > reported this issue is installed from a precompiled binary. Following > are my notes from the installation: > > ===== > Downloaded precompiled binary. > Version: 0.92 > Date: March 24, 2011 at 3.00 PM > from the url: > http://www.ci.uchicago.edu/swift/packages/swift-0.92.tar.gz > Created a symlink called default@ to this location > ==== > > Also found at /soft/swift/0.92/notes.txt on beagle. > > Regards, > Ketan > > > > On Mar 30, 2011, at 10:21 PM, Mihael Hategan wrote: > > > On Wed, 2011-03-30 at 22:12 -0500, Michael Wilde wrote: > >> > >> ----- Original Message ----- > >>> Why was trunk merged into the stable branch? > >> > >> Can you clarify what you mean here? > > > > Cog trunk was merged into cog branch 4.1.8 which corresponds to > > swift > > 0.92. > > > >> > >> Trunk merged into *what* stable branch, when? > >> > >> Do you mean before 0.92 was generated, or after? > > > > After. The 0.92 branch contains code from trunk that was added to > > trunk > > after 0.92 was merged. In particular 0.92 was, I think, not supposed > > to > > contain stuff from the fast branch and it does. > > > >> > >> Is 0.92 not tagged? > > > > It's a branch. 
> > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From ketancmaheshwari at gmail.com Wed Mar 30 23:41:22 2011 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Wed, 30 Mar 2011 23:41:22 -0500 Subject: [Swift-devel] duplicated job submission in swift-0.92? In-Reply-To: <454427407.48576.1301546030400.JavaMail.root@zimbra.anl.gov> References: <454427407.48576.1301546030400.JavaMail.root@zimbra.anl.gov> Message-ID: <27B80831-CAD2-44FB-AA08-A82F91A32F75@gmail.com> Mike, I have no idea about the ppn:N:cray compilation enhancement. I installed the latest stable version *assuming* it is fit for the Beagle environment, partly based on my knowledge that it has been working in the BG/P environment. Ketan On Mar 30, 2011, at 11:33 PM, Michael Wilde wrote: > Ketan, is the swift module on Beagle, then, compiled without the ppn:N:cray enhancement? If so, it could not be working on Cray compute nodes, could it? > > - Mike > > > ----- Original Message ----- >> Hello, >> >> Just a note that the installation on beagle from which Zhao originally >> reported this issue is installed from a precompiled binary. Following >> are my notes from the installation: >> >> ===== >> Downloaded precompiled binary. >> Version: 0.92 >> Date: March 24, 2011 at 3.00 PM >> from the url: >> http://www.ci.uchicago.edu/swift/packages/swift-0.92.tar.gz >> Created a symlink called default@ to this location >> ==== >> >> Also found at /soft/swift/0.92/notes.txt on beagle. >> >> Regards, >> Ketan >> >> >> >> On Mar 30, 2011, at 10:21 PM, Mihael Hategan wrote: >> >>> On Wed, 2011-03-30 at 22:12 -0500, Michael Wilde wrote: >>>> >>>> ----- Original Message ----- >>>>> Why was trunk merged into the stable branch? >>>> >>>> Can you clarify what you mean here?
>>> >>> Cog trunk was merged into cog branch 4.1.8 which corresponds to >>> swift >>> 0.92. >>> >>>> >>>> Trunk merged into *what* stable branch, when? >>>> >>>> Do you mean before 0.92 was generated, or after? >>> >>> After. The 0.92 branch contains code from trunk that was added to >>> trunk >>> after 0.92 was merged. In particular 0.92 was, I think, not supposed >>> to >>> contain stuff from the fast branch and it does. >>> >>>> >>>> Is 0.92 not tagged? >>> >>> It's a branch. >>> >>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > From wilde at mcs.anl.gov Thu Mar 31 00:16:31 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 31 Mar 2011 00:16:31 -0500 (CDT) Subject: [Swift-devel] duplicated job submission in swift-0.92? In-Reply-To: <27B80831-CAD2-44FB-AA08-A82F91A32F75@gmail.com> Message-ID: <606165489.48598.1301548591698.JavaMail.root@zimbra.anl.gov> Kets, let's discuss in person tomorrow. If this is the one you have been running, it's very possible that you are running all your app() calls entirely on the login node, because without my mods Swift jobs do not invoke aprun. Only my version of the code in /home/wilde/swift/rev/swift-r4143+cog-r3056+pbscoast/bin/swift has the logic to run correctly on Beagle. I thought that *this* was what you put into the swift module on Beagle. If not, take a look at the difference in the .submit files generated by plain 0.92 vs. the revision above. We need to find a way to get this into svn: either into a new working trunk, or into some branch based on 0.92 (e.g., 0.92cray?) - Mike ----- Original Message ----- > Mike, > > I have no idea about ppn:N:cray compilation enhancement. I put the > latest, stable version *assuming* it is fit for beagle environment.
> Partly based on my knowledge that it has been working on BG/P > environment. > > Ketan > > > On Mar 30, 2011, at 11:33 PM, Michael Wilde wrote: > > > Ketan, is the swift module on Beagle, then, compiled without the > > ppn:N:cray enhancement? If so, it could not be working on Cray > > compute nodes, could it? > > > > - Mike > > > > > > ----- Original Message ----- > >> Hello, > >> > >> Just a note that the installation on beagle from which Zhao > >> originally > >> reported this issue is installed from a precompiled binary. > >> Following > >> are my notes from the installation: > >> > >> ===== > >> Downloaded precompiled binary. > >> Version: 0.92 > >> Date: March 24, 2011 at 3.00 PM > >> from the url: > >> http://www.ci.uchicago.edu/swift/packages/swift-0.92.tar.gz > >> Created a symlink called default@ to this location > >> ==== > >> > >> Also found at /soft/swift/0.92/notes.txt on beagle. > >> > >> Regards, > >> Ketan > >> > >> > >> > >> On Mar 30, 2011, at 10:21 PM, Mihael Hategan wrote: > >> > >>> On Wed, 2011-03-30 at 22:12 -0500, Michael Wilde wrote: > >>>> > >>>> ----- Original Message ----- > >>>>> Why was trunk merged into the stable branch? > >>>> > >>>> Can you clarify what you mean here? > >>> > >>> Cog trunk was merged into cog branch 4.1.8 which corresponds to > >>> swift > >>> 0.92. > >>> > >>>> > >>>> Trunk merged into *what* stable branch, when? > >>>> > >>>> Do you mean before 0.92 was generated, or after? > >>> > >>> After. The 0.92 branch contains code from trunk that was added to > >>> trunk > >>> after 0.92 was merged. In particular 0.92 was, I think, not > >>> supposed > >>> to > >>> contain stuff from the fast branch and it does. > >>> > >>>> > >>>> Is 0.92 not tagged? > >>> > >>> It's a branch. 
> >>>
> >>>
> >>> _______________________________________________
> >>> Swift-devel mailing list
> >>> Swift-devel at ci.uchicago.edu
> >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >
> > --
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
> >

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From zhaozhang at uchicago.edu Thu Mar 31 08:26:08 2011
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Thu, 31 Mar 2011 08:26:08 -0500
Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To: <1915541056.48255.1301533103829.JavaMail.root@zimbra.anl.gov>
References: <1915541056.48255.1301533103829.JavaMail.root@zimbra.anl.gov>
Message-ID: <4D9480F0.7030802@uchicago.edu>

Yeah, foreach a in [1:3] works perfectly.

best
zhao

On 3/30/2011 7:58 PM, Michael Wilde wrote:
> I wouldn't do that quite yet. This is very curious, but I think its the likely cause of what you are seeing.
The foreach() loop in this case seems to be having double vision :) Its either some Swift subtlety or a bug: > > login1$ cat zz1.swift > int arr[]; > > iterate i > { > arr[i] = i+1; > trace(i, arr[i]); > }until(i == 1); > > foreach a in arr { > trace("for", a); > } > > login1$ swift zz1.swift > Swift svn swift-r4157 cog-r3056 > > RunID: 20110331-0053-tkh8yla5 > Progress: > SwiftScript trace: 0, 1 > SwiftScript trace: for, 1 > SwiftScript trace: for, 1 > SwiftScript trace: 1, 2 > SwiftScript trace: for, 2 > SwiftScript trace: for, 2 > Final status: > login1$ > > > But, Zhao, you could in the meantime use much simpler code like so: > > login1$ cat zz2.swift > > foreach a in [0:3] { > trace("for", a); > } > > login1$ swift zz2.swift > Swift svn swift-r4157 cog-r3056 > > RunID: 20110331-0057-huo8jei0 > Progress: > SwiftScript trace: for, 1 > SwiftScript trace: for, 3 > SwiftScript trace: for, 2 > SwiftScript trace: for, 0 > Final status: > login1$ > > I suspect we need to make this more clear in the user guide and tutorials :) > > - Mike > > > ----- Original Message ----- >> Or just use the concurrent mapper to let swift handle the output >> naming itself. The resume files can't persist through multiple >> sessions though. >> >> 2011/3/30 Michael Wilde: >>> The most common case for this error occurs when two iterations >>> within a foreach loop map an output file to the same physical file >>> name. When swift runs and tries to put the output object into its >>> site cache, it sees that a file of the name name is already in the >>> cache, and its semantics do not allow that. >>> >>> I have not yet stared at this code long enough to see if this >>> explains what is happening here. >>> >>> I also dont know why it might work under one version and fail under >>> 0.92. If the above situation is occurring, perhaps there is some >>> randomness involved: loop iteration ordering; filename generation >>> randomness or difference, etc. 
>>> >>> But I would debug with that in mind: make sure that all *output* fie >>> names mapped by the script are unique. Ideally, one should be able >>> to find the culprit by grepping the swift log for all the mapped >>> file names and look for duplicates. >>> >>> - Mike >>> >>> >>> ----- Original Message ----- >>>> Or maybe local variables are static? Maybe they mapped to different >>>> files but to the same cache object? But I have been doing local >>>> variables in my own workflows though. >>>> >>>> 2011/3/30 Jonathan Monette: >>>>> Ok. I understand this error better. But shouldn't that be a >>>>> different >>>>> error then? Like a and b are mapped to the same file? I don't >>>>> know >>>>> if Swift >>>>> can know this but looking at the explanation and error it should >>>>> unless this >>>>> cache message has a deeper meaning. >>>>> >>>>> On Wed, Mar 30, 2011 at 6:21 PM, Allan Espinosa >>>>> >>>>> wrote: >>>>>> I had this error before when two output mapper objects mapped to >>>>>> the same >>>>>> file. >>>>>> >>>>>> $ swift bug_same.swift >>>>>> Swift svn swift-r4208 cog-r3073 >>>>>> >>>>>> RunID: 20110330-1818-ygec7ppa >>>>>> Progress: time:0 >>>>>> The cache already contains >>>>>> localhost:bug_same-20110330-1818-ygec7ppa/shared/foo. >>>>>> >>>>>> The cache already contains >>>>>> localhost:bug_same-20110330-1818-ygec7ppa/shared/foo. >>>>>> >>>>>> Progress: time:1960 Stage in:1 Finished successfully:1 >>>>>> The cache already contains >>>>>> localhost:bug_same-20110330-1818-ygec7ppa/shared/foo. >>>>>> >>>>>> [aespinosa at communicado testing]$ >>>>>> [aespinosa at communicado testing]$ cat bug_same.swift >>>>>> type file; >>>>>> >>>>>> app (file out) echo(string input) { >>>>>> echo input stdout=@filename(out); >>>>>> } >>>>>> >>>>>> file a<"foo">; >>>>>> file b<"foo">; >>>>>> >>>>>> a = echo("hello world"); >>>>>> b = echo("foo bar"); >>>>>> >>>>>> But i think you should be using other Swift mappers that does >>>>>> auto-numbering of files by default. 
>>>>>> >>>>>> -Allan >>>>>> >>>>>> 2011/3/30 Zhao Zhang: >>>>>>> Hi guys, >>>>>>> >>>>>>> I am seeing something weird in swfit-0.92. Any idea about >>>>>>> this? >>>>>>> The swift script is very simple: >>>>>>> >>>>>>> zzhang at sandbox:~/workplace/Andrey> cat movies.swift >>>>>>> type Pickle {} >>>>>>> type History {} >>>>>>> type Image {} >>>>>>> >>>>>>> app (History historyout) movie_graph (int rerun, int epochs, >>>>>>> Pickle >>>>>>> picklefile) >>>>>>> { >>>>>>> movie_graph rerun epochs; >>>>>>> } >>>>>>> >>>>>>> int arr[]; >>>>>>> iterate i >>>>>>> { >>>>>>> arr[i] = i+1; >>>>>>> }until(i == 1); >>>>>>> >>>>>>> int epochs; >>>>>>> epochs = 3; >>>>>>> Pickle picklefile>>>>>> file="for_movies.pickled">; >>>>>>> foreach a in arr{ >>>>>>> History historyout>>>>>> file=@strcat("output/rerun", a, >>>>>>> "/histories.pickled-", a)>; >>>>>>> historyout = movie_graph(a, epochs, picklefile); >>>>>>> } >>>>>>> >>>>>>> >>>>>>> >>>>>>> I ran the script with the latest 0.92 version, which is loaded >>>>>>> as >>>>>>> a >>>>>>> module >>>>>>> on beagle. 
The I saw this: >>>>>>> zzhang at sandbox:~/workplace/Andrey> swift -tc.file ./tc.data >>>>>>> movies.swift >>>>>>> Variable epochs defined in scope 99878388 shadows variable of >>>>>>> same name >>>>>>> in >>>>>>> scope 1813605401 >>>>>>> Variable picklefile defined in scope 99878388 shadows variable >>>>>>> of >>>>>>> same >>>>>>> name >>>>>>> in scope 1813605401 >>>>>>> Swift svn swift-r4157 cog-r3056 >>>>>>> >>>>>>> RunID: 20110330-1636-ev8vm8gb >>>>>>> Progress: >>>>>>> Progress: Selecting site:3 Active:1 >>>>>>> Progress: Selecting site:3 Checking status:1 >>>>>>> Progress: Selecting site:2 Stage in:1 Finished successfully:1 >>>>>>> Progress: Selecting site:2 Active:1 Finished successfully:1 >>>>>>> Progress: Selecting site:2 Active:1 Finished successfully:1 >>>>>>> Progress: Selecting site:1 Stage in:1 Finished successfully:2 >>>>>>> Progress: Selecting site:1 Active:1 Finished successfully:2 >>>>>>> Progress: Selecting site:1 Checking status:1 Finished >>>>>>> successfully:2 >>>>>>> The cache already contains >>>>>>> >>>>>>> localhost:movies-20110330-1636-ev8vm8gb/shared/output/rerun1/histories.pickled-1. >>>>>>> >>>>>>> Execution failed: >>>>>>> The cache already contains >>>>>>> >>>>>>> localhost:movies-20110330-1636-ev8vm8gb/shared/output/rerun1/histories.pickled-1. >>>>>>> >>>>>>> >>>>>>> Then I switched to an older version, it worked well. 
>>>>>>> zzhang at sandbox:~/workplace/Andrey> swift -tc.file ./tc.data
>>>>>>> movies.swift
>>>>>>> Variable epochs defined in scope 212602028 shadows variable of
>>>>>>> same name
>>>>>>> in
>>>>>>> scope 1538939834
>>>>>>> Variable picklefile defined in scope 212602028 shadows
>>>>>>> variable
>>>>>>> of same
>>>>>>> name
>>>>>>> in scope 1538939834
>>>>>>> Swift svn swift-r3291 (swift modified locally) cog-r2750 (cog
>>>>>>> modified
>>>>>>> locally)
>>>>>>>
>>>>>>> RunID: 20110330-1639-gmbyz1qa
>>>>>>> Progress:
>>>>>>> Progress: Active:2
>>>>>>> Progress: Active:1 Checking status:1
>>>>>>> Final status: Finished successfully:2
>>>>>> _______________________________________________

From wilde at mcs.anl.gov Thu Mar 31 09:25:43 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 31 Mar 2011 09:25:43 -0500 (CDT)
Subject: [Swift-devel] Todays priorities
Message-ID: <1704155067.49301.1301581543468.JavaMail.root@zimbra.anl.gov>

Justin (with any help Mihael can offer): find out what went wrong in 0.92, both from a code merge point of view, and if possible, what is causing the twice-each bug. Does the bug exist in trunk?

Ketan (with help from Justin): determine how to get the Cray support changes into SVN and generate a new Cray release from that svn-controlled code base.

Sarah:
  Add categories to bugzilla for:
    development tools and processes (to include build process, svn, bug tracking, etc.)
    error messages and diag tools
    status reporting and logging
  Set all bugs with 0.93 targets to indicate that
  Adjust the ReleasePlace web page to point to these bugzilla bugs
  Make sure all bugs are assigned to one of us, or unassigned.
  Generate a list of the top 5 things on each person's plate

From here on, let's use Severity to indicate the impact on users and Priority to indicate where a bug stands on the plate of the assigned-to person, and adjust the bug settings accordingly.
All: get anything on your plates related to Swift development into bugzilla, and let's use it as our primary (ideally sole) Swift to-do list.

Thanks,

Mike

From wozniak at mcs.anl.gov Thu Mar 31 09:34:14 2011
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Thu, 31 Mar 2011 09:34:14 -0500 (CDT)
Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To: <1301535722.20879.0.camel@blabla2.none>
References: <793966660.48334.1301535423996.JavaMail.root@zimbra.anl.gov> <1301535653.20729.2.camel@blabla2.none> <1301535722.20879.0.camel@blabla2.none>
Message-ID:

I did that to get things unified in trunk. I can revert things back to the state before that if you think it would help sort this out.

On Wed, 30 Mar 2011, Mihael Hategan wrote:

> Why was trunk merged into the stable branch?
>
> On Wed, 2011-03-30 at 18:40 -0700, Mihael Hategan wrote:
>> On Wed, 2011-03-30 at 20:37 -0500, Michael Wilde wrote:
>>> OK, what am I missing?
>>
>> Nothing. That shouldn't be happening.
>>
>> Here's what (approximately) you should get:
>> Swift svn swift-r3526 (swift modified locally) cog-r656 (cog modified
>> locally)
>>
>> RunID: 20110330-1839-y71gls4a
>> Progress: time:0
>> [Misc] WARN pool-1-thread-4 - SwiftScript trace: for, 2
>> [Misc] WARN pool-1-thread-1 - SwiftScript trace: for, 1
>> Final status: time:54
>>
>>>
>>> int arr[];
>>>
>>> arr[0]=1;
>>> arr[1]=2;
>>>
>>> foreach a in arr {
>>> trace("for", a);
>>> }
>>>
>>> login1$ swift zz3.swift
>>> Swift svn swift-r4157 cog-r3056
>>>
>>> RunID: 20110331-0134-bfzhkgaa
>>> Progress:
>>> SwiftScript trace: for, 2
>>> SwiftScript trace: for, 1
>>> SwiftScript trace: for, 1
>>> SwiftScript trace: for, 2
>>> Final status:
>>>
>>> When did the foreach loop become the twice-each loop?
>>>
>>> I need to try some other revision and hosts with this.
>>>
>>> - Mike
>>>
>>>
>>> ----- Original Message -----
>>>> Wow, I didn't know we can do that!
I treat the docs too canonically :P >>>> >>>> 2011/3/30 Michael Wilde : >>>> >>>>> login1$ cat zz2.swift >>>>> >>>>> foreach a in [0:3] { >>>>> trace("for", a); >>>>> } >>>>> >>>>> login1$ swift zz2.swift >>>>> Swift svn swift-r4157 cog-r3056 >>>>> >>>>> RunID: 20110331-0057-huo8jei0 >>>>> Progress: >>>>> SwiftScript trace: for, 1 >>>>> SwiftScript trace: for, 3 >>>>> SwiftScript trace: for, 2 >>>>> SwiftScript trace: for, 0 >>>>> Final status: >>>>> login1$ >>>>> >>>>> I suspect we need to make this more clear in the user guide and >>>>> tutorials :) >>>> >>>> I agree. >>>> >>>>> >>>>> - Mike >>>>> >>>>> >>>>> ----- Original Message ----- >>>>>> Or just use the concurrent mapper to let swift handle the output >>>>>> naming itself. The resume files can't persist through multiple >>>>>> sessions though. >>>>>> >>>>>> 2011/3/30 Michael Wilde : >>>>>>> The most common case for this error occurs when two iterations >>>>>>> within a foreach loop map an output file to the same physical >>>>>>> file >>>>>>> name. When swift runs and tries to put the output object into its >>>>>>> site cache, it sees that a file of the name name is already in >>>>>>> the >>>>>>> cache, and its semantics do not allow that. >>>>>>> >>>>>>> I have not yet stared at this code long enough to see if this >>>>>>> explains what is happening here. >>>>>>> >>>>>>> I also dont know why it might work under one version and fail >>>>>>> under >>>>>>> 0.92. If the above situation is occurring, perhaps there is some >>>>>>> randomness involved: loop iteration ordering; filename generation >>>>>>> randomness or difference, etc. >>>>>>> >>>>>>> But I would debug with that in mind: make sure that all *output* >>>>>>> fie >>>>>>> names mapped by the script are unique. Ideally, one should be >>>>>>> able >>>>>>> to find the culprit by grepping the swift log for all the mapped >>>>>>> file names and look for duplicates. 
>>>>>>> >>>>>>> - Mike >>>>>>> >>>>>>> >>>>>>> ----- Original Message ----- >>>>>>>> Or maybe local variables are static? Maybe they mapped to >>>>>>>> different >>>>>>>> files but to the same cache object? But I have been doing local >>>>>>>> variables in my own workflows though. >>>>>>>> >>>>>>>> 2011/3/30 Jonathan Monette : >>>>>>>>> Ok. I understand this error better. But shouldn't that be a >>>>>>>>> different >>>>>>>>> error then? Like a and b are mapped to the same file? I don't >>>>>>>>> know >>>>>>>>> if Swift >>>>>>>>> can know this but looking at the explanation and error it >>>>>>>>> should >>>>>>>>> unless this >>>>>>>>> cache message has a deeper meaning. >>>>>>>>> >>>>>>>>> On Wed, Mar 30, 2011 at 6:21 PM, Allan Espinosa >>>>>>>>> >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> I had this error before when two output mapper objects mapped >>>>>>>>>> to >>>>>>>>>> the same >>>>>>>>>> file. >>>>>>>>>> >>>>>>>>>> $ swift bug_same.swift >>>>>>>>>> Swift svn swift-r4208 cog-r3073 >>>>>>>>>> >>>>>>>>>> RunID: 20110330-1818-ygec7ppa >>>>>>>>>> Progress: time:0 >>>>>>>>>> The cache already contains >>>>>>>>>> localhost:bug_same-20110330-1818-ygec7ppa/shared/foo. >>>>>>>>>> >>>>>>>>>> The cache already contains >>>>>>>>>> localhost:bug_same-20110330-1818-ygec7ppa/shared/foo. >>>>>>>>>> >>>>>>>>>> Progress: time:1960 Stage in:1 Finished successfully:1 >>>>>>>>>> The cache already contains >>>>>>>>>> localhost:bug_same-20110330-1818-ygec7ppa/shared/foo. 
>>>>>>>>>> >>>>>>>>>> [aespinosa at communicado testing]$ >>>>>>>>>> [aespinosa at communicado testing]$ cat bug_same.swift >>>>>>>>>> type file; >>>>>>>>>> >>>>>>>>>> app (file out) echo(string input) { >>>>>>>>>> echo input stdout=@filename(out); >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> file a <"foo">; >>>>>>>>>> file b <"foo">; >>>>>>>>>> >>>>>>>>>> a = echo("hello world"); >>>>>>>>>> b = echo("foo bar"); >>>>>>>>>> >>>>>>>>>> But i think you should be using other Swift mappers that does >>>>>>>>>> auto-numbering of files by default. >>>>>>>>>> >>>>>>>>>> -Allan >>>>>>>>>> >>>>>>>>>> 2011/3/30 Zhao Zhang : >>>>>>>>>>> Hi guys, >>>>>>>>>>> >>>>>>>>>>> I am seeing something weird in swfit-0.92. Any idea about >>>>>>>>>>> this? >>>>>>>>>>> The swift script is very simple: >>>>>>>>>>> >>>>>>>>>>> zzhang at sandbox:~/workplace/Andrey> cat movies.swift >>>>>>>>>>> type Pickle {} >>>>>>>>>>> type History {} >>>>>>>>>>> type Image {} >>>>>>>>>>> >>>>>>>>>>> app (History historyout) movie_graph (int rerun, int >>>>>>>>>>> epochs, >>>>>>>>>>> Pickle >>>>>>>>>>> picklefile) >>>>>>>>>>> { >>>>>>>>>>> movie_graph rerun epochs; >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> int arr[]; >>>>>>>>>>> iterate i >>>>>>>>>>> { >>>>>>>>>>> arr[i] = i+1; >>>>>>>>>>> }until(i == 1); >>>>>>>>>>> >>>>>>>>>>> int epochs; >>>>>>>>>>> epochs = 3; >>>>>>>>>>> Pickle picklefile >>>>>>>>>> file="for_movies.pickled">; >>>>>>>>>>> foreach a in arr{ >>>>>>>>>>> History historyout >>>>>>>>>> file=@strcat("output/rerun", a, >>>>>>>>>>> "/histories.pickled-", a)>; >>>>>>>>>>> historyout = movie_graph(a, epochs, picklefile); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I ran the script with the latest 0.92 version, which is >>>>>>>>>>> loaded >>>>>>>>>>> as >>>>>>>>>>> a >>>>>>>>>>> module >>>>>>>>>>> on beagle. 
The I saw this: >>>>>>>>>>> zzhang at sandbox:~/workplace/Andrey> swift -tc.file ./tc.data >>>>>>>>>>> movies.swift >>>>>>>>>>> Variable epochs defined in scope 99878388 shadows variable >>>>>>>>>>> of >>>>>>>>>>> same name >>>>>>>>>>> in >>>>>>>>>>> scope 1813605401 >>>>>>>>>>> Variable picklefile defined in scope 99878388 shadows >>>>>>>>>>> variable >>>>>>>>>>> of >>>>>>>>>>> same >>>>>>>>>>> name >>>>>>>>>>> in scope 1813605401 >>>>>>>>>>> Swift svn swift-r4157 cog-r3056 >>>>>>>>>>> >>>>>>>>>>> RunID: 20110330-1636-ev8vm8gb >>>>>>>>>>> Progress: >>>>>>>>>>> Progress: Selecting site:3 Active:1 >>>>>>>>>>> Progress: Selecting site:3 Checking status:1 >>>>>>>>>>> Progress: Selecting site:2 Stage in:1 Finished >>>>>>>>>>> successfully:1 >>>>>>>>>>> Progress: Selecting site:2 Active:1 Finished successfully:1 >>>>>>>>>>> Progress: Selecting site:2 Active:1 Finished successfully:1 >>>>>>>>>>> Progress: Selecting site:1 Stage in:1 Finished >>>>>>>>>>> successfully:2 >>>>>>>>>>> Progress: Selecting site:1 Active:1 Finished successfully:2 >>>>>>>>>>> Progress: Selecting site:1 Checking status:1 Finished >>>>>>>>>>> successfully:2 >>>>>>>>>>> The cache already contains >>>>>>>>>>> >>>>>>>>>>> localhost:movies-20110330-1636-ev8vm8gb/shared/output/rerun1/histories.pickled-1. >>>>>>>>>>> >>>>>>>>>>> Execution failed: >>>>>>>>>>> The cache already contains >>>>>>>>>>> >>>>>>>>>>> localhost:movies-20110330-1636-ev8vm8gb/shared/output/rerun1/histories.pickled-1. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Then I switched to an older version, it worked well. 
>>>>>>>>>>> zzhang at sandbox:~/workplace/Andrey> swift -tc.file ./tc.data
>>>>>>>>>>> movies.swift
>>>>>>>>>>> Variable epochs defined in scope 212602028 shadows variable
>>>>>>>>>>> of
>>>>>>>>>>> same name
>>>>>>>>>>> in
>>>>>>>>>>> scope 1538939834
>>>>>>>>>>> Variable picklefile defined in scope 212602028 shadows
>>>>>>>>>>> variable
>>>>>>>>>>> of same
>>>>>>>>>>> name
>>>>>>>>>>> in scope 1538939834
>>>>>>>>>>> Swift svn swift-r3291 (swift modified locally) cog-r2750
>>>>>>>>>>> (cog
>>>>>>>>>>> modified
>>>>>>>>>>> locally)
>>>>>>>>>>>
>>>>>>>>>>> RunID: 20110330-1639-gmbyz1qa
>>>>>>>>>>> Progress:
>>>>>>>>>>> Progress: Active:2
>>>>>>>>>>> Progress: Active:1 Checking status:1
>>>>>>>>>>> Final status: Finished successfully:2
>>>>>>>>>> _______________________________________________
>>>>>
>>>
>>
>>
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>

-- 
Justin M Wozniak

From bugzilla-daemon at mcs.anl.gov Thu Mar 31 10:08:50 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Thu, 31 Mar 2011 10:08:50 -0500 (CDT)
Subject: [Swift-devel] [Bug 281] New: naming convention for runs and runtime entities
Message-ID:

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=281

           Summary: naming convention for runs and runtime entities
           Product: Swift
           Version: trunk
          Platform: PC
        OS/Version: Mac OS
            Status: NEW
          Keywords: running
          Severity: normal
          Priority: P2
         Component: General
        AssignedTo: hategan at mcs.anl.gov
        ReportedBy: ketan at mcs.anl.gov


The names of the .submit files generated by the Swift runtime differ from the corresponding run ids.
It would be convenient for a user tracing her run to find all the generated entities under the same name as the run id, or a substring of it.

For instance, for a run id of 20110331-0948-r1qbu9a6 the id of the PBS submit file could be PBSr1qbu9a6.submit.

-- 
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

From hategan at mcs.anl.gov Thu Mar 31 12:03:51 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 31 Mar 2011 10:03:51 -0700
Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To:
References: <793966660.48334.1301535423996.JavaMail.root@zimbra.anl.gov> <1301535653.20729.2.camel@blabla2.none> <1301535722.20879.0.camel@blabla2.none>
Message-ID: <1301591031.32581.0.camel@blabla2.none>

I think that was in the reverse direction. The branch needs to go into trunk and not the other way around.

Mihael

On Thu, 2011-03-31 at 09:34 -0500, Justin M Wozniak wrote:
> I did that to get things unified in trunk. I can revert things back to
> the state before that if you think it would help sort this out.
>
> On Wed, 30 Mar 2011, Mihael Hategan wrote:
>
> > Why was trunk merged into the stable branch?
> >
> > On Wed, 2011-03-30 at 18:40 -0700, Mihael Hategan wrote:
> >> On Wed, 2011-03-30 at 20:37 -0500, Michael Wilde wrote:
> >>> OK, what am I missing?
> >>
> >> Nothing. That shouldn't be happening.
> >> > >> Here's what (approximately) you should get: > >> Swift svn swift-r3526 (swift modified locally) cog-r656 (cog modified > >> locally) > >> > >> RunID: 20110330-1839-y71gls4a > >> Progress: time:0 > >> [Misc] WARN pool-1-thread-4 - SwiftScript trace: for, 2 > >> [Misc] WARN pool-1-thread-1 - SwiftScript trace: for, 1 > >> Final status: time:54 > >> > >>> > >>> int arr[]; > >>> > >>> arr[0]=1; > >>> arr[1]=2; > >>> > >>> foreach a in arr { > >>> trace("for", a); > >>> } > >>> > >>> login1$ swift zz3.swift > >>> Swift svn swift-r4157 cog-r3056 > >>> > >>> RunID: 20110331-0134-bfzhkgaa > >>> Progress: > >>> SwiftScript trace: for, 2 > >>> SwiftScript trace: for, 1 > >>> SwiftScript trace: for, 1 > >>> SwiftScript trace: for, 2 > >>> Final status: > >>> > >>> When did the foreach loop become the twice-each loop? > >>> > >>> I need to try some other revision and hosts with this. > >>> > >>> - Mike > >>> > >>> > >>> ----- Original Message ----- > >>>> Wow, I didn't know we can do that! I treat the docs too canonically :P > >>>> > >>>> 2011/3/30 Michael Wilde : > >>>> > >>>>> login1$ cat zz2.swift > >>>>> > >>>>> foreach a in [0:3] { > >>>>> trace("for", a); > >>>>> } > >>>>> > >>>>> login1$ swift zz2.swift > >>>>> Swift svn swift-r4157 cog-r3056 > >>>>> > >>>>> RunID: 20110331-0057-huo8jei0 > >>>>> Progress: > >>>>> SwiftScript trace: for, 1 > >>>>> SwiftScript trace: for, 3 > >>>>> SwiftScript trace: for, 2 > >>>>> SwiftScript trace: for, 0 > >>>>> Final status: > >>>>> login1$ > >>>>> > >>>>> I suspect we need to make this more clear in the user guide and > >>>>> tutorials :) > >>>> > >>>> I agree. > >>>> > >>>>> > >>>>> - Mike > >>>>> > >>>>> > >>>>> ----- Original Message ----- > >>>>>> Or just use the concurrent mapper to let swift handle the output > >>>>>> naming itself. The resume files can't persist through multiple > >>>>>> sessions though. 
> >>>>>> > >>>>>> 2011/3/30 Michael Wilde : > >>>>>>> The most common case for this error occurs when two iterations > >>>>>>> within a foreach loop map an output file to the same physical > >>>>>>> file > >>>>>>> name. When swift runs and tries to put the output object into its > >>>>>>> site cache, it sees that a file of the name name is already in > >>>>>>> the > >>>>>>> cache, and its semantics do not allow that. > >>>>>>> > >>>>>>> I have not yet stared at this code long enough to see if this > >>>>>>> explains what is happening here. > >>>>>>> > >>>>>>> I also dont know why it might work under one version and fail > >>>>>>> under > >>>>>>> 0.92. If the above situation is occurring, perhaps there is some > >>>>>>> randomness involved: loop iteration ordering; filename generation > >>>>>>> randomness or difference, etc. > >>>>>>> > >>>>>>> But I would debug with that in mind: make sure that all *output* > >>>>>>> fie > >>>>>>> names mapped by the script are unique. Ideally, one should be > >>>>>>> able > >>>>>>> to find the culprit by grepping the swift log for all the mapped > >>>>>>> file names and look for duplicates. > >>>>>>> > >>>>>>> - Mike > >>>>>>> > >>>>>>> > >>>>>>> ----- Original Message ----- > >>>>>>>> Or maybe local variables are static? Maybe they mapped to > >>>>>>>> different > >>>>>>>> files but to the same cache object? But I have been doing local > >>>>>>>> variables in my own workflows though. > >>>>>>>> > >>>>>>>> 2011/3/30 Jonathan Monette : > >>>>>>>>> Ok. I understand this error better. But shouldn't that be a > >>>>>>>>> different > >>>>>>>>> error then? Like a and b are mapped to the same file? I don't > >>>>>>>>> know > >>>>>>>>> if Swift > >>>>>>>>> can know this but looking at the explanation and error it > >>>>>>>>> should > >>>>>>>>> unless this > >>>>>>>>> cache message has a deeper meaning. 
> >>>>>>>>> > >>>>>>>>> On Wed, Mar 30, 2011 at 6:21 PM, Allan Espinosa > >>>>>>>>> > >>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>> I had this error before when two output mapper objects mapped > >>>>>>>>>> to > >>>>>>>>>> the same > >>>>>>>>>> file. > >>>>>>>>>> > >>>>>>>>>> $ swift bug_same.swift > >>>>>>>>>> Swift svn swift-r4208 cog-r3073 > >>>>>>>>>> > >>>>>>>>>> RunID: 20110330-1818-ygec7ppa > >>>>>>>>>> Progress: time:0 > >>>>>>>>>> The cache already contains > >>>>>>>>>> localhost:bug_same-20110330-1818-ygec7ppa/shared/foo. > >>>>>>>>>> > >>>>>>>>>> The cache already contains > >>>>>>>>>> localhost:bug_same-20110330-1818-ygec7ppa/shared/foo. > >>>>>>>>>> > >>>>>>>>>> Progress: time:1960 Stage in:1 Finished successfully:1 > >>>>>>>>>> The cache already contains > >>>>>>>>>> localhost:bug_same-20110330-1818-ygec7ppa/shared/foo. > >>>>>>>>>> > >>>>>>>>>> [aespinosa at communicado testing]$ > >>>>>>>>>> [aespinosa at communicado testing]$ cat bug_same.swift > >>>>>>>>>> type file; > >>>>>>>>>> > >>>>>>>>>> app (file out) echo(string input) { > >>>>>>>>>> echo input stdout=@filename(out); > >>>>>>>>>> } > >>>>>>>>>> > >>>>>>>>>> file a <"foo">; > >>>>>>>>>> file b <"foo">; > >>>>>>>>>> > >>>>>>>>>> a = echo("hello world"); > >>>>>>>>>> b = echo("foo bar"); > >>>>>>>>>> > >>>>>>>>>> But i think you should be using other Swift mappers that does > >>>>>>>>>> auto-numbering of files by default. > >>>>>>>>>> > >>>>>>>>>> -Allan > >>>>>>>>>> > >>>>>>>>>> 2011/3/30 Zhao Zhang : > >>>>>>>>>>> Hi guys, > >>>>>>>>>>> > >>>>>>>>>>> I am seeing something weird in swfit-0.92. Any idea about > >>>>>>>>>>> this? 
> >>>>>>>>>>> The swift script is very simple: > >>>>>>>>>>> > >>>>>>>>>>> zzhang at sandbox:~/workplace/Andrey> cat movies.swift > >>>>>>>>>>> type Pickle {} > >>>>>>>>>>> type History {} > >>>>>>>>>>> type Image {} > >>>>>>>>>>> > >>>>>>>>>>> app (History historyout) movie_graph (int rerun, int > >>>>>>>>>>> epochs, > >>>>>>>>>>> Pickle > >>>>>>>>>>> picklefile) > >>>>>>>>>>> { > >>>>>>>>>>> movie_graph rerun epochs; > >>>>>>>>>>> } > >>>>>>>>>>> > >>>>>>>>>>> int arr[]; > >>>>>>>>>>> iterate i > >>>>>>>>>>> { > >>>>>>>>>>> arr[i] = i+1; > >>>>>>>>>>> }until(i == 1); > >>>>>>>>>>> > >>>>>>>>>>> int epochs; > >>>>>>>>>>> epochs = 3; > >>>>>>>>>>> Pickle picklefile >>>>>>>>>>> file="for_movies.pickled">; > >>>>>>>>>>> foreach a in arr{ > >>>>>>>>>>> History historyout >>>>>>>>>>> file=@strcat("output/rerun", a, > >>>>>>>>>>> "/histories.pickled-", a)>; > >>>>>>>>>>> historyout = movie_graph(a, epochs, picklefile); > >>>>>>>>>>> } > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> I ran the script with the latest 0.92 version, which is > >>>>>>>>>>> loaded > >>>>>>>>>>> as > >>>>>>>>>>> a > >>>>>>>>>>> module > >>>>>>>>>>> on beagle. 
The I saw this: > >>>>>>>>>>> zzhang at sandbox:~/workplace/Andrey> swift -tc.file ./tc.data > >>>>>>>>>>> movies.swift > >>>>>>>>>>> Variable epochs defined in scope 99878388 shadows variable > >>>>>>>>>>> of > >>>>>>>>>>> same name > >>>>>>>>>>> in > >>>>>>>>>>> scope 1813605401 > >>>>>>>>>>> Variable picklefile defined in scope 99878388 shadows > >>>>>>>>>>> variable > >>>>>>>>>>> of > >>>>>>>>>>> same > >>>>>>>>>>> name > >>>>>>>>>>> in scope 1813605401 > >>>>>>>>>>> Swift svn swift-r4157 cog-r3056 > >>>>>>>>>>> > >>>>>>>>>>> RunID: 20110330-1636-ev8vm8gb > >>>>>>>>>>> Progress: > >>>>>>>>>>> Progress: Selecting site:3 Active:1 > >>>>>>>>>>> Progress: Selecting site:3 Checking status:1 > >>>>>>>>>>> Progress: Selecting site:2 Stage in:1 Finished > >>>>>>>>>>> successfully:1 > >>>>>>>>>>> Progress: Selecting site:2 Active:1 Finished successfully:1 > >>>>>>>>>>> Progress: Selecting site:2 Active:1 Finished successfully:1 > >>>>>>>>>>> Progress: Selecting site:1 Stage in:1 Finished > >>>>>>>>>>> successfully:2 > >>>>>>>>>>> Progress: Selecting site:1 Active:1 Finished successfully:2 > >>>>>>>>>>> Progress: Selecting site:1 Checking status:1 Finished > >>>>>>>>>>> successfully:2 > >>>>>>>>>>> The cache already contains > >>>>>>>>>>> > >>>>>>>>>>> localhost:movies-20110330-1636-ev8vm8gb/shared/output/rerun1/histories.pickled-1. > >>>>>>>>>>> > >>>>>>>>>>> Execution failed: > >>>>>>>>>>> The cache already contains > >>>>>>>>>>> > >>>>>>>>>>> localhost:movies-20110330-1636-ev8vm8gb/shared/output/rerun1/histories.pickled-1. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Then I switched to an older version, it worked well. 
> >>>>>>>>>>> zzhang at sandbox:~/workplace/Andrey> swift -tc.file ./tc.data
> >>>>>>>>>>> movies.swift
> >>>>>>>>>>> Variable epochs defined in scope 212602028 shadows variable
> >>>>>>>>>>> of
> >>>>>>>>>>> same name
> >>>>>>>>>>> in
> >>>>>>>>>>> scope 1538939834
> >>>>>>>>>>> Variable picklefile defined in scope 212602028 shadows
> >>>>>>>>>>> variable
> >>>>>>>>>>> of same
> >>>>>>>>>>> name
> >>>>>>>>>>> in scope 1538939834
> >>>>>>>>>>> Swift svn swift-r3291 (swift modified locally) cog-r2750
> >>>>>>>>>>> (cog
> >>>>>>>>>>> modified
> >>>>>>>>>>> locally)
> >>>>>>>>>>>
> >>>>>>>>>>> RunID: 20110330-1639-gmbyz1qa
> >>>>>>>>>>> Progress:
> >>>>>>>>>>> Progress: Active:2
> >>>>>>>>>>> Progress: Active:1 Checking status:1
> >>>>>>>>>>> Final status: Finished successfully:2
> >>>>>>>>>> _______________________________________________
> >>>>>
> >>>
> >>
> >>
> >> _______________________________________________
> >> Swift-devel mailing list
> >> Swift-devel at ci.uchicago.edu
> >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >
> >
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >
>

From wozniak at mcs.anl.gov Thu Mar 31 12:48:53 2011
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Thu, 31 Mar 2011 12:48:53 -0500 (CDT)
Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To: <1301541697.23803.9.camel@blabla2.none>
References: <357669173.48524.1301541122267.JavaMail.root@zimbra.anl.gov> <1301541697.23803.9.camel@blabla2.none>
Message-ID:

By bisecting, I found that the regression seems to occur in Swift trunk between revisions 3835 and 3836 (around 01/01/2011). Still looking...

On Wed, 30 Mar 2011, Mihael Hategan wrote:

> On Wed, 2011-03-30 at 22:12 -0500, Michael Wilde wrote:
>>
>> ----- Original Message -----
>>> Why was trunk merged into the stable branch?
>>
>> Can you clarify what you mean here?
> > Cog trunk was merged into cog branch 4.1.8 which corresponds to swift > 0.92. > >> >> Trunk merged into *what* stable branch, when? >> >> Do you mean before 0.92 was generated, or after? > > After. The 0.92 branch contains code from trunk that was added to trunk > after 0.92 was merged. In particular 0.92 was, I think, not supposed to > contain stuff from the fast branch and it does. > >> >> Is 0.92 not tagged? > > It's a branch. > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > -- Justin M Wozniak From hategan at mcs.anl.gov Thu Mar 31 13:34:37 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 31 Mar 2011 11:34:37 -0700 Subject: [Swift-devel] duplicated job submission in swift-0.92? In-Reply-To: References: <357669173.48524.1301541122267.JavaMail.root@zimbra.anl.gov> <1301541697.23803.9.camel@blabla2.none> Message-ID: <1301596477.1319.0.camel@blabla2.none> Can we have a quick phone chat about this? On Thu, 2011-03-31 at 12:48 -0500, Justin M Wozniak wrote: > By doing a bisection I found that the regression seems to occur in Swift > trunk between 3835:3836 (around 01/01/2011). Still looking... > > On Wed, 30 Mar 2011, Mihael Hategan wrote: > > > On Wed, 2011-03-30 at 22:12 -0500, Michael Wilde wrote: > >> > >> ----- Original Message ----- > >>> Why was trunk merged into the stable branch? > >> > >> Can you clarify what you mean here? > > > > Cog trunk was merged into cog branch 4.1.8 which corresponds to swift > > 0.92. > > > >> > >> Trunk merged into *what* stable branch, when? > >> > >> Do you mean before 0.92 was generated, or after? > > > > After. The 0.92 branch contains code from trunk that was added to trunk > > after 0.92 was merged. In particular 0.92 was, I think, not supposed to > > contain stuff from the fast branch and it does. > > > >> > >> Is 0.92 not tagged? > > > > It's a branch. 
> > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > From wozniak at mcs.anl.gov Thu Mar 31 13:35:21 2011 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Thu, 31 Mar 2011 13:35:21 -0500 (CDT) Subject: [Swift-devel] duplicated job submission in swift-0.92? In-Reply-To: <1301596477.1319.0.camel@blabla2.none> References: <357669173.48524.1301541122267.JavaMail.root@zimbra.anl.gov> <1301541697.23803.9.camel@blabla2.none> <1301596477.1319.0.camel@blabla2.none> Message-ID: Sure: (630) 252-3351 On Thu, 31 Mar 2011, Mihael Hategan wrote: > Can we have a quick phone chat about this? > > On Thu, 2011-03-31 at 12:48 -0500, Justin M Wozniak wrote: >> By doing a bisection I found that the regression seems to occur in Swift >> trunk between 3835:3836 (around 01/01/2011). Still looking... >> >> On Wed, 30 Mar 2011, Mihael Hategan wrote: >> >>> On Wed, 2011-03-30 at 22:12 -0500, Michael Wilde wrote: >>>> >>>> ----- Original Message ----- >>>>> Why was trunk merged into the stable branch? >>>> >>>> Can you clarify what you mean here? >>> >>> Cog trunk was merged into cog branch 4.1.8 which corresponds to swift >>> 0.92. >>> >>>> >>>> Trunk merged into *what* stable branch, when? >>>> >>>> Do you mean before 0.92 was generated, or after? >>> >>> After. The 0.92 branch contains code from trunk that was added to trunk >>> after 0.92 was merged. In particular 0.92 was, I think, not supposed to >>> contain stuff from the fast branch and it does. >>> >>>> >>>> Is 0.92 not tagged? >>> >>> It's a branch. 
>>> >>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> >> > > > -- Justin M Wozniak From ketancmaheshwari at gmail.com Thu Mar 31 13:48:52 2011 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Thu, 31 Mar 2011 13:48:52 -0500 Subject: [Swift-devel] gensites templates Message-ID: Hello, Does a set of predefined gensites templates for common clusters exists somewhere? I could see an example for surveyor here: https://sites.google.com/site/swiftparallelscripting/home/managingsites Ketan -------------- next part -------------- An HTML attachment was scrubbed... URL: From aespinosa at cs.uchicago.edu Thu Mar 31 13:51:13 2011 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Thu, 31 Mar 2011 13:51:13 -0500 Subject: [Swift-devel] gensites templates In-Reply-To: References: Message-ID: Hi Ketan, I know the Swift page in the CI has a couple. Also, I believe a couple of configs are in svn as well. One place that I can name is the sites testing directory -Allan 2011/3/31 Ketan Maheshwari : > Hello, > > Does a set of predefined gensites templates for common clusters exists > somewhere? > > I could see an example for surveyor here: > https://sites.google.com/site/swiftparallelscripting/home/managingsites > > Ketan > > From hategan at mcs.anl.gov Thu Mar 31 13:52:34 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 31 Mar 2011 11:52:34 -0700 Subject: [Swift-devel] duplicated job submission in swift-0.92? 
In-Reply-To:
References: <357669173.48524.1301541122267.JavaMail.root@zimbra.anl.gov> <1301541697.23803.9.camel@blabla2.none> <1301596477.1319.0.camel@blabla2.none>
Message-ID: <1301597554.1319.5.camel@blabla2.none>

We decided the following:
- I will revert the changes in the 0.92 branch
- re-commit bug fixes that were committed after the merge
- merge the 0.92 branch to trunk
- fix the problems in trunk

One issue that came up was the exact "workflow" in the
branching/releasing/merging. I think we discussed this previously on
this mailing list, and it may just be that Sarah has made a wiki page
about it. But I'll summarize:
- before a release, trunk is branched into a release branch
- heavy development may continue in trunk while the branch only sees
  bug fixes
- after the release, the branch is merged back to trunk (so that bug
  fixes in the branch make it back to trunk)
- rinse and repeat

Mihael

From dk0966 at cs.ship.edu Thu Mar 31 14:01:04 2011
From: dk0966 at cs.ship.edu (David Kelly)
Date: Thu, 31 Mar 2011 15:01:04 -0400
Subject: [Swift-devel] gensites templates
In-Reply-To:
References:
Message-ID:

Hey Ketan,

The list of available gensites templates can be found with the
gensites -T command. Currently, the only templates available are:

intrepid
local
local-pbs-coasters
pads
queenbee
sge-local
ssh
ssh-pbs-coasters
surveyor

These templates are stored in etc/sites and are based on the provider
test templates located in tests/providers. I think the plan is to add
more in the future, but this was based on the list we created for the
initial round of provider testing.

David

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From wilde at mcs.anl.gov Thu Mar 31 14:09:02 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 31 Mar 2011 14:09:02 -0500 (CDT) Subject: [Swift-devel] gensites templates In-Reply-To: Message-ID: <1777624154.51628.1301598542321.JavaMail.root@zimbra.anl.gov> We should change the doc page to show an example or two that are much simpler and more commonly used than Surveyor. - Mike ----- Original Message ----- > Hey Ketan, > > > The list of available gensites templates can be found with the > gensites -T command. Currently, the only templates available are: > > > > intrepid > local > local-pbs-coasters > pads > queenbee > sge-local > ssh > ssh-pbs-coasters > surveyor > > > These templates are stored in etc/sites and are based on the provider > test templates located in tests/providers. I think the plan is to add > more in the future, but this was based on the list we created for the > initial round of provider testing. > > > David > > On Thu, Mar 31, 2011 at 2:48 PM, Ketan Maheshwari < > ketancmaheshwari at gmail.com > wrote: > > > Hello, > > Does a set of predefined gensites templates for common clusters exists > somewhere? 
> > I could see an example for surveyor here: > https://sites.google.com/site/swiftparallelscripting/home/managingsites > > Ketan > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From ketancmaheshwari at gmail.com Thu Mar 31 14:17:12 2011 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Thu, 31 Mar 2011 14:17:12 -0500 Subject: [Swift-devel] gensites templates In-Reply-To: <1777624154.51628.1301598542321.JavaMail.root@zimbra.anl.gov> References: <1777624154.51628.1301598542321.JavaMail.root@zimbra.anl.gov> Message-ID: <69711031-2AFC-4B91-8F59-E5E998E66509@gmail.com> Right. Also, I could not find the input properties files through which these templates are supposed to be generated. Do they exist elsewhere? The document says, the properties could be integrated into swift.properties. I am trying to segregate most common properties for common clusters into dedicated .properties files. So, please let me know if you find these somewhere on your local disk or know of their location. In parallel I will fire a find+grep :) Ketan On Mar 31, 2011, at 2:09 PM, Michael Wilde wrote: > We should change the doc page to show an example or two that are much simpler and more commonly used than Surveyor. > > - Mike > > ----- Original Message ----- >> Hey Ketan, >> >> >> The list of available gensites templates can be found with the >> gensites -T command. 
>> Currently, the only templates available are:
>>
>> intrepid
>> local
>> local-pbs-coasters
>> pads
>> queenbee
>> sge-local
>> ssh
>> ssh-pbs-coasters
>> surveyor
>>
>> These templates are stored in etc/sites and are based on the provider
>> test templates located in tests/providers. I think the plan is to add
>> more in the future, but this was based on the list we created for the
>> initial round of provider testing.
>>
>> David
>>
>> On Thu, Mar 31, 2011 at 2:48 PM, Ketan Maheshwari <
>> ketancmaheshwari at gmail.com > wrote:
>>
>> Hello,
>>
>> Does a set of predefined gensites templates for common clusters exists
>> somewhere?
>>
>> I could see an example for surveyor here:
>> https://sites.google.com/site/swiftparallelscripting/home/managingsites
>>
>> Ketan
>>
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>

From wilde at mcs.anl.gov Thu Mar 31 14:27:08 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 31 Mar 2011 14:27:08 -0500 (CDT)
Subject: [Swift-devel] gensites templates
In-Reply-To: <69711031-2AFC-4B91-8F59-E5E998E66509@gmail.com>
Message-ID: <1251568435.51782.1301599628522.JavaMail.root@zimbra.anl.gov>

I think the idea was that the user would specify -config cf on the
command line and include the #sites directives in that file along with
any swift.property settings.

We should suggest refinements of the latest version and document now,
so we can get this into 0.93 and really start pushing it for users (and
using it ourselves).
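
To make the template-plus-parameters idea in this thread concrete, here is a minimal Python sketch of filling a site template from user-supplied properties. It is only an illustration and not how gensites itself works: the `$work` and `$queue` placeholder names, the `key=value` properties format, the treatment of `#` lines as plain comments, and the `DEFAULTS`/`MANDATORY` tables are all assumptions made for this example.

```python
# Hypothetical sketch: fill a site template from key=value properties.
# Placeholder names, the properties syntax, and the default values are
# assumptions for illustration; this is NOT the gensites implementation.
from string import Template

DEFAULTS = {"queue": "default"}   # optional parameters with fallback values
MANDATORY = {"work"}              # parameters the user must always supply


def parse_properties(text):
    """Parse 'key=value' lines; blank lines and '#' lines are skipped here."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {line!r}")
        props[key.strip()] = value.strip()
    return props


def render(template_text, props):
    """Substitute $name placeholders, failing early on missing mandatory ones."""
    missing = sorted(MANDATORY - set(props))
    if missing:
        raise KeyError("missing mandatory parameters: " + ", ".join(missing))
    values = {**DEFAULTS, **props}   # user settings override defaults
    return Template(template_text).substitute(values)


template = "workdir=$work queue=$queue"
props = parse_properties("# user settings\nwork=/tmp/swiftwork\n")
rendered = render(template, props)
print(rendered)
```

Keeping the mandatory and default parameters in explicit tables is one way a tool could report to the user which values a template needs and which it will fill in on its own.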

Ketan and I talked a bit about making sure that the template is
processed through shell expansion, so that we could include expressions
in the substitutions. It may also be good to be able to include
conditional selection of template lines based on user parameter values.

Main thing now is to make it easy for the user to know what each
template is for; which one to use; and what parameters to each template
are optional, mandatory, default, and what they all mean.

In the doc page, many examples matching real needs in priority and
complexity order are essential.

Ketan, can you open a bug on this if we don't have one yet and paste
this in?

- Mike

----- Original Message -----
> Right. Also, I could not find the input properties files through which
> these templates are supposed to be generated. Do they exist elsewhere?
>
> The document says, the properties could be integrated into
> swift.properties. I am trying to segregate most common properties for
> common clusters into dedicated .properties files.
>
> So, please let me know if you find these somewhere on your local disk
> or know of their location. In parallel I will fire a find+grep :)
>
> Ketan
>
> On Mar 31, 2011, at 2:09 PM, Michael Wilde wrote:
>
> > We should change the doc page to show an example or two that are
> > much simpler and more commonly used than Surveyor.
> >
> > - Mike
> >
> > ----- Original Message -----
> >> Hey Ketan,
> >>
> >> The list of available gensites templates can be found with the
> >> gensites -T command. Currently, the only templates available are:
> >>
> >> intrepid
> >> local
> >> local-pbs-coasters
> >> pads
> >> queenbee
> >> sge-local
> >> ssh
> >> ssh-pbs-coasters
> >> surveyor
> >>
> >> These templates are stored in etc/sites and are based on the
> >> provider test templates located in tests/providers.
> >> I think the plan is to add more in the future, but this was based
> >> on the list we created for the initial round of provider testing.
> >>
> >> David
> >>
> >> On Thu, Mar 31, 2011 at 2:48 PM, Ketan Maheshwari <
> >> ketancmaheshwari at gmail.com > wrote:
> >>
> >> Hello,
> >>
> >> Does a set of predefined gensites templates for common clusters
> >> exists somewhere?
> >>
> >> I could see an example for surveyor here:
> >> https://sites.google.com/site/swiftparallelscripting/home/managingsites
> >>
> >> Ketan
> >>
> >> _______________________________________________
> >> Swift-devel mailing list
> >> Swift-devel at ci.uchicago.edu
> >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >
> > --
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
> >

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From bugzilla-daemon at mcs.anl.gov Thu Mar 31 17:28:17 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Thu, 31 Mar 2011 17:28:17 -0500 (CDT)
Subject: [Swift-devel] [Bug 285] New: Improve import statement
Message-ID:

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=285

Summary: Improve import statement
Product: Swift
Version: 0.93
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: SwiftScript language
AssignedTo: skenny at uchicago.edu
ReportedBy: wilde at mcs.anl.gov
CC: mandaya at rose-hulman.edu

Allow import to be anywhere in file (maybe not important given the way
Swift handles definitions?) (Or give a clear error message if not.)

Allow import from an INCLUDEPATH (higher prio)

Allow dir names in files to import, relative to each entry in dirpath
(low prio)

Needs some discussion and usage testing; John Dennis of NCAR wants to
use.

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the reporter.

From bugzilla-daemon at mcs.anl.gov Thu Mar 31 17:34:54 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Thu, 31 Mar 2011 17:34:54 -0500 (CDT)
Subject: [Swift-devel] [Bug 287] New: Swift loops with no explanation when no pending jobs will fit into any possible coaster block
Message-ID:

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=287

Summary: Swift loops with no explanation when no pending jobs will fit into any possible coaster block
Product: Swift
Version: 0.93
Platform: All
OS/Version: All
Status: NEW
Severity: major
Priority: P1
Component: SwiftScript language
AssignedTo: hategan at mcs.anl.gov
ReportedBy: wilde at mcs.anl.gov
CC: hategan at mcs.anl.gov

Example:

tc entry is:

localhost cat /bin/cat null null GLOBUS::maxwalltime="00:05:00"

sites pool is:

1 1 1 1 120 100 100 0.00 10000 /home/wilde/swiftwork

cat app declares need for 5 mins walltime
only possible coaster slot is 2 mins walltime

so Swift just loops with a job in the queue that never gets run:

RunID: 20110331-1702-3kfa6xa3
Progress:
Progress: Initializing site shared directory:1
Progress: Stage in:1
Progress: Submitted:1
Progress: Submitted:1

User never gets an error like "No coaster slots exist with sufficient
time remaining to run your job."

I think the coaster block times out for inactivity, another one starts,
and nothing gets run, and the user is left in the dark as to why.

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.
You are watching someone on the CC list of the bug.
You are watching the reporter.

From wilde at mcs.anl.gov Thu Mar 31 17:39:15 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 31 Mar 2011 17:39:15 -0500 (CDT)
Subject: [Swift-devel] Re: bug-zilla import
In-Reply-To:
Message-ID: <2120723126.53046.1301611155341.JavaMail.root@zimbra.anl.gov>

That certainly does, in large part, Jon!

I'm cc'ing John Dennis here and I apologize to you for my forgetfulness!

We need to get this into the User Guide (along with the other built-in
functions that you and Justin and others added).

Nice piece of development work!

- Mike

----- Original Message -----
> Sent that a bit early.
> The INCLUDEPATH variable is SWIFT_LIB. export
> SWIFT_LIB="/home/user/Swift_files" and then do import "hello". It will
> check relatively for hello.swift and if its not there it will then
> check the SWIFT_LIB variable. Doesn't that satisfy the bug-zilla
> report?
>
> On Thu, Mar 31, 2011 at 5:32 PM, Jonathan Monette <
> jon.monette at gmail.com > wrote:
>
> Mike,
> The bug-zilla report you just filed on the import, doesn't it do most
> of that already? I had added modifications during the summer to
> improve import a bit. You can do import "test/test_script" and it will
> import test_script from the test directory relative to the calling
> script. The INCLUDEPATH
>
> --
> Any intelligent fool can make things bigger and more complex... It
> takes a touch of genius - and a lot of courage to move in the opposite
> direction.
> - Albert Einstein -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From jon.monette at gmail.com Thu Mar 31 17:40:33 2011 From: jon.monette at gmail.com (Jonathan Monette) Date: Thu, 31 Mar 2011 17:40:33 -0500 Subject: [Swift-devel] Re: bug-zilla import In-Reply-To: <2120723126.53046.1301611155341.JavaMail.root@zimbra.anl.gov> References: <2120723126.53046.1301611155341.JavaMail.root@zimbra.anl.gov> Message-ID: Don't worry about it. I believe I am the only that uses this so it's fine. Glad it's getting use now. On Thu, Mar 31, 2011 at 5:39 PM, Michael Wilde wrote: > That certainly does, in large part, Jon! > > I'm cc'ing John Dennis here and I apologize to you for my forgetfullness! > > We need to get this into the User Guide (along with the other built in > functions that you and Justin and others added) > > Nice piece of development work! > > - Mike > > > ----- Original Message ----- > > Sent that a bit early. > > The INCLUDEPATH variable is SWIFT_LIB. export > > SWIFT_LIB="/home/user/Swift_files" and then do import "hello". It will > > check relatively for hello.swift and if its not there it will then > > check the SWIFT_LIB variable. Doesn't that satisfy the bug-zilla > > report? > > > > > > On Thu, Mar 31, 2011 at 5:32 PM, Jonathan Monette < > > jon.monette at gmail.com > wrote: > > > > > > Mike, > > The bug-zilla report you just filed on the import, doesn't it do most > > of that already? I had added modifications during the summer to > > improve import a bit. You can do import "test/test_script" and it will > > import test_script from the test directory relative to the calling > > script. The INCLUDEPATH > > > > -- > > Any intelligent fool can make things bigger and more complex... It > > takes a touch of genius - and a lot of courage to move in the opposite > > direction. 
> > - Albert Einstein > > > > > > > > > > > > -- > > Any intelligent fool can make things bigger and more complex... It > > takes a touch of genius - and a lot of courage to move in the opposite > > direction. > > - Albert Einstein > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > -- Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction. - Albert Einstein -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at mcs.anl.gov Thu Mar 31 20:14:28 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 31 Mar 2011 20:14:28 -0500 (CDT) Subject: [Swift-devel] duplicated job submission in swift-0.92? In-Reply-To: <1301597554.1319.5.camel@blabla2.none> Message-ID: <521870354.53458.1301620468642.JavaMail.root@zimbra.anl.gov> ----- Original Message ----- > We decided the following: > - I will revert the changes in the 0.92 branch > - re-commit bug fixes that were committed after the merge > - merge the 0.92 branch to trunk > - fix the problems in trunk Sounds good. But when and how does the fix get to users? Either create a 0.92.1 release (sounds hard based on above) or create a 0.93 release (in which case should we create the 0.93 branch from trunk as soon as this is fixed?) How long to re-test? (Thats a question for Sarah, Justin, and Ketan) Could this include the Cray support mods? > One issue that came up was the exact "workflow" in the > branching/releasing/merging. I think we discussed this previously on > this mailing list, and it may just be that Sarah has made a wiki page > about it. 
> But I'll summarize:
> - before a release, trunk is branched into a release branch
> - heavy development may continue in trunk while the branch only sees
>   bug fixes
> - after the release, the branch is merged back to trunk (so that bug
>   fixes in the branch make it back to trunk)
> - rinse and repeat

Sounds good. Add to this a general approach for fix-only point releases
to a release if waiting for the next major release will take too long?
(I.e., 0.92.1 in the current case?) I suspect we'd benefit from
release-often in smaller increments.

Also, the above workflow should consider how harder changes like the
fast branch or coaster block allocator would be done. A parallel
development branch in which the developer does frequent upgrades to
stay close to trunk?

All above sounds very good. Sarah, is this indeed on the ReleasePlans
page, and if not, could you put it there, and we can refine that to
cover some of the issues above and discuss?

- Mike

> Mihael
>
> On Thu, 2011-03-31 at 13:35 -0500, Justin M Wozniak wrote:
> > Sure: (630) 252-3351
> >
> > On Thu, 31 Mar 2011, Mihael Hategan wrote:
> >
> > > Can we have a quick phone chat about this?
> > >
> > > On Thu, 2011-03-31 at 12:48 -0500, Justin M Wozniak wrote:
> > >> By doing a bisection I found that the regression seems to occur
> > >> in Swift trunk between 3835:3836 (around 01/01/2011). Still
> > >> looking...
> > >>
> > >> On Wed, 30 Mar 2011, Mihael Hategan wrote:
> > >>
> > >>> On Wed, 2011-03-30 at 22:12 -0500, Michael Wilde wrote:
> > >>>>
> > >>>> ----- Original Message -----
> > >>>>> Why was trunk merged into the stable branch?
> > >>>>
> > >>>> Can you clarify what you mean here?
> > >>>
> > >>> Cog trunk was merged into cog branch 4.1.8 which corresponds to
> > >>> swift 0.92.
> > >>>
> > >>>> Trunk merged into *what* stable branch, when?
> > >>>>
> > >>>> Do you mean before 0.92 was generated, or after?
> > >>>
> > >>> After.
> > >>> The 0.92 branch contains code from trunk that was added to trunk
> > >>> after 0.92 was merged. In particular 0.92 was, I think, not
> > >>> supposed to contain stuff from the fast branch and it does.
> > >>>
> > >>>> Is 0.92 not tagged?
> > >>>
> > >>> It's a branch.
> > >>>
> > >>> _______________________________________________
> > >>> Swift-devel mailing list
> > >>> Swift-devel at ci.uchicago.edu
> > >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From hategan at mcs.anl.gov Thu Mar 31 20:25:36 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 31 Mar 2011 18:25:36 -0700
Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To: <521870354.53458.1301620468642.JavaMail.root@zimbra.anl.gov>
References: <521870354.53458.1301620468642.JavaMail.root@zimbra.anl.gov>
Message-ID: <1301621136.11290.8.camel@blabla2.none>

On Thu, 2011-03-31 at 20:14 -0500, Michael Wilde wrote:
> ----- Original Message -----
> > We decided the following:
> > - I will revert the changes in the 0.92 branch
> > - re-commit bug fixes that were committed after the merge
> > - merge the 0.92 branch to trunk
> > - fix the problems in trunk
>
> Sounds good. But when and how does the fix get to users?

The package(s) are fine. Though we should probably also have a source
package. The merge was done after the package(s) were uploaded to the
swift site.

This only affects folks who have checked out from SVN the 0.92 branch
after the merge 9 days (or so) ago.

We should send an email to the user list once this is fixed. We may also
want to send an email warning them not to check out from SVN but
download the precompiled package instead.

I am a bit confused though. I would have expected the release to come
with some announcement of some form.

> Either create a 0.92.1 release (sounds hard based on above)
> or create a 0.93 release (in which case should we create the 0.93
> branch from trunk as soon as this is fixed?)
>
> How long to re-test? (Thats a question for Sarah, Justin, and Ketan)
> Could this include the Cray support mods?

No! Fixing a problem is not a venue for introducing untested things into
a release.

But it could be discussed separately :)

Mihael

From wilde at mcs.anl.gov Thu Mar 31 20:35:55 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 31 Mar 2011 20:35:55 -0500 (CDT)
Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To: <1301621136.11290.8.camel@blabla2.none>
Message-ID: <1777809980.53483.1301621755597.JavaMail.root@zimbra.anl.gov>

----- Original Message -----
> On Thu, 2011-03-31 at 20:14 -0500, Michael Wilde wrote:
> > ----- Original Message -----
> > > We decided the following:
> > > - I will revert the changes in the 0.92 branch
> > > - re-commit bug fixes that were committed after the merge
> > > - merge the 0.92 branch to trunk
> > > - fix the problems in trunk
> >
> > Sounds good. But when and how does the fix get to users?
>
> The package(s) are fine. Though we should probably also have a source
> package. The merge was done after the package(s) were uploaded to the
> swift site.

Ah, great!

> This only affects folks who have checked out from SVN the 0.92 branch
> after the merge 9 days (or so) ago.

Hmm - I question that. The release we use, based on 0.92 on Beagle,
shows the twice-each error, and it was made on Feb 25, about 35 days
ago. Does this merit clarification?

> We should send an email to the user list once this is fixed. We may
> also want to send an email warning them not to check out from SVN but
> download the precompiled package instead.

OK. I can't say that this will reach everyone. Perhaps some status
notes on the Download page are in order. The 0.91 link there is wrong,
so we need to fix that page anyways.

> I am a bit confused though. I would have expected the release to come
> with some announcement of some form.

Agreed. We kept this low profile because we were trying to coordinate
it with a Web change that we never accomplished. And we've lost the
habit of swift-user announcements but got to get back to doing that.
So, yes.

> > Either create a 0.92.1 release (sounds hard based on above)
> > or create a 0.93 release (in which case should we create the 0.93
> > branch from trunk as soon as this is fixed?)
> >
> > How long to re-test? (Thats a question for Sarah, Justin, and Ketan)
> > Could this include the Cray support mods?
>
> No! Fixing a problem is not a venue for introducing untested things
> into a release.

I meant the Cray feature for 0.93 not 0.92.1
Yes, that should be tested.
But it's being used pretty heavily.

- Mike

> But it could be discussed separately :)
>
> Mihael

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wilde at mcs.anl.gov Thu Mar 31 20:48:02 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 31 Mar 2011 20:48:02 -0500 (CDT)
Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To: <1777809980.53483.1301621755597.JavaMail.root@zimbra.anl.gov>
Message-ID: <374065344.53495.1301622482774.JavaMail.root@zimbra.anl.gov>

Please check this proposed warning on the Downloads page and let me
know if it's what we need there:

http://www.ci.uchicago.edu/~wilde/swift/downloads/index.php

I also fixed the 0.91 typo (but the downloads don't actually work from
this test web. I think they will once this is committed and pushed
live).
- Mike -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wilde at mcs.anl.gov Thu Mar 31 20:57:06 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 31 Mar 2011 20:57:06 -0500 (CDT) Subject: [Swift-devel] duplicated job submission in swift-0.92? In-Reply-To: <374065344.53495.1301622482774.JavaMail.root@zimbra.anl.gov> Message-ID: <1310504872.53534.1301623026989.JavaMail.root@zimbra.anl.gov> And I will send this to swift-user: "Dear Swift Users, On March 29 we discovered that the Release 0.92 branches of the Swift and CoG trees were changed after the release and a concurrency bug was introduced. If you are running Swift from this *source code* base, please revert back to a known-working release such as the 0.92 binary release if at all possible. We're working on restoring the 0.92 SVN branch to the correct state and will report back to this email list when that is done." Anything else to say? Feel free to send this out, adjusted as needed, or just tell me what to change and I will.
- Mike -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hategan at mcs.anl.gov Thu Mar 31 21:03:19 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 31 Mar 2011 19:03:19 -0700 Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To: <1777809980.53483.1301621755597.JavaMail.root@zimbra.anl.gov> References: <1777809980.53483.1301621755597.JavaMail.root@zimbra.anl.gov> Message-ID: <1301623399.12764.2.camel@blabla2.none> On Thu, 2011-03-31 at 20:35 -0500, Michael Wilde wrote: > > This only affects folks who have checked out from SVN the 0.92 branch > > after the merge 9 days (or so) ago. > > Hmm - I question that. The release we use, based on 0.92 on Beagle, > shows the twice-each error, and it was made on Feb 25, about 35 days > ago. Does this merit clarification? Perhaps. Did you do an update in the mean time? [...] > > I meant the Cray feature for 0.93 not 0.92.1 > Yes, that should be tested. > But its being used pretty heavily. I agree with that. Mihael From hategan at mcs.anl.gov Thu Mar 31 21:04:35 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 31 Mar 2011 19:04:35 -0700 Subject: [Swift-devel] duplicated job submission in swift-0.92? In-Reply-To: <1310504872.53534.1301623026989.JavaMail.root@zimbra.anl.gov> References: <1310504872.53534.1301623026989.JavaMail.root@zimbra.anl.gov> Message-ID: <1301623475.12764.3.camel@blabla2.none> I think both are good as they are. Would you like me to send it? Mihael On Thu, 2011-03-31 at 20:57 -0500, Michael Wilde wrote: > And I will send this to swift-user: > > "Dear Swift Users, > > On March 29 we discovered that the Release 0.92 branches of the Swift and CoG trees were changed after the release and a concurrency bug was introduced. If you are running Swift from this *source code* base, please revert back to a known-working release such as the 0.92 binary release if at all possible. > > We're working on restoring the 0.92 SVN branch to the correct state and will report back to this email list when that is done." > > Anything else to say? Feel free to send this out, adjusted as needed, or just tell me what to change and I will. 
> > - Mike From jon.monette at gmail.com Thu Mar 31 21:34:10 2011 From: jon.monette at gmail.com (Jonathan Monette) Date: Thu, 31 Mar 2011 21:34:10 -0500 Subject: [Swift-devel] gensites slots problem Message-ID: I have #site
slots=100 in my sites.properties. I run gensites -L templates -p sites.properties pads-coasters.xml and I get the error "Not specified: SLOTS" This is because gensites does not parse out the slots line from the sites.properties file but then tries to replace _SLOT_ variable. This problem is in both 0.92 and trunk. -- Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction. - Albert Einstein -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at mcs.anl.gov Thu Mar 31 21:41:52 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 31 Mar 2011 21:41:52 -0500 (CDT) Subject: [Swift-devel] Re: gensites slots problem In-Reply-To: Message-ID: <1792519212.53589.1301625711999.JavaMail.root@zimbra.anl.gov> Can you commit a fix, Jon? Thanks, Mike ----- Original Message ----- > I have #site slots=100 in my sites.properties. I run gensites -L > templates -p sites.properties pads-coasters.xml and I get the error > "Not specified: SLOTS" This is because gensites does not parse out the > slots line from the sites.properties file but then tries to replace > _SLOT_ variable. This problem is in both 0.92 and trunk. > > -- > Any intelligent fool can make things bigger and more complex... It > takes a touch of genius - and a lot of courage to move in the opposite > direction. > - Albert Einstein -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From jon.monette at gmail.com Thu Mar 31 21:43:24 2011 From: jon.monette at gmail.com (Jonathan Monette) Date: Thu, 31 Mar 2011 21:43:24 -0500 Subject: [Swift-devel] Re: gensites slots problem In-Reply-To: <1792519212.53589.1301625711999.JavaMail.root@zimbra.anl.gov> References: <1792519212.53589.1301625711999.JavaMail.root@zimbra.anl.gov> Message-ID: yea. I'll commit a fix into the trunk. 
I don't have the 0.92 branch checked out, but if a 0.93 release is coming out soon the fix can go in there. On Thu, Mar 31, 2011 at 9:41 PM, Michael Wilde wrote: > Can you commit a fix, Jon? > > Thanks, > > Mike > > ----- Original Message ----- > > I have #site slots=100 in my sites.properties. I run gensites -L > > templates -p sites.properties pads-coasters.xml and I get the error > > "Not specified: SLOTS" This is because gensites does not parse out the > > slots line from the sites.properties file but then tries to replace > > _SLOT_ variable. This problem is in both 0.92 and trunk. > > > > -- > > Any intelligent fool can make things bigger and more complex... It > > takes a touch of genius - and a lot of courage to move in the opposite > > direction. > > - Albert Einstein > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > -- Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction. - Albert Einstein -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at mcs.anl.gov Thu Mar 31 22:27:34 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 31 Mar 2011 22:27:34 -0500 (CDT) Subject: [Swift-devel] Re: Your message to Swift-commit awaits moderator approval In-Reply-To: Message-ID: <1615879940.53640.1301628454283.JavaMail.root@zimbra.anl.gov> Jon, I'm not sure what's happening with these commit notifications. You should subscribe to the list, but I marked you as "accept" on this list for now. I cleaned out all the other pending notifications. We should add instructions for new developers/committers to subscribe to this list (in a general info page for new developers). What's confusing me here is that I also got an "awaits moderator approval" message even though I'm on the list.
- Mike ----- Original Message ----- > Just bringing this to your attention. > > > ---------- Forwarded message ---------- > From: < swift-commit-bounces at ci.uchicago.edu > > Date: Thu, Mar 31, 2011 at 10:12 PM > Subject: Your message to Swift-commit awaits moderator approval > To: jonmon at ci.uchicago.edu > > > Your mail to 'Swift-commit' with the subject > > r4236 - trunk/bin > > Is being held until the list moderator can review it for approval. > > The reason it is being held: > > Post by non-member to a members-only list > > Either the message will get posted to the list, or you will receive > notification of the moderator's decision. If you would like to cancel > this posting, please visit the following URL: > > http://mail.ci.uchicago.edu/mailman/confirm/swift-commit/82735f2236917ed3bfa130c8b93c84dc223123df > > > > > -- > Any intelligent fool can make things bigger and more complex... It > takes a touch of genius - and a lot of courage to move in the opposite > direction. > - Albert Einstein -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From bugzilla-daemon at mcs.anl.gov Thu Mar 31 22:44:35 2011 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Thu, 31 Mar 2011 22:44:35 -0500 (CDT) Subject: [Swift-devel] [Bug 289] New: Add mechanism to delete temporary files no longer in scope Message-ID: https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=289 Summary: Add mechanism to delete temporary files no longer in scope Product: Swift Version: 0.93 Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P1 Component: SwiftScript language AssignedTo: hategan at mcs.anl.gov ReportedBy: wilde at mcs.anl.gov NCAR (John Dennis) has requested a mechanism to allow a script to declare that all (or specific) anonymously-mapped temporary files be deleted when they can no longer be referenced within the script. 
This form of automated file garbage collection would greatly simplify their scripts, as they do this today using explicit rm() functions called from Swift. There is a fair bit of discussion in swift-devel threads on this topic from Summer 2010, as well as details in a paper that John/NCAR submitted on their use of Swift. I marked this priority 1 because NCAR is an important user community for the Swift project. This feature needs a more precise definition and some design discussion. It's possible that the limited data flow analysis done for array closing may also be useful in determining when unmapped temporary files could be removed. -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the assignee of the bug. You are watching the reporter. From bugzilla-daemon at mcs.anl.gov Thu Mar 31 22:45:58 2011 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Thu, 31 Mar 2011 22:45:58 -0500 (CDT) Subject: [Swift-devel] [Bug 291] New: Add a exists() function to test for file existence Message-ID: https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=291 Summary: Add a exists() function to test for file existence Product: Swift Version: 0.93 Platform: PC OS/Version: Mac OS Status: NEW Severity: enhancement Priority: P1 Component: SwiftScript language AssignedTo: wozniak at mcs.anl.gov ReportedBy: wilde at mcs.anl.gov Requested by John Dennis / NCAR. -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the reporter.
From bugzilla-daemon at mcs.anl.gov Thu Mar 31 22:50:32 2011 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Thu, 31 Mar 2011 22:50:32 -0500 (CDT) Subject: [Swift-devel] [Bug 293] New: Add ability to return multiple non-file values from an app() function Message-ID: https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=293 Summary: Add ability to return multiple non-file values from an app() function Product: Swift Version: 0.93 Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P3 Component: SwiftScript language AssignedTo: hategan at mcs.anl.gov ReportedBy: wilde at mcs.anl.gov We'd like to consider adding a calling convention for app functions to return any number of arbitrary objects, based on the app executable returning a text file that provides the values in a manner similar to the text returned by the ext mapper. It's possible that we could do this using something like readData2 in a function written in Swift code above the app() function, in which case this bug turns into an action item to document how to do this nicely. -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the assignee of the bug. You are watching the reporter. From ashwin_rajeev at hotmail.com Sat Mar 19 12:35:27 2011 From: ashwin_rajeev at hotmail.com (ashwin rajeev) Date: Sat, 19 Mar 2011 23:05:27 +0530 Subject: [Swift-devel] [swift-dev] FW: In-Reply-To: References: Message-ID: I am using Swift 1.0-beta9 to connect to swift at rooms.swift.im. I can see all the others chatting, but I cannot post any message there. It is showing "Couldn't send message: Message was rejected". Am I supposed to do anything more to chat in swift at rooms.swift.im? -------------- next part -------------- An HTML attachment was scrubbed...
URL: From tianyu491433909 at 163.com Sun Mar 20 02:18:48 2011 From: tianyu491433909 at 163.com (tianyu491433909) Date: Sun, 20 Mar 2011 15:18:48 +0800 (CST) Subject: [Swift-devel] a swift question Message-ID: <65e88291.ac6c.12ed221696f.Coremail.tianyu491433909@163.com> Dear Mr/Ms: I'm doing work on the swift, please help me solve a problem. May I ask how swift is running that script has a few nodes, where the code thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: