From aespinosa at cs.uchicago.edu Fri Apr 1 05:35:00 2011
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Fri, 1 Apr 2011 05:35:00 -0500
Subject: [Swift-devel] [Bug 291] New: Add a exists() function to test for file existence
In-Reply-To:
References:
Message-ID:

Would it be better to handle it as an exception?

type file;

app foo (file x) {
  foo x;
}

app (file x) produce_a () {
}

file a <"dne">;

try {
  foo(a);
}
catch (dne) {
  a = produce_a();
  foo(a);
}

2011/3/31 :
> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=291
>
>            Summary: Add a exists() function to test for file existence
>            Product: Swift
>            Version: 0.93
>           Platform: PC
>         OS/Version: Mac OS
>             Status: NEW
>           Severity: enhancement
>           Priority: P1
>          Component: SwiftScript language
>         AssignedTo: wozniak at mcs.anl.gov
>         ReportedBy: wilde at mcs.anl.gov
>
>
> Requested by John Dennis / NCAR.
>
> --
> Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
> ------- You are receiving this mail because: -------
> You are watching the reporter.
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>

--
Allan M.
Espinosa
PhD student, Computer Science
University of Chicago

From bugzilla-daemon at mcs.anl.gov Fri Apr 1 05:49:34 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 1 Apr 2011 05:49:34 -0500 (CDT)
Subject: [Swift-devel] [Bug 299] New: deadlock on workflow
Message-ID:

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=299

           Summary: deadlock on workflow
           Product: Swift
           Version: 0.92
          Platform: PC
        OS/Version: Linux
            Status: ASSIGNED
          Severity: major
          Priority: P2
         Component: SwiftScript language
        AssignedTo: hategan at mcs.anl.gov
        ReportedBy: aespinosa at cs.uchicago.edu


Related thread:
http://mail.ci.uchicago.edu/pipermail/swift-devel/2011-February/007476.html

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

From bugzilla-daemon at mcs.anl.gov Fri Apr 1 05:53:26 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 1 Apr 2011 05:53:26 -0500 (CDT)
Subject: [Swift-devel] [Bug 301] New: hang checker waiting on a partially executed job
Message-ID:

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=301

           Summary: hang checker waiting on a partially executed job
           Product: Swift
           Version: trunk
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: SwiftScript language
        AssignedTo: hategan at mcs.anl.gov
        ReportedBy: aespinosa at cs.uchicago.edu


Swift is listening for a job that is at least in the 'Initializing shared
directory' state. The job does not progress after that.

Related thread:
http://mail.ci.uchicago.edu/pipermail/swift-devel/2011-March/007662.html

The hanging job may be related to Bug 299
(https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=299)

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

From bugzilla-daemon at mcs.anl.gov Fri Apr 1 06:14:59 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 1 Apr 2011 06:14:59 -0500 (CDT)
Subject: [Swift-devel] [Bug 273] resume is currently broken
In-Reply-To:
References:
Message-ID: <20110401111459.389801C072@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=273

Allan Espinosa changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|resume is currently broken  |resume is currently broken
                   |(trunk)                     |

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

From bugzilla-daemon at mcs.anl.gov Fri Apr 1 06:38:27 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 1 Apr 2011 06:38:27 -0500 (CDT)
Subject: [Swift-devel] [Bug 307] New: array slicing
Message-ID:

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=307

           Summary: array slicing
           Product: Swift
           Version: trunk
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: SwiftScript language
        AssignedTo: hategan at mcs.anl.gov
        ReportedBy: aespinosa at cs.uchicago.edu


Feature request:
http://mail.ci.uchicago.edu/pipermail/swift-user/2011-March/001881.html

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

From tim.g.armstrong at gmail.com Fri Apr 1 10:22:55 2011
From: tim.g.armstrong at gmail.com (Tim Armstrong)
Date: Fri, 1 Apr 2011 10:22:55 -0500
Subject: [Swift-devel] [Bug 291] New: Add a exists() function to test for file existence
In-Reply-To:
References:
Message-ID:

Try/catch statements don't have a clean interpretation if you're trying to
follow the flow of data through the program.

On Fri, Apr 1, 2011 at 5:35 AM, Allan Espinosa wrote:

> Would it be better to handle it as an exception?
>
> type file;
>
> app foo (file x) {
>   foo x;
> }
>
> app (file x) produce_a () {
> }
>
> file a <"dne">;
>
> try {
>   foo(a);
> }
> catch (dne) {
>   a = produce_a();
>   foo(a);
> }
>
> 2011/3/31 :
> > https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=291
> >
> > Summary: Add a exists() function to test for file existence
> > Product: Swift
> > Version: 0.93
> > Platform: PC
> > OS/Version: Mac OS
> > Status: NEW
> > Severity: enhancement
> > Priority: P1
> > Component: SwiftScript language
> > AssignedTo: wozniak at mcs.anl.gov
> > ReportedBy: wilde at mcs.anl.gov
> >
> >
> > Requested by John Dennis / NCAR.
> >
> > --
> > Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
> > ------- You are receiving this mail because: -------
> > You are watching the reporter.
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >
>
> --
> Allan M. Espinosa
> PhD student, Computer Science
> University of Chicago
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tim.g.armstrong at gmail.com Fri Apr 1 10:23:25 2011
From: tim.g.armstrong at gmail.com (Tim Armstrong)
Date: Fri, 1 Apr 2011 10:23:25 -0500
Subject: [Swift-devel] Re: Your message to Swift-commit awaits moderator approval
In-Reply-To: <1615879940.53640.1301628454283.JavaMail.root@zimbra.anl.gov>
References: <1615879940.53640.1301628454283.JavaMail.root@zimbra.anl.gov>
Message-ID:

I've also been having the same problem.

On Thu, Mar 31, 2011 at 10:27 PM, Michael Wilde wrote:

> Jon, I'm not sure what's happening with these commit notifications.
> You should subscribe to the list, but I marked you as "accept" on this list
> for now.
>
> I cleaned out all the other pending notifications.
>
> We should add instructions for new developers/committers to subscribe to
> this list (in a general info page for new developers).
>
> What's confusing me here is that I also got an "awaits moderator approval"
> even though I'm on the list.
>
> - Mike
>
>
> ----- Original Message -----
> > Just bringing this to your attention.
> >
> >
> > ---------- Forwarded message ----------
> > From: < swift-commit-bounces at ci.uchicago.edu >
> > Date: Thu, Mar 31, 2011 at 10:12 PM
> > Subject: Your message to Swift-commit awaits moderator approval
> > To: jonmon at ci.uchicago.edu
> >
> >
> > Your mail to 'Swift-commit' with the subject
> >
> > r4236 - trunk/bin
> >
> > Is being held until the list moderator can review it for approval.
> >
> > The reason it is being held:
> >
> > Post by non-member to a members-only list
> >
> > Either the message will get posted to the list, or you will receive
> > notification of the moderator's decision. If you would like to cancel
> > this posting, please visit the following URL:
> >
> > http://mail.ci.uchicago.edu/mailman/confirm/swift-commit/82735f2236917ed3bfa130c8b93c84dc223123df
> >
> >
> > --
> > Any intelligent fool can make things bigger and more complex... It
> > takes a touch of genius - and a lot of courage to move in the opposite
> > direction.
> > - Albert Einstein
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From aespinosa at cs.uchicago.edu Fri Apr 1 10:32:18 2011
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Fri, 1 Apr 2011 10:32:18 -0500
Subject: [Swift-devel] Request: replies to bugzilla reports on the list should reflect on the bug page as well
Message-ID:

I don't know how to do this in Bugzilla, but in the OSG GOC bug tracker my
email replies to a bug get reflected on the ticket page.

-Allan

From wilde at mcs.anl.gov Fri Apr 1 10:51:03 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Fri, 1 Apr 2011 10:51:03 -0500 (CDT)
Subject: [Swift-devel] [Bug 291] New: Add a exists() function to test for file existence
In-Reply-To:
Message-ID: <1299017498.54908.1301673063970.JavaMail.root@zimbra.anl.gov>

----- Original Message -----
> Try/catch statements don't have a clean interpretation if you're trying
> to follow the flow of data through the program.

I can't comment on this, as it's likely a point of programming style that
could involve much debate.

On the issue of an exists() function, I feel:

- we should first verify that exists() will solve the NCAR need in a
  sufficiently clean way

- if so, implement it, because it's easy to do

- I do favor an exception handling model in the language, but feel that
  this is a much more complex PL and systems research project, far more
  complex just to design, not to mention implement, than the exists()
  function. So even if we concluded that exceptions and try/catch would be
  cleaner, we don't have the resources or the driving requirement to do
  that in any near time frame. I think it would be a year-long MS-scale
  project.

- Mike

>
>
> On Fri, Apr 1, 2011 at 5:35 AM, Allan Espinosa < aespinosa at cs.uchicago.edu > wrote:
>
>
> Would it be better to handle it as an exception?
>
> type file;
>
> app foo (file x) {
>   foo x;
> }
>
> app (file x) produce_a () {
> }
>
> file a <"dne">;
>
> try {
>   foo(a);
> }
> catch (dne) {
>   a = produce_a();
>   foo(a);
> }
>
> 2011/3/31 < bugzilla-daemon at mcs.anl.gov >:
>
>
> > https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=291
> >
> > Summary: Add a exists() function to test for file existence
> > Product: Swift
> > Version: 0.93
> > Platform: PC
> > OS/Version: Mac OS
> > Status: NEW
> > Severity: enhancement
> > Priority: P1
> > Component: SwiftScript language
> > AssignedTo: wozniak at mcs.anl.gov
> > ReportedBy: wilde at mcs.anl.gov
> >
> >
> > Requested by John Dennis / NCAR.
> >
> > --
> > Configure bugmail:
> > https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
> > ------- You are receiving this mail because: -------
> > You are watching the reporter.
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >
>
>
> --
> Allan M.
Espinosa < http://amespinosa.wordpress.com >
> PhD student, Computer Science
> University of Chicago < http://people.cs.uchicago.edu/~aespinosa >
>
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From hategan at mcs.anl.gov Fri Apr 1 12:32:57 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 01 Apr 2011 10:32:57 -0700
Subject: [Swift-devel] [Bug 291] New: Add a exists() function to test for file existence
In-Reply-To: <1299017498.54908.1301673063970.JavaMail.root@zimbra.anl.gov>
References: <1299017498.54908.1301673063970.JavaMail.root@zimbra.anl.gov>
Message-ID: <1301679177.25559.0.camel@blabla2.none>

On Fri, 2011-04-01 at 10:51 -0500, Michael Wilde wrote:

> - we should first verify that exists() will solve the NCAR need in a
> sufficiently clean way

I think this is important. Can we get a description of the problem instead
of a (otherwise) random proposal for a solution?

From wilde at mcs.anl.gov Fri Apr 1 12:51:29 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Fri, 1 Apr 2011 12:51:29 -0500 (CDT)
Subject: [Swift-devel] [Bug 291] New: Add a exists() function to test for file existence
In-Reply-To: <1301679177.25559.0.camel@blabla2.none>
Message-ID: <729744535.55640.1301680289613.JavaMail.root@zimbra.anl.gov>

Basically, as far as I understand, the presence or absence of a particular
data file within the input dataset is to be used to determine whether the
code to process that dataset subsection gets invoked or not:

if (exists("extra.data")) {
  DataFile extraInput<"extra.data">;
  extraResult = analyze(extraInput);
}

The above is my assumption based on a phone call. We can and should verify
the assumption with a simple example.

I also thought we could try this today by seeing if extraInput can be an
array, mapped to zero items if there is nothing to do and one item if there
is something to do. That would at least let us test the use case.

John, can you verify whether the example Swift lines above are what you are
looking for here?

- Mike

----- Original Message -----
> On Fri, 2011-04-01 at 10:51 -0500, Michael Wilde wrote:
>
> > - we should first verify that exists() will solve the NCAR need in a
> > sufficiently clean way
>
> I think this is important. Can we get a description of the problem
> instead of a (otherwise) random proposal for a solution?

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wilde at mcs.anl.gov Fri Apr 1 13:25:02 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Fri, 1 Apr 2011 13:25:02 -0500 (CDT)
Subject: [Swift-devel] [Bug 291] New: Add a exists() function to test for file existence
In-Reply-To: <2A5A416A-CAB6-4D82-B55F-0CFBC8F3B770@ucar.edu>
Message-ID: <114092944.55753.1301682302213.JavaMail.root@zimbra.anl.gov>

John, I cc'ed you to confirm that the function is exactly what you were
looking for, and that the simple code below matches exactly the requirement
that's driving this feature request.

Can you confirm that both of these are true?

We just want to make sure that we understand the need and determine whether
the proposed exists() function is the best way to address it.

From your comment below, I was not quite sure if exists() is exactly the
right approach here.

Thanks,

Mike

----- Original Message -----
> Michael,
>
> This type of function would be great to have.
>
> John
> On Apr 1, 2011, at 11:51 AM, Michael Wilde wrote:
>
> > Basically, as far as I understand, the presence or absence of a
> > particular data file within the input dataset is to be used to
> > determine whether the code to process that dataset subsection gets
> > invoked or not:
> >
> > if (exists("extra.data")) {
> >   DataFile extraInput<"extra.data">;
> >   extraResult = analyze(extraInput);
> > }
> >
> > The above is my assumption based on a phone call. We can and should
> > verify the assumption with a simple example.
> >
> > I also thought we could try this today by seeing if extraInput can be
> > an array, mapped to zero items if there is nothing to do and one item
> > if there is something to do. That would at least let us test the use
> > case.
> >
> > John, can you verify whether the example Swift lines above are what
> > you are looking for here?
> >
> > - Mike
> >
> > ----- Original Message -----
> >> On Fri, 2011-04-01 at 10:51 -0500, Michael Wilde wrote:
> >>
> >>> - we should first verify that exists() will solve the NCAR need in
> >>> a sufficiently clean way
> >>
> >> I think this is important. Can we get a description of the problem
> >> instead of a (otherwise) random proposal for a solution?
> >
> > --
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
> >

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From bugzilla-daemon at mcs.anl.gov Fri Apr 1 13:25:38 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 1 Apr 2011 13:25:38 -0500 (CDT)
Subject: [Swift-devel] [Bug 303] fixed_array_mapper versus array_mapper on output
In-Reply-To:
References:
Message-ID: <20110401182538.E0DED1C072@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=303

Allan Espinosa changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|aespinosa at cs.uchicago.edu |hategan at mcs.anl.gov

--- Comment #2 from Allan Espinosa 2011-04-01 13:25:38 ---
Update on the latest trunk: Swift svn swift-r4208 cog-r3073 (which includes
Mihael's patch to trunk r4089:
http://mail.ci.uchicago.edu/pipermail/swift-devel/2011-February/007466.html)

fixed_array_mapper version:

type file;

app (file o[]) split(file i) {
  split "-l" 1 @filename(i) "seqout.";
}

file out[] ;
file input <"seq.in">;
out = split(input);

$swift split.swift
Swift svn swift-r4208 cog-r3073

RunID: 20110401-1319-tsq0pb48
Progress: time:0
No events in 10s.
Registered futures:
----
Waiting threads:
----
No events in 10s.
Registered futures:
----
Waiting threads:
----
Progress: time:30007 Initializing site shared directory:1
No events in 10s.
Registered futures:
----
Waiting threads:
----
...
...
...

It looks like fixed_array_mapper doesn't work anymore. out[] should be
waiting on four futures, correct?

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

From bugzilla-daemon at mcs.anl.gov Fri Apr 1 13:31:30 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 1 Apr 2011 13:31:30 -0500 (CDT)
Subject: [Swift-devel] [Bug 303] fixed_array_mapper versus array_mapper on output
In-Reply-To:
References:
Message-ID: <20110401183130.C60021C072@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=303

Allan Espinosa changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |swift-devel at ci.uchicago.edu

--- Comment #3 from Allan Espinosa 2011-04-01 13:31:30 ---
Latest trunk on array_mapper version:

type file;

app (file o[]) split(file i) {
  split "-l" 1 @filename(i) "seqout.";
}

string s[] = ["seqout.aa", "seqout.ab", "seqout.ac", "seqout.ad"];
file out[] ;
file input <"seq.in">;
out = split(input);

Swift svn swift-r4208 cog-r3073

RunID: 20110401-1328-p44eon1h
Progress: time:1
No events in 10s.
Registered futures:
file[] out Closed, no listeners
----
Waiting threads:
----
No events in 10s.
Registered futures:
file[] out Closed, no listeners
----
Waiting threads:
----
Progress: time:30008 Initializing site shared directory:1
No events in 10s.
Registered futures:
file[] out Closed, no listeners
----
Waiting threads:

I'm no longer sure whether r4089 fixed the problem, since later commits,
such as the merge from the fast branch, could have broken things.

Also, the app() split() function should be included in the list of waiting
threads once it reaches the 'initializing shared directory' state, right?

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
You are watching the assignee of the bug.

From ketancmaheshwari at gmail.com Fri Apr 1 13:46:03 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Fri, 1 Apr 2011 13:46:03 -0500
Subject: [Swift-devel] Swift Documentation Platform
Message-ID: <7CB7BF6B-7DE4-4E93-AF3D-176164F5A65C@gmail.com>

Hi,

While we close in on a decision, I am writing this as an observation on the
ongoing discussion about the Swift documentation platform, with some
thoughts at the end.

As of now we have a main, existing Swift documentation page:

[1] http://www.ci.uchicago.edu/swift/docs/index.php

In addition, we have pages on the CI wiki related to Swift documentation:

[2] http://www.ci.uchicago.edu/wiki/bin/view/SWFT/WebHome

The above link contains many useful but semi-complete/unrounded pages on
cookbooks, tutorials and many technical notes.

We have a page on the CoG wiki dedicated to Coasters:

[3] http://wiki.cogkit.org/wiki/Coasters

The pictures are very neat but slightly outdated and need updating. More
pictures would also be needed, in my opinion, to explain several of the
relatively new Coasters concepts.

We have a Google Sites site whose contents overlap with [1]; it is likely
completely redundant with [1]:

[4] https://sites.google.com/site/swiftguide/home

A wealth of information about Swift techniques, examples, issues, notes and
ideas lives in the following:

[5] https://bugzilla.mcs.anl.gov/swift/

[6] mail.ci.uchicago.edu/pipermail/swift-user

[7] http://mail.ci.uchicago.edu/pipermail/swift-devel/

[8] Mike's Swift Notes (As an attached doc)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Notes.SwiftUsage.doc
Type: application/msword
Size: 212480 bytes
Desc: not available
URL:
-------------- next part --------------

[9] \O/ <--- Mihael's head .............................................. (kidding Mihael)

This really calls for some kind of massive distillation and unification
effort. Ideally we want *only* one place that holds all the information a
user wants to know. Things could be hyperlinked to some extent, but that
has its own harms. A simple, serial, low-branched documentation would be
ideal in my opinion.

I am at the moment putting some energy into the above and trying to think
about how to structure my efforts, which I am finding a bit overwhelming.

May I have your valuable suggestions on what would be a master-stroke
strategy that could gather all these scattered gems into one nice place?

Thanks for reading this long and slightly rantish mail and for contributing
your ideas.

Cheers,
Ketan

From wozniak at mcs.anl.gov Fri Apr 1 14:06:30 2011
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Fri, 1 Apr 2011 14:06:30 -0500 (CDT)
Subject: [Swift-devel] Swift Documentation Platform
In-Reply-To: <7CB7BF6B-7DE4-4E93-AF3D-176164F5A65C@gmail.com>
References: <7CB7BF6B-7DE4-4E93-AF3D-176164F5A65C@gmail.com>
Message-ID:

Building this list is a good start. Where should we store it? You could
also add the other CoG and Karajan docs.

As of Wednesday we decided we are giving up on Google Sites for user
documentation, but we are going to continue with the swift-devel site for
internal (but publicly readable) notes.

We're thinking of putting at least a month into asciidoc. Once we have that
in place, we should be able to easily paste things into it from any source.

Justin

On Fri, 1 Apr 2011, Ketan Maheshwari wrote:

> Hi,
>
> While we close in on a decision, I am writing this as an observation on
> the ongoing discussion about the Swift documentation platform, with some
> thoughts at the end.
>
> As of now we have a main, existing Swift documentation page:
>
> [1] http://www.ci.uchicago.edu/swift/docs/index.php
>
> In addition, we have pages on the CI wiki related to Swift documentation:
>
> [2] http://www.ci.uchicago.edu/wiki/bin/view/SWFT/WebHome
>
> The above link contains many useful but semi-complete/unrounded pages on
> cookbooks, tutorials and many technical notes.
>
> We have a page on the CoG wiki dedicated to Coasters:
>
> [3] http://wiki.cogkit.org/wiki/Coasters
>
> The pictures are very neat but slightly outdated and need updating. More
> pictures would also be needed, in my opinion, to explain several of the
> relatively new Coasters concepts.
>
> We have a Google Sites site whose contents overlap with [1]; it is likely
> completely redundant with [1]:
>
> [4] https://sites.google.com/site/swiftguide/home
>
> A wealth of information about Swift techniques, examples, issues, notes
> and ideas lives in the following:
>
> [5] https://bugzilla.mcs.anl.gov/swift/
>
> [6] mail.ci.uchicago.edu/pipermail/swift-user
>
> [7] http://mail.ci.uchicago.edu/pipermail/swift-devel/
>
> [8] Mike's Swift Notes (As an attached doc)

--
Justin M Wozniak

From bugzilla-daemon at mcs.anl.gov Fri Apr 1 14:13:18 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 1 Apr 2011 14:13:18 -0500 (CDT)
Subject: [Swift-devel] [Bug 303] fixed_array_mapper versus array_mapper on output
In-Reply-To:
References:
Message-ID: <20110401191318.DBFD51C072@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=303

Allan Espinosa changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WORKSFORME

--- Comment #4 from Allan Espinosa 2011-04-01 14:13:18 ---
Proposed workaround: an ad-hoc join that converts an array of strings to a
space-separated string:

$ cat join.swift
type file;

app (file o[]) split(file i) {
  split "-l" 1 @filename(i) "seqout.";
}

string s[] =
["seqout.aa", "seqout.ab", "seqout.ac", "seqout.ad"];
string tomap[];
tomap[-1] = "";

/* ad-hoc join */
foreach si, i in s {
  tomap[i] = @strcat(tomap[i-1], si, " ");
}

file out[] ;
file input <"seq.in">;
out = split(input);

swift-r3835 cog-r2988

RunID: 20110401-1411-a88h1ak5
Progress:
Final status: Finished successfully:1

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
You are watching the assignee of the bug.

From bugzilla-daemon at mcs.anl.gov Fri Apr 1 16:23:44 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 1 Apr 2011 16:23:44 -0500 (CDT)
Subject: [Swift-devel] [Bug 303] fixed_array_mapper versus array_mapper on output
In-Reply-To:
References:
Message-ID: <20110401212344.942531C073@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=303

Michael Wilde changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
                 CC|                            |wilde at mcs.anl.gov
         Resolution|WORKSFORME                  |

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
You are watching someone on the CC list of the bug.
You are watching the assignee of the bug.
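
[Editorial sketch, not part of the archived messages.] The ad-hoc join in
Comment #4 of Bug 303 above builds a space-separated string by prefix
accumulation: each tomap[i] is tomap[i-1] plus the next name plus a space,
seeded with tomap[-1] = "". A minimal sketch of that accumulation follows.
It is written in Python rather than SwiftScript, and the function name
adhoc_join is hypothetical. How the joined string is handed to the mapper
is not shown in the archived comment, so that step is not reproduced here.

```python
def adhoc_join(parts):
    """Mimic the SwiftScript loop: tomap[i] = tomap[i-1] + parts[i] + " "."""
    tomap = {-1: ""}  # same seed as tomap[-1] = "" in the original snippet
    for i, si in enumerate(parts):
        # Each entry extends the previous prefix by one name and one space.
        tomap[i] = tomap[i - 1] + si + " "
    return tomap


s = ["seqout.aa", "seqout.ab", "seqout.ac", "seqout.ad"]
tomap = adhoc_join(s)
joined = tomap[len(s) - 1]  # the last prefix holds every name
print(repr(joined))
```

Note that the final entry keeps a trailing space, exactly as the loop in
the comment would produce; splitting it on whitespace recovers the original
list of names.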

From bugzilla-daemon at mcs.anl.gov Fri Apr 1 16:25:41 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 1 Apr 2011 16:25:41 -0500 (CDT)
Subject: [Swift-devel] [Bug 303] fixed_array_mapper versus array_mapper on output
In-Reply-To:
References:
Message-ID: <20110401212541.E3D151C073@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=303

Michael Wilde changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Platform|PC                          |All
            Version|trunk                       |0.92

--- Comment #5 from Michael Wilde 2011-04-01 16:25:41 ---
We need to turn each of these examples of array mapping into a test case and
ensure that it works on both trunk and 0.92.

Some of these examples should be added to the user guide.

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
You are watching someone on the CC list of the bug.
You are watching the assignee of the bug.

From wilde at mcs.anl.gov Fri Apr 1 16:49:58 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Fri, 1 Apr 2011 16:49:58 -0500 (CDT)
Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To: <1301623475.12764.3.camel@blabla2.none>
Message-ID: <201785481.56903.1301694598367.JavaMail.root@zimbra.anl.gov>

I think we misspoke: the posted release 0.92 also exhibits the twice-each
bug, as far as I can tell.

Mihael, Justin: can you test asap to confirm or refute that observation?

Thanks,

- Mike

which swift: ~/swift/rev/swift-0.92/bin/swift

com$ swift -version
Swift svn swift-r4157 cog-r3056

com$ cd ~/swift/lab
com$ cat zz3.swift
int arr[];
arr[0]=1;
arr[1]=2;

foreach a in arr {
  trace("for", a);
}

com$ swift zz3.swift
Swift svn swift-r4157 cog-r3056

RunID: 20110401-1645-yyy87p39
Progress:
SwiftScript trace: for, 2
SwiftScript trace: for, 1
SwiftScript trace: for, 1
SwiftScript trace: for, 2
Final status:
com$

----- Original Message -----
> I think both are good as they are.
>
> Would you like me to send it?
>
> Mihael
>
> On Thu, 2011-03-31 at 20:57 -0500, Michael Wilde wrote:
> > And I will send this to swift-user:
> >
> > "Dear Swift Users,
> >
> > On March 29 we discovered that the Release 0.92 branches of the
> > Swift and CoG trees were changed after the release and a concurrency
> > bug was introduced. If you are running Swift from this *source code*
> > base, please revert back to a known-working release such as the 0.92
> > binary release if at all possible.
> >
> > We're working on restoring the 0.92 SVN branch to the correct state
> > and will report back to this email list when that is done."
> >
> > Anything else to say? Feel free to send this out, adjusted as
> > needed, or just tell me what to change and I will.
> >
> > - Mike
> >
> >
> > ----- Original Message -----
> > > Please check this proposed warning on the Downloads page and let me
> > > know if it's what we need there:
> > >
> > > http://www.ci.uchicago.edu/~wilde/swift/downloads/index.php
> > >
> > > I also fixed the 0.91 typo (but the downloads don't actually work
> > > from this test web. I think they will once this is committed and
> > > pushed live).
> > >
> > > - Mike
> > >
> > >
> > > ----- Original Message -----
> > > > ----- Original Message -----
> > > > > On Thu, 2011-03-31 at 20:14 -0500, Michael Wilde wrote:
> > > > > > ----- Original Message -----
> > > > > > > We decided the following:
> > > > > > > - I will revert the changes in the 0.92 branch
> > > > > > > - re-commit bug fixes that were committed after the merge
> > > > > > > - merge the 0.92 branch to trunk
> > > > > > > - fix the problems in trunk
> > > > > >
> > > > > > Sounds good. But when and how does the fix get to users?
> > > > >
> > > > > The package(s) are fine. Though we should probably also have a
> > > > > source package. The merge was done after the package(s) were
> > > > > uploaded to the swift site.
> > > >
> > > > Ah, great!
> > > >
> > > > > This only affects folks who have checked out from SVN the 0.92
> > > > > branch after the merge 9 days (or so) ago.
> > > >
> > > > Hmm - I question that. The release we use, based on 0.92 on
> > > > Beagle, shows the twice-each error, and it was made on Feb 25,
> > > > about 35 days ago. Does this merit clarification?
> > > >
> > > > > We should send an email to the user list once this is fixed.
> > > > > We may also want to send an email warning them not to check out
> > > > > from SVN but download the precompiled package instead.
> > > >
> > > > OK. I can't say that this will reach everyone. Perhaps some
> > > > status notes on the Download page are in order. The 0.91 link
> > > > there is wrong, so we need to fix that page anyways.
> > > >
> > > > > I am a bit confused though. I would have expected the release
> > > > > to come with some announcement of some form.
> > > >
> > > > Agreed. We kept this low profile because we were trying to
> > > > coordinate it with a Web change that we never accomplished.
And we've lost > > > > the > > > > habit of swift-user announcements but got to get back to doing > > > > that. > > > > So, yes. > > > > > > > > > > > > > > > > > Either create a 0.92.1 release (sounds hard based on above) > > > > > > or create a 0.93 release (in which case should we create the > > > > > > 0.93 > > > > > > branch from trunk as soon as this is fixed?) > > > > > > > > > > > > How long to re-test? (Thats a question for Sarah, Justin, > > > > > > and > > > > > > Ketan) > > > > > > Could this include the Cray support mods? > > > > > > > > > > No! Fixing a problem is not a venue for introducing untested > > > > > things > > > > > into > > > > > a release. > > > > > > > > I meant the Cray feature for 0.93 not 0.92.1 > > > > Yes, that should be tested. > > > > But its being used pretty heavily. > > > > > > > > - Mike > > > > > > > > > But it could be discussed separately :) > > > > > > > > > > Mihael > > > > > > > > -- > > > > Michael Wilde > > > > Computation Institute, University of Chicago > > > > Mathematics and Computer Science Division > > > > Argonne National Laboratory > > > > > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > -- > > > Michael Wilde > > > Computation Institute, University of Chicago > > > Mathematics and Computer Science Division > > > Argonne National Laboratory > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wilde at mcs.anl.gov Fri Apr 1 17:19:17 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 1 Apr 2011 17:19:17 -0500 (CDT) Subject: [Swift-devel] duplicated job submission in swift-0.92? 
In-Reply-To: <201785481.56903.1301694598367.JavaMail.root@zimbra.anl.gov>
Message-ID: <599442617.57099.1301696357109.JavaMail.root@zimbra.anl.gov>

And Swift 0.91 works OK - it does *not* exhibit the twice-each bug.

Justin: when you went backwards down the Swift 0.92 branch on Thursday
morning, what did you find in terms of where it appeared the bug was
introduced?

- Mike

----- Original Message -----
> I think we misspoke: The posted release 0.92 also exhibits the
> twice-each bug as far as I can tell.
>
> Mihael, Justin: can you test asap to confirm or refute that
> observation?
>
> Thanks,
>
> - Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wozniak at mcs.anl.gov Fri Apr 1 17:33:04 2011
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Fri, 1 Apr 2011 17:33:04 -0500 (Central Daylight Time)
Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To: <599442617.57099.1301696357109.JavaMail.root@zimbra.anl.gov>
References: <599442617.57099.1301696357109.JavaMail.root@zimbra.anl.gov>
Message-ID:

I found that it appeared between Swift r3835 and r3837.

On Fri, 1 Apr 2011, Michael Wilde wrote:

> And Swift 0.91 works OK - it does *not* exhibit the twice-each bug.
>
> Justin: when you went backwards down the Swift 0.92 branch on Thursday
> morning, what did you find in terms of where it appeared the bug was
> introduced?
>
> - Mike

--
Justin M Wozniak

From wilde at mcs.anl.gov Fri Apr 1 17:50:58 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Fri, 1 Apr 2011 17:50:58 -0500 (CDT)
Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To:
Message-ID: <1780706965.57162.1301698258805.JavaMail.root@zimbra.anl.gov>

So I think somehow in your discussions with Mihael yesterday this either
got missed or misinterpreted. These revs were revs *on the way* to
0.92, and were *not* a part of your integration of trunk into the 0.92
branch after the 0.92 release.

I just called Mihael, and he will look at these later this weekend.

- Mike

----- Original Message -----
> I found that it appeared between Swift r3835 and r3837.

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From hategan at mcs.anl.gov Fri Apr 1 21:37:15 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 01 Apr 2011 19:37:15 -0700
Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To:
References: <599442617.57099.1301696357109.JavaMail.root@zimbra.anl.gov>
Message-ID: <1301711835.7410.1.camel@blabla2.none>

That's possible, looking at the code. Though what makes you think it's
that?

Mihael

On Fri, 2011-04-01 at 17:33 -0500, Justin M Wozniak wrote:
> I found that it appeared between Swift r3835 and r3837.

From benc at hawaga.org.uk Fri Apr 1 23:34:02 2011
From: benc at hawaga.org.uk (Ben Clifford)
Date: Sat, 2 Apr 2011 04:34:02 +0000 (GMT)
Subject: [Swift-devel] saturday morning rambling about exceptions
In-Reply-To:
References:
Message-ID:

> Try/catch statements don't have a clean interpretation if you're trying to
> follow the flow of data through the program.

Right. I thought about this a bit before but it started getting all
haskelly and something I didn't want to approach - as Mike says it's much
more researchy.

The main idea which I found most comfortable was Haskell's Either type.
That can model something like exceptions but is more amenable to a
dataflow style of programming.
(although C implements something similar but very restricted, in the form
of pointers which can be NULL, or when returning a return code that is in
some range of values for an error and some other range of values for
success)

The way the Haskell Either type works is that something which might "throw
an exception" returns a value that is either an error object or a correct
value object. So then you use an if or a case or anything else that can
change behaviour based on data to change your flow.

Imagine in C:

p = malloc(5);
if(p == NULL) {
  /* catch-exception, but dataflow style - we don't have to look at p
     right after the malloc. We can store it into a structure and come
     back to it later to check.
     unfortunately, there is not enough space in p to represent any
     richer kind of exception information other than "FAIL" */
} else {
  /* no exception, and we have the correct value in p */
}

In Haskell, that looks like:

p = somefunc
case p of
   Right d -> doSomethingWith d  -- somefunc succeeded and returned
                                 -- us value d
   Left ex -> handleError ex     -- somefunc failed and returned
                                 -- value ex as the equivalent of a
                                 -- Java Exception object.

Now it gets a bit awkward having to unwrap "Right d" after every such
function call, and in that respect I think a syntax that looks more like
try-catch is much easier to program with.

However I think the right way to go is to define its semantics in terms of
"dataflow" style exceptions above rather than control-flow.
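The malloc example can only report "FAIL". As a minimal sketch of how C
could carry richer error information as plain data, in the spirit of
Either, one option is a tagged union. The names result, alloc_buffer and
report are hypothetical and chosen only for illustration; nothing here is
part of Swift or of any proposal in this thread.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical "either" result: a value on success (RIGHT) or an error
   message on failure (LEFT), so the error is ordinary data. */
typedef enum { RESULT_LEFT, RESULT_RIGHT } result_tag;

typedef struct {
    result_tag tag;
    union {
        const char *error;  /* valid when tag == RESULT_LEFT  */
        char *value;        /* valid when tag == RESULT_RIGHT */
    } as;
} result;

/* Hypothetical wrapper around malloc: a failure carries a reason
   instead of a bare NULL. */
static result alloc_buffer(size_t n) {
    result r;
    if (n == 0) {
        r.tag = RESULT_LEFT;
        r.as.error = "refusing zero-length allocation";
        return r;
    }
    char *p = malloc(n);
    if (p == NULL) {
        r.tag = RESULT_LEFT;
        r.as.error = "out of memory";
        return r;
    }
    memset(p, 0, n);
    r.tag = RESULT_RIGHT;
    r.as.value = p;
    return r;
}

/* Inspect a stored result at any later point.
   Returns 1 for a value, 0 for an error. */
static int report(const result *r) {
    if (r->tag == RESULT_RIGHT) {
        printf("ok: got a buffer\n");
        return 1;
    }
    printf("error: %s\n", r->as.error);
    return 0;
}
```

A caller could store the result in a structure and check the tag much
later, for example `result r = alloc_buffer(5); if (report(&r)) free(r.as.value);`.
The check is driven by the data itself, not by where control happens to be.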
I think that's possible and would result in something that satisfies both
the dataflow-purist camp and the "I like exception syntax" camp (both
camps of which I'm in ;)

--

From benc at hawaga.org.uk Fri Apr 1 23:36:32 2011
From: benc at hawaga.org.uk (Ben Clifford)
Date: Sat, 2 Apr 2011 04:36:32 +0000 (GMT)
Subject: [Swift-devel] [Bug 291] New: Add a exists() function to test for file existence
In-Reply-To: <729744535.55640.1301680289613.JavaMail.root@zimbra.anl.gov>
References: <729744535.55640.1301680289613.JavaMail.root@zimbra.anl.gov>
Message-ID:

> Basically as far as I understand: the presence or absence of a
> particular data file within the inout dataset is to be used to determine
> whether the code to process that dataset subsection gets invoked or not:
>
> if (exists("extra.data")) {
>   DataFile extraInput<"extra.data">;
>   extraResult = analyze(extraInput);
> }

Somehow this feels to me like you should be able to say something like:

  DataFile extraInput<"extra.data">;
  extraResult =? analyze(extraInput);

and have extra.data either kept from before or repopulated.

Or even have the Swift runtime see that extraInput exists already and
always skip the statement to create it, without anything else expressed in
the code.
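One possible reading of "kept from before or repopulated", sketched in C rather than SwiftScript: look for the file, and run a producer only if it is missing. The names here (`ensure_file`, `producer_fn`, `write_placeholder`) are hypothetical and only illustrate the intended semantics; this is not a description of how the Swift runtime works.

```c
#include <stdio.h>

/* Hypothetical producer callback: creates the file at `path`, returns 0 on success. */
typedef int (*producer_fn)(const char *path);

typedef enum { FILE_REUSED, FILE_PRODUCED, FILE_FAILED } ensure_status;

/* Existence check by trying to open for reading; an unreadable file also counts as missing. */
static int file_exists(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        return 0;
    }
    fclose(f);
    return 1;
}

/* Keep the file if it is already there, otherwise ask the producer to create it. */
static ensure_status ensure_file(const char *path, producer_fn produce)
{
    if (file_exists(path)) {
        return FILE_REUSED;
    }
    if (produce == NULL || produce(path) != 0 || !file_exists(path)) {
        return FILE_FAILED;
    }
    return FILE_PRODUCED;
}

/* Example producer used only for illustration: writes one line of text. */
static int write_placeholder(const char *path)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL) {
        return -1;
    }
    ok = (fputs("placeholder\n", f) >= 0);
    if (fclose(f) != 0) {
        ok = 0;
    }
    return ok ? 0 : -1;
}
```

Calling `ensure_file("extra.data", write_placeholder)` twice in a row would produce the file on the first call and reuse it on the second, so the decision is attached to the data rather than to the call site.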
--

From hategan at mcs.anl.gov Sat Apr 2 03:33:52 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sat, 02 Apr 2011 01:33:52 -0700
Subject: [Swift-devel] [Bug 291] New: Add a exists() function to test for file existence
In-Reply-To:
References: <729744535.55640.1301680289613.JavaMail.root@zimbra.anl.gov>
Message-ID: <1301733232.18577.3.camel@blabla2.none>

On Sat, 2011-04-02 at 04:36 +0000, Ben Clifford wrote:
> > Basically as far as I understand: the presence or absence of a
> > particular data file within the inout dataset is to be used to determine
> > whether the code to process that dataset subsection gets invoked or not:
> >
> > if (exists("extra.data")) {
> >   DataFile extraInput<"extra.data">;
> >   extraResult = analyze(extraInput);
> > }
>
> Somehow this feels to me like you should be able to say something like:
>
> DataFile extraInput<"extra.data">;
> extraResult =? analyze(extraInput);

Right. Or if you didn't want to be cryptic:

  DataFile optional extraInput<"extra.data">;
  ...

And I am emphasizing here that the optional attribute should belong to
the data. You probably don't want to have it optional with one
invocation and non-optional with another. Given that the successful
completion of a run requires that all invocations complete successfully,
anything else would be silly.

From benc at hawaga.org.uk Sat Apr 2 04:00:16 2011
From: benc at hawaga.org.uk (Ben Clifford)
Date: Sat, 2 Apr 2011 09:00:16 +0000 (GMT)
Subject: [Swift-devel] [Bug 291] New: Add a exists() function to test for file existence
In-Reply-To: <1301733232.18577.3.camel@blabla2.none>
References: <729744535.55640.1301680289613.JavaMail.root@zimbra.anl.gov> <1301733232.18577.3.camel@blabla2.none>
Message-ID:

> And I am emphasizing here that the optional attribute should belong to
> the data. You probably don't want to have it optional with one
> invocation and non-optional with another.
> Given that the successful
> completion of a run requires that all invocations complete successfully,
> anything else would be silly.

Maybe "optional" is not the right word. It's data that needs to exist, but
you are specifying two ways for it to exist in the scope of the current
run: either you find it on a filesystem, or you run some program that
outputs it.

But I agree that whatever it is called, it's attached to the data, not to
an invocation.

It sounds a lot like the virtual data idea.

--

From dennis at ucar.edu Fri Apr 1 13:30:38 2011
From: dennis at ucar.edu (John Dennis)
Date: Fri, 1 Apr 2011 12:30:38 -0600
Subject: [Swift-devel] [Bug 291] New: Add a exists() function to test for file existence
In-Reply-To: <114092944.55753.1301682302213.JavaMail.root@zimbra.anl.gov>
References: <114092944.55753.1301682302213.JavaMail.root@zimbra.anl.gov>
Message-ID:

On Apr 1, 2011, at 12:25 PM, Michael Wilde wrote:

> John, I cc'ed you to confirm that the function is exactly what you
> were looking for, and that the simple code below matches exactly the
> requirement that's driving this feature request.
>
> Can you confirm that both of these are true?

Michael,

So assuming that there is a logical negation operator in Swift, the
function you describe matches my requirements.

Thanks,
John

> We just want to make sure that we understand the need and determine
> if the proposed exists() function is the best way to address it.
> From your comment below, I was not quite sure if exists() is exactly
> the right approach here.
>
> Thanks,
>
> Mike
>
>
> ----- Original Message -----
>> Michael,
>>
>> This type of function would be great to have.
>> >> John >> On Apr 1, 2011, at 11:51 AM, Michael Wilde wrote: >> >>> Basically as far as I understand: the presence or absence of a >>> particular data file within the inout dataset is to be used to >>> determine whether the code to process that dataset subsection gets >>> invoked or not: >>> >>> if (exists("extra.data")) { >>> DataFile extraInput<"extra.data">; >>> extraResult = analyze(extraInput); >>> } >>> >>> The above is my assumption based on a phone call. We can and should >>> verify the assumption with a simple example. >>> >>> I also thought we can try this today by seeing if extraInput can be >>> an array, mapped to zero items if nothing to do and 1 item if >>> something to do. That would at least let us test the use case. >>> >>> John, can you verify if the example Swift lines above are what you >>> are looking for here? >>> >>> - Mike >>> >>> ----- Original Message ----- >>>> On Fri, 2011-04-01 at 10:51 -0500, Michael Wilde wrote: >>>> >>>>> - we should first verify that exists() will solve the NCAR need in >>>>> a >>>>> sufficiently clean way >>>> >>>> I think this is important. Can we get a description of the problem >>>> instead of a (otherwise) random proposal for a solution? >>> >>> -- >>> Michael Wilde >>> Computation Institute, University of Chicago >>> Mathematics and Computer Science Division >>> Argonne National Laboratory >>> > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > From dennis at ucar.edu Fri Apr 1 13:04:07 2011 From: dennis at ucar.edu (John Dennis) Date: Fri, 1 Apr 2011 12:04:07 -0600 Subject: [Swift-devel] [Bug 291] New: Add a exists() function to test for file existence In-Reply-To: <729744535.55640.1301680289613.JavaMail.root@zimbra.anl.gov> References: <729744535.55640.1301680289613.JavaMail.root@zimbra.anl.gov> Message-ID: <2A5A416A-CAB6-4D82-B55F-0CFBC8F3B770@ucar.edu> Michael, This type of function would be great to have. 
John On Apr 1, 2011, at 11:51 AM, Michael Wilde wrote: > Basically as far as I understand: the presence or absence of a > particular data file within the inout dataset is to be used to > determine whether the code to process that dataset subsection gets > invoked or not: > > if (exists("extra.data")) { > DataFile extraInput<"extra.data">; > extraResult = analyze(extraInput); > } > > The above is my assumption based on a phone call. We can and should > verify the assumption with a simple example. > > I also thought we can try this today by seeing if extraInput can be > an array, mapped to zero items if nothing to do and 1 item if > something to do. That would at least let us test the use case. > > John, can you verify if the example Swift lines above are what you > are looking for here? > > - Mike > > ----- Original Message ----- >> On Fri, 2011-04-01 at 10:51 -0500, Michael Wilde wrote: >> >>> - we should first verify that exists() will solve the NCAR need in a >>> sufficiently clean way >> >> I think this is important. Can we get a description of the problem >> instead of a (otherwise) random proposal for a solution? > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > From ketan at mcs.anl.gov Fri Apr 1 17:03:42 2011 From: ketan at mcs.anl.gov (Ketan Maheshwari) Date: Fri, 1 Apr 2011 17:03:42 -0500 Subject: [Swift-devel] Coasters Configuration Optimizations Message-ID: <05C21979-33F0-4DCD-A99B-DCE5771FE26C@mcs.anl.gov> Hello, Today, I successfully ran an experiment with 5000 tasks on beagle with Coasters. These modFTdock tasks correspond to the production grade modFTdock parameters and each task takes around 20 minutes to complete. After some discussions with Mike, I configured my sites.xml file to obtain necessary resources on beagle. However, it seems that I did not configure my sites.xml optimally as the resources requested exceeded my requirements. 
In summary, 13 blocks for 24 hours and 4 blocks for 22 hours (120 nodes)
were requested, while what was needed was at most 4 hours on each block,
or longer on a smaller number of blocks.

Attached are the following:

sites.xml
tc.data
A qstat snapshot at the completion of the experiment.

Suggestions and insights on how to optimize the configuration are welcome.

Regards,
Ketan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sites.xml
Type: application/xml
Size: 1653 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tc.data
Type: application/octet-stream
Size: 239 bytes
Desc: not available
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: qstat.snapshotatjob4955.txt
URL:

From wilde at mcs.anl.gov Sat Apr 2 08:16:43 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sat, 2 Apr 2011 08:16:43 -0500 (CDT)
Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To: <1301711835.7410.1.camel@blabla2.none>
Message-ID: <162077988.57719.1301750203568.JavaMail.root@zimbra.anl.gov>

Mihael,

My understanding is that Justin used a binary search approach: he kept
extracting a point-in-time snapshot from SVN, and built and tested it,
selecting dates in binary-search mode until he narrowed down the revision
range that caused the failure to between r3835 and r3837.

Justin can confirm this when he sees this message, but don't count on
confirmation very soon this weekend. Best to do your own tests to verify
(or I can if that would help).

- Mike

----- Original Message -----
> That's possible, looking at the code.
>
> Though what makes you think it's that?
>
> Mihael
>
> On Fri, 2011-04-01 at 17:33 -0500, Justin M Wozniak wrote:
> > I found that it appeared between Swift r3835 and r3837.
> >
> > On Fri, 1 Apr 2011, Michael Wilde wrote:
> >
> > > And Swift 0.91 works OK - it does *not* exhibit the twice-each bug.
> > >
> > > Justin: when you went backwards down the Swift 0.92 branch on
> > > Thursday morning, what did you find in terms of where it appeared
> > > the bug was introduced?
> > >
> > > - Mike
> > >
> > >
> > > ----- Original Message -----
> > >> I think we misspoke: The posted release 0.92 also exhibits the
> > >> twice-each bug as far as I can tell.
> > >>
> > >> Mihael, Justin: can you test asap to confirm or refute that
> > >> observation?
> > >>
> > >> Thanks,
> > >>
> > >> - Mike
> > >>
> > >> which swift: ~/swift/rev/swift-0.92/bin/swift
> > >>
> > >> com$ swift -version
> > >> Swift svn swift-r4157 cog-r3056
> > >>
> > >> com$ cd ~/swift/lab
> > >> com$ cat zz3.swift
> > >> int arr[];
> > >>
> > >> arr[0]=1;
> > >> arr[1]=2;
> > >>
> > >> foreach a in arr {
> > >>   trace("for", a);
> > >> }
> > >>
> > >> com$ swift zz3.swift
> > >> Swift svn swift-r4157 cog-r3056
> > >>
> > >> RunID: 20110401-1645-yyy87p39
> > >> Progress:
> > >> SwiftScript trace: for, 2
> > >> SwiftScript trace: for, 1
> > >> SwiftScript trace: for, 1
> > >> SwiftScript trace: for, 2
> > >> Final status:
> > >> com$
> > >>
> > >>
> > >> ----- Original Message -----
> > >>> I think both are good as they are.
> > >>>
> > >>> Would you like me to send it?
> > >>>
> > >>> Mihael
> > >>>
> > >>> On Thu, 2011-03-31 at 20:57 -0500, Michael Wilde wrote:
> > >>>> And I will send this to swift-user:
> > >>>>
> > >>>> "Dear Swift Users,
> > >>>>
> > >>>> On March 29 we discovered that the Release 0.92 branches of the
> > >>>> Swift and CoG trees were changed after the release and a
> > >>>> concurrency bug was introduced. If you are running Swift from
> > >>>> this *source code* base, please revert back to a known-working
> > >>>> release such as the 0.92 binary release if at all possible.
> > >>>> > > >>>> We're working on restoring the 0.92 SVN branch to the correct > > >>>> state > > >>>> and will report back to this email list when that is done." > > >>>> > > >>>> Anything else to say? Feel free to send this out, adjusted as > > >>>> needed, or just tell me what to change and I will. > > >>>> > > >>>> - Mike > > >>>> > > >>>> > > >>>> ----- Original Message ----- > > >>>>> Please check this proposed warning on the Downloads page and > > >>>>> let > > >>>>> me > > >>>>> know if its what we need there: > > >>>>> > > >>>>> http://www.ci.uchicago.edu/~wilde/swift/downloads/index.php > > >>>>> > > >>>>> I also fixed the 0.91 typo (but the downloads dont actually > > >>>>> work > > >>>>> from > > >>>>> this test web. I think they will once this is committed and > > >>>>> pushed > > >>>>> live). > > >>>>> > > >>>>> - Mike > > >>>>> > > >>>>> > > >>>>> ----- Original Message ----- > > >>>>>> ----- Original Message ----- > > >>>>>>> On Thu, 2011-03-31 at 20:14 -0500, Michael Wilde wrote: > > >>>>>>>> ----- Original Message ----- > > >>>>>>>>> We decided the following: > > >>>>>>>>> - I will revert the changes in the 0.92 branch > > >>>>>>>>> - re-commit bug fixes that were committed after the > > >>>>>>>>> merge > > >>>>>>>>> - merge the 0.92 branch to trunk > > >>>>>>>>> - fix the problems in trunk > > >>>>>>>> > > >>>>>>>> Sounds good. But when and how does the fix get to users? > > >>>>>>> > > >>>>>>> The package(s) are fine. Though we should probably also have > > >>>>>>> a > > >>>>>>> source > > >>>>>>> package. The merge was done after the package(s) were > > >>>>>>> uploaded > > >>>>>>> to > > >>>>>>> the > > >>>>>>> swift site. > > >>>>>> > > >>>>>> Ah, great! > > >>>>>> > > >>>>>>> This only affects folks who have checked out from SVN the > > >>>>>>> 0.92 > > >>>>>>> branch > > >>>>>>> after the merge 9 days (or so) ago. > > >>>>>> > > >>>>>> Hmm - I question that. 
The release we use, based on 0.92 on > > >>>>>> Beagle, > > >>>>>> shows the twice-each error, and it was made on Feb 25, about > > >>>>>> 35 > > >>>>>> days > > >>>>>> ago. Does this merit clarification? > > >>>>>> > > >>>>>>> We should send an email to the user list once this is fixed. > > >>>>>>> We > > >>>>>>> may > > >>>>>>> also > > >>>>>>> want to send an email warning them not to check out from SVN > > >>>>>>> but > > >>>>>>> download the precompiled package instead. > > >>>>>> > > >>>>>> OK. I cant say that this will reach everyone. Perhaps some > > >>>>>> status > > >>>>>> notes on the Download page are in order. The 0.91 link there > > >>>>>> is > > >>>>>> wrong, > > >>>>>> so we need to fix that page anyways. > > >>>>>> > > >>>>>>> I am a bit confused though. I would have expected the > > >>>>>>> release > > >>>>>>> to > > >>>>>>> come > > >>>>>>> with some announcement of some form. > > >>>>>> > > >>>>>> Agreed. We kept this low profile because we were trying to > > >>>>>> coordinate > > >>>>>> it with a Web change that we never accomplished. And we've > > >>>>>> lost > > >>>>>> the > > >>>>>> habit of swift-user announcements but got to get back to > > >>>>>> doing > > >>>>>> that. > > >>>>>> So, yes. > > >>>>>>> > > >>>>>>>> > > >>>>>>>> Either create a 0.92.1 release (sounds hard based on > > >>>>>>>> above) > > >>>>>>>> or create a 0.93 release (in which case should we create > > >>>>>>>> the > > >>>>>>>> 0.93 > > >>>>>>>> branch from trunk as soon as this is fixed?) > > >>>>>>>> > > >>>>>>>> How long to re-test? (Thats a question for Sarah, Justin, > > >>>>>>>> and > > >>>>>>>> Ketan) > > >>>>>>>> Could this include the Cray support mods? > > >>>>>>> > > >>>>>>> No! Fixing a problem is not a venue for introducing untested > > >>>>>>> things > > >>>>>>> into > > >>>>>>> a release. > > >>>>>> > > >>>>>> I meant the Cray feature for 0.93 not 0.92.1 > > >>>>>> Yes, that should be tested. > > >>>>>> But its being used pretty heavily. 
> > >>>>>> > > >>>>>> - Mike > > >>>>>> > > >>>>>>> But it could be discussed separately :) > > >>>>>>> > > >>>>>>> Mihael > > >>>>>> > > >>>>>> -- > > >>>>>> Michael Wilde > > >>>>>> Computation Institute, University of Chicago > > >>>>>> Mathematics and Computer Science Division > > >>>>>> Argonne National Laboratory > > >>>>>> > > >>>>>> _______________________________________________ > > >>>>>> Swift-devel mailing list > > >>>>>> Swift-devel at ci.uchicago.edu > > >>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > >>>>> > > >>>>> -- > > >>>>> Michael Wilde > > >>>>> Computation Institute, University of Chicago > > >>>>> Mathematics and Computer Science Division > > >>>>> Argonne National Laboratory > > >>>>> > > >>>>> _______________________________________________ > > >>>>> Swift-devel mailing list > > >>>>> Swift-devel at ci.uchicago.edu > > >>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > >>>> > > >> > > >> -- > > >> Michael Wilde > > >> Computation Institute, University of Chicago > > >> Mathematics and Computer Science Division > > >> Argonne National Laboratory > > >> > > >> _______________________________________________ > > >> Swift-devel mailing list > > >> Swift-devel at ci.uchicago.edu > > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From bugzilla-daemon at mcs.anl.gov Sat Apr 2 08:36:12 2011 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Sat, 2 Apr 2011 08:36:12 -0500 (CDT) Subject: [Swift-devel] [Bug 313] New: update.sh script to push Swift web contents to live site gives lengthy errors Message-ID: https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=313 Summary: update.sh script to push Swift web contents to live site gives lengthy errors Product: Swift Version: 0.93 Platform: PC OS/Version: Mac OS Status: NEW Severity: normal 
Priority: P2 Component: Documentation AssignedTo: wozniak at mcs.anl.gov ReportedBy: wilde at mcs.anl.gov I have been seeing the permission errors below from this script for a while now. Justin, I think you mentioned that you know what causes them so Im assigning this to you. Errors are: com$ cd /ci/www/projects/swift com$ ./update.sh chmod: changing permissions of `./update.sh': Operation not permitted chmod: changing permissions of `updatenodocs.sh': Operation not permitted --------- Updating www... ---------- U downloads/index.php Updated to revision 4241. find: ./guides/trunk/historical: Permission denied find: ./guides/trunk/formatting: Permission denied find: ./guides/trunk/plot-tour: Permission denied find: ./guides/release-0.91/historical: Permission denied find: ./guides/release-0.91/formatting: Permission denied find: ./guides/release-0.91/plot-tour: Permission denied --------- Updating docs... ---------- --------- Updating guide: guides/trunk ---------- /ci/www/projects/swift/guides/trunk /ci/www/projects/swift ln: accessing `formatting/docbook': Permission denied ln: accessing `formatting/fop': Permission denied Skipped '.' 
./update.sh: line 23: ./buildguides.sh: Permission denied /ci/www/projects/swift/guides/trunk/userguide /ci/www/projects/swift/guides/trunk /ci/www/projects/swift chmod: changing permissions of `appmodel.php': Operation not permitted chmod: changing permissions of `buildoptions.php': Operation not permitted chmod: changing permissions of `cdm.php': Operation not permitted chmod: changing permissions of `clustering.php': Operation not permitted chmod: changing permissions of `coasters.php': Operation not permitted chmod: changing permissions of `commands.php': Operation not permitted chmod: changing permissions of `engineconfiguration.php': Operation not permitted chmod: changing permissions of `extending.php': Operation not permitted chmod: changing permissions of `functions.php': Operation not permitted chmod: changing permissions of `index.php': Operation not permitted chmod: changing permissions of `kickstart.php': Operation not permitted chmod: changing permissions of `language.php': Operation not permitted chmod: changing permissions of `localhowtos.php': Operation not permitted chmod: changing permissions of `mappers.php': Operation not permitted chmod: changing permissions of `overview.php': Operation not permitted chmod: changing permissions of `procedures.php': Operation not permitted chmod: changing permissions of `profiles.php': Operation not permitted chmod: changing permissions of `reliability.php': Operation not permitted chmod: changing permissions of `sitecatalog.php': Operation not permitted chmod: changing permissions of `techoverview.php': Operation not permitted chmod: changing permissions of `transformationcatalog.php': Operation not permitted chmod: changing permissions of `userguide-rotated.jpeg': Operation not permitted chmod: changing permissions of `userguide-shane.jpeg': Operation not permitted chmod: changing permissions of `swift-site-model.png': Operation not permitted chmod: changing permissions of `type-hierarchy.png': Operation not 
permitted chmod: changing permissions of `appmodel.php': Operation not permitted chmod: changing permissions of `buildoptions.php': Operation not permitted chmod: changing permissions of `cdm.php': Operation not permitted chmod: changing permissions of `clustering.php': Operation not permitted chmod: changing permissions of `coasters.php': Operation not permitted chmod: changing permissions of `commands.php': Operation not permitted chmod: changing permissions of `engineconfiguration.php': Operation not permitted chmod: changing permissions of `extending.php': Operation not permitted chmod: changing permissions of `functions.php': Operation not permitted chmod: changing permissions of `index.php': Operation not permitted chmod: changing permissions of `kickstart.php': Operation not permitted chmod: changing permissions of `language.php': Operation not permitted chmod: changing permissions of `localhowtos.php': Operation not permitted chmod: changing permissions of `mappers.php': Operation not permitted chmod: changing permissions of `overview.php': Operation not permitted chmod: changing permissions of `procedures.php': Operation not permitted chmod: changing permissions of `profiles.php': Operation not permitted chmod: changing permissions of `reliability.php': Operation not permitted chmod: changing permissions of `sitecatalog.php': Operation not permitted chmod: changing permissions of `techoverview.php': Operation not permitted chmod: changing permissions of `transformationcatalog.php': Operation not permitted chmod: cannot access `*.pdf': No such file or directory /ci/www/projects/swift/guides/trunk /ci/www/projects/swift chmod: changing permissions of `userguide': Operation not permitted chmod: changing permissions of `userguide/procedures.php': Operation not permitted chmod: changing permissions of `userguide/commands.php': Operation not permitted chmod: changing permissions of `userguide/localhowtos.php': Operation not permitted chmod: changing permissions 
of `userguide/profiles.php': Operation not permitted chmod: changing permissions of `userguide/appmodel.php': Operation not permitted chmod: changing permissions of `userguide/coasters.php': Operation not permitted chmod: changing permissions of `userguide/swift-site-model.png': Operation not permitted chmod: changing permissions of `userguide/mappers.php': Operation not permitted chmod: changing permissions of `userguide/overview.php': Operation not permitted chmod: changing permissions of `userguide/kickstart.php': Operation not permitted chmod: changing permissions of `userguide/sitecatalog.php': Operation not permitted chmod: changing permissions of `userguide/functions.php': Operation not permitted chmod: changing permissions of `userguide/cdm.php': Operation not permitted chmod: changing permissions of `userguide/extending.php': Operation not permitted chmod: changing permissions of `userguide/language.php': Operation not permitted chmod: changing permissions of `userguide/buildoptions.php': Operation not permitted chmod: changing permissions of `userguide/index.php': Operation not permitted chmod: changing permissions of `userguide/reliability.php': Operation not permitted chmod: changing permissions of `userguide/techoverview.php': Operation not permitted chmod: changing permissions of `userguide/type-hierarchy.png': Operation not permitted chmod: changing permissions of `userguide/engineconfiguration.php': Operation not permitted chmod: changing permissions of `userguide/clustering.php': Operation not permitted chmod: changing permissions of `userguide/userguide-shane.jpeg': Operation not permitted chmod: changing permissions of `userguide/userguide-rotated.jpeg': Operation not permitted chmod: changing permissions of `userguide/transformationcatalog.php': Operation not permitted chmod: changing permissions of `userguide': Operation not permitted /ci/www/projects/swift --------- Updating guide: guides/release-0.91 ---------- 
/ci/www/projects/swift/guides/release-0.91 /ci/www/projects/swift ln: accessing `formatting/docbook': Permission denied ln: accessing `formatting/fop': Permission denied Skipped '.' ./update.sh: line 23: ./buildguides.sh: Permission denied /ci/www/projects/swift/guides/release-0.91/userguide /ci/www/projects/swift/guides/release-0.91 /ci/www/projects/swift chmod: changing permissions of `appmodel.php': Operation not permitted chmod: changing permissions of `buildoptions.php': Operation not permitted chmod: changing permissions of `clustering.php': Operation not permitted chmod: changing permissions of `coasters.php': Operation not permitted chmod: changing permissions of `commands.php': Operation not permitted chmod: changing permissions of `engineconfiguration.php': Operation not permitted chmod: changing permissions of `extending.php': Operation not permitted chmod: changing permissions of `functions.php': Operation not permitted chmod: changing permissions of `index.php': Operation not permitted chmod: changing permissions of `kickstart.php': Operation not permitted chmod: changing permissions of `language.php': Operation not permitted chmod: changing permissions of `localhowtos.php': Operation not permitted chmod: changing permissions of `mappers.php': Operation not permitted chmod: changing permissions of `overview.php': Operation not permitted chmod: changing permissions of `procedures.php': Operation not permitted chmod: changing permissions of `profiles.php': Operation not permitted chmod: changing permissions of `reliability.php': Operation not permitted chmod: changing permissions of `sitecatalog.php': Operation not permitted chmod: changing permissions of `techoverview.php': Operation not permitted chmod: changing permissions of `transformationcatalog.php': Operation not permitted chmod: changing permissions of `userguide-rotated.jpeg': Operation not permitted chmod: changing permissions of `userguide-shane.jpeg': Operation not permitted chmod: changing 
permissions of `swift-site-model.png': Operation not permitted chmod: changing permissions of `type-hierarchy.png': Operation not permitted chmod: changing permissions of `appmodel.php': Operation not permitted chmod: changing permissions of `buildoptions.php': Operation not permitted chmod: changing permissions of `clustering.php': Operation not permitted chmod: changing permissions of `coasters.php': Operation not permitted chmod: changing permissions of `commands.php': Operation not permitted chmod: changing permissions of `engineconfiguration.php': Operation not permitted chmod: changing permissions of `extending.php': Operation not permitted chmod: changing permissions of `functions.php': Operation not permitted chmod: changing permissions of `index.php': Operation not permitted chmod: changing permissions of `kickstart.php': Operation not permitted chmod: changing permissions of `language.php': Operation not permitted chmod: changing permissions of `localhowtos.php': Operation not permitted chmod: changing permissions of `mappers.php': Operation not permitted chmod: changing permissions of `overview.php': Operation not permitted chmod: changing permissions of `procedures.php': Operation not permitted chmod: changing permissions of `profiles.php': Operation not permitted chmod: changing permissions of `reliability.php': Operation not permitted chmod: changing permissions of `sitecatalog.php': Operation not permitted chmod: changing permissions of `techoverview.php': Operation not permitted chmod: changing permissions of `transformationcatalog.php': Operation not permitted chmod: cannot access `*.pdf': No such file or directory /ci/www/projects/swift/guides/release-0.91 /ci/www/projects/swift chmod: changing permissions of `userguide': Operation not permitted chmod: changing permissions of `userguide/procedures.php': Operation not permitted chmod: changing permissions of `userguide/commands.php': Operation not permitted chmod: changing permissions of 
`userguide/localhowtos.php': Operation not permitted chmod: changing permissions of `userguide/profiles.php': Operation not permitted chmod: changing permissions of `userguide/appmodel.php': Operation not permitted chmod: changing permissions of `userguide/coasters.php': Operation not permitted chmod: changing permissions of `userguide/swift-site-model.png': Operation not permitted chmod: changing permissions of `userguide/mappers.php': Operation not permitted chmod: changing permissions of `userguide/overview.php': Operation not permitted chmod: changing permissions of `userguide/kickstart.php': Operation not permitted chmod: changing permissions of `userguide/sitecatalog.php': Operation not permitted chmod: changing permissions of `userguide/functions.php': Operation not permitted chmod: changing permissions of `userguide/extending.php': Operation not permitted chmod: changing permissions of `userguide/language.php': Operation not permitted chmod: changing permissions of `userguide/buildoptions.php': Operation not permitted chmod: changing permissions of `userguide/index.php': Operation not permitted chmod: changing permissions of `userguide/reliability.php': Operation not permitted chmod: changing permissions of `userguide/techoverview.php': Operation not permitted chmod: changing permissions of `userguide/type-hierarchy.png': Operation not permitted chmod: changing permissions of `userguide/engineconfiguration.php': Operation not permitted chmod: changing permissions of `userguide/clustering.php': Operation not permitted chmod: changing permissions of `userguide/userguide-shane.jpeg': Operation not permitted chmod: changing permissions of `userguide/userguide-rotated.jpeg': Operation not permitted chmod: changing permissions of `userguide/transformationcatalog.php': Operation not permitted chmod: changing permissions of `userguide': Operation not permitted /ci/www/projects/swift --------- All done ---------- com -- Configure bugmail: 
https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the reporter.

From wilde at mcs.anl.gov Sat Apr 2 08:54:54 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sat, 2 Apr 2011 08:54:54 -0500 (CDT)
Subject: [Swift-devel] [Bug 291] New: Add a exists() function to test for file existence
In-Reply-To:
Message-ID: <1089753936.57755.1301752494884.JavaMail.root@zimbra.anl.gov>

Sounds good, John. Yes, there is a logical negation operator, "!":

com$ cat bool.swift
boolean b = true;

if(!b) {
  trace("its true");
}
else {
  trace("its false");
}
com$ swift bool.swift
Swift svn swift-r3826 cog-r2988

RunID: 20110402-0854-r3p5rrqa
Progress:
SwiftScript trace: its false
Final status:
com$

- Mike

----- Original Message -----
> On Apr 1, 2011, at 12:25 PM, Michael Wilde wrote:
>
> > John, I cc'ed you to confirm that the function is exactly what you
> > were looking for, and that the simple code below matches exactly the
> > requirement that's driving this feature request.
> >
> > Can you confirm that both of these are true?
>
> Michael,
>
> So assuming that there is a logical negation operator in Swift, the
> function you describe matches my requirements.
>
> Thanks,
> John
>
>
> > We just want to make sure that we understand the need and determine
> > if the proposed exists() function is the best way to address it.
> > From your comment below, I was not quite sure if exists() is exactly
> > the right approach here.
> >
> > Thanks,
> >
> > Mike
> >
> >
> > ----- Original Message -----
> >> Michael,
> >>
> >> This type of function would be great to have.
> >> > >> John > >> On Apr 1, 2011, at 11:51 AM, Michael Wilde wrote: > >> > >>> Basically as far as I understand: the presence or absence of a > >>> particular data file within the inout dataset is to be used to > >>> determine whether the code to process that dataset subsection gets > >>> invoked or not: > >>> > >>> if (exists("extra.data")) { > >>> DataFile extraInput<"extra.data">; > >>> extraResult = analyze(extraInput); > >>> } > >>> > >>> The above is my assumption based on a phone call. We can and > >>> should > >>> verify the assumption with a simple example. > >>> > >>> I also thought we can try this today by seeing if extraInput can > >>> be > >>> an array, mapped to zero items if nothing to do and 1 item if > >>> something to do. That would at least let us test the use case. > >>> > >>> John, can you verify if the example Swift lines above are what you > >>> are looking for here? > >>> > >>> - Mike > >>> > >>> ----- Original Message ----- > >>>> On Fri, 2011-04-01 at 10:51 -0500, Michael Wilde wrote: > >>>> > >>>>> - we should first verify that exists() will solve the NCAR need > >>>>> in > >>>>> a > >>>>> sufficiently clean way > >>>> > >>>> I think this is important. Can we get a description of the > >>>> problem > >>>> instead of a (otherwise) random proposal for a solution? 
> >>> > >>> -- > >>> Michael Wilde > >>> Computation Institute, University of Chicago > >>> Mathematics and Computer Science Division > >>> Argonne National Laboratory > >>> > > > > -- > > Michael Wilde > > Computation Institute, University of Chicago > > Mathematics and Computer Science Division > > Argonne National Laboratory > > -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From tim.g.armstrong at gmail.com Sat Apr 2 09:38:50 2011 From: tim.g.armstrong at gmail.com (Tim Armstrong) Date: Sat, 2 Apr 2011 09:38:50 -0500 Subject: [Swift-devel] Re: saturday morning rambling about exceptions In-Reply-To: References: Message-ID: Yeah, I like the simplicity of if(!exists(...)). With try/catch I just think it would be difficult for the user to reason about what state things are in when an exception occurs and we end up in the catch block - it seems very nondeterministic which threads will be running and how much progress they will have made - tim On Fri, Apr 1, 2011 at 11:34 PM, Ben Clifford wrote: > > > Try/catch statements don't have a clean intepretation if you're trying to > > follow the flow of data through the program. > > Right. > > I thought about this a bit before but it started getting all haskelly and > something i didn't want to approach - as mike says its much more > researchy. > > The main idea which I found most comfortable was Haskell's Either type. > That can model something like exceptions but is more amenable to a > dataflow style of programming. > > (although C implements something very restricted similar in the form of > pointers which can be NULL, or when returning a return code that is in > some range of values for an error and some other range of values for > success) > > The way the haskell Either type works is that something which might "throw > an exception" returns a value that is either an error object or a correct > value object. 
>
> So then you use an if or a case or anything else that can change behaviour
> based on data to change your flow.
>
> Imagine in C:
>
> p = malloc(5);
> if(p == NULL) {
>   /* catch-exception, but dataflow style - we don't have to look at p
>      right after the malloc. We can store it into a structure and
>      come back to it later to check.
>      unfortunately, there is not enough space in p to represent
>      any richer kind of exception information other than "FAIL" */
> } else {
>   /* no exception, and we have the correct value in p */
> }
>
> In haskell, that looks like:
> p = somefunc
> case p of
>   Right d -> doSomethingWith d  -- somefunc succeeded and returned
>                                 -- us value d
>   Left ex -> handleError ex     -- somefunc failed and returned
>                                 -- value ex as the equivalent of a
>                                 -- Java Exception object.
>
> Now it gets a bit awkward having to unwrap "Right d" after every such
> function call, and in that respect I think a syntax that looks more like
> try-catch is much easier to program with. However I think the right way to
> go is to define its semantics in terms of "dataflow" style exceptions
> above rather than control-flow.
>
> I think that's possible and would result in something that satisfies both
> the dataflow-purist camp and the "I like exception syntax" camp (both
> camps of which I'm in ;)
>
> --
>

From wilde at mcs.anl.gov Sat Apr 2 10:36:04 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sat, 2 Apr 2011 10:36:04 -0500 (CDT)
Subject: [Swift-devel] Re: saturday morning rambling about exceptions
In-Reply-To:
Message-ID: <810281195.57855.1301758564983.JavaMail.root@zimbra.anl.gov>

----- Original Message -----
> Yeah, I like the simplicity of if(!exists(...)).

I do too, but Ben, I like the ideas you suggest of dataflow-based exception handling and think we should continue the discussion.

I think that the exists() use case is not one for exception handling.
Rather, it raises other programming model issues related to both mapping and to the original make-like "virtual data" semantics of VDL: if a file doesn't exist you execute the function to create it. Maybe we can cleanly integrate this back into Swift in an optional manner (i.e., it only happens when explicitly requested). On the other hand, maybe that's exactly what exists() will give us? Don't know yet.

I think the exception handling mechanism is very useful to pursue for true exception conditions. And there too, we have the issue of when exceptions should get handled transparently by the runtime system and when the script should be able to see them. Maybe, there, too, exception handling happens transparently unless the user sets explicit exception handling. We'd need to consider if this changes with a dataflow-based approach.

- Mike

> With try/catch I just think it would be difficult for the user to
> reason about what state things are in when an exception occurs and we
> end up in the catch block - it seems very nondeterministic which
> threads will be running and how much progress they will have made
>
> - tim
>
>
>
>
> On Fri, Apr 1, 2011 at 11:34 PM, Ben Clifford < benc at hawaga.org.uk >
> wrote:
>
>
>
> > Try/catch statements don't have a clean interpretation if you're
> > trying to
> > follow the flow of data through the program.
>
> Right.
>
> I thought about this a bit before but it started getting all haskelly
> and
> something i didn't want to approach - as mike says it's much more
> researchy.
>
> The main idea which I found most comfortable was Haskell's Either
> type.
> That can model something like exceptions but is more amenable to a
> dataflow style of programming.
> > (although C implements something very restricted similar in the form > of > pointers which can be NULL, or when returning a return code that is in > some range of values for an error and some other range of values for > success) > > The way the haskell Either type works is that something which might > "throw > an exception" returns a value that is either an error object or a > correct > value object. > > So then you use an if or a case or anything else that can change > behaviour > based on data to change your flow. > > Imagine in C: > > p = malloc(5); > if(p == NULL) { > /* catch-exception, but dataflow style - we don't have to look at p > right after the malloc. We can store it into a structure and > come back to it later to check. > unfortuantely, there is not enough space in p to represent > any richer kind of exception information other than "FAIL" */ > } else { > { > /* no exception, and we have the correct value in p */ > } > > In haskell, that looks like: > p = somefunc > case p of > Right d -> doSomethingWith d -- somefunc succeeded and returned > -- us value d > Left ex -> handleError ex -- somefunc failed and returned > -- value ex as the equivalnet of a > -- Java Exception object. > > Now it gets a bit awkward having to unwrap "Right d" after every such > function call, and in that respect I think a syntax that looks more > like > try-catch is much easier to program with. However I think the right > way to > go is to define its semantics in terms of "dataflow" style exceptions > above rather than control-flow. 
> > I think that's possible and would result in something that satisfies > both > the dataflow-purist camp and the "I like exception syntax" camp (both > camps of which I'm in ;) > > -- > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wozniak at mcs.anl.gov Sat Apr 2 12:26:34 2011 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Sat, 2 Apr 2011 12:26:34 -0500 (Central Daylight Time) Subject: [Swift-devel] [Bug 291] New: Add a exists() function to test for file existence In-Reply-To: <2A5A416A-CAB6-4D82-B55F-0CFBC8F3B770@ucar.edu> References: <729744535.55640.1301680289613.JavaMail.root@zimbra.anl.gov> <2A5A416A-CAB6-4D82-B55F-0CFBC8F3B770@ucar.edu> Message-ID: I have a prototype of this, I'll get it checked in later today. Justin On Fri, 1 Apr 2011, John Dennis wrote: > Michael, > > This type of function would be great to have. > > John > On Apr 1, 2011, at 11:51 AM, Michael Wilde wrote: > >> Basically as far as I understand: the presence or absence of a particular >> data file within the inout dataset is to be used to determine whether the >> code to process that dataset subsection gets invoked or not: >> >> if (exists("extra.data")) { >> DataFile extraInput<"extra.data">; >> extraResult = analyze(extraInput); >> } >> >> The above is my assumption based on a phone call. We can and should verify >> the assumption with a simple example. >> >> I also thought we can try this today by seeing if extraInput can be an >> array, mapped to zero items if nothing to do and 1 item if something to do. >> That would at least let us test the use case. >> >> John, can you verify if the example Swift lines above are what you are >> looking for here? 
>> >> - Mike >> >> ----- Original Message ----- >>> On Fri, 2011-04-01 at 10:51 -0500, Michael Wilde wrote: >>> >>>> - we should first verify that exists() will solve the NCAR need in a >>>> sufficiently clean way >>> >>> I think this is important. Can we get a description of the problem >>> instead of a (otherwise) random proposal for a solution? >> >> -- >> Michael Wilde >> Computation Institute, University of Chicago >> Mathematics and Computer Science Division >> Argonne National Laboratory >> > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Justin M Wozniak From wozniak at mcs.anl.gov Sat Apr 2 12:45:49 2011 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Sat, 2 Apr 2011 12:45:49 -0500 (Central Daylight Time) Subject: [Swift-devel] duplicated job submission in swift-0.92? In-Reply-To: <162077988.57719.1301750203568.JavaMail.root@zimbra.anl.gov> References: <162077988.57719.1301750203568.JavaMail.root@zimbra.anl.gov> Message-ID: That's right- I will run it again today to confirm. On Sat, 2 Apr 2011, Michael Wilde wrote: > Mihael, > > My understanding is that Justin used a binary search approach: he kept > extracting a point-in-time snapshot from SVN, and built and tested it, > selecting dates in binary-search mode until he narrowed down the > revision range that caused the failure to between r3835 and r3837. > > Justin can confirm this when he sees this message, but don't count on > confirmation very soon this weekend. Best to do you own tests to verify > (or I can if that would help). > > - Mike > > ----- Original Message ----- >> That's possible, looking at the code. >> >> Though what makes you think it's that? >> >> Mihael >> >> On Fri, 2011-04-01 at 17:33 -0500, Justin M Wozniak wrote: >>> I found that it appeared between Swift r3835 and r3837. 
>>> >>> On Fri, 1 Apr 2011, Michael Wilde wrote: >>> >>>> And Swift 0.91 works OK - it does *not* exhibit the twice-each >>>> bug. >>>> >>>> Justin: when you went backwards down the Swift 0.92 branch on >>>> Thursday >>>> morning, what did you find in terms of where it appeared the bug >>>> was >>>> introduced? >>>> >>>> - Mike >>>> >>>> >>>> ----- Original Message ----- >>>>> I think we mist-spoke: The posted release 0.92 also exhibits the >>>>> twice-each bug as far as I acn tell. >>>>> >>>>> Mihael, Justin: can you test asap to confirm or refute that >>>>> observation? >>>>> >>>>> Thanks, >>>>> >>>>> - Mike >>>>> >>>>> which swift: ~/swift/rev/swift-0.92/bin/swift >>>>> >>>>> com$ swift -version >>>>> Swift svn swift-r4157 cog-r3056 >>>>> >>>>> com$ cd ~/swift/lab >>>>> com$ cat zz3.swift >>>>> int arr[]; >>>>> >>>>> arr[0]=1; >>>>> arr[1]=2; >>>>> >>>>> foreach a in arr { >>>>> trace("for", a); >>>>> } >>>>> >>>>> com$ swift zz3.swift >>>>> Swift svn swift-r4157 cog-r3056 >>>>> >>>>> RunID: 20110401-1645-yyy87p39 >>>>> Progress: >>>>> SwiftScript trace: for, 2 >>>>> SwiftScript trace: for, 1 >>>>> SwiftScript trace: for, 1 >>>>> SwiftScript trace: for, 2 >>>>> Final status: >>>>> com$ >>>>> >>>>> >>>>> ----- Original Message ----- >>>>>> I think both are good as they are. >>>>>> >>>>>> Would you like me to send it? >>>>>> >>>>>> Mihael >>>>>> >>>>>> On Thu, 2011-03-31 at 20:57 -0500, Michael Wilde wrote: >>>>>>> And I will send this to swift-user: >>>>>>> >>>>>>> "Dear Swift Users, >>>>>>> >>>>>>> On March 29 we discovered that the Release 0.92 branches of the >>>>>>> Swift and CoG trees were changed after the release and a >>>>>>> concurrency >>>>>>> bug was introduced. If you are running Swift from this *source >>>>>>> code* >>>>>>> base, please revert back to a known-working release such as the >>>>>>> 0.92 >>>>>>> binary release if at all possible. 
>>>>>>> >>>>>>> We're working on restoring the 0.92 SVN branch to the correct >>>>>>> state >>>>>>> and will report back to this email list when that is done." >>>>>>> >>>>>>> Anything else to say? Feel free to send this out, adjusted as >>>>>>> needed, or just tell me what to change and I will. >>>>>>> >>>>>>> - Mike >>>>>>> >>>>>>> >>>>>>> ----- Original Message ----- >>>>>>>> Please check this proposed warning on the Downloads page and >>>>>>>> let >>>>>>>> me >>>>>>>> know if its what we need there: >>>>>>>> >>>>>>>> http://www.ci.uchicago.edu/~wilde/swift/downloads/index.php >>>>>>>> >>>>>>>> I also fixed the 0.91 typo (but the downloads dont actually >>>>>>>> work >>>>>>>> from >>>>>>>> this test web. I think they will once this is committed and >>>>>>>> pushed >>>>>>>> live). >>>>>>>> >>>>>>>> - Mike >>>>>>>> >>>>>>>> >>>>>>>> ----- Original Message ----- >>>>>>>>> ----- Original Message ----- >>>>>>>>>> On Thu, 2011-03-31 at 20:14 -0500, Michael Wilde wrote: >>>>>>>>>>> ----- Original Message ----- >>>>>>>>>>>> We decided the following: >>>>>>>>>>>> - I will revert the changes in the 0.92 branch >>>>>>>>>>>> - re-commit bug fixes that were committed after the >>>>>>>>>>>> merge >>>>>>>>>>>> - merge the 0.92 branch to trunk >>>>>>>>>>>> - fix the problems in trunk >>>>>>>>>>> >>>>>>>>>>> Sounds good. But when and how does the fix get to users? >>>>>>>>>> >>>>>>>>>> The package(s) are fine. Though we should probably also have >>>>>>>>>> a >>>>>>>>>> source >>>>>>>>>> package. The merge was done after the package(s) were >>>>>>>>>> uploaded >>>>>>>>>> to >>>>>>>>>> the >>>>>>>>>> swift site. >>>>>>>>> >>>>>>>>> Ah, great! >>>>>>>>> >>>>>>>>>> This only affects folks who have checked out from SVN the >>>>>>>>>> 0.92 >>>>>>>>>> branch >>>>>>>>>> after the merge 9 days (or so) ago. >>>>>>>>> >>>>>>>>> Hmm - I question that. 
The release we use, based on 0.92 on >>>>>>>>> Beagle, >>>>>>>>> shows the twice-each error, and it was made on Feb 25, about >>>>>>>>> 35 >>>>>>>>> days >>>>>>>>> ago. Does this merit clarification? >>>>>>>>> >>>>>>>>>> We should send an email to the user list once this is fixed. >>>>>>>>>> We >>>>>>>>>> may >>>>>>>>>> also >>>>>>>>>> want to send an email warning them not to check out from SVN >>>>>>>>>> but >>>>>>>>>> download the precompiled package instead. >>>>>>>>> >>>>>>>>> OK. I cant say that this will reach everyone. Perhaps some >>>>>>>>> status >>>>>>>>> notes on the Download page are in order. The 0.91 link there >>>>>>>>> is >>>>>>>>> wrong, >>>>>>>>> so we need to fix that page anyways. >>>>>>>>> >>>>>>>>>> I am a bit confused though. I would have expected the >>>>>>>>>> release >>>>>>>>>> to >>>>>>>>>> come >>>>>>>>>> with some announcement of some form. >>>>>>>>> >>>>>>>>> Agreed. We kept this low profile because we were trying to >>>>>>>>> coordinate >>>>>>>>> it with a Web change that we never accomplished. And we've >>>>>>>>> lost >>>>>>>>> the >>>>>>>>> habit of swift-user announcements but got to get back to >>>>>>>>> doing >>>>>>>>> that. >>>>>>>>> So, yes. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Either create a 0.92.1 release (sounds hard based on >>>>>>>>>>> above) >>>>>>>>>>> or create a 0.93 release (in which case should we create >>>>>>>>>>> the >>>>>>>>>>> 0.93 >>>>>>>>>>> branch from trunk as soon as this is fixed?) >>>>>>>>>>> >>>>>>>>>>> How long to re-test? (Thats a question for Sarah, Justin, >>>>>>>>>>> and >>>>>>>>>>> Ketan) >>>>>>>>>>> Could this include the Cray support mods? >>>>>>>>>> >>>>>>>>>> No! Fixing a problem is not a venue for introducing untested >>>>>>>>>> things >>>>>>>>>> into >>>>>>>>>> a release. >>>>>>>>> >>>>>>>>> I meant the Cray feature for 0.93 not 0.92.1 >>>>>>>>> Yes, that should be tested. >>>>>>>>> But its being used pretty heavily. 
>>>>>>>>> >>>>>>>>> - Mike >>>>>>>>> >>>>>>>>>> But it could be discussed separately :) >>>>>>>>>> >>>>>>>>>> Mihael >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Michael Wilde >>>>>>>>> Computation Institute, University of Chicago >>>>>>>>> Mathematics and Computer Science Division >>>>>>>>> Argonne National Laboratory >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Swift-devel mailing list >>>>>>>>> Swift-devel at ci.uchicago.edu >>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>>>> >>>>>>>> -- >>>>>>>> Michael Wilde >>>>>>>> Computation Institute, University of Chicago >>>>>>>> Mathematics and Computer Science Division >>>>>>>> Argonne National Laboratory >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Swift-devel mailing list >>>>>>>> Swift-devel at ci.uchicago.edu >>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>>> >>>>> >>>>> -- >>>>> Michael Wilde >>>>> Computation Institute, University of Chicago >>>>> Mathematics and Computer Science Division >>>>> Argonne National Laboratory >>>>> >>>>> _______________________________________________ >>>>> Swift-devel mailing list >>>>> Swift-devel at ci.uchicago.edu >>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>> >>>> >>> > > -- Justin M Wozniak From hategan at mcs.anl.gov Sat Apr 2 13:18:30 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 02 Apr 2011 11:18:30 -0700 Subject: [Swift-devel] [Bug 291] New: Add a exists() function to test for file existence In-Reply-To: References: <729744535.55640.1301680289613.JavaMail.root@zimbra.anl.gov> <1301733232.18577.3.camel@blabla2.none> Message-ID: <1301768310.19940.5.camel@blabla2.none> On Sat, 2011-04-02 at 09:00 +0000, Ben Clifford wrote: > > And I am emphasizing here that the optional attribute should belong to > > the data. You probably don't want to have it optional with one > > invocation and non optional with another. 
Given that the successful > > completion of a run requires that all invocations complete successfully, > > anything else would be silly. > > maybe "optional" is not the right word. its data that needs to exist. but > you are specifying two ways for it to exist in the scope of the current > run: either you find it on a filesystem, or you run some program that > outputs it. but I agree that whatever it is called, its attached to the > data not to an invocation. it sounds a lot like the virtual data idea. > "maybe"? but that sounds imperative rather than declarative. From hategan at mcs.anl.gov Sat Apr 2 13:20:04 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 02 Apr 2011 11:20:04 -0700 Subject: [Swift-devel] duplicated job submission in swift-0.92? In-Reply-To: References: <162077988.57719.1301750203568.JavaMail.root@zimbra.anl.gov> Message-ID: <1301768404.19940.6.camel@blabla2.none> On Sat, 2011-04-02 at 12:45 -0500, Justin M Wozniak wrote: > That's right- I will run it again today to confirm. > > On Sat, 2 Apr 2011, Michael Wilde wrote: > > > Mihael, > > > > My understanding is that Justin used a binary search approach: Nicely done. > he kept > > extracting a point-in-time snapshot from SVN, and built and tested it, > > selecting dates in binary-search mode until he narrowed down the > > revision range that caused the failure to between r3835 and r3837. > > > > Justin can confirm this when he sees this message, but don't count on > > confirmation very soon this weekend. Best to do you own tests to verify > > (or I can if that would help). > > > > - Mike > > > > ----- Original Message ----- > >> That's possible, looking at the code. > >> > >> Though what makes you think it's that? > >> > >> Mihael > >> > >> On Fri, 2011-04-01 at 17:33 -0500, Justin M Wozniak wrote: > >>> I found that it appeared between Swift r3835 and r3837. 
> >>> > >>> On Fri, 1 Apr 2011, Michael Wilde wrote: > >>> > >>>> And Swift 0.91 works OK - it does *not* exhibit the twice-each > >>>> bug. > >>>> > >>>> Justin: when you went backwards down the Swift 0.92 branch on > >>>> Thursday > >>>> morning, what did you find in terms of where it appeared the bug > >>>> was > >>>> introduced? > >>>> > >>>> - Mike > >>>> > >>>> > >>>> ----- Original Message ----- > >>>>> I think we mist-spoke: The posted release 0.92 also exhibits the > >>>>> twice-each bug as far as I acn tell. > >>>>> > >>>>> Mihael, Justin: can you test asap to confirm or refute that > >>>>> observation? > >>>>> > >>>>> Thanks, > >>>>> > >>>>> - Mike > >>>>> > >>>>> which swift: ~/swift/rev/swift-0.92/bin/swift > >>>>> > >>>>> com$ swift -version > >>>>> Swift svn swift-r4157 cog-r3056 > >>>>> > >>>>> com$ cd ~/swift/lab > >>>>> com$ cat zz3.swift > >>>>> int arr[]; > >>>>> > >>>>> arr[0]=1; > >>>>> arr[1]=2; > >>>>> > >>>>> foreach a in arr { > >>>>> trace("for", a); > >>>>> } > >>>>> > >>>>> com$ swift zz3.swift > >>>>> Swift svn swift-r4157 cog-r3056 > >>>>> > >>>>> RunID: 20110401-1645-yyy87p39 > >>>>> Progress: > >>>>> SwiftScript trace: for, 2 > >>>>> SwiftScript trace: for, 1 > >>>>> SwiftScript trace: for, 1 > >>>>> SwiftScript trace: for, 2 > >>>>> Final status: > >>>>> com$ > >>>>> > >>>>> > >>>>> ----- Original Message ----- > >>>>>> I think both are good as they are. > >>>>>> > >>>>>> Would you like me to send it? > >>>>>> > >>>>>> Mihael > >>>>>> > >>>>>> On Thu, 2011-03-31 at 20:57 -0500, Michael Wilde wrote: > >>>>>>> And I will send this to swift-user: > >>>>>>> > >>>>>>> "Dear Swift Users, > >>>>>>> > >>>>>>> On March 29 we discovered that the Release 0.92 branches of the > >>>>>>> Swift and CoG trees were changed after the release and a > >>>>>>> concurrency > >>>>>>> bug was introduced. 
If you are running Swift from this *source > >>>>>>> code* > >>>>>>> base, please revert back to a known-working release such as the > >>>>>>> 0.92 > >>>>>>> binary release if at all possible. > >>>>>>> > >>>>>>> We're working on restoring the 0.92 SVN branch to the correct > >>>>>>> state > >>>>>>> and will report back to this email list when that is done." > >>>>>>> > >>>>>>> Anything else to say? Feel free to send this out, adjusted as > >>>>>>> needed, or just tell me what to change and I will. > >>>>>>> > >>>>>>> - Mike > >>>>>>> > >>>>>>> > >>>>>>> ----- Original Message ----- > >>>>>>>> Please check this proposed warning on the Downloads page and > >>>>>>>> let > >>>>>>>> me > >>>>>>>> know if its what we need there: > >>>>>>>> > >>>>>>>> http://www.ci.uchicago.edu/~wilde/swift/downloads/index.php > >>>>>>>> > >>>>>>>> I also fixed the 0.91 typo (but the downloads dont actually > >>>>>>>> work > >>>>>>>> from > >>>>>>>> this test web. I think they will once this is committed and > >>>>>>>> pushed > >>>>>>>> live). > >>>>>>>> > >>>>>>>> - Mike > >>>>>>>> > >>>>>>>> > >>>>>>>> ----- Original Message ----- > >>>>>>>>> ----- Original Message ----- > >>>>>>>>>> On Thu, 2011-03-31 at 20:14 -0500, Michael Wilde wrote: > >>>>>>>>>>> ----- Original Message ----- > >>>>>>>>>>>> We decided the following: > >>>>>>>>>>>> - I will revert the changes in the 0.92 branch > >>>>>>>>>>>> - re-commit bug fixes that were committed after the > >>>>>>>>>>>> merge > >>>>>>>>>>>> - merge the 0.92 branch to trunk > >>>>>>>>>>>> - fix the problems in trunk > >>>>>>>>>>> > >>>>>>>>>>> Sounds good. But when and how does the fix get to users? > >>>>>>>>>> > >>>>>>>>>> The package(s) are fine. Though we should probably also have > >>>>>>>>>> a > >>>>>>>>>> source > >>>>>>>>>> package. The merge was done after the package(s) were > >>>>>>>>>> uploaded > >>>>>>>>>> to > >>>>>>>>>> the > >>>>>>>>>> swift site. > >>>>>>>>> > >>>>>>>>> Ah, great! 
> >>>>>>>>> > >>>>>>>>>> This only affects folks who have checked out from SVN the > >>>>>>>>>> 0.92 > >>>>>>>>>> branch > >>>>>>>>>> after the merge 9 days (or so) ago. > >>>>>>>>> > >>>>>>>>> Hmm - I question that. The release we use, based on 0.92 on > >>>>>>>>> Beagle, > >>>>>>>>> shows the twice-each error, and it was made on Feb 25, about > >>>>>>>>> 35 > >>>>>>>>> days > >>>>>>>>> ago. Does this merit clarification? > >>>>>>>>> > >>>>>>>>>> We should send an email to the user list once this is fixed. > >>>>>>>>>> We > >>>>>>>>>> may > >>>>>>>>>> also > >>>>>>>>>> want to send an email warning them not to check out from SVN > >>>>>>>>>> but > >>>>>>>>>> download the precompiled package instead. > >>>>>>>>> > >>>>>>>>> OK. I cant say that this will reach everyone. Perhaps some > >>>>>>>>> status > >>>>>>>>> notes on the Download page are in order. The 0.91 link there > >>>>>>>>> is > >>>>>>>>> wrong, > >>>>>>>>> so we need to fix that page anyways. > >>>>>>>>> > >>>>>>>>>> I am a bit confused though. I would have expected the > >>>>>>>>>> release > >>>>>>>>>> to > >>>>>>>>>> come > >>>>>>>>>> with some announcement of some form. > >>>>>>>>> > >>>>>>>>> Agreed. We kept this low profile because we were trying to > >>>>>>>>> coordinate > >>>>>>>>> it with a Web change that we never accomplished. And we've > >>>>>>>>> lost > >>>>>>>>> the > >>>>>>>>> habit of swift-user announcements but got to get back to > >>>>>>>>> doing > >>>>>>>>> that. > >>>>>>>>> So, yes. > >>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Either create a 0.92.1 release (sounds hard based on > >>>>>>>>>>> above) > >>>>>>>>>>> or create a 0.93 release (in which case should we create > >>>>>>>>>>> the > >>>>>>>>>>> 0.93 > >>>>>>>>>>> branch from trunk as soon as this is fixed?) > >>>>>>>>>>> > >>>>>>>>>>> How long to re-test? (Thats a question for Sarah, Justin, > >>>>>>>>>>> and > >>>>>>>>>>> Ketan) > >>>>>>>>>>> Could this include the Cray support mods? > >>>>>>>>>> > >>>>>>>>>> No! 
Fixing a problem is not a venue for introducing untested > >>>>>>>>>> things > >>>>>>>>>> into > >>>>>>>>>> a release. > >>>>>>>>> > >>>>>>>>> I meant the Cray feature for 0.93 not 0.92.1 > >>>>>>>>> Yes, that should be tested. > >>>>>>>>> But its being used pretty heavily. > >>>>>>>>> > >>>>>>>>> - Mike > >>>>>>>>> > >>>>>>>>>> But it could be discussed separately :) > >>>>>>>>>> > >>>>>>>>>> Mihael > >>>>>>>>> > >>>>>>>>> -- > >>>>>>>>> Michael Wilde > >>>>>>>>> Computation Institute, University of Chicago > >>>>>>>>> Mathematics and Computer Science Division > >>>>>>>>> Argonne National Laboratory > >>>>>>>>> > >>>>>>>>> _______________________________________________ > >>>>>>>>> Swift-devel mailing list > >>>>>>>>> Swift-devel at ci.uchicago.edu > >>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>>>>>>> > >>>>>>>> -- > >>>>>>>> Michael Wilde > >>>>>>>> Computation Institute, University of Chicago > >>>>>>>> Mathematics and Computer Science Division > >>>>>>>> Argonne National Laboratory > >>>>>>>> > >>>>>>>> _______________________________________________ > >>>>>>>> Swift-devel mailing list > >>>>>>>> Swift-devel at ci.uchicago.edu > >>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>>>>>> > >>>>> > >>>>> -- > >>>>> Michael Wilde > >>>>> Computation Institute, University of Chicago > >>>>> Mathematics and Computer Science Division > >>>>> Argonne National Laboratory > >>>>> > >>>>> _______________________________________________ > >>>>> Swift-devel mailing list > >>>>> Swift-devel at ci.uchicago.edu > >>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>>> > >>>> > >>> > > > > > From wilde at mcs.anl.gov Sat Apr 2 14:07:04 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sat, 2 Apr 2011 14:07:04 -0500 (CDT) Subject: [Swift-devel] Re: Important: Please confirm evidence on twiceEach() bug In-Reply-To: <1301768243.19940.4.camel@blabla2.none> Message-ID: 
<908171605.57992.1301771224770.JavaMail.root@zimbra.anl.gov>

Moving an off-list thread to the list:

A few comments below just to make sure we're all in sync...

> I was just trying to figure out what the problem was and whether the
> binary packages were affected.

My tests show that the bug occurs in the 0.92 binary and *not* in the 0.91 binary. Please replicate these tests yourself to verify. My test was:

com$ cat zz3.swift
int arr[];

arr[0]=1;
arr[1]=2;

foreach a in arr {
  trace("for", a);
}
com$ PATH=~/swift/rev/swift-0.91/bin:$PATH
com$ which swift
~/swift/rev/swift-0.91/bin/swift
com$ swift zz3.swift
Swift svn swift-r3826 cog-r2988

RunID: 20110402-1401-68rni1b1
Progress:
SwiftScript trace: for, 1
SwiftScript trace: for, 2
Final status:
com$ PATH=~/swift/rev/swift-0.92/bin:$PATH
com$ which swift
~/swift/rev/swift-0.92/bin/swift
com$ swift zz3.swift
Swift svn swift-r4157 cog-r3056

RunID: 20110402-1402-zqhod0ha
Progress:
SwiftScript trace: for, 2
SwiftScript trace: for, 1
SwiftScript trace: for, 1
SwiftScript trace: for, 2
Final status:
com$

> In that process, I had a suspicion that it was the merge because I
> thought 0.92 was tested before the release and the bug should have
> shown
> up.

The bug has *some* subtlety in that I did massive runs for modftdock on 0.92 and they never encountered this bug. But I verified that in at least two cases the bug does *not* occur:

foreach a,i in [0:9]

vec=readData();
foreach a,i in vec

I had two nested loops: the outer was driven by a readData array; the inner by a constant array.

I think it's either the case that we have no test case for this problem, or that some tests are silently failing in our testing.

> It seemed reasonable to suspect that the merge introduced it, but
> I
> didn't test that to confirm it. I will now.

This is what I am questioning: it seems clear that this bug *could not possibly* have come from Justin's recent (wrong-direction) merge in prep for 0.93.
> However, I do think the merge was done the other way around, and I
> revert it to keep in line with what we agreed was the "proper" way of
> handling releases.

Yes, I agree totally here: Justin's merge was done in the wrong direction - a well-intentioned misinterpretation of the Subversion book Merge chapter. I knew that Justin was going to do this, but I was not astute enough in merge and release methodology to have caught the problem. My fault as much as Justin's.

But it's clear to me that:

- the twiceEach bug was introduced long before this merge
- it's good that we have the 0.92 branch now restored to its proper state
- we should do something like an 0.91.2 (or .1) release containing the twiceEach fix
- we should document our release methods and start (resume?) using tags as well
- we need to add a test case to the test suite for this bug, and keep growing the test suite with both functional and regression tests.

> Apart from that I will figure out what exactly the bug was due to and
> try to fix it.

Great - thanks!

- Mike

From hategan at mcs.anl.gov Sat Apr 2 14:14:13 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sat, 02 Apr 2011 12:14:13 -0700
Subject: [Swift-devel] Re: Important: Please confirm evidence on twiceEach() bug
In-Reply-To: <908171605.57992.1301771224770.JavaMail.root@zimbra.anl.gov>
References: <908171605.57992.1301771224770.JavaMail.root@zimbra.anl.gov>
Message-ID: <1301771653.21635.1.camel@blabla2.none>

On Sat, 2011-04-02 at 14:07 -0500, Michael Wilde wrote:
> > It seemed reasonable to suspect that the merge introduced it, but
> > I
> > didn't test that to confirm it. I will now.
>
> This is what I am questioning: it seems clear that this bug *could not
> possibly* have come from Justin's recent (wrong-direction) merge in
> prep for 0.93.

Oh, it became clear in the meantime that it is not true. But at the time, it seemed likely.
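
Editorial sketch (not part of the original thread): the thread above asks for a test case covering the twice-each bug. One way such a check might look, assuming the run's console output has already been captured as text and that trace lines keep the "SwiftScript trace: " prefix shown in the transcripts, is below. The function names are hypothetical and nothing here invokes Swift itself; it only inspects output like that pasted in the thread.

```python
from collections import Counter

# Prefix taken from the transcripts in this thread; treat it as an assumption
# about the output format rather than a documented interface.
TRACE_PREFIX = "SwiftScript trace: "


def trace_counts(output: str) -> Counter:
    """Count how many times each trace payload appears in captured output."""
    counts = Counter()
    for line in output.splitlines():
        line = line.strip()
        if line.startswith(TRACE_PREFIX):
            counts[line[len(TRACE_PREFIX):]] += 1
    return counts


def duplicated_traces(output: str) -> dict:
    """Return the trace payloads that were printed more than once."""
    return {payload: n for payload, n in trace_counts(output).items() if n > 1}


# Output of zz3.swift as reported for the 0.91 binary.
GOOD_RUN = """\
Swift svn swift-r3826 cog-r2988
RunID: 20110402-1401-68rni1b1
Progress:
SwiftScript trace: for, 1
SwiftScript trace: for, 2
Final status:
"""

# Output of zz3.swift as reported for the 0.92 binary.
BAD_RUN = """\
Swift svn swift-r4157 cog-r3056
RunID: 20110402-1402-zqhod0ha
Progress:
SwiftScript trace: for, 2
SwiftScript trace: for, 1
SwiftScript trace: for, 1
SwiftScript trace: for, 2
Final status:
"""

if __name__ == "__main__":
    print(duplicated_traces(GOOD_RUN))
    print(duplicated_traces(BAD_RUN))
```

A check like this only makes sense when the array elements are distinct, as in zz3.swift; with repeated values, repeated trace lines would be legitimate. It also ignores ordering on purpose, since foreach iterations may run in any order.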

From wilde at mcs.anl.gov Sat Apr 2 15:34:55 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sat, 2 Apr 2011 15:34:55 -0500 (CDT)
Subject: [Swift-devel] Swift Documentation Platform
In-Reply-To:
Message-ID: <955316759.58065.1301776495245.JavaMail.root@zimbra.anl.gov>

----- Original Message -----
> Building this list is a good start. Where should we store it?

Same web as the ReleasePlan:
https://sites.google.com/site/swiftdevel/release-plans
which is where we should move:
https://sites.google.com/site/swiftparallelscripting/swift-docs-todos

>
> You could also add the other CoG and Karajan docs.
>
> As of Wednesday we decided we are giving up on Google Sites for user
> documentation, but we are going to continue with the swift-devel site
> for
> internal (but publicly-readable) notes.

This sounds good. If any pages there need to be private to the group, they can. Maybe one more sites/ web: swftgroup for internal group matters unrelated to swift.

> We're thinking of putting at least a month into asciidoc. Once we have
> that in place, we should be able to easily paste things into there
> from
> any source.

With a few guidelines, we can try editing a few of the doc items in transit from the cookbook into asciidoc. We don't need the whole doc toolchain working: just enough to generate e.g. an html page or pdf. Is that something we can do soon?

We'd need to start and maintain a content writing guide and some references to asciidoc.

Then we need a way to separate the "core web" from "per release" documents.

- Mike

>
> Justin
>
> On Fri, 1 Apr 2011, Ketan Maheshwari wrote:
>
> > Hi,
> >
> > While we close in on a decision, I am writing this as an observation
> > to
> > the ongoing discussion about Swift Documentation Platform. And some
> > thoughts in the end.
> >
> > As of now we have a main, existing Swift documentation page:
> >
> > [1] http://www.ci.uchicago.edu/swift/docs/index.php
> >
> > In addition, we have pages on the CI wiki related to Swift
> > documentation:
> >
> > [2] http://www.ci.uchicago.edu/wiki/bin/view/SWFT/WebHome
> >
> > The above link contains many useful but semi-complete/unrounded
> > pages on
> > cookbooks, tutorials and many technical notes.
> >
> > We have a page on the cog wiki dedicated to Coasters:
> >
> > [3] http://wiki.cogkit.org/wiki/Coasters
> >
> > The pictures are very neat but slightly outdated and need updating.
> > More
> > pictures would also be required in my opinion to explain several of
> > the
> > relatively new coasters concepts.
> >
> > We have google sites whose contents overlap with [1], likely that
> > it is
> > completely redundant to [1]
> >
> > [4] https://sites.google.com/site/swiftguide/home
> >
> > A wealth of information about Swift techniques, examples, issues,
> > notes
> > and ideas is living on the following:
> >
> > [5] https://bugzilla.mcs.anl.gov/swift/
> >
> > [6] mail.ci.uchicago.edu/pipermail/swift-user
> >
> > [7] http://mail.ci.uchicago.edu/pipermail/swift-devel/
> >
> > [8] Mike's Swift Notes (As an attached doc)
>
> --
> Justin M Wozniak
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From bugzilla-daemon at mcs.anl.gov Sat Apr 2 16:38:39 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Sat, 2 Apr 2011 16:38:39 -0500 (CDT)
Subject: [Swift-devel] [Bug 315] New: better diagnostics to identify unmapped paths
Message-ID:

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=315

Summary: better diagnostics to identify unmapped paths
Product: Swift
Version: unspecified
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: SwiftScript language
AssignedTo: hategan at mcs.anl.gov
ReportedBy: aespinosa at cs.uchicago.edu

This works:

$ cat seq.in
1
2
3
4
$ cat array_incomplete.swift
type file;

app(file o[]) split(file i, int j){
  split "-l" 1 @filename(i) @strcat(j, "seqout.");
}

file input <"seq.in">;

foreach j in [0:1] {
  string foo;
  if (j == 0) {
    foo = "0seqout.aa, 0seqout.ab, 0seqout.ac, 0seqout.ad";
  }
  if (j == 1) {
    foo = "1seqout.aa, 1seqout.ab, 1seqout.ac, 1seqout.ad";
  }

  file out[] ;

  if (j < 1) {
    out = split(input, j); // line 21
  } else {
    out = split(input, j); // line 23
  }
}
$ ~/swift/swift-0.92/bin/swift -tc.file tc.data array_incomplete.swift
Swift svn swift-r4157 cog-r3056

RunID: 20110402-1619-dn042ure
Progress:
Progress: Stage in:1 Finished successfully:1
Final status: Finished successfully:2

But if I comment out line 21, we get this error message:

~/swift/swift-0.92/bin/swift -tc.file tc.data array_incomplete.swift
Swift svn swift-r4157 cog-r3056

RunID: 20110402-1621-k63e22s2
Progress:
Execution failed:
mapper.existing() returned a path [0] that it cannot subsequently map

The swift log file contains this corresponding entry:

2011-04-02 16:21:13,887-0500 DEBUG VDL2ExecutionContext vdl:setfieldvalue @ array_incomplete.kml, line: 155: java.lang.IllegalStateException: mapper.existing() returned a path [0] that it cannot subsequently map

Checking the kml file we have this call to setFieldValue():

array_incomplete.kml:155 foo swift#string#17003

Mike and I concluded that the reason for the error was that out[] is already recognized as an output dataset in line 23. But for j=0, out[] as an output file was not produced, which results in unmapped paths.

Is there a better way to diagnose this problem off the bat? Or is this the expected behavior? Can out[] be an input file (non-intermediate) for j=0 and an output file for j=1?
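
Until Swift itself names the offending object, the manual step described in this report (going from the log entry to the kml line) can be scripted. The following Python sketch is an illustration only: the helper name `find_unmapped` is hypothetical, and the pattern is based solely on the single log entry shown in this report, so other Swift versions may format the message differently.

```python
import re

# Pattern derived from the one log entry quoted in this bug report.
UNMAPPED = re.compile(
    r"@ (?P<kml>\S+), line: (?P<line>\d+): "
    r"java\.lang\.IllegalStateException: mapper\.existing\(\) "
    r"returned a path (?P<path>\[[^\]]*\]) that it cannot subsequently map"
)


def find_unmapped(log_text):
    """Return (kml_file, kml_line, path) for each unmapped-path log entry."""
    # Collapse whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    return [
        (m.group("kml"), int(m.group("line")), m.group("path"))
        for m in UNMAPPED.finditer(flat)
    ]


# Sample entry based on the log line in this report.
sample = (
    "2011-04-02 16:21:13,887-0500 DEBUG VDL2ExecutionContext "
    "vdl:setfieldvalue @ array_incomplete.kml, line: 155: "
    "java.lang.IllegalStateException: mapper.existing() returned a path [0]\n"
    "that it cannot subsequently map"
)

print(find_unmapped(sample))
```

This only points at the kml line to inspect; it does not recover the name of the SwiftScript variable, which is what the bug asks Swift to report.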

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

From aespinosa at cs.uchicago.edu Sat Apr 2 16:41:15 2011
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Sat, 2 Apr 2011 16:41:15 -0500
Subject: [Swift-devel] Fwd: [Swift-user] determining unmapped paths
In-Reply-To: <655120943.25600.1301012918101.JavaMail.root@zimbra.anl.gov>
References: <655120943.25600.1301012918101.JavaMail.root@zimbra.anl.gov>
Message-ID:

Filed as bug 315: https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=315

2011/3/24 Michael Wilde :
> Sarah, this is a perfect example of a messaging deficiency to fix. Can you add to bugzilla?
>
> Thanks,
>
> Mike
>
>
> ----- Forwarded Message -----
> From: "Allan Espinosa"
> To: "Swift-User"
> Sent: Thursday, March 24, 2011 4:28:51 PM
> Subject: [Swift-user] determining unmapped paths
>
> I'm trying to figure out what in my workflow is causing this problem:
>
> 2011-03-24 16:23:50,485-0500 WARN  FlowNode Ex098
> java.lang.IllegalStateException: mapper.existing() returned a path [3]
> that it cannot subsequently map
>        at org.griphyn.vdl.mapping.RootDataNode.checkInputs(RootDataNode.java:129)
>        at org.griphyn.vdl.mapping.RootArrayDataNode.checkInputs(RootArrayDataNode.java:67)
>        at org.griphyn.vdl.mapping.RootArrayDataNode.innerInit(RootArrayDataNode.java:53)
>        at org.griphyn.vdl.mapping.RootArrayDataNode.handleClosed(RootArrayDataNode.java:80)
>        at org.griphyn.vdl.mapping.AbstractDataNode.notifyListeners(AbstractDataNode.java:583)
>        at org.griphyn.vdl.mapping.AbstractDataNode.closeShallow(AbstractDataNode.java:396)
>        at org.griphyn.vdl.mapping.ArrayDataNode.closeDeep(ArrayDataNode.java:51)
>        at org.griphyn.vdl.karajan.lib.PartialCloseDataset.function(PartialCloseDataset.java:79)
>        at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:68)
>        at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29)
>        at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20)
>        at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
>        at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139)
>        at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197)
>        at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104)
>        at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40)
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
>
> It doesn't specify which data object it crashes on so I'm quite
> clueless at this point.
>
> I'm using the latest trunk.
>
> Any particular log4j class I should be enabling to debug?
>
>
> Thanks,
> -Allan
>
> --

--
Allan M.
Espinosa
PhD student, Computer Science
University of Chicago

From bugzilla-daemon at mcs.anl.gov Sat Apr 2 17:15:33 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Sat, 2 Apr 2011 17:15:33 -0500 (CDT)
Subject: [Swift-devel] [Bug 307] array slicing
In-Reply-To:
References:
Message-ID: <20110402221533.A80AC1C072@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=307

Allan Espinosa changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |swift-devel at ci.uchicago.edu
         Resolution|                            |WORKSFORME

--- Comment #1 from Allan Espinosa 2011-04-02 17:15:33 ---
Proposed workaround: use sparse arrays.

Example use case: we want to send 2 elements to each agg_job();

type file;

app agg_job(file in[]) {
  echo @filenames(in);
}

file foo[] ;
file sub1[] ; // elements [0:1]
file sub2[] ; // elements [2:3]

foreach i in [0:1]{
  sub1[i] = foo[i];
}
foreach i in [2:3]{
  sub2[i] = foo[i];
}

agg_job(sub1);
agg_job(sub2);

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
You are watching the assignee of the bug.

From bugzilla-daemon at mcs.anl.gov Sat Apr 2 18:18:25 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Sat, 2 Apr 2011 18:18:25 -0500 (CDT)
Subject: [Swift-devel] [Bug 315] better diagnostics to identify unmapped paths
In-Reply-To:
References:
Message-ID: <20110402231825.BAB9D2BFB3@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=315

Michael Wilde changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |wilde at mcs.anl.gov

--- Comment #1 from Michael Wilde 2011-04-02 18:18:25 ---
First, to note: this test fails in 0.92 and works in 0.91.
Even stranger, in 0.92 the test *succeeds* in about 1 out of 20 tries, which suggests there is some kind of race condition here.

I think we can also simplify the test case further, and retry. Doing that now.

...

Further experimentation suggests this has something to do with the way conditionals and local array variables nested inside a foreach loop are handled. Depending on how many of the statements inside the loop are commented out, the code either works or fails in 0.92.

So we should wait until we have a 0.92 point release with twiceEach fixed, and then re-test. If it still fails, I (or Allan or Mihael) should try a few variations and make sure all logically correct ones work, and that incorrect ones get reasonable error messages.

Lastly, the original intent for this bug was to also generate a better error message stating which object Swift is having trouble mapping.

Also note: you need the following tc entry in addition to Allan's test script below:

localhost split /usr/bin/split INSTALLED INTEL32::LINUX null

---

From a larger perspective:

This bug and the test cases for it raise a very confusing aspect of Swift semantics: how mapping of output arrays affects Swift's notion of the array size. We need to document the rules that govern Swift's behavior in these cases and cleanly define how this relates to both size and array closing.

We need to decide if Swift *should* be giving a runtime error in the case Allan shows, and what that error should be.

My understanding of this example is the following:

- we have an output array of files out[]
- we map that array to N file names
- we call a function f() to create the array
- the app function uses @filenames(out) to place the filenames on the command line

Now, when will @filenames() be runnable (wrt futures and array closing)? Can this array grow later? Does the app() need to create all filenames mapped in the out[] array?

These seem to be subtle issues that need to be documented, with clarifying, runnable test examples. We can split this doc test out to a separate enh bug after the primary problem is fixed. I realized when trying to simplify the test case that the documentation points above are not so closely related to the actual problem here as I initially thought.

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.
You are watching someone on the CC list of the bug.

From hategan at mcs.anl.gov Sat Apr 2 20:37:04 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sat, 02 Apr 2011 18:37:04 -0700
Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To: <1301597554.1319.5.camel@blabla2.none>
References: <357669173.48524.1301541122267.JavaMail.root@zimbra.anl.gov> <1301541697.23803.9.camel@blabla2.none> <1301596477.1319.0.camel@blabla2.none> <1301597554.1319.5.camel@blabla2.none>
Message-ID: <1301794624.12893.1.camel@blabla2.none>

On Thu, 2011-03-31 at 11:52 -0700, Mihael Hategan wrote:
> We decided the following:
> - I will revert the changes in the 0.92 branch

done

> - re-commit bug fixes that were committed after the merge

done

> - merge the 0.92 branch to trunk

done

> - fix the problems in trunk

and done except the problem was in the branch. I think it was a manual merge of mine gone wrong. Trunk should be clean.

We should make a patch release.

Mihael

From wilde at mcs.anl.gov Sun Apr 3 13:16:11 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sun, 3 Apr 2011 13:16:11 -0500 (CDT)
Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To: <1301794624.12893.1.camel@blabla2.none>
Message-ID: <938578309.58852.1301854571833.JavaMail.root@zimbra.anl.gov>

Nice - thank you, Mihael! Can you make the branch next? Call it 0.92.1 or .2, whatever you think is best practice (I favor 0.92.1 but have no strong feeling).
- Mike ----- Original Message ----- > On Thu, 2011-03-31 at 11:52 -0700, Mihael Hategan wrote: > > We decided the following: > > - I will revert the changes in the 0.92 branch > > done > > > - re-commit bug fixes that were committed after the merge > > done > > > - merge the 0.92 branch to trunk > > done > > > - fix the problems in trunk > > and done except the problem was in the branch. I think it was a manual > merge of mine gone wrong. Trunk should be clean. > > We should make a patch release. > > Mihael > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hategan at mcs.anl.gov Sun Apr 3 13:19:47 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 03 Apr 2011 11:19:47 -0700 Subject: [Swift-devel] duplicated job submission in swift-0.92? In-Reply-To: <938578309.58852.1301854571833.JavaMail.root@zimbra.anl.gov> References: <938578309.58852.1301854571833.JavaMail.root@zimbra.anl.gov> Message-ID: <1301854787.25788.0.camel@blabla2.none> That would be a tag and we'd do it after we test the current code in the branch a bit. On Sun, 2011-04-03 at 13:16 -0500, Michael Wilde wrote: > Nice - thank you, Mihael! Can you make the branch next? Call it 0.92.1 or .2, whatever you think is best practice (I favor 0.92.1 but have no strong feeling). > > - Mike > > ----- Original Message ----- > > On Thu, 2011-03-31 at 11:52 -0700, Mihael Hategan wrote: > > > We decided the following: > > > - I will revert the changes in the 0.92 branch > > > > done > > > > > - re-commit bug fixes that were committed after the merge > > > > done > > > > > - merge the 0.92 branch to trunk > > > > done > > > > > - fix the problems in trunk > > > > and done except the problem was in the branch. 
I think it was a manual > > merge of mine gone wrong. Trunk should be clean. > > > > We should make a patch release. > > > > Mihael > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From bugzilla-daemon at mcs.anl.gov Sun Apr 3 16:23:47 2011 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Sun, 3 Apr 2011 16:23:47 -0500 (CDT) Subject: [Swift-devel] [Bug 291] Add a exists() function to test for file existence In-Reply-To: References: Message-ID: <20110403212347.3F19A2DE32@wind.mcs.anl.gov> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=291 Justin Wozniak changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED --- Comment #1 from Justin Wozniak 2011-04-03 16:23:46 --- Implemented. -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the reporter. From bugzilla-daemon at mcs.anl.gov Sun Apr 3 16:48:14 2011 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Sun, 3 Apr 2011 16:48:14 -0500 (CDT) Subject: [Swift-devel] [Bug 261] update.sh script (for pushing web content live) gives errors In-Reply-To: References: Message-ID: <20110403214814.1FF4A2DF27@wind.mcs.anl.gov> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=261 Justin Wozniak changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |benc at hawaga.org.uk, | |wilde at mcs.anl.gov --- Comment #2 from Justin Wozniak 2011-04-03 16:48:13 --- Working on this now. I will fix the group perms that I can. Seems that some files in here are owned by benc. 
-- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching someone on the CC list of the bug. You are watching the reporter. From bugzilla-daemon at mcs.anl.gov Sun Apr 3 16:50:17 2011 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Sun, 3 Apr 2011 16:50:17 -0500 (CDT) Subject: [Swift-devel] [Bug 313] update.sh script to push Swift web contents to live site gives lengthy errors In-Reply-To: References: Message-ID: <20110403215017.921822DF3C@wind.mcs.anl.gov> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=313 Justin Wozniak changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED --- Comment #1 from Justin Wozniak 2011-04-03 16:50:17 --- Ok, Mike, try again. I think in the future I will have to set my umask to a more permissive setting when working here. -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the reporter. From wozniak at mcs.anl.gov Sun Apr 3 16:56:54 2011 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Sun, 3 Apr 2011 16:56:54 -0500 (Central Daylight Time) Subject: [Swift-devel] [Bug 291] New: Add a exists() function to test for file existence In-Reply-To: References: <729744535.55640.1301680289613.JavaMail.root@zimbra.anl.gov> <2A5A416A-CAB6-4D82-B55F-0CFBC8F3B770@ucar.edu> Message-ID: Ok, John, you can give this a try in trunk if you like. The syntax for the built-in is @exists, it takes a string and returns a boolean. Justin On Sat, 2 Apr 2011, Justin M Wozniak wrote: > > I have a prototype of this, I'll get it checked in later today. > Justin > > On Fri, 1 Apr 2011, John Dennis wrote: > >> Michael, >> >> This type of function would be great to have. 
>> >> John >> On Apr 1, 2011, at 11:51 AM, Michael Wilde wrote: >> >>> Basically as far as I understand: the presence or absence of a particular >>> data file within the inout dataset is to be used to determine whether the >>> code to process that dataset subsection gets invoked or not: >>> >>> if (exists("extra.data")) { >>> DataFile extraInput<"extra.data">; >>> extraResult = analyze(extraInput); >>> } >>> >>> The above is my assumption based on a phone call. We can and should >>> verify the assumption with a simple example. >>> >>> I also thought we can try this today by seeing if extraInput can be an >>> array, mapped to zero items if nothing to do and 1 item if something to >>> do. That would at least let us test the use case. >>> >>> John, can you verify if the example Swift lines above are what you are >>> looking for here? >>> >>> - Mike >>> >>> ----- Original Message ----- >>>> On Fri, 2011-04-01 at 10:51 -0500, Michael Wilde wrote: >>>> >>>>> - we should first verify that exists() will solve the NCAR need in a >>>>> sufficiently clean way >>>> >>>> I think this is important. Can we get a description of the problem >>>> instead of a (otherwise) random proposal for a solution? 
>>> >>> -- >>> Michael Wilde >>> Computation Institute, University of Chicago >>> Mathematics and Computer Science Division >>> Argonne National Laboratory >>> >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -- Justin M Wozniak From jon.monette at gmail.com Sun Apr 3 19:14:32 2011 From: jon.monette at gmail.com (Jonathan Monette) Date: Sun, 3 Apr 2011 19:14:32 -0500 Subject: [Swift-devel] compile error Message-ID: I just tried compiling swift-r4251 and got the following compile errror: compile: [echo] [swift]: COMPILE [mkdir] Created dir: /autonfs/home/jonmon/Library/Swift/trunk/cog/modules/swift/build [javac] Compiling 377 source files to /autonfs/home/jonmon/Library/Swift/trunk/cog/modules/swift/build [javac] /autonfs/home/jonmon/Library/Swift/trunk/cog/modules/swift/src/org/griphyn/vdl/engine/Karajan.java:578: cannot find symbol [javac] symbol : method getAnyNumOdInputArgs() [javac] location: class org.griphyn.vdl.engine.ProcedureSignature [javac] if (!proc.getAnyNumOdInputArgs() && (call.sizeOfInputArray() < proc.sizeOfInputArray() - noOfOptInArgs || [javac] ^ [javac] /autonfs/home/jonmon/Library/Swift/trunk/cog/modules/swift/src/org/griphyn/vdl/engine/Karajan.java:587: cannot find symbol [javac] symbol : method getAnyNumOdOutputArgs() [javac] location: class org.griphyn.vdl.engine.ProcedureSignature [javac] if (!proc.getAnyNumOdOutputArgs() && (call.sizeOfOutputArray() != proc.sizeOfOutputArray())) [javac] ^ [javac] /autonfs/home/jonmon/Library/Swift/trunk/cog/modules/swift/src/org/griphyn/vdl/engine/Karajan.java:599: cannot find symbol [javac] symbol : method getAnyNumOdInputArgs() [javac] location: class org.griphyn.vdl.engine.ProcedureSignature [javac] if (proc.getAnyNumOdInputArgs()) { [javac] ^ [javac] /autonfs/home/jonmon/Library/Swift/trunk/cog/modules/swift/src/org/griphyn/vdl/engine/Karajan.java:626: cannot find symbol 
[javac] symbol : method getAnyNumOdInputArgs() [javac] location: class org.griphyn.vdl.engine.ProcedureSignature [javac] if (!proc.getAnyNumOdInputArgs() && noOfMandArgs < proc.sizeOfInputArray() - noOfOptInArgs) [javac] ^ [javac] /autonfs/home/jonmon/Library/Swift/trunk/cog/modules/swift/src/org/griphyn/vdl/engine/Karajan.java:667: cannot find symbol [javac] symbol : method getAnyNumOdOutputArgs() [javac] location: class org.griphyn.vdl.engine.ProcedureSignature [javac] if (proc.getAnyNumOdOutputArgs()) { [javac] ^ [javac] /autonfs/home/jonmon/Library/Swift/trunk/cog/modules/swift/src/org/griphyn/vdl/engine/Karajan.java:955: cannot find symbol [javac] symbol : method getAnyNumOdInputArgs() [javac] location: class org.griphyn.vdl.engine.ProcedureSignature [javac] if (!funcSignature.getAnyNumOdInputArgs() && [javac] ^ [javac] /autonfs/home/jonmon/Library/Swift/trunk/cog/modules/swift/src/org/griphyn/vdl/engine/Karajan.java:966: cannot find symbol [javac] symbol : method getAnyNumOdInputArgs() [javac] location: class org.griphyn.vdl.engine.ProcedureSignature [javac] if (!funcSignature.getAnyNumOdInputArgs()) { [javac] ^ [javac] Note: Some input files use unchecked or unsafe operations. [javac] Note: Recompile with -Xlint:unchecked for details. [javac] 7 errors BUILD FAILED /autonfs/home/jonmon/Library/Swift/trunk/cog/modules/swift/build.xml:73: The following error occurred while executing this line: /autonfs/home/jonmon/Library/Swift/trunk/cog/mbuild.xml:465: The following error occurred while executing this line: /autonfs/home/jonmon/Library/Swift/trunk/cog/mbuild.xml:228: Compile failed; see the compiler error output for details. I am using javac version 1.6.0_22 -- Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction. - Albert Einstein -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From wozniak at mcs.anl.gov Sun Apr 3 21:54:36 2011
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Sun, 3 Apr 2011 21:54:36 -0500 (Central Daylight Time)
Subject: [Swift-devel] compile error
In-Reply-To:
References:
Message-ID:

Oops, please try again at r4251.

--
Justin M Wozniak

From jon.monette at gmail.com Sun Apr 3 22:02:02 2011
From: jon.monette at gmail.com (Jonathan Monette)
Date: Sun, 3 Apr 2011 22:02:02 -0500
Subject: [Swift-devel] compile error
In-Reply-To:
References:
Message-ID:

That fixed the compile error.

On Sun, Apr 3, 2011 at 9:54 PM, Justin M Wozniak wrote:
> > Oops, please try again at r4251.
>
> --
> Justin M Wozniak
>

--
Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction. - Albert Einstein

From wilde at mcs.anl.gov Sun Apr 3 23:21:38 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sun, 3 Apr 2011 23:21:38 -0500 (CDT)
Subject: [Swift-devel] duplicated job submission in swift-0.92?
In-Reply-To: <1301854787.25788.0.camel@blabla2.none>
Message-ID: <1959342990.59362.1301890898256.JavaMail.root@zimbra.anl.gov>

I've tried the failing examples on a fresh checkout of the 0.92 branch, and they work now - excellent.

Allan and Jon, can you try your scripts on this latest 0.92 from svn?

I've also re-built the 0.92 version with Cray support. That seems to work as well.

When I updated my 0.92 branch I jumped to: swift-r4252, cog-r3088, which I assume is largely from un-doing the inadvertent post-0.92 merge from trunk?

- Mike

----- Original Message -----
> That would be a tag and we'd do it after we test the current code in the
> branch a bit.
>
> On Sun, 2011-04-03 at 13:16 -0500, Michael Wilde wrote:
> > Nice - thank you, Mihael! Can you make the branch next? Call it
> > 0.92.1 or .2, whatever you think is best practice (I favor 0.92.1
> > but have no strong feeling).
> >
> > - Mike
> >
> > ----- Original Message -----
> > > On Thu, 2011-03-31 at 11:52 -0700, Mihael Hategan wrote:
> > > > We decided the following:
> > > > - I will revert the changes in the 0.92 branch
> > >
> > > done
> > >
> > > > - re-commit bug fixes that were committed after the merge
> > >
> > > done
> > >
> > > > - merge the 0.92 branch to trunk
> > >
> > > done
> > >
> > > > - fix the problems in trunk
> > >
> > > and done except the problem was in the branch. I think it was a
> > > manual merge of mine gone wrong. Trunk should be clean.
> > >
> > > We should make a patch release.
> > > > > > Mihael > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From jon.monette at gmail.com Sun Apr 3 23:25:15 2011 From: jon.monette at gmail.com (Jonathan Monette) Date: Sun, 3 Apr 2011 23:25:15 -0500 Subject: [Swift-devel] duplicated job submission in swift-0.92? In-Reply-To: <1959342990.59362.1301890898256.JavaMail.root@zimbra.anl.gov> References: <1301854787.25788.0.camel@blabla2.none> <1959342990.59362.1301890898256.JavaMail.root@zimbra.anl.gov> Message-ID: Sure. Is it all in 0.92 or is it some branch like 0.92.1? On Sun, Apr 3, 2011 at 11:21 PM, Michael Wilde wrote: > Ive tried the failing examples on a fresh checkout of the 0.92 branch, and > that works now - excellent. > > Allan and Jon, can you try your scripts on this latest 0.92 from svn? > > Ive also re-built the 0.92 version with Cray support. That seems to work as > well. > > When I updated my 0.92 branch I jumped to: swift-r4252, cog-r3088, which I > assume is largely from un-doing the inadvertent post-0.92 merge from trunk? > > - Mike > > ----- Original Message ----- > > That would be a tag and we'd do it after we test the current code in > > the > > branch a bit. > > > > On Sun, 2011-04-03 at 13:16 -0500, Michael Wilde wrote: > > > Nice - thank you, Mihael! Can you make the branch next? Call it > > > 0.92.1 or .2, whatever you think is best practice (I favor 0.92.1 > > > but have no strong feeling). 
> > > > > > - Mike > > > > > > ----- Original Message ----- > > > > On Thu, 2011-03-31 at 11:52 -0700, Mihael Hategan wrote: > > > > > We decided the following: > > > > > - I will revert the changes in the 0.92 branch > > > > > > > > done > > > > > > > > > - re-commit bug fixes that were committed after the merge > > > > > > > > done > > > > > > > > > - merge the 0.92 branch to trunk > > > > > > > > done > > > > > > > > > - fix the problems in trunk > > > > > > > > and done except the problem was in the branch. I think it was a > > > > manual > > > > merge of mine gone wrong. Trunk should be clean. > > > > > > > > We should make a patch release. > > > > > > > > Mihael > > > > > > > > > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > -- Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction. - Albert Einstein -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at mcs.anl.gov Sun Apr 3 23:28:46 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sun, 3 Apr 2011 23:28:46 -0500 (CDT) Subject: [Swift-devel] duplicated job submission in swift-0.92? In-Reply-To: Message-ID: <38660267.59366.1301891326199.JavaMail.root@zimbra.anl.gov> Its in the 0.92 SVN source branches (of both CoG and Swift). Once we test and tag it, it will get released as 0.92.1 - Mike ----- Original Message ----- > Sure. Is it all in 0.92 or is it some branch like 0.92.1? 
> > > On Sun, Apr 3, 2011 at 11:21 PM, Michael Wilde < wilde at mcs.anl.gov > > wrote: > > > Ive tried the failing examples on a fresh checkout of the 0.92 branch, > and that works now - excellent. > > Allan and Jon, can you try your scripts on this latest 0.92 from svn? > > Ive also re-built the 0.92 version with Cray support. That seems to > work as well. > > When I updated my 0.92 branch I jumped to: swift-r4252, cog-r3088, > which I assume is largely from un-doing the inadvertent post-0.92 > merge from trunk? > > - Mike > > > > > ----- Original Message ----- > > That would be a tag and we'd do it after we test the current code in > > the > > branch a bit. > > > > On Sun, 2011-04-03 at 13:16 -0500, Michael Wilde wrote: > > > Nice - thank you, Mihael! Can you make the branch next? Call it > > > 0.92.1 or .2, whatever you think is best practice (I favor 0.92.1 > > > but have no strong feeling). > > > > > > - Mike > > > > > > ----- Original Message ----- > > > > On Thu, 2011-03-31 at 11:52 -0700, Mihael Hategan wrote: > > > > > We decided the following: > > > > > - I will revert the changes in the 0.92 branch > > > > > > > > done > > > > > > > > > - re-commit bug fixes that were committed after the merge > > > > > > > > done > > > > > > > > > - merge the 0.92 branch to trunk > > > > > > > > done > > > > > > > > > - fix the problems in trunk > > > > > > > > and done except the problem was in the branch. I think it was a > > > > manual > > > > merge of mine gone wrong. Trunk should be clean. > > > > > > > > We should make a patch release. 
> > > > > > > > Mihael > > > > > > > > > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > _______________________________________________ > > > > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > -- > Any intelligent fool can make things bigger and more complex... It > takes a touch of genius - and a lot of courage to move in the opposite > direction. > - Albert Einstein -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From jon.monette at gmail.com Sun Apr 3 23:29:21 2011 From: jon.monette at gmail.com (Jonathan Monette) Date: Sun, 3 Apr 2011 23:29:21 -0500 Subject: [Swift-devel] duplicated job submission in swift-0.92? In-Reply-To: <38660267.59366.1301891326199.JavaMail.root@zimbra.anl.gov> References: <38660267.59366.1301891326199.JavaMail.root@zimbra.anl.gov> Message-ID: Ok. I will give it a try On Sun, Apr 3, 2011 at 11:28 PM, Michael Wilde wrote: > Its in the 0.92 SVN source branches (of both CoG and Swift). Once we test > and tag it, it will get released as 0.92.1 > > - Mike > > ----- Original Message ----- > > Sure. Is it all in 0.92 or is it some branch like 0.92.1? > > > > > > On Sun, Apr 3, 2011 at 11:21 PM, Michael Wilde < wilde at mcs.anl.gov > > > wrote: > > > > > > Ive tried the failing examples on a fresh checkout of the 0.92 branch, > > and that works now - excellent. > > > > Allan and Jon, can you try your scripts on this latest 0.92 from svn? > > > > Ive also re-built the 0.92 version with Cray support. That seems to > > work as well. 
> > > > When I updated my 0.92 branch I jumped to: swift-r4252, cog-r3088, > > which I assume is largely from un-doing the inadvertent post-0.92 > > merge from trunk? > > > > - Mike > > > > > > > > > > ----- Original Message ----- > > > That would be a tag and we'd do it after we test the current code in > > > the > > > branch a bit. > > > > > > On Sun, 2011-04-03 at 13:16 -0500, Michael Wilde wrote: > > > > Nice - thank you, Mihael! Can you make the branch next? Call it > > > > 0.92.1 or .2, whatever you think is best practice (I favor 0.92.1 > > > > but have no strong feeling). > > > > > > > > - Mike > > > > > > > > ----- Original Message ----- > > > > > On Thu, 2011-03-31 at 11:52 -0700, Mihael Hategan wrote: > > > > > > We decided the following: > > > > > > - I will revert the changes in the 0.92 branch > > > > > > > > > > done > > > > > > > > > > > - re-commit bug fixes that were committed after the merge > > > > > > > > > > done > > > > > > > > > > > - merge the 0.92 branch to trunk > > > > > > > > > > done > > > > > > > > > > > - fix the problems in trunk > > > > > > > > > > and done except the problem was in the branch. I think it was a > > > > > manual > > > > > merge of mine gone wrong. Trunk should be clean. > > > > > > > > > > We should make a patch release. > > > > > > > > > > Mihael > > > > > > > > > > > > > > > _______________________________________________ > > > > > Swift-devel mailing list > > > > > Swift-devel at ci.uchicago.edu > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > -- > > Michael Wilde > > Computation Institute, University of Chicago > > Mathematics and Computer Science Division > > Argonne National Laboratory > > > > _______________________________________________ > > > > > > > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > -- > > Any intelligent fool can make things bigger and more complex... 
It > > takes a touch of genius - and a lot of courage to move in the opposite > > direction. > > - Albert Einstein > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > -- Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction. - Albert Einstein -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Sun Apr 3 23:53:57 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 03 Apr 2011 21:53:57 -0700 Subject: [Swift-devel] duplicated job submission in swift-0.92? In-Reply-To: <1959342990.59362.1301890898256.JavaMail.root@zimbra.anl.gov> References: <1959342990.59362.1301890898256.JavaMail.root@zimbra.anl.gov> Message-ID: <1301892837.19399.0.camel@blabla2.none> On Sun, 2011-04-03 at 23:21 -0500, Michael Wilde wrote: > Ive tried the failing examples on a fresh checkout of the 0.92 branch, and that works now - excellent. > > Allan and Jon, can you try your scripts on this latest 0.92 from svn? > > Ive also re-built the 0.92 version with Cray support. That seems to work as well. > > When I updated my 0.92 branch I jumped to: swift-r4252, cog-r3088, > which I assume is largely from un-doing the inadvertent post-0.92 > merge from trunk? Yes. Svn revisions are monotonic in time. > > - Mike > > ----- Original Message ----- > > That would be a tag and we'd do it after we test the current code in > > the > > branch a bit. > > > > On Sun, 2011-04-03 at 13:16 -0500, Michael Wilde wrote: > > > Nice - thank you, Mihael! Can you make the branch next? Call it > > > 0.92.1 or .2, whatever you think is best practice (I favor 0.92.1 > > > but have no strong feeling). 
> > > > > > - Mike > > > > > > ----- Original Message ----- > > > > On Thu, 2011-03-31 at 11:52 -0700, Mihael Hategan wrote: > > > > > We decided the following: > > > > > - I will revert the changes in the 0.92 branch > > > > > > > > done > > > > > > > > > - re-commit bug fixes that were committed after the merge > > > > > > > > done > > > > > > > > > - merge the 0.92 branch to trunk > > > > > > > > done > > > > > > > > > - fix the problems in trunk > > > > > > > > and done except the problem was in the branch. I think it was a > > > > manual > > > > merge of mine gone wrong. Trunk should be clean. > > > > > > > > We should make a patch release. > > > > > > > > Mihael > > > > > > > > > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > From jon.monette at gmail.com Mon Apr 4 00:04:39 2011 From: jon.monette at gmail.com (Jonathan Monette) Date: Mon, 4 Apr 2011 00:04:39 -0500 Subject: [Swift-devel] duplicated job submission in swift-0.92? In-Reply-To: <1301892837.19399.0.camel@blabla2.none> References: <1959342990.59362.1301890898256.JavaMail.root@zimbra.anl.gov> <1301892837.19399.0.camel@blabla2.none> Message-ID: My small workflow finished without error and was faster than what trunk was doing(maybe had something to do with the twice foreach loop). Running my largest workflow to see what happens. But seems to be working fine. On Sun, Apr 3, 2011 at 11:53 PM, Mihael Hategan wrote: > On Sun, 2011-04-03 at 23:21 -0500, Michael Wilde wrote: > > Ive tried the failing examples on a fresh checkout of the 0.92 branch, > and that works now - excellent. > > > > Allan and Jon, can you try your scripts on this latest 0.92 from svn? > > > > Ive also re-built the 0.92 version with Cray support. That seems to work > as well. 
> > > > When I updated my 0.92 branch I jumped to: swift-r4252, cog-r3088, > > which I assume is largely from un-doing the inadvertent post-0.92 > > merge from trunk? > > Yes. Svn revisions are monotonic in time. > > > > > - Mike > > > > ----- Original Message ----- > > > That would be a tag and we'd do it after we test the current code in > > > the > > > branch a bit. > > > > > > On Sun, 2011-04-03 at 13:16 -0500, Michael Wilde wrote: > > > > Nice - thank you, Mihael! Can you make the branch next? Call it > > > > 0.92.1 or .2, whatever you think is best practice (I favor 0.92.1 > > > > but have no strong feeling). > > > > > > > > - Mike > > > > > > > > ----- Original Message ----- > > > > > On Thu, 2011-03-31 at 11:52 -0700, Mihael Hategan wrote: > > > > > > We decided the following: > > > > > > - I will revert the changes in the 0.92 branch > > > > > > > > > > done > > > > > > > > > > > - re-commit bug fixes that were committed after the merge > > > > > > > > > > done > > > > > > > > > > > - merge the 0.92 branch to trunk > > > > > > > > > > done > > > > > > > > > > > - fix the problems in trunk > > > > > > > > > > and done except the problem was in the branch. I think it was a > > > > > manual > > > > > merge of mine gone wrong. Trunk should be clean. > > > > > > > > > > We should make a patch release. > > > > > > > > > > Mihael > > > > > > > > > > > > > > > _______________________________________________ > > > > > Swift-devel mailing list > > > > > Swift-devel at ci.uchicago.edu > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > -- Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction. 
- Albert Einstein -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at mcs.anl.gov Mon Apr 4 13:51:37 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 4 Apr 2011 13:51:37 -0500 (CDT) Subject: [Swift-devel] prelim instructions for asciidoc? Message-ID: <1519782816.62044.1301943097680.JavaMail.root@zimbra.anl.gov> Justin, do you have any preliminary instructions for running asciidoc in some simple standalone fashion so people can start to try it? Not a priority, but if you have prelim tools set up or even identified, we can get a jump start on writing, conversion and gathering of material. Im thinking that a first step can be grabbing cookbook items or email fragments and getting their text into SVN into a doc/inprogress dir while we coerce each small section into asciidoc. - Mike From wozniak at mcs.anl.gov Mon Apr 4 14:19:23 2011 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Mon, 4 Apr 2011 14:19:23 -0500 (CDT) Subject: [Swift-devel] Re: prelim instructions for asciidoc? In-Reply-To: <1519782816.62044.1301943097680.JavaMail.root@zimbra.anl.gov> References: <1519782816.62044.1301943097680.JavaMail.root@zimbra.anl.gov> Message-ID: My initial asciidoc notes are in: https://svn.ci.uchicago.edu/svn/vdl2/test/asciidoc-manual The output is up at: http://www.mcs.anl.gov/~wozniak/manual.html Justin On Mon, 4 Apr 2011, Michael Wilde wrote: > Justin, do you have any preliminary instructions for running asciidoc in > some simple standalone fashion so people can start to try it? > > Not a priority, but if you have prelim tools set up or even identified, > we can get a jump start on writing, conversion and gathering of > material. > > Im thinking that a first step can be grabbing cookbook items or email > fragments and getting their text into SVN into a doc/inprogress dir > while we coerce each small section into asciidoc. 
> > - Mike -- Justin M Wozniak From wilde at mcs.anl.gov Mon Apr 4 15:06:46 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 4 Apr 2011 15:06:46 -0500 (CDT) Subject: [Swift-devel] Re: Can you test 0.92 branch for 0.92.1 release? In-Reply-To: Message-ID: <1249515087.62581.1301947606402.JavaMail.root@zimbra.anl.gov> Sounds good David. Ketan agreed he would test on Beagle. Thanks, Mike ----- Original Message ----- > No problem, I should have some time later tonight to re-run the > provider tests on the CI/Argonne machines. > > > David > > > On Mon, Apr 4, 2011 at 2:52 PM, Sarah Kenny < skenny at uchicago.edu > > wrote: > > > today is pretty booked with uci stuff but i can give this my full > attention tomorrow if that will work. i can take abe, ranger and also > test on local workstations if you want to take ci/argonne machines > david (?) or suggest something else if you like. > > > > On Mon, Apr 4, 2011 at 11:42 AM, Michael Wilde < wilde at mcs.anl.gov > > wrote: > > > David, Sarah, > > How quickly could you re-divide the Swift site test plan between you > and confirm back to swift-devel that we are ready to tag and release > the branch as 0.92.1? > > Before we do that, you need to add a test to the test suite that can > replicate the twice-each bug and verify that its detected in 0.92 and > corrected in 0.92.1 > > Can you possibly do this by noon tomorrow? > > Can you post a checklist of tests with names of who's going to run > them? > > > given that this is floating around now in 2 different forms, can we > agree on a single location? (for sanity's sake :P) > > https://sites.google.com/site/swiftdevel/site-specific-testing > > http://www.ci.uchicago.edu/wiki/bin/view/SWFT/ReleasePlans#developer_notes > > > > > > > Depending on what you can commit to, I will see if I, Ketan, and/or > Justin can help take various sites as well. I feel we really need to > do this quickly so we have a stable trusted release out there. 
> > > > Thanks, > > Mike > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From ketan at mcs.anl.gov Mon Apr 4 15:40:57 2011 From: ketan at mcs.anl.gov (Ketan Maheshwari) Date: Mon, 4 Apr 2011 15:40:57 -0500 Subject: [Swift-devel] Re: prelim instructions for asciidoc? In-Reply-To: References: <1519782816.62044.1301943097680.JavaMail.root@zimbra.anl.gov> Message-ID: <873283AE-B549-43C0-963A-154291A47E2F@mcs.anl.gov> Justin, Is asciidoc pre-installed somewhere or it needs to be installed? Ketan On Apr 4, 2011, at 2:19 PM, Justin M Wozniak wrote: > > My initial asciidoc notes are in: > > https://svn.ci.uchicago.edu/svn/vdl2/test/asciidoc-manual > > The output is up at: > > http://www.mcs.anl.gov/~wozniak/manual.html > > Justin > > On Mon, 4 Apr 2011, Michael Wilde wrote: > >> Justin, do you have any preliminary instructions for running asciidoc in some simple standalone fashion so people can start to try it? >> >> Not a priority, but if you have prelim tools set up or even identified, we can get a jump start on writing, conversion and gathering of material. >> >> Im thinking that a first step can be grabbing cookbook items or email fragments and getting their text into SVN into a doc/inprogress dir while we coerce each small section into asciidoc. >> >> - Mike > > -- > Justin M Wozniak > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From wozniak at mcs.anl.gov Mon Apr 4 15:53:23 2011 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Mon, 4 Apr 2011 15:53:23 -0500 (CDT) Subject: [Swift-devel] Re: prelim instructions for asciidoc? 
In-Reply-To: <873283AE-B549-43C0-963A-154291A47E2F@mcs.anl.gov> References: <1519782816.62044.1301943097680.JavaMail.root@zimbra.anl.gov> <873283AE-B549-43C0-963A-154291A47E2F@mcs.anl.gov> Message-ID: It needs to be installed. Justin On Mon, 4 Apr 2011, Ketan Maheshwari wrote: > Justin, > > Is asciidoc pre-installed somewhere or it needs to be installed? > > Ketan > > On Apr 4, 2011, at 2:19 PM, Justin M Wozniak wrote: > >> >> My initial asciidoc notes are in: >> >> https://svn.ci.uchicago.edu/svn/vdl2/test/asciidoc-manual >> >> The output is up at: >> >> http://www.mcs.anl.gov/~wozniak/manual.html >> >> Justin >> >> On Mon, 4 Apr 2011, Michael Wilde wrote: >> >>> Justin, do you have any preliminary instructions for running asciidoc >>> in some simple standalone fashion so people can start to try it? >>> >>> Not a priority, but if you have prelim tools set up or even >>> identified, we can get a jump start on writing, conversion and >>> gathering of material. >>> >>> Im thinking that a first step can be grabbing cookbook items or email >>> fragments and getting their text into SVN into a doc/inprogress dir >>> while we coerce each small section into asciidoc. >>> >>> - Mike >> >> -- >> Justin M Wozniak >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -- Justin M Wozniak From wilde at mcs.anl.gov Mon Apr 4 15:58:02 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 4 Apr 2011 15:58:02 -0500 (CDT) Subject: [Swift-devel] Re: prelim instructions for asciidoc? In-Reply-To: Message-ID: <1452296610.62943.1301950682895.JavaMail.root@zimbra.anl.gov> This looks like a good example of a technical site built almost entirely with asciidoc: http://www.networkupstools.org/index.html - Mike ----- Original Message ----- > It needs to be installed. 
> Justin > > On Mon, 4 Apr 2011, Ketan Maheshwari wrote: > > > Justin, > > > > Is asciidoc pre-installed somewhere or it needs to be installed? > > > > Ketan > > > > On Apr 4, 2011, at 2:19 PM, Justin M Wozniak wrote: > > > >> > >> My initial asciidoc notes are in: > >> > >> https://svn.ci.uchicago.edu/svn/vdl2/test/asciidoc-manual > >> > >> The output is up at: > >> > >> http://www.mcs.anl.gov/~wozniak/manual.html > >> > >> Justin > >> > >> On Mon, 4 Apr 2011, Michael Wilde wrote: > >> > >>> Justin, do you have any preliminary instructions for running > >>> asciidoc > >>> in some simple standalone fashion so people can start to try it? > >>> > >>> Not a priority, but if you have prelim tools set up or even > >>> identified, we can get a jump start on writing, conversion and > >>> gathering of material. > >>> > >>> Im thinking that a first step can be grabbing cookbook items or > >>> email > >>> fragments and getting their text into SVN into a doc/inprogress > >>> dir > >>> while we coerce each small section into asciidoc. > >>> > >>> - Mike > >> > >> -- > >> Justin M Wozniak > >> _______________________________________________ > >> Swift-devel mailing list > >> Swift-devel at ci.uchicago.edu > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > -- > Justin M Wozniak -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wilde at mcs.anl.gov Mon Apr 4 16:12:13 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 4 Apr 2011 16:12:13 -0500 (CDT) Subject: [Swift-devel] Re: prelim instructions for asciidoc? In-Reply-To: Message-ID: <310355473.63032.1301951533351.JavaMail.root@zimbra.anl.gov> wow, that was easy! 

$ wget -O asciidoc-8.6.4.tar.gz http://sourceforge.net/projects/asciidoc/files/asciidoc/8.6.4/asciidoc-8.6.4.tar.gz/download
$ tar -xzf asciidoc-8.6.4.tar.gz
$ cd asciidoc-8.6.4
$ ./configure --prefix $HOME/asciidoc
$ make install
$ PATH=$PATH:$HOME/asciidoc/bin
$ asciidoc manual.txt # creates manual.html

- Mike

----- Original Message -----
> It needs to be installed.
> Justin
>
> On Mon, 4 Apr 2011, Ketan Maheshwari wrote:
>
> > Justin,
> >
> > Is asciidoc pre-installed somewhere or it needs to be installed?
> >
> > Ketan
> >
> > On Apr 4, 2011, at 2:19 PM, Justin M Wozniak wrote:
> >
> >>
> >> My initial asciidoc notes are in:
> >>
> >> https://svn.ci.uchicago.edu/svn/vdl2/test/asciidoc-manual
> >>
> >> The output is up at:
> >>
> >> http://www.mcs.anl.gov/~wozniak/manual.html
> >>
> >> Justin
> >>
> >> On Mon, 4 Apr 2011, Michael Wilde wrote:
> >>
> >>> Justin, do you have any preliminary instructions for running asciidoc
> >>> in some simple standalone fashion so people can start to try it?
> >>>
> >>> Not a priority, but if you have prelim tools set up or even
> >>> identified, we can get a jump start on writing, conversion and
> >>> gathering of material.
> >>>
> >>> Im thinking that a first step can be grabbing cookbook items or email
> >>> fragments and getting their text into SVN into a doc/inprogress dir
> >>> while we coerce each small section into asciidoc.
> >>>
> >>> - Mike
> >>
> >> --
> >> Justin M Wozniak
> >> _______________________________________________
> >> Swift-devel mailing list
> >> Swift-devel at ci.uchicago.edu
> >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >
> >
>
> --
> Justin M Wozniak

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From ketan at mcs.anl.gov Mon Apr 4 16:19:57 2011
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Mon, 4 Apr 2011 16:19:57 -0500
Subject: [Swift-devel] Re: prelim instructions for asciidoc?
In-Reply-To: <310355473.63032.1301951533351.JavaMail.root@zimbra.anl.gov> References: <310355473.63032.1301951533351.JavaMail.root@zimbra.anl.gov> Message-ID: <28F22E08-2153-4211-A413-5EA97117543F@mcs.anl.gov> And, asciidoc -n manual.txt #creates manual.html with section numbers :-) Ketan On Apr 4, 2011, at 4:12 PM, Michael Wilde wrote: > wow, that was easy! > > $ wget http://sourceforge.net/projects/asciidoc/files/asciidoc/8.6.4/asciidoc-8.6.4.tar.gz/download > $ tar -xzf asciidoc-8.6.4.tar.gz > $ cd asciidoc-8.6.4 > $ ./configure --prefix $HOME/asciidoc > $ make install > $ PATH=$PATH:$HOME/asciidoc/bin > $ asciidoc manual.txt # creates manual.html > > - Mike > > ----- Original Message ----- >> It needs to be installed. >> Justin >> >> On Mon, 4 Apr 2011, Ketan Maheshwari wrote: >> >>> Justin, >>> >>> Is asciidoc pre-installed somewhere or it needs to be installed? >>> >>> Ketan >>> >>> On Apr 4, 2011, at 2:19 PM, Justin M Wozniak wrote: >>> >>>> >>>> My initial asciidoc notes are in: >>>> >>>> https://svn.ci.uchicago.edu/svn/vdl2/test/asciidoc-manual >>>> >>>> The output is up at: >>>> >>>> http://www.mcs.anl.gov/~wozniak/manual.html >>>> >>>> Justin >>>> >>>> On Mon, 4 Apr 2011, Michael Wilde wrote: >>>> >>>>> Justin, do you have any preliminary instructions for running >>>>> asciidoc >>>>> in some simple standalone fashion so people can start to try it? >>>>> >>>>> Not a priority, but if you have prelim tools set up or even >>>>> identified, we can get a jump start on writing, conversion and >>>>> gathering of material. >>>>> >>>>> Im thinking that a first step can be grabbing cookbook items or >>>>> email >>>>> fragments and getting their text into SVN into a doc/inprogress >>>>> dir >>>>> while we coerce each small section into asciidoc. 
>>>>> >>>>> - Mike >>>> >>>> -- >>>> Justin M Wozniak >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> >>> >> >> -- >> Justin M Wozniak > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > From ketancmaheshwari at gmail.com Mon Apr 4 16:36:10 2011 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Mon, 4 Apr 2011 16:36:10 -0500 Subject: [Swift-devel] Re: prelim instructions for asciidoc? In-Reply-To: <28F22E08-2153-4211-A413-5EA97117543F@mcs.anl.gov> References: <310355473.63032.1301951533351.JavaMail.root@zimbra.anl.gov> <28F22E08-2153-4211-A413-5EA97117543F@mcs.anl.gov> Message-ID: <913F7D8C-3B8A-43A0-A028-4540E477409A@gmail.com> Hi, asciidoc -a toc -n manual.txt # toc + numbered sections Ketan On Apr 4, 2011, at 4:19 PM, Ketan Maheshwari wrote: > And, > > asciidoc -n manual.txt #creates manual.html with section numbers :-) > > Ketan > > On Apr 4, 2011, at 4:12 PM, Michael Wilde wrote: > >> wow, that was easy! >> >> $ wget http://sourceforge.net/projects/asciidoc/files/asciidoc/8.6.4/asciidoc-8.6.4.tar.gz/download >> $ tar -xzf asciidoc-8.6.4.tar.gz >> $ cd asciidoc-8.6.4 >> $ ./configure --prefix $HOME/asciidoc >> $ make install >> $ PATH=$PATH:$HOME/asciidoc/bin >> $ asciidoc manual.txt # creates manual.html >> >> - Mike >> >> ----- Original Message ----- >>> It needs to be installed. >>> Justin >>> >>> On Mon, 4 Apr 2011, Ketan Maheshwari wrote: >>> >>>> Justin, >>>> >>>> Is asciidoc pre-installed somewhere or it needs to be installed? 
>>>> >>>> Ketan >>>> >>>> On Apr 4, 2011, at 2:19 PM, Justin M Wozniak wrote: >>>> >>>>> >>>>> My initial asciidoc notes are in: >>>>> >>>>> https://svn.ci.uchicago.edu/svn/vdl2/test/asciidoc-manual >>>>> >>>>> The output is up at: >>>>> >>>>> http://www.mcs.anl.gov/~wozniak/manual.html >>>>> >>>>> Justin >>>>> >>>>> On Mon, 4 Apr 2011, Michael Wilde wrote: >>>>> >>>>>> Justin, do you have any preliminary instructions for running >>>>>> asciidoc >>>>>> in some simple standalone fashion so people can start to try it? >>>>>> >>>>>> Not a priority, but if you have prelim tools set up or even >>>>>> identified, we can get a jump start on writing, conversion and >>>>>> gathering of material. >>>>>> >>>>>> Im thinking that a first step can be grabbing cookbook items or >>>>>> email >>>>>> fragments and getting their text into SVN into a doc/inprogress >>>>>> dir >>>>>> while we coerce each small section into asciidoc. >>>>>> >>>>>> - Mike >>>>> >>>>> -- >>>>> Justin M Wozniak >>>>> _______________________________________________ >>>>> Swift-devel mailing list >>>>> Swift-devel at ci.uchicago.edu >>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>> >>>> >>> >>> -- >>> Justin M Wozniak >> >> -- >> Michael Wilde >> Computation Institute, University of Chicago >> Mathematics and Computer Science Division >> Argonne National Laboratory >> > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From bugzilla-daemon at mcs.anl.gov Mon Apr 4 22:21:12 2011 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 4 Apr 2011 22:21:12 -0500 (CDT) Subject: [Swift-devel] [Bug 319] New: Set logging level via swift.properties to a few pre-defined levels Message-ID: https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=319 Summary: Set logging level via swift.properties to a few pre-defined levels Product: Swift Version: 0.93 Platform: All 
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Log processing and plotting
AssignedTo: wozniak at mcs.anl.gov
ReportedBy: wilde at mcs.anl.gov

Implement log properties of fast, info, normal, debug, trace (or similar,
much like coaster worker.pl):

- fast - very light logging, for performance research
- info - lighter than normal logging
- normal - a log level that doesn't degrade performance much, tells the user
  everything they need, and is sufficient to drive all features including
  provenance, Text User Interface (-tui option) and performance plotting
- debug - supports most debugging needed
- trace - for only the most detailed level of debugging

The current log4j control mechanism should be used only for fine-grained
developer control.

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the reporter.

From wilde at mcs.anl.gov Mon Apr 4 22:23:34 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 4 Apr 2011 22:23:34 -0500 (CDT)
Subject: [Swift-devel] Re: Google Summer of Code Application
In-Reply-To:
Message-ID: <135804436.64241.1301973814929.JavaMail.root@zimbra.anl.gov>

Dear Ashish,

I suggest project #1 below, the integration of Swift into Globus Online.

Swift Bugzilla is located at https://bugzilla.mcs.anl.gov/swift/

I'll try to locate a bug to work on. Are you familiar with log4j? We need to
encode useful sets of log4j processing options as a set of simple log levels
settable as a swift property. I've just filed this as:

Bug 319 - Set logging level via swift.properties to a few pre-defined levels

Justin or Mihael may want to comment on this. But feel free to peruse
bugzilla for something else suitable to tackle.
- Mike ----- Original Message ----- > Respected Sir, > > I know that i am extremely late in starting this conversation, but i > have just got free from my exams and other commitments > I am a 4 th year undergraduate student of the computer science > department from Indian Institute of Technology,Delhi. > > I am interested in the following two projects from your project ideas. > > 1. Integrate swift with Globus Online to provide an application > execution and scripting interface. > 2. Implementing efficient map-reduce models using swift parallel > programming. > > A small description of my skills relevant to the two projects are > - Experience of Bash Scripting and python scripting.I have extensively > used bash scripting. > - Basic understanding of REST. During an intern at Netapp, Bangalore, > I was required to use SOAP for my work. > - Experience of working with map reduce and running hadoop > programs(You can have a look at one of the assignments that i did in > my course on cloud computing and virtualziation > http://www.cse.iitd.ernet.in/~sbansal/csl862/pa3/ ). > - Experience of writing programs in java.I have for my courses > extensively used java. > > > I will like you to kindly tell me, which of these two projects would > you like me to focus on, for my gsoc application.Also the gsoc > application template mentions that we should try and fix bugs or > provide some feedback on how a bug should be fixed. Could you tell me > about any specific bug that I should focus on, in order to demonstrate > my coding competence? 
>
> Regards
> Ashish

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From bugzilla-daemon at mcs.anl.gov Mon Apr 4 22:26:34 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Mon, 4 Apr 2011 22:26:34 -0500 (CDT)
Subject: [Swift-devel] [Bug 319] Set logging level via swift.properties to a few pre-defined levels
In-Reply-To:
References:
Message-ID: <20110405032634.0D3E21C073@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=319

--- Comment #1 from Michael Wilde 2011-04-04 22:26:33 ---
Add nice timestamps to the stdout log - hh:mm:ss (currently it is a simple
integer, per a recent fix by Mihael).

The same log record that is sent to stdout should occur in the detailed log
for cross-referencing.

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the reporter.

From bugzilla-daemon at mcs.anl.gov Tue Apr 5 10:25:02 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Tue, 5 Apr 2011 10:25:02 -0500 (CDT)
Subject: [Swift-devel] [Bug 321] New: Improve "cant find wrapper log" error message and document in new Debugging chapter
Message-ID:

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=321

Summary: Improve "cant find wrapper log" error message and document in new Debugging chapter
Product: Swift
Version: 0.93
Platform: All
OS/Version: All
Status: ASSIGNED
Severity: major
Priority: P1
Component: error messages
AssignedTo: skenny at uchicago.edu
ReportedBy: wilde at mcs.anl.gov

This bug is filed to deal specifically with the very frequent error message
that is perplexing a new Swift user in the message I cc below.

I think we should do several specific things in response. I mark this high
prio because errors very similar to this one are probably the highest cause
of Swift problems, confusion, and frustration for new users.

1. Reword the message "failed to transfer wrapper log" to say what it means.

2. Document the process of where to look for issues causing this.

3. More clearly document the tag, how and when to use it, and how it affects
   debugging. Stress that it's a performance enhancement that inhibits
   debugging and should not be used until a workflow is stable. (Note that in
   this case the user put scratch on the home filesystem, which defeats the
   purpose. I'm not sure if this caused a failure; I think not, but it would
   be good to verify. This should not be in our basic templates, although
   this user didn't use gensites.)

4. Work through the common cases of app-not-found, app-not-executable,
   app-encountered-an-error (signal or non-zero exit) and
   app-didnt-return-expected files. Maybe I missed a few cases, but these are
   the main ones. Each one should be *instantly* recognizable, and ideally we
   should report all these in plain clear English that enables the user to
   fix them instantly.

5. Review this email thread for anything I missed in either debugging or
   documentation.

The results of fixing this bug should be:

- better error messages for the above cases
- the start of a user guide section on Debugging (use asciidoc; a new
  standalone doc that becomes a user guide chapter is fine for now).

- Mike

Hi Weiyang,

I'm cc'ing this to swift-user, where you should send all questions, so that
other Swift developers and users can offer help as well, and all users can
learn from the answers.

It's hard for me to debug this without seeing your .swift script and your log
file.

The error message means: Swift tried to run a batch job to execute an app()
call, and the attempt did not even return the per-job log file (the "wrapper
log") from the execution site (i.e. what you defined in your pool entry).

First, please comment out the tag. That's an optimization and may be
confusing things (or may even be the cause of the error, but that is less
likely).

The most likely cause of the problem here is that your application "touch" is
listed in your tc under the wrong pathname: touch is /bin/touch, not
/usr/bin/touch.

The best way to get a sense of how far your script progressed is to do a full
"find" under your swiftwork directory from the directory for the failing run
id (test-20110405-0940-rqa4nyka), and see what is there. The shared/
directory should contain whatever input files were processed. If you did not
use the tag, then there should also be a "job directory" for each app swift
tried to run. This is described in the user guide. Since these were not
present, I concluded that Swift could not execute your application.

The Swift team is working to improve both these messages and the
documentation for debugging such situations to make this much easier to spot.

Also, I don't know how far your workflow ran the previous time, but this
looks like a large run. You should test new workflows (or even any changes
you make) on very small runs first, so that there are fewer files and
parallel jobs to sort through when debugging new scripts.

- Mike

----- Original Message -----
> Hello,
>
>
> My swift codes just encountered new problems: After submitted jobs
> using foreach it's saying
>
>
> Failed to transfer wrapper log from test-20110405-0940-rqa4nyka/info/u
> on pbs
>
>
> My guess is there're sth wrong with pbs (the execution provider)
>
>
> My sites.xml and tc.data is exactly kept the same since last time you
> advised
>
>
> sites.xml:
>
>
>
>
>
>
> 1:00:00
> 8
> 8
>
> 1
> 1
> 1.99
> 10000
> CI-SES000031
>
>
> /home/frankwang/tmp
> /home/frankwang/swiftwork
>
>
>
>
>
> pbs echo /bin/echo INSTALLED INTEL32::LINUX null
> pbs sh /bin/bash INSTALLED INTEL32::LINUX null
> pbs touch /usr/bin/touch INSTALLED INTEL32::LINUX
> GLOBUS::maxwalltime="0:1"
>
>
> The same problem occurred when I was using various version of swift.
>
>
> Can you take a time to figure it out?
> > > Weiyang -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the reporter. From bugzilla-daemon at mcs.anl.gov Tue Apr 5 10:27:25 2011 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Tue, 5 Apr 2011 10:27:25 -0500 (CDT) Subject: [Swift-devel] [Bug 321] Improve "cant find wrapper log" error message and document in new Debugging chapter In-Reply-To: References: Message-ID: <20110405152725.266B21C073@wind.mcs.anl.gov> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=321 --- Comment #1 from Michael Wilde 2011-04-05 10:27:24 --- Please follow this email thread through to resolution for any other message and documentation needed: http://mail.ci.uchicago.edu/pipermail/swift-user/2011-April/001909.html Both the problem and the expertise level of this user are "just right" to better understand what most users with only moderate computing skills need in order to succeed with Swift. -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the reporter. From wilde at mcs.anl.gov Tue Apr 5 10:45:23 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 5 Apr 2011 10:45:23 -0500 (CDT) Subject: [Swift-devel] Fwd: [Bug 321] Improve "cant find wrapper log" error message and document in new Debugging chapter In-Reply-To: <20110405152725.2D9751C074@wind.mcs.anl.gov> Message-ID: <1105880808.65692.1302018323354.JavaMail.root@zimbra.anl.gov> Sarah, I assigned you this bug because I think its an ideal place to get started on the error message and debugability-improvement project. So after you complete the testing for 0.92.1, can you shift your focus to this? 
You will need to discuss on the list how to break the work down into smaller pieces and likely discuss the questions that come up. I think you should immediately on starting create a new debugging.txt asciidoc User Guide section, which you can develop standalone until the asciidoc effort is far enough along to manage chapters as separate files. If there's anything else on your to-do list for 0.93, we can discuss in tomorrows developer phone call how to prioritize the work. Does this sound like a good way to dive into the message-improvement project? - Mike ----- Forwarded Message ----- From: bugzilla-daemon at mcs.anl.gov To: wilde at mcs.anl.gov Sent: Tuesday, April 5, 2011 10:27:25 AM Subject: [Bug 321] Improve "cant find wrapper log" error message and document in new Debugging chapter https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=321 --- Comment #1 from Michael Wilde 2011-04-05 10:27:24 --- Please follow this email thread through to resolution for any other message and documentation needed: http://mail.ci.uchicago.edu/pipermail/swift-user/2011-April/001909.html Both the problem and the expertise level of this user are "just right" to better understand what most users with only moderate computing skills need in order to succeed with Swift. -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug. 

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From bugzilla-daemon at mcs.anl.gov Tue Apr 5 10:51:48 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Tue, 5 Apr 2011 10:51:48 -0500 (CDT)
Subject: [Swift-devel] [Bug 289] Add mechanism to delete temporary files no longer in scope
In-Reply-To:
References:
Message-ID: <20110405155148.9B9F61C073@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=289

Michael Wilde changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         AssignedTo|hategan at mcs.anl.gov      |wozniak at mcs.anl.gov
   Target Milestone|---                         |v0.93

--- Comment #1 from Michael Wilde 2011-04-05 10:51:48 ---
Justin, please add your proposal for this feature here.

Mihael, please review and comment, and advise Justin on code paths as needed.

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.
You are watching the reporter.

From dennis at ucar.edu Tue Apr 5 10:59:23 2011
From: dennis at ucar.edu (John Dennis)
Date: Tue, 5 Apr 2011 09:59:23 -0600
Subject: [Swift-devel] [Bug 291] New: Add a exists() function to test for file existence
In-Reply-To:
References: <729744535.55640.1301680289613.JavaMail.root@zimbra.anl.gov> <2A5A416A-CAB6-4D82-B55F-0CFBC8F3B770@ucar.edu>
Message-ID:

Justin,

Thanks for adding this feature. I am not currently set up on a system with
the swift trunk. I suspect Sheri is currently working with the trunk.

Sheri, could you take a look at this feature during your work with swift?

Thanks,
John

On Apr 3, 2011, at 3:56 PM, Justin M Wozniak wrote:

>
> Ok, John, you can give this a try in trunk if you like. The syntax
> for the built-in is @exists, it takes a string and returns a boolean.
> Justin > > On Sat, 2 Apr 2011, Justin M Wozniak wrote: > >> >> I have a prototype of this, I'll get it checked in later today. >> Justin >> >> On Fri, 1 Apr 2011, John Dennis wrote: >> >>> Michael, >>> >>> This type of function would be great to have. >>> John >>> On Apr 1, 2011, at 11:51 AM, Michael Wilde wrote: >>>> Basically as far as I understand: the presence or absence of a >>>> particular data file within the inout dataset is to be used to >>>> determine whether the code to process that dataset subsection >>>> gets invoked or not: >>>> if (exists("extra.data")) { >>>> DataFile extraInput<"extra.data">; >>>> extraResult = analyze(extraInput); >>>> } >>>> The above is my assumption based on a phone call. We can and >>>> should verify the assumption with a simple example. >>>> I also thought we can try this today by seeing if extraInput can >>>> be an array, mapped to zero items if nothing to do and 1 item if >>>> something to do. That would at least let us test the use case. >>>> John, can you verify if the example Swift lines above are what >>>> you are looking for here? >>>> - Mike >>>> ----- Original Message ----- >>>>> On Fri, 2011-04-01 at 10:51 -0500, Michael Wilde wrote: >>>>>> - we should first verify that exists() will solve the NCAR need >>>>>> in a >>>>>> sufficiently clean way >>>>> I think this is important. Can we get a description of the problem >>>>> instead of a (otherwise) random proposal for a solution? 
>>>> -- >>>> Michael Wilde >>>> Computation Institute, University of Chicago >>>> Mathematics and Computer Science Division >>>> Argonne National Laboratory >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> >> > > -- > Justin M Wozniak From wilde at mcs.anl.gov Tue Apr 5 11:42:50 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 5 Apr 2011 11:42:50 -0500 (CDT) Subject: [Swift-devel] Re: R library(ncdf) [PADS Support #12231] In-Reply-To: <8AF6D68D-2A89-4EDC-878B-9BAB4C63E655@ci.uchicago.edu> Message-ID: <370016820.65991.1302021770552.JavaMail.root@zimbra.anl.gov> Hi Neil, Thanks for the heads-up. No, I have never tried building R with ncdf. The next 4 weeks are very busy for me as well with travel, workshops and proposals. I'm sending to swift-devel as well as Rob and Sheri of the MCS Climate group to see if they have any info on building the netCDF package for R. Regards, Mike ----- Original Message ----- > Hi, Mike. Have you ever tried to install the ncdf package for R? I > have a ticket open with Ti, but he seems to be having trouble with it. > He thought he had it at one point (see below) but there was a problem > -- I think he used the OS version of R. I'll send you my reply from > when I tried to use what he had built. If you have any spare cycles > for this in the interest of moving my SwiftR project forward over the > next 2-3 weeks that would be great. I'll let you know if I hear > anything from Ti. I'm not bugging him about it this week because I > have my hands full and I think he's helping get ready for the Globus > meeting, but I'm just trying to keep the issue alive. Thanks, Mike. 
> > > Neil > > > Begin forwarded message: > > > From: "Ti Leggett" > > Date: March 28, 2011 12:42:58 PM CDT > > To: nbest at ci.uchicago.edu > > Subject: R library(ncdf) [PADS Support #12231] > > Reply-To: pads-support at ci.uchicago.edu > > > > Bleh, accidentally hit comment instead of reply: > > > > > > The magic command was this: > > R CMD INSTALL -l /soft/R-ncdf-1.8-gcc4.1-r1/lib/R2.12/site-library > > /software/common/src/ncdf_1.8.1.tar.gz > > --configure-args="LDFLAGS='-L/soft/hdf5-1.8.2-gcc4.1-r1/lib -lhdf5 > > -lhdf5_hl' --with-netcdf_incdir=/soft/netcdf-4.0-gcc4.1-r1/include > > --with-netcdf_libdir=/soft/netcdf-4.0-gcc4.1-r1/lib" > > > > Add the following to your .soft to use it: > > > > +netcdf-4.0-gcc4.1-r1 > > +szip-2.1-gcc4.1-r1 > > +hdf5-1.8.2-gcc4.1-r1 > > +R-ncdf-gcc-R2.12 -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From bugzilla-daemon at mcs.anl.gov Tue Apr 5 11:51:16 2011 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Tue, 5 Apr 2011 11:51:16 -0500 (CDT) Subject: [Swift-devel] [Bug 323] New: Report status for, and document, how to get Swift to submit jobs meeting a site's scheduler constraints Message-ID: https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=323 Summary: Report status for, and document, how to get Swift to submit jobs meeting a site's scheduler constraints Product: Swift Version: 1.0 Platform: All OS/Version: All Status: ASSIGNED Severity: normal Priority: P3 Component: Documentation AssignedTo: ketan at mcs.anl.gov ReportedBy: wilde at mcs.anl.gov A user guide section on "How do I get Swift to submit jobs into a specific queue, reservation, etc" would be useful. This is the main reason that people initially deviate from coaster defaults into more complex pool entries: to force jobs to fit into some site-imposed constraint. 
Ketan's experience reflected in the thread below, of not knowing why his job
didn't run, illustrates the need for both enhanced reporting and status from
Swift as well as documentation.

----- Original Message -----
> Ketan,
>
> No, your job 64063 is not eligible to run in the development
> reservation. You requested 5 nodes (mppwidth = 120), and the
> development reservation is limited to 3.
>
> If your job is less than 1 hour, and also less than 3 nodes (72 cores)
> then your job will run in this reservation.
>
> Jason Hedden
>
> On Apr 5, 2011, at 10:54 AM, Ketan Maheshwari wrote:
>
> > Jason,
> >
> > Is it possible to verify if my request (walltime is 49 minutes) went
> > to this "Development Reservation" queue.
> >
> > Ketan
> >
> > On Mar 23, 2011, at 10:26 AM, Jason Hedden wrote:
> >
> >> I've enabled the development reservation on Beagle. Monday through
> >> Friday between 8:00AM and 5:00PM 3 nodes are reserved for jobs that
> >> request less than 1 hour walltime. Similarly to PADS, no additional
> >> job parameters are required, any job that fits these requirements
> >> will run in the reservation.
> >>
> >> Jason Hedden
> >>
> >> On Mar 22, 2011, at 4:26 PM, Daniel S. Katz wrote:
> >>
> >>> I agree, we definitely want at least some small jobs to be able to
> >>> run quickly most of the time, at least during business hours on
> >>> weekdays.
> >>>
> >>> On Mar 23, 2011, at 2:29 AM, Lorenzo Pesce wrote:
> >>>
> >>>> Seems reasonable and useful to me.
> >>>>
> >>>> On Mar 22, 2011, at 1:12 PM, Ti Leggett wrote:
> >>>>
> >>>>> -----BEGIN PGP SIGNED MESSAGE-----
> >>>>> Hash: SHA1
> >>>>>
> >>>>> I think that decision will be based on what kinds of jobs we
> >>>>> want to favor, so I imagine in a few months when we convene to
> >>>>> make the policy decision.
> >>>>>
> >>>>> On Mar 22, 2011, at 1:11 PM, Michael Wilde wrote:
> >>>>>
> >>>>>> No objection: it sounds like a good idea.
> >>>>>> > >>>>>> I can't recall, but are we also going to have a development or > >>>>>> fast queue, or is the scheduler just going to treat requests as > >>>>>> if smaller requests get higher priority? > >>>>>> > >>>>>> - Mike > >>>>>> > >>>>>> ----- Original Message ----- > >>>>>>> -----BEGIN PGP SIGNED MESSAGE----- > >>>>>>> Hash: SHA1 > >>>>>>> > >>>>>>> This was just brought up by a user. There's currently no easy > >>>>>>> way for > >>>>>>> a user to test out their code, especially an MPI code other > >>>>>>> than to > >>>>>>> submit an interactive job and wait for it to finally run. What > >>>>>>> we do > >>>>>>> on other clusters is have a development reservation during > >>>>>>> "business" > >>>>>>> hours, say 8:00am to 5:00pm for jobs an hour or less. Anyone > >>>>>>> object to > >>>>>>> putting aside 3 nodes for these types of jobs for people to be > >>>>>>> able to > >>>>>>> test code quickly? > >>>>>>> -----BEGIN PGP SIGNATURE----- > >>>>>>> Version: GnuPG/MacGPG2 v2.0.14 (Darwin) > >>>>>>> > >>>>>>> iEYEARECAAYFAk2I5W0ACgkQ4RgdOxQVi0DkHQCfSCyc/hpAJ7prmphcK64EsrC9 > >>>>>>> 5y8AoJs1EsE5nSRHt22p9bkDlHalZsvL > >>>>>>> =7e74 > >>>>>>> -----END PGP SIGNATURE----- > >>>>>>> _______________________________________________ > >>>>>>> beagle-planning mailing list > >>>>>>> beagle-planning at ci.uchicago.edu > >>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/beagle-planning > >>>>>> > >>>>>> -- > >>>>>> Michael Wilde > >>>>>> Computation Institute, University of Chicago > >>>>>> Mathematics and Computer Science Division > >>>>>> Argonne National Laboratory > >>>>>> > >>>>> > >>>>> -----BEGIN PGP SIGNATURE----- > >>>>> Version: GnuPG/MacGPG2 v2.0.14 (Darwin) > >>>>> > >>>>> iEYEARECAAYFAk2I5poACgkQ4RgdOxQVi0BsswCggHcK5CgdUjazD0eoBaWE7nAY > >>>>> 6R8An1Wcz9e+C8pA5w9wpyJdaJ3hGg79 > >>>>> =YP0B > >>>>> -----END PGP SIGNATURE----- > >>>>> _______________________________________________ > >>>>> beagle-planning mailing list > >>>>> beagle-planning at 
ci.uchicago.edu > >>>>> http://mail.ci.uchicago.edu/mailman/listinfo/beagle-planning > >>>> > >>>> _______________________________________________ > >>>> beagle-planning mailing list > >>>> beagle-planning at ci.uchicago.edu > >>>> http://mail.ci.uchicago.edu/mailman/listinfo/beagle-planning > >>> > >>> -- > >>> Daniel S. Katz > >>> University of Chicago > >>> (773) 834-7186 (voice) > >>> (773) 834-3700 (fax) > >>> d.katz at ieee.org or dsk at ci.uchicago.edu > >>> http://www.ci.uchicago.edu/~dsk/ > >>> > >>> > >>> > >>> _______________________________________________ > >>> beagle-planning mailing list > >>> beagle-planning at ci.uchicago.edu > >>> http://mail.ci.uchicago.edu/mailman/listinfo/beagle-planning > >> > >> _______________________________________________ > >> beagle-planning mailing list > >> beagle-planning at ci.uchicago.edu > >> http://mail.ci.uchicago.edu/mailman/listinfo/beagle-planning > > > > _______________________________________________ > beagle-planning mailing list > beagle-planning at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/beagle-planning -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the reporter. 
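
The eligibility rule Jason Hedden describes in the Bug 323 thread above can be checked by hand with a few lines of arithmetic. What follows is only a minimal sketch, not part of Swift or the Beagle scheduler. It assumes 24 cores per node (inferred from "3 nodes (72 cores)" and "5 nodes (mppwidth = 120)"), reads the 3-node limit as inclusive, and ignores the Monday-Friday 8:00AM-5:00PM window. The function and constant names are hypothetical.

```python
import math

CORES_PER_NODE = 24    # assumption: inferred from "3 nodes (72 cores)"
MAX_NODES = 3          # size of the development reservation in the thread
MAX_WALLTIME_MIN = 60  # jobs must request less than 1 hour of walltime


def nodes_needed(mppwidth, cores_per_node=CORES_PER_NODE):
    """Whole nodes required to supply `mppwidth` cores."""
    return math.ceil(mppwidth / cores_per_node)


def fits_dev_reservation(mppwidth, walltime_min):
    """True if the request fits the reservation as read above (assumed rule)."""
    return (nodes_needed(mppwidth) <= MAX_NODES
            and walltime_min < MAX_WALLTIME_MIN)


# Ketan's request from the thread: mppwidth = 120, walltime 49 minutes.
print(nodes_needed(120))              # 5
print(fits_dev_reservation(120, 49))  # False
```

Under these assumptions the 49-minute walltime was fine; the request was rejected only because 120 cores round up to 5 nodes.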
From wilde at mcs.anl.gov Tue Apr 5 11:52:48 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 5 Apr 2011 11:52:48 -0500 (CDT) Subject: [Swift-devel] Re: R library(ncdf) [PADS Support #12231] In-Reply-To: <370016820.65991.1302021770552.JavaMail.root@zimbra.anl.gov> Message-ID: <916965871.66061.1302022368502.JavaMail.root@zimbra.anl.gov> ----- Forwarded Message ----- From: "Michael Wilde" To: "Neil Best" Cc: "joshua elliott" , "PADS Support" Sent: Tuesday, April 5, 2011 11:47:04 AM Subject: Re: R library(ncdf) [PADS Support #12231] According to this thread I found on google, the R install may be looking for HDF5 libs to load netCDF: http://www.unidata.ucar.edu/support/help/MailArchives/netcdf-perl/msg00374.html Does the R package doc say anything about how to ensure that the HDF5 libs are in your LD_LIBRARY_PATH? - MIke ----- Original Message ----- > Begin forwarded message: > > > From: Neil Best > > Date: March 28, 2011 3:04:17 PM CDT > > To: pads-support at ci.uchicago.edu > > Subject: Re: R library(ncdf) [PADS Support #12231] > > > > In a fresh login shell starting a fresh R session after changing my > > .soft I get this: > > > >> sessionInfo() > > R version 2.12.0 (2010-10-15) > > Platform: x86_64-unknown-linux-gnu (64-bit) > > > > locale: > > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > > [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 > > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > > [9] LC_ADDRESS=C LC_TELEPHONE=C > > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > > > attached base packages: > > [1] stats graphics grDevices utils datasets methods base > >> .libPaths() > > [1] "/home/nbest/R/x86_64-unknown-linux-gnu-library/2.12/" > > [2] "/soft/R-DBI-0.2-5-gcc4.1-r1/lib/R2.12/site-library" > > [3] "/soft/R-RPostgreSQL-0.1-7-gcc4.1-r1/lib/R2.12/site-library" > > [4] "/soft/R-sp-0.9-72-gcc4.1-r1/lib/R2.12/site-library" > > [5] "/soft/R-rgdal-0.6-30-gcc4.1-r1/lib/R2.12/site-library" > > [6] 
"/soft/R-RColorBrewer-1.0.2-gcc4.1-r1/lib/R2.12/site-library" > > [7] "/soft/R-ncdf-1.8-gcc4.1-r1/lib/R2.12/site-library" > > [8] "/soft/R-2.12.0-gcc4.1-r1/lib64/R/library" > >> library(ncdf) > > Error: package 'ncdf' was built before R 2.10.0: please re-install > > it > > > > > > You were able to load it? What would be different between our > > sessions? It's not in my home library, so I reasonably sure that > > it's trying to load your build. > > > > [nbest at login2 R]$ find ~nbest/R -type d -name ncdf > > [nbest at login2 R]$ find /soft/R-ncdf-1.8-gcc4.1-r1/ -type d -name > > ncdf > > /soft/R-ncdf-1.8-gcc4.1-r1/lib/R2.12/site-library/ncdf > > > > I tried installing it to ~nbest/R using the args you gave me, but it > > still fails: > > > > [nbest at login2 src]$ R CMD INSTALL -l ~/R > > /software/common/src/ncdf_1.8.1.tar.gz > > --configure-args="LDFLAGS='-L/soft/hdf5-1.8.2-gcc4.1-r1/lib -lhdf5 > > -lhdf5_hl' --with-netcdf_incdir=/soft/netcdf-4.0-gcc4.1-r1/include > > --with-netcdf_libdir=/soft/netcdf-4.0-gcc4.1-r1/lib" > > . . . > > unable to load shared object > > '/autonfs/home/nbest/R/ncdf/libs/ncdf.so': > > /soft/netcdf-4.0-gcc4.1-r1/lib/libnetcdf.so.5: undefined symbol: > > H5P_CLS_FILE_CREATE_g > > > > I think that's the same error as before. I still can't imagine > > what's different about our environments. Can you? > > > > > > -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory ----- Original Message ----- > Hi Neil, > > Thanks for the heads-up. No, I have never tried building R with ncdf. > > The next 4 weeks are very busy for me as well with travel, workshops > and proposals. > > I'm sending to swift-devel as well as Rob and Sheri of the MCS Climate > group to see if they have any info on building the netCDF package for > R. 
> > Regards, > > Mike > > ----- Original Message ----- > > Hi, Mike. Have you ever tried to install the ncdf package for R? I > > have a ticket open with Ti, but he seems to be having trouble with > > it. > > He thought he had it at one point (see below) but there was a > > problem > > -- I think he used the OS version of R. I'll send you my reply from > > when I tried to use what he had built. If you have any spare cycles > > for this in the interest of moving my SwiftR project forward over > > the > > next 2-3 weeks that would be great. I'll let you know if I hear > > anything from Ti. I'm not bugging him about it this week because I > > have my hands full and I think he's helping get ready for the Globus > > meeting, but I'm just trying to keep the issue alive. Thanks, Mike. > > > > > > Neil > > > > > > Begin forwarded message: > > > > > From: "Ti Leggett" > > > Date: March 28, 2011 12:42:58 PM CDT > > > To: nbest at ci.uchicago.edu > > > Subject: R library(ncdf) [PADS Support #12231] > > > Reply-To: pads-support at ci.uchicago.edu > > > > > > Bleh, accidentally hit comment instead of reply: > > > > > > > > > The magic command was this: > > > R CMD INSTALL -l /soft/R-ncdf-1.8-gcc4.1-r1/lib/R2.12/site-library > > > /software/common/src/ncdf_1.8.1.tar.gz > > > --configure-args="LDFLAGS='-L/soft/hdf5-1.8.2-gcc4.1-r1/lib -lhdf5 > > > -lhdf5_hl' --with-netcdf_incdir=/soft/netcdf-4.0-gcc4.1-r1/include > > > --with-netcdf_libdir=/soft/netcdf-4.0-gcc4.1-r1/lib" > > > > > > Add the following to your .soft to use it: > > > > > > +netcdf-4.0-gcc4.1-r1 > > > +szip-2.1-gcc4.1-r1 > > > +hdf5-1.8.2-gcc4.1-r1 > > > +R-ncdf-gcc-R2.12 > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation 
Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From bugzilla-daemon at mcs.anl.gov Tue Apr 5 13:21:39 2011 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Tue, 5 Apr 2011 13:21:39 -0500 (CDT) Subject: [Swift-devel] [Bug 321] Improve "cant find wrapper log" error message and document in new Debugging chapter In-Reply-To: References: Message-ID: <20110405182139.E994B1C073@wind.mcs.anl.gov> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=321 ketan changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |ketan at mcs.anl.gov --- Comment #2 from ketan 2011-04-05 13:21:39 --- Might be good to put a diagnostic remark on the lines of "The filesystem you are using might not be writable by provider" as this is often the case on "failed to transfer ..." errors. -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the reporter. From wilde at mcs.anl.gov Tue Apr 5 13:36:42 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 5 Apr 2011 13:36:42 -0500 (CDT) Subject: [Swift-devel] Re: Can you test 0.92 branch for 0.92.1 release? In-Reply-To: Message-ID: <1738774714.66684.1302028602212.JavaMail.root@zimbra.anl.gov> Thanks, David. Please cc all discussion of this sort to swift-devel. I assume SVN is working for you now? (It was working for me, from communicadao, around 9AM this morning). - Mike ----- Original Message ----- > It appears that there may be a problem with svn.ci.uchicago.edu . I am > unable to connect from an SVN client or through the web interface - > both attempts just hang indefinitely. I have sent an email to support > (ticket 12539), but just wanted to give you guys a heads up that there > may be an issue there. I will try to run the tests again in the > morning. 
>
> David
>
>
> On Mon, Apr 4, 2011 at 2:42 PM, Michael Wilde < wilde at mcs.anl.gov > wrote:
>
>
> David, Sarah,
>
> How quickly could you re-divide the Swift site test plan between you
> and confirm back to swift-devel that we are ready to tag and release
> the branch as 0.92.1?
>
> Before we do that, you need to add a test to the test suite that can
> replicate the twice-each bug and verify that it's detected in 0.92 and
> corrected in 0.92.1.
>
> Can you possibly do this by noon tomorrow?
>
> Can you post a checklist of tests with names of who's going to run
> them?
>
> Depending on what you can commit to, I will see if I, Ketan, and/or
> Justin can help take various sites as well. I feel we really need to
> do this quickly so we have a stable trusted release out there.
>
>
>
> Thanks,
>
> Mike
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From bugzilla-daemon at mcs.anl.gov Tue Apr 5 13:52:38 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Tue, 5 Apr 2011 13:52:38 -0500 (CDT)
Subject: [Swift-devel] [Bug 325] New: indicate host gridftp server that caused the error:
Message-ID:

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=325

           Summary: indicate host gridftp server that caused the error:
           Product: Swift
           Version: trunk
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: error messages
        AssignedTo: skenny at uchicago.edu
        ReportedBy: aespinosa at cs.uchicago.edu
                CC: swift-devel at ci.uchicago.edu


type file;

app (file o) cat(file i) {
    cat i @stdout=o;
}

file y<"gsiftp://somehost//filename3">;
file x <"gsiftp://somehost//filename">;
x=cat(y);

sites.xml
...
...
/someworkdir

When something goes wrong with either host, 'somehost' or 'another' host,
logs and error messages only report this:

...
Caused by: org.globus.cog.abstraction.impl.file.IrrecoverableResourceException: Error communicating with the GridFTP server
Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: org.globus.cog.abstraction.impl.file.IrrecoverableResourceException: Error communicating with the GridFTP server
Caused by: org.globus.cog.abstraction.impl.file.IrrecoverableResourceException: Error communicating with the GridFTP server
Caused by: java.net.ConnectException: Connection refused
Final status: Failed:4 Finished in previous run:11

This is manageable for a single remote resource, but with more pools (hence
more gridftp endpoints) it gets harder to debug, because nothing in the
message says which server refused the connection.

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.

From wilde at mcs.anl.gov Tue Apr 5 14:10:09 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 5 Apr 2011 14:10:09 -0500 (CDT)
Subject: [Swift-devel] Re: Can you test 0.92 branch for 0.92.1 release?
In-Reply-To: <1738774714.66684.1302028602212.JavaMail.root@zimbra.anl.gov>
Message-ID: <721045497.66956.1302030609111.JavaMail.root@zimbra.anl.gov>

David, Sarah, Ketan,

Can you all report back to the devel list on your progress on testing the
release? Ie, what systems are you testing, and which of those tests are
complete? When will the rest be done, and hence when are we ready to tag
and release the fix?

I asked who will create the test to confirm that the twice-each bug is
fixed, but no one responded. Which of the three of you feel you know how to
do this? Is this being tested in your new tests?

Ketan tells me that in the 0.92+ interim release I made for Beagle it looks
like the resume feature is not working.
I was aware that such a bug was reported in trunk, but in the original 0.92
Cray version (under /home/wilde/swift/rev) resume *was* working. Does the
test suite test the resume feature at the moment?

Lastly, who will tag and upload the new release, remove or change the red
warning in the download page, and announce 0.92.1 on swift-user?

- Mike


----- Original Message -----
> Thanks, David. Please cc all discussion of this sort to swift-devel.
>
> I assume SVN is working for you now? (It was working for me, from
> communicadao, around 9AM this morning).
>
> - Mike
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From skenny at uchicago.edu Tue Apr 5 14:13:37 2011
From: skenny at uchicago.edu (Sarah Kenny)
Date: Tue, 5 Apr 2011 12:13:37 -0700
Subject: [Swift-devel] Re: Can you test 0.92 branch for 0.92.1 release?
In-Reply-To: <721045497.66956.1302030609111.JavaMail.root@zimbra.anl.gov>
References: <1738774714.66684.1302028602212.JavaMail.root@zimbra.anl.gov>
 <721045497.66956.1302030609111.JavaMail.root@zimbra.anl.gov>
Message-ID:

i'm currently working on a swift script to replicate the bug for .92 which
i will then commit to svn in the test suite. if you mike, or ketan already
have this let me know (i'm trying to hack the script jon posted to the list)
and i'll use yours...david said he doesn't have one.

as i said, my plan was to test on ranger, abe and a couple of (uci) local
workstations.

~sk

On Tue, Apr 5, 2011 at 12:10 PM, Michael Wilde wrote:

> David, Sarah, Ketan,
>
> Can you all report back to the devel list on your progress on testing the
> release? Ie, what systems are you testing, and which of those tests are
> complete? When will the rest be done, and hence when are we ready to tag
> and release the fix?
>
> I asked who will create the test to confirm that the twice-each bug is
> fixed, but no one responded. Which of the three of you feel you know how to
> do this? Is this being tested in your new tests?
From ketancmaheshwari at gmail.com Tue Apr 5 14:17:52 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Tue, 5 Apr 2011 14:17:52 -0500
Subject: [Swift-devel] Re: Can you test 0.92 branch for 0.92.1 release?
In-Reply-To:
References: <1738774714.66684.1302028602212.JavaMail.root@zimbra.anl.gov>
 <721045497.66956.1302030609111.JavaMail.root@zimbra.anl.gov>
Message-ID:

Sarah,

I do not have the test you are asking for yet. I am looking at the test
suite and will start on Beagle soon.

Ketan

On Apr 5, 2011, at 2:13 PM, Sarah Kenny wrote:

> i'm currently working on a swift script to replicate the bug for .92 which
> i will then commit to svn in the test suite. if you mike, or ketan already
> have this let me know (i'm trying to hack the script jon posted to the
> list) and i'll use yours...david said he doesn't have one.
>
> as i said, my plan was to test on ranger, abe and a couple of (uci) local
> workstations.
>
> ~sk
From jon.monette at gmail.com Tue Apr 5 14:26:48 2011
From: jon.monette at gmail.com (Jonathan Monette)
Date: Tue, 5 Apr 2011 14:26:48 -0500
Subject: [Swift-devel] Re: Can you test 0.92 branch for 0.92.1 release?
In-Reply-To:
References: <1738774714.66684.1302028602212.JavaMail.root@zimbra.anl.gov>
 <721045497.66956.1302030609111.JavaMail.root@zimbra.anl.gov>
Message-ID:

The script I posted might be too complex to use to replicate the twice each
bug. However, didn't Mike post a simple loop script that was looping twice
when the bug was initially found?
--
Any intelligent fool can make things bigger and more complex... It takes a
touch of genius - and a lot of courage to move in the opposite direction.
- Albert Einstein

From wilde at mcs.anl.gov Tue Apr 5 14:33:42 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 5 Apr 2011 14:33:42 -0500 (CDT)
Subject: [Swift-devel] Re: Can you test 0.92 branch for 0.92.1 release?
In-Reply-To:
Message-ID: <1046706498.67103.1302032022991.JavaMail.root@zimbra.anl.gov>

Yes, I had posted variations of the following to the list:

zz3.swift:

int arr[];

arr[0]=1;
arr[1]=2;

foreach a in arr {
    trace("for", a);
}

zz6.swift:

int arr[];

foreach a,i in [0:9] {
    arr[i] = i;
}

trace("arr",arr);

foreach a,i in arr {
    trace("for", a,i);
}


com$ PATH=/home/wilde/swift/src/0.92/cog/modules/swift/dist/swift-svn/bin/:$PATH
com$ which swift
~/swift/src/0.92/cog/modules/swift/dist/swift-svn/bin/swift
com$ cd swift/lab
com$ swift zz3.swift
Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)

RunID: 20110404-1344-j98f22id
Progress:
SwiftScript trace: for, 2
SwiftScript trace: for, 1
Final status:
com$ PATH=~/swift/rev/swift-0.92/bin:$PATH
com$ swift zz3.swift
Swift svn swift-r4157 cog-r3056

RunID: 20110404-1344-ensm4te8
Progress:
SwiftScript trace: for, 1
SwiftScript trace: for, 2
SwiftScript trace: for, 2
SwiftScript trace: for, 1
Final status:
com$ swift zz6.swift
Swift svn swift-r4157 cog-r3056

RunID: 20110404-1344-i7y6q1i1
Progress:
SwiftScript trace: arr, arr.$[]/10
SwiftScript trace: for, 3, 3
SwiftScript trace: for, 2, 2
SwiftScript trace: for, 4, 4
SwiftScript trace: for, 5, 5
SwiftScript trace: for, 3, 3
SwiftScript trace: for, 5, 5
SwiftScript trace: for, 9, 9
SwiftScript trace: for, 4, 4
SwiftScript trace: for, 1, 1
SwiftScript trace: for, 7, 7
SwiftScript trace: for, 7, 7
SwiftScript trace: for, 6, 6
SwiftScript trace: for, 9, 9
SwiftScript trace: for, 6, 6
SwiftScript trace: for, 1, 1
SwiftScript trace: for, 2, 2
SwiftScript trace: for, 0, 0
SwiftScript trace: for, 8, 8
SwiftScript trace: for, 0, 0
SwiftScript trace: for, 8, 8
Final status:
com$


----- Original Message -----
> The script I posted might be too complex to use to replicate the twice
> each bug. However, didn't Mike post a simple loop script that was
> looping twice when the bug was initially found?
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From jon.monette at gmail.com Tue Apr 5 14:37:09 2011
From: jon.monette at gmail.com (Jonathan Monette)
Date: Tue, 5 Apr 2011 14:37:09 -0500
Subject: [Swift-devel] Re: Can you test 0.92 branch for 0.92.1 release?
In-Reply-To: <1046706498.67103.1302032022991.JavaMail.root@zimbra.anl.gov>
References: <1046706498.67103.1302032022991.JavaMail.root@zimbra.anl.gov>
Message-ID:

Yes. That is the one I remember seeing. That is much easier than what my
Montage scripts are doing.
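
The zz3.swift runs posted in this thread suggest one way the requested regression test could decide pass or fail: collect the "SwiftScript trace:" lines from a run and flag any that appear more than once. What follows is only a minimal sketch under stated assumptions, not the check that was committed to the test suite. The function name find_repeated_iterations and the two string constants are hypothetical. It also assumes every loop iteration traces distinct text, which holds for zz3.swift and zz6.swift because the array values are distinct.

```python
from collections import Counter

# Prefix that precedes every trace() line in the Swift output quoted in this thread.
TRACE_PREFIX = "SwiftScript trace: "


def find_repeated_iterations(swift_output: str) -> dict:
    """Return {trace text: count} for every trace line seen more than once.

    Hypothetical helper: an empty result means no iteration was traced twice.
    """
    traces = [
        line.strip()[len(TRACE_PREFIX):]
        for line in swift_output.splitlines()
        if line.strip().startswith(TRACE_PREFIX)
    ]
    return {text: count for text, count in Counter(traces).items() if count > 1}


# Trace lines copied from the zz3.swift run against the 0.92 release (swift-r4157).
BUGGY_0_92 = """\
Progress:
SwiftScript trace: for, 1
SwiftScript trace: for, 2
SwiftScript trace: for, 2
SwiftScript trace: for, 1
Final status:
"""

# Trace lines copied from the zz3.swift run against the 0.92 branch build (swift-r4252).
BRANCH_BUILD = """\
Progress:
SwiftScript trace: for, 2
SwiftScript trace: for, 1
Final status:
"""

if __name__ == "__main__":
    print(find_repeated_iterations(BUGGY_0_92))    # both iterations reported twice
    print(find_repeated_iterations(BRANCH_BUILD))  # empty: each iteration ran once
```

Because foreach iterations run in parallel, the order of the trace lines varies between runs, so the sketch compares counts rather than exact output.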
> > > > > > > > > > > > > > > > Thanks, > > > > > > > > Mike > > > > > > > > -- > > > > Michael Wilde > > > > Computation Institute, University of Chicago > > > > Mathematics and Computer Science Division > > > > Argonne National Laboratory > > > > > > -- > > > Michael Wilde > > > Computation Institute, University of Chicago > > > Mathematics and Computer Science Division > > > Argonne National Laboratory > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > -- > > > > > > > > Michael Wilde > > Computation Institute, University of Chicago > > Mathematics and Computer Science Division > > Argonne National Laboratory > > > > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > -- > > Any intelligent fool can make things bigger and more complex... It > > takes a touch of genius - and a lot of courage to move in the opposite > > direction. > > - Albert Einstein > > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > -- Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction. - Albert Einstein -------------- next part -------------- An HTML attachment was scrubbed... URL: From skenny at uchicago.edu Tue Apr 5 14:38:50 2011 From: skenny at uchicago.edu (Sarah Kenny) Date: Tue, 5 Apr 2011 12:38:50 -0700 Subject: [Swift-devel] Re: Can you test 0.92 branch for 0.92.1 release? 
In-Reply-To: References: <1046706498.67103.1302032022991.JavaMail.root@zimbra.anl.gov> Message-ID:

got it thanks...to be clear i wasn't going to try to run the whole
montage script :P but this is easier than extracting the faulty loop :)

On Tue, Apr 5, 2011 at 12:37 PM, Jonathan Monette wrote:

> Yes. That is the one I remember seeing. That is much easier than what my
> Montage scripts are doing.
>
> On Tue, Apr 5, 2011 at 2:33 PM, Michael Wilde wrote:
>
>> Yes, I had posted variations of the following to the list:
>>
>> zz3.swift:
>>
>> int arr[];
>>
>> arr[0]=1;
>> arr[1]=2;
>>
>> foreach a in arr {
>>   trace("for", a);
>> }
>>
>> zz6.swift:
>>
>> int arr[];
>>
>> foreach a,i in [0:9] {
>>   arr[i] = i;
>> }
>>
>> trace("arr",arr);
>>
>> foreach a,i in arr {
>>   trace("for", a,i);
>> }
>>
>> com$ PATH=/home/wilde/swift/src/0.92/cog/modules/swift/dist/swift-svn/bin/:$PATH
>> com$ which swift
>> ~/swift/src/0.92/cog/modules/swift/dist/swift-svn/bin/swift
>> com$ cd swift/lab
>> com$ swift zz3.swift
>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
>>
>> RunID: 20110404-1344-j98f22id
>> Progress:
>> SwiftScript trace: for, 2
>> SwiftScript trace: for, 1
>> Final status:
>> com$ PATH=~/swift/rev/swift-0.92/bin:$PATH
>> com$ swift zz3.swift
>> Swift svn swift-r4157 cog-r3056
>>
>> RunID: 20110404-1344-ensm4te8
>> Progress:
>> SwiftScript trace: for, 1
>> SwiftScript trace: for, 2
>> SwiftScript trace: for, 2
>> SwiftScript trace: for, 1
>> Final status:
>> com$ swift zz6.swift
>> Swift svn swift-r4157 cog-r3056
>>
>> RunID: 20110404-1344-i7y6q1i1
>> Progress:
>> SwiftScript trace: arr, arr.$[]/10
>> SwiftScript trace: for, 3, 3
>> SwiftScript trace: for, 2, 2
>> SwiftScript trace: for, 4, 4
>> SwiftScript trace: for, 5, 5
>> SwiftScript trace: for, 3, 3
>> SwiftScript trace: for, 5, 5
>> SwiftScript trace: for, 9, 9
>> SwiftScript trace: for, 4, 4
>> SwiftScript trace: for, 1, 1
>> SwiftScript trace: for, 7, 7
>> SwiftScript trace: for, 7, 7
>> SwiftScript trace: for, 6, 6
>> SwiftScript trace: for, 9, 9
>> SwiftScript trace: for, 6, 6
>> SwiftScript trace: for, 1, 1
>> SwiftScript trace: for, 2, 2
>> SwiftScript trace: for, 0, 0
>> SwiftScript trace: for, 8, 8
>> SwiftScript trace: for, 0, 0
>> SwiftScript trace: for, 8, 8
>> Final status:
>> com$
>>
>> --
>> Michael Wilde
>> Computation Institute, University of Chicago
>> Mathematics and Computer Science Division
>> Argonne National Laboratory
>
> --
> Any intelligent fool can make things bigger and more complex... It takes a
> touch of genius - and a lot of courage to move in the opposite direction.
> - Albert Einstein

From wilde at mcs.anl.gov Tue Apr 5 14:50:08 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 5 Apr 2011 14:50:08 -0500 (CDT)
Subject: [Swift-devel] Re: Can you test 0.92 branch for 0.92.1 release?
In-Reply-To: Message-ID: <1305237903.67225.1302033008679.JavaMail.root@zimbra.anl.gov>

Just to clarify: we detected this bug by diagnosing the error that Allan was getting in his SCEC workflow, which was trying to add a file to a local cache that was already there.

I never verified whether the same bug was causing failures in Montage, but Jon reported on Apr 4 at 12:04 AM that the small Montage was working under the fixed 0.92 branch and that the large Montage run was still to be tested.

- Mike

----- Original Message -----
> got it thanks...to be clear i wasn't going to try to run the whole
> montage script :P but this is easier than extracting the faulty loop
> :)
>
> On Tue, Apr 5, 2011 at 12:37 PM, Jonathan Monette <
> jon.monette at gmail.com > wrote:
>
> Yes. That is the one I remember seeing. That is much easier than what
> my Montage scripts are doing.

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From jon.monette at gmail.com Tue Apr 5 14:57:39 2011
From: jon.monette at gmail.com (Jonathan Monette)
Date: Tue, 5 Apr 2011 14:57:39 -0500
Subject: [Swift-devel] Re: Can you test 0.92 branch for 0.92.1 release?
In-Reply-To: <1305237903.67225.1302033008679.JavaMail.root@zimbra.anl.gov>
References: <1305237903.67225.1302033008679.JavaMail.root@zimbra.anl.gov>
Message-ID:

Correct. Because of how I was looping, I was receiving the same cache error that Allan was receiving. Also, I never thought of this before, but my Montage scripts were running very slowly in trunk at some point (I am assuming this was when the twice-each bug was introduced and everything was being done twice). Under the 0.92 branch my small workflows complete. My large workflows error out with PBS error 254, I believe. I cannot remember the error code for certain, but I believe it was this one. This is not due to the twice-each bug, though.

On Tue, Apr 5, 2011 at 2:50 PM, Michael Wilde wrote:

> Just to clarify: we detected this bug by diagnosing the error that Allan
> was getting in his SCEC workflow, which was trying to add a file to a
> local cache that was already there.
>
> I never verified whether the same bug was causing failures in Montage, but
> Jon reported on Apr 4 at 12:04 AM that the small Montage was working under
> the fixed 0.92 branch and that the large Montage run was still to be tested.
>
> - Mike
>
> ----- Original Message -----
> > got it thanks...to be clear i wasn't going to try to run the whole
> > montage script :P but this is easier than extracting the faulty loop
> > :)
> >
> > On Tue, Apr 5, 2011 at 12:37 PM, Jonathan Monette <
> > jon.monette at gmail.com > wrote:
> >
> > Yes.
That is the one I remember seeing. That is much easier than what
> > my Montage scripts are doing.

--
Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction.
- Albert Einstein

From wilde at mcs.anl.gov Tue Apr 5 15:01:19 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 5 Apr 2011 15:01:19 -0500 (CDT)
Subject: [Swift-devel] Resume not working in 0.92? Please test.
Message-ID: <599769624.67355.1302033679071.JavaMail.root@zimbra.anl.gov>

Ketan, to follow up on your mention to me that resume is not working on the latest 0.92 on Beagle, can you do the following:

Try a simple foreach script (like /home/wilde/swift/lab/catsnsleep) under the old 0.92 Cray version. Say 10 sleeps, throttled to 1 at a time on localhost. Hit ^C, then try resume. This *should* work.

Then try the same in the fixed 0.92 branch to see if possibly resume is broken there (as it was reported to be in trunk: https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=273).

Then try trunk as well.
Can you check to see if the current test suite has a resume test? If so, please try that as well (or instead of the above). If not, can you add it, and report back to swift-devel what you find on the state of resume?

- Mike

From wilde at mcs.anl.gov Tue Apr 5 15:06:42 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 5 Apr 2011 15:06:42 -0500 (CDT)
Subject: [Swift-devel] Re: Can you test 0.92 branch for 0.92.1 release?
In-Reply-To: Message-ID: <417068539.67412.1302034002663.JavaMail.root@zimbra.anl.gov>

Jon, PBS error 254 may be something like: the app in tc.data is not executable, or the app script calls something that is not found or not executable, or something that makes it return non-zero. It falls in the class of errors that I just railed about in Bug 321: https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=321

It's not clear to me that the same root problem manifests in exactly the same error codes and messages under various providers and configurations, which is another problem that the fix(es) to Bug 321 should deal with.

When you fix your 254, could you report back to swift-devel what it was, and either file it as a new bug or update Bug 321?

- Mike

----- Original Message -----
> Correct. Because of how I was looping, I was receiving the same cache
> error that Allan was receiving. Also, I never thought of this before, but
> my Montage scripts were running very slowly in trunk at some point (I
> am assuming this was when the twice-each bug was introduced
> and everything was being done twice). Under the 0.92 branch my small
> workflows complete. My large workflows error out with PBS error 254, I
> believe. I cannot remember the error code for certain, but I believe it
> was this one. This is not due to the twice-each bug, though.
>
> On Tue, Apr 5, 2011 at 2:50 PM, Michael Wilde < wilde at mcs.anl.gov >
> wrote:
>
> Just to clarify: we detected this bug by diagnosing the error that
> Allan was getting in his SCEC workflow, trying to add a file to a
> local cache that was already there.
>
> I never verified if the same bug was causing failures in Montage, but
> Jon reported Apr 4 12:04 AM that the small Montage was working under
> the fixed 0.92 branch and that the large Montage run was still to be
> tested.
>
> - Mike
>
>
> ----- Original Message -----
> > got it thanks...to be clear i wasn't going to try to run the whole
> > montage script :P but this is easier than extracting the faulty
> > loop :)
> >
> >
> > On Tue, Apr 5, 2011 at 12:37 PM, Jonathan Monette <
> > jon.monette at gmail.com > wrote:
> >
> >
> > Yes. That is the one I remember seeing. That is much easier than
> > what my Montage scripts are doing.
> >
> >
> > On Tue, Apr 5, 2011 at 2:33 PM, Michael Wilde < wilde at mcs.anl.gov >
> > wrote:
> >
> >
> > Yes, I had posted variations of the following to the list:
> >
> > zz3.swift:
> >
> > int arr[];
> >
> > arr[0]=1;
> > arr[1]=2;
> >
> > foreach a in arr {
> >   trace("for", a);
> > }
> >
> > zz6.swift:
> >
> > int arr[];
> >
> > foreach a,i in [0:9] {
> >   arr[i] = i;
> > }
> >
> > trace("arr",arr);
> >
> > foreach a,i in arr {
> >   trace("for", a,i);
> > }
> >
> >
> > com$ PATH=/home/wilde/swift/src/0.92/cog/modules/swift/dist/swift-svn/bin/:$PATH
> > com$ which swift
> > ~/swift/src/0.92/cog/modules/swift/dist/swift-svn/bin/swift
> > com$ cd swift/lab
> > com$ swift zz3.swift
> > Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
> >
> > RunID: 20110404-1344-j98f22id
> > Progress:
> > SwiftScript trace: for, 2
> > SwiftScript trace: for, 1
> > Final status:
> > com$ PATH=~/swift/rev/swift-0.92/bin:$PATH
> > com$ swift zz3.swift
> > Swift svn swift-r4157 cog-r3056
> >
> > RunID: 20110404-1344-ensm4te8
> > Progress:
> > SwiftScript trace: for, 1
> > SwiftScript trace: for, 2
> > SwiftScript trace: for, 2
> > SwiftScript trace: for, 1
> > Final status:
> > com$ swift zz6.swift
> > Swift svn swift-r4157 cog-r3056
> >
> > RunID: 20110404-1344-i7y6q1i1
> > Progress:
> > SwiftScript trace: arr, arr.$[]/10
> > SwiftScript trace: for, 3, 3
> > SwiftScript trace: for, 2, 2
> > SwiftScript trace: for, 4, 4
> > SwiftScript trace: for, 5, 5
> > SwiftScript trace: for, 3, 3
> > SwiftScript trace: for, 5, 5
> > SwiftScript trace: for, 9, 9
> > SwiftScript trace: for, 4, 4
> > SwiftScript trace: for, 1, 1
> > SwiftScript trace: for, 7, 7
> > SwiftScript trace: for, 7, 7
> > SwiftScript trace: for, 6, 6
> > SwiftScript trace: for, 9, 9
> > SwiftScript trace: for, 6, 6
> > SwiftScript trace: for, 1, 1
> > SwiftScript trace: for, 2, 2
> > SwiftScript trace: for, 0, 0
> > SwiftScript trace: for, 8, 8
> > SwiftScript trace: for, 0, 0
> > SwiftScript trace: for, 8, 8
> > Final status:
> > com$
> >
> >
> > ----- Original Message -----
> > > The script I posted might be too complex to use to replicate the
> > > twice-each bug. However, didn't Mike post a simple loop script that
> > > was looping twice when the bug was initially found?
> > >
> > >
> > > On Tue, Apr 5, 2011 at 2:17 PM, Ketan Maheshwari <
> > > ketancmaheshwari at gmail.com > wrote:
> > >
> > >
> > > Sarah,
> > >
> > > I do not have the test you are asking for yet. I am looking at the
> > > test suite and will start on Beagle soon.
> > >
> > > Ketan
> > >
> > >
> > > On Apr 5, 2011, at 2:13 PM, Sarah Kenny wrote:
> > >
> > >
> > > i'm currently working on a swift script to replicate the bug for
> > > .92 which i will then commit to svn in the test suite. if you mike,
> > > or ketan already have this let me know (i'm trying to hack the
> > > script jon posted to the list) and i'll use yours...david said he
> > > doesn't have one.
> > >
> > > as i said, my plan was to test on ranger, abe and a couple of (uci)
> > > local workstations.
> > >
> > > ~sk
> > >
> > >
> > > On Tue, Apr 5, 2011 at 12:10 PM, Michael Wilde < wilde at mcs.anl.gov >
> > > wrote:
> > >
> > >
> > > David, Sarah, Ketan,
> > >
> > > Can you all report back to the devel list on your progress on
> > > testing the release? I.e., what systems are you testing, and which
> > > of those tests are complete? When will the rest be done, and hence
> > > when are we ready to tag and release the fix?
> > >
> > > I asked who will create the test to confirm that the twice-each bug
> > > is fixed, but no one responded. Which of the three of you feel you
> > > know how to do this? Is this being tested in your new tests?
> > >
> > > Ketan tells me that in the 0.92+ interim release I made for Beagle
> > > it looks like the resume feature is not working. I was aware that
> > > such a bug was reported in trunk, but in the original 0.92 Cray
> > > version (under /home/wilde/swift/rev) resume *was* working. Does the
> > > test suite test the resume feature at the moment?
> > >
> > > Lastly, who will tag and upload the new release, remove or change
> > > the red warning in the download page, and announce 0.92.1 on
> > > swift-user?
> > >
> > > - Mike
> > >
> > >
> > > ----- Original Message -----
> > > > Thanks, David. Please cc all discussion of this sort to
> > > > swift-devel.
> > > >
> > > > I assume SVN is working for you now? (It was working for me, from
> > > > communicadao, around 9AM this morning).
> > > >
> > > > - Mike
> > > >
> > > >
> > > > ----- Original Message -----
> > > > > It appears that there may be a problem with svn.ci.uchicago.edu.
> > > > > I am unable to connect from an SVN client or through the web
> > > > > interface - both attempts just hang indefinitely. I have sent an
> > > > > email to support (ticket 12539), but just wanted to give you guys
> > > > > a heads up that there may be an issue there. I will try to run
> > > > > the tests again in the morning.
> > > > >
> > > > > David
> > > > >
> > > > >
> > > > > On Mon, Apr 4, 2011 at 2:42 PM, Michael Wilde <
> > > > > wilde at mcs.anl.gov > wrote:
> > > > >
> > > > >
> > > > > David, Sarah,
> > > > >
> > > > > How quickly could you re-divide the Swift site test plan between
> > > > > you and confirm back to swift-devel that we are ready to tag and
> > > > > release the branch as 0.92.1?
> > > > >
> > > > > Before we do that, you need to add a test to the test suite that
> > > > > can replicate the twice-each bug and verify that it's detected
> > > > > in 0.92 and corrected in 0.92.1.
> > > > >
> > > > > Can you possibly do this by noon tomorrow?
> > > > >
> > > > > Can you post a checklist of tests with names of who's going to
> > > > > run them?
> > > > >
> > > > > Depending on what you can commit to, I will see if I, Ketan,
> > > > > and/or Justin can help take various sites as well. I feel we
> > > > > really need to do this quickly so we have a stable trusted
> > > > > release out there.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Mike
> > > > >
> > > > > --
> > > > > Michael Wilde
> > > > > Computation Institute, University of Chicago
> > > > > Mathematics and Computer Science Division
> > > > > Argonne National Laboratory

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From jon.monette at gmail.com Tue Apr 5 15:14:22 2011
From: jon.monette at gmail.com (Jonathan Monette)
Date: Tue, 5 Apr 2011 15:14:22 -0500
Subject: [Swift-devel] Re: Can you test 0.92 branch for 0.92.1 release?
In-Reply-To: <417068539.67412.1302034002663.JavaMail.root@zimbra.anl.gov>
References: <417068539.67412.1302034002663.JavaMail.root@zimbra.anl.gov>
Message-ID:

Yes. I will certainly do that. Those are the usual suspects that I have seen
for error 254, but the app I believe is failing does not have any of those
properties. I am re-running the script with some changes that will hopefully
shed more light on where it fails.

PADS is in maintenance mode. There are several jobs in the queue and it looks
like none are even running.

On Tue, Apr 5, 2011 at 3:06 PM, Michael Wilde wrote:

> Jon,
>
> PBS Error 254 may be something like: the app in tc.data is not executable,
> or the app script calls something that is not found or not executable, or
> something that makes it return non-zero.
> It falls in that class of error that I just railed about in
> Bug 321: https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=321
>
> It's not clear to me that the same root problem manifests in exactly the
> same error codes and messages under various providers and configurations,
> which is another problem that the fix(es) to Bug 321 should deal with.
>
> When you fix your 254, could you report back to swift-devel what it was,
> and either file it as a new bug or update Bug 321?
>
> - Mike

--
Any intelligent fool can make things bigger and more complex... It takes a
touch of genius - and a lot of courage to move in the opposite direction.
- Albert Einstein

From ketancmaheshwari at gmail.com Tue Apr 5 16:47:22 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Tue, 5 Apr 2011 16:47:22 -0500
Subject: [Swift-devel] Re: Resume not working in 0.92? Please test.
In-Reply-To: <599769624.67355.1302033679071.JavaMail.root@zimbra.anl.gov>
References: <599769624.67355.1302033679071.JavaMail.root@zimbra.anl.gov>
Message-ID: <500F071C-66B5-404E-8DBE-CC284B360861@gmail.com>

Mike,

I tested resume and it is indeed working: I tested it locally as well as on
Beagle, and it works well for the latest 0.92 Cray version.

Unfortunately, I did not preserve the old 0.92 branch on Beagle. If you think
resume is worth testing for that branch, I will check that out too.

The resume test does not exist in the current test suite. I will add a test
and let you know how it goes.

Ketan

On Apr 5, 2011, at 3:01 PM, Michael Wilde wrote:

> Ketan, to follow up on your mention to me that resume is not working on
> the latest 0.92 on Beagle, can you do the following:
>
> Try a simple foreach script (like /home/wilde/swift/lab/catsnsleep) under
> the old 0.92 Cray version. Say, 10 sleeps throttled to 1 at a time on
> localhost.
>
> Hit ^c, then try resume. This *should* work.
>
> Then try the same in the fixed 0.92 branch to see whether resume is broken
> there (as it was reported to be in trunk:
> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=273 )
>
> Then try trunk as well.
>
> Can you check to see if the current test suite has a resume test? If so,
> please try that as well (or instead of the above). If not, can you add it,
> and report back to swift-devel what you find on the state of resume?
>
> - Mike

From wilde at mcs.anl.gov Tue Apr 5 17:16:25 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 5 Apr 2011 17:16:25 -0500 (CDT)
Subject: [Swift-devel] Re: Resume not working in 0.92? Please test.
In-Reply-To: <500F071C-66B5-404E-8DBE-CC284B360861@gmail.com>
Message-ID: <1278351924.68275.1302041785570.JavaMail.root@zimbra.anl.gov>

OK, very good, Ketan. Resume worked for me in the old Cray 0.92 version, so
if it works in the current one, then that is great. So I'm assuming that your
original suspicion that resume does not work in the latest 0.92+ branch was
erroneous.

Please test trunk as well, which is the code version that Allan filed bug 273
against.

I see in your modftdock writeup that when you document resume you did not
give the same arguments to swift as in the initial invocation. Those are
needed for correct resume behavior. The resume feature assumes that the
resumed run is running in the exact same environment as the run represented
in the rlog.

If you need it for reference, the original Cray 0.92 version is in
/home/wilde/swift/rev/swift-r4143+cog-r3056+pbscoast (but I see little need
to test that rev other than to verify that it exhibits the twice-each bug).

- Mike

----- Original Message -----
> Mike,
>
> I tested resume and it is indeed working: I tested it locally as well as
> on Beagle, and it works well for the latest 0.92 Cray version.
>
> Unfortunately, I did not preserve the old 0.92 branch on Beagle. If you
> think resume is worth testing for that branch, I will check that out too.
>
> The resume test does not exist in the current test suite. I will add a
> test and let you know how it goes.
>
> Ketan

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From ketancmaheshwari at gmail.com Tue Apr 5 18:17:57 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Tue, 5 Apr 2011 18:17:57 -0500
Subject: [Swift-devel] Re: Resume not working in 0.92? Please test.
In-Reply-To: <599769624.67355.1302033679071.JavaMail.root@zimbra.anl.gov>
References: <599769624.67355.1302033679071.JavaMail.root@zimbra.anl.gov>
Message-ID:

Mike,

I tested and it seems resume is broken in the trunk. I will go ahead and put
a note on Allan's bug report.

Ketan

On Apr 5, 2011, at 3:01 PM, Michael Wilde wrote:

> Ketan, to follow up on your mention to me that resume is not working on
> the latest 0.92 on Beagle, can you do the following:
>
> Try a simple foreach script (like /home/wilde/swift/lab/catsnsleep) under
> the old 0.92 Cray version. Say, 10 sleeps throttled to 1 at a time on
> localhost.
>
> Hit ^c, then try resume. This *should* work.
>
> Then try the same in the fixed 0.92 branch to see whether resume is broken
> there (as it was reported to be in trunk:
> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=273 )
>
> Then try trunk as well.
>
> Can you check to see if the current test suite has a resume test? If so,
> please try that as well (or instead of the above). If not, can you add it,
> and report back to swift-devel what you find on the state of resume?
>
> - Mike

From bugzilla-daemon at mcs.anl.gov Tue Apr 5 18:21:01 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Tue, 5 Apr 2011 18:21:01 -0500 (CDT)
Subject: [Swift-devel] [Bug 273] resume is currently broken
In-Reply-To:
References:
Message-ID: <20110405232101.7FFED2B91A@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=273

ketan changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ketan at mcs.anl.gov

--- Comment #2 from ketan 2011-04-05 18:21:01 ---
I tested for this bug again today (2011-04-05) and it is still broken.

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.
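
The twice-each test requested earlier in this thread could be sketched roughly
as follows. This is only a sketch under stated assumptions, not the test that
was committed to the suite: it assumes the output of a run (for example
`swift zz3.swift`) has been captured to a file, and that every trace line of a
correct run is unique, as is the case for zz3.swift and zz6.swift. The helper
names `duplicate_traces` and `check_no_twice_each` are hypothetical and not
part of Swift.

```shell
#!/bin/sh
# Hypothetical helper (not part of Swift): print each "SwiftScript trace:"
# line that occurs more than once in the captured output file $1.
duplicate_traces() {
    grep '^SwiftScript trace:' "$1" | sort | uniq -d
}

# Hypothetical check: succeed only if no trace line is repeated.
# Assumes the script under test traces each value exactly once.
check_no_twice_each() {
    dups=$(duplicate_traces "$1")
    if [ -n "$dups" ]; then
        echo "twice-each suspected; repeated trace lines:"
        echo "$dups"
        return 1
    fi
    echo "no repeated trace lines"
}
```

Usage would be along the lines of `swift zz3.swift > zz3.out 2>&1` followed by
`check_no_twice_each zz3.out`. The check is order-independent, which matters
because the trace lines of a foreach loop can appear in any order.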

From hategan at mcs.anl.gov Tue Apr 5 18:24:16 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 05 Apr 2011 16:24:16 -0700
Subject: [Swift-devel] Re: Resume not working in 0.92? Please test.
In-Reply-To:
References: <599769624.67355.1302033679071.JavaMail.root@zimbra.anl.gov>
Message-ID: <1302045856.29787.0.camel@blabla2.none>

On Tue, 2011-04-05 at 18:17 -0500, Ketan Maheshwari wrote:
> Mike,
>
> I tested and it seems, resume is broken in the trunk.

Can't say I'm surprised there. I'll take a look.

From mickelso at mcs.anl.gov Wed Apr 6 09:24:22 2011
From: mickelso at mcs.anl.gov (Sheri Mickelson)
Date: Wed, 06 Apr 2011 09:24:22 -0500
Subject: [Swift-devel] [Bug 291] New: Add a exists() function to test for file existence
In-Reply-To:
References: <729744535.55640.1301680289613.JavaMail.root@zimbra.anl.gov> <2A5A416A-CAB6-4D82-B55F-0CFBC8F3B770@ucar.edu>
Message-ID: <4D9C7796.6000608@mcs.anl.gov>

Hi John,

I'm currently working with an older version of swift, not with the trunk.
But I should be able to try it out in a week or so when I start adding ncl to
the swift script.

-Sheri

John Dennis wrote:
> Justin,
>
> Thanks for adding this feature. I am not currently set up on a system
> with the swift trunk. I suspect Sheri is currently working
> with the trunk. Sheri, could you take a look at this feature during
> your work with swift?
>
> Thanks,
> John
>
> On Apr 3, 2011, at 3:56 PM, Justin M Wozniak wrote:
>
>>
>> Ok, John, you can give this a try in trunk if you like. The syntax
>> for the built-in is @exists, it takes a string and returns a boolean.
>> Justin
>>
>> On Sat, 2 Apr 2011, Justin M Wozniak wrote:
>>
>>>
>>> I have a prototype of this, I'll get it checked in later today.
>>> Justin
>>>
>>> On Fri, 1 Apr 2011, John Dennis wrote:
>>>
>>>> Michael,
>>>>
>>>> This type of function would be great to have.
>>>> John
>>>> On Apr 1, 2011, at 11:51 AM, Michael Wilde wrote:
>>>>> Basically as far as I understand: the presence or absence of a
>>>>> particular data file within the input dataset is to be used to
>>>>> determine whether the code to process that dataset subsection gets
>>>>> invoked or not:
>>>>> if (exists("extra.data")) {
>>>>>   DataFile extraInput<"extra.data">;
>>>>>   extraResult = analyze(extraInput);
>>>>> }
>>>>> The above is my assumption based on a phone call. We can and
>>>>> should verify the assumption with a simple example.
>>>>> I also thought we can try this today by seeing if extraInput can be
>>>>> an array, mapped to zero items if nothing to do and 1 item if
>>>>> something to do. That would at least let us test the use case.
>>>>> John, can you verify if the example Swift lines above are what you
>>>>> are looking for here?
>>>>> - Mike
>>>>> ----- Original Message -----
>>>>>> On Fri, 2011-04-01 at 10:51 -0500, Michael Wilde wrote:
>>>>>>> - we should first verify that exists() will solve the NCAR need in a
>>>>>>> sufficiently clean way
>>>>>> I think this is important. Can we get a description of the problem
>>>>>> instead of a (otherwise) random proposal for a solution?
>>>>> --
>>>>> Michael Wilde
>>>>> Computation Institute, University of Chicago
>>>>> Mathematics and Computer Science Division
>>>>> Argonne National Laboratory
>>>> _______________________________________________
>>>> Swift-devel mailing list
>>>> Swift-devel at ci.uchicago.edu
>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>
>>>
>>
>> --
>> Justin M Wozniak
>

From bugzilla-daemon at mcs.anl.gov Wed Apr 6 11:52:37 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Wed, 6 Apr 2011 11:52:37 -0500 (CDT)
Subject: [Swift-devel] [Bug 329] New: Improve tutorial content and user flow for accessing remote sites.
Message-ID:

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=329

Summary: Improve tutorial content and user flow for accessing remote sites.
Product: Swift
Version: 0.93
Platform: PC
OS/Version: Mac OS
Status: ASSIGNED
Severity: normal
Priority: P1
Component: Documentation
AssignedTo: dk0966 at cs.ship.edu
ReportedBy: wilde at mcs.anl.gov

By "user flow" I mean how we connect the user from "I want to do X" to the tutorial material that shows the user how to do it.

This is from an OSG Fermilab user:

On Mon, Feb 28, 2011 at 11:30:57AM -0600, Dave Dykstra wrote:
> Here's another status report on this. I was only able to get a bit
> further on the tutorial, and since then all my ExTENCI time has been
> redirected to the Wide Area Lustre subproject. I need to give a talk
> on that next week at the OSG All Hands Meeting. The week after that
> I should be able to get back to swift.

Mike,

I was still tied up with Lustre until this week. I am continuing on the swift tutorial this week and got past the bugs in it and got through section 4.2, "Running on a remote site". That section is so vague as to be barely of much use, but I did manage to get swift to run a job on an OSG site (fnal) and just succeeded also on a Teragrid site (queenbee). I was able to do that with my DOE certificate after finding these helpful web pages

https://www.teragrid.org/web/user-support/sso_nontgca
http://info.teragrid.org/restdemo/html/tg/services/gram5

I didn't even realize before that Teragrid sites support basically the same globus interface as OSG sites. That shows how little I knew about Teragrid.

So I'm not very far from the end of the tutorial now.

- Dave

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the reporter.
From skenny at uchicago.edu Wed Apr 6 12:12:54 2011
From: skenny at uchicago.edu (Sarah Kenny)
Date: Wed, 6 Apr 2011 10:12:54 -0700
Subject: [Swift-devel] svn commit email
Message-ID:

hey all, my svn commits at the moment are generating an email saying my message (which should just be a post-commit email) is awaiting moderator approval...anyone else getting this?

~sk

From jon.monette at gmail.com Wed Apr 6 12:15:45 2011
From: jon.monette at gmail.com (Jonathan Monette)
Date: Wed, 6 Apr 2011 12:15:45 -0500
Subject: [Swift-devel] svn commit email
In-Reply-To:
References:
Message-ID:

Yea. I have gotten them. But it seems for me only if I try to commit to trunk. I can commit to SwiftApps without an email.

On Apr 6, 2011 12:13 PM, "Sarah Kenny" wrote:
> hey all, my svn commits at the moment are generating an email saying my
> message (which should just be a post-commit email) is awaiting moderator
> approval...anyone else getting this?
>
> ~sk

From jon.monette at gmail.com Wed Apr 6 12:31:31 2011
From: jon.monette at gmail.com (Jonathan Monette)
Date: Wed, 6 Apr 2011 12:31:31 -0500
Subject: [Swift-devel] Re: Can you test 0.92 branch for 0.92.1 release?
In-Reply-To:
References: <417068539.67412.1302034002663.JavaMail.root@zimbra.anl.gov>
Message-ID:

Ok. I found the app. It is a wrapper script I have that just makes sure the app I call returns exit code 0 and not some other exit code. Some of the apps run and complete but not all of them. I can only assume it is still returning an error code, so I have to track this down. One thing that should be changed: when error 254 occurs, the message should specify the name of the app that failed (or the job, or something). This will at least help track down why and where.

On Tue, Apr 5, 2011 at 3:14 PM, Jonathan Monette wrote:
> Yes. I will certainly do that. And those are the usual suspects that I
> have seen for error 254, but the app I believe is failing does not have
> any of those properties. I am re-running the script with some changes
> that will hopefully shed more light on where it fails. PADS is in
> maintenance mode. There are several jobs in the queue and it looks like
> none are even running.
>
> On Tue, Apr 5, 2011 at 3:06 PM, Michael Wilde wrote:
>
>> Jon,
>>
>> PBS Error 254 may be something like app in tc.data is not executable, or
>> app script calls something not found or not executable, or that makes it
>> return non-zero. It falls in that class of error that I just railed about in
>> Bug 321: https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=321
>>
>> It's not clear to me that the same root problem manifests in exactly the
>> same error codes and messages under various providers and configurations,
>> which is another problem that the fix(es) to Bug 321 should deal with.
>>
>> When you fix your 254, could you report back to swift-devel what it was,
>> and either file as a new bug or update Bug 321?
>>
>> - Mike
>>
>> ----- Original Message -----
>> > Correct. Based off how I was looping I was receiving the same cache
>> > error that Allan was receiving. Also, I never thought of this but my
>> > Montage scripts were running very slowly in the trunk at some point (I
>> > am assuming this was the point that the twice each bug was introduced
>> > and everything was being done twice). Under the 0.92 branch my small
>> > workflows complete. My large workflows error out with PBS error 254 I
>> > believe. Cannot remember the error code but believe it was this one.
>> > But this is not due to the twice each bug.
>> >
>> > On Tue, Apr 5, 2011 at 2:50 PM, Michael Wilde < wilde at mcs.anl.gov > wrote:
>> >
>> > Just to clarify: we detected this bug by diagnosing the error that
>> > Allan was getting in his SCEC workflow, trying to add a file to a
>> > local cache that was already there.
>> >
>> > I never verified if the same bug was causing failures in Montage, but
>> > Jon reported Apr 4 12:04 AM that the small Montage was working under
>> > the fixed 0.92 branch and that the large Montage run was still to be
>> > tested.
>> >
>> > - Mike
>> >
>> > ----- Original Message -----
>> > > got it thanks...to be clear i wasn't going to try to run the whole
>> > > montage script :P but this is easier than extracting the faulty
>> > > loop :)
>> > >
>> > > On Tue, Apr 5, 2011 at 12:37 PM, Jonathan Monette < jon.monette at gmail.com > wrote:
>> > >
>> > > Yes. That is the one I remember seeing. That is much easier than
>> > > what my Montage scripts are doing.
>> > >
>> > > On Tue, Apr 5, 2011 at 2:33 PM, Michael Wilde < wilde at mcs.anl.gov > wrote:
>> > >
>> > > Yes, I had posted variations of the following to the list:
>> > >
>> > > zz3.swift:
>> > >
>> > > int arr[];
>> > >
>> > > arr[0]=1;
>> > > arr[1]=2;
>> > >
>> > > foreach a in arr {
>> > >   trace("for", a);
>> > > }
>> > >
>> > > zz6.swift:
>> > >
>> > > int arr[];
>> > >
>> > > foreach a,i in [0:9] {
>> > >   arr[i] = i;
>> > > }
>> > >
>> > > trace("arr",arr);
>> > >
>> > > foreach a,i in arr {
>> > >   trace("for", a,i);
>> > > }
>> > >
>> > > com$ PATH=/home/wilde/swift/src/0.92/cog/modules/swift/dist/swift-svn/bin/:$PATH
>> > > com$ which swift
>> > > ~/swift/src/0.92/cog/modules/swift/dist/swift-svn/bin/swift
>> > > com$ cd swift/lab
>> > > com$ swift zz3.swift
>> > > Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
>> > >
>> > > RunID: 20110404-1344-j98f22id
>> > > Progress:
>> > > SwiftScript trace: for, 2
>> > > SwiftScript trace: for, 1
>> > > Final status:
>> > > com$ PATH=~/swift/rev/swift-0.92/bin:$PATH
>> > > com$ swift zz3.swift
>> > > Swift svn swift-r4157 cog-r3056
>> > >
>> > > RunID: 20110404-1344-ensm4te8
>> > > Progress:
>> > > SwiftScript trace: for, 1
>> > > SwiftScript trace: for, 2
>> > > SwiftScript trace: for, 2
>> > > SwiftScript trace: for, 1
>> > > Final status:
>> > > com$ swift zz6.swift
>> > > Swift svn swift-r4157 cog-r3056
>> > >
>> > > RunID: 20110404-1344-i7y6q1i1
>> > > Progress:
>> > > SwiftScript trace: arr, arr.$[]/10
>> > > SwiftScript trace: for, 3, 3
>> > > SwiftScript trace: for, 2, 2
>> > > SwiftScript trace: for, 4, 4
>> > > SwiftScript trace: for, 5, 5
>> > > SwiftScript trace: for, 3, 3
>> > > SwiftScript trace: for, 5, 5
>> > > SwiftScript trace: for, 9, 9
>> > > SwiftScript trace: for, 4, 4
>> > > SwiftScript trace: for, 1, 1
>> > > SwiftScript trace: for, 7, 7
>> > > SwiftScript trace: for, 7, 7
>> > > SwiftScript trace: for, 6, 6
>> > > SwiftScript trace: for, 9, 9
>> > > SwiftScript trace: for, 6, 6
>> > > SwiftScript trace: for, 1, 1
>> > > SwiftScript trace: for, 2, 2
>> > > SwiftScript trace: for, 0, 0
>> > > SwiftScript trace: for, 8, 8
>> > > SwiftScript trace: for, 0, 0
>> > > SwiftScript trace: for, 8, 8
>> > > Final status:
>> > > com$
>> > >
>> > > ----- Original Message -----
>> > > > The script I posted might be too complex to use to replicate the
>> > > > twice each bug. However, didn't Mike post a simple loop script that
>> > > > was looping twice when the bug was initially found?
>> > > >
>> > > > On Tue, Apr 5, 2011 at 2:17 PM, Ketan Maheshwari < ketancmaheshwari at gmail.com > wrote:
>> > > >
>> > > > Sarah,
>> > > >
>> > > > I do not have the test you are asking for yet. I am looking at the
>> > > > test suite and will start on Beagle soon.
>> > > >
>> > > > Ketan
>> > > >
>> > > > On Apr 5, 2011, at 2:13 PM, Sarah Kenny wrote:
>> > > >
>> > > > i'm currently working on a swift script to replicate the bug for
>> > > > .92 which i will then commit to svn in the test suite. if you, mike
>> > > > or ketan, already have this let me know (i'm trying to hack the
>> > > > script jon posted to the list) and i'll use yours...david said he
>> > > > doesn't have one.
>> > > >
>> > > > as i said, my plan was to test on ranger, abe and a couple of
>> > > > (uci) local workstations.
>> > > >
>> > > > ~sk
>> > > >
>> > > > On Tue, Apr 5, 2011 at 12:10 PM, Michael Wilde < wilde at mcs.anl.gov > wrote:
>> > > >
>> > > > David, Sarah, Ketan,
>> > > >
>> > > > Can you all report back to the devel list on your progress on
>> > > > testing the release? I.e., what systems are you testing, and which
>> > > > of those tests are complete? When will the rest be done, and hence
>> > > > when are we ready to tag and release the fix?
>> > > >
>> > > > I asked who will create the test to confirm that the twice-each
>> > > > bug is fixed, but no one responded. Which of the three of you feel
>> > > > you know how to do this? Is this being tested in your new tests?
>> > > >
>> > > > Ketan tells me that in the 0.92+ interim release I made for Beagle
>> > > > it looks like the resume feature is not working. I was aware that
>> > > > such a bug was reported in trunk, but in the original 0.92 Cray
>> > > > version (under /home/wilde/swift/rev) resume *was* working. Does
>> > > > the test suite test the resume feature at the moment?
>> > > >
>> > > > Lastly, who will tag and upload the new release, remove or change
>> > > > the red warning in the download page, and announce 0.92.1 on
>> > > > swift-user?
>> > > >
>> > > > - Mike
>> > > >
>> > > > ----- Original Message -----
>> > > > > Thanks, David. Please cc all discussion of this sort to
>> > > > > swift-devel.
>> > > > >
>> > > > > I assume SVN is working for you now? (It was working for me,
>> > > > > from communicado, around 9AM this morning).
>> > > > >
>> > > > > - Mike
>> > > > >
>> > > > > ----- Original Message -----
>> > > > > > It appears that there may be a problem with
>> > > > > > svn.ci.uchicago.edu. I am unable to connect from an SVN client
>> > > > > > or through the web interface - both attempts just hang
>> > > > > > indefinitely. I have sent an email to support (ticket 12539),
>> > > > > > but just wanted to give you guys a heads up that there may be
>> > > > > > an issue there. I will try to run the tests again in the
>> > > > > > morning.
>> > > > > >
>> > > > > > David
>> > > > > >
>> > > > > > On Mon, Apr 4, 2011 at 2:42 PM, Michael Wilde < wilde at mcs.anl.gov > wrote:
>> > > > > >
>> > > > > > David, Sarah,
>> > > > > >
>> > > > > > How quickly could you re-divide the Swift site test plan
>> > > > > > between you and confirm back to swift-devel that we are ready
>> > > > > > to tag and release the branch as 0.92.1?
>> > > > > >
>> > > > > > Before we do that, you need to add a test to the test suite
>> > > > > > that can replicate the twice-each bug and verify that it's
>> > > > > > detected in 0.92 and corrected in 0.92.1.
>> > > > > >
>> > > > > > Can you possibly do this by noon tomorrow?
>> > > > > >
>> > > > > > Can you post a checklist of tests with names of who's going to
>> > > > > > run them?
>> > > > > >
>> > > > > > Depending on what you can commit to, I will see if I, Ketan,
>> > > > > > and/or Justin can help take various sites as well. I feel we
>> > > > > > really need to do this quickly so we have a stable trusted
>> > > > > > release out there.
>> > > > > >
>> > > > > > Thanks,
>> > > > > >
>> > > > > > Mike

--
Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction.
- Albert Einstein
From ketancmaheshwari at gmail.com Wed Apr 6 12:51:22 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Wed, 6 Apr 2011 12:51:22 -0500
Subject: [Swift-devel] svn commit email
In-Reply-To:
References:
Message-ID: <161BBF70-9AD5-45B9-A6FE-FB7E4789A2C5@gmail.com>

Me too (on vdl2 repo).

--ketan

On Apr 6, 2011, at 12:12 PM, Sarah Kenny wrote:
> hey all, my svn commits at the moment are generating an email saying my message (which should just be a post-commit email) is awaiting moderator approval...anyone else getting this?
>
> ~sk
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

From ketancmaheshwari at gmail.com Wed Apr 6 13:05:37 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Wed, 6 Apr 2011 13:05:37 -0500
Subject: [Swift-devel] Re: Attachment too large: Fwd: Swift-devel post from ketan@mcs.anl.gov requires approval
In-Reply-To: <2083793928.71160.1302112893032.JavaMail.root@zimbra.anl.gov>
References: <2083793928.71160.1302112893032.JavaMail.root@zimbra.anl.gov>
Message-ID:

Ohh, sorry about this, I did not realize the size of the log. Here is the link:

http://www.mcs.anl.gov/~ketan/files/ftdock-20110401-1627-xd2zm525.log

Ketan

On Apr 6, 2011, at 1:01 PM, Michael Wilde wrote:
> Ketan, you posted an 86MB attachment to swift-devel and it's pending approval.
>
> Can you resend, with a link to a URL instead?
(e.g. from your public_html folder)
>
> - Mike
>
> ----- Forwarded Message -----
> From: swift-devel-owner at ci.uchicago.edu
> To: swift-devel-owner at ci.uchicago.edu
> Sent: Wednesday, April 6, 2011 12:00:03 PM
> Subject: Swift-devel post from ketan at mcs.anl.gov requires approval
>
> As list administrator, your authorization is requested for the
> following mailing list posting:
>
> List: Swift-devel at ci.uchicago.edu
> From: ketan at mcs.anl.gov
> Subject: Plotting From Swift Log
> Reason: Message body is too big: 89541588 bytes with a limit of 20000 KB
>
> At your convenience, visit:
>
> http://mail.ci.uchicago.edu/mailman/admindb/swift-devel
>
> to approve or deny the request.
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory

From wilde at mcs.anl.gov Wed Apr 6 13:16:31 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 6 Apr 2011 13:16:31 -0500 (CDT)
Subject: Swift commit messages getting sent to moderator - was Re: [Swift-devel] svn commit email
In-Reply-To:
Message-ID: <779218967.71257.1302113791047.JavaMail.root@zimbra.anl.gov>

Hi CI Support,

Commit emails for the vdl2 repository started getting flagged for approvals a few days ago. They had been working fine till then.

I looked at the swift-commit list settings at that time and saw almost no one subscribed, which I thought was strange. Then a few days later, it seems like most of the group was subscribed. I'm therefore not sure if I misread what I saw, or it changed somehow (manually or via some restore???) or if I looked at a different list.

Did anything change in either the swift-commit list settings or in the linkage from svn to the commit email, in the past week, that would explain problems?

Is there anything special in how the commit email list should be set up?

Thanks,

Mike

----- Original Message -----
hey all, my svn commits at the moment are generating an email saying my message (which should just be a post-commit email) is awaiting moderator approval...anyone else getting this?

~sk

_______________________________________________
Swift-devel mailing list
Swift-devel at ci.uchicago.edu
http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wilde at mcs.anl.gov Wed Apr 6 13:19:59 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 6 Apr 2011 13:19:59 -0500 (CDT)
Subject: [Swift-devel] Fwd: [CI Ticketing System #12700] AutoReply: Please replicate communicado OSG+Condor-G install to bridled
In-Reply-To:
Message-ID: <664238149.71281.1302113999812.JavaMail.root@zimbra.anl.gov>

fyi...

Hi CI Team,

The OSG install and in particular the Condor-G setup on communicado is working well. Can you replicate that install on bridled so that we have an alternate submit host when one is down?

Thanks,

- Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wilde at mcs.anl.gov Wed Apr 6 13:24:57 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 6 Apr 2011 13:24:57 -0500 (CDT)
Subject: [CI Ticketing System #12698]: Swift commit messages getting sent to moderator - was Re: [Swift-devel] svn commit email
In-Reply-To:
Message-ID: <1846194413.71308.1302114297557.JavaMail.root@zimbra.anl.gov>

Swift Devels,

Here's the CI ticket # for this issue. I'm going on the assumption that something systems-wise changed between svn and the swift-commit list to cause the routing to moderator. Maybe it's as simple as the list membership changing somehow. Did anyone do anything last week that would have resulted in mass removal of list members?

CI Team: I'm assuming that everyone that wants commit messages needs to be subscribed, correct? I.e., there's no other magic message routing involved here, right?

- Mike

---

Hi CI Support,

Commit emails for the vdl2 repository started getting flagged for approvals a few days ago. They had been working fine till then.

I looked at the swift-commit list settings at that time and saw almost no one subscribed, which I thought was strange. Then a few days later, it seems like most of the group was subscribed. I'm therefore not sure if I misread what I saw, or it changed somehow (manually or via some restore???) or if I looked at a different list.

Did anything change in either the swift-commit list settings or in the linkage from svn to the commit email, in the past week, that would explain problems?

Is there anything special in how the commit email list should be set up?

Thanks,

Mike

----- Original Message -----
hey all, my svn commits at the moment are generating an email saying my message (which should just be a post-commit email) is awaiting moderator approval...anyone else getting this?
~sk

_______________________________________________
Swift-devel mailing list
Swift-devel at ci.uchicago.edu
http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wilde at mcs.anl.gov Wed Apr 6 13:34:12 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 6 Apr 2011 13:34:12 -0500 (CDT)
Subject: [Swift-devel] Developer meeting at 4PM CDT today
In-Reply-To: <473221833.71388.1302114751296.JavaMail.root@zimbra.anl.gov>
Message-ID: <1590231453.71411.1302114852761.JavaMail.root@zimbra.anl.gov>

Let's review:

- 0.92.1 and 0.93 plans
- bugzilla routing and ticket walkthrough
- swift-devel monitoring
- asciidoc

Release planner:

https://sites.google.com/site/swiftdevel/release-plans

Let's keep it to 30 mins.

- Mike

From jon.monette at gmail.com Wed Apr 6 15:15:13 2011
From: jon.monette at gmail.com (Jonathan Monette)
Date: Wed, 6 Apr 2011 15:15:13 -0500
Subject: [Swift-devel] Error code 254
Message-ID:

This question carries over from "[Swift-devel] Re: Can you test 0.92 branch for 0.92.1 release?"

Could an error code 254 be returned if a file that is required for stageout does not exist?

--
Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction.
- Albert Einstein
From wozniak at mcs.anl.gov Wed Apr 6 15:56:58 2011
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Wed, 6 Apr 2011 15:56:58 -0500 (CDT)
Subject: [Swift-devel] Re: Plotting From Swift Log
In-Reply-To: <3875827E-46C7-48C5-AA43-DA6A0A476BE1@mcs.anl.gov>
References: <3875827E-46C7-48C5-AA43-DA6A0A476BE1@mcs.anl.gov>
Message-ID:

I placed a first attempt in ~wozniak/load.eps .

On Wed, 6 Apr 2011, Ketan Maheshwari wrote:
> Justin,
>
> Per our talk today, please find attached logs for a 10K task Swift run
> of modFTdock on Beagle. Let's see if it is possible to get plottable
> info from this log file.
>
> Ketan

--
Justin M Wozniak

From bugzilla-daemon at mcs.anl.gov Wed Apr 6 16:08:00 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Wed, 6 Apr 2011 16:08:00 -0500 (CDT)
Subject: [Swift-devel] [Bug 239] Java 1.5 compatibility issue - @Override
In-Reply-To:
References:
Message-ID: <20110406210800.7A22D2DC68@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=239

Justin Wozniak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |wozniak at mcs.anl.gov
         Resolution|                            |FIXED

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

From ketancmaheshwari at gmail.com Wed Apr 6 18:44:49 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Wed, 6 Apr 2011 18:44:49 -0500
Subject: [Swift-devel] Added asciidoc cookbook to svn
Message-ID: <9384C8A9-3DF6-4488-9917-6222FB30EBA6@gmail.com>

Hi,

Added asciidoc cookbook to svn at:

https://svn.ci.uchicago.edu/svn/vdl2/www/cookbook/

Feel free to add, update.

An asciidoc cheatsheet for quick reference can be found here:

http://powerman.name/doc/asciidoc

Ketan

From bugzilla-daemon at mcs.anl.gov Wed Apr 6 19:26:56 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Wed, 6 Apr 2011 19:26:56 -0500 (CDT)
Subject: [Swift-devel] [Bug 331] New: Add basic regression tests for the twice-each bug
Message-ID:

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=331

Summary: Add basic regression tests for the twice-each bug
Product: Swift
Version: 0.93
Platform: PC
OS/Version: Mac OS
Status: ASSIGNED
Severity: normal
Priority: P1
Component: SwiftScript language
AssignedTo: dk0966 at cs.ship.edu
ReportedBy: wilde at mcs.anl.gov

Tests were added for the symptom of this bug that caused the mapper problem - the bug's original symptom.

We should also add simple regression tests that test the most basic aspect of this bug directly: foreach iterations being done more than once.

Here are a few of the original tests. These could be packaged under language. We should stress foreach() in various ways, with input arrays created in various ways.

David: this is a good exercise for adding new tests to the test suite.
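
A direct check for "foreach iterations being done more than once" could be sketched roughly as follows. This is only a sketch in Python, not a script from the Swift test suite: the function name repeated_iterations is made up for illustration, and it assumes every iteration traces a distinct value (true for zz3.swift and zz6.swift), so any repeated trace line means an iteration ran more than once. The sample text follows the "SwiftScript trace: for, ..." lines in the runs quoted in this report.

```python
import re
from collections import Counter

# Matches lines such as "SwiftScript trace: for, 1" or "SwiftScript trace: for, 3, 3".
TRACE_RE = re.compile(r"^SwiftScript trace: for, (.+)$")


def repeated_iterations(swift_output):
    """Return {trace_args: count} for every foreach trace printed more than once.

    Hypothetical helper: assumes each loop iteration traces a distinct value.
    """
    counts = Counter()
    for line in swift_output.splitlines():
        match = TRACE_RE.match(line.strip())
        if match:
            counts[match.group(1).strip()] += 1
    return {args: n for args, n in counts.items() if n > 1}


# Trace output in the style of the zz3.swift run that shows the bug.
BUGGY_RUN = """\
Progress:
SwiftScript trace: for, 1
SwiftScript trace: for, 2
SwiftScript trace: for, 1
SwiftScript trace: for, 2
Final status:
"""

# Trace output in the style of the zz3.swift run on the fixed branch.
FIXED_RUN = """\
Progress:
SwiftScript trace: for, 2
SwiftScript trace: for, 1
Final status:
"""

if __name__ == "__main__":
    print("buggy run repeats:", repeated_iterations(BUGGY_RUN))
    print("fixed run repeats:", repeated_iterations(FIXED_RUN))
```

A test built this way would pass when the returned dictionary is empty and fail otherwise. Because iteration order differs between runs, the check counts occurrences rather than comparing against a fixed expected output.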
---

com$ swift ~/swift/lab/zz3.swift
Swift svn swift-r4157 cog-r3056

RunID: 20110406-1915-z3zvkba8
Progress:
SwiftScript trace: for, 1
SwiftScript trace: for, 2
SwiftScript trace: for, 1
SwiftScript trace: for, 2
Final status:
com$ more foreachsite
com$ PATH=/home/wilde/swift/src/0.92/cog/modules/swift/dist/swift-svn/bin/:$PATH
com$ swift ~/swift/lab/zz3.swift
Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)

RunID: 20110406-1916-u4mstvs2
Progress:
SwiftScript trace: for, 2
SwiftScript trace: for, 1
Final status:
com$ cat ~/swift/lab/zz3.swift
int arr[];

arr[0]=1;
arr[1]=2;

foreach a in arr {
  trace("for", a);
}
com$ cat ~/swift/lab/zz6.swift

int arr[];

foreach a,i in [0:9] {
  arr[i] = i;
}

trace("arr",arr);

foreach a,i in arr {
  trace("for", a,i);
}
com$

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the reporter.

From benc at hawaga.org.uk Thu Apr 7 10:34:39 2011
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 7 Apr 2011 15:34:39 +0000 (GMT)
Subject: [Swift-devel] [provenance-challenge] W3C Provenance Working Group - call for participation (fwd)
Message-ID:

this might be of interest to people here who are still actively interested in provenance work and in the delights of w3c bureaucracy.

---------- Forwarded message ----------
Date: Thu, 07 Apr 2011 14:58:39 +0200
From: Paul Groth
Reply-To: provenance-challenge at ipaw.info
To: provenance-challenge at ipaw.info
Subject: [provenance-challenge] W3C Provenance Working Group - call for participation

Hello All:

One of the central ideas behind the Provenance Challenge series was to understand the commonalities between provenance systems and work towards interoperability between them. Many people in this community contributed to the W3C Provenance Incubator Group[1] and the work of this community influenced that group greatly.
Early last year, the W3C Incubator Group concluded its work developing a road-map for provenance on the Web, and recommended the creation of a Working Group to define a standard for the interchange of provenance information. Last week, the W3C management team approved the formation of the Provenance Working Group [2]. The aim of the group is to support the widespread publication and use of provenance on the Web through the creation of a simple, extensible language to exchange provenance information.

Given the expertise within this community and more importantly the spirit of cooperation that you show, we would be grateful if you would consider participating in the Working Group. Thus, we encourage you to join the Provenance Working Group and help define this critical language for the Web.

More details on joining can be found at [4]. If you intend on contributing, please help us select a teleconference time and first face-to-face meeting date [5].

If you have any questions don't hesitate to ask.

regards,
Luc Moreau
Paul Groth
co-chairs W3C Working Group on Provenance
team-prov-chairs at w3.org

[1] Provenance Incubator Group - http://www.w3.org/2005/Incubator/prov/wiki/W3C_Provenance_Incubator_Group_Wiki
[2] Provenance Working Group - http://www.w3.org/2011/prov/wiki/Main_Page
[3] Charter - http://www.w3.org/2011/01/prov-wg-charter
[4] Joining - http://www.w3.org/2011/prov/wiki/How_to_Join
[5] Poll on Telecon & F2F - http://www.w3.org/2002/09/wbs/1/prov-sched-1/

From skenny at uchicago.edu Thu Apr 7 16:59:44 2011
From: skenny at uchicago.edu (Sarah Kenny)
Date: Thu, 7 Apr 2011 14:59:44 -0700
Subject: [Swift-devel] trying to build trunk
Message-ID:

BUILD FAILED
/autonfs/home/skenny/soft/cog/modules/swift/build.xml:73: The following error occurred while executing this line:
java.io.FileNotFoundException: /autonfs/home/skenny/soft/cog/mbuild.xml (No such file or directory)

am i missing something obvious here?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From skenny at uchicago.edu Thu Apr 7 17:05:12 2011
From: skenny at uchicago.edu (Sarah Kenny)
Date: Thu, 7 Apr 2011 15:05:12 -0700
Subject: [Swift-devel] Re: trying to build trunk
In-Reply-To:
References:
Message-ID:

sorry my bad...faulty checkout :P

On Thu, Apr 7, 2011 at 2:59 PM, Sarah Kenny wrote:
> BUILD FAILED
> /autonfs/home/skenny/soft/cog/modules/swift/build.xml:73: The following
> error occurred while executing this line:
> java.io.FileNotFoundException: /autonfs/home/skenny/soft/cog/mbuild.xml (No
> such file or directory)
>
> am i missing something obvious here?
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From wilde at mcs.anl.gov Thu Apr 7 17:06:55 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 7 Apr 2011 17:06:55 -0500 (CDT)
Subject: [Swift-devel] trying to build trunk
In-Reply-To:
Message-ID: <98515176.80274.1302214015152.JavaMail.root@zimbra.anl.gov>

Sarah,

Do you have cog correctly checked out, under the right dir relative to your swift/ dir? It should be in /home/skenny/soft/cog.

In my trunk I see this:

com$ cd swift/src/trunk
com$ ls
cog/
com$ cd cog
com$ ls
BUGS.txt LICENSE.txt VERSION build.properties etc/ man/ modules/ qualitycontrol/ webstart/
CHANGES.txt README.txt bin/ build.xml lib/ mbuild.xml pmd/ tools/ webstart.properties
com$

Does your cog/ dir not have an mbuild.xml?

- Mike

----- Original Message -----
BUILD FAILED
/autonfs/home/skenny/soft/cog/modules/swift/build.xml:73: The following error occurred while executing this line:
java.io.FileNotFoundException: /autonfs/home/skenny/soft/cog/mbuild.xml (No such file or directory)

am i missing something obvious here?
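The check suggested above, whether the cog/ checkout actually contains the files the build needs, could be scripted before the build is run. This is a minimal sketch only: the helper name `missing_build_files` is hypothetical, and the two relative paths are simply the ones that appear in the error message quoted in this thread, not a complete list of what the build requires.

```python
from pathlib import Path

# Files named in the build error quoted in this thread, relative to the
# root of the cog/ checkout. This is an illustrative list, not a full one.
REQUIRED_FILES = ("mbuild.xml", "modules/swift/build.xml")


def missing_build_files(cog_dir):
    """Return the required build files that are absent under cog_dir."""
    root = Path(cog_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Calling `missing_build_files("/home/skenny/soft/cog")` on a faulty checkout like the one described here would be expected to list `mbuild.xml`; an empty list means both files are present.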
_______________________________________________ Swift-devel mailing list Swift-devel at ci.uchicago.edu http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From skenny at uchicago.edu Thu Apr 7 17:35:00 2011 From: skenny at uchicago.edu (Sarah Kenny) Date: Thu, 7 Apr 2011 15:35:00 -0700 Subject: [Swift-devel] where to put sites.xml checker Message-ID: hi all, i have a script to call for checking the validity of the user's sites.xml file...i'm not sure where the best place is to call it from. any ideas on the best spot to drop it in? initially we'd discussed the swift script itself but i'm thinking it should probably be where the commandline is parsed (?) -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at mcs.anl.gov Thu Apr 7 18:24:47 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 7 Apr 2011 18:24:47 -0500 (CDT) Subject: [Swift-devel] where to put sites.xml checker In-Reply-To: Message-ID: <1124918386.80616.1302218687642.JavaMail.root@zimbra.anl.gov> Sarah, that sounds very reasonable. After command parsing you know where the sites file is. I think anywhere between there and before the Java app is launched. - Mike ----- Original Message ----- > hi all, i have a script to call for checking the validity of the > user's sites.xml file...i'm not sure where the best place is to call > it from. any ideas on the best spot to drop it in? initially we'd > discussed the swift script itself but i'm thinking it should probably > be where the commandline is parsed (?) 
> > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wilde at mcs.anl.gov Fri Apr 8 08:45:12 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 8 Apr 2011 08:45:12 -0500 (CDT) Subject: [Swift-devel] Re: Fwd: [Dsl-seminar] Bloom/Bud released In-Reply-To: <20110408083815.ANC27026@mstore03.uchicago.edu> Message-ID: <875258362.81854.1302270312947.JavaMail.root@zimbra.anl.gov> Another data flow language to look at. Swift Team: also some nice layout inspiration and possibly some good examples of language "try it" mechanisms we can leverage in Swift. - Mike ----- Forwarded Message ----- From: "Tim Armstrong" To: dsl-seminar at mailman.cs.uchicago.edu Sent: Friday, April 8, 2011 8:38:15 AM Subject: [Dsl-seminar] Bloom/Bud released http://www.bloom-lang.net/ _______________________________________________ DSL-seminar mailing list DSL-seminar at mailman.cs.uchicago.edu https://mailman.cs.uchicago.edu/mailman/listinfo/dsl-seminar -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From support at ci.uchicago.edu Fri Apr 8 10:10:58 2011 From: support at ci.uchicago.edu (David Forero) Date: Fri, 8 Apr 2011 10:10:58 -0500 Subject: [CI Ticketing System #12698] Swift commit messages getting sent to moderator - was Re: [Swift-devel] svn commit email In-Reply-To: <779218967.71257.1302113791047.JavaMail.root@zimbra.anl.gov> References: <779218967.71257.1302113791047.JavaMail.root@zimbra.anl.gov> Message-ID: On Wed Apr 06 13:16:32 2011, wilde at mcs.anl.gov wrote: > Commit emails for the vdl2 repository started getting flagged for > approvals a few days ago. They had been working fine till then. 
> > > I looked at the swift-commit list settings at that time and saw almost > no one subscribed, which I thought was strange. Then a few days > later, seems like most of the group was subscribed. Im therefore > not sure if I misread what I saw, or it changed somehow (manually > or via some restore???) or if I looked at a different list. > > > Did anything change in either the swift-commit list settings or in the > linkage from svn to the commit email, in the past week, that would > explain problems? > > > Is there anything special in how the commit email list should be set > up? Here's who's subscribed to swift commit: foster at mcs.anl.gov hategan at mcs.anl.gov iraicu at cs.uchicago.edu jon.monette at gmail.com ketan at mcs.anl.gov noreply at svn.ci.uchicago.edu skenny at uchicago.edu swift-commit at globus.org tim.g.armstrong at gmail.com wilde at mcs.anl.gov wozniak at mcs.anl.gov yongzh at cs.uchicago.edu As far as I know there have been no changes to this list. Does it say why the postings are being held? -- David Forero System Administrator Computation Institute University of Chicago 773-834-4102 From skenny at uchicago.edu Fri Apr 8 11:34:39 2011 From: skenny at uchicago.edu (Sarah Kenny) Date: Fri, 8 Apr 2011 09:34:39 -0700 Subject: [CI Ticketing System #12698] Swift commit messages getting sent to moderator - was Re: [Swift-devel] svn commit email In-Reply-To: References: <779218967.71257.1302113791047.JavaMail.root@zimbra.anl.gov> Message-ID: i got another yesterday here's the content: Your mail to 'Swift-commit' with the subject r4307 - trunk/bin Is being held until the list moderator can review it for approval. The reason it is being held: Post by non-member to a members-only list Either the message will get posted to the list, or you will receive notification of the moderator's decision. 
If you would like to cancel this posting, please visit the following URL: http://mail.ci.uchicago.edu/mailman/confirm/swift-commit/f35b402588db8e0fb5b59f561fa3dd3a640c3406 On Fri, Apr 8, 2011 at 8:10 AM, David Forero wrote: > On Wed Apr 06 13:16:32 2011, wilde at mcs.anl.gov wrote: > > Commit emails for the vdl2 repository started getting flagged for > > approvals a few days ago. They had been working fine till then. > > > > > > I looked at the swift-commit list settings at that time and saw almost > > no one subscribed, which I thought was strange. Then a few days > > later, seems like most of the group was subscribed. Im therefore > > not sure if I misread what I saw, or it changed somehow (manually > > or via some restore???) or if I looked at a different list. > > > > > > Did anything change in either the swift-commit list settings or in the > > linkage from svn to the commit email, in the past week, that would > > explain problems? > > > > > > Is there anything special in how the commit email list should be set > > up? > > Here's who's subscribed to swift commit: > > foster at mcs.anl.gov > > hategan at mcs.anl.gov > > iraicu at cs.uchicago.edu > > jon.monette at gmail.com > > ketan at mcs.anl.gov > > noreply at svn.ci.uchicago.edu > > skenny at uchicago.edu > > swift-commit at globus.org > > tim.g.armstrong at gmail.com > > wilde at mcs.anl.gov > > wozniak at mcs.anl.gov > > yongzh at cs.uchicago.edu > > As far as I know there have been no changes to this list. Does it say why > the > postings are being held? > > > -- > > David Forero > System Administrator > Computation Institute > University of Chicago > 773-834-4102 > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From support at ci.uchicago.edu Fri Apr 8 11:34:49 2011 From: support at ci.uchicago.edu (skenny) Date: Fri, 8 Apr 2011 11:34:49 -0500 Subject: [CI Ticketing System #12698] Swift commit messages getting sent to moderator - was Re: [Swift-devel] svn commit email In-Reply-To: References: <779218967.71257.1302113791047.JavaMail.root@zimbra.anl.gov> Message-ID: i got another yesterday here's the content: Your mail to 'Swift-commit' with the subject r4307 - trunk/bin Is being held until the list moderator can review it for approval. The reason it is being held: Post by non-member to a members-only list Either the message will get posted to the list, or you will receive notification of the moderator's decision. If you would like to cancel this posting, please visit the following URL: http://mail.ci.uchicago.edu/mailman/confirm/swift-commit/f35b402588db8e0fb5b59f561fa3dd3a640c3406 On Fri, Apr 8, 2011 at 8:10 AM, David Forero wrote: > On Wed Apr 06 13:16:32 2011, wilde at mcs.anl.gov wrote: > > Commit emails for the vdl2 repository started getting flagged for > > approvals a few days ago. They had been working fine till then. > > > > > > I looked at the swift-commit list settings at that time and saw almost > > no one subscribed, which I thought was strange. Then a few days > > later, seems like most of the group was subscribed. Im therefore > > not sure if I misread what I saw, or it changed somehow (manually > > or via some restore???) or if I looked at a different list. > > > > > > Did anything change in either the swift-commit list settings or in the > > linkage from svn to the commit email, in the past week, that would > > explain problems? > > > > > > Is there anything special in how the commit email list should be set > > up? 
> > Here's who's subscribed to swift commit: > > foster at mcs.anl.gov > > hategan at mcs.anl.gov > > iraicu at cs.uchicago.edu > > jon.monette at gmail.com > > ketan at mcs.anl.gov > > noreply at svn.ci.uchicago.edu > > skenny at uchicago.edu > > swift-commit at globus.org > > tim.g.armstrong at gmail.com > > wilde at mcs.anl.gov > > wozniak at mcs.anl.gov > > yongzh at cs.uchicago.edu > > As far as I know there have been no changes to this list. Does it say why > the > postings are being held? > > > -- > > David Forero > System Administrator > Computation Institute > University of Chicago > 773-834-4102 > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From support at ci.uchicago.edu Fri Apr 8 11:36:51 2011 From: support at ci.uchicago.edu (skenny) Date: Fri, 8 Apr 2011 11:36:51 -0500 Subject: Fwd: [CI Ticketing System #12698] Swift commit messages getting sent to moderator - was Re: [Swift-devel] svn commit email In-Reply-To: References: <779218967.71257.1302113791047.JavaMail.root@zimbra.anl.gov> Message-ID: david, this was meant for you...i hit reply-all but it seems to replace the support address with mine :P ---------- Forwarded message ---------- From: skenny Date: Fri, Apr 8, 2011 at 9:34 AM Subject: Re: [CI Ticketing System #12698] Swift commit messages getting sent to moderator - was Re: [Swift-devel] svn commit email To: wilde at mcs.anl.gov Cc: swift-devel at ci.uchicago.edu i got another yesterday here's the content: Your mail to 'Swift-commit' with the subject r4307 - trunk/bin Is being held until the list moderator can review it for approval. The reason it is being held: Post by non-member to a members-only list Either the message will get posted to the list, or you will receive notification of the moderator's decision. 
If you would like to cancel this posting, please visit the following URL: http://mail.ci.uchicago.edu/mailman/confirm/swift-commit/f35b402588db8e0fb5b59f561fa3dd3a640c3406 On Fri, Apr 8, 2011 at 8:10 AM, David Forero wrote: > On Wed Apr 06 13:16:32 2011, wilde at mcs.anl.gov wrote: > > Commit emails for the vdl2 repository started getting flagged for > > approvals a few days ago. They had been working fine till then. > > > > > > I looked at the swift-commit list settings at that time and saw almost > > no one subscribed, which I thought was strange. Then a few days > > later, seems like most of the group was subscribed. Im therefore > > not sure if I misread what I saw, or it changed somehow (manually > > or via some restore???) or if I looked at a different list. > > > > > > Did anything change in either the swift-commit list settings or in the > > linkage from svn to the commit email, in the past week, that would > > explain problems? > > > > > > Is there anything special in how the commit email list should be set > > up? > > Here's who's subscribed to swift commit: > > foster at mcs.anl.gov > > hategan at mcs.anl.gov > > iraicu at cs.uchicago.edu > > jon.monette at gmail.com > > ketan at mcs.anl.gov > > noreply at svn.ci.uchicago.edu > > skenny at uchicago.edu > > swift-commit at globus.org > > tim.g.armstrong at gmail.com > > wilde at mcs.anl.gov > > wozniak at mcs.anl.gov > > yongzh at cs.uchicago.edu > > As far as I know there have been no changes to this list. Does it say why > the > postings are being held? 
> > > -- > > David Forero > System Administrator > Computation Institute > University of Chicago > 773-834-4102 > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > _______________________________________________ Swift-devel mailing list Swift-devel at ci.uchicago.edu http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From benc at hawaga.org.uk Fri Apr 8 11:48:35 2011 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 8 Apr 2011 16:48:35 +0000 (GMT) Subject: [Swift-devel] Re: Fwd: [Dsl-seminar] Bloom/Bud released In-Reply-To: <875258362.81854.1302270312947.JavaMail.root@zimbra.anl.gov> References: <875258362.81854.1302270312947.JavaMail.root@zimbra.anl.gov> Message-ID: I looked at this yesterday briefly. The page: http://www.bloom-lang.net/features/ has points 1 - 3 which match up with Swift really very closely, although phrased in very different language. -- http://www.hawaga.org.uk/ben/ On Fri, 8 Apr 2011, Michael Wilde wrote: > Another data flow language to look at. > > Swift Team: also some nice layout inspiration and possibly some good examples of language "try it" mechanisms we can leverage in Swift. 
> > - Mike > > ----- Forwarded Message ----- > From: "Tim Armstrong" > To: dsl-seminar at mailman.cs.uchicago.edu > Sent: Friday, April 8, 2011 8:38:15 AM > Subject: [Dsl-seminar] Bloom/Bud released > > http://www.bloom-lang.net/ > _______________________________________________ > DSL-seminar mailing list > DSL-seminar at mailman.cs.uchicago.edu > https://mailman.cs.uchicago.edu/mailman/listinfo/dsl-seminar > > From wilde at mcs.anl.gov Fri Apr 8 11:53:31 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 8 Apr 2011 11:53:31 -0500 (CDT) Subject: [CI Ticketing System #12698] Swift commit messages getting sent to moderator - was Re: [Swift-devel] svn commit email In-Reply-To: Message-ID: <587015715.83064.1302281611011.JavaMail.root@zimbra.anl.gov> David, can you tell us how these messages work? Meaning, when you do a commit, I would have thought that the message sent to swift-commit is sent by some kind of CI daemon running in or near the SVN server. How do these messages get associated with a user's email address? Are these the email addresses associated with our CI logins? And if so, did something change in how these messages are formed and routed in the last few weeks? - Mike ----- Original Message ----- > i got another yesterday here's the content: > > Your mail to 'Swift-commit' with the subject > > r4307 - trunk/bin > > Is being held until the list moderator can review it for approval. > > The reason it is being held: > > Post by non-member to a members-only list > > Either the message will get posted to the list, or you will receive > notification of the moderator's decision.
If you would like to cancel > this posting, please visit the following URL: > > > http://mail.ci.uchicago.edu/mailman/confirm/swift-commit/f35b402588db8e0fb5b59f561fa3dd3a640c3406 > > On Fri, Apr 8, 2011 at 8:10 AM, David Forero > wrote: > > > On Wed Apr 06 13:16:32 2011, wilde at mcs.anl.gov wrote: > > > Commit emails for the vdl2 repository started getting flagged for > > > approvals a few days ago. They had been working fine till then. > > > > > > > > > I looked at the swift-commit list settings at that time and saw > > > almost > > > no one subscribed, which I thought was strange. Then a few days > > > later, seems like most of the group was subscribed. Im therefore > > > not sure if I misread what I saw, or it changed somehow (manually > > > or via some restore???) or if I looked at a different list. > > > > > > > > > Did anything change in either the swift-commit list settings or in > > > the > > > linkage from svn to the commit email, in the past week, that would > > > explain problems? > > > > > > > > > Is there anything special in how the commit email list should be > > > set > > > up? > > > > Here's who's subscribed to swift commit: > > > > foster at mcs.anl.gov > > > > hategan at mcs.anl.gov > > > > iraicu at cs.uchicago.edu > > > > jon.monette at gmail.com > > > > ketan at mcs.anl.gov > > > > noreply at svn.ci.uchicago.edu > > > > skenny at uchicago.edu > > > > swift-commit at globus.org > > > > tim.g.armstrong at gmail.com > > > > wilde at mcs.anl.gov > > > > wozniak at mcs.anl.gov > > > > yongzh at cs.uchicago.edu > > > > As far as I know there have been no changes to this list. Does it > > say why > > the > > postings are being held? 
> > > > > > -- > > > > David Forero > > System Administrator > > Computation Institute > > University of Chicago > > 773-834-4102 > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From support at ci.uchicago.edu Fri Apr 8 11:53:33 2011 From: support at ci.uchicago.edu (Mike Wilde) Date: Fri, 8 Apr 2011 11:53:33 -0500 Subject: [CI Ticketing System #12698] Swift commit messages getting sent to moderator - was Re: [Swift-devel] svn commit email In-Reply-To: <587015715.83064.1302281611011.JavaMail.root@zimbra.anl.gov> References: <587015715.83064.1302281611011.JavaMail.root@zimbra.anl.gov> Message-ID: David, can you tell us how these messages work? Meaning, when you do a commit, I would have thought that the message sent to swift-commit is sent by some kind of CI daemon running in or near the SVN server. How do these messages get associated with a user's email address? Are these the email addresses associated with our CI logins? And if so, did something change in how these messages are formed and routed in the last few weeks? - Mike ----- Original Message ----- > i got another yesterday here's the content: > > Your mail to 'Swift-commit' with the subject > > r4307 - trunk/bin > > Is being held until the list moderator can review it for approval. > > The reason it is being held: > > Post by non-member to a members-only list > > Either the message will get posted to the list, or you will receive > notification of the moderator's decision.
If you would like to cancel > this posting, please visit the following URL: > > > http://mail.ci.uchicago.edu/mailman/confirm/swift-commit/f35b402588db8e0fb5b59f561fa3dd3a640c3406 > > On Fri, Apr 8, 2011 at 8:10 AM, David Forero > wrote: > > > On Wed Apr 06 13:16:32 2011, wilde at mcs.anl.gov wrote: > > > Commit emails for the vdl2 repository started getting flagged for > > > approvals a few days ago. They had been working fine till then. > > > > > > > > > I looked at the swift-commit list settings at that time and saw > > > almost > > > no one subscribed, which I thought was strange. Then a few days > > > later, seems like most of the group was subscribed. Im therefore > > > not sure if I misread what I saw, or it changed somehow (manually > > > or via some restore???) or if I looked at a different list. > > > > > > > > > Did anything change in either the swift-commit list settings or in > > > the > > > linkage from svn to the commit email, in the past week, that would > > > explain problems? > > > > > > > > > Is there anything special in how the commit email list should be > > > set > > > up? > > > > Here's who's subscribed to swift commit: > > > > foster at mcs.anl.gov > > > > hategan at mcs.anl.gov > > > > iraicu at cs.uchicago.edu > > > > jon.monette at gmail.com > > > > ketan at mcs.anl.gov > > > > noreply at svn.ci.uchicago.edu > > > > skenny at uchicago.edu > > > > swift-commit at globus.org > > > > tim.g.armstrong at gmail.com > > > > wilde at mcs.anl.gov > > > > wozniak at mcs.anl.gov > > > > yongzh at cs.uchicago.edu > > > > As far as I know there have been no changes to this list. Does it > > say why > > the > > postings are being held? 
> > > > > > -- > > > > David Forero > > System Administrator > > Computation Institute > > University of Chicago > > 773-834-4102 > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hategan at mcs.anl.gov Fri Apr 8 12:47:25 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 08 Apr 2011 10:47:25 -0700 Subject: [CI Ticketing System #12698] Swift commit messages getting sent to moderator - was Re: [Swift-devel] svn commit email In-Reply-To: References: <779218967.71257.1302113791047.JavaMail.root@zimbra.anl.gov> Message-ID: <1302284845.31922.1.camel@blabla2.none> Looks like skenny at uchicago.edu is subscribed, but the message was sent as skenny at ci.uchicago.edu. I can approve the messages being held and add them to the accept list so that future messages get posted directly. Mihael On Fri, 2011-04-08 at 09:34 -0700, Sarah Kenny wrote: > i got another yesterday here's the content: > > Your mail to 'Swift-commit' with the subject > > r4307 - trunk/bin > > Is being held until the list moderator can review it for approval. > > The reason it is being held: > > Post by non-member to a members-only list > > Either the message will get posted to the list, or you will receive > notification of the moderator's decision. If you would like to cancel > this posting, please visit the following URL: > > > http://mail.ci.uchicago.edu/mailman/confirm/swift-commit/f35b402588db8e0fb5b59f561fa3dd3a640c3406 > > On Fri, Apr 8, 2011 at 8:10 AM, David Forero > wrote: > On Wed Apr 06 13:16:32 2011, wilde at mcs.anl.gov wrote: > > Commit emails for the vdl2 repository started getting > flagged for > > approvals a few days ago. They had been working fine till > then.
> > > > > > I looked at the swift-commit list settings at that time and > saw almost > > no one subscribed, which I thought was strange. Then a few > days > > later, seems like most of the group was subscribed. Im > therefore > > not sure if I misread what I saw, or it changed somehow > (manually > > or via some restore???) or if I looked at a different list. > > > > > > Did anything change in either the swift-commit list settings > or in the > > linkage from svn to the commit email, in the past week, that > would > > explain problems? > > > > > > Is there anything special in how the commit email list > should be set > > up? > > Here's who's subscribed to swift commit: > > foster at mcs.anl.gov > > hategan at mcs.anl.gov > > iraicu at cs.uchicago.edu > > jon.monette at gmail.com > > ketan at mcs.anl.gov > > noreply at svn.ci.uchicago.edu > > skenny at uchicago.edu > > swift-commit at globus.org > > tim.g.armstrong at gmail.com > > wilde at mcs.anl.gov > > wozniak at mcs.anl.gov > > yongzh at cs.uchicago.edu > > As far as I know there have been no changes to this list. Does > it say why the > postings are being held? 
> > > -- > > David Forero > System Administrator > Computation Institute > University of Chicago > 773-834-4102 > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From support at ci.uchicago.edu Fri Apr 8 12:47:38 2011 From: support at ci.uchicago.edu (Mihael Hategan) Date: Fri, 8 Apr 2011 12:47:38 -0500 Subject: [CI Ticketing System #12698] Swift commit messages getting sent to moderator - was Re: [Swift-devel] svn commit email In-Reply-To: <1302284845.31922.1.camel@blabla2.none> References: <779218967.71257.1302113791047.JavaMail.root@zimbra.anl.gov> <1302284845.31922.1.camel@blabla2.none> Message-ID: Looks like skenny at uchicago.edu is subscribed, but the message was sent as skenny at ci.uchicago.edu. I can approve the messages being held and add them to the accept list so that future messages get posted directly. Mihael On Fri, 2011-04-08 at 09:34 -0700, Sarah Kenny wrote: > i got another yesterday here's the content: > > Your mail to 'Swift-commit' with the subject > > r4307 - trunk/bin > > Is being held until the list moderator can review it for approval. > > The reason it is being held: > > Post by non-member to a members-only list > > Either the message will get posted to the list, or you will receive > notification of the moderator's decision. If you would like to cancel > this posting, please visit the following URL: > > > http://mail.ci.uchicago.edu/mailman/confirm/swift-commit/f35b402588db8e0fb5b59f561fa3dd3a640c3406 > > On Fri, Apr 8, 2011 at 8:10 AM, David Forero > wrote: > On Wed Apr 06 13:16:32 2011, wilde at mcs.anl.gov wrote: > > Commit emails for the vdl2 repository started getting > flagged for > > approvals a few days ago.
They had been working fine till > then. > > > > > > I looked at the swift-commit list settings at that time and > saw almost > > no one subscribed, which I thought was strange. Then a few > days > > later, seems like most of the group was subscribed. Im > therefore > > not sure if I misread what I saw, or it changed somehow > (manually > > or via some restore???) or if I looked at a different list. > > > > > > Did anything change in either the swift-commit list settings > or in the > > linkage from svn to the commit email, in the past week, that > would > > explain problems? > > > > > > Is there anything special in how the commit email list > should be set > > up? > > Here's who's subscribed to swift commit: > > foster at mcs.anl.gov > > hategan at mcs.anl.gov > > iraicu at cs.uchicago.edu > > jon.monette at gmail.com > > ketan at mcs.anl.gov > > noreply at svn.ci.uchicago.edu > > skenny at uchicago.edu > > swift-commit at globus.org > > tim.g.armstrong at gmail.com > > wilde at mcs.anl.gov > > wozniak at mcs.anl.gov > > yongzh at cs.uchicago.edu > > As far as I know there have been no changes to this list. Does > it say why the > postings are being held? 
> > > -- > > David Forero > System Administrator > Computation Institute > University of Chicago > 773-834-4102 > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From support at ci.uchicago.edu Fri Apr 8 12:56:56 2011 From: support at ci.uchicago.edu (Ti Leggett) Date: Fri, 8 Apr 2011 12:56:56 -0500 Subject: [CI Ticketing System #12698] Swift commit messages getting sent to moderator - was Re: [Swift-devel] svn commit email In-Reply-To: <8C4DA37A-9E66-448D-967F-7DF8CA0CA116@ci.uchicago.edu> References: <779218967.71257.1302113791047.JavaMail.root@zimbra.anl.gov> <1302284845.31922.1.camel@blabla2.none> <8C4DA37A-9E66-448D-967F-7DF8CA0CA116@ci.uchicago.edu> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 We had several requests to have commit messages come from the user making the commit. This also fixed another issue where commit messages weren't being accepted because the address didn't have a proper MX record. This was done several weeks ago. Commit messages now come from the user's CI email address. On Apr 8, 2011, at 12:47 PM, Mihael Hategan wrote: > > Looks like skenny at uchicago.edu is subscribed, but the message was sent > as skenny at ci.uchicago.edu. > > I can approve the messages being held an add them to the accept list so > that future messages get posted directly. > > Mihael > > On Fri, 2011-04-08 at 09:34 -0700, Sarah Kenny wrote: >> i got another yesterday here's the content: >> >> Your mail to 'Swift-commit' with the subject >> >> r4307 - trunk/bin >> >> Is being held until the list moderator can review it for approval. 
>> >> The reason it is being held: >> >> Post by non-member to a members-only list >> >> Either the message will get posted to the list, or you will receive >> notification of the moderator's decision. If you would like to cancel >> this posting, please visit the following URL: >> >> >> http://mail.ci.uchicago.edu/mailman/confirm/swift-commit/f35b402588db8e0fb5b59f561fa3dd3a640c3406 >> >> On Fri, Apr 8, 2011 at 8:10 AM, David Forero >> wrote: >> On Wed Apr 06 13:16:32 2011, wilde at mcs.anl.gov wrote: >>> Commit emails for the vdl2 repository started getting >> flagged for >>> approvals a few days ago. They had been working fine till >> then. >>> >>> >>> I looked at the swift-commit list settings at that time and >> saw almost >>> no one subscribed, which I thought was strange. Then a few >> days >>> later, seems like most of the group was subscribed. Im >> therefore >>> not sure if I misread what I saw, or it changed somehow >> (manually >>> or via some restore???) or if I looked at a different list. >>> >>> >>> Did anything change in either the swift-commit list settings >> or in the >>> linkage from svn to the commit email, in the past week, that >> would >>> explain problems? >>> >>> >>> Is there anything special in how the commit email list >> should be set >>> up? >> >> Here's who's subscribed to swift commit: >> >> foster at mcs.anl.gov >> >> hategan at mcs.anl.gov >> >> iraicu at cs.uchicago.edu >> >> jon.monette at gmail.com >> >> ketan at mcs.anl.gov >> >> noreply at svn.ci.uchicago.edu >> >> skenny at uchicago.edu >> >> swift-commit at globus.org >> >> tim.g.armstrong at gmail.com >> >> wilde at mcs.anl.gov >> >> wozniak at mcs.anl.gov >> >> yongzh at cs.uchicago.edu >> >> As far as I know there have been no changes to this list. Does >> it say why the >> postings are being held? 
>> >> >> -- >> >> David Forero >> System Administrator >> Computation Institute >> University of Chicago >> 773-834-4102 >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.0.14 (Darwin) iEYEARECAAYFAk2fTGUACgkQ4RgdOxQVi0BaxwCglaWol1TJotVl2HyXDJWPfI4X xBUAn0pKlRX8F3lbW8E6t2nWFEAw0l0Z =DD3d -----END PGP SIGNATURE----- From wilde at mcs.anl.gov Fri Apr 8 13:30:10 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 8 Apr 2011 13:30:10 -0500 (CDT) Subject: [CI Ticketing System #12698] Swift commit messages getting sent to moderator - was Re: [Swift-devel] svn commit email In-Reply-To: Message-ID: <514647778.83550.1302287410002.JavaMail.root@zimbra.anl.gov> I am wondering if something changed whereby the emails from SVN *used* to look like they all came from a single system email address, and then changed to come from the email address associated with the CI login of the person doing the SVN commit? - Mike - ----- Original Message ----- > Looks like skenny at uchicago.edu is subscribed, but the message was sent > as skenny at ci.uchicago.edu. > > I can approve the messages being held an add them to the accept list > so > that future messages get posted directly. > > Mihael > > On Fri, 2011-04-08 at 09:34 -0700, Sarah Kenny wrote: > > i got another yesterday here's the content: > > > > Your mail to 'Swift-commit' with the subject > > > > r4307 - trunk/bin > > > > Is being held until the list moderator can review it for approval.
> > > > The reason it is being held: > > > > Post by non-member to a members-only list > > > > Either the message will get posted to the list, or you will receive > > notification of the moderator's decision. If you would like to > > cancel > > this posting, please visit the following URL: > > > > > > http://mail.ci.uchicago.edu/mailman/confirm/swift-commit/f35b402588db8e0fb5b59f561fa3dd3a640c3406 > > > > On Fri, Apr 8, 2011 at 8:10 AM, David Forero > > > > wrote: > > On Wed Apr 06 13:16:32 2011, wilde at mcs.anl.gov wrote: > > > Commit emails for the vdl2 repository started getting > > flagged for > > > approvals a few days ago. They had been working fine till > > then. > > > > > > > > > I looked at the swift-commit list settings at that time > > > and > > saw almost > > > no one subscribed, which I thought was strange. Then a few > > days > > > later, seems like most of the group was subscribed. Im > > therefore > > > not sure if I misread what I saw, or it changed somehow > > (manually > > > or via some restore???) or if I looked at a different > > > list. > > > > > > > > > Did anything change in either the swift-commit list > > > settings > > or in the > > > linkage from svn to the commit email, in the past week, > > > that > > would > > > explain problems? > > > > > > > > > Is there anything special in how the commit email list > > should be set > > > up? > > > > Here's who's subscribed to swift commit: > > > > foster at mcs.anl.gov > > > > hategan at mcs.anl.gov > > > > iraicu at cs.uchicago.edu > > > > jon.monette at gmail.com > > > > ketan at mcs.anl.gov > > > > noreply at svn.ci.uchicago.edu > > > > skenny at uchicago.edu > > > > swift-commit at globus.org > > > > tim.g.armstrong at gmail.com > > > > wilde at mcs.anl.gov > > > > wozniak at mcs.anl.gov > > > > yongzh at cs.uchicago.edu > > > > As far as I know there have been no changes to this list. > > Does > > it say why the > > postings are being held? 
> > > > > > -- > > > > David Forero > > System Administrator > > Computation Institute > > University of Chicago > > 773-834-4102 > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory
From hategan at mcs.anl.gov Fri Apr 8 13:35:00 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 08 Apr 2011 11:35:00 -0700 Subject: [CI Ticketing System #12698] Swift commit messages getting sent to moderator - was Re: [Swift-devel] svn commit email In-Reply-To: <514647778.83550.1302287410002.JavaMail.root@zimbra.anl.gov> References: <514647778.83550.1302287410002.JavaMail.root@zimbra.anl.gov> Message-ID: <1302287700.32482.1.camel@blabla2.none> [removed support at ci] Yes. We had noreply at svn.ci.uchicago.edu subscribed, which was the address from which the messages came initially. Ti's email mentions that messages are now sent from username at ci.uchicago.edu instead. Mihael On Fri, 2011-04-08 at 13:30 -0500, Michael Wilde wrote: > I am wondering if something changed whereby the emails form SVN *used* > to look like they all came from a single system email address, and > then changed to come from the email address associated with the CI > login of the person doing the SVN commit??? > > - Mike > > - > > ----- Original Message ----- > > Looks like skenny at uchicago.edu is subscribed, but the message was sent > > as skenny at ci.uchicago.edu. > > > > I can approve the messages being held an add them to the accept list > > so > > that future messages get posted directly.
> > > > Mihael > > > > On Fri, 2011-04-08 at 09:34 -0700, Sarah Kenny wrote: > > > i got another yesterday here's the content: > > > > > > Your mail to 'Swift-commit' with the subject > > > > > > r4307 - trunk/bin > > > > > > Is being held until the list moderator can review it for approval. > > > > > > The reason it is being held: > > > > > > Post by non-member to a members-only list > > > > > > Either the message will get posted to the list, or you will receive > > > notification of the moderator's decision. If you would like to > > > cancel > > > this posting, please visit the following URL: > > > > > > > > > http://mail.ci.uchicago.edu/mailman/confirm/swift-commit/f35b402588db8e0fb5b59f561fa3dd3a640c3406 > > > > > > On Fri, Apr 8, 2011 at 8:10 AM, David Forero > > > > > > wrote: > > > On Wed Apr 06 13:16:32 2011, wilde at mcs.anl.gov wrote: > > > > Commit emails for the vdl2 repository started getting > > > flagged for > > > > approvals a few days ago. They had been working fine till > > > then. > > > > > > > > > > > > I looked at the swift-commit list settings at that time > > > > and > > > saw almost > > > > no one subscribed, which I thought was strange. Then a few > > > days > > > > later, seems like most of the group was subscribed. Im > > > therefore > > > > not sure if I misread what I saw, or it changed somehow > > > (manually > > > > or via some restore???) or if I looked at a different > > > > list. > > > > > > > > > > > > Did anything change in either the swift-commit list > > > > settings > > > or in the > > > > linkage from svn to the commit email, in the past week, > > > > that > > > would > > > > explain problems? > > > > > > > > > > > > Is there anything special in how the commit email list > > > should be set > > > > up? 
> > > > > > Here's who's subscribed to swift commit: > > > > > > foster at mcs.anl.gov > > > > > > hategan at mcs.anl.gov > > > > > > iraicu at cs.uchicago.edu > > > > > > jon.monette at gmail.com > > > > > > ketan at mcs.anl.gov > > > > > > noreply at svn.ci.uchicago.edu > > > > > > skenny at uchicago.edu > > > > > > swift-commit at globus.org > > > > > > tim.g.armstrong at gmail.com > > > > > > wilde at mcs.anl.gov > > > > > > wozniak at mcs.anl.gov > > > > > > yongzh at cs.uchicago.edu > > > > > > As far as I know there have been no changes to this list. > > > Does > > > it say why the > > > postings are being held? > > > > > > > > > -- > > > > > > David Forero > > > System Administrator > > > Computation Institute > > > University of Chicago > > > 773-834-4102 > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From wilde at mcs.anl.gov Fri Apr 8 13:38:00 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 8 Apr 2011 13:38:00 -0500 (CDT) Subject: [CI Ticketing System #12698] Swift commit messages getting sent to moderator - was Re: [Swift-devel] svn commit email In-Reply-To: Message-ID: <650937626.83579.1302287880974.JavaMail.root@zimbra.anl.gov> OK, Ti confirms here that this was indeed the change. So now we need to set the list so that: - everyone's CI email addr can post - postings get sent *to* swift-devel I *think* this now confirms and explains what I originally suspected: that the list had only a few subscribers, of which only 2 mattered: one for the SVN sending address, and one to forward to swift-devel. There may have been a few others.
So either we add all the senders and mark them as "send no email to these", or we do it with masks of some sort. Can someone please grab this task, tell the list you got it, and make it work nicely again? Thanks, Mike ----- Original Message ----- > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > We had several requests to have commit messages come from the user > making the commit. This also fixed another issue where commit messages > weren't being accepted because the address didn't have a proper MX > record. This was done several weeks ago. Commit messages now come from > the user's CI email address. > > On Apr 8, 2011, at 12:47 PM, Mihael Hategan wrote: > > > > > Looks like skenny at uchicago.edu is subscribed, but the message was > > sent > > as skenny at ci.uchicago.edu. > > > > I can approve the messages being held an add them to the accept list > > so > > that future messages get posted directly. > > > > Mihael > > > > On Fri, 2011-04-08 at 09:34 -0700, Sarah Kenny wrote: > >> i got another yesterday here's the content: > >> > >> Your mail to 'Swift-commit' with the subject > >> > >> r4307 - trunk/bin > >> > >> Is being held until the list moderator can review it for approval. > >> > >> The reason it is being held: > >> > >> Post by non-member to a members-only list > >> > >> Either the message will get posted to the list, or you will receive > >> notification of the moderator's decision. If you would like to > >> cancel > >> this posting, please visit the following URL: > >> > >> > >> http://mail.ci.uchicago.edu/mailman/confirm/swift-commit/f35b402588db8e0fb5b59f561fa3dd3a640c3406 > >> > >> On Fri, Apr 8, 2011 at 8:10 AM, David Forero > >> > >> wrote: > >> On Wed Apr 06 13:16:32 2011, wilde at mcs.anl.gov wrote: > >>> Commit emails for the vdl2 repository started getting > >> flagged for > >>> approvals a few days ago. They had been working fine till > >> then. 
> >>> > >>> > >>> I looked at the swift-commit list settings at that time and > >> saw almost > >>> no one subscribed, which I thought was strange. Then a few > >> days > >>> later, seems like most of the group was subscribed. Im > >> therefore > >>> not sure if I misread what I saw, or it changed somehow > >> (manually > >>> or via some restore???) or if I looked at a different list. > >>> > >>> > >>> Did anything change in either the swift-commit list settings > >> or in the > >>> linkage from svn to the commit email, in the past week, that > >> would > >>> explain problems? > >>> > >>> > >>> Is there anything special in how the commit email list > >> should be set > >>> up? > >> > >> Here's who's subscribed to swift commit: > >> > >> foster at mcs.anl.gov > >> > >> hategan at mcs.anl.gov > >> > >> iraicu at cs.uchicago.edu > >> > >> jon.monette at gmail.com > >> > >> ketan at mcs.anl.gov > >> > >> noreply at svn.ci.uchicago.edu > >> > >> skenny at uchicago.edu > >> > >> swift-commit at globus.org > >> > >> tim.g.armstrong at gmail.com > >> > >> wilde at mcs.anl.gov > >> > >> wozniak at mcs.anl.gov > >> > >> yongzh at cs.uchicago.edu > >> > >> As far as I know there have been no changes to this list. > >> Does > >> it say why the > >> postings are being held? 
> >> > >> > >> -- > >> > >> David Forero > >> System Administrator > >> Computation Institute > >> University of Chicago > >> 773-834-4102 > >> > >> _______________________________________________ > >> Swift-devel mailing list > >> Swift-devel at ci.uchicago.edu > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >> > >> _______________________________________________ > >> Swift-devel mailing list > >> Swift-devel at ci.uchicago.edu > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG/MacGPG2 v2.0.14 (Darwin) > > iEYEARECAAYFAk2fTGUACgkQ4RgdOxQVi0BaxwCglaWol1TJotVl2HyXDJWPfI4X > xBUAn0pKlRX8F3lbW8E6t2nWFEAw0l0Z > =DD3d > -----END PGP SIGNATURE----- -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hategan at mcs.anl.gov Fri Apr 8 13:45:11 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 08 Apr 2011 11:45:11 -0700 Subject: [CI Ticketing System #12698] Swift commit messages getting sent to moderator - was Re: [Swift-devel] svn commit email In-Reply-To: <650937626.83579.1302287880974.JavaMail.root@zimbra.anl.gov> References: <650937626.83579.1302287880974.JavaMail.root@zimbra.anl.gov> Message-ID: <1302288311.1150.4.camel@blabla2.none> On Fri, 2011-04-08 at 13:38 -0500, Michael Wilde wrote: > OK, Ti confirms here that this was indeed the change. So now we need to set the list so that: > > - everyone's CI email addr can post We can go with the flow there. Whenever there is a post that doesn't go through, we can permanently add that address. > > - postings get set *to* swift-devel I don't think we should clutter swift-devel with commit messages. Folks should have the freedom to subscribe to these if they want to. I can imagine Ian not wanting daily doses of swift code snippets.
> > I *think* this now confirms and explains what I originally suspected: > that the list had only a few subscribers, of which only 2 mattered: > one for the SVN sending address, and one to forward to swift-devel. > There may have been a few others. Everybody who wanted to receive commit messages was subscribed (and there we can see that Ian does want daily doses of swift code snippets). Apart from that, there were the sender and swift-commit at globus.org in order to comply with the globus.org rules. > > So either we add all the senders and mark them as "send no email to these", or we do it with masks of some sort. > > Can someone please grab this task, tell the list you got it, and make it work nicely again? Yeah. Don't worry about it. Mihael From hategan at mcs.anl.gov Fri Apr 8 13:50:20 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 08 Apr 2011 11:50:20 -0700 Subject: [Swift-devel] Re: Fwd: Proposal for coaster service options In-Reply-To: <1301259545.17525.3.camel@blabla2.none> References: <1770867299.31773.1301259341372.JavaMail.root@zimbra.anl.gov> <1301259545.17525.3.camel@blabla2.none> Message-ID: <1302288620.1599.0.camel@blabla2.none> On Sun, 2011-03-27 at 13:59 -0700, Mihael Hategan wrote: > Yes. I'll do that once I deal with this new set of bugs/hangs. I think > by the end of this week if not sooner. That wasn't quite by the end of last week, but it's there now in trunk cog r3095. Please test and let me know if it needs anything else. Mihael From wilde at mcs.anl.gov Fri Apr 8 13:56:49 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 8 Apr 2011 13:56:49 -0500 (CDT) Subject: [Swift-devel] Re: Fwd: Proposal for coaster service options In-Reply-To: <1302288620.1599.0.camel@blabla2.none> Message-ID: <884671183.83683.1302289009935.JavaMail.root@zimbra.anl.gov> Great! Can you echo back to the list the (external user visible) specs of what you did, ie the new options and behaviors? Or is that in a -help option? 
This would be for Justin, Ketan, and anyone else who is scripting coaster configurations. Thanks, Mike ----- Original Message ----- > On Sun, 2011-03-27 at 13:59 -0700, Mihael Hategan wrote: > > Yes. I'll do that once I deal with this new set of bugs/hangs. I > > think > > by the end of this week if not sooner. > > That wasn't quite by the end of last week, but it's there now in trunk > cog r3095. Please test and let me know if it needs anything else. > > Mihael -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hategan at mcs.anl.gov Fri Apr 8 14:07:49 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 08 Apr 2011 12:07:49 -0700 Subject: [Swift-devel] Re: Fwd: Proposal for coaster service options In-Reply-To: <884671183.83683.1302289009935.JavaMail.root@zimbra.anl.gov> References: <884671183.83683.1302289009935.JavaMail.root@zimbra.anl.gov> Message-ID: <1302289669.1916.1.camel@blabla2.none> On Fri, 2011-04-08 at 13:56 -0500, Michael Wilde wrote: > Great! Can you echo back to the list the (external user visible) specs of what you did, ie the new options and behaviors? Or is that in a -help option? That is in the -help option. I essentially added -portfile (-S) and -localportfile (-W). When these are used, the ports will be dynamic. The files are written before the "Started service" message is printed by the service. 
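For anyone scripting this: a minimal sketch, in Python, of how a launcher might pick up a dynamically assigned port once the service has written its port file. It assumes the file contains just the port number as plain text, which the description above does not spell out, so check a real file first. The function name `wait_for_port`, the timeout values, and any path passed to it are made up for illustration and are not part of the coaster service.

```python
import time


def wait_for_port(port_file, timeout=30.0, poll=0.2):
    """Poll until port_file exists and is non-empty, then return its content as an int.

    Assumption (not confirmed by the thread): the file holds only the
    port number as text, e.g. "53513". Anything else raises ValueError.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(port_file) as f:
                text = f.read().strip()
            if text:
                return int(text)
        except FileNotFoundError:
            pass  # the service has not created the file yet
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "no port written to %s within %.1fs" % (port_file, timeout)
            )
        time.sleep(poll)
```

A wrapper script would start the service in the background with -portfile pointing at a path of its choosing (check -help for the exact argument form), then call `wait_for_port()` on that path before generating the sites file.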
Mihael From aespinosa at cs.uchicago.edu Fri Apr 8 15:25:08 2011 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Fri, 8 Apr 2011 15:25:08 -0500 Subject: [Swift-devel] trunk 4322 cog 3095 build error Message-ID: delete.dependency.log.1: [echo] [provider-coaster]: DIST [echo] [provider-coaster]: JARCOPY delete.jar: [echo] [provider-coaster]: DELETE.JAR (cog-provider-coaster-0.3.jar) compile: [echo] [provider-coaster]: COMPILE [mkdir] Created dir: /autonfs/home/aespinosa/swift/cogkit/modules/provider-coaster/build [javac] Compiling 126 source files to /autonfs/home/aespinosa/swift/cogkit/modules/provider-coaster/build [javac] /autonfs/home/aespinosa/swift/cogkit/modules/provider-coaster/src/org/globus/cog/abstraction/coaster/service/CoasterPersistentService.java:172: cannot find symbol [javac] symbol : method getLocalService() [javac] location: class org.globus.cog.abstraction.coaster.service.CoasterPersistentService [javac] writePort(s.getLocalService().getPort(), localPortFile); [javac] ^ [javac] Note: /autonfs/home/aespinosa/swift/cogkit/modules/provider-coaster/src/org/globus/cog/abstraction/coaster/service/job/manager/Block.java uses or overrides a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. [javac] Note: Some input files use unchecked or unsafe operations. [javac] Note: Recompile with -Xlint:unchecked for details. 
[javac] 1 error BUILD FAILED /autonfs/home/aespinosa/swift/cogkit/modules/swift/build.xml:73: The following error occurred while executing this line: /autonfs/home/aespinosa/swift/cogkit/mbuild.xml:444: The following error occurred while executing this line: /autonfs/home/aespinosa/swift/cogkit/mbuild.xml:79: The following error occurred while executing this line: /autonfs/home/aespinosa/swift/cogkit/mbuild.xml:52: The following error occurred while executing this line: /autonfs/home/aespinosa/swift/cogkit/modules/swift/dependencies.xml:13: The following error occurred while executing this line: /autonfs/home/aespinosa/swift/cogkit/mbuild.xml:163: The following error occurred while executing this line: /autonfs/home/aespinosa/swift/cogkit/mbuild.xml:168: The following error occurred while executing this line: /autonfs/home/aespinosa/swift/cogkit/modules/provider-coaster/build.xml:59: The following error occurred while executing this line: /autonfs/home/aespinosa/swift/cogkit/mbuild.xml:465: The following error occurred while executing this line: /autonfs/home/aespinosa/swift/cogkit/mbuild.xml:228: Compile failed; see the compiler error output for details. Total time: 38 seconds -- Allan M. Espinosa PhD student, Computer Science University of Chicago From hategan at mcs.anl.gov Fri Apr 8 15:28:41 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 08 Apr 2011 13:28:41 -0700 Subject: [Swift-devel] trunk 4322 cog 3095 build error In-Reply-To: References: Message-ID: <1302294521.2406.0.camel@blabla2.none> My bad. Try r3096. 
On Fri, 2011-04-08 at 15:25 -0500, Allan Espinosa wrote: > delete.dependency.log.1: > [echo] [provider-coaster]: DIST > [echo] [provider-coaster]: JARCOPY > > delete.jar: > [echo] [provider-coaster]: DELETE.JAR (cog-provider-coaster-0.3.jar) > > compile: > [echo] [provider-coaster]: COMPILE > [mkdir] Created dir: > /autonfs/home/aespinosa/swift/cogkit/modules/provider-coaster/build > [javac] Compiling 126 source files to > /autonfs/home/aespinosa/swift/cogkit/modules/provider-coaster/build > [javac] /autonfs/home/aespinosa/swift/cogkit/modules/provider-coaster/src/org/globus/cog/abstraction/coaster/service/CoasterPersistentService.java:172: > cannot find symbol > [javac] symbol : method getLocalService() > [javac] location: class > org.globus.cog.abstraction.coaster.service.CoasterPersistentService > [javac] writePort(s.getLocalService().getPort(), localPortFile); > [javac] ^ > [javac] Note: > /autonfs/home/aespinosa/swift/cogkit/modules/provider-coaster/src/org/globus/cog/abstraction/coaster/service/job/manager/Block.java > uses or overrides a deprecated API. > [javac] Note: Recompile with -Xlint:deprecation for details. > [javac] Note: Some input files use unchecked or unsafe operations. > [javac] Note: Recompile with -Xlint:unchecked for details. 
> [javac] 1 error > > BUILD FAILED > /autonfs/home/aespinosa/swift/cogkit/modules/swift/build.xml:73: The > following error occurred while executing this line: > /autonfs/home/aespinosa/swift/cogkit/mbuild.xml:444: The following > error occurred while executing this line: > /autonfs/home/aespinosa/swift/cogkit/mbuild.xml:79: The following > error occurred while executing this line: > /autonfs/home/aespinosa/swift/cogkit/mbuild.xml:52: The following > error occurred while executing this line: > /autonfs/home/aespinosa/swift/cogkit/modules/swift/dependencies.xml:13: > The following error occurred while executing this line: > /autonfs/home/aespinosa/swift/cogkit/mbuild.xml:163: The following > error occurred while executing this line: > /autonfs/home/aespinosa/swift/cogkit/mbuild.xml:168: The following > error occurred while executing this line: > /autonfs/home/aespinosa/swift/cogkit/modules/provider-coaster/build.xml:59: > The following error occurred while executing this line: > /autonfs/home/aespinosa/swift/cogkit/mbuild.xml:465: The following > error occurred while executing this line: > /autonfs/home/aespinosa/swift/cogkit/mbuild.xml:228: Compile failed; > see the compiler error output for details. > > Total time: 38 seconds > > From aespinosa at cs.uchicago.edu Fri Apr 8 15:39:10 2011 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Fri, 8 Apr 2011 15:39:10 -0500 Subject: [Swift-devel] trunk 4322 cog 3095 build error In-Reply-To: <1302294521.2406.0.camel@blabla2.none> References: <1302294521.2406.0.camel@blabla2.none> Message-ID: it now builds. Thanks! -allan 2011/4/8 Mihael Hategan : > My bad. Try r3096. > > On Fri, 2011-04-08 at 15:25 -0500, Allan Espinosa wrote: >> delete.dependency.log.1: >> [echo] [provider-coaster]: DIST >> [echo] [provider-coaster]: JARCOPY >> >> delete.jar: >> [echo] [provider-coaster]: DELETE.JAR (cog-provider-coaster-0.3.jar) >> >> compile: >> [echo] [provider-coaster]: COMPILE >>
[mkdir] Created dir: >> /autonfs/home/aespinosa/swift/cogkit/modules/provider-coaster/build >> [javac] Compiling 126 source files to >> /autonfs/home/aespinosa/swift/cogkit/modules/provider-coaster/build >> [javac] /autonfs/home/aespinosa/swift/cogkit/modules/provider-coaster/src/org/globus/cog/abstraction/coaster/service/CoasterPersistentService.java:172: >> cannot find symbol >> [javac] symbol : method getLocalService() >> [javac] location: class >> org.globus.cog.abstraction.coaster.service.CoasterPersistentService >> [javac] writePort(s.getLocalService().getPort(), localPortFile); >> [javac] ^ >> [javac] Note: >> /autonfs/home/aespinosa/swift/cogkit/modules/provider-coaster/src/org/globus/cog/abstraction/coaster/service/job/manager/Block.java >> uses or overrides a deprecated API. >> [javac] Note: Recompile with -Xlint:deprecation for details. >> [javac] Note: Some input files use unchecked or unsafe operations. >> [javac] Note: Recompile with -Xlint:unchecked for details. >>
[javac] 1 error >> >> BUILD FAILED >> /autonfs/home/aespinosa/swift/cogkit/modules/swift/build.xml:73: The >> following error occurred while executing this line: >> /autonfs/home/aespinosa/swift/cogkit/mbuild.xml:444: The following >> error occurred while executing this line: >> /autonfs/home/aespinosa/swift/cogkit/mbuild.xml:79: The following >> error occurred while executing this line: >> /autonfs/home/aespinosa/swift/cogkit/mbuild.xml:52: The following >> error occurred while executing this line: >> /autonfs/home/aespinosa/swift/cogkit/modules/swift/dependencies.xml:13: >> The following error occurred while executing this line: >> /autonfs/home/aespinosa/swift/cogkit/mbuild.xml:163: The following >> error occurred while executing this line: >> /autonfs/home/aespinosa/swift/cogkit/mbuild.xml:168: The following >> error occurred while executing this line: >> /autonfs/home/aespinosa/swift/cogkit/modules/provider-coaster/build.xml:59: >> The following error occurred while executing this line: >> /autonfs/home/aespinosa/swift/cogkit/mbuild.xml:465: The following >> error occurred while executing this line: >> /autonfs/home/aespinosa/swift/cogkit/mbuild.xml:228: Compile failed; >> see the compiler error output for details. 
>> >> Total time: 38 seconds >> >> > > From ketan at mcs.anl.gov Fri Apr 8 22:57:35 2011 From: ketan at mcs.anl.gov (Ketan Maheshwari) Date: Fri, 8 Apr 2011 22:57:35 -0500 Subject: [Swift-devel] exit code 254, manual passive coasters on bionimbus cloud Message-ID: Hi, I am trying the modftdock application on the Bionimbus cloud with the following manual coasters setup: - A Manual coaster service is running on a cloud head node: coaster-service -nosec Local contacts: [http://10.101.13.200:53513, http://10.101.12.200:53513, http://10.101.9.200:53513, http://10.101.11.200:53513, http://10.101.10.200:53513, http://10.101.8.200:53513, http://10.101.7.200:53513, http://10.101.6.200:53513, http://10.101.5.200:53513, http://172.31.0.36:53513, http://131.193.181.210:53513] Started local service: http://131.193.181.210:53513 Started coaster service: http://131.193.181.210:1984 Started coaster service: http://131.193.181.210:1984 - Swift on the same above mentioned cloud head node Here is my commandline: swift -config cf -tc.file tc -sites.file bionimbus-coaster-with-provider-staging.xml ftdock.swift -n=1 -list=pdb.list.1 -grid=10 worker.pl is running on one VM accessible via ssh Swift Provider staging enabled as staging has to be done from a non-shared filesystem. -- I tested a simple catsn example on this setup and it works so the setup seems fine. 
However on running the modftdock I get the following 254 message: Swift svn swift-r4157 cog-r3056 RunID: 20110408-2235-en2fi70c Progress: SwiftScript trace: 3bg0-1 Find: http://localhost:1984 Find: keepalive(120), reconnect - http://localhost:1984 Progress: Active:1 Exception in modftdock: Arguments: [32, -modulo, 0:100, -root, 3bg0-1, -static, input/3bg0-1.pdb, -mobile, input/4TRA.pdb, -calculate_grid, 10, -angle_step, 10, -keep, 10, -noelec] Host: localhost Directory: ftdock-20110408-2235-en2fi70c/jobs/i/modftdock-idyrqf8kTODO: outs ---- Caused by: Job failed with an exit code of 254 Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 254 Final status: Failed:1 The following errors have occurred: 1. Job failed with an exit code of 254 I did some find + grep and ** I found the string "TODO: outs" on my SWIFT_HOME/libexec/vdl-int-staging.k In addition, I noticed that workdirectory was not created in my directory. Attached are the cf, tc, bionimbus...xml files Also attached is the log for this run. Thanks for any clues on this. Regards, Ketan -------------- next part -------------- A non-text attachment was scrubbed... Name: ftdock-20110408-2235-en2fi70c.log Type: application/octet-stream Size: 30330 bytes Desc: not available URL: -------------- next part -------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: bionimbus-coaster-with-provider-staging.xml Type: application/xml Size: 1156 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: cf Type: application/octet-stream Size: 297 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: tc Type: application/octet-stream Size: 128 bytes Desc: not available URL: -------------- next part -------------- From jon.monette at gmail.com Sat Apr 9 11:28:08 2011 From: jon.monette at gmail.com (Jonathan Monette) Date: Sat, 9 Apr 2011 11:28:08 -0500 Subject: [Swift-devel] exit code 254, manual passive coasters on bionimbus cloud In-Reply-To: References: Message-ID: Error 254 means that an app could not execute. Reasons for this are normally tc entry for the app is wrong, the script was not set to executable, staging couldn't be done, or simply the app is not compiled to run on the machine. Does this help in anyway? I am not sure why the TODO: outs is there as I have never seen this line appear in logs or anything. On Apr 8, 2011 10:57 PM, "Ketan Maheshwari" wrote: > Hi, > > I am trying the modftdock application on the Bionimbus cloud with the following manual coasters setup: > > - A Manual coaster service is running on a cloud head node: > > coaster-service -nosec > Local contacts: [http://10.101.13.200:53513, http://10.101.12.200:53513, http://10.101.9.200:53513, http://10.101.11.200:53513, http://10.101.10.200:53513, http://10.101.8.200:53513, http://10.101.7.200:53513, http://10.101.6.200:53513, http://10.101.5.200:53513, http://172.31.0.36:53513, http://131.193.181.210:53513] > Started local service: http://131.193.181.210:53513 > Started coaster service: http://131.193.181.210:1984 > Started coaster service: http://131.193.181.210:1984 > > > - Swift on the same above mentioned cloud head node > Here is my commandline: > > swift -config cf -tc.file tc -sites.file bionimbus-coaster-with-provider-staging.xml ftdock.swift -n=1 -list=pdb.list.1 -grid=10 > > worker.pl is running on one VM accessible via ssh > > Swift Provider staging enabled as staging has to be done from a non-shared filesystem. > > -- I tested a simple catsn example on this setup and it works so the setup seems fine. 
> > However on running the modftdock I get the following 254 message: > > Swift svn swift-r4157 cog-r3056 > > RunID: 20110408-2235-en2fi70c > Progress: > SwiftScript trace: 3bg0-1 > Find: http://localhost:1984 > Find: keepalive(120), reconnect - http://localhost:1984 > Progress: Active:1 > Exception in modftdock: > Arguments: [32, -modulo, 0:100, -root, 3bg0-1, -static, input/3bg0-1.pdb, -mobile, input/4TRA.pdb, -calculate_grid, 10, -angle_step, 10, -keep, 10, -noelec] > Host: localhost > Directory: ftdock-20110408-2235-en2fi70c/jobs/i/modftdock-idyrqf8kTODO: outs > ---- > > Caused by: Job failed with an exit code of 254 > Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 254 > Final status: Failed:1 > The following errors have occurred: > 1. Job failed with an exit code of 254 > > I did some find + grep and > ** I found the string "TODO: outs" on my SWIFT_HOME/libexec/vdl-int-staging.k > > In addition, I noticed that workdirectory was not created in my directory. > > > Attached are the cf, tc, bionimbus...xml files > > Also attached is the log for this run. > > Thanks for any clues on this. > > Regards, > Ketan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bugzilla-daemon at mcs.anl.gov Mon Apr 11 11:57:15 2011 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 11 Apr 2011 11:57:15 -0500 (CDT) Subject: [Swift-devel] [Bug 343] New: Add support for optional input and output files Message-ID: https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=343 Summary: Add support for optional input and output files Product: Swift Version: 1.0 Platform: PC OS/Version: Mac OS Status: NEW Severity: enhancement Priority: P2 Component: SwiftScript language AssignedTo: wozniak at mcs.anl.gov ReportedBy: wilde at mcs.anl.gov A frequent situation is that some applications may not produce all of their declared output files, while other applications don't require all of their input files. This ticket is filed to determine whether this application pattern should be explicitly supported in Swift by a new notation to declare that outputs or inputs are optional. The assumption is that current Swift future-based data dependencies would remain unchanged: optional output files would be considered to "exist" when the program that might produce them completes, even if the file was not in fact created. Thus we'd need to change the wrapper and data transfer code to consider non-existence a valid situation in cases where it was declared so. This will need more discussion, but I am filing this to get the discussion started. This feature was requested by the ParVis project, but many other users have asked how to handle such cases. The answer to date has always been to create wrappers that create or interpret zero-length files to signify non-existing files. -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the reporter.
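The zero-length-file workaround described in this ticket can be sketched roughly as follows. This is only an illustration in Python, not Swift's wrapper code: the names run_with_placeholders and is_missing are hypothetical, and treating an empty file as "not produced" is the assumed convention.

```python
import subprocess
from pathlib import Path


def run_with_placeholders(cmd, declared_outputs):
    """Run cmd, then create an empty file for each declared output it skipped.

    Hypothetical helper (not part of Swift): lets a workflow that expects
    every declared output to exist continue when the application does not
    produce an optional one. Returns the application's exit code.
    """
    result = subprocess.run(cmd)
    if result.returncode == 0:
        for name in declared_outputs:
            path = Path(name)
            if not path.exists():
                path.parent.mkdir(parents=True, exist_ok=True)
                path.touch()  # zero-length file stands in for "not produced"
    return result.returncode


def is_missing(name):
    """Consumer side: an absent or zero-length file counts as a missing input."""
    path = Path(name)
    return not path.exists() or path.stat().st_size == 0
```

A producer would be launched as run_with_placeholders(["myapp", "-o", "out.dat"], ["out.dat", "optional.log"]) (myapp and the file names are placeholders), and a consumer wrapper would call is_missing("optional.log") before using the file. One limitation of this convention is that it cannot tell a legitimately empty output from one that was never produced.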
From jon.monette at gmail.com Mon Apr 11 12:46:48 2011 From: jon.monette at gmail.com (Jonathan Monette) Date: Mon, 11 Apr 2011 12:46:48 -0500 Subject: [Swift-devel] Re: Can you test 0.92 branch for 0.92.1 release? In-Reply-To: References: <417068539.67412.1302034002663.JavaMail.root@zimbra.anl.gov> Message-ID: My PBS error 254 was indeed caused by the app's output file not being created by the script, so the stage-out failed. On Wed, Apr 6, 2011 at 12:31 PM, Jonathan Monette wrote: > Ok. I found the app. It is a wrapper script I have that just makes sure > the app I call returns exit code 0 and not some other exit code. Some > of the apps run and complete but not all of them. I can only assume it is > still returning an error code so I have to track this down. One thing that > should be changed is that when error 254 occurs, it specifies the name of > the app that failed (or job or something). This will at least help track > down why and where. > > > On Tue, Apr 5, 2011 at 3:14 PM, Jonathan Monette wrote: > >> Yes. I will certainly do that. And those are the usual suspects that I >> have seen for error 254, but the app I believe is failing does not have any of >> those properties. I am re-running the script with some changes that >> will hopefully shed more light on where it fails. PADS is in maintenance mode. >> There are several jobs in the queue and it looks like none are even running. >> >> >> On Tue, Apr 5, 2011 at 3:06 PM, Michael Wilde wrote: >> >>> Jon, >>> >>> PBS Error 254 may be something like app in tc.data is not executable, or >>> app script calls something not found or not executable, or that makes it >>> return non-zero.
It falls in that class of error that I just railed about in >>> Bug 321: https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=321 >>> >>> It's not clear to me that the same root problem manifests in exactly the >>> same error codes and messages under various providers and configurations, >>> which is another problem that the fix(es) to Bug 321 should deal with. >>> >>> When you fix your 254, could you report back to swift-devel what it was, >>> and either file it as a new bug or update Bug 321? >>> >>> - Mike >>> >>> >>> ----- Original Message ----- >>> > Correct. Based on how I was looping I was receiving the same cache >>> > error that Allan was receiving. Also, I never thought of this but my >>> > Montage scripts were running very slowly in the trunk at some point (I >>> > am assuming this was the point that the twice-each bug was introduced >>> > and everything was being done twice). Under the 0.92 branch my small >>> > workflows complete. My large workflows error out with PBS error 254 I >>> > believe. Cannot remember the error code but believe it was this one. >>> > But this is not due to the twice-each bug. >>> > >>> > >>> > On Tue, Apr 5, 2011 at 2:50 PM, Michael Wilde < wilde at mcs.anl.gov > >>> > wrote: >>> > >>> > >>> > Just to clarify: we detected this bug by diagnosing the error that >>> > Allan was getting in his SCEC workflow, trying to add a file to a >>> > local cache that was already there. >>> > >>> > I never verified if the same bug was causing failures in Montage, but >>> > Jon reported Apr 4 12:04 AM that the small Montage was working under >>> > the fixed 0.92 branch and that the large Montage run was still to be >>> > tested.
>>> > >>> > - Mike >>> > >>> > >>> > >>> > >>> > ----- Original Message ----- >>> > > got it thanks...to be clear i wasn't going to try to run the whole >>> > > montage scripit :P but this is easier than extracting the faulty >>> > > loop >>> > > :) >>> > > >>> > > >>> > > On Tue, Apr 5, 2011 at 12:37 PM, Jonathan Monette < >>> > > jon.monette at gmail.com > wrote: >>> > > >>> > > >>> > > Yes. That is the one I remember seeing. That is much easier than >>> > > what >>> > > my Montage scripts are doing. >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > On Tue, Apr 5, 2011 at 2:33 PM, Michael Wilde < wilde at mcs.anl.gov > >>> > > wrote: >>> > > >>> > > >>> > > Yes, I had posted variations of the following to the list: >>> > > >>> > > zz3.swift: >>> > > >>> > > int arr[]; >>> > > >>> > > arr[0]=1; >>> > > arr[1]=2; >>> > > >>> > > foreach a in arr { >>> > > trace("for", a); >>> > > } >>> > > >>> > > zz6.swift: >>> > > >>> > > >>> > > int arr[]; >>> > > >>> > > foreach a,i in [0:9] { >>> > > arr[i] = i; >>> > > } >>> > > >>> > > trace("arr",arr); >>> > > >>> > > foreach a,i in arr { >>> > > trace("for", a,i); >>> > > } >>> > > >>> > > >>> > > com$ >>> > > >>> PATH=/home/wilde/swift/src/0.92/cog/modules/swift/dist/swift-svn/bin/:$PATH >>> > > com$ which swift >>> > > ~/swift/src/0.92/cog/modules/swift/dist/swift-svn/bin/swift >>> > > com$ cd swift/lab >>> > > com$ swift zz3.swift >>> > > Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog >>> > > modified >>> > > locally) >>> > > >>> > > RunID: 20110404-1344-j98f22id >>> > > Progress: >>> > > SwiftScript trace: for, 2 >>> > > SwiftScript trace: for, 1 >>> > > Final status: >>> > > com$ PATH=~/swift/rev/swift-0.92/bin:$PATH >>> > > com$ swift zz3.swift >>> > > Swift svn swift-r4157 cog-r3056 >>> > > >>> > > RunID: 20110404-1344-ensm4te8 >>> > > Progress: >>> > > SwiftScript trace: for, 1 >>> > > SwiftScript trace: for, 2 >>> > > SwiftScript trace: for, 2 >>> > > SwiftScript trace: for, 1 >>> > > Final status: 
>>> > > com$ swift zz6.swift >>> > > Swift svn swift-r4157 cog-r3056 >>> > > >>> > > RunID: 20110404-1344-i7y6q1i1 >>> > > Progress: >>> > > SwiftScript trace: arr, arr.$[]/10 >>> > > SwiftScript trace: for, 3, 3 >>> > > SwiftScript trace: for, 2, 2 >>> > > SwiftScript trace: for, 4, 4 >>> > > SwiftScript trace: for, 5, 5 >>> > > SwiftScript trace: for, 3, 3 >>> > > SwiftScript trace: for, 5, 5 >>> > > SwiftScript trace: for, 9, 9 >>> > > SwiftScript trace: for, 4, 4 >>> > > SwiftScript trace: for, 1, 1 >>> > > SwiftScript trace: for, 7, 7 >>> > > SwiftScript trace: for, 7, 7 >>> > > SwiftScript trace: for, 6, 6 >>> > > SwiftScript trace: for, 9, 9 >>> > > SwiftScript trace: for, 6, 6 >>> > > SwiftScript trace: for, 1, 1 >>> > > SwiftScript trace: for, 2, 2 >>> > > SwiftScript trace: for, 0, 0 >>> > > SwiftScript trace: for, 8, 8 >>> > > SwiftScript trace: for, 0, 0 >>> > > SwiftScript trace: for, 8, 8 >>> > > Final status: >>> > > com$ >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > ----- Original Message ----- >>> > > > The script I posted might be too complex to use to replicate the >>> > > > twice >>> > > > each bug. However, didn't Mike post a simple loop script that was >>> > > > looping twice when the bug was initially found? >>> > > > >>> > > > >>> > > > On Tue, Apr 5, 2011 at 2:17 PM, Ketan Maheshwari < >>> > > > ketancmaheshwari at gmail.com > wrote: >>> > > > >>> > > > >>> > > > >>> > > > Sarah, >>> > > > >>> > > > >>> > > > I do not have the test you are asking for yet. I am looking at the >>> > > > test suite and will start on Beagle soon. >>> > > > >>> > > > >>> > > > Ketan >>> > > > >>> > > > >>> > > > >>> > > > >>> > > > >>> > > > >>> > > > >>> > > > On Apr 5, 2011, at 2:13 PM, Sarah Kenny wrote: >>> > > > >>> > > > >>> > > > i'm currently working on a swift script to replicate the bug for >>> > > > .92 >>> > > > which i will then commit to svn in the test suite. 
if you mike, or >>> > > > ketan already have this let me know (i'm trying to hack the script >>> > > > jon >>> > > > posted to the list) and i'll use yours...david said he doesn't >>> > > > have >>> > > > one. >>> > > > >>> > > > as i said, my plan was to test on ranger, abe and a couple of >>> > > > (uci) >>> > > > local workstations. >>> > > > >>> > > > ~sk >>> > > > >>> > > > >>> > > > On Tue, Apr 5, 2011 at 12:10 PM, Michael Wilde < wilde at mcs.anl.gov >>> > > > > >>> > > > wrote: >>> > > > >>> > > > >>> > > > David, Sarah, Ketan, >>> > > > >>> > > > Can you all report back to the devel list on your progress on >>> > > > testing >>> > > > the release? Ie, what systems are you testing, and which of those >>> > > > tests are complete? When will the rest be done, and hence when are >>> > > > we >>> > > > ready to tag and release the fix? >>> > > > >>> > > > I asked who will create the test to confirm that the twice-each >>> > > > bug >>> > > > is >>> > > > fixed, but no one responded. Which of the three of you feel you >>> > > > know >>> > > > how to do this? Is this being tested in your new tests? >>> > > > >>> > > > Ketan tells me that in the 0.92+ interim release I made for Beagle >>> > > > it >>> > > > looks like the resume feature is not working. I was aware that >>> > > > such >>> > > > a >>> > > > bug was reported in trunk, but in the original 0.92 Cray version >>> > > > (under /home/wilde/swift/rev) resume *was* working. Does the test >>> > > > suite test the resume feature at the moment? >>> > > > >>> > > > Lastly, who will tag and upload the new release, remove or change >>> > > > the >>> > > > red warning in the download page, and announce 0.92.1 on >>> > > > swift-user? >>> > > > >>> > > > - Mike >>> > > > >>> > > > >>> > > > >>> > > > >>> > > > >>> > > > ----- Original Message ----- >>> > > > > Thanks, David. Please cc all discussion of this sort to >>> > > > > swift-devel. >>> > > > > >>> > > > > I assume SVN is working for you now? 
(It was working for me, >>> > > > > from >>> > > > > communicadao, around 9AM this morning). >>> > > > > >>> > > > > - Mike >>> > > > > >>> > > > > >>> > > > > ----- Original Message ----- >>> > > > > > It appears that there may be a problem with >>> > > > > > svn.ci.uchicago.edu >>> > > > > > . >>> > > > > > I >>> > > > > > am >>> > > > > > unable to connect from an SVN client or through the web >>> > > > > > interface >>> > > > > > - >>> > > > > > both attempts just hang indefinitely. I have sent an email to >>> > > > > > support >>> > > > > > (ticket 12539), but just wanted to give you guys a heads up >>> > > > > > that >>> > > > > > there >>> > > > > > may be an issue there. I will try to run the tests again in >>> > > > > > the >>> > > > > > morning. >>> > > > > > >>> > > > > > David >>> > > > > > >>> > > > > > >>> > > > > > On Mon, Apr 4, 2011 at 2:42 PM, Michael Wilde < >>> > > > > > wilde at mcs.anl.gov >>> > > > > > > >>> > > > > > wrote: >>> > > > > > >>> > > > > > >>> > > > > > David, Sarah, >>> > > > > > >>> > > > > > How quickly could you re-divide the Swift site test plan >>> > > > > > between >>> > > > > > you >>> > > > > > and confirm back to swift-devel that we are ready to tag and >>> > > > > > release >>> > > > > > the branch as 0.92.1? >>> > > > > > >>> > > > > > Before we do that, you need to add a test to the test suite >>> > > > > > that >>> > > > > > can >>> > > > > > replicate the twice-each bug and verify that its detected in >>> > > > > > 0.92 >>> > > > > > and >>> > > > > > corrected in 0.92.1 >>> > > > > > >>> > > > > > Can you possibly do this by noon tomorrow? >>> > > > > > >>> > > > > > Can you post a checklist of tests with names of who's going to >>> > > > > > run >>> > > > > > them? >>> > > > > > >>> > > > > > Depending on what you can commit to, I will see if I, Ketan, >>> > > > > > and/or >>> > > > > > Justin can help take various sites as well. 
I feel we really >>> > > > > > need >>> > > > > > to >>> > > > > > do this quickly so we have a stable trusted release out there. >>> > > > > > >>> > > > > > >>> > > > > > >>> > > > > > Thanks, >>> > > > > > >>> > > > > > Mike >>> > > > > > >>> > > > > > -- >>> > > > > > Michael Wilde >>> > > > > > Computation Institute, University of Chicago >>> > > > > > Mathematics and Computer Science Division >>> > > > > > Argonne National Laboratory >>> > > > > >>> > > > > -- >>> > > > > Michael Wilde >>> > > > > Computation Institute, University of Chicago >>> > > > > Mathematics and Computer Science Division >>> > > > > Argonne National Laboratory >>> > > > > >>> > > > > _______________________________________________ >>> > > > > Swift-devel mailing list >>> > > > > Swift-devel at ci.uchicago.edu >>> > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> > > > >>> > > > -- >>> > > > >>> > > > >>> > > > >>> > > > Michael Wilde >>> > > > Computation Institute, University of Chicago >>> > > > Mathematics and Computer Science Division >>> > > > Argonne National Laboratory >>> > > > >>> > > > >>> > > > >>> > > > >>> > > > _______________________________________________ >>> > > > Swift-devel mailing list >>> > > > Swift-devel at ci.uchicago.edu >>> > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> > > > >>> > > > >>> > > > >>> > > > >>> > > > -- >>> > > > Any intelligent fool can make things bigger and more complex... It >>> > > > takes a touch of genius - and a lot of courage to move in the >>> > > > opposite >>> > > > direction. 
>>> > > > - Albert Einstein >>> > > > >>> > > > >>> > > > >>> > > > _______________________________________________ >>> > > > Swift-devel mailing list >>> > > > Swift-devel at ci.uchicago.edu >>> > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> > > >>> > > -- >>> > > Michael Wilde >>> > > Computation Institute, University of Chicago >>> > > Mathematics and Computer Science Division >>> > > Argonne National Laboratory >>> > > >>> > > >>> > > >>> > > >>> > > -- >>> > > >>> > > >>> > > >>> > > Any intelligent fool can make things bigger and more complex... It >>> > > takes a touch of genius - and a lot of courage to move in the >>> > > opposite >>> > > direction. >>> > > - Albert Einstein >>> > > >>> > > >>> > > >>> > > _______________________________________________ >>> > > Swift-devel mailing list >>> > > Swift-devel at ci.uchicago.edu >>> > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> > >>> > -- >>> > >>> > >>> > >>> > Michael Wilde >>> > Computation Institute, University of Chicago >>> > Mathematics and Computer Science Division >>> > Argonne National Laboratory >>> > >>> > >>> > >>> > >>> > -- >>> > Any intelligent fool can make things bigger and more complex... It >>> > takes a touch of genius - and a lot of courage to move in the opposite >>> > direction. >>> > - Albert Einstein >>> >>> -- >>> Michael Wilde >>> Computation Institute, University of Chicago >>> Mathematics and Computer Science Division >>> Argonne National Laboratory >>> >>> >> >> >> -- >> Any intelligent fool can make things bigger and more complex... It takes a >> touch of genius - and a lot of courage to move in the opposite direction. >> - Albert Einstein >> >> >> > > > -- > Any intelligent fool can make things bigger and more complex... It takes a > touch of genius - and a lot of courage to move in the opposite direction. > - Albert Einstein > > > -- Any intelligent fool can make things bigger and more complex... 
It takes a touch of genius - and a lot of courage to move in the opposite direction. - Albert Einstein -------------- next part -------------- An HTML attachment was scrubbed... URL: From ketan at mcs.anl.gov Mon Apr 11 12:59:11 2011 From: ketan at mcs.anl.gov (Ketan Maheshwari) Date: Mon, 11 Apr 2011 12:59:11 -0500 Subject: [Swift-devel] exit code 254, manual passive coasters on bionimbus cloud In-Reply-To: References: Message-ID: That does make sense Jon: From my experiments, it seems the provider staging is not able to stage-in the app executable. I came to this conclusion because: 1. The same setup that I am using works for another example where the app is a pre-installed /bin/cat 2. The wrapper log file indeed complains the executable /provided/exe/path does not exist While I look into more detail on this, could someone confirm if you have used provider staging and faced this issue? Ketan On Apr 9, 2011, at 11:28 AM, Jonathan Monette wrote: > Error 254 means that an app could not execute. Reasons for this are normally tc entry for the app is wrong, the script was not set to executable, staging couldn't be done, or simply the app is not compiled to run on the machine. Does this help in anyway? I am not sure why the TODO: outs is there as I have never seen this line appear in logs or anything. 
> > On Apr 8, 2011 10:57 PM, "Ketan Maheshwari" wrote: > > Hi, > > > > I am trying the modftdock application on the Bionimbus cloud with the following manual coasters setup: > > > > - A Manual coaster service is running on a cloud head node: > > > > coaster-service -nosec > > Local contacts: [http://10.101.13.200:53513, http://10.101.12.200:53513, http://10.101.9.200:53513, http://10.101.11.200:53513, http://10.101.10.200:53513, http://10.101.8.200:53513, http://10.101.7.200:53513, http://10.101.6.200:53513, http://10.101.5.200:53513, http://172.31.0.36:53513, http://131.193.181.210:53513] > > Started local service: http://131.193.181.210:53513 > > Started coaster service: http://131.193.181.210:1984 > > Started coaster service: http://131.193.181.210:1984 > > > > > > - Swift on the same above mentioned cloud head node > > Here is my commandline: > > > > swift -config cf -tc.file tc -sites.file bionimbus-coaster-with-provider-staging.xml ftdock.swift -n=1 -list=pdb.list.1 -grid=10 > > > > worker.pl is running on one VM accessible via ssh > > > > Swift Provider staging enabled as staging has to be done from a non-shared filesystem. > > > > -- I tested a simple catsn example on this setup and it works so the setup seems fine. 
> > > > However on running the modftdock I get the following 254 message: > > > > Swift svn swift-r4157 cog-r3056 > > > > RunID: 20110408-2235-en2fi70c > > Progress: > > SwiftScript trace: 3bg0-1 > > Find: http://localhost:1984 > > Find: keepalive(120), reconnect - http://localhost:1984 > > Progress: Active:1 > > Exception in modftdock: > > Arguments: [32, -modulo, 0:100, -root, 3bg0-1, -static, input/3bg0-1.pdb, -mobile, input/4TRA.pdb, -calculate_grid, 10, -angle_step, 10, -keep, 10, -noelec] > > Host: localhost > > Directory: ftdock-20110408-2235-en2fi70c/jobs/i/modftdock-idyrqf8kTODO: outs > > ---- > > > > Caused by: Job failed with an exit code of 254 > > Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 254 > > Final status: Failed:1 > > The following errors have occurred: > > 1. Job failed with an exit code of 254 > > > > I did some find + grep and > > ** I found the string "TODO: outs" on my SWIFT_HOME/libexec/vdl-int-staging.k > > > > In addition, I noticed that workdirectory was not created in my directory. > > > > > > Attached are the cf, tc, bionimbus...xml files > > > > Also attached is the log for this run. > > > > Thanks for any clues on this. > > > > Regards, > > Ketan > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Mon Apr 11 13:59:44 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 11 Apr 2011 11:59:44 -0700 Subject: [Swift-devel] 0.92.1 Message-ID: <1302548384.10462.1.camel@blabla2.none> We probably need to make this release sooner rather than later. 
Mihael From wilde at mcs.anl.gov Mon Apr 11 14:05:06 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 11 Apr 2011 14:05:06 -0500 (CDT) Subject: [Swift-devel] 0.92.1 In-Reply-To: <1302548384.10462.1.camel@blabla2.none> Message-ID: <1138462572.89748.1302548706515.JavaMail.root@zimbra.anl.gov> We were targeting this for last Friday but didn't succeed. And I'm not sure we were using the best approach. Should we tag a release candidate and test that instead of the raw branch? What we said last Wednesday was something like: [ ] Sarah: document what systems you test and how you test them [ ] Ranger [ ] Abe -> will move this to test other TG Systems [ ] Beagle [ ] David: same for CI/Argonne systems [ ] Argonne computer servers [ ] PADS [ ] Fusion [ ] local tests [ ] Justin: make 0.92.1 tag Sarah and David, where do you stand on the above tests? - Mike ----- Original Message ----- > We probably need to make this release sooner rather than later. > > Mihael > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hategan at mcs.anl.gov Mon Apr 11 14:06:33 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 11 Apr 2011 12:06:33 -0700 Subject: [Swift-devel] [Bug 343] New: Add support for optional input and output files In-Reply-To: References: Message-ID: <1302548793.10462.8.camel@blabla2.none> The issue that I see is not so much how to implement optional files, but how to deal with them. Consider the following: app (optional file x) myapp() { blabla; } file x = myapp(); file y = f(x); The question is what that program should do. Though that's probably a simple example: one could simply say that anything dependent on x is also optional (so f would not be invoked).
How about a reduce: app (optional file x) myapp(int i) { blabla i; } file a[]; foreach i in [0:100] { a[i] = myapp(i); } file y = reduce(a); Should reduce be invoked with only the available array elements? Should it not be invoked at all? Should the type system be used to distinguish between the two? Mihael On Mon, 2011-04-11 at 11:57 -0500, bugzilla-daemon at mcs.anl.gov wrote: > https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=343 > > Summary: Add support for optional input and output files > Product: Swift > Version: 1.0 > Platform: PC > OS/Version: Mac OS > Status: NEW > Severity: enhancement > Priority: P2 > Component: SwiftScript language > AssignedTo: wozniak at mcs.anl.gov > ReportedBy: wilde at mcs.anl.gov > > > A frequent situation is that some applications may not produce all of their > declared output files, while other applications don't require all of their input > files. > > This ticket is filed to determine whether this application pattern should be > explicitly supported in Swift by a new notation to declare that outputs or > inputs are optional. > > The assumption is that current Swift future-based data dependencies would > remain unchanged: optional output files would be considered to "exist" when the > program that might produce them completes, even if the file was not in fact > created. Thus we'd need to change the wrapper and data transfer code to > consider non-existence a valid situation in cases where it was declared so. > > This will need more discussion, but I am filing this to get the discussion > started. > > This feature was requested by the ParVis project, but many other users have > asked how to handle such cases. The answer to date has always been to create > wrappers that create or interpret zero-length files to signify non-existing > files.
> From hategan at mcs.anl.gov Mon Apr 11 14:09:05 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 11 Apr 2011 12:09:05 -0700 Subject: [Swift-devel] 0.92.1 In-Reply-To: <1138462572.89748.1302548706515.JavaMail.root@zimbra.anl.gov> References: <1138462572.89748.1302548706515.JavaMail.root@zimbra.anl.gov> Message-ID: <1302548945.10462.11.camel@blabla2.none> On Mon, 2011-04-11 at 14:05 -0500, Michael Wilde wrote: > We were targeting this for last Friday but didnt succeed. And Im not sure we were using the best approach. > > Should we tag a release candidate and test that instead of the raw branch? We can do that. Didn't we have some policy in place? Mihael > > What we said last Wednesday was something like: > > [ ] Sarah: document what systems you test and how you test them > [ ] Ranger > [ ] Abe -> will move this to test other TG Systems > [ ] Beagle > [ ] David: same for CI/Argonne systems > [ ] Argonne computer servers > [ ] PADS > [ ] Fusion > [ ] local tests > [ ] Justin: make 0.92.1 tag > > Sarah and David, where do you stand on the above tests? > > - Mike > > > > > > > > ----- Original Message ----- > > We probably need to make this release sooner rather than later. > > > > Mihael > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From skenny at uchicago.edu Mon Apr 11 14:11:26 2011 From: skenny at uchicago.edu (Sarah Kenny) Date: Mon, 11 Apr 2011 12:11:26 -0700 Subject: [Swift-devel] 0.92.1 In-Reply-To: <1138462572.89748.1302548706515.JavaMail.root@zimbra.anl.gov> References: <1302548384.10462.1.camel@blabla2.none> <1138462572.89748.1302548706515.JavaMail.root@zimbra.anl.gov> Message-ID: On Mon, Apr 11, 2011 at 12:05 PM, Michael Wilde wrote: > We were targeting this for last Friday but didnt succeed. And Im not sure > we were using the best approach. 
> > Should we tag a release candidate and test that instead of the raw branch? > > What we said last Wednesday was something like: > > [ ] Sarah: document what systems you test and how you test them > [ ] Ranger > [ ] Abe -> will move this to test other TG Systems > [ ] Beagle > my understanding is that running on beagle requires a fix that justin committed to trunk but it is not yet available in 0.92.1 (?) justin can you confirm? should i try testing with 0.92.1 again? > [ ] David: same for CI/Argonne systems > [ ] Argonne computer servers > [ ] PADS > [ ] Fusion > [ ] local tests > [ ] Justin: make 0.92.1 tag > > Sarah and David, where do you stand on the above tests? > > - Mike > > > > > > > > ----- Original Message ----- > > We probably need to make this release sooner rather than later. > > > > Mihael > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From wilde at mcs.anl.gov Mon Apr 11 14:20:12 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 11 Apr 2011 14:20:12 -0500 (CDT) Subject: [Swift-devel] 0.92.1 In-Reply-To: <1302548945.10462.11.camel@blabla2.none> Message-ID: <1101811107.89877.1302549612897.JavaMail.root@zimbra.anl.gov> To date we have not been tagging releases as far as I know. E.g., when Ben released 0.9, I think he made a tarball from trunk (see below). I don't see any tags that reflect Swift releases, e.g. 0.8, 0.9, under the svn tags/ dir.
So unless I'm looking in the wrong place, we have not tagged to date, nor have we branched until recently.

I think that we want to branch each release (for fixes) as we discussed, as well as tagging each release candidate and the final release.

Discussion?

- Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From jon.monette at gmail.com Mon Apr 11 14:20:32 2011
From: jon.monette at gmail.com (Jonathan Monette)
Date: Mon, 11 Apr 2011 14:20:32 -0500
Subject: [Swift-devel] [Bug 343] New: Add support for optional input and output files
Message-ID:

Well the case I have in my scripts would be to only run the reduce on the available elements in the array. I am not sure why the other case would be valid.
Not doing the reduce on the array because an outfile was not mapped is the same as what swift currently does. The only difference is that instead of causing the swift system to fail, it just tries to continue the execution.

On Apr 11, 2011 2:06 PM, "Mihael Hategan" wrote:

From wilde at mcs.anl.gov Mon Apr 11 14:30:47 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 11 Apr 2011 14:30:47 -0500 (CDT)
Subject: [Swift-devel] 0.92.1
In-Reply-To:
Message-ID: <2129470799.89932.1302550247599.JavaMail.root@zimbra.anl.gov>

> my understanding is that running on beagle requires a fix that justin
> committed to trunk but it is not yet available in .092.1 (?) justin
> can you confirm? should i try testing with .92.1 again?

Yes, that's right. Is the Beagle test waiting for me to commit my changes to the 0.92 branch? Justin, what did we decide on this? (You retrofit the trunk change for Cray support or I commit my 0.92-based change?)

I'm sorry if I'm holding this part up.
Sarah, did you re-test ranger on the latest 0.92?

David, what is the status of your tests? Should (and could) Sarah complete these if you have been unable to?

- Mike

From wozniak at mcs.anl.gov Mon Apr 11 14:34:03 2011
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Mon, 11 Apr 2011 14:34:03 -0500 (Central Daylight Time)
Subject: [Swift-devel] 0.92.1
In-Reply-To:
References: <1302548384.10462.1.camel@blabla2.none> <1138462572.89748.1302548706515.JavaMail.root@zimbra.anl.gov>
Message-ID:

That's right, the aprun feature is not part of the 0.92 release plan. Let's get 0.92.1 tested as is and move forward to 0.93...

Justin

On Mon, 11 Apr 2011, Sarah Kenny wrote:

> On Mon, Apr 11, 2011 at 12:05 PM, Michael Wilde wrote:
>
>> We were targeting this for last Friday but didnt succeed. And Im not sure
>> we were using the best approach.
>>
>> Should we tag a release candidate and test that instead of the raw branch?
>> >> What we said last Wednesday was something like: >> >> [ ] Sarah: document what systems you test and how you test them >> [ ] Ranger >> [ ] Abe -> will move this to test other TG Systems >> [ ] Beagle >> > > my understanding is that running on beagle requires a fix that justin > committed to trunk but it is not yet available in .092.1 (?) justin can you > confirm? should i try testing with .92.1 again? > > > >> [ ] David: same for CI/Argonne systems >> [ ] Argonne computer servers >> [ ] PADS >> [ ] Fusion >> [ ] local tests >> [ ] Justin: make 0.92.1 tag >> >> Sarah and David, where do you stand on the above tests? >> >> - Mike >> >> >> >> >> >> >> >> ----- Original Message ----- >>> We probably need to make this release sooner rather than later. >>> >>> Mihael >>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> >> -- >> Michael Wilde >> Computation Institute, University of Chicago >> Mathematics and Computer Science Division >> Argonne National Laboratory >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > -- Justin M Wozniak From wozniak at mcs.anl.gov Mon Apr 11 14:41:01 2011 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Mon, 11 Apr 2011 14:41:01 -0500 (Central Daylight Time) Subject: [Swift-devel] 0.92.1 In-Reply-To: <2129470799.89932.1302550247599.JavaMail.root@zimbra.anl.gov> References: <2129470799.89932.1302550247599.JavaMail.root@zimbra.anl.gov> Message-ID: On Mon, 11 Apr 2011, Michael Wilde wrote: >> my understanding is that running on beagle requires a fix that justin >> committed to trunk but it is not yet available in .092.1 (?) justin >> can you confirm? should i try testing with .92.1 again? > > Yes, thats right. 
Is the Beagle test waiting for me to commit my changes
> to the 0.92 branch. Justin, what did we decide on this? (You retrofit
> the trunk change for Cray support or I commit my 0.92-based change?)

I may have said something about this in passing, but I really think we should go by the list on the swift-devel page and move forward to 0.93; release-0.92 was branched for testing in early January.

> Im sorry if I'm holding this part up.
> Sarah, did you re-test ranger on the latest 0.92?
>
> David, what is the status of your tests? Should (and could) Sarah
> complete these if you have been unable to?
>
> - Mike

--
Justin M Wozniak

From skenny at uchicago.edu Mon Apr 11 15:08:21 2011
From: skenny at uchicago.edu (Sarah Kenny)
Date: Mon, 11 Apr 2011 13:08:21 -0700
Subject: [Swift-devel] 0.92.1
In-Reply-To:
References: <2129470799.89932.1302550247599.JavaMail.root@zimbra.anl.gov>
Message-ID:

On Mon, Apr 11, 2011 at 12:41 PM, Justin M Wozniak wrote:

> On Mon, 11 Apr 2011, Michael Wilde wrote:
>
>>> my understanding is that running on beagle requires a fix that justin
>>> committed to trunk but it is not yet available in .092.1 (?) justin
>>> can you confirm? should i try testing with .92.1 again?
>>>
>>
>> Yes, thats right. Is the Beagle test waiting for me to commit my changes
>> to the 0.92 branch. Justin, what did we decide on this? (You retrofit the
>> trunk change for Cray support or I commit my 0.92-based change?)
>>
>
> I may have said something about this in passing, but I really think we
> should go by the list on the swift-devel page and move forward to 0.93;
> release-0.92 was branched for testing in early January.
>
>> Im sorry if I'm holding this part up.
>> Sarah, did you re-test ranger on the latest 0.92?
>> > ranger has been tested on .92.1 (and a test specific to the bug fix has been added to the test suite for sge provider). > >> David, what is the status of your tests? Should (and could) Sarah complete >> these if you have been unable to? >> >> - Mike >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> >> > -- > Justin M Wozniak > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dk0966 at cs.ship.edu Mon Apr 11 15:22:20 2011 From: dk0966 at cs.ship.edu (David Kelly) Date: Mon, 11 Apr 2011 16:22:20 -0400 Subject: [Swift-devel] 0.92.1 In-Reply-To: <1138462572.89748.1302548706515.JavaMail.root@zimbra.anl.gov> References: <1302548384.10462.1.camel@blabla2.none> <1138462572.89748.1302548706515.JavaMail.root@zimbra.anl.gov> Message-ID: It has been tested on PADS, Argonne systems and locally. I am struggling to get things running on Fusion - I have been getting into a situation where the job will just sit there for days and never execute. I will send an email to their tech support to get some more detail. In the meantime, if anyone wants to take a stab at getting it running on Fusion that would be helpful. There is a test script in tests/providers/local-sge-coasters/002-duplicate-submission-local-sge-coasters.swift. Thanks, David On Mon, Apr 11, 2011 at 3:05 PM, Michael Wilde wrote: > We were targeting this for last Friday but didnt succeed. And Im not sure > we were using the best approach. > > Should we tag a release candidate and test that instead of the raw branch? 
>
> What we said last Wednesday was something like:
>
> [ ] Sarah: document what systems you test and how you test them
>     [ ] Ranger
>     [ ] Abe -> will move this to test other TG Systems
>     [ ] Beagle
> [ ] David: same for CI/Argonne systems
>     [ ] Argonne computer servers
>     [ ] PADS
>     [ ] Fusion
>     [ ] local tests
> [ ] Justin: make 0.92.1 tag
>
> Sarah and David, where do you stand on the above tests?
>
> - Mike
>
> ----- Original Message -----
> > We probably need to make this release sooner rather than later.
> >
> > Mihael
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory

From wilde at mcs.anl.gov Mon Apr 11 15:31:33 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 11 Apr 2011 15:31:33 -0500 (CDT)
Subject: [Swift-devel] 0.92.1
In-Reply-To:
Message-ID: <362215613.90356.1302553893797.JavaMail.root@zimbra.anl.gov>

David, that's great to hear about PADS, Argonne servers, and local. Did you announce that, and I missed it? That's what I've been waiting to hear. My apologies if I missed an email.

I think we should release without the Fusion tests. Please cc swift-devel on any support tickets you file so we can help stay on top of them.

One possibility is that you're using job parameters whose requirements will never be met on Fusion. You should post the PBS .submit file from ~/.globus/scripts so we (and Fusion support) can assess why jobs are not running.

Thanks,

Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From dk0966 at cs.ship.edu Mon Apr 11 15:51:23 2011 From: dk0966 at cs.ship.edu (David Kelly) Date: Mon, 11 Apr 2011 16:51:23 -0400 Subject: [Swift-devel] 0.92.1 In-Reply-To: <362215613.90356.1302553893797.JavaMail.root@zimbra.anl.gov> References: <362215613.90356.1302553893797.JavaMail.root@zimbra.anl.gov> Message-ID: I don't think I announced it - but I should have sent a status update on the other tests while I was hammering out the issues with Fusion. Sorry about that. David On Mon, Apr 11, 2011 at 4:31 PM, Michael Wilde wrote: > David, thats great to hear about PADS, Argonne servers, and local. Did you > announce that, and I missed it? Thats what Ive been waiting to hear. My > apologies if I missed an email. > > I think we should release without the Fusion tests. Please swift-devel on > any support tickets you file so we can help stay on top of them. > > One possibility is that you're using parameters that will never be met on > Fusion. You should post the PBS .submit file from ~/.globus/scripts so we > (and Fusion support) can assess why jobs are not running. > > Thanks, > > Mike > > ------------------------------ > > It has been tested on PADS, Argonne systems and locally. I am struggling to > get things running on Fusion - I have been getting into a situation where > the job will just sit there for days and never execute. I will send an email > to their tech support to get some more detail. In the meantime, if anyone > wants to take a stab at getting it running on Fusion that would be helpful. > There is a test script in > tests/providers/local-sge-coasters/002-duplicate-submission-local-sge-coasters.swift. > > Thanks, > David > > On Mon, Apr 11, 2011 at 3:05 PM, Michael Wilde wrote: > >> We were targeting this for last Friday but didnt succeed. And Im not sure >> we were using the best approach. >> >> Should we tag a release candidate and test that instead of the raw branch? 
>> >> What we said last Wednesday was something like: >> >> [ ] Sarah: document what systems you test and how you test them >> [ ] Ranger >> [ ] Abe -> will move this to test other TG Systems >> [ ] Beagle >> [ ] David: same for CI/Argonne systems >> [ ] Argonne computer servers >> [ ] PADS >> [ ] Fusion >> [ ] local tests >> [ ] Justin: make 0.92.1 tag >> >> Sarah and David, where do you stand on the above tests? >> >> - Mike >> >> >> >> >> >> >> >> ----- Original Message ----- >> > We probably need to make this release sooner rather than later. >> > >> > Mihael >> > >> > _______________________________________________ >> > Swift-devel mailing list >> > Swift-devel at ci.uchicago.edu >> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> >> -- >> Michael Wilde >> Computation Institute, University of Chicago >> Mathematics and Computer Science Division >> Argonne National Laboratory >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > > > > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at mcs.anl.gov Mon Apr 11 16:12:36 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 11 Apr 2011 16:12:36 -0500 (CDT) Subject: [Swift-devel] 0.92.1 In-Reply-To: Message-ID: <1292163741.90695.1302556355991.JavaMail.root@zimbra.anl.gov> Cool. For Fusion, if you add the line " debug=true" to the end of the file etc/provider-pbs.properties, Swift will save the .submit file for all PBS jobs in ~/.globus/scripts. That file gets overwritten when you build from source; in that case you should make the change in your source tree in the source version of that file, under cog/modules/provider-localscheduler/etc. 
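The properties change described above can be sketched as a small script. This is only an illustration in Python: the file name etc/provider-pbs.properties and the debug=true line come from the message, while the helper name enable_debug and its "skip if already set" check are assumptions made for the example and are not part of Swift.

```python
from pathlib import Path


def enable_debug(properties_path):
    """Append 'debug=true' to a properties file unless it is already set.

    Hypothetical helper for illustration only; not part of Swift.
    Returns True if the file was modified, False otherwise.
    """
    path = Path(properties_path)
    text = path.read_text() if path.exists() else ""

    # Treat the property as present only if an uncommented line sets it.
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue
        key, sep, value = stripped.partition("=")
        if sep and key.strip() == "debug" and value.strip() == "true":
            return False

    # Make sure the new property starts on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + "debug=true\n")
    return True
```

Run from the Swift installation directory, a call such as enable_debug("etc/provider-pbs.properties") would make the edit once; as noted above, a rebuild from source overwrites that file, so the same change would be needed in the source tree copy.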
- Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From jon.monette at gmail.com Mon Apr 11 19:48:29 2011
From: jon.monette at gmail.com (Jonathan Monette)
Date: Mon, 11 Apr 2011 19:48:29 -0500
Subject: [Swift-devel] error in 0.92 source
Message-ID:

Does anyone know what this error is trying to tell me?

Caused by: Cannot run program "/bin/bash" (in directory "/gpfs/pads/swift/jonmon/Swift/work/localhost/rectified-20110411-1923-pqygyay7"): java.io.IOException: error=7, Argument list too long
Caused by: java.io.IOException: Cannot run program "/bin/bash" (in directory "/gpfs/pads/swift/jonmon/Swift/work/localhost/rectified-20110411-1923-pqygyay7"): java.io.IOException: error=7, Argument list too long

--
Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction.
- Albert Einstein
From wilde at mcs.anl.gov Tue Apr 12 06:10:19 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 12 Apr 2011 06:10:19 -0500 (CDT)
Subject: [Swift-devel] error in 0.92 source
In-Reply-To:
Message-ID: <1598406123.91774.1302606619723.JavaMail.root@zimbra.anl.gov>

Jon, the error is "Argument list too long" (E2BIG) - most likely because you are trying to form an argument list to an app that is longer than the Linux limit (which is host-specific and settable during Linux configuration, *I think*).

As I was writing this response, I realized you already dealt with this issue in a thread on swift-user in Oct 2010 (subject "Argument list too long"). Is your current problem some new manifestation of the prior argument passing problem?

I also realized that I don't understand the interaction between property "wrapper.parameter.mode" and the need to use writeData() to form argument lists, and that this needs to be explained in a Cookbook -> User Guide section.

So before I go deeper into this, can you explain whether the current problem is different than the prior one?

Thanks,

- Mike

----

This page has some useful info on the limit:
http://www.in-ulm.de/~mascheck/various/argmax/

On communicado the limit seems to be some percentage of the stack size, which defaults to 10M. The article above says that Linux 2.6.23 and above use 0.25 * stack size.

Where are you running, and what is the likely command line length of your largest app invocations? (Keep in mind that pathnames may be longer than you expect and that Swift is running _swiftwrap, which also has specific arg passing conventions.)

This is often an issue when you call a summarization app and try to pass it all the files processed in some large prior foreach loop.

Ben added some support for passing much longer argument lists in a file via the function writeData, and there is also this property:

# Controls how swift will supply parameters to the remote wrapper script.
# 'args' mode will pass parameters on the command line
# 'files' mode will pass parameters through an additional input file
#
# valid values: args, files
# Default: files
#
# wrapper.parameter.mode=args

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From dsk at ci.uchicago.edu Tue Apr 12 07:13:23 2011
From: dsk at ci.uchicago.edu (Daniel S. Katz)
Date: Tue, 12 Apr 2011 07:13:23 -0500
Subject: [Swift-devel] [Bug 343] New: Add support for optional input and output files
In-Reply-To:
References:
Message-ID:

Basically, the functionality we want to replicate is an app that runs on all files in a directory, filling a new directory with output files. Then we run another app on each of those output files, more or less. We don't always know that the first app will generate an output file in each case, though it does >90% of the time.

Dan

On Apr 11, 2011, at 14:20, Jonathan Monette wrote:

> Well the case I have in my scripts would be to only run the reduce on the available elements in the array. I am not sure why the other case would be valid.
Not doing the reduce on the array because an outfile was not mapped is the same as what swift currently does. The only difference is that instead of causing the swift system to fail, it just tries to continue the execution.
>
> On Apr 11, 2011 2:06 PM, "Mihael Hategan" wrote:

From jon.monette at gmail.com Tue Apr 12 10:03:51 2011
From: jon.monette at gmail.com (Jonathan Monette)
Date: Tue, 12 Apr 2011 10:03:51 -0500
Subject: [Swift-devel] error in 0.92 source
In-Reply-To: <1598406123.91774.1302606619723.JavaMail.root@zimbra.anl.gov>
References: <1598406123.91774.1302606619723.JavaMail.root@zimbra.anl.gov>
Message-ID:

No. It was the same. As I was digging deeper into the problem I started remembering that all this seemed familiar. I checked my properties file and found that I had accidentally deleted the wrapper.parameter.mode line. I wanted to confirm this was the problem before I reported back, and it just took a while before the workflow got to that section. That error makes sense once I remembered what was wrong. This might be another place where a better error message (not just the one Linux reported back) would probably have helped shed light on what is going on, imho.

On Tue, Apr 12, 2011 at 6:10 AM, Michael Wilde wrote:

> Jon, the error is "Argument list too long" (E2BIG) - most likely because
> you are trying to form an argument list to an app that is longer than the
> Linux limit (which is host-specific and settable during Linux configuration,
> *I think*).
>
> As I was writing this response, I realized you already dealt with this
> issue in a thread on swift-user in Oct 2010 (subject "Argument list too
> long"). Is your current problem some new manifestation of the prior argument
> passing problem?
> > I also realized that I dont understand the interaction between property > "wrapper.parameter.mode" and the need to use writeData() to form argument > lists, and that this needs to be explained in a Cookbook -> User Guide > section. > > So before I go deeper into this, can you explain if the current problem is > different than the prior one? > > Thanks, > > - Mike > > ---- > > > This page has some useful info on the limit: > http://www.in-ulm.de/~mascheck/various/argmax/ > > On communicado the limit seems to be some percentage of the stack size, > which defaults to 10M. The article above says that Linux 2.26 and above use > .25 * stack size. > > Where are you running, and what is the likely command line length of your > largest app invocations? (Keep in mind that pathnames may be longer than you > expect and that Swift is running _swiftwrap which also has specific arg > passing conventions). > > This is often an issue when you call a summarization app, and try to pass > it all the files processed in some large prior foreach loop. > > Ben added some support for passing much longer argument lists in a file via > the function writeData, and there is also this property: > > # Controls how swift will supply parameters to the remote wrapper script. > # 'args' mode will pass parameters on the command line > # 'files' mode will pass parameters through an additional input file > # > # valid values: args, files > # Default: files > # > # wrapper.parameter.mode=args > > > > > ----- Original Message ----- > > Does anyone know what is this error trying to tell me? 
> > > > > > Caused by: Cannot run program "/bin/bash" (in directory > > > "/gpfs/pads/swift/jonmon/Swift/work/localhost/rectified-20110411-1923-pqygyay7"): > > java.io.IOException: error=7, Argument list too long > > Caused by: java.io.IOException: Cannot run program "/bin/bash" (in > > directory > > > "/gpfs/pads/swift/jonmon/Swift/work/localhost/rectified-20110411-1923-pqygyay7"): > > java.io.IOException: error=7, Argument list too long > > -- > > Any intelligent fool can make things bigger and more complex... It > > takes a touch of genius - and a lot of courage to move in the opposite > > direction. > > - Albert Einstein > > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > -- Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction. - Albert Einstein -------------- next part -------------- An HTML attachment was scrubbed... URL: From bugzilla-daemon at mcs.anl.gov Tue Apr 12 10:41:00 2011 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Tue, 12 Apr 2011 10:41:00 -0500 (CDT) Subject: [Swift-devel] [Bug 345] New: Print a clear error message when the argument list of an app is too long Message-ID: https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=345 Summary: Print a clear error message when the argument list of an app is too long Product: Swift Version: 0.92 Platform: All OS/Version: All Status: NEW Severity: normal Priority: P2 Component: SwiftScript language AssignedTo: skenny at uchicago.edu ReportedBy: wilde at mcs.anl.gov This problem is illustrated in the following email thread: No. It was the same. 
As I was digging deeper into the problem I started remembering that all this seemed familiar. I checked my properties file and found that I had accidentally deleted the wrapper.parameter.mode line. I wanted to confirm this was the problem before I reported back, and it just took a while before the workflow got to that section. That error makes sense once I remembered what was wrong. This might be another place where a better error message (not just the one Linux reported back) would probably have helped shed light on what is going on, imho.

On Tue, Apr 12, 2011 at 6:10 AM, Michael Wilde wrote:

Jon, the error is "Argument list too long" (E2BIG) - most likely because you are trying to form an argument list to an app that is longer than the Linux limit (which is host-specific and settable during Linux configuration, *I think*).

As I was writing this response, I realized you already dealt with this issue in a thread on swift-user in Oct 2010 (subject "Argument list too long"). Is your current problem some new manifestation of the prior argument passing problem?

I also realized that I don't understand the interaction between property "wrapper.parameter.mode" and the need to use writeData() to form argument lists, and that this needs to be explained in a Cookbook -> User Guide section.

So before I go deeper into this, can you explain whether the current problem is different than the prior one?

Thanks,

- Mike

----

This page has some useful info on the limit:
http://www.in-ulm.de/~mascheck/various/argmax/

On communicado the limit seems to be some percentage of the stack size, which defaults to 10M. The article above says that Linux 2.6.23 and above use 0.25 * stack size.

Where are you running, and what is the likely command line length of your largest app invocations? (Keep in mind that pathnames may be longer than you expect and that Swift is running _swiftwrap, which also has specific arg passing conventions.)
This is often an issue when you call a summarization app, and try to pass it all the files processed in some large prior foreach loop. Ben added some support for passing much longer argument lists in a file via the function writeData, and there is also this property:

# Controls how swift will supply parameters to the remote wrapper script.
# 'args' mode will pass parameters on the command line
# 'files' mode will pass parameters through an additional input file
#
# valid values: args, files
# Default: files
#
# wrapper.parameter.mode=args

----- Original Message -----
> Does anyone know what is this error trying to tell me?
>
>
> Caused by: Cannot run program "/bin/bash" (in directory
> "/gpfs/pads/swift/jonmon/Swift/work/localhost/rectified-20110411-1923-pqygyay7"):
> java.io.IOException: error=7, Argument list too long
> Caused by: java.io.IOException: Cannot run program "/bin/bash" (in
> directory
> "/gpfs/pads/swift/jonmon/Swift/work/localhost/rectified-20110411-1923-pqygyay7"):
> java.io.IOException: error=7, Argument list too long
> --
> Any intelligent fool can make things bigger and more complex... It
> takes a touch of genius - and a lot of courage to move in the opposite
> direction.
> - Albert Einstein
>
>
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

--
Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction.
- Albert Einstein

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the reporter.
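[Editorial sketch, not part of the original thread.] The E2BIG failure discussed in Bug 345 could in principle be detected before the app is launched. The following is a minimal illustration in Python, not Swift's actual implementation: it estimates the size of a command line, compares it with the system limit, and falls back to passing parameters through a file, which is the idea behind wrapper.parameter.mode=files. The function names, the 50% safety margin, the fallback default, and the one-argument-per-line file layout are all assumptions made for this example.

```python
import os
import tempfile


def arg_max(default=131072):
    """Return the system limit on argument + environment size in bytes.

    Falls back to `default` (an assumed conservative value) on platforms
    where SC_ARG_MAX cannot be queried.
    """
    try:
        limit = os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        return default
    return limit if limit > 0 else default


def command_line_size(argv, env=None):
    """Estimate the bytes needed for argv and the environment.

    Each argument costs its encoded length plus a NUL terminator; each
    environment entry costs "KEY=VALUE" plus a NUL terminator.
    """
    env = os.environ if env is None else env
    size = sum(len(os.fsencode(a)) + 1 for a in argv)
    size += sum(len(os.fsencode(k)) + len(os.fsencode(v)) + 2
                for k, v in env.items())
    return size


def choose_parameter_mode(argv, env=None, limit=None, margin=0.5):
    """Return "args" if argv fits within margin * limit, else "files"."""
    limit = arg_max() if limit is None else limit
    if command_line_size(argv, env) <= limit * margin:
        return "args"
    return "files"


def write_parameter_file(argv, directory=None):
    """Write one argument per line to a new file and return its path."""
    with tempfile.NamedTemporaryFile("w", suffix=".params", dir=directory,
                                     delete=False) as handle:
        for arg in argv:
            handle.write(arg + "\n")
        return handle.name
```

A caller would check choose_parameter_mode(argv) first and, when it returns "files", hand the wrapper the path from write_parameter_file(argv) instead of the arguments themselves. A check of this kind is also the natural place to print the clearer error message the bug report asks for.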
From bugzilla-daemon at mcs.anl.gov Tue Apr 12 11:15:55 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Tue, 12 Apr 2011 11:15:55 -0500 (CDT)
Subject: [Swift-devel] [Bug 347] New: Java exception raised for invalid -cdm.file argument
Message-ID:

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=347

Summary: Java exception raised for invalid -cdm.file argument
Product: Swift
Version: 0.92
Platform: PC
OS/Version: Mac OS
Status: ASSIGNED
Severity: normal
Priority: P2
Component: SwiftScript language
AssignedTo: skenny at uchicago.edu
ReportedBy: wilde at mcs.anl.gov

When no filename is specified after the -cdm.file argument on the swift command, a Java exception is raised:

com$ rm -rf outdir; swift -cdm.file -config cf -tc.file tc -sites.file local.xml catsnfp.swift -n=1
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.substring(String.java:1937)
    at org.griphyn.vdl.karajan.Loader.projectName(Loader.java:528)
    at org.griphyn.vdl.karajan.Loader.main(Loader.java:120)
com$

--
A clear error message should be printed instead, from command line arg parsing.

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the reporter.

From wozniak at mcs.anl.gov Tue Apr 12 13:49:41 2011
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Tue, 12 Apr 2011 13:49:41 -0500 (Central Daylight Time)
Subject: [Swift-devel] 0.92.1
In-Reply-To:
References: <2129470799.89932.1302550247599.JavaMail.root@zimbra.anl.gov>
Message-ID:

On Mon, 11 Apr 2011, Sarah Kenny wrote:

> On Mon, Apr 11, 2011 at 12:41 PM, Justin M Wozniak wrote:
>
>> On Mon, 11 Apr 2011, Michael Wilde wrote:
>>
>> my understanding is that running on beagle requires a fix that justin
>>>> committed to trunk but it is not yet available in .092.1 (?) justin
>>>> can you confirm? should i try testing with .92.1 again?
>>>> >>> >>> Yes, thats right. Is the Beagle test waiting for me to commit my changes >>> to the 0.92 branch. Justin, what did we decide on this? (You retrofit the >>> trunk change for Cray support or I commit my 0.92-based change?) >>> >> >> I made have said something about this in passing but I really think we >> should go by the list on the swift-devel page and move forward to 0.93- >> release-0.92 was branched for testing in early January. >> >> Im sorry if I'm holding this part up. >>> Sarah, did you re-test ranger on the latest 0.92? >>> >> > ranger has been tested on .92.1 (and a test specific to the bug fix has been > added to the test suite for sge provider). Ok, I was just able to successfully test the release-0.92 branch on Beagle with Mike's local modifications. Users on Fusion can use my trunk-based installation as the Parvis group does. I think we can package 0.92.1 at this point. Justin -- Justin M Wozniak From bugzilla-daemon at mcs.anl.gov Tue Apr 12 13:51:10 2011 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Tue, 12 Apr 2011 13:51:10 -0500 (CDT) Subject: [Swift-devel] [Bug 347] Java exception raised for invalid -cdm.file argument In-Reply-To: References: Message-ID: <20110412185110.BA9242D4E7@wind.mcs.anl.gov> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=347 Justin Wozniak changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |wozniak at mcs.anl.gov AssignedTo|skenny at uchicago.edu |wozniak at mcs.anl.gov --- Comment #1 from Justin Wozniak 2011-04-12 13:51:10 --- I'll take this one. -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the reporter. 
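[Editorial sketch, not part of the original thread.] Bug 347 asks for a clear message instead of a StringIndexOutOfBoundsException when -cdm.file is given without a file name. Swift's loader is written in Java and its parsing code is not shown in this thread, so the following Python sketch only illustrates the kind of check that produces such a message. The names parse_options and UsageError, and the rule that a value may not start with "-", are assumptions made for this example.

```python
class UsageError(Exception):
    """Raised when the command line is malformed."""


def parse_options(argv, value_options):
    """Split argv into ({option: value}, [positional arguments]).

    Every option named in `value_options` must be followed by a value
    that does not itself look like an option.
    """
    options, positional = {}, []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in value_options:
            has_value = (i + 1 < len(argv)
                         and not argv[i + 1].startswith("-"))
            if not has_value:
                raise UsageError(
                    f"option {arg} requires a file name argument")
            options[arg] = argv[i + 1]
            i += 2
        else:
            # Script name and script arguments such as -n=1 pass through.
            positional.append(arg)
            i += 1
    return options, positional
```

With the command line from the bug report, the check fails on -cdm.file because the next token is -config, so the user sees a usage error rather than a stack trace. The trade-off of this rule is that a file whose name really begins with "-" would be rejected.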
From bugzilla-daemon at mcs.anl.gov Tue Apr 12 13:57:40 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Tue, 12 Apr 2011 13:57:40 -0500 (CDT)
Subject: [Swift-devel] [Bug 229] Swift log should capture additional environmental information
In-Reply-To:
References:
Message-ID: <20110412185740.7CAEA2D528@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=229

Justin Wozniak changed:

What    |Removed     |Added
----------------------------------------------------------------------------
Version |unspecified |0.93

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the reporter.

From hategan at mcs.anl.gov Tue Apr 12 14:09:12 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 12 Apr 2011 12:09:12 -0700
Subject: [Swift-devel] [Bug 343] New: Add support for optional input and output files
In-Reply-To:
References:
Message-ID: <1302635352.24772.27.camel@blabla2.none>

I'm assuming we won't statically track optional data. In a static typing scenario, all optional data would need to be declared as such. This would be similar to the Maybe type in Haskell. I'm assuming we don't want to do that.

Instead, optional types would be dynamic types. This would allow one to use an app defined with non-optional types with optional data. The typing rules would go something like this:

1. f: X -> Y
   type(f(Nothing/X)) = Nothing/Y
   type(f(Just X)) = Just Y

2. For a composite type Y = X1 x X2 x ... x Xn, type(Y) = Nothing if any Xi = Nothing, type(Y) = Just Y if all Xi = Just Xi.

3. Corollary of 1 and 2 is that f: X1 x X2 -> Y, f(x1, Nothing) = f(Nothing, x2) = f(Nothing, Nothing) = Nothing. This can be generalized.

We should have an additional operator (catMaybes in Haskell) which extracts the present values from an array. In other words (and we need a name for it), ~([maybe x]) = [x].

There might be some contention here.
I'm saying that a reduce operating on an array of optional data should by default return nothing if any of the array elements is a nothing. I think this should be done if we are to have consistency. Reduce is the successive application of some function to the elements of a list: reduce(a[], "+") = (...((a[0] + a[1]) + a[2]) + ... ) + a[n]) If, by the first rule (which I think is fundamental) "+"(Just x, Nothing) = Nothing, it can be easily seen that reduce(a[], "+") = Nothing if any a[i] = Nothing. In order to reduce only the Just values, there would need to be a way to extract only those from the array. Thoughts? Questions? Mihael On Mon, 2011-04-11 at 14:20 -0500, Jonathan Monette wrote: > Well the case I have in my scripts would be to only run the reduce on > the available elements in the array. I am not sure why the other case > would be valid. Not doin the reduce on the array because an outfile > was not mapped is the same as what swift currently does. The only > difference is that instead of causing the swift system to fail it just > tries to continue on the execution. > > On Apr 11, 2011 2:06 PM, "Mihael Hategan" wrote: > From skenny at uchicago.edu Tue Apr 12 23:10:27 2011 From: skenny at uchicago.edu (Sarah Kenny) Date: Tue, 12 Apr 2011 21:10:27 -0700 Subject: [Swift-devel] 0.92.1 In-Reply-To: References: <2129470799.89932.1302550247599.JavaMail.root@zimbra.anl.gov> Message-ID: i will update the release binary the web site unless anyone objects...looks like ketan has write ownership of the module on beagle, ketan are you planning to update there? ~sk On Tue, Apr 12, 2011 at 11:49 AM, Justin M Wozniak wrote: > On Mon, 11 Apr 2011, Sarah Kenny wrote: > > On Mon, Apr 11, 2011 at 12:41 PM, Justin M Wozniak > >wrote: >> >> On Mon, 11 Apr 2011, Michael Wilde wrote: >>> >>> my understanding is that running on beagle requires a fix that justin >>> >>>> committed to trunk but it is not yet available in .092.1 (?) justin >>>>> can you confirm? 
should i try testing with .92.1 again? >>>>> >>>>> >>>> Yes, thats right. Is the Beagle test waiting for me to commit my changes >>>> to the 0.92 branch. Justin, what did we decide on this? (You retrofit >>>> the >>>> trunk change for Cray support or I commit my 0.92-based change?) >>>> >>>> >>> I made have said something about this in passing but I really think we >>> should go by the list on the swift-devel page and move forward to 0.93- >>> release-0.92 was branched for testing in early January. >>> >>> Im sorry if I'm holding this part up. >>> >>>> Sarah, did you re-test ranger on the latest 0.92? >>>> >>>> >>> ranger has been tested on .92.1 (and a test specific to the bug fix has >> been >> added to the test suite for sge provider). >> > > Ok, I was just able to successfully test the release-0.92 branch on Beagle > with Mike's local modifications. > > Users on Fusion can use my trunk-based installation as the Parvis group > does. > > I think we can package 0.92.1 at this point. > > Justin > > -- > Justin M Wozniak > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dk0966 at cs.ship.edu Tue Apr 12 23:44:30 2011 From: dk0966 at cs.ship.edu (David Kelly) Date: Wed, 13 Apr 2011 00:44:30 -0400 Subject: [Swift-devel] Swift 0.92(.1) on Fusion Message-ID: Hello, Recently when I try to run Swift on Fusion, my job never seems to execute. I have emailed Fusion support about this (ticket #70175) but thought it may also be useful to send to the list. I am trying to run the catsn.swift script for testing. I can see it in qstat. The sites.xml is based on the config listed in the Fusion cookbook, with a few small changes. I added an internalHostname entry and set it to the IP address attached to the Infiniband device. I also lowered the maxtime from 1000 to 10. The Fusion cookbook says "Set MAXTIME as in qsub walltime. This is on a per-allocation basis and should be at least 20% larger than your longest task". 
I am not sure how maxtime relates to walltime exactly, but the walltime value in the PBS file gets set to 00:00:00. I am not sure if this matters or not.

I have also attached a compressed log file and the actual swift script I'm trying to run.

Thanks,
David

$ swift -version
Swift svn swift-r4076 cog-r3049

qstat:
541724.fmgt2.l davidk shared Block-0412 -- 1 1 -- 00:00 Q --

sites.xml:
192.168.71.81
10
1
1
1
2
shared
5.99
10000
/home/davidk/swiftwork

PBS submission file:
#PBS -S /bin/bash
#PBS -N Block-0412-211041-000000
#PBS -m n
#PBS -l nodes=1
#PBS -l walltime=00:00:00
#PBS -q shared
#PBS -o /homes/davidk/.globus/scripts/PBS1298937999826083605.submit.stdout
#PBS -e /homes/davidk/.globus/scripts/PBS1298937999826083605.submit.stderr
WORKER_LOGGING_LEVEL=NONE
#PBS -v WORKER_LOGGING_LEVEL
cd / && /usr/bin/perl /homes/davidk/.globus/coasters/cscript1716491648595514240.pl http://192.168.71.81:46584 0412-211041-000000 NOLOGGING
/bin/echo $? >/homes/davidk/.globus/scripts/PBS1298937999826083605.submit.exitcode

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 001-catsn-local-20110412-2245-7nq0adz6.log.gz
Type: application/x-gzip
Size: 107356 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 001-catsn-local.swift
Type: application/octet-stream
Size: 339 bytes
Desc: not available
URL:

From ketancmaheshwari at gmail.com Wed Apr 13 00:00:38 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Wed, 13 Apr 2011 00:00:38 -0500
Subject: [Swift-devel] CCA-11 Swift Poster Visitor's Questions
Message-ID: <35773289-DCA9-42A4-BAF2-A7BB226A5741@gmail.com>

Here is a list of questions asked by the visitors of our swift poster at the CCA-11 poster session [poster attached here] earlier this evening:

-- What is the novelty in this stuff? Meta-scheduling has been done in the past; scalability studies have also been done in the past (by Swift too).

-- Is this similar to map/reduce?
-- Are you planning to compare Swift/coasters with map/reduce? -- What do people do without Swift in this same scenario (multi-clouds)? -- This looks similar to Condor glide-ins!? -- (After explaining the coasters mechanism) What is sites script again? (some more explaining that it is not script it is a descriptor) And again what is that sites file? -- What happens without Coasters? -- What do I need to run Swift on my cluster? -- Would it benefit me if I chuck Python and use Swift instead for my GIS related app? -- The simplicity of script-code seems deceptive!?! -- Is the coasters scheduling algorithm published somewhere? -------------- next part -------------- A non-text attachment was scrubbed... Name: SwiftPoster.CCA-11.pdf Type: application/pdf Size: 652611 bytes Desc: not available URL: -------------- next part -------------- --Ketan From hategan at mcs.anl.gov Wed Apr 13 00:18:33 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 12 Apr 2011 22:18:33 -0700 Subject: [Swift-devel] CCA-11 Swift Poster Visitor's Questions In-Reply-To: <35773289-DCA9-42A4-BAF2-A7BB226A5741@gmail.com> References: <35773289-DCA9-42A4-BAF2-A7BB226A5741@gmail.com> Message-ID: <1302671913.26813.0.camel@blabla2.none> Some are interesting. Do you want any of them answered? Mihael On Wed, 2011-04-13 at 00:00 -0500, Ketan Maheshwari wrote: > Here is a list of questions asked by the visitors of our swift poster at the CCA-11 poster session [poster attached here] earlier this evening: > > -- What is the novelty in this stuff? Meta-scheduling has been done in the past; scalability studies have also been done in the past (by Swift too). > > -- Is this similar to map/reduce? > > -- Are you planning to compare Swift/coasters with map/reduce? > > -- What do people do without Swift in this same scenario (multi-clouds)? > > -- This looks similar to Condor glide-ins!? > > -- (After explaining the coasters mechanism) What is sites script again? 
(some more explaining that it is not script it is a descriptor) And again what is that sites file? > > -- What happens without Coasters? > > -- What do I need to run Swift on my cluster? > > -- Would it benefit me if I chuck Python and use Swift instead for my GIS related app? > > -- The simplicity of script-code seems deceptive!?! > > -- Is the coasters scheduling algorithm published somewhere? > > > --Ketan > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From bugzilla-daemon at mcs.anl.gov Wed Apr 13 02:07:03 2011 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Wed, 13 Apr 2011 02:07:03 -0500 (CDT) Subject: [Swift-devel] [Bug 277] Swift gives misleading error message when sites file is missing tag In-Reply-To: References: Message-ID: <20110413070703.574762F169@wind.mcs.anl.gov> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=277 skenny changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED --- Comment #2 from skenny 2011-04-13 02:07:02 --- python script "chxml" has been added and is called from the main swift script to check that the specified sites file is well-formed and gives appropriate error message if it is not. -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the reporter. From foster at anl.gov Wed Apr 13 05:30:41 2011 From: foster at anl.gov (Ian Foster) Date: Wed, 13 Apr 2011 05:30:41 -0500 Subject: [Swift-devel] CCA-11 Swift Poster Visitor's Questions In-Reply-To: <35773289-DCA9-42A4-BAF2-A7BB226A5741@gmail.com> References: <35773289-DCA9-42A4-BAF2-A7BB226A5741@gmail.com> Message-ID: Hi Ketan: Nice work transcribing the questions! I'm curious as to what the "deceptive" question means. 
The simplicity doesn''t seem deceptive to me! The poster title is a bit deceptive: it makes this sound like a research poster, when the content reads more like a Swift advertisement. It would be good to produce a paper that provides a detailed description of the coaster's mechanism, a comparison with other work in this area, and a careful evaluation of the functioning of the coaster implementation. Ian. On Apr 13, 2011, at 12:00 AM, Ketan Maheshwari wrote: > Here is a list of questions asked by the visitors of our swift poster at the CCA-11 poster session [poster attached here] earlier this evening: > > -- What is the novelty in this stuff? Meta-scheduling has been done in the past; scalability studies have also been done in the past (by Swift too). > > -- Is this similar to map/reduce? > > -- Are you planning to compare Swift/coasters with map/reduce? > > -- What do people do without Swift in this same scenario (multi-clouds)? > > -- This looks similar to Condor glide-ins!? > > -- (After explaining the coasters mechanism) What is sites script again? (some more explaining that it is not script it is a descriptor) And again what is that sites file? > > -- What happens without Coasters? > > -- What do I need to run Swift on my cluster? > > -- Would it benefit me if I chuck Python and use Swift instead for my GIS related app? > > -- The simplicity of script-code seems deceptive!?! > > -- Is the coasters scheduling algorithm published somewhere? > > > > --Ketan > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From dsk at ci.uchicago.edu Wed Apr 13 05:41:20 2011 From: dsk at ci.uchicago.edu (Daniel S. 
Katz) Date: Wed, 13 Apr 2011 05:41:20 -0500 Subject: [Swift-devel] CCA-11 Swift Poster Visitor's Questions In-Reply-To: References: <35773289-DCA9-42A4-BAF2-A7BB226A5741@gmail.com> Message-ID: On Apr 13, 2011, at 5:30 AM, Ian Foster wrote: > Hi Ketan: > > Nice work transcribing the questions! > > I'm curious as to what the "deceptive" question means. The simplicity doesn''t seem deceptive to me! I don't think deceptive was used seriously. > > The poster title is a bit deceptive: it makes this sound like a research poster, when the content reads more like a Swift advertisement. It would be good to produce a paper that provides a detailed description of the coaster's mechanism, a comparison with other work in this area, and a careful evaluation of the functioning of the coaster implementation. A few of us have started this, but haven't gotten very far yet. > > Ian. > > > On Apr 13, 2011, at 12:00 AM, Ketan Maheshwari wrote: > >> Here is a list of questions asked by the visitors of our swift poster at the CCA-11 poster session [poster attached here] earlier this evening: >> >> -- What is the novelty in this stuff? Meta-scheduling has been done in the past; scalability studies have also been done in the past (by Swift too). >> >> -- Is this similar to map/reduce? >> >> -- Are you planning to compare Swift/coasters with map/reduce? >> >> -- What do people do without Swift in this same scenario (multi-clouds)? >> >> -- This looks similar to Condor glide-ins!? >> >> -- (After explaining the coasters mechanism) What is sites script again? (some more explaining that it is not script it is a descriptor) And again what is that sites file? >> >> -- What happens without Coasters? >> >> -- What do I need to run Swift on my cluster? >> >> -- Would it benefit me if I chuck Python and use Swift instead for my GIS related app? >> >> -- The simplicity of script-code seems deceptive!?! >> >> -- Is the coasters scheduling algorithm published somewhere? 
>> >> >> >> --Ketan >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Daniel S. Katz University of Chicago (773) 834-7186 (voice) (773) 834-3700 (fax) d.katz at ieee.org or dsk at ci.uchicago.edu http://www.ci.uchicago.edu/~dsk/ From foster at anl.gov Wed Apr 13 05:44:03 2011 From: foster at anl.gov (Ian Foster) Date: Wed, 13 Apr 2011 05:44:03 -0500 Subject: [Swift-devel] CCA-11 Swift Poster Visitor's Questions In-Reply-To: References: <35773289-DCA9-42A4-BAF2-A7BB226A5741@gmail.com> Message-ID: <0A3AC433-C9BA-4AEB-B207-799DB262693D@anl.gov> I assume that there was some thought behind the question. E.g., they might think that a lot of work and code is required to wrap the executables (which is not the case). Or that a lot of work is required to configure the coasters. Or something else? On Apr 13, 2011, at 5:41 AM, Daniel S. Katz wrote: > > On Apr 13, 2011, at 5:30 AM, Ian Foster wrote: > >> Hi Ketan: >> >> Nice work transcribing the questions! >> >> I'm curious as to what the "deceptive" question means. The simplicity doesn''t seem deceptive to me! > > I don't think deceptive was used seriously. > >> >> The poster title is a bit deceptive: it makes this sound like a research poster, when the content reads more like a Swift advertisement. It would be good to produce a paper that provides a detailed description of the coaster's mechanism, a comparison with other work in this area, and a careful evaluation of the functioning of the coaster implementation. > > A few of us have started this, but haven't gotten very far yet. > >> >> Ian. 
>> >> >> On Apr 13, 2011, at 12:00 AM, Ketan Maheshwari wrote: >> >>> Here is a list of questions asked by the visitors of our swift poster at the CCA-11 poster session [poster attached here] earlier this evening: >>> >>> -- What is the novelty in this stuff? Meta-scheduling has been done in the past; scalability studies have also been done in the past (by Swift too). >>> >>> -- Is this similar to map/reduce? >>> >>> -- Are you planning to compare Swift/coasters with map/reduce? >>> >>> -- What do people do without Swift in this same scenario (multi-clouds)? >>> >>> -- This looks similar to Condor glide-ins!? >>> >>> -- (After explaining the coasters mechanism) What is sites script again? (some more explaining that it is not script it is a descriptor) And again what is that sites file? >>> >>> -- What happens without Coasters? >>> >>> -- What do I need to run Swift on my cluster? >>> >>> -- Would it benefit me if I chuck Python and use Swift instead for my GIS related app? >>> >>> -- The simplicity of script-code seems deceptive!?! >>> >>> -- Is the coasters scheduling algorithm published somewhere? >>> >>> >>> >>> --Ketan >>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -- > Daniel S. 
Katz > University of Chicago > (773) 834-7186 (voice) > (773) 834-3700 (fax) > d.katz at ieee.org or dsk at ci.uchicago.edu > http://www.ci.uchicago.edu/~dsk/ > > > From wozniak at mcs.anl.gov Wed Apr 13 09:07:04 2011 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Wed, 13 Apr 2011 09:07:04 -0500 (CDT) Subject: [Swift-devel] 0.92.1 In-Reply-To: References: <2129470799.89932.1302550247599.JavaMail.root@zimbra.anl.gov> Message-ID: Sounds great- before posting, please just double-check your tree to make sure it contains the key fix. Let me know if you have any problems with update.sh, I was working in there recently. Justin On Tue, 12 Apr 2011, Sarah Kenny wrote: > i will update the release binary the web site unless anyone objects...looks > like ketan has write ownership of the module on beagle, ketan are you > planning to update there? > > ~sk > > On Tue, Apr 12, 2011 at 11:49 AM, Justin M Wozniak wrote: > >> On Mon, 11 Apr 2011, Sarah Kenny wrote: >> >> On Mon, Apr 11, 2011 at 12:41 PM, Justin M Wozniak >>> wrote: >>> >>> On Mon, 11 Apr 2011, Michael Wilde wrote: >>>> >>>> my understanding is that running on beagle requires a fix that justin >>>> >>>>> committed to trunk but it is not yet available in .092.1 (?) justin >>>>>> can you confirm? should i try testing with .92.1 again? >>>>>> >>>>>> >>>>> Yes, thats right. Is the Beagle test waiting for me to commit my changes >>>>> to the 0.92 branch. Justin, what did we decide on this? (You retrofit >>>>> the >>>>> trunk change for Cray support or I commit my 0.92-based change?) >>>>> >>>>> >>>> I made have said something about this in passing but I really think we >>>> should go by the list on the swift-devel page and move forward to 0.93- >>>> release-0.92 was branched for testing in early January. >>>> >>>> Im sorry if I'm holding this part up. >>>> >>>>> Sarah, did you re-test ranger on the latest 0.92? 
>>>>> >>>>> >>>> ranger has been tested on .92.1 (and a test specific to the bug fix has >>> been >>> added to the test suite for sge provider). >>> >> >> Ok, I was just able to successfully test the release-0.92 branch on Beagle >> with Mike's local modifications. >> >> Users on Fusion can use my trunk-based installation as the Parvis group >> does. >> >> I think we can package 0.92.1 at this point. >> >> Justin >> >> -- >> Justin M Wozniak >> > -- Justin M Wozniak From wozniak at mcs.anl.gov Wed Apr 13 09:11:11 2011 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Wed, 13 Apr 2011 09:11:11 -0500 (CDT) Subject: [Swift-devel] Swift 0.92(.1) on Fusion In-Reply-To: References: Message-ID: Thanks for digging into this- can you try this again from trunk? A Parvis developer and I were able to run successfully there. (The queues are very long.) One thing I did when working on Fusion was cut out the generated submit files and qsub them myself to verify that the #PBS settings did actually work for me, you may want to try that too. Justin On Wed, 13 Apr 2011, David Kelly wrote: > Hello, > > Recently when I try to run Swift on Fusion, my job never seems to > execute. I have emailed Fusion support about this (ticket #70175) but > thought it may also be useful to send to the list. I am trying to run > the catsn.swift script for testing. I can see it in qstat. The > sites.xml is based on the config listed in the Fusion cookbook, with a > few small changes. I added an internalHostname entry and set it to the > IP address attached to the Infiniband device. I also lowered the > maxtime from 1000 to 10. The Fusion cookbook says "Set MAXTIME as in > qsub walltime. This is on a per-allocation basis and should be at > least 20% larger than your longest task". I am not sure how maxtime > relates to walltime exactly, but the walltime value in the PBS file > gets set to 00:00:00. I am not sure if this matters or not. 
> > I have also attached a compressed log file and the actual swift script > I'm trying to run. > > Thanks, > David > > $ swift -version > Swift svn swift-r4076 cog-r3049 > > qstat: > 541724.fmgt2.l davidk shared Block-0412 -- 1 1 -- 00:00 Q -- > > sites.xml: > > > > > 192.168.71.81 > 10 > 1 > 1 > 1 > 2 > shared > 5.99 > 10000 > /home/davidk/swiftwork > > > > PBS submission file: > #PBS -S /bin/bash > #PBS -N Block-0412-211041-000000 > #PBS -m n > #PBS -l nodes=1 > #PBS -l walltime=00:00:00 > #PBS -q shared > #PBS -o /homes/davidk/.globus/scripts/PBS1298937999826083605.submit.stdout > #PBS -e /homes/davidk/.globus/scripts/PBS1298937999826083605.submit.stderr > WORKER_LOGGING_LEVEL=NONE > #PBS -v WORKER_LOGGING_LEVEL > cd / && /usr/bin/perl > /homes/davidk/.globus/coasters/cscript1716491648595514240.pl > http://192.168.71.81:46584 0412-211041-000000 NOLOGGING > /bin/echo $? >/homes/davidk/.globus/scripts/PBS1298937999826083605.submit.exitcode -- Justin M Wozniak From ketancmaheshwari at gmail.com Wed Apr 13 09:37:38 2011 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Wed, 13 Apr 2011 09:37:38 -0500 Subject: [Swift-devel] CCA-11 Swift Poster Visitor's Questions In-Reply-To: <0A3AC433-C9BA-4AEB-B207-799DB262693D@anl.gov> References: <35773289-DCA9-42A4-BAF2-A7BB226A5741@gmail.com> <0A3AC433-C9BA-4AEB-B207-799DB262693D@anl.gov> Message-ID: Ian, Dan, > .... , they might think that a lot of work and code is required to wrap the executables (which is not the case). Or that a lot of work is required to configure the coasters. Or something else? Right. From what I gathered, the question was on a light tone. However, the message was: Is Swift ( especially the SwiftScript ) all that simple to write (which, it is) as is been shown in the code snippet? 
Ketan

From ketancmaheshwari at gmail.com Wed Apr 13 09:43:11 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Wed, 13 Apr 2011 09:43:11 -0500
Subject: [Swift-devel] CCA-11 Swift Poster Visitor's Questions
In-Reply-To: <1302671913.26813.0.camel@blabla2.none>
References: <35773289-DCA9-42A4-BAF2-A7BB226A5741@gmail.com> <1302671913.26813.0.camel@blabla2.none>
Message-ID:

Mihael,

> Some are interesting. Do you want any of them answered?

Maybe, I would like to know more on these:

>> -- Is this similar to map/reduce?, Are you planning to compare Swift/coasters with map/reduce?
>> -- Would it benefit me if I chuck Python and use Swift instead for my GIS related app?
>> -- Is the coasters scheduling algorithm published somewhere?
>

Ketan

From dk0966 at cs.ship.edu Wed Apr 13 11:21:49 2011
From: dk0966 at cs.ship.edu (David Kelly)
Date: Wed, 13 Apr 2011 12:21:49 -0400
Subject: [Swift-devel] Swift 0.92(.1) on Fusion
In-Reply-To:
References:
Message-ID:

The problem is related to the walltime of 00:00:00. The shared queue that I am trying to use requires a walltime of less than one hour. Running 'checkjob' reveals that there is a policy violation related to this.

As a test, I edited the PBS submit file. I set the walltime to a reasonable value and had it echo something. When I manually submitted it with qsub, it ran right away.

How can I get a walltime value to be created in the PBS files? I have maxtime specified in sites.xml and I have a maxwalltime for all my applications. I tried this with the 0.92 branch as well as with trunk, with no luck. Switching to the batch queue might be one workaround.

Regards,
David

On Wed, Apr 13, 2011 at 10:11 AM, Justin M Wozniak wrote:
>
> Thanks for digging into this- can you try this again from trunk? A Parvis
> developer and I were able to run successfully there. (The queues are very
> long.)
> > One thing I did when working on Fusion was cut out the generated submit > files and qsub them myself to verify that the #PBS settings did actually > work for me, you may want to try that too. > > Justin > > On Wed, 13 Apr 2011, David Kelly wrote: > >> Hello, >> >> Recently when I try to run Swift on Fusion, my job never seems to >> execute. I have emailed Fusion support about this (ticket #70175) but >> thought it may also be useful to send to the list. I am trying to run >> the catsn.swift script for testing. I can see it in qstat. The >> sites.xml is based on the config listed in the Fusion cookbook, with a >> few small changes. I added an internalHostname entry and set it to the >> IP address attached to the Infiniband device. I also lowered the >> maxtime from 1000 to 10. The Fusion cookbook says "Set MAXTIME as in >> qsub walltime. This is on a per-allocation basis and should be at >> least 20% larger than your longest task". I am not sure how maxtime >> relates to walltime exactly, but the walltime value in the PBS file >> gets set to 00:00:00. I am not sure if this matters or not. >> >> I have also attached a compressed log file and the actual swift script >> I'm trying to run. >> >> Thanks, >> David >> >> $ swift -version >> Swift svn swift-r4076 cog-r3049 >> >> qstat: >> 541724.fmgt2.l davidk shared Block-0412 -- 1 1 -- 00:00 Q >> -- >> >> sites.xml: >> >> >> >>
>> > key="internalHostname">192.168.71.81 >> 10 >> 1 >> 1 >> 1 >> 2 >> shared >> 5.99 >> 10000 >> /home/davidk/swiftwork >> >> >> >> PBS submission file: >> #PBS -S /bin/bash >> #PBS -N Block-0412-211041-000000 >> #PBS -m n >> #PBS -l nodes=1 >> #PBS -l walltime=00:00:00 >> #PBS -q shared >> #PBS -o /homes/davidk/.globus/scripts/PBS1298937999826083605.submit.stdout >> #PBS -e /homes/davidk/.globus/scripts/PBS1298937999826083605.submit.stderr >> WORKER_LOGGING_LEVEL=NONE >> #PBS -v WORKER_LOGGING_LEVEL >> cd / && /usr/bin/perl >> /homes/davidk/.globus/coasters/cscript1716491648595514240.pl >> http://192.168.71.81:46584 0412-211041-000000 NOLOGGING >> /bin/echo $? >> >/homes/davidk/.globus/scripts/PBS1298937999826083605.submit.exitcode > > -- > Justin M Wozniak > -------------- next part -------------- A non-text attachment was scrubbed... Name: tc.data Type: application/octet-stream Size: 562 bytes Desc: not available URL: From wilde at mcs.anl.gov Wed Apr 13 13:23:18 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 13 Apr 2011 13:23:18 -0500 (CDT) Subject: [Swift-devel] Re: [Swift-commit] r4363 - branches In-Reply-To: <20110413180228.82DF79CCBB@svn.ci.uchicago.edu> Message-ID: <1662148760.98208.1302718998586.JavaMail.root@zimbra.anl.gov> Sarah, All, Is this 0.92.1 branch just a last-time special-case for the 0.92.1 release? While the plan, as I understood it, is that in the future we will only create branches for a "point" release (e.g. 0.93) but all sub-point releases based on that branch (e.g. 0.93.1, 0.93.2, etc) will be just tags on the 0.93 branch rather than new branches?
- Mike ----- Original Message ----- > Author: skenny > Date: 2011-04-13 13:02:28 -0500 (Wed, 13 Apr 2011) > New Revision: 4363 > > Added: > branches/release-0.92.1/ > Removed: > branches/release-0.92/ > Log: > 0.92 release fixed, moving to 092.1 > > Copied: branches/release-0.92.1 (from rev 4362, branches/release-0.92) > > _______________________________________________ > Swift-commit mailing list > Swift-commit at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-commit -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From skenny at uchicago.edu Wed Apr 13 13:26:17 2011 From: skenny at uchicago.edu (Sarah Kenny) Date: Wed, 13 Apr 2011 11:26:17 -0700 Subject: [Swift-devel] Re: [Swift-commit] r4363 - branches In-Reply-To: <1662148760.98208.1302718998586.JavaMail.root@zimbra.anl.gov> References: <20110413180228.82DF79CCBB@svn.ci.uchicago.edu> <1662148760.98208.1302718998586.JavaMail.root@zimbra.anl.gov> Message-ID: oh, i thought we were branching it...should i move that back? On Wed, Apr 13, 2011 at 11:23 AM, Michael Wilde wrote: > Sarah, All, > > Is this 0.92.1 branch just a last-time special-case for the 0.92.1 release? > > While the plan, as I understood it, is that in the future we will only > create branches for a "point" release (e.g. 0.93) but all sub-point releases > based on that branch (e.g. 0.93.1, 0.93.2, etc) will be just tags on the > 0.93 branch rather than new branches? 
> > - Mike > > > > > ----- Original Message ----- > > Author: skenny > > Date: 2011-04-13 13:02:28 -0500 (Wed, 13 Apr 2011) > > New Revision: 4363 > > > > Added: > > branches/release-0.92.1/ > > Removed: > > branches/release-0.92/ > > Log: > > 0.92 release fixed, moving to 092.1 > > > > Copied: branches/release-0.92.1 (from rev 4362, branches/release-0.92) > > > > _______________________________________________ > > Swift-commit mailing list > > Swift-commit at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-commit > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Wed Apr 13 13:28:47 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 13 Apr 2011 11:28:47 -0700 Subject: [Swift-devel] CCA-11 Swift Poster Visitor's Questions In-Reply-To: References: <35773289-DCA9-42A4-BAF2-A7BB226A5741@gmail.com> Message-ID: <1302719327.28436.0.camel@blabla2.none> On Wed, 2011-04-13 at 05:41 -0500, Daniel S. Katz wrote: > > > > The poster title is a bit deceptive: it makes this sound like a > research poster, when the content reads more like a Swift > advertisement. It would be good to produce a paper that provides a > detailed description of the coaster's mechanism, a comparison with > other work in this area, and a careful evaluation of the functioning > of the coaster implementation. > > A few of us have started this, but haven't gotten very far yet. Can you be more specific? 
Mihael From hategan at mcs.anl.gov Wed Apr 13 13:30:25 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 13 Apr 2011 11:30:25 -0700 Subject: [Swift-devel] Re: [Swift-commit] r4363 - branches In-Reply-To: References: <20110413180228.82DF79CCBB@svn.ci.uchicago.edu> <1662148760.98208.1302718998586.JavaMail.root@zimbra.anl.gov> Message-ID: <1302719425.28436.1.camel@blabla2.none> On Wed, 2011-04-13 at 11:26 -0700, Sarah Kenny wrote: > oh, i thought we were branching it...should i move that back? We're tagging point releases. I.e. branches/0.92 -> tags/0.92.1 Mihael > > On Wed, Apr 13, 2011 at 11:23 AM, Michael Wilde > wrote: > Sarah, All, > > Is this 0.92.1 branch just a last-time special-case for the > 0.92.1 release? > > While the plan, as I understood it, is that in the future we > will only create branches for a "point" release (e.g. 0.93) > but all sub-point releases based on that branch (e.g. 0.93.1, > 0.93.2, etc) will be just tags on the 0.93 branch rather than > new branches? 
> > - Mike > > > > > > ----- Original Message ----- > > Author: skenny > > Date: 2011-04-13 13:02:28 -0500 (Wed, 13 Apr 2011) > > New Revision: 4363 > > > > Added: > > branches/release-0.92.1/ > > Removed: > > branches/release-0.92/ > > Log: > > 0.92 release fixed, moving to 092.1 > > > > Copied: branches/release-0.92.1 (from rev 4362, > branches/release-0.92) > > > > _______________________________________________ > > Swift-commit mailing list > > Swift-commit at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-commit > > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From skenny at uchicago.edu Wed Apr 13 13:33:29 2011 From: skenny at uchicago.edu (Sarah Kenny) Date: Wed, 13 Apr 2011 11:33:29 -0700 Subject: [Swift-devel] Re: [Swift-commit] r4363 - branches In-Reply-To: <1302719425.28436.1.camel@blabla2.none> References: <20110413180228.82DF79CCBB@svn.ci.uchicago.edu> <1662148760.98208.1302718998586.JavaMail.root@zimbra.anl.gov> <1302719425.28436.1.camel@blabla2.none> Message-ID: and should the binary then still be called 0.92 rather than 0.92.1 ? being that 0.92 had such a substantial flaw and a warning was posted on the site, it seems to me that we'd want to make it clear (to users) that this is a new version/release by naming it 0.92.1...does tagging adequately solve this? On Wed, Apr 13, 2011 at 11:30 AM, Mihael Hategan wrote: > On Wed, 2011-04-13 at 11:26 -0700, Sarah Kenny wrote: > > oh, i thought we were branching it...should i move that back? > > We're tagging point releases. I.e. 
branches/0.92 -> tags/0.92.1 > > Mihael > > > > On Wed, Apr 13, 2011 at 11:23 AM, Michael Wilde > > wrote: > > Sarah, All, > > > > Is this 0.92.1 branch just a last-time special-case for the > > 0.92.1 release? > > > > While the plan, as I understood it, is that in the future we > > will only create branches for a "point" release (e.g. 0.93) > > but all sub-point releases based on that branch (e.g. 0.93.1, > > 0.93.2, etc) will be just tags on the 0.93 branch rather than > > new branches? > > > > - Mike > > > > > > > > > > > > ----- Original Message ----- > > > Author: skenny > > > Date: 2011-04-13 13:02:28 -0500 (Wed, 13 Apr 2011) > > > New Revision: 4363 > > > > > > Added: > > > branches/release-0.92.1/ > > > Removed: > > > branches/release-0.92/ > > > Log: > > > 0.92 release fixed, moving to 092.1 > > > > > > Copied: branches/release-0.92.1 (from rev 4362, > > branches/release-0.92) > > > > > > _______________________________________________ > > > Swift-commit mailing list > > > Swift-commit at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-commit > > > > > > -- > > Michael Wilde > > Computation Institute, University of Chicago > > Mathematics and Computer Science Division > > Argonne National Laboratory > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at mcs.anl.gov Wed Apr 13 13:39:38 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 13 Apr 2011 13:39:38 -0500 (CDT) Subject: [Swift-devel] Re: [Swift-commit] r4363 - branches In-Reply-To: Message-ID: <993834577.98326.1302719978004.JavaMail.root@zimbra.anl.gov> Sarah, I think the binary should be called 0.92.1, and it should be built from the 0.92.1 tag which should be a tag of the latest revision in the 0.92 branches of both Swift and CoG. 
We should never put out two different releases with the same version number. Ketan or Justin, you should create a Swift_0.92.1 module on Beagle per Justin's email based on my Cray support mods, and all future modules we create should be based on a tag, just like the general releases. (Ideally in the future the Cray modules should just *be* general releases). - Mike ----- Original Message ----- and should the binary then still be called 0.92 rather than 0.92.1 ? being that 0.92 had such a substantial flaw and a warning was posted on the site, it seems to me that we'd want to make it clear (to users) that this is a new version/release by naming it 0.92.1...does tagging adequately solve this? On Wed, Apr 13, 2011 at 11:30 AM, Mihael Hategan < hategan at mcs.anl.gov > wrote: On Wed, 2011-04-13 at 11:26 -0700, Sarah Kenny wrote: > oh, i thought we were branching it...should i move that back? We're tagging point releases. I.e. branches/0.92 -> tags/0.92.1 Mihael > > On Wed, Apr 13, 2011 at 11:23 AM, Michael Wilde < wilde at mcs.anl.gov > > wrote: > Sarah, All, > > Is this 0.92.1 branch just a last-time special-case for the > 0.92.1 release? > > While the plan, as I understood it, is that in the future we > will only create branches for a "point" release (e.g. 0.93) > but all sub-point releases based on that branch (e.g. 0.93.1, > 0.93.2, etc) will be just tags on the 0.93 branch rather than > new branches? 
> > - Mike > > > > > > ----- Original Message ----- > > Author: skenny > > Date: 2011-04-13 13:02:28 -0500 (Wed, 13 Apr 2011) > > New Revision: 4363 > > > > Added: > > branches/release-0.92.1/ > > Removed: > > branches/release-0.92/ > > Log: > > 0.92 release fixed, moving to 092.1 > > > > Copied: branches/release-0.92.1 (from rev 4362, > branches/release-0.92) > > > > _______________________________________________ > > Swift-commit mailing list > > Swift-commit at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-commit > > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Wed Apr 13 13:53:45 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 13 Apr 2011 11:53:45 -0700 Subject: [Swift-devel] Re: [Swift-commit] r4363 - branches In-Reply-To: References: <20110413180228.82DF79CCBB@svn.ci.uchicago.edu> <1662148760.98208.1302718998586.JavaMail.root@zimbra.anl.gov> <1302719425.28436.1.camel@blabla2.none> Message-ID: <1302720825.28949.1.camel@blabla2.none> On Wed, 2011-04-13 at 11:33 -0700, Sarah Kenny wrote: > and should the binary then still be called 0.92 rather than 0.92.1 ? The binary would be called 0.92.1. > > being that 0.92 had such a substantial flaw and a warning was posted > on the site, it seems to me that we'd want to make it clear (to users) > that this is a new version/release by naming it 0.92.1...does tagging > adequately solve this? Tagging doesn't solve that. 
Tagging is there so that we can have some label in SVN that matches the packages that we release as opposed to having to track a specific revision in the branch. > From skenny at uchicago.edu Wed Apr 13 14:09:34 2011 From: skenny at uchicago.edu (Sarah Kenny) Date: Wed, 13 Apr 2011 12:09:34 -0700 Subject: [Swift-devel] Re: [Swift-commit] r4363 - branches In-Reply-To: <1302720825.28949.1.camel@blabla2.none> References: <20110413180228.82DF79CCBB@svn.ci.uchicago.edu> <1662148760.98208.1302718998586.JavaMail.root@zimbra.anl.gov> <1302719425.28436.1.camel@blabla2.none> <1302720825.28949.1.camel@blabla2.none> Message-ID: alrighty, so, does this seem right to everyone? http://www.ci.uchicago.edu/~skenny/swift/downloads/index.php i'm doing a run thru of the site tester now building from tags/release-0.92.1 as a sanity test and assuming all goes well that's what i will use to build the binary. let me know if this sounds ok or if i'm missing anything here. On Wed, Apr 13, 2011 at 11:53 AM, Mihael Hategan wrote: > On Wed, 2011-04-13 at 11:33 -0700, Sarah Kenny wrote: > > and should the binary then still be called 0.92 rather than 0.92.1 ? > > The binary would be called 0.92.1. > > > > being that 0.92 had such a substantial flaw and a warning was posted > > on the site, it seems to me that we'd want to make it clear (to users) > > that this is a new version/release by naming it 0.92.1...does tagging > > adequately solve this? > > Tagging doesn't solve that. Tagging is there so that we can have some > label in SVN that matches the packages that we release as opposed to > having to track a specific revision in the branch. > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hategan at mcs.anl.gov Wed Apr 13 14:13:02 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 13 Apr 2011 12:13:02 -0700 Subject: [Swift-devel] CCA-11 Swift Poster Visitor's Questions In-Reply-To: References: <35773289-DCA9-42A4-BAF2-A7BB226A5741@gmail.com> <1302671913.26813.0.camel@blabla2.none> Message-ID: <1302721982.29045.17.camel@blabla2.none> On Wed, 2011-04-13 at 09:43 -0500, Ketan Maheshwari wrote: > Mihael, > > > > Some are interesting. Do you want any of them answered? > > Maybe. I would like to know more on these: > > >> -- Is this similar to map/reduce?, Are you planning to compare Swift/coasters with map/reduce? A full comparison is not a short thing. I guess if I were to sum it up in a few words, map/reduce is a system to run two functions/combinators (map and reduce) on a distributed system, whereas Swift is a Turing-complete language that can run arbitrary compositions of apps on a distributed system, including map/reduce-like things. In other words, Swift can do MapReduce, but MapReduce cannot do Swift. > >> -- Would it benefit me if I chuck Python and use Swift instead for my GIS related app? Maybe, maybe not. If your GIS related app adds two numbers, then probably not. > > >> -- Is the coasters scheduling algorithm published somewhere? I need to finish that paper, and I need help with the numbers, so not yet. From hategan at mcs.anl.gov Wed Apr 13 14:14:36 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 13 Apr 2011 12:14:36 -0700 Subject: [Swift-devel] Re: [Swift-commit] r4363 - branches In-Reply-To: References: <20110413180228.82DF79CCBB@svn.ci.uchicago.edu> <1662148760.98208.1302718998586.JavaMail.root@zimbra.anl.gov> <1302719425.28436.1.camel@blabla2.none> <1302720825.28949.1.camel@blabla2.none> Message-ID: <1302722076.29045.19.camel@blabla2.none> On Wed, 2011-04-13 at 12:09 -0700, Sarah Kenny wrote: > alrighty, so, does this seem right to everyone? Almost.
The instructions for checking out the source code are not quite right. > > http://www.ci.uchicago.edu/~skenny/swift/downloads/index.php > > i'm doing a run thru of the site tester now building from > tags/release-0.92.1 as a sanity test and assuming all goes well that's > what i will use to build the binary. > > let me know if this sounds ok or if i'm missing anything here. > > On Wed, Apr 13, 2011 at 11:53 AM, Mihael Hategan > wrote: > On Wed, 2011-04-13 at 11:33 -0700, Sarah Kenny wrote: > > and should the binary then still be called 0.92 rather than > 0.92.1 ? > > > The binary would be called 0.92.1. > > > > being that 0.92 had such a substantial flaw and a warning > was posted > > on the site, it seems to me that we'd want to make it clear > (to users) > > that this is a new version/release by naming it > 0.92.1...does tagging > > adequately solve this? > > > Tagging doesn't solve that. Tagging is there so that we can > have some > label in SVN that matches the packages that we release as > opposed to > having to track a specific revision in the branch. > > > > > From skenny at uchicago.edu Wed Apr 13 14:35:02 2011 From: skenny at uchicago.edu (Sarah Kenny) Date: Wed, 13 Apr 2011 12:35:02 -0700 Subject: [Swift-devel] Re: [Swift-commit] r4363 - branches In-Reply-To: <1302722076.29045.19.camel@blabla2.none> References: <20110413180228.82DF79CCBB@svn.ci.uchicago.edu> <1662148760.98208.1302718998586.JavaMail.root@zimbra.anl.gov> <1302719425.28436.1.camel@blabla2.none> <1302720825.28949.1.camel@blabla2.none> <1302722076.29045.19.camel@blabla2.none> Message-ID: they should get it directly from tags instead of svn co https://svn.ci.uchicago.edu/svn/vdl2/branches/release-0.92 swift ? On Wed, Apr 13, 2011 at 12:14 PM, Mihael Hategan wrote: > On Wed, 2011-04-13 at 12:09 -0700, Sarah Kenny wrote: > > alrighty, so, does this seem right to everyone? > > Almost. The instructions for checking out the source code are not quite > right. 
> > > > http://www.ci.uchicago.edu/~skenny/swift/downloads/index.php > > > > i'm doing a run thru of the site tester now building from > > tags/release-0.92.1 as a sanity test and assuming all goes well that's > > what i will use to build the binary. > > > > let me know if this sounds ok or if i'm missing anything here. > > > > On Wed, Apr 13, 2011 at 11:53 AM, Mihael Hategan > > wrote: > > On Wed, 2011-04-13 at 11:33 -0700, Sarah Kenny wrote: > > > and should the binary then still be called 0.92 rather than > > 0.92.1 ? > > > > > > The binary would be called 0.92.1. > > > > > > being that 0.92 had such a substantial flaw and a warning > > was posted > > > on the site, it seems to me that we'd want to make it clear > > (to users) > > > that this is a new version/release by naming it > > 0.92.1...does tagging > > > adequately solve this? > > > > > > Tagging doesn't solve that. Tagging is there so that we can > > have some > > label in SVN that matches the packages that we release as > > opposed to > > having to track a specific revision in the branch. > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jon.monette at gmail.com Wed Apr 13 14:47:05 2011 From: jon.monette at gmail.com (Jonathan Monette) Date: Wed, 13 Apr 2011 14:47:05 -0500 Subject: [Swift-devel] [Bug 343] New: Add support for optional input and output files In-Reply-To: <1302635352.24772.27.camel@blabla2.none> References: <1302635352.24772.27.camel@blabla2.none> Message-ID: So then running a reduce on an array with optional arguments will return nothing. However, if we need to reduce on the elements that were actually returned to the array we would call something like reduce( extract( a[] ), "+" )? Is that what you are saying? 
That, in order to implement the optional output files, a function would need to be defined to extract all the elements that are mapped from the optional array, and this will allow the user to run a reduce on the extracted values? If this is indeed how it will work then it will work fine for my case. I am not sure what Parvis is doing, so I cannot say it will work for them, but for the Montage wrappers it will work and will clean up code on my end. On Tue, Apr 12, 2011 at 2:09 PM, Mihael Hategan wrote: > I'm assuming we won't statically track optional data. In a static typing > scenario, all optional data would need to be declared as such. This > would be similar to the Maybe type in Haskell. I'm assuming we don't > want to do that. Instead, optional types would be dynamic types. This > would allow one to use an app defined with non-optional types with > optional data. > > The typing rules would go something like this: > 1. f: X -> Y > type(f(Nothing/X)) = Nothing/Y > type(f(Just X)) = Just Y > > 2. For a composite type Y = X1 x X2 x... x Xn, type(Y) = Nothing if any > Xi = Nothing, type(Y) = Just Y if all Xi = Just Xi. > > 3. Corollary of 1 and 2 is that f: X1 x X2 -> Y, f(x1, Nothing) = > f(Nothing, x2) = f(Nothing, Nothing) = Nothing. This can be generalized. > > We should have an additional operator (catMaybes in Haskell) which > extracts the present values from an array. In other words (and we need a > name for it), ~([maybe x]) = [x]. > > There might be some contention here. I'm saying that a reduce operating > on an array of optional data should by default return nothing if any of > the array elements is a nothing. I think this should be done if we are > to have consistency. Reduce is the successive application of some > function to the elements of a list: > > reduce(a[], "+") = (...((a[0] + a[1]) + a[2]) + ...
) + a[n]) > If, by the first rule (which I think is fundamental) "+"(Just x, > Nothing) = Nothing, it can be easily seen that reduce(a[], "+") = > Nothing if any a[i] = Nothing. > > In order to reduce only the Just values, there would need to be a way to > extract only those from the array. > > Thoughts? Questions? > > Mihael > > On Mon, 2011-04-11 at 14:20 -0500, Jonathan Monette wrote: > > Well the case I have in my scripts would be to only run the reduce on > > the available elements in the array. I am not sure why the other case > > would be valid. Not doing the reduce on the array because an outfile > > was not mapped is the same as what swift currently does. The only > > difference is that instead of causing the swift system to fail it just > > tries to continue on the execution. > > > > On Apr 11, 2011 2:06 PM, "Mihael Hategan" wrote: > > > > > -- Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction. - Albert Einstein From wilde at mcs.anl.gov Wed Apr 13 15:05:20 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 13 Apr 2011 15:05:20 -0500 (CDT) Subject: [Swift-devel] Re: [Swift-commit] r4363 - branches In-Reply-To: Message-ID: <201274858.99144.1302725120495.JavaMail.root@zimbra.anl.gov> Also, will there be a swift_0.92.1 tag in CoG SVN? It would be nice to have identical tag names to check out under both repositories. - Mike ----- Original Message ----- they should get it directly from tags instead of svn co https://svn.ci.uchicago.edu/svn/vdl2/branches/release-0.92 swift ? On Wed, Apr 13, 2011 at 12:14 PM, Mihael Hategan < hategan at mcs.anl.gov > wrote: On Wed, 2011-04-13 at 12:09 -0700, Sarah Kenny wrote: > alrighty, so, does this seem right to everyone? Almost. The instructions for checking out the source code are not quite right.
> > http://www.ci.uchicago.edu/~skenny/swift/downloads/index.php > > i'm doing a run thru of the site tester now building from > tags/release-0.92.1 as a sanity test and assuming all goes well that's > what i will use to build the binary. > > let me know if this sounds ok or if i'm missing anything here. > > On Wed, Apr 13, 2011 at 11:53 AM, Mihael Hategan < hategan at mcs.anl.gov > > wrote: > On Wed, 2011-04-13 at 11:33 -0700, Sarah Kenny wrote: > > and should the binary then still be called 0.92 rather than > 0.92.1 ? > > > The binary would be called 0.92.1. > > > > being that 0.92 had such a substantial flaw and a warning > was posted > > on the site, it seems to me that we'd want to make it clear > (to users) > > that this is a new version/release by naming it > 0.92.1...does tagging > > adequately solve this? > > > Tagging doesn't solve that. Tagging is there so that we can > have some > label in SVN that matches the packages that we release as > opposed to > having to track a specific revision in the branch. > > > > > -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Wed Apr 13 15:12:58 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 13 Apr 2011 13:12:58 -0700 Subject: [Swift-devel] [Bug 343] New: Add support for optional input and output files In-Reply-To: References: <1302635352.24772.27.camel@blabla2.none> Message-ID: <1302725578.29432.1.camel@blabla2.none> Yes. There might be variations on that, such as: extractOne(value, defaultValue) extract(array, defaultValue) The first is obvious and the second would preserve the array shape. On Wed, 2011-04-13 at 14:47 -0500, Jonathan Monette wrote: > So then running a reduce on an array with optional arguments will > return nothing. 
However, if we need to reduce on the elements that > were actually returned to the array we would call something like > reduce( extract( a[] ), "+" )? Is that what you are saying? That in > order to implement the optional output files a function would need to > be defined to extract all the elements that are mapped from the > optional array and this will all the user to run a reduce on the > extracted values? > > > If this is indeed how it will work then it will work fine for my > case. I am not sure what parvis is doing so I cannot say it will work > for them but for the Montage wrappers it will work and will clean up > code on my end. > > On Tue, Apr 12, 2011 at 2:09 PM, Mihael Hategan > wrote: > I'm assuming we won't statically track optional data. In a > static typing > scenario, all optional data would need to be declared as such. > This > would be similar to the Maybe type in Haskell. I'm assuming we > don't > want to do that. Instead, optional types would be dynamic > types. This > would allow one to use an app defined with non-optional types > with > optional data. > > The typing rules would go something like this: > 1. f: X -> Y > type(f(Nothing/X)) = Nothing/Y > type(f(Just X)) = Just Y > > 2. For a composite type Y = X1 x X2 x... x Xn, type(Y) = > Nothing if any > Xi = Nothing, type(Y) = Just Y if all Xi = Just Xi. > > 3. Corollary of 1 and 2 is that f: X1 x X2 -> Y, f(x1, > Nothing) = > f(Nothing, x2) = f(Nothing, Nothing) = Nothing. This can be > generalized. > > We should have an additional operator (catMaybes in Haskell) > which > extracts the present values from an array. In other words (and > we need a > name for it), ~([maybe x]) = [x}. > > There might be some contention here. I'm saying that a reduce > operating > on an array of optional data should by default return nothing > if any of > the array elements is a nothing. I think this should be done > if we are > to have consistency. 
Reduce is the successive application of > some > function to the elements of a list: > > reduce(a[], "+") = (...((a[0] + a[1]) + a[2]) + ... ) + a[n]) > If, by the first rule (which I think is fundamental) "+"(Just > x, > Nothing) = Nothing, it can be easily seen that reduce(a[], > "+") = > Nothing if any a[i] = Nothing. > > In order to reduce only the Just values, there would need to > be a way to > extract only those from the array. > > Thoughts? Questions? > > Mihael > > > On Mon, 2011-04-11 at 14:20 -0500, Jonathan Monette wrote: > > Well the case I have in my scripts would be to only run the > reduce on > > the available elements in the array. I am not sure why the > other case > > would be valid. Not doin the reduce on the array because an > outfile > > was not mapped is the same as what swift currently does. The > only > > difference is that instead of causing the swift system to > fail it just > > tries to continue on the execution. > > > > On Apr 11, 2011 2:06 PM, "Mihael Hategan" > wrote: > > > > > > > > > -- > Any intelligent fool can make things bigger and more complex... It > takes a touch of genius - and a lot of courage to move in the opposite > direction. > - Albert Einstein > > > From hategan at mcs.anl.gov Wed Apr 13 15:13:45 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 13 Apr 2011 13:13:45 -0700 Subject: [Swift-devel] Re: [Swift-commit] r4363 - branches In-Reply-To: References: <20110413180228.82DF79CCBB@svn.ci.uchicago.edu> <1662148760.98208.1302718998586.JavaMail.root@zimbra.anl.gov> <1302719425.28436.1.camel@blabla2.none> <1302720825.28949.1.camel@blabla2.none> <1302722076.29045.19.camel@blabla2.none> Message-ID: <1302725625.29432.2.camel@blabla2.none> On Wed, 2011-04-13 at 12:35 -0700, Sarah Kenny wrote: > they should get it directly from tags instead of svn co > https://svn.ci.uchicago.edu/svn/vdl2/branches/release-0.92 swift ? Yes. That's because the branch may change. If we discover another big bug we may have 0.92.2. 
> > On Wed, Apr 13, 2011 at 12:14 PM, Mihael Hategan > wrote: > On Wed, 2011-04-13 at 12:09 -0700, Sarah Kenny wrote: > > alrighty, so, does this seem right to everyone? > > > Almost. The instructions for checking out the source code are > not quite > right. > > > > > http://www.ci.uchicago.edu/~skenny/swift/downloads/index.php > > > > i'm doing a run thru of the site tester now building from > > tags/release-0.92.1 as a sanity test and assuming all goes > well that's > > what i will use to build the binary. > > > > let me know if this sounds ok or if i'm missing anything > here. > > > > On Wed, Apr 13, 2011 at 11:53 AM, Mihael Hategan > > > wrote: > > On Wed, 2011-04-13 at 11:33 -0700, Sarah Kenny > wrote: > > > and should the binary then still be called 0.92 > rather than > > 0.92.1 ? > > > > > > The binary would be called 0.92.1. > > > > > > being that 0.92 had such a substantial flaw and a > warning > > was posted > > > on the site, it seems to me that we'd want to make > it clear > > (to users) > > > that this is a new version/release by naming it > > 0.92.1...does tagging > > > adequately solve this? > > > > > > Tagging doesn't solve that. Tagging is there so that > we can > > have some > > label in SVN that matches the packages that we > release as > > opposed to > > having to track a specific revision in the branch. > > > > > > > > > > > > > From wilde at mcs.anl.gov Wed Apr 13 16:08:34 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 13 Apr 2011 16:08:34 -0500 (CDT) Subject: [Swift-devel] Swift developer meeting today - 4PM CDT In-Reply-To: <1752683330.99235.1302725886878.JavaMail.root@zimbra.anl.gov> Message-ID: <1872170322.99625.1302728914073.JavaMail.root@zimbra.anl.gov> Lets meet on Skype; Ketan and Justin feel free to use my office. 
Agenda:

- completion of 0.92.1
- update on User Guide plan (Justin, Ketan)
- next steps on the test suite
- release and update procedures
- starting on 0.93
- bugzilla review

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From bugzilla-daemon at mcs.anl.gov Wed Apr 13 16:32:24 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Wed, 13 Apr 2011 16:32:24 -0500 (CDT)
Subject: [Swift-devel] [Bug 261] update.sh script (for pushing web content live) gives errors
In-Reply-To:
References:
Message-ID: <20110413213224.E44771C072@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=261

Justin Wozniak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #3 from Justin Wozniak 2011-04-13 16:32:24 ---
Fixed.

From bugzilla-daemon at mcs.anl.gov Wed Apr 13 16:39:18 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Wed, 13 Apr 2011 16:39:18 -0500 (CDT)
Subject: [Swift-devel] [Bug 343] Add support for optional input and output files
In-Reply-To:
References:
Message-ID: <20110413213918.4B7A31C072@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=343

Justin Wozniak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|1.0                         |0.92
         OS/Version|Mac OS                      |All

From bugzilla-daemon at mcs.anl.gov Wed Apr 13 16:48:55 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Wed, 13 Apr 2011 16:48:55 -0500 (CDT)
Subject: [Swift-devel] [Bug 23] exception-like structure
In-Reply-To:
References:
Message-ID: <20110413214855.567C81C073@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=23

Justin Wozniak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wozniak at mcs.anl.gov
   Target Milestone|v0.3                        |UNDEFINED

From bugzilla-daemon at mcs.anl.gov Wed Apr 13 16:50:22 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Wed, 13 Apr 2011 16:50:22 -0500 (CDT)
Subject: [Swift-devel] [Bug 26] implement 'swiftstat'
In-Reply-To:
References:
Message-ID: <20110413215022.BDE061C073@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=26

Justin Wozniak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wozniak at mcs.anl.gov
   Target Milestone|---                         |UNDEFINED

From bugzilla-daemon at mcs.anl.gov Wed Apr 13 16:51:43 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Wed, 13 Apr 2011 16:51:43 -0500 (CDT)
Subject: [Swift-devel] [Bug 29] Staging out of temporary files
In-Reply-To:
References:
Message-ID: <20110413215143.E9C841C072@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=29

Justin Wozniak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wozniak at mcs.anl.gov
   Target Milestone|v0.3                        |UNDEFINED

From bugzilla-daemon at mcs.anl.gov Wed Apr 13 16:51:52 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Wed, 13 Apr 2011 16:51:52 -0500 (CDT)
Subject: [Swift-devel] [Bug 40] source location indication in execution-time error messages
In-Reply-To:
References:
Message-ID: <20110413215152.AA2D71C073@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=40

Justin Wozniak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wozniak at mcs.anl.gov
         AssignedTo|benc at hawaga.org.uk        |wozniak at mcs.anl.gov

From bugzilla-daemon at mcs.anl.gov Wed Apr 13 16:52:52 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Wed, 13 Apr 2011 16:52:52 -0500 (CDT)
Subject: [Swift-devel] [Bug 48] default value handling in MapperParam doesn't happen in all cases
In-Reply-To:
References:
Message-ID: <20110413215252.A53661C073@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=48

Justin Wozniak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wozniak at mcs.anl.gov
            Version|unspecified                 |0.93
         AssignedTo|hategan at mcs.anl.gov       |skenny at uchicago.edu

From bugzilla-daemon at mcs.anl.gov Wed Apr 13 16:53:28 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Wed, 13 Apr 2011 16:53:28 -0500 (CDT)
Subject: [Swift-devel] [Bug 50] ability to add mappers at run (start-of-execution) time
In-Reply-To:
References:
Message-ID: <20110413215328.8CD571C072@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=50

Justin Wozniak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wozniak at mcs.anl.gov
   Target Milestone|---                         |UNDEFINED

From bugzilla-daemon at mcs.anl.gov Wed Apr 13 16:54:45 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Wed, 13 Apr 2011 16:54:45 -0500 (CDT)
Subject: [Swift-devel] [Bug 51] @random function
In-Reply-To:
References:
Message-ID: <20110413215445.A27EC1C072@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=51

Justin Wozniak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wozniak at mcs.anl.gov
         AssignedTo|hategan at mcs.anl.gov       |wozniak at mcs.anl.gov
   Target Milestone|---                         |v1.0

From bugzilla-daemon at mcs.anl.gov Wed Apr 13 16:56:13 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Wed, 13 Apr 2011 16:56:13 -0500 (CDT)
Subject: [Swift-devel] [Bug 55] workflow hangs when accessing uninitialised array member
In-Reply-To:
References:
Message-ID: <20110413215613.D39E41C073@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=55

Justin Wozniak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wozniak at mcs.anl.gov
            Version|unspecified                 |0.93
         AssignedTo|hategan at mcs.anl.gov       |skenny at uchicago.edu

From bugzilla-daemon at mcs.anl.gov Wed Apr 13 16:57:10 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Wed, 13 Apr 2011 16:57:10 -0500 (CDT)
Subject: [Swift-devel] [Bug 61] semantics of [*] and multi-return-values need clarifying
In-Reply-To:
References:
Message-ID: <20110413215710.E4BDD1C072@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=61

Justin Wozniak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wozniak at mcs.anl.gov
            Version|unspecified                 |0.93
         AssignedTo|gabri.turcu at gmail.com     |wozniak at mcs.anl.gov

From bugzilla-daemon at mcs.anl.gov Wed Apr 13 16:58:55 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Wed, 13 Apr 2011 16:58:55 -0500 (CDT)
Subject: [Swift-devel] [Bug 79] execute cleanup jobs through different mechanism to 'bulk' jobs
In-Reply-To:
References:
Message-ID: <20110413215855.3C65E1C073@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=79

Justin Wozniak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wozniak at mcs.anl.gov
            Version|unspecified                 |0.93
         AssignedTo|nobody at mcs.anl.gov        |wozniak at mcs.anl.gov

From bugzilla-daemon at mcs.anl.gov Wed Apr 13 17:09:19 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Wed, 13 Apr 2011 17:09:19 -0500 (CDT)
Subject: [Swift-devel] [Bug 40] source location indication in execution-time error messages
In-Reply-To:
References:
Message-ID: <20110413220919.E90841C073@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=40

Justin Wozniak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #4 from Justin Wozniak 2011-04-13 17:09:19 ---
I think Comment #2 indicates this is fixed.

I do not think this bug is related to Swift array indexing;
IndexOutOfBoundsException was just used as an example of a Swift bug that
could more easily be tracked down by Swift developers if application-level
information was available.

From bugzilla-daemon at mcs.anl.gov Wed Apr 13 17:13:02 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Wed, 13 Apr 2011 17:13:02 -0500 (CDT)
Subject: [Swift-devel] [Bug 319] Set logging level via swift.properties to a few pre-defined levels
In-Reply-To:
References:
Message-ID: <20110413221302.65E0D1C073@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=319

ketan changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ketan at mcs.anl.gov

--- Comment #2 from ketan 2011-04-13 17:13:02 ---
This is required urgently since the large runs ongoing on Beagle are
generating very large log files which are heavyweight for processing.

From benc at hawaga.org.uk Thu Apr 14 04:59:31 2011
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 14 Apr 2011 09:59:31 +0000 (GMT)
Subject: [Swift-devel] CCA-11 Swift Poster Visitor's Questions
In-Reply-To: <35773289-DCA9-42A4-BAF2-A7BB226A5741@gmail.com>
References: <35773289-DCA9-42A4-BAF2-A7BB226A5741@gmail.com>
Message-ID:

> -- The simplicity of script-code seems deceptive!?!

If that's referring to the script in the lower left corner, I guess it
omits a bunch of stuff. Whether that's "deceptive" is rather subjective -
for example, it's omitting the mappings (which at one point were a big
part of the swift story). Now it's probably ok to cough politely and get
away with that for str_roots, but the code as given seems to assign
data_files[0] many times without it being clear from the fragment that
data_files[0] is going to be different each time round (presumably it
is?). Mapping often seems to be a user-effort-sink, so discarding them and
then saying "easy!" is kinda cheeky...

From wilde at mcs.anl.gov Thu Apr 14 05:41:38 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 14 Apr 2011 05:41:38 -0500 (CDT)
Subject: [Swift-devel] CCA-11 Swift Poster Visitor's Questions
In-Reply-To:
Message-ID: <331735507.100928.1302777698194.JavaMail.root@zimbra.anl.gov>

That seems a valid criticism that we should address. In this case the entire script is fairly compact, so we could put the whole script on the poster and just enlarge the highlights shown here (and the mappings as well if desired).

We should move these to a new Case Studies section of the Swift web that we have long proposed.

The format can be a more visually interesting version of the annotated script walkthroughs that we put in the Parco paper. (By the way, at least one NCAR researcher was highly complimentary of the Parco paper and said that it really clarified Swift for her).

- Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From dsk at ci.uchicago.edu Thu Apr 14 05:51:14 2011
From: dsk at ci.uchicago.edu (Daniel S. Katz)
Date: Thu, 14 Apr 2011 06:51:14 -0400
Subject: [Swift-devel] CCA-11 Swift Poster Visitor's Questions
In-Reply-To: <331735507.100928.1302777698194.JavaMail.root@zimbra.anl.gov>
References: <331735507.100928.1302777698194.JavaMail.root@zimbra.anl.gov>
Message-ID:

In the Montage presentation I have, I include the code (from Jon) that runs a simple version of Montage, and one of the wrapper examples, as follows. This seems to me to be sufficiently complete for a talk. I use two slides, one for the code, and one for the wrapper.

import {?}

MosaicData header <"header.hdr">;
Table images_tbl <"images.tbl">;
Image mosaic <"final/mosaic.fits">;
Image projected_images[];

Image raw_image_files[] <... = ".fits">;
projected_images = mProjectBatch( raw_image_files, header );
images_tbl = mImgtbl( projected_images );
mosaic = mAdd( projected_images, images_tbl, header );


( Image proj_imgs[] ) mProjectBatch( Image raw_imgs[], MosaicData hdr )
{
  foreach img, i in raw_imgs
  {
    Image proj_img <... transform = "proj_dir/proj_\\1">;
    proj_img = mProject( img, hdr );
    proj_imgs[ i ] = proj_img;
  }
}

app ( Image proj_img ) mProject( Image raw_img, MosaicData hdr )
{
  mProject "-X" @raw_img @proj_img @hdr;
}

--
Daniel S. Katz
University of Chicago
(773) 834-7186 (voice)
(773) 834-3700 (fax)
d.katz at ieee.org or dsk at ci.uchicago.edu
http://www.ci.uchicago.edu/~dsk/

From wilde at mcs.anl.gov Thu Apr 14 06:04:00 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 14 Apr 2011 06:04:00 -0500 (CDT)
Subject: [Swift-devel] CCA-11 Swift Poster Visitor's Questions
In-Reply-To:
Message-ID: <1613010367.100951.1302779040182.JavaMail.root@zimbra.anl.gov>

That looks good for a talk, Dan. The advantage there is that the speaker can point out the topics and describe the code.

The challenge on a poster is that no one reads a lot of fine print code. So taking code similar to what's on your talk slides, and "magnifying" the main statements to a much larger font, may accomplish the desired effect.

It would be good to continue to hone a set of talks, posters, and the online material so that we get the message across in a way that shows the strengths of Swift. Given enough space, we can show the inevitably messy details of programming real cases, while keeping the details in perspective.

All that said, I want to thank Ketan and Justin immensely for putting the whole CCA11 effort together - abstract, cloud runs, plots, poster, and talk - with very little help from me and in a very short timeframe along with many competing responsibilities. That was very nice work!!!

- Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From dsk at ci.uchicago.edu Thu Apr 14 06:13:27 2011
From: dsk at ci.uchicago.edu (Daniel S. Katz)
Date: Thu, 14 Apr 2011 07:13:27 -0400
Subject: [Swift-devel] CCA-11 Swift Poster Visitor's Questions
In-Reply-To: <1613010367.100951.1302779040182.JavaMail.root@zimbra.anl.gov>
References: <1613010367.100951.1302779040182.JavaMail.root@zimbra.anl.gov>
Message-ID: <7987AF7A-58A0-437B-A764-49AA06716E97@ci.uchicago.edu>

On Apr 14, 2011, at 7:04 AM, Michael Wilde wrote:

> All that said, I want to thank Ketan and Justin immensely for putting
> the whole CCA11 effort together - abstract, cloud runs, plots, poster,
> and talk - with very little help from me and in a very short timeframe
> along with many competing responsibilities. That was very nice work!!!

Yes, I agree!!

--
Daniel S. Katz
University of Chicago
(773) 834-7186 (voice)
(773) 834-3700 (fax)
d.katz at ieee.org or dsk at ci.uchicago.edu
http://www.ci.uchicago.edu/~dsk/

From wilde at mcs.anl.gov Thu Apr 14 09:11:34 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 14 Apr 2011 09:11:34 -0500 (CDT)
Subject: [Swift-devel] Re: Fwd: Proposal for coaster service options
In-Reply-To: <1302289669.1916.1.camel@blabla2.none>
Message-ID: <1970471914.101384.1302790294954.JavaMail.root@zimbra.anl.gov>

----- Original Message -----
> On Fri, 2011-04-08 at 13:56 -0500, Michael Wilde wrote:
> > Great! Can you echo back to the list the (external user visible)
> > specs of what you did, ie the new options and behaviors?
Or is that > > in a -help option? > > That is in the -help option. > > I essentially added -portfile (-S) and -localportfile (-W). When these > are used, the ports will be dynamic. > > The files are written before the "Started service" message is printed > by > the service. I havent tried this yet, but will the file(s) with the contact string be written in the case where the coaster service is running in the swift command's JVM? - Mike > > Mihael -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hategan at mcs.anl.gov Thu Apr 14 13:46:09 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 14 Apr 2011 11:46:09 -0700 Subject: [Swift-devel] Re: Fwd: Proposal for coaster service options In-Reply-To: <1970471914.101384.1302790294954.JavaMail.root@zimbra.anl.gov> References: <1970471914.101384.1302790294954.JavaMail.root@zimbra.anl.gov> Message-ID: <1302806769.333.0.camel@blabla2.none> On Thu, 2011-04-14 at 09:11 -0500, Michael Wilde wrote: > ----- Original Message ----- > > On Fri, 2011-04-08 at 13:56 -0500, Michael Wilde wrote: > > > Great! Can you echo back to the list the (external user visible) > > > specs of what you did, ie the new options and behaviors? Or is that > > > in a -help option? > > > > That is in the -help option. > > > > I essentially added -portfile (-S) and -localportfile (-W). When these > > are used, the ports will be dynamic. > > > > The files are written before the "Started service" message is printed > > by > > the service. > > I havent tried this yet, but will the file(s) with the contact string > be written in the case where the coaster service is running in the > swift command's JVM? No. I can add that. From hategan at mcs.anl.gov Thu Apr 14 14:12:07 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 14 Apr 2011 12:12:07 -0700 Subject: [Swift-devel] Re: Resume not working in 0.92? Please test. 
In-Reply-To: <1302045856.29787.0.camel@blabla2.none> References: <599769624.67355.1302033679071.JavaMail.root@zimbra.anl.gov> <1302045856.29787.0.camel@blabla2.none> Message-ID: <1302808327.651.0.camel@blabla2.none> On Tue, 2011-04-05 at 16:24 -0700, Mihael Hategan wrote: > On Tue, 2011-04-05 at 18:17 -0500, Ketan Maheshwari wrote: > > Mike, > > > > I tested and it seems, resume is broken in the trunk. > > Can't say I'm surprised there. I'll take a look. Has anybody checked 0.92? From hategan at mcs.anl.gov Thu Apr 14 14:28:17 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 14 Apr 2011 12:28:17 -0700 Subject: [Swift-devel] Re: Resume not working in 0.92? Please test. In-Reply-To: <1302808327.651.0.camel@blabla2.none> References: <599769624.67355.1302033679071.JavaMail.root@zimbra.anl.gov> <1302045856.29787.0.camel@blabla2.none> <1302808327.651.0.camel@blabla2.none> Message-ID: <1302809297.4214.0.camel@blabla2.none> Nevermind. Fixed in swift r4374. Mihael On Thu, 2011-04-14 at 12:12 -0700, Mihael Hategan wrote: > On Tue, 2011-04-05 at 16:24 -0700, Mihael Hategan wrote: > > On Tue, 2011-04-05 at 18:17 -0500, Ketan Maheshwari wrote: > > > Mike, > > > > > > I tested and it seems, resume is broken in the trunk. > > > > Can't say I'm surprised there. I'll take a look. > > Has anybody checked 0.92? 

From bugzilla-daemon at mcs.anl.gov Thu Apr 14 15:38:49 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Thu, 14 Apr 2011 15:38:49 -0500 (CDT)
Subject: [Swift-devel] [Bug 357] New: Script hangs in staging on OSG
Message-ID:

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=357

           Summary: Script hangs in staging on OSG
           Product: Swift
           Version: 0.92
          Platform: All
        OS/Version: Linux
            Status: ASSIGNED
          Severity: major
          Priority: P1
         Component: Providers
        AssignedTo: hategan at mcs.anl.gov
        ReportedBy: wilde at mcs.anl.gov
                CC: aespinosa at cs.uchicago.edu


Allan's SCEC script is hanging after several hours of successful execution on
approx. 10 OSG sites. Staging is via the gridftp provider. Execution is via
coasters.

It appears that staging for a single job never completes; then a short time
later all staging hangs.

There is info in recent email threads from Allan to swift-devel with replies
from Mihael on the problem.

This has now happened to 4 multi-hour runs, always after several hours of
execution.

Two attached images show stage-in and stage-out events.

Allan is trying to pin this down to a single transfer that may have triggered
the hang.

We have one log of the hang (1.8GB):

dir: /home/aespinosa/workflows/cybershake/archive-runs/test
-rw-r--r-- 1 aespinosa ci-users 1868896768 Apr 8 14:46 postproc-20110407-1438-i90jepr3.log

From bugzilla-daemon at mcs.anl.gov Thu Apr 14 15:44:02 2011 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Thu, 14 Apr 2011 15:44:02 -0500 (CDT) Subject: [Swift-devel] [Bug 357] Script hangs in staging on OSG In-Reply-To: References: Message-ID: <20110414204402.17C561C073@wind.mcs.anl.gov> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=357 --- Comment #1 from Michael Wilde 2011-04-14 15:44:01 --- Plots of stagein and stageout activity showing the hangs are at: http://www.ci.uchicago.edu/~wilde/dostagein.sorted-start.png http://www.ci.uchicago.edu/~wilde/dostageout.sorted-start.png -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the assignee of the bug. You are watching the reporter. From bugzilla-daemon at mcs.anl.gov Thu Apr 14 16:54:31 2011 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Thu, 14 Apr 2011 16:54:31 -0500 (CDT) Subject: [Swift-devel] [Bug 357] Script hangs in staging on OSG In-Reply-To: References: Message-ID: <20110414215431.0D8541C073@wind.mcs.anl.gov> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=357 --- Comment #2 from Michael Wilde 2011-04-14 16:54:30 --- Log analysis shows: The first 10 transfers that hung were: bri$ grep START stagein.event | sort -n | head 1302218175.538 73795.3809998035 TEST_218_241_subfx.sgt-gridftp.pads.ci.uchicago.edu-UMissHEP__umiss001.hep.olemiss.edu START 1302218175.538 73795.3809998035 TEST_218_241_subfy.sgt-gridftp.pads.ci.uchicago.edu-UMissHEP__umiss001.hep.olemiss.edu START 1302218217.45 73753.4689998627 TEST_218_239_subfy.sgt-gridftp.pads.ci.uchicago.edu-UMissHEP__umiss001.hep.olemiss.edu START 1302218489.313 73481.6059999466 TEST_218_258_subfy.sgt-gridftp.pads.ci.uchicago.edu-UMissHEP__umiss001.hep.olemiss.edu START 1302218489.317 73481.6019999981 218_258.txt.variation-s0003-h0006-gridftp.pads.ci.uchicago.edu-UMissHEP__umiss001.hep.olemiss.edu 
START 1302218489.321 73481.5979998112 218_258.txt.variation-s0003-h0005-gridftp.pads.ci.uchicago.edu-UMissHEP__umiss001.hep.olemiss.edu START 1302218489.325 73481.5939998627 218_258.txt.variation-s0003-h0004-gridftp.pads.ci.uchicago.edu-UMissHEP__umiss001.hep.olemiss.edu START 1302218489.329 73481.5899999142 218_258.txt.variation-s0003-h0003-gridftp.pads.ci.uchicago.edu-UMissHEP__umiss001.hep.olemiss.edu START 1302218489.333 73481.5859999657 218_258.txt.variation-s0004-h0003-gridftp.pads.ci.uchicago.edu-UMissHEP__umiss001.hep.olemiss.edu START 1302218489.341 73481.5779998302 218_258.txt.variation-s0003-h0008-gridftp.pads.ci.uchicago.edu-UMissHEP__umiss001.hep.olemiss.edu START Events involved in the *first* file whose transfer hung (to OLEMiss) were: (note: it *looks* to me like the first transfer of this file to OLEMiss got clobbered by the job getting killed due to the replication timer. After that point, things started hanging. So replication is a suspect in this scenario.) ed *pr3.log 1 2011-04-07 14:39:05,314-0500 DEBUG Loader Max heap: 3817799680 /TEST_218_241_subfx.sgt 2011-04-07 18:12:16,776-0500 DEBUG vdl:execute2 JOB_START jobid=extract-91r7kd8k tr=extract arguments=[stat=TEST, extract_sgt=1, slon=-118.286, slat=34.0192, rupmodfile=gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations/218/241/218_241.txt.variation-s0011-h0005, sgt_xfile=gpfs/pads/swift/aespinosa/science/cybershake/SgtFiles/TEST/TEST_fx_644.sgt, sgt_yfile=gpfs/pads/swift/aespinosa/science/cybershake/SgtFiles/TEST/TEST_fy_644.sgt, extract_sgt_xfile=gpfs/pads/swift/aespinosa/science/cybershake/Results/TEST/218/241/TEST_218_241_subfx.sgt, extract_sgt_yfile=gpfs/pads/swift/aespinosa/science/cybershake/Results/TEST/218/241/TEST_218_241_subfy.sgt] tmpdir=postproc-20110407-1438-i90jepr3/jobs/9/extract-91r7kd8k host=PADS / 2011-04-07 18:12:16,777-0500 INFO Execute Submit: in: postproc-20110407-1438-i90jepr3 command: /bin/bash shared/_swiftwrap extract-91r7kd8k -jobdir 9 -scratch -e 
/gpfs/pads/swift/aespinosa/science/cybershake/apps/JBSim3d/bin/jbsim3d -out stdout.txt -err stderr.txt -i -d gpfs/pads/swift/aespinosa/science/cybershake/SgtFiles/TEST|gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations/218/241|gpfs/pads/swift/aespinosa/science/cybershake/Results/TEST/218/241 -if /gpfs/pads/swift/aespinosa/science/cybershake/SgtFiles/TEST/TEST_fy_644.sgt|/gpfs/pads/swift/aespinosa/science/cybershake/SgtFiles/TEST/TEST_fx_644.sgt|/gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations/218/241/218_241.txt.variation-s0011-h0005 -of gpfs/pads/swift/aespinosa/science/cybershake/Results/TEST/218/241/TEST_218_241_subfy.sgt|gpfs/pads/swift/aespinosa/science/cybershake/Results/TEST/218/241/TEST_218_241_subfx.sgt -k -cdmfile -status provider -a stat=TEST extract_sgt=1 slon=-118.286 slat=34.0192 rupmodfile=gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations/218/241/218_241.txt.variation-s0011-h0005 sgt_xfile=gpfs/pads/swift/aespinosa/science/cybershake/SgtFiles/TEST/TEST_fx_644.sgt sgt_yfile=gpfs/pads/swift/aespinosa/science/cybershake/SgtFiles/TEST/TEST_fy_644.sgt extract_sgt_xfile=gpfs/pads/swift/aespinosa/science/cybershake/Results/TEST/218/241/TEST_218_241_subfx.sgt extract_sgt_yfile=gpfs/pads/swift/aespinosa/science/cybershake/Results/TEST/218/241/TEST_218_241_subfy.sgt / 2011-04-07 18:12:16,777-0500 INFO GridExec TASK_DEFINITION: Task(type=JOB_SUBMISSION, identity=urn:0-13-155-6-1-1302205287944) is /bin/bash shared/_swiftwrap extract-91r7kd8k -jobdir 9 -scratch -e /gpfs/pads/swift/aespinosa/science/cybershake/apps/JBSim3d/bin/jbsim3d -out stdout.txt -err stderr.txt -i -d gpfs/pads/swift/aespinosa/science/cybershake/SgtFiles/TEST|gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations/218/241|gpfs/pads/swift/aespinosa/science/cybershake/Results/TEST/218/241 -if 
/gpfs/pads/swift/aespinosa/science/cybershake/SgtFiles/TEST/TEST_fy_644.sgt|/gpfs/pads/swift/aespinosa/science/cybershake/SgtFiles/TEST/TEST_fx_644.sgt|/gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations/218/241/218_241.txt.variation-s0011-h0005 -of gpfs/pads/swift/aespinosa/science/cybershake/Results/TEST/218/241/TEST_218_241_subfy.sgt|gpfs/pads/swift/aespinosa/science/cybershake/Results/TEST/218/241/TEST_218_241_subfx.sgt -k -cdmfile -status provider -a stat=TEST extract_sgt=1 slon=-118.286 slat=34.0192 rupmodfile=gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations/218/241/218_241.txt.variation-s0011-h0005 sgt_xfile=gpfs/pads/swift/aespinosa/science/cybershake/SgtFiles/TEST/TEST_fx_644.sgt sgt_yfile=gpfs/pads/swift/aespinosa/science/cybershake/SgtFiles/TEST/TEST_fy_644.sgt extract_sgt_xfile=gpfs/pads/swift/aespinosa/science/cybershake/Results/TEST/218/241/TEST_218_241_subfx.sgt extract_sgt_yfile=gpfs/pads/swift/aespinosa/science/cybershake/Results/TEST/218/241/TEST_218_241_subfy.sgt / 2011-04-07 18:13:25,080-0500 DEBUG vdl:dostageout FILE_STAGE_OUT_START srcname=TEST_218_241_subfx.sgt srcdir=postproc-20110407-1438-i90jepr3/shared/gpfs/pads/swift/aespinosa/science/cybershake/Results/TEST/218/241 srchost=PADS destdir=/gpfs/pads/swift/aespinosa/science/cybershake/Results/TEST/218/241 desthost=gridftp.pads.ci.uchicago.edu provider=gsiftp / 2011-04-07 18:16:13,723-0500 DEBUG vdl:dostageout FILE_STAGE_OUT_END srcname=TEST_218_241_subfx.sgt srcdir=postproc-20110407-1438-i90jepr3/shared/gpfs/pads/swift/aespinosa/science/cybershake/Results/TEST/218/241 srchost=PADS destdir=/gpfs/pads/swift/aespinosa/science/cybershake/Results/TEST/218/241 desthost=gridftp.pads.ci.uchicago.edu provider=gsiftp / 2011-04-07 18:16:15,538-0500 DEBUG vdl:dostagein CDM: gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/Results/TEST/218/241/TEST_218_241_subfx.sgt : DEFAULT / 2011-04-07 18:16:15,538-0500 DEBUG vdl:dostageinfile 
FILE_STAGE_IN_START file=TEST_218_241_subfx.sgt srchost=gridftp.pads.ci.uchicago.edu srcdir=/gpfs/pads/swift/aespinosa/science/cybershake/Results/TEST/218/241 srcname=TEST_218_241_subfx.sgt desthost=UMissHEP__umiss001.hep.olemiss.edu destdir=postproc-20110407-1438-i90jepr3/shared/gpfs/pads/swift/aespinosa/science/cybershake/Results/TEST/218/241 provider=gsiftp policy=DEFAULT / 2011-04-07 18:16:15,924-0500 DEBUG vdl:dostagein CDM: gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/Results/TEST/218/241/TEST_218_241_subfx.sgt : DEFAULT / 2011-04-07 18:16:15,924-0500 DEBUG vdl:dostageinfile FILE_STAGE_IN_START file=TEST_218_241_subfx.sgt srchost=gridftp.pads.ci.uchicago.edu srcdir=/gpfs/pads/swift/aespinosa/science/cybershake/Results/TEST/218/241 srcname=TEST_218_241_subfx.sgt desthost=Nebraska__red.unl.edu destdir=postproc-20110407-1438-i90jepr3/shared/gpfs/pads/swift/aespinosa/science/cybershake/Results/TEST/218/241 provider=gsiftp policy=DEFAULT / 2011-04-07 18:28:54,689-0500 DEBUG vdl:dostageinfile FILE_STAGE_IN_END file=TEST_218_241_subfx.sgt srchost=gridftp.pads.ci.uchicago.edu srcdir=/gpfs/pads/swift/aespinosa/science/cybershake/Results/TEST/218/241 srcname=TEST_218_241_subfx.sgt desthost=Nebraska__red.unl.edu destdir=postproc-20110407-1438-i90jepr3/shared/gpfs/pads/swift/aespinosa/science/cybershake/Results/TEST/218/241 provider=gsiftp -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the assignee of the bug. You are watching the reporter. 
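The START/END matching behind the log analysis in comment #2 can be sketched roughly as below. This is a minimal sketch, not the script that was actually used: the function name `unfinished_stageins` is hypothetical, it assumes one event per line in the real Swift log, and it assumes that the pair (file, desthost) identifies a transfer, which would conflate two transfers of the same file to the same site.

```python
import re
import sys
from collections import OrderedDict

# Stage-in events in the form shown in the excerpts from comment #2, e.g.
# "2011-04-07 18:16:15,538-0500 DEBUG vdl:dostageinfile FILE_STAGE_IN_START file=... desthost=..."
EVENT = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\S*\s+DEBUG\s+"
    r"vdl:dostageinfile\s+(?P<kind>FILE_STAGE_IN_(?:START|END))\s+(?P<rest>.*)$"
)
FIELD = re.compile(r"(\w+)=(\S+)")


def unfinished_stageins(lines):
    """Return (timestamp, file, desthost) for each START that has no later END.

    Assumption (not from the thread): (file, desthost) identifies a transfer.
    """
    pending = OrderedDict()
    for line in lines:
        m = EVENT.match(line)
        if not m:
            continue  # not a stage-in event
        fields = dict(FIELD.findall(m.group("rest")))
        key = (fields.get("file"), fields.get("desthost"))
        if m.group("kind").endswith("START"):
            # Remember the first time this transfer was started.
            pending.setdefault(key, m.group("ts"))
        else:
            # An END event closes the matching START, if any.
            pending.pop(key, None)
    return [(ts, name, host) for (name, host), ts in pending.items()]


if __name__ == "__main__":
    # Usage: python unfinished_stageins.py <swift log file>
    with open(sys.argv[1], errors="replace") as log:
        for ts, name, host in unfinished_stageins(log):
            print(ts, name, host)
```

Applied to the three stage-in events quoted in comment #2, this would report only the transfer of TEST_218_241_subfx.sgt to UMissHEP__umiss001.hep.olemiss.edu, since the Nebraska__red.unl.edu transfer has a matching FILE_STAGE_IN_END.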
From bugzilla-daemon at mcs.anl.gov Thu Apr 14 16:55:39 2011 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Thu, 14 Apr 2011 16:55:39 -0500 (CDT) Subject: [Swift-devel] [Bug 357] Script hangs in staging on OSG In-Reply-To: References: Message-ID: <20110414215539.E4BD81C072@wind.mcs.anl.gov> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=357 --- Comment #3 from Allan Espinosa 2011-04-14 16:55:39 --- Note that prior to those events, there are transfers to UMissHEP that succeeded. -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the assignee of the bug. You are watching the reporter. From bugzilla-daemon at mcs.anl.gov Thu Apr 14 17:32:44 2011 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Thu, 14 Apr 2011 17:32:44 -0500 (CDT) Subject: [Swift-devel] [Bug 357] Script hangs in staging on OSG In-Reply-To: References: Message-ID: <20110414223244.BDC681C073@wind.mcs.anl.gov> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=357 --- Comment #4 from Michael Wilde 2011-04-14 17:32:44 --- Please ignore my prior comment about the job getting killed by replication. Allan clarified that the inout file in question is sent to to unique jobs. So the only anomaly thats clear is that the transfer of that file, TEST_218_241_subfx.sgt, to UMiss, never completes. -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the assignee of the bug. You are watching the reporter. 
From bugzilla-daemon at mcs.anl.gov Thu Apr 14 17:37:39 2011 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Thu, 14 Apr 2011 17:37:39 -0500 (CDT) Subject: [Swift-devel] [Bug 357] Script hangs in staging on OSG In-Reply-To: References: Message-ID: <20110414223739.E43C31C073@wind.mcs.anl.gov> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=357 --- Comment #5 from Michael Wilde 2011-04-14 17:37:39 --- Typos in comment 4 made it unintelligible. That comment should have read: Please ignore my prior comment about the job getting killed by replication. Allan clarified that the *input* file in question is sent to *two* unique jobs. So the only anomaly that is clear is that the transfer of that file, TEST_218_241_subfx.sgt, to UMiss, never completes. -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the assignee of the bug. You are watching the reporter. From wilde at mcs.anl.gov Thu Apr 14 19:45:45 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 14 Apr 2011 19:45:45 -0500 (CDT) Subject: [Swift-devel] Re: Please review and advise on: Bug 357 - Script hangs in staging on OSG In-Reply-To: Message-ID: <1404037159.105632.1302828345315.JavaMail.root@zimbra.anl.gov> So you have 4 transfer threads and all 4 are waiting here: at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) (from throttle.transfers=4) Is this workflow hung, and if so, how are you determining that? Do you have another log plot of stagein and stageout? - Mike ----- Original Message ----- > Fresh traces (jstack and log) in > /home/aespinosa/workflows/cybershake/archive-runs/transfer-logging . > The swift log is a snapshot of the workflow that is still running. > > -Allan > > 2011/4/14 Mihael Hategan : > > One immediate question that I have is what's up with the deadline > > passed > > messages? 
> > > > That happens when jobs run for at least twice their advertised > > walltime > > and for some reason the site doesn't seem to cancel them. This may > > be > > indicative of notifications getting lost. > > > > As for the transfers, I don't see all transfers hanging after that. > > I > > mean there are transfers that complete ok. Though things do seem to > > slow > > down quite a bit, so that looks like a problem. > > > > Let's see what in the stack traces. In the mean time, I will see > > what it > > takes to get transfer progress messages. > > > > Mihael > > > > > > On Thu, 2011-04-14 at 17:28 -0500, Michael Wilde wrote: > >> bri$ pwd > >> /home/aespinosa/workflows/cybershake/archive-runs/test > >> bri$ ls -lt > >> total 1844128 > >> -rw-r--r-- 1 aespinosa ci-users 0 Apr 14 14:21 max-duration.tmp > >> -rw-r--r-- 1 aespinosa ci-users 15 Apr 14 14:20 start-time.tmp > >> -rw-r--r-- 1 aespinosa ci-users 1433206 Apr 14 14:20 stagein.event > >> -rw-r--r-- 1 aespinosa ci-users 2372737 Apr 14 14:19 > >> sort-preserve2.tmp > >> -rw-r--r-- 1 aespinosa ci-users 2372737 Apr 14 14:19 > >> sort-preserve.tmp > >> -rw-r--r-- 1 aespinosa ci-users 15 Apr 14 14:19 t.inf > >> -rw-r--r-- 1 aespinosa ci-users 2263727 Apr 14 12:51 > >> stagein.transition > >> -rw-r--r-- 1 aespinosa ci-users 8998897 Apr 14 12:31 stagein.log > >> -rw-r--r-- 1 aespinosa ci-users 92059 Apr 14 12:05 dostageout.event > >> -rw-r--r-- 1 aespinosa ci-users 97442 Apr 14 11:51 dostagein.event > >> -rw-r--r-- 1 aespinosa ci-users 2998 Apr 13 17:38 > >> dostagein.sorted-start.png > >> -rw-r--r-- 1 aespinosa ci-users 3080 Apr 13 17:38 > >> dostageout.sorted-start.png > >> -rw-r--r-- 1 aespinosa ci-users 3255 Apr 8 16:05 execute2-total.png > >> -rw-r--r-- 1 aespinosa ci-users 1533974 Apr 8 14:46 > >> postproc-20110407-1438-i90jepr3.0.rlog > >> -rw-r--r-- 1 aespinosa ci-users 1868896768 Apr 8 14:46 > >> postproc-20110407-1438-i90jepr3.log > >> drwxr-xr-x 2 aespinosa ci-users 32768 Apr 7 14:39 > >> 
postproc-20110407-1438-i90jepr3.d/ > >> bri$ > >> > >> runs, not run > >> > >> ALso see bridled: /tmp/mw1 > >> > >> ----- Original Message ----- > >> > [hategan at bridled tmp]$ cd > >> > ~aespinosa/workflows/cybershake/archive-run/test/ > >> > -bash: cd: > >> > /home/aespinosa/workflows/cybershake/archive-run/test/: No > >> > such file or directory > >> > > >> > On Thu, 2011-04-14 at 17:21 -0500, Allan Espinosa wrote: > >> > > ~aespinosa/workflows/cybershake/archive-run/test/postproc*.log > >> > > > >> > > 2011/4/14 Mihael Hategan : > >> > > > On Thu, 2011-04-14 at 15:57 -0500, Michael Wilde wrote: > >> > > >> While Allan continues to debug this, can you take a look at > >> > > >> the > >> > > >> (huge) log? > >> > > > > >> > > > Where is this log? > >> > > > > >> > > > -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From aespinosa at cs.uchicago.edu Thu Apr 14 19:49:05 2011 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Thu, 14 Apr 2011 19:49:05 -0500 Subject: [Swift-devel] Re: Please review and advise on: Bug 357 - Script hangs in staging on OSG In-Reply-To: <1404037159.105632.1302828345315.JavaMail.root@zimbra.anl.gov> References: <1404037159.105632.1302828345315.JavaMail.root@zimbra.anl.gov> Message-ID: Right now the log only shows messages about AbstractKarajanStreamChannel. I set the org.globus.ftp package's logging level to DEBUG, so entries should appear in the log if transfers are being made. -Allan 2011/4/14 Michael Wilde : > So you have 4 transfer threads and all 4 are waiting here: > > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.read(SocketInputStream.java:129) > > (from throttle.transfers=4) > > Is this workflow hung, and if so, how are you determining that? Do you have another log plot of stagein and out? 
> > - Mike > > > ----- Original Message ----- >> Fresh traces (jstack and log) in >> /home/aespinosa/workflows/cybershake/archive-runs/transfer-logging . >> The swift log is a snapshot of the workflow that is still running. >> >> -Allan >> >> 2011/4/14 Mihael Hategan : >> > One immediate question that I have is what's up with the deadline >> > passed >> > messages? >> > >> > That happens when jobs run for at least twice their advertised >> > walltime >> > and for some reason the site doesn't seem to cancel them. This may >> > be >> > indicative of notifications getting lost. >> > >> > As for the transfers, I don't see all transfers hanging after that. >> > I >> > mean there are transfers that complete ok. Though things do seem to >> > slow >> > down quite a bit, so that looks like a problem. >> > >> > Let's see what in the stack traces. In the mean time, I will see >> > what it >> > takes to get transfer progress messages. >> > >> > Mihael >> > >> > >> > On Thu, 2011-04-14 at 17:28 -0500, Michael Wilde wrote: >> >> bri$ pwd >> >> /home/aespinosa/workflows/cybershake/archive-runs/test >> >> bri$ ls -lt >> >> total 1844128 >> >> -rw-r--r-- 1 aespinosa ci-users 0 Apr 14 14:21 max-duration.tmp >> >> -rw-r--r-- 1 aespinosa ci-users 15 Apr 14 14:20 start-time.tmp >> >> -rw-r--r-- 1 aespinosa ci-users 1433206 Apr 14 14:20 stagein.event >> >> -rw-r--r-- 1 aespinosa ci-users 2372737 Apr 14 14:19 >> >> sort-preserve2.tmp >> >> -rw-r--r-- 1 aespinosa ci-users 2372737 Apr 14 14:19 >> >> sort-preserve.tmp >> >> -rw-r--r-- 1 aespinosa ci-users 15 Apr 14 14:19 t.inf >> >> -rw-r--r-- 1 aespinosa ci-users 2263727 Apr 14 12:51 >> >> stagein.transition >> >> -rw-r--r-- 1 aespinosa ci-users 8998897 Apr 14 12:31 stagein.log >> >> -rw-r--r-- 1 aespinosa ci-users 92059 Apr 14 12:05 dostageout.event >> >> -rw-r--r-- 1 aespinosa ci-users 97442 Apr 14 11:51 dostagein.event >> >> -rw-r--r-- 1 aespinosa ci-users 2998 Apr 13 17:38 >> >> dostagein.sorted-start.png >> >> -rw-r--r-- 1 aespinosa 
ci-users 3080 Apr 13 17:38 >> >> dostageout.sorted-start.png >> >> -rw-r--r-- 1 aespinosa ci-users 3255 Apr 8 16:05 execute2-total.png >> >> -rw-r--r-- 1 aespinosa ci-users 1533974 Apr 8 14:46 >> >> postproc-20110407-1438-i90jepr3.0.rlog >> >> -rw-r--r-- 1 aespinosa ci-users 1868896768 Apr 8 14:46 >> >> postproc-20110407-1438-i90jepr3.log >> >> drwxr-xr-x 2 aespinosa ci-users 32768 Apr 7 14:39 >> >> postproc-20110407-1438-i90jepr3.d/ >> >> bri$ >> >> >> >> runs, not run >> >> >> >> ALso see bridled: /tmp/mw1 >> >> >> >> ----- Original Message ----- >> >> > [hategan at bridled tmp]$ cd >> >> > ~aespinosa/workflows/cybershake/archive-run/test/ >> >> > -bash: cd: >> >> > /home/aespinosa/workflows/cybershake/archive-run/test/: No >> >> > such file or directory >> >> > >> >> > On Thu, 2011-04-14 at 17:21 -0500, Allan Espinosa wrote: >> >> > > ~aespinosa/workflows/cybershake/archive-run/test/postproc*.log >> >> > > >> >> > > 2011/4/14 Mihael Hategan : >> >> > > > On Thu, 2011-04-14 at 15:57 -0500, Michael Wilde wrote: >> >> > > >> While Allan continues to debug this, can you take a look at >> >> > > >> the >> >> > > >> (huge) log? >> >> > > > >> >> > > > Where is this log? >> >> > > > >> >> > > > > From wilde at mcs.anl.gov Thu Apr 14 20:11:18 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 14 Apr 2011 20:11:18 -0500 (CDT) Subject: [Swift-devel] Re: Please review and advise on: Bug 357 - Script hangs in staging on OSG In-Reply-To: Message-ID: <1728872869.105665.1302829878754.JavaMail.root@zimbra.anl.gov> Allan, what I meant was: do you have any evidence that this current run is hung (either in a similar manner to the one we looked at closely this morning, or in a different manner)? In this morning's log, you could tell from plots of stagein and stageout events that many of these events were not completing after something triggered an error. Do you have similar plots or evidence of hangs regarding this run and its log? 
I don't know from browsing the traces if one would *naturally* expect the transfer threads to all be waiting on input sockets most of the time, or if seeing all 4 threads waiting on sockets is indicative of data transfer being totally hung. Mihael, I assume you can tell much more from these traces? - Mike ----- Original Message ----- > Right now the logs only gives out messages about > AbstractKarajanStreamChannel. I set the org.globus.ftp package's > logging level to DEBUG, so entries should be reflected if there are > transfers being made. > > -Allan > > 2011/4/14 Michael Wilde : > > So you have 4 transfer threads and all 4 are waiting here: > > > > at java.net.SocketInputStream.socketRead0(Native Method) > > at java.net.SocketInputStream.read(SocketInputStream.java:129) > > > > (from throttle.transfers=4) > > > > Is this workflow hung, and if so, how are you determining that? Do > > you have another log plot of stagein and out? > > > > - Mike > > > > > > ----- Original Message ----- > >> Fresh traces (jstack and log) in > >> /home/aespinosa/workflows/cybershake/archive-runs/transfer-logging > >> . > >> The swift log is a snapshot of the workflow that is still running. > >> > >> -Allan > >> > >> 2011/4/14 Mihael Hategan : > >> > One immediate question that I have is what's up with the deadline > >> > passed > >> > messages? > >> > > >> > That happens when jobs run for at least twice their advertised > >> > walltime > >> > and for some reason the site doesn't seem to cancel them. This > >> > may > >> > be > >> > indicative of notifications getting lost. > >> > > >> > As for the transfers, I don't see all transfers hanging after > >> > that. > >> > I > >> > mean there are transfers that complete ok. Though things do seem > >> > to > >> > slow > >> > down quite a bit, so that looks like a problem. > >> > > >> > Let's see what in the stack traces. In the mean time, I will see > >> > what it > >> > takes to get transfer progress messages. 
> >> > > >> > Mihael > >> > > >> > > >> > On Thu, 2011-04-14 at 17:28 -0500, Michael Wilde wrote: > >> >> bri$ pwd > >> >> /home/aespinosa/workflows/cybershake/archive-runs/test > >> >> bri$ ls -lt > >> >> total 1844128 > >> >> -rw-r--r-- 1 aespinosa ci-users 0 Apr 14 14:21 max-duration.tmp > >> >> -rw-r--r-- 1 aespinosa ci-users 15 Apr 14 14:20 start-time.tmp > >> >> -rw-r--r-- 1 aespinosa ci-users 1433206 Apr 14 14:20 > >> >> stagein.event > >> >> -rw-r--r-- 1 aespinosa ci-users 2372737 Apr 14 14:19 > >> >> sort-preserve2.tmp > >> >> -rw-r--r-- 1 aespinosa ci-users 2372737 Apr 14 14:19 > >> >> sort-preserve.tmp > >> >> -rw-r--r-- 1 aespinosa ci-users 15 Apr 14 14:19 t.inf > >> >> -rw-r--r-- 1 aespinosa ci-users 2263727 Apr 14 12:51 > >> >> stagein.transition > >> >> -rw-r--r-- 1 aespinosa ci-users 8998897 Apr 14 12:31 stagein.log > >> >> -rw-r--r-- 1 aespinosa ci-users 92059 Apr 14 12:05 > >> >> dostageout.event > >> >> -rw-r--r-- 1 aespinosa ci-users 97442 Apr 14 11:51 > >> >> dostagein.event > >> >> -rw-r--r-- 1 aespinosa ci-users 2998 Apr 13 17:38 > >> >> dostagein.sorted-start.png > >> >> -rw-r--r-- 1 aespinosa ci-users 3080 Apr 13 17:38 > >> >> dostageout.sorted-start.png > >> >> -rw-r--r-- 1 aespinosa ci-users 3255 Apr 8 16:05 > >> >> execute2-total.png > >> >> -rw-r--r-- 1 aespinosa ci-users 1533974 Apr 8 14:46 > >> >> postproc-20110407-1438-i90jepr3.0.rlog > >> >> -rw-r--r-- 1 aespinosa ci-users 1868896768 Apr 8 14:46 > >> >> postproc-20110407-1438-i90jepr3.log > >> >> drwxr-xr-x 2 aespinosa ci-users 32768 Apr 7 14:39 > >> >> postproc-20110407-1438-i90jepr3.d/ > >> >> bri$ > >> >> > >> >> runs, not run > >> >> > >> >> ALso see bridled: /tmp/mw1 > >> >> > >> >> ----- Original Message ----- > >> >> > [hategan at bridled tmp]$ cd > >> >> > ~aespinosa/workflows/cybershake/archive-run/test/ > >> >> > -bash: cd: > >> >> > /home/aespinosa/workflows/cybershake/archive-run/test/: No > >> >> > such file or directory > >> >> > > >> >> > On Thu, 2011-04-14 at 17:21 
-0500, Allan Espinosa wrote: > >> >> > > ~aespinosa/workflows/cybershake/archive-run/test/postproc*.log > >> >> > > > >> >> > > 2011/4/14 Mihael Hategan : > >> >> > > > On Thu, 2011-04-14 at 15:57 -0500, Michael Wilde wrote: > >> >> > > >> While Allan continues to debug this, can you take a look > >> >> > > >> at > >> >> > > >> the > >> >> > > >> (huge) log? > >> >> > > > > >> >> > > > Where is this log? > >> >> > > > > >> >> > > > > > -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wilde at mcs.anl.gov Thu Apr 14 20:20:10 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 14 Apr 2011 20:20:10 -0500 (CDT) Subject: [Swift-devel] Re: Please review and advise on: Bug 357 - Script hangs in staging on OSG In-Reply-To: <1728872869.105665.1302829878754.JavaMail.root@zimbra.anl.gov> Message-ID: <64037310.105677.1302830410568.JavaMail.root@zimbra.anl.gov> In other words, does it show the behavior of the attached plot? - Mike ----- Original Message ----- > ALlan, what I meant was: do you have any evidence that this current > run is hung (either in a similar manner to the one we looked at > closely this morning, or in a different manner)? > > In this mornings log, you could tell from plots of stagein and > stageout events that many these events were not completing after > something triggered an error. > > Do you have similar plots or evidence of hangs regarding this run and > its log? > > I dont know from browsing the traces if one would *naturally* expect > the transfer threads to all be waiting on input sockets most of the > time, or if seeing all 4 threads waiting on sockets is indicative of > data transfer being totally hung. > > Mihael, I assume you can tell much more from these traces? > > - Mike > > > ----- Original Message ----- > > Right now the logs only gives out messages about > > AbstractKarajanStreamChannel. 
I set the org.globus.ftp package's > > logging level to DEBUG, so entries should be reflected if there are > > transfers being made. > > > > -Allan > > > > 2011/4/14 Michael Wilde : > > > So you have 4 transfer threads and all 4 are waiting here: > > > > > > at java.net.SocketInputStream.socketRead0(Native Method) > > > at java.net.SocketInputStream.read(SocketInputStream.java:129) > > > > > > (from throttle.transfers=4) > > > > > > Is this workflow hung, and if so, how are you determining that? Do > > > you have another log plot of stagein and out? > > > > > > - Mike > > > > > > > > > ----- Original Message ----- > > >> Fresh traces (jstack and log) in > > >> /home/aespinosa/workflows/cybershake/archive-runs/transfer-logging > > >> . > > >> The swift log is a snapshot of the workflow that is still > > >> running. > > >> > > >> -Allan > > >> > > >> 2011/4/14 Mihael Hategan : > > >> > One immediate question that I have is what's up with the > > >> > deadline > > >> > passed > > >> > messages? > > >> > > > >> > That happens when jobs run for at least twice their advertised > > >> > walltime > > >> > and for some reason the site doesn't seem to cancel them. This > > >> > may > > >> > be > > >> > indicative of notifications getting lost. > > >> > > > >> > As for the transfers, I don't see all transfers hanging after > > >> > that. > > >> > I > > >> > mean there are transfers that complete ok. Though things do > > >> > seem > > >> > to > > >> > slow > > >> > down quite a bit, so that looks like a problem. > > >> > > > >> > Let's see what in the stack traces. In the mean time, I will > > >> > see > > >> > what it > > >> > takes to get transfer progress messages. 
> > >> > > > >> > Mihael > > >> > > > >> > > > >> > On Thu, 2011-04-14 at 17:28 -0500, Michael Wilde wrote: > > >> >> bri$ pwd > > >> >> /home/aespinosa/workflows/cybershake/archive-runs/test > > >> >> bri$ ls -lt > > >> >> total 1844128 > > >> >> -rw-r--r-- 1 aespinosa ci-users 0 Apr 14 14:21 > > >> >> max-duration.tmp > > >> >> -rw-r--r-- 1 aespinosa ci-users 15 Apr 14 14:20 start-time.tmp > > >> >> -rw-r--r-- 1 aespinosa ci-users 1433206 Apr 14 14:20 > > >> >> stagein.event > > >> >> -rw-r--r-- 1 aespinosa ci-users 2372737 Apr 14 14:19 > > >> >> sort-preserve2.tmp > > >> >> -rw-r--r-- 1 aespinosa ci-users 2372737 Apr 14 14:19 > > >> >> sort-preserve.tmp > > >> >> -rw-r--r-- 1 aespinosa ci-users 15 Apr 14 14:19 t.inf > > >> >> -rw-r--r-- 1 aespinosa ci-users 2263727 Apr 14 12:51 > > >> >> stagein.transition > > >> >> -rw-r--r-- 1 aespinosa ci-users 8998897 Apr 14 12:31 > > >> >> stagein.log > > >> >> -rw-r--r-- 1 aespinosa ci-users 92059 Apr 14 12:05 > > >> >> dostageout.event > > >> >> -rw-r--r-- 1 aespinosa ci-users 97442 Apr 14 11:51 > > >> >> dostagein.event > > >> >> -rw-r--r-- 1 aespinosa ci-users 2998 Apr 13 17:38 > > >> >> dostagein.sorted-start.png > > >> >> -rw-r--r-- 1 aespinosa ci-users 3080 Apr 13 17:38 > > >> >> dostageout.sorted-start.png > > >> >> -rw-r--r-- 1 aespinosa ci-users 3255 Apr 8 16:05 > > >> >> execute2-total.png > > >> >> -rw-r--r-- 1 aespinosa ci-users 1533974 Apr 8 14:46 > > >> >> postproc-20110407-1438-i90jepr3.0.rlog > > >> >> -rw-r--r-- 1 aespinosa ci-users 1868896768 Apr 8 14:46 > > >> >> postproc-20110407-1438-i90jepr3.log > > >> >> drwxr-xr-x 2 aespinosa ci-users 32768 Apr 7 14:39 > > >> >> postproc-20110407-1438-i90jepr3.d/ > > >> >> bri$ > > >> >> > > >> >> runs, not run > > >> >> > > >> >> ALso see bridled: /tmp/mw1 > > >> >> > > >> >> ----- Original Message ----- > > >> >> > [hategan at bridled tmp]$ cd > > >> >> > ~aespinosa/workflows/cybershake/archive-run/test/ > > >> >> > -bash: cd: > > >> >> > 
/home/aespinosa/workflows/cybershake/archive-run/test/: No > > >> >> > such file or directory > > >> >> > > > >> >> > On Thu, 2011-04-14 at 17:21 -0500, Allan Espinosa wrote: > > >> >> > > ~aespinosa/workflows/cybershake/archive-run/test/postproc*.log > > >> >> > > > > >> >> > > 2011/4/14 Mihael Hategan : > > >> >> > > > On Thu, 2011-04-14 at 15:57 -0500, Michael Wilde wrote: > > >> >> > > >> While Allan continues to debug this, can you take a > > >> >> > > >> look > > >> >> > > >> at > > >> >> > > >> the > > >> >> > > >> (huge) log? > > >> >> > > > > > >> >> > > > Where is this log? > > >> >> > > > > > >> >> > > > > > > > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: dostagein.sorted-start.png Type: image/png Size: 2998 bytes Desc: not available URL: From hategan at mcs.anl.gov Thu Apr 14 20:51:16 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 14 Apr 2011 18:51:16 -0700 Subject: [Swift-devel] Re: Please review and advise on: Bug 357 - Script hangs in staging on OSG In-Reply-To: <1404037159.105632.1302828345315.JavaMail.root@zimbra.anl.gov> References: <1404037159.105632.1302828345315.JavaMail.root@zimbra.anl.gov> Message-ID: <1302832276.13385.2.camel@blabla2.none> Well, that's barely hung unless the gridftp servers are hung, which may be. I would suggest upping the transfer throttle in this case. 4 may be cutting it too close. Maybe to 16. 
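The suggested throttle change could look roughly like the fragment below. This is a sketch under assumptions: the property name is the one quoted in this thread (throttle.transfers=4), while the idea that this run reads it from a swift.properties file is an assumption and not something stated in the thread.

```
# swift.properties (sketch; which properties file this run uses is an assumption)
# Raise the limit on concurrent file transfers from 4 to 16, as suggested above.
throttle.transfers=16
```

If all 16 transfer threads then also end up blocked in socketRead0, that would point more toward the gridftp servers than toward the throttle setting, though the traces alone may not be enough to tell.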
On Thu, 2011-04-14 at 19:45 -0500, Michael Wilde wrote: > So you have 4 transfer threads and all 4 are waiting here: > > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.read(SocketInputStream.java:129) > > (from throttle.transfers=4) > > Is this workflow hung, and if so, how are you determining that? Do you have another log plot of stagein and out? > > - Mike > > > ----- Original Message ----- > > Fresh traces (jstack and log) in > > /home/aespinosa/workflows/cybershake/archive-runs/transfer-logging . > > The swift log is a snapshot of the workflow that is still running. > > > > -Allan > > > > 2011/4/14 Mihael Hategan : > > > One immediate question that I have is what's up with the deadline > > > passed > > > messages? > > > > > > That happens when jobs run for at least twice their advertised > > > walltime > > > and for some reason the site doesn't seem to cancel them. This may > > > be > > > indicative of notifications getting lost. > > > > > > As for the transfers, I don't see all transfers hanging after that. > > > I > > > mean there are transfers that complete ok. Though things do seem to > > > slow > > > down quite a bit, so that looks like a problem. > > > > > > Let's see what in the stack traces. In the mean time, I will see > > > what it > > > takes to get transfer progress messages. 
> > > > > > Mihael > > > > > > > > > On Thu, 2011-04-14 at 17:28 -0500, Michael Wilde wrote: > > >> bri$ pwd > > >> /home/aespinosa/workflows/cybershake/archive-runs/test > > >> bri$ ls -lt > > >> total 1844128 > > >> -rw-r--r-- 1 aespinosa ci-users 0 Apr 14 14:21 max-duration.tmp > > >> -rw-r--r-- 1 aespinosa ci-users 15 Apr 14 14:20 start-time.tmp > > >> -rw-r--r-- 1 aespinosa ci-users 1433206 Apr 14 14:20 stagein.event > > >> -rw-r--r-- 1 aespinosa ci-users 2372737 Apr 14 14:19 > > >> sort-preserve2.tmp > > >> -rw-r--r-- 1 aespinosa ci-users 2372737 Apr 14 14:19 > > >> sort-preserve.tmp > > >> -rw-r--r-- 1 aespinosa ci-users 15 Apr 14 14:19 t.inf > > >> -rw-r--r-- 1 aespinosa ci-users 2263727 Apr 14 12:51 > > >> stagein.transition > > >> -rw-r--r-- 1 aespinosa ci-users 8998897 Apr 14 12:31 stagein.log > > >> -rw-r--r-- 1 aespinosa ci-users 92059 Apr 14 12:05 dostageout.event > > >> -rw-r--r-- 1 aespinosa ci-users 97442 Apr 14 11:51 dostagein.event > > >> -rw-r--r-- 1 aespinosa ci-users 2998 Apr 13 17:38 > > >> dostagein.sorted-start.png > > >> -rw-r--r-- 1 aespinosa ci-users 3080 Apr 13 17:38 > > >> dostageout.sorted-start.png > > >> -rw-r--r-- 1 aespinosa ci-users 3255 Apr 8 16:05 execute2-total.png > > >> -rw-r--r-- 1 aespinosa ci-users 1533974 Apr 8 14:46 > > >> postproc-20110407-1438-i90jepr3.0.rlog > > >> -rw-r--r-- 1 aespinosa ci-users 1868896768 Apr 8 14:46 > > >> postproc-20110407-1438-i90jepr3.log > > >> drwxr-xr-x 2 aespinosa ci-users 32768 Apr 7 14:39 > > >> postproc-20110407-1438-i90jepr3.d/ > > >> bri$ > > >> > > >> runs, not run > > >> > > >> ALso see bridled: /tmp/mw1 > > >> > > >> ----- Original Message ----- > > >> > [hategan at bridled tmp]$ cd > > >> > ~aespinosa/workflows/cybershake/archive-run/test/ > > >> > -bash: cd: > > >> > /home/aespinosa/workflows/cybershake/archive-run/test/: No > > >> > such file or directory > > >> > > > >> > On Thu, 2011-04-14 at 17:21 -0500, Allan Espinosa wrote: > > >> > > 
~aespinosa/workflows/cybershake/archive-run/test/postproc*.log > > >> > > > > >> > > 2011/4/14 Mihael Hategan : > > >> > > > On Thu, 2011-04-14 at 15:57 -0500, Michael Wilde wrote: > > >> > > >> While Allan continues to debug this, can you take a look at > > >> > > >> the > > >> > > >> (huge) log? > > >> > > > > > >> > > > Where is this log? > > >> > > > > > >> > > > > From hategan at mcs.anl.gov Thu Apr 14 20:53:10 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 14 Apr 2011 18:53:10 -0700 Subject: [Swift-devel] Re: Please review and advise on: Bug 357 - Script hangs in staging on OSG In-Reply-To: <1728872869.105665.1302829878754.JavaMail.root@zimbra.anl.gov> References: <1728872869.105665.1302829878754.JavaMail.root@zimbra.anl.gov> Message-ID: <1302832390.13385.4.camel@blabla2.none> On Thu, 2011-04-14 at 20:11 -0500, Michael Wilde wrote: > ALlan, what I meant was: do you have any evidence that this current run is hung (either in a similar manner to the one we looked at closely this morning, or in a different manner)? > > In this mornings log, you could tell from plots of stagein and stageout events that many these events were not completing after something triggered an error. > > Do you have similar plots or evidence of hangs regarding this run and its log? > > I dont know from browsing the traces if one would *naturally* expect > the transfer threads to all be waiting on input sockets most of the > time, or if seeing all 4 threads waiting on sockets is indicative of > data transfer being totally hung. If nothing else happens in the log, then probably so. But the same could happen for very large files (or very slow servers). From jonmon at utexas.edu Fri Apr 15 07:02:54 2011 From: jonmon at utexas.edu (Jonathan S Monette) Date: Fri, 15 Apr 2011 07:02:54 -0500 Subject: [Swift-devel] link broken Message-ID: Hello, The website has a link broken. 
If you go to the documentation page, click on the multi-page HTML Swift User Guide link (for both release-0.91 and trunk), and then click on any of the navigation links, you get a "page not found" error. Also, the image at the top of the page does not seem to load.

From bugzilla-daemon at mcs.anl.gov Fri Apr 15 08:28:36 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 15 Apr 2011 08:28:36 -0500 (CDT)
Subject: [Swift-devel] [Bug 357] Script hangs in staging on OSG
In-Reply-To:
References:
Message-ID: <20110415132836.155401C073@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=357

--- Comment #6 from Michael Wilde 2011-04-15 08:28:35 ---
This problem may be explained by the following:

- each site requires a large number of files transferred in (60+) and a small number out (<4?)

- some sites may hang on transfers, especially small and/or overloaded sites

- we have only 4 transfer threads here

- if all the transfer threads are hung on requests (e.g. socket operations) that hang, then all Swift data transfer after that point hangs. Ideally these operations should be run with a timer that enables the operation to be aborted and the transfer thread returned to use. Even better, all socket operations should be select-driven and non-blocking. (I thought they were..)

- Theory: one or more small overloaded sites - e.g. UMiss in the example of the first log filed in this ticket - are hanging all the transfer threads

==> Proposed temporary solution:
(a) use more transfer threads: 16 or 32?;
(b) possibly batch up the small files into a single tarball so that we use fewer threads per site and thus hung sites hang fewer threads;
(c) avoid sites where we are seeing hangs.
(d) create a script to analyze a current run's log and spot any hanging IO requests, identifying the files and sites involved. Use this to spot and remove hanging sites.
(e) Mihael to improve Swift's robustness in this area by timing out hung requests and causing the appropriate higher level of recovery to kick in.

====

Some messages from the related email thread on this bug are pasted below:

----- Forwarded Message -----
From: "Mihael Hategan"
To: "Michael Wilde"
Cc: "Allan Espinosa", "Daniel Katz", "Swift Devel"
Sent: Thursday, April 14, 2011 8:51:16 PM
Subject: Re: Please review and advise on: Bug 357 - Script hangs in staging on OSG

Well, that's barely hung unless the gridftp servers are hung, which may be.

I would suggest upping the transfer throttle in this case. 4 may be cutting it too close. Maybe to 16.

On Thu, 2011-04-14 at 19:45 -0500, Michael Wilde wrote:
> So you have 4 transfer threads and all 4 are waiting here:
>
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:129)
>
> (from throttle.transfers=4)
>
> Is this workflow hung, and if so, how are you determining that? Do you have another log plot of stagein and out?
>
> - Mike
>
>

----- Forwarded Message -----
From: "Mihael Hategan"
To: "Michael Wilde"
Cc: "Allan Espinosa", "Daniel Katz", "Swift Devel"
Sent: Thursday, April 14, 2011 8:53:10 PM
Subject: Re: Please review and advise on: Bug 357 - Script hangs in staging on OSG

On Thu, 2011-04-14 at 20:11 -0500, Michael Wilde wrote:
> Allan, what I meant was: do you have any evidence that this current run is hung (either in a similar manner to the one we looked at closely this morning, or in a different manner)?
>
> In this morning's log, you could tell from plots of stagein and stageout events that many of these events were not completing after something triggered an error.
>
> Do you have similar plots or evidence of hangs regarding this run and its log?
> > I dont know from browsing the traces if one would *naturally* expect > the transfer threads to all be waiting on input sockets most of the > time, or if seeing all 4 threads waiting on sockets is indicative of > data transfer being totally hung. If nothing else happens in the log, then probably so. But the same could happen for very large files (or very slow servers). -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the assignee of the bug. You are watching the reporter. From wilde at mcs.anl.gov Fri Apr 15 08:31:31 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 15 Apr 2011 08:31:31 -0500 (CDT) Subject: [Swift-devel] Re: Please review and advise on: Bug 357 - Script hangs in staging on OSG In-Reply-To: <1302832390.13385.4.camel@blabla2.none> Message-ID: <1422501356.106327.1302874291455.JavaMail.root@zimbra.anl.gov> I proposed the following in bugzilla (Dan, are you getting these? If so I wont forward any more and will assume that when interested you'll read the bugzilla discussions...) ----- Forwarded Message ----- From: bugzilla-daemon at mcs.anl.gov To: wilde at mcs.anl.gov Sent: Friday, April 15, 2011 8:28:36 AM Subject: [Bug 357] Script hangs in staging on OSG https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=357 --- Comment #6 from Michael Wilde 2011-04-15 08:28:35 --- This problem may be explained by the following: - each site requires a some large number of files tranferred in (60+) and some small number out (<4?) - some sites may hang on transfers, especially small and/or overloaded sites - we have only 4 transfer threads here - if all the transfer threads are hung on requests (eq socket operations) that hang, then all Swift data transfer after that point hangs. 
Ideally these operations should be run with a timer that enables the operation to be aborted and the transfer thread returned to use. EVen better, all socket operations should be select-driven and non-blocking. (I thought they were..) - Theory: one or more small overloaded sites - eg UMiss in the example of the first log filed in this ticket - are hanging all the transfer threads ==> Proposed temporary solution: (a) use more transfer threads: 16 or 32?; (b) possibly batch up the small files into a single tarball so that we use less threads per site and thus hung sites hang less threads; (c) avoid sites where we are seeing hangs. (d) create a script to analyze a current run's log and spot any hanging IO requests, identifying the files and sites involved. Use this to spot and remove hanging sites. (e) Mihael to improve Swift's robustness in this area by timeout out hung requests and causing the appropriate higher level of recovery to kick in. ==== Some messages from the related email thread on this bug are pasted below: ----- Forwarded Message ----- From: "Mihael Hategan" To: "Michael Wilde" Cc: "Allan Espinosa" , "Daniel Katz" , "Swift Devel" Sent: Thursday, April 14, 2011 8:51:16 PM Subject: Re: Please review and advise on: Bug 357 - Script hangs in staging on OSG Well, that's barely hung unless the gridftp servers are hung, which may be. I would suggest upping the transfer throttle in this case. 4 may be cutting it too close. Maybe to 16. On Thu, 2011-04-14 at 19:45 -0500, Michael Wilde wrote: > So you have 4 transfer threads and all 4 are waiting here: > > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.read(SocketInputStream.java:129) > > (from throttle.transfers=4) > > Is this workflow hung, and if so, how are you determining that? Do you have another log plot of stagein and out? 
> > - Mike > > ----- Forwarded Message ----- From: "Mihael Hategan" To: "Michael Wilde" Cc: "Allan Espinosa" , "Daniel Katz" , "Swift Devel" Sent: Thursday, April 14, 2011 8:53:10 PM Subject: Re: Please review and advise on: Bug 357 - Script hangs in staging on OSG On Thu, 2011-04-14 at 20:11 -0500, Michael Wilde wrote: > ALlan, what I meant was: do you have any evidence that this current run is hung (either in a similar manner to the one we looked at closely this morning, or in a different manner)? > > In this mornings log, you could tell from plots of stagein and stageout events that many these events were not completing after something triggered an error. > > Do you have similar plots or evidence of hangs regarding this run and its log? > > I dont know from browsing the traces if one would *naturally* expect > the transfer threads to all be waiting on input sockets most of the > time, or if seeing all 4 threads waiting on sockets is indicative of > data transfer being totally hung. If nothing else happens in the log, then probably so. But the same could happen for very large files (or very slow servers). -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug. -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory ----- Original Message ----- > On Thu, 2011-04-14 at 20:11 -0500, Michael Wilde wrote: > > ALlan, what I meant was: do you have any evidence that this current > > run is hung (either in a similar manner to the one we looked at > > closely this morning, or in a different manner)? > > > > In this mornings log, you could tell from plots of stagein and > > stageout events that many these events were not completing after > > something triggered an error. 
> >
> > Do you have similar plots or evidence of hangs regarding this run
> > and its log?
> >
> > I don't know from browsing the traces if one would *naturally* expect
> > the transfer threads to all be waiting on input sockets most of the
> > time, or if seeing all 4 threads waiting on sockets is indicative of
> > data transfer being totally hung.
>
> If nothing else happens in the log, then probably so. But the same
> could happen for very large files (or very slow servers).

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wilde at mcs.anl.gov Fri Apr 15 08:44:51 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Fri, 15 Apr 2011 08:44:51 -0500 (CDT)
Subject: [Swift-devel] Eliminate duplicate messages from bugzilla?
Message-ID: <1619840190.106375.1302875091712.JavaMail.root@zimbra.anl.gov>

Many of us are getting duplicate messages from bugzilla. It's sending to us directly, and again via swift-devel.

I think we should use the same approach as with swift-commit: subscribe to it if you're interested. Note that much technical discussion will now move to bugzilla threads from swift-devel.

I will try to remove the swift-devel post from bugzilla. Please let me know if there is any reason not to do this.

- Mike

From wilde at mcs.anl.gov Fri Apr 15 08:51:01 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Fri, 15 Apr 2011 08:51:01 -0500 (CDT)
Subject: [Swift-devel] Eliminate duplicate messages from bugzilla?
In-Reply-To: <1619840190.106375.1302875091712.JavaMail.root@zimbra.anl.gov>
Message-ID: <916535522.106391.1302875461152.JavaMail.root@zimbra.anl.gov>

I've left swift-devel as a bugzilla user but disabled email to it:

The following changes have been made to the user account swift-devel at ci.uchicago.edu:
Bugmail has been disabled.
The disable text has been modified.

I noticed in the process that there seem to be many junk accounts.
I propose to clean them out if no one objects, and just leave real people on the list. - Mike ----- Original Message ----- > Many of us are getting duplicate messages from bugzilla. Its sending > to us directly, and again via swift-devel. > > I think we should use same approach as with swift-commit: subscribe to > it of you're interested. Note that much technical discussion will now > move to bugzilla threads from swift-devel. > > I will try to remove the swift-devel post from bugzilla. Please let me > know if there is any reason not to do this. > > - Mike > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wilde at mcs.anl.gov Fri Apr 15 12:03:55 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 15 Apr 2011 12:03:55 -0500 (CDT) Subject: [Swift-devel] Updating web site with latest release number In-Reply-To: Message-ID: <1908269913.107552.1302887035392.JavaMail.root@zimbra.anl.gov> I updated the right-sidebar today to say 0.92.1 instead of 0.91. Still to do: update the doc page to say that the 0.91-UserGuide version is latest and will be updated in the future. 
- Mike ----- Forwarded Message ----- From: "Sarah Kenny" To: "Swift User" Sent: Wednesday, April 13, 2011 5:14:39 PM Subject: [Swift-user] Swift Release 0.92.1 ready for use Hi all, We are happy to announce that the latest version of Swift (containing a fix for the recently-discovered bug in version 0.92) is now available for download here: http://www.ci.uchicago.edu/swift/downloads/index.php Enjoy :) -The Swift Development Team _______________________________________________ Swift-user mailing list Swift-user at ci.uchicago.edu http://mail.ci.uchicago.edu/mailman/listinfo/swift-user -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: From aespinosa at cs.uchicago.edu Fri Apr 15 12:14:29 2011 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Fri, 15 Apr 2011 12:14:29 -0500 Subject: [Swift-devel] Re: Please review and advise on: Bug 357 - Script hangs in staging on OSG In-Reply-To: <1728872869.105665.1302829878754.JavaMail.root@zimbra.anl.gov> References: <1728872869.105665.1302829878754.JavaMail.root@zimbra.anl.gov> Message-ID: Hi Mike, here's the plot of the stagein events from yesterday's showing "almost" the same behavior. I attached the raw data and an svg version of the plot as well. 
(The plots were produced with R instead of the regular swift-log-processing tools.)

Also, looking at the transfers that did not finish (the ones with "START" in the last column of the raw data), I tested out two gridftp servers today:

Ranger to top.ucr.edu

$ globus-url-copy -vb gsiftp://gridftp.ranger.tacc.teragrid.org/scratch/projeca3d/CyberShake2007/ruptures/RuptureVariations_35_V2_3/218/260/218_260.txt.variation-s0000-h0000 gsiftp://top.ucr.edu/data/down/osg_data/engage/scec/testfile
Source: gsiftp://gridftp.ranger.tacc.teragrid.org/scratch/projects/tg/tera3d/CyberShake2007/ruptures/RuptureVariations_35_V2_3/218/260/
Dest: gsiftp://top.ucr.edu/data/down/osg_data/engage/scec/
218_260.txt.variation-s0000-h0000 -> testfile

So this site looks like it's working alright. Here's the last activity about top.ucr.edu in the log:

2011-04-14 17:50:20,975-0500 DEBUG Reply read 1st line
2011-04-14 17:50:20,976-0500 DEBUG vdl:dostageout FILE_STAGE_OUT_START srcname=PeakVals_TEST_219_258_32.bsa srcdir=postproc-20110414-1604-bzxryead/shared/scratch/01035/tg802895/science/cybershake/Results/TEST/219/258 srchost=UCR-HEP__top.ucr.edu destdir=/scratch/01035/tg802895/science/cybershake/Results/TEST/219/258 desthost=gridftp.ranger.tacc.teragrid.org provider=gsiftp
2011-04-14 17:50:20,978-0500 DEBUG vdl:dostageout FILE_STAGE_OUT_START srcname=Seismogram_TEST_219_258_35.grm srcdir=postproc-20110414-1604-bzxryead/shared/scratch/01035/tg802895/science/cybershake/Results/TEST/219/258 srchost=UCR-HEP__top.ucr.edu destdir=/scratch/01035/tg802895/science/cybershake/Results/TEST/219/258 desthost=gridftp.ranger.tacc.teragrid.org provider=gsiftp
----comment----- I am not sure if these gsiftp ABORT calls are with the top.ucr.edu site ---comment------
2011-04-14 17:50:21,006-0500 DEBUG Reply 1st line: 350 OK. Send RNTO with destination name.
2011-04-14 17:50:21,006-0500 DEBUG FTPControlChannel Control channel received: 350 OK. Send RNTO with destination name.
2011-04-14 17:50:21,006-0500 DEBUG FTPControlChannel Control channel sending: ABOR
2011-04-14 17:50:21,006-0500 DEBUG Reply read 1st line
2011-04-14 17:50:21,006-0500 DEBUG Reply 1st line: 350 OK. Send RNTO with destination name.
2011-04-14 17:50:21,006-0500 DEBUG FTPControlChannel Control channel received: 350 OK. Send RNTO with destination name.
2011-04-14 17:50:21,006-0500 DEBUG FTPControlChannel Control channel sending: ABOR
2011-04-14 17:50:21,006-0500 DEBUG Reply read 1st line
2011-04-14 17:50:21,068-0500 DEBUG Reply 1st line: 226 Abort successful

[aespinosa at communicado shared]$ MB/sec avg 2.39 MB/sec inst

Ranger to Clemson:

$ globus-url-copy -vb gsiftp://gridftp.ranger.tacc.teragrid.org/scratch/projects/tg/tera3d/CyberShake2007/ruptures/RuptureVariations_35_V2_3/218/260/218_260.txt.variation-s0000-h0000 gsiftp://osg-gw.clemson.edu/common1/osg/data/engage/scec/testfile
Source: gsiftp://gridftp.ranger.tacc.teragrid.org/scratch/projects/tg/tera3d/CyberShake2007/ruptures/RuptureVariations_35_V2_3/218/260/
Dest: gsiftp://osg-gw.clemson.edu/common1/osg/data/engage/scec/
218_260.txt.variation-s0000-h0000 -> testfile

error: globus_xio: Unable to connect to osg-gw.clemson.edu:2811
globus_xio: System error in connect: Connection refused
globus_xio: A system call failed: Connection refused

Looks like the Swift log gave an exception for the clemson resource:

2011-04-14 17:17:16,198-0500 DEBUG Reply end reached
2011-04-14 17:17:16,198-0500 DEBUG FTPControlChannel Control channel received: 500-Command failed. : globus_gridftp_server_file.c:globus_l_gfs_file_send:2190:
500-globus_l_gfs_file_open failed.
500-globus_gridftp_server_file.c:globus_l_gfs_file_open:1694:
500-globus_xio_register_open failed.
500-globus_xio_file_driver.c:globus_l_xio_file_open:438:
500-Unable to open file /common1/osg/data/engage/scec/swift_scratch/postproc-20110414-1604-bzxryead/shared/scratch/01035/tg802895/science/cybershake/Results/TEST/219/245/PeakVals_TEST_219_245_126.bsa
500-globus_xio_file_driver.c:globus_l_xio_file_open:381:
500-System error in open: No such file or directory
500-globus_xio: A system call failed: No such file or directory
500 End.
2011-04-14 17:17:16,198-0500 DEBUG Local first reply bad: 500-Command failed. : globus_gridftp_server_file.c:globus_l_gfs_file_send:2190:
500-globus_l_gfs_file_open failed.
500-globus_gridftp_server_file.c:globus_l_gfs_file_open:1694:
500-globus_xio_register_open failed.
500-globus_xio_file_driver.c:globus_l_xio_file_open:438:
500-Unable to open file /common1/osg/data/engage/scec/swift_scratch/postproc-20110414-1604-bzxryead/shared/scratch/01035/tg802895/science/cybershake/Results/TEST/219/245/PeakVals_TEST_219_245_126.bsa
500-globus_xio_file_driver.c:globus_l_xio_file_open:381:
500-System error in open: No such file or directory
500-globus_xio: A system call failed: No such file or directory
500 End.
2011-04-14 17:17:16,198-0500 DEBUG Local category: 5
2011-04-14 17:17:16,199-0500 DEBUG FTPControlChannel slept 200
2011-04-14 17:17:16,202-0500 DEBUG TransferState intercepted exception
org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message: (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed. : globus_gridftp_server_file.c:globus_l_gfs_file_send:2190:
500-globus_l_gfs_file_open failed.
500-globus_gridftp_server_file.c:globus_l_gfs_file_open:1694:
500-globus_xio_register_open failed.
500-globus_xio_file_driver.c:globus_l_xio_file_open:438: 500-Unable to open file /common1/osg/data/engage/scec/swift_scratch/postproc-20110414-1604-bzxryead/shared/scratch/01035/tg802895/science/cybershake/Results/TEST/219/245/PeakVals_TEST_219_245_126.bsa 500-globus_xio_file_driver.c:globus_l_xio_file_open:381: 500-System error in open: No such file or directory 500-globus_xio: A system call failed: No such file or directory 500 End.]. Nested exception is org.globus.ftp.exception.UnexpectedReplyCodeException: Custom message: Unexpected reply: 500-Command failed. : globus_gridftp_server_file.c:globus_l_gfs_file_send:2190: 500-globus_l_gfs_file_open failed. 500-globus_gridftp_server_file.c:globus_l_gfs_file_open:1694: 500-globus_xio_register_open failed. 500-globus_xio_file_driver.c:globus_l_xio_file_open:438: 500-Unable to open file /common1/osg/data/engage/scec/swift_scratch/postproc-20110414-1604-bzxryead/shared/scratch/01035/tg802895/science/cybershake/Results/TEST/219/245/PeakVals_TEST_219_245_126.bsa 500-globus_xio_file_driver.c:globus_l_xio_file_open:381: 500-System error in open: No such file or directory 500-globus_xio: A system call failed: No such file or directory 500 End. 
at org.globus.ftp.vanilla.TransferMonitor.run(TransferMonitor.java:195) at org.globus.ftp.vanilla.TransferMonitor.start(TransferMonitor.java:109) at org.globus.ftp.FTPClient.transferRunSingleThread(FTPClient.java:1479) at org.globus.ftp.FTPClient.transfer(FTPClient.java:1378) at org.globus.io.urlcopy.UrlCopy.thirdPartyTransfer(UrlCopy.java:739) at org.globus.io.urlcopy.UrlCopy.copy(UrlCopy.java:493) at org.globus.io.urlcopy.UrlCopy.run(UrlCopy.java:445) at org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doThirdPartyTransfer(DelegatedFileTransferHandler.java:571) at org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFileTransferHandler.java:465) I expected the Swift terminal session to report a "Failed" job, but as it runs now it looks like this: Progress: Stage in:79 Stage out:232 Finished successfully:130 Progress: Stage in:79 Stage out:232 Finished successfully:130 Progress: Stage in:79 Stage out:232 Finished successfully:130 Progress: Stage in:79 Stage out:232 Finished successfully:130 Progress: Stage in:79 Stage out:232 Finished successfully:130 Btw, i have lazy.errors=true 2011/4/14 Michael Wilde : > ALlan, what I meant was: do you have any evidence that this current run is hung (either in a similar manner to the one we looked at closely this morning, or in a different manner)? > > In this mornings log, you could tell from plots of stagein and stageout events that many these events were not completing after something triggered an error. > > Do you have similar plots or evidence of hangs regarding this run and its log? > > I dont know from browsing the traces if one would *naturally* expect the transfer threads to all be waiting on input sockets most of the time, or if seeing all 4 threads waiting on sockets is indicative of data transfer being totally hung. > > Mihael, I assume you can tell much more from these traces? 
> > - Mike > > > ----- Original Message ----- >> Right now the logs only gives out messages about >> AbstractKarajanStreamChannel. I set the org.globus.ftp package's >> logging level to DEBUG, so entries should be reflected if there are >> transfers being made. >> >> -Allan >> >> 2011/4/14 Michael Wilde : >> > So you have 4 transfer threads and all 4 are waiting here: >> > >> > at java.net.SocketInputStream.socketRead0(Native Method) >> > ? ? ? ?at >> > ? ? ? ?java.net.SocketInputStream.read(SocketInputStream.java:129) >> > >> > (from throttle.transfers=4) >> > >> > Is this workflow hung, and if so, how are you determining that? Do >> > you have another log plot of stagein and out? >> > >> > - Mike >> > >> > >> > ----- Original Message ----- >> >> Fresh traces (jstack and log) in >> >> /home/aespinosa/workflows/cybershake/archive-runs/transfer-logging >> >> . >> >> The swift log is a snapshot of the workflow that is still running. >> >> >> >> -Allan >> >> >> >> 2011/4/14 Mihael Hategan : >> >> > One immediate question that I have is what's up with the deadline >> >> > passed >> >> > messages? >> >> > >> >> > That happens when jobs run for at least twice their advertised >> >> > walltime >> >> > and for some reason the site doesn't seem to cancel them. This >> >> > may >> >> > be >> >> > indicative of notifications getting lost. >> >> > >> >> > As for the transfers, I don't see all transfers hanging after >> >> > that. >> >> > I >> >> > mean there are transfers that complete ok. Though things do seem >> >> > to >> >> > slow >> >> > down quite a bit, so that looks like a problem. >> >> > >> >> > Let's see what in the stack traces. In the mean time, I will see >> >> > what it >> >> > takes to get transfer progress messages. 
>> >> > >> >> > Mihael >> >> > >> >> > >> >> > On Thu, 2011-04-14 at 17:28 -0500, Michael Wilde wrote: >> >> >> bri$ pwd >> >> >> /home/aespinosa/workflows/cybershake/archive-runs/test >> >> >> bri$ ls -lt >> >> >> total 1844128 >> >> >> -rw-r--r-- 1 aespinosa ci-users 0 Apr 14 14:21 max-duration.tmp >> >> >> -rw-r--r-- 1 aespinosa ci-users 15 Apr 14 14:20 start-time.tmp >> >> >> -rw-r--r-- 1 aespinosa ci-users 1433206 Apr 14 14:20 >> >> >> stagein.event >> >> >> -rw-r--r-- 1 aespinosa ci-users 2372737 Apr 14 14:19 >> >> >> sort-preserve2.tmp >> >> >> -rw-r--r-- 1 aespinosa ci-users 2372737 Apr 14 14:19 >> >> >> sort-preserve.tmp >> >> >> -rw-r--r-- 1 aespinosa ci-users 15 Apr 14 14:19 t.inf >> >> >> -rw-r--r-- 1 aespinosa ci-users 2263727 Apr 14 12:51 >> >> >> stagein.transition >> >> >> -rw-r--r-- 1 aespinosa ci-users 8998897 Apr 14 12:31 stagein.log >> >> >> -rw-r--r-- 1 aespinosa ci-users 92059 Apr 14 12:05 >> >> >> dostageout.event >> >> >> -rw-r--r-- 1 aespinosa ci-users 97442 Apr 14 11:51 >> >> >> dostagein.event >> >> >> -rw-r--r-- 1 aespinosa ci-users 2998 Apr 13 17:38 >> >> >> dostagein.sorted-start.png >> >> >> -rw-r--r-- 1 aespinosa ci-users 3080 Apr 13 17:38 >> >> >> dostageout.sorted-start.png >> >> >> -rw-r--r-- 1 aespinosa ci-users 3255 Apr 8 16:05 >> >> >> execute2-total.png >> >> >> -rw-r--r-- 1 aespinosa ci-users 1533974 Apr 8 14:46 >> >> >> postproc-20110407-1438-i90jepr3.0.rlog >> >> >> -rw-r--r-- 1 aespinosa ci-users 1868896768 Apr 8 14:46 >> >> >> postproc-20110407-1438-i90jepr3.log >> >> >> drwxr-xr-x 2 aespinosa ci-users 32768 Apr 7 14:39 >> >> >> postproc-20110407-1438-i90jepr3.d/ >> >> >> bri$ >> >> >> >> >> >> runs, not run >> >> >> >> >> >> ALso see bridled: /tmp/mw1 >> >> >> >> >> >> ----- Original Message ----- >> >> >> > [hategan at bridled tmp]$ cd >> >> >> > ~aespinosa/workflows/cybershake/archive-run/test/ >> >> >> > -bash: cd: >> >> >> > /home/aespinosa/workflows/cybershake/archive-run/test/: No >> >> >> > such file or directory 
>> >> >> > >> >> >> > On Thu, 2011-04-14 at 17:21 -0500, Allan Espinosa wrote: >> >> >> > > ~aespinosa/workflows/cybershake/archive-run/test/postproc*.log >> >> >> > > >> >> >> > > 2011/4/14 Mihael Hategan : >> >> >> > > > On Thu, 2011-04-14 at 15:57 -0500, Michael Wilde wrote: >> >> >> > > >> While Allan continues to debug this, can you take a look >> >> >> > > >> at >> >> >> > > >> the >> >> >> > > >> (huge) log? >> >> >> > > > >> >> >> > > > Where is this log? >> >> >> > > > >> >> >> > > > >> > > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > > -- Allan M. Espinosa PhD student, Computer Science University of Chicago -------------- next part -------------- A non-text attachment was scrubbed... Name: stageinfile.png Type: image/png Size: 3788 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: stageinfile.svg.bz2 Type: application/x-bzip2 Size: 15786 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: stageinfile.event.bz2 Type: application/x-bzip2 Size: 26018 bytes Desc: not available URL: From hategan at mcs.anl.gov Fri Apr 15 14:22:00 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 15 Apr 2011 12:22:00 -0700 Subject: [Swift-devel] Re: Please review and advise on: Bug 357 - Script hangs in staging on OSG In-Reply-To: <1422501356.106327.1302874291455.JavaMail.root@zimbra.anl.gov> References: <1422501356.106327.1302874291455.JavaMail.root@zimbra.anl.gov> Message-ID: <1302895320.24852.2.camel@blabla2.none> Sadly, the gridftp client does not use NIO. But besides that, I don't know what the correct solution should be. Perhaps there should be limits on transfer threads to a single site and then the global transfer throttle should be larger. 
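
The idea of a per-site limit under a larger global throttle could be sketched roughly as follows. This is only an illustration in plain Java, not code from Swift or the gridftp client: the class `SiteTransferLimiter`, its methods, the site names, and the limits of 16 global and 4 per site are all hypothetical. Each transfer must hold one per-site permit and one global permit, so a site whose transfers all hang can tie up at most its own share of the threads.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical sketch: caps concurrent transfers per site and overall.
class SiteTransferLimiter {
    private final Semaphore global;
    private final int perSiteLimit;
    private final ConcurrentHashMap<String, Semaphore> perSite = new ConcurrentHashMap<>();

    SiteTransferLimiter(int globalLimit, int perSiteLimit) {
        this.global = new Semaphore(globalLimit);
        this.perSiteLimit = perSiteLimit;
    }

    /** Reserves a slot for a transfer to `site`; returns false if none is free. */
    boolean tryStart(String site) {
        Semaphore siteSlots = perSite.computeIfAbsent(site, s -> new Semaphore(perSiteLimit));
        // Take the per-site permit first so a saturated site never touches the global pool.
        if (!siteSlots.tryAcquire()) {
            return false;
        }
        if (!global.tryAcquire()) {
            siteSlots.release(); // undo the partial reservation
            return false;
        }
        return true;
    }

    /** Releases the slot taken by a successful tryStart(site). */
    void finish(String site) {
        Semaphore siteSlots = perSite.get(site);
        if (siteSlots != null) {
            global.release();
            siteSlots.release();
        }
    }

    public static void main(String[] args) {
        SiteTransferLimiter limiter = new SiteTransferLimiter(16, 4);
        for (int i = 0; i < 4; i++) {
            limiter.tryStart("siteA"); // pretend these four transfers hang
        }
        System.out.println("fifth transfer to siteA admitted: " + limiter.tryStart("siteA"));
        System.out.println("first transfer to siteB admitted: " + limiter.tryStart("siteB"));
    }
}
```

A caller that gets `false` would queue the transfer and retry later. This bounds the damage from a hung site but still needs some form of timeout to ever reclaim the permits held by stuck transfers.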
On Fri, 2011-04-15 at 08:31 -0500, Michael Wilde wrote:
> I proposed the following in bugzilla (Dan, are you getting these? If so I won't forward any more and will assume that when interested you'll read the bugzilla discussions...)
>
> ----- Forwarded Message -----
> From: bugzilla-daemon at mcs.anl.gov
> To: wilde at mcs.anl.gov
> Sent: Friday, April 15, 2011 8:28:36 AM
> Subject: [Bug 357] Script hangs in staging on OSG
>
> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=357
>
> --- Comment #6 from Michael Wilde 2011-04-15 08:28:35 ---
> This problem may be explained by the following:
>
> - each site requires a large number of files transferred in (60+) and a
> small number out (<4?)
>
> - some sites may hang on transfers, especially small and/or overloaded sites
>
> - we have only 4 transfer threads here
>
> - if all the transfer threads are blocked on requests (e.g. socket operations) that
> hang, then all Swift data transfer after that point hangs. Ideally these
> operations should be run with a timer that enables the operation to be aborted
> and the transfer thread returned to use. Even better, all socket operations
> should be select-driven and non-blocking. (I thought they were..)
>
> - Theory: one or more small overloaded sites - e.g. UMiss in the example of the
> first log filed in this ticket - are hanging all the transfer threads
>
> ==> Proposed temporary solution:
> (a) use more transfer threads: 16 or 32?
> (b) possibly batch up the small files into a single tarball so that we use fewer
> threads per site and thus hung sites hang fewer threads
> (c) avoid sites where we are seeing hangs
> (d) create a script to analyze a current run's log and spot any hanging IO requests,
> identifying the files and sites involved. Use this to spot and remove hanging sites.
> (e) Mihael to improve Swift's robustness in this area by timing out hung requests and
> causing the appropriate higher level of recovery to kick in.
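A log-analysis script of the kind proposed in (d) could look roughly like the sketch below. It rests on assumptions: only FILE_STAGE_OUT_START lines appear in this thread, so the matching FILE_STAGE_OUT_END and FILE_STAGE_IN_* event names, and the choice of (srchost, srcdir, srcname) as the matching key, are guesses. The function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: END events carry the same key=value fields as START events.
EVENT = re.compile(r"FILE_STAGE_(IN|OUT)_(START|END)\s+(.*)")
FIELD = re.compile(r"(\w+)=(\S+)")


def unfinished_transfers(lines):
    """Return the fields of every START event that has no matching END."""
    pending = {}
    for line in lines:
        match = EVENT.search(line)
        if not match:
            continue
        direction, phase, rest = match.groups()
        fields = dict(FIELD.findall(rest))
        key = (direction, fields.get("srchost"),
               fields.get("srcdir"), fields.get("srcname"))
        if phase == "START":
            pending[key] = fields
        else:
            pending.pop(key, None)   # transfer finished, forget it
    return pending


def hangs_by_source_host(pending):
    """Count unfinished transfers per source host."""
    return Counter(fields.get("srchost") for fields in pending.values())
```

Running unfinished_transfers over the lines of a run's log and then hangs_by_source_host over the result would give a per-site count of transfers that started but never ended, which is one way to pick candidate sites to drop.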
> > > ==== > > Some messages from the related email thread on this bug are pasted below: > > ----- Forwarded Message ----- > From: "Mihael Hategan" > To: "Michael Wilde" > Cc: "Allan Espinosa" , "Daniel Katz" > , "Swift Devel" > Sent: Thursday, April 14, 2011 8:51:16 PM > Subject: Re: Please review and advise on: Bug 357 - Script hangs in staging on > OSG > > Well, that's barely hung unless the gridftp servers are hung, which may > be. > > I would suggest upping the transfer throttle in this case. 4 may be > cutting it too close. Maybe to 16. > > On Thu, 2011-04-14 at 19:45 -0500, Michael Wilde wrote: > > So you have 4 transfer threads and all 4 are waiting here: > > > > at java.net.SocketInputStream.socketRead0(Native Method) > > at java.net.SocketInputStream.read(SocketInputStream.java:129) > > > > (from throttle.transfers=4) > > > > Is this workflow hung, and if so, how are you determining that? Do you have another log plot of stagein and out? > > > > - Mike > > > > > > ----- Forwarded Message ----- > From: "Mihael Hategan" > To: "Michael Wilde" > Cc: "Allan Espinosa" , "Daniel Katz" > , "Swift Devel" > Sent: Thursday, April 14, 2011 8:53:10 PM > Subject: Re: Please review and advise on: Bug 357 - Script hangs in staging on > OSG > > On Thu, 2011-04-14 at 20:11 -0500, Michael Wilde wrote: > > ALlan, what I meant was: do you have any evidence that this current run is hung (either in a similar manner to the one we looked at closely this morning, or in a different manner)? > > > > In this mornings log, you could tell from plots of stagein and stageout events that many these events were not completing after something triggered an error. > > > > Do you have similar plots or evidence of hangs regarding this run and its log? 
> > > > I dont know from browsing the traces if one would *naturally* expect > > the transfer threads to all be waiting on input sockets most of the > > time, or if seeing all 4 threads waiting on sockets is indicative of > > data transfer being totally hung. > > If nothing else happens in the log, then probably so. But the same could > happen for very large files (or very slow servers). > > > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > -- > Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You reported the bug. > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > > ----- Original Message ----- > > On Thu, 2011-04-14 at 20:11 -0500, Michael Wilde wrote: > > > ALlan, what I meant was: do you have any evidence that this current > > > run is hung (either in a similar manner to the one we looked at > > > closely this morning, or in a different manner)? > > > > > > In this mornings log, you could tell from plots of stagein and > > > stageout events that many these events were not completing after > > > something triggered an error. > > > > > > Do you have similar plots or evidence of hangs regarding this run > > > and its log? > > > > > > I dont know from browsing the traces if one would *naturally* expect > > > the transfer threads to all be waiting on input sockets most of the > > > time, or if seeing all 4 threads waiting on sockets is indicative of > > > data transfer being totally hung. > > > > If nothing else happens in the log, then probably so. But the same > > could > > happen for very large files (or very slow servers). 
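The situation discussed above, where every transfer thread sits in a blocking socket read with no way to give up, can be illustrated with a read that carries a timeout. This is a minimal Python sketch of the general technique, not the gridftp client's code; read_with_timeout is a hypothetical helper. In Java the analogous setting is Socket.setSoTimeout, though whether the gridftp client in use exposes it is not established in this thread.

```python
import socket


def read_with_timeout(sock, nbytes, timeout_s):
    """Read up to nbytes; return None if nothing arrives within timeout_s."""
    sock.settimeout(timeout_s)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        # The peer is silent: hand control back instead of blocking forever,
        # so the caller can abort or retry the transfer.
        return None
```

A caller that gets None back can close the connection and let the higher-level retry logic take over, which keeps a silent server from pinning a transfer thread indefinitely. As noted in the thread, a slow server or a very large file can look the same as a hang, so the timeout value has to be chosen with that in mind.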
From hategan at mcs.anl.gov Fri Apr 15 14:22:58 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 15 Apr 2011 12:22:58 -0700
Subject: [Swift-devel] Eliminate duplicate messages from bugzilla?
In-Reply-To: <1619840190.106375.1302875091712.JavaMail.root@zimbra.anl.gov>
References: <1619840190.106375.1302875091712.JavaMail.root@zimbra.anl.gov>
Message-ID: <1302895378.24852.3.camel@blabla2.none>

On Fri, 2011-04-15 at 08:44 -0500, Michael Wilde wrote:
> Many of us are getting duplicate messages from bugzilla. It's sending to us directly, and again via swift-devel.
>
> I think we should use the same approach as with swift-commit: subscribe to it if you're interested. Note that much technical discussion will now move to bugzilla threads from swift-devel.
>
> I will try to remove the swift-devel post from bugzilla. Please let me know if there is any reason not to do this.

The only reason not to do that is that we'd go back to not doing this,
which is the point we started from when we decided we should do this.

From hategan at mcs.anl.gov Fri Apr 15 14:24:38 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 15 Apr 2011 12:24:38 -0700
Subject: [Swift-devel] Re: Please review and advise on: Bug 357 - Script hangs in staging on OSG
In-Reply-To:
References: <1728872869.105665.1302829878754.JavaMail.root@zimbra.anl.gov>
Message-ID: <1302895478.24852.4.camel@blabla2.none>

Individual transfers are retried up to three times. Only if all 3 fail
the job fails.

On Fri, 2011-04-15 at 12:14 -0500, Allan Espinosa wrote:
> Hi Mike,
>
> here's the plot of the stagein events from yesterday's showing
> "almost" the same behavior. I attached the raw data and an svg
> version of the plot as well.
(The plots were produced with R instead of
> the regular swift-log-processing tools.)
>
> Also looking at the transfers that did not finish (the ones with
> "START" in the last column of the raw data), I tested two gridftp
> servers today:
>
> Ranger to top.ucr.edu
> $ globus-url-copy -vb
> gsiftp://gridftp.ranger.tacc.teragrid.org/scratch/projects/tg/tera3d/CyberShake2007/ruptures/RuptureVariations_35_V2_3/218/260/218_260.txt.variation-s0000-h0000
> gsiftp://top.ucr.edu/data/down/osg_data/engage/scec/testfile
> Source: gsiftp://gridftp.ranger.tacc.teragrid.org/scratch/projects/tg/tera3d/CyberShake2007/ruptures/RuptureVariations_35_V2_3/218/260/
> Dest: gsiftp://top.ucr.edu/data/down/osg_data/engage/scec/
> 218_260.txt.variation-s0000-h0000 -> testfile
>
> So this site looks like it's working alright. Here's the last
> activity about top.ucr.edu in the log:
> 2011-04-14 17:50:20,975-0500 DEBUG Reply read 1st line
> 2011-04-14 17:50:20,976-0500 DEBUG vdl:dostageout FILE_STAGE_OUT_START srcname=PeakVals_TEST_219_258_32.bsa srcdir=postproc-20110414-1604-bzxryead/shared/scratch/01035/tg802895/science/cybershake/Results/TEST/219/258 srchost=UCR-HEP__top.ucr.edu destdir=/scratch/01035/tg802895/science/cybershake/Results/TEST/219/258 desthost=gridftp.ranger.tacc.teragrid.org provider=gsiftp
> 2011-04-14 17:50:20,978-0500 DEBUG vdl:dostageout FILE_STAGE_OUT_START srcname=Seismogram_TEST_219_258_35.grm srcdir=postproc-20110414-1604-bzxryead/shared/scratch/01035/tg802895/science/cybershake/Results/TEST/219/258 srchost=UCR-HEP__top.ucr.edu destdir=/scratch/01035/tg802895/science/cybershake/Results/TEST/219/258 desthost=gridftp.ranger.tacc.teragrid.org provider=gsiftp
>
> ----comment-----
> I am not sure if these gsiftp ABORT calls are with the top.ucr.edu site
> ---comment------
>
> 2011-04-14 17:50:21,006-0500 DEBUG Reply 1st line: 350 OK. Send RNTO with destination name.
> 2011-04-14 17:50:21,006-0500 DEBUG FTPControlChannel Control channel > received: 350 OK. Send RNTO with destination nam > e. > 2011-04-14 17:50:21,006-0500 DEBUG FTPControlChannel Control channel > sending: ABOR > > 2011-04-14 17:50:21,006-0500 DEBUG Reply read 1st line > 2011-04-14 17:50:21,006-0500 DEBUG Reply 1st line: 350 OK. Send RNTO > with destination name. > 2011-04-14 17:50:21,006-0500 DEBUG FTPControlChannel Control channel > received: 350 OK. Send RNTO with destination nam > e. > 2011-04-14 17:50:21,006-0500 DEBUG FTPControlChannel Control channel > sending: ABOR > > 2011-04-14 17:50:21,006-0500 DEBUG Reply read 1st line > 2011-04-14 17:50:21,068-0500 DEBUG Reply 1st line: 226 Abort successful > > > > [aespinosa at communicado shared]$ MB/sec avg 2.39 MB/sec inst > > Ranger to Clemson: > $ globus-url-copy -vb > gsiftp://gridftp.ranger.tacc.teragrid.org/scratch/projects/tg/tera3d/CyberShake2007/ruptures/RuptureVariations_35_V2_3/218/260/218_260.txt.variation-s0000-h0000 > gsiftp://osg-gw.clemson.edu/common1/osg/data/engage/scec/testfile > Source: gsiftp://gridftp.ranger.tacc.teragrid.org/scratch/projects/tg/tera3d/CyberShake2007/ruptures/RuptureVariations_35_V2_3/218/260/ > Dest: gsiftp://osg-gw.clemson.edu/common1/osg/data/engage/scec/ > 218_260.txt.variation-s0000-h0000 -> testfile > > > error: globus_xio: Unable to connect to osg-gw.clemson.edu:2811 > globus_xio: System error in connect: Connection refused > globus_xio: A system call failed: Connection refused > > > Looks like the Swift log gave an exception for the clemson resource: > 2011-04-14 17:17:16,198-0500 DEBUG Reply end reached > 2011-04-14 17:17:16,198-0500 DEBUG FTPControlChannel Control channel > received: 500-Command failed. : globus_gridftp_s > erver_file.c:globus_l_gfs_file_send:2190: > 500-globus_l_gfs_file_open failed. > 500-globus_gridftp_server_file.c:globus_l_gfs_file_open:1694: > 500-globus_xio_register_open failed. 
> 500-globus_xio_file_driver.c:globus_l_xio_file_open:438: > 500-Unable to open file > /common1/osg/data/engage/scec/swift_scratch/postproc-20110414-1604-bzxryead/shared/scratch/01 > 035/tg802895/science/cybershake/Results/TEST/219/245/PeakVals_TEST_219_245_126.bsa > 500-globus_xio_file_driver.c:globus_l_xio_file_open:381: > 500-System error in open: No such file or directory > 500-globus_xio: A system call failed: No such file or directory > 500 End. > 2011-04-14 17:17:16,198-0500 DEBUG Local first reply bad: 500-Command > failed. : globus_gridftp_server_file.c:globus_l > _gfs_file_send:2190: > 500-globus_l_gfs_file_open failed. > 500-globus_gridftp_server_file.c:globus_l_gfs_file_open:1694: > 500-globus_xio_register_open failed. > 500-globus_xio_file_driver.c:globus_l_xio_file_open:438: > 500-Unable to open file > /common1/osg/data/engage/scec/swift_scratch/postproc-20110414-1604-bzxryead/shared/scratch/01 > 035/tg802895/science/cybershake/Results/TEST/219/245/PeakVals_TEST_219_245_126.bsa > 500-globus_xio_file_driver.c:globus_l_xio_file_open:381: > 500-System error in open: No such file or directory > 500-globus_xio: A system call failed: No such file or directory > 500 End. > 2011-04-14 17:17:16,198-0500 DEBUG Local category: 5 > 2011-04-14 17:17:16,199-0500 DEBUG FTPControlChannel slept 200 > 2011-04-14 17:17:16,202-0500 DEBUG TransferState intercepted exception > org.globus.ftp.exception.ServerException: Server refused performing > the request. Custom message: (error code 1) [Nested exception > message: Custom message: Unexpected reply: 500-Command failed. : > globus_gridftp_server_file.c:globus_l_gfs_file_send:2190: > 500-globus_l_gfs_file_open failed. > 500-globus_gridftp_server_file.c:globus_l_gfs_file_open:1694: > 500-globus_xio_register_open failed. 
> 500-globus_xio_file_driver.c:globus_l_xio_file_open:438: > 500-Unable to open file > /common1/osg/data/engage/scec/swift_scratch/postproc-20110414-1604-bzxryead/shared/scratch/01035/tg802895/science/cybershake/Results/TEST/219/245/PeakVals_TEST_219_245_126.bsa > 500-globus_xio_file_driver.c:globus_l_xio_file_open:381: > 500-System error in open: No such file or directory > 500-globus_xio: A system call failed: No such file or directory > 500 End.]. Nested exception is > org.globus.ftp.exception.UnexpectedReplyCodeException: Custom > message: Unexpected reply: 500-Command failed. : > globus_gridftp_server_file.c:globus_l_gfs_file_send:2190: > 500-globus_l_gfs_file_open failed. > 500-globus_gridftp_server_file.c:globus_l_gfs_file_open:1694: > 500-globus_xio_register_open failed. > 500-globus_xio_file_driver.c:globus_l_xio_file_open:438: > 500-Unable to open file > /common1/osg/data/engage/scec/swift_scratch/postproc-20110414-1604-bzxryead/shared/scratch/01035/tg802895/science/cybershake/Results/TEST/219/245/PeakVals_TEST_219_245_126.bsa > 500-globus_xio_file_driver.c:globus_l_xio_file_open:381: > 500-System error in open: No such file or directory > 500-globus_xio: A system call failed: No such file or directory > 500 End. 
> at org.globus.ftp.vanilla.TransferMonitor.run(TransferMonitor.java:195) > at org.globus.ftp.vanilla.TransferMonitor.start(TransferMonitor.java:109) > at org.globus.ftp.FTPClient.transferRunSingleThread(FTPClient.java:1479) > at org.globus.ftp.FTPClient.transfer(FTPClient.java:1378) > at org.globus.io.urlcopy.UrlCopy.thirdPartyTransfer(UrlCopy.java:739) > at org.globus.io.urlcopy.UrlCopy.copy(UrlCopy.java:493) > at org.globus.io.urlcopy.UrlCopy.run(UrlCopy.java:445) > at org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doThirdPartyTransfer(DelegatedFileTransferHandler.java:571) > at org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFileTransferHandler.java:465) > > > I expected the Swift terminal session to report a "Failed" job, but as > it runs now it looks like this: > > Progress: Stage in:79 Stage out:232 Finished successfully:130 > Progress: Stage in:79 Stage out:232 Finished successfully:130 > Progress: Stage in:79 Stage out:232 Finished successfully:130 > Progress: Stage in:79 Stage out:232 Finished successfully:130 > Progress: Stage in:79 Stage out:232 Finished successfully:130 > > Btw, i have lazy.errors=true > > 2011/4/14 Michael Wilde : > > ALlan, what I meant was: do you have any evidence that this current run is hung (either in a similar manner to the one we looked at closely this morning, or in a different manner)? > > > > In this mornings log, you could tell from plots of stagein and stageout events that many these events were not completing after something triggered an error. > > > > Do you have similar plots or evidence of hangs regarding this run and its log? > > > > I dont know from browsing the traces if one would *naturally* expect the transfer threads to all be waiting on input sockets most of the time, or if seeing all 4 threads waiting on sockets is indicative of data transfer being totally hung. > > > > Mihael, I assume you can tell much more from these traces? 
> > > > - Mike > > > > > > ----- Original Message ----- > >> Right now the logs only gives out messages about > >> AbstractKarajanStreamChannel. I set the org.globus.ftp package's > >> logging level to DEBUG, so entries should be reflected if there are > >> transfers being made. > >> > >> -Allan > >> > >> 2011/4/14 Michael Wilde : > >> > So you have 4 transfer threads and all 4 are waiting here: > >> > > >> > at java.net.SocketInputStream.socketRead0(Native Method) > >> > at > >> > java.net.SocketInputStream.read(SocketInputStream.java:129) > >> > > >> > (from throttle.transfers=4) > >> > > >> > Is this workflow hung, and if so, how are you determining that? Do > >> > you have another log plot of stagein and out? > >> > > >> > - Mike > >> > > >> > > >> > ----- Original Message ----- > >> >> Fresh traces (jstack and log) in > >> >> /home/aespinosa/workflows/cybershake/archive-runs/transfer-logging > >> >> . > >> >> The swift log is a snapshot of the workflow that is still running. > >> >> > >> >> -Allan > >> >> > >> >> 2011/4/14 Mihael Hategan : > >> >> > One immediate question that I have is what's up with the deadline > >> >> > passed > >> >> > messages? > >> >> > > >> >> > That happens when jobs run for at least twice their advertised > >> >> > walltime > >> >> > and for some reason the site doesn't seem to cancel them. This > >> >> > may > >> >> > be > >> >> > indicative of notifications getting lost. > >> >> > > >> >> > As for the transfers, I don't see all transfers hanging after > >> >> > that. > >> >> > I > >> >> > mean there are transfers that complete ok. Though things do seem > >> >> > to > >> >> > slow > >> >> > down quite a bit, so that looks like a problem. > >> >> > > >> >> > Let's see what in the stack traces. In the mean time, I will see > >> >> > what it > >> >> > takes to get transfer progress messages. 
> >> >> > > >> >> > Mihael > >> >> > > >> >> > > >> >> > On Thu, 2011-04-14 at 17:28 -0500, Michael Wilde wrote: > >> >> >> bri$ pwd > >> >> >> /home/aespinosa/workflows/cybershake/archive-runs/test > >> >> >> bri$ ls -lt > >> >> >> total 1844128 > >> >> >> -rw-r--r-- 1 aespinosa ci-users 0 Apr 14 14:21 max-duration.tmp > >> >> >> -rw-r--r-- 1 aespinosa ci-users 15 Apr 14 14:20 start-time.tmp > >> >> >> -rw-r--r-- 1 aespinosa ci-users 1433206 Apr 14 14:20 > >> >> >> stagein.event > >> >> >> -rw-r--r-- 1 aespinosa ci-users 2372737 Apr 14 14:19 > >> >> >> sort-preserve2.tmp > >> >> >> -rw-r--r-- 1 aespinosa ci-users 2372737 Apr 14 14:19 > >> >> >> sort-preserve.tmp > >> >> >> -rw-r--r-- 1 aespinosa ci-users 15 Apr 14 14:19 t.inf > >> >> >> -rw-r--r-- 1 aespinosa ci-users 2263727 Apr 14 12:51 > >> >> >> stagein.transition > >> >> >> -rw-r--r-- 1 aespinosa ci-users 8998897 Apr 14 12:31 stagein.log > >> >> >> -rw-r--r-- 1 aespinosa ci-users 92059 Apr 14 12:05 > >> >> >> dostageout.event > >> >> >> -rw-r--r-- 1 aespinosa ci-users 97442 Apr 14 11:51 > >> >> >> dostagein.event > >> >> >> -rw-r--r-- 1 aespinosa ci-users 2998 Apr 13 17:38 > >> >> >> dostagein.sorted-start.png > >> >> >> -rw-r--r-- 1 aespinosa ci-users 3080 Apr 13 17:38 > >> >> >> dostageout.sorted-start.png > >> >> >> -rw-r--r-- 1 aespinosa ci-users 3255 Apr 8 16:05 > >> >> >> execute2-total.png > >> >> >> -rw-r--r-- 1 aespinosa ci-users 1533974 Apr 8 14:46 > >> >> >> postproc-20110407-1438-i90jepr3.0.rlog > >> >> >> -rw-r--r-- 1 aespinosa ci-users 1868896768 Apr 8 14:46 > >> >> >> postproc-20110407-1438-i90jepr3.log > >> >> >> drwxr-xr-x 2 aespinosa ci-users 32768 Apr 7 14:39 > >> >> >> postproc-20110407-1438-i90jepr3.d/ > >> >> >> bri$ > >> >> >> > >> >> >> runs, not run > >> >> >> > >> >> >> ALso see bridled: /tmp/mw1 > >> >> >> > >> >> >> ----- Original Message ----- > >> >> >> > [hategan at bridled tmp]$ cd > >> >> >> > ~aespinosa/workflows/cybershake/archive-run/test/ > >> >> >> > -bash: cd: > >> >> >> 
> /home/aespinosa/workflows/cybershake/archive-run/test/: No > >> >> >> > such file or directory > >> >> >> > > >> >> >> > On Thu, 2011-04-14 at 17:21 -0500, Allan Espinosa wrote: > >> >> >> > > ~aespinosa/workflows/cybershake/archive-run/test/postproc*.log > >> >> >> > > > >> >> >> > > 2011/4/14 Mihael Hategan : > >> >> >> > > > On Thu, 2011-04-14 at 15:57 -0500, Michael Wilde wrote: > >> >> >> > > >> While Allan continues to debug this, can you take a look > >> >> >> > > >> at > >> >> >> > > >> the > >> >> >> > > >> (huge) log? > >> >> >> > > > > >> >> >> > > > Where is this log? > >> >> >> > > > > >> >> >> > > > > >> > > > > > -- > > Michael Wilde > > Computation Institute, University of Chicago > > Mathematics and Computer Science Division > > Argonne National Laboratory > > > > > > > > >
From aespinosa at cs.uchicago.edu Fri Apr 15 14:28:45 2011 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Fri, 15 Apr 2011 14:28:45 -0500 Subject: [Swift-devel] Re: Please review and advise on: Bug 357 - Script hangs in staging on OSG In-Reply-To: <1302895478.24852.4.camel@blabla2.none> References: <1728872869.105665.1302829878754.JavaMail.root@zimbra.anl.gov> <1302895478.24852.4.camel@blabla2.none> Message-ID: But then it should be registered into the "Failed jobs" counter right? 2011/4/15 Mihael Hategan : > Individual transfers are retried up to three times. Only if all 3 fail > the job fails.
From hategan at mcs.anl.gov Fri Apr 15 14:46:46 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 15 Apr 2011 12:46:46 -0700 Subject: [Swift-devel] Re: Please review and advise on: Bug 357 - Script hangs in staging on OSG In-Reply-To: References: <1728872869.105665.1302829878754.JavaMail.root@zimbra.anl.gov> <1302895478.24852.4.camel@blabla2.none> Message-ID: <1302896806.25420.0.camel@blabla2.none> If all three tries of the transfer failed and if a subsequent rerun of the job didn't yet complete. On Fri, 2011-04-15 at 14:28 -0500, Allan Espinosa wrote: > But then it should be registered into the "Failed jobs" counter right? > > 2011/4/15 Mihael Hategan : > > Individual transfers are retried up to three times. Only if all 3 fail > > the job fails. > > > > On Fri, 2011-04-15 at 12:14 -0500, Allan Espinosa wrote: > >> Hi Mike, > >> > >> here's the plot of the stagein events from yesterday's showing > >> "almost" the same behavior. I attached the raw data and an svg > >> version of the plot as well.
(plot were produced from R instead of > >> the regular swift-log-processing tools) > >> > >> Also looking at the transfers that did not finish (the ones with > >> "START" in the last column on the raw data), I tested out two gridftp > >> servers today: > >> > >> Ranger to top.ucr.edu > >> $ globus-url-copy -vb > >> gsiftp://gridftp.ranger.tacc.teragrid.org/scratch/projeca3d/CyberShake2007/ruptures/RuptureVariations_35_V2_3/218/260/218_260.txt.variation-s0000-h0000 > >> gsiftp://top.ucr.edu/data/down/osg_data/engage/scec/testfile > >> Source: gsiftp://gridftp.ranger.tacc.teragrid.org/scratch/projects/tg/tera3d/CyberShake2007/ruptures/RuptureVariations_35_V2_3/218/260/ > >> Dest: gsiftp://top.ucr.edu/data/down/osg_data/engage/scec/ > >> 218_260.txt.variation-s0000-h0000 -> testfile > >> > >> So this site looks like it's working alright. Here's the last > >> activity about top.ucr.edu in the log: > >> 2011-04-14 17:50:20,975-0500 DEBUG Reply read 1st line > >> 2011-04-14 17:50:20,976-0500 DEBUG vdl:dostageout FILE_STAGE_OUT_START > >> srcname=PeakVals_TEST_219_258_32.bsa srcdir=po > >> stproc-20110414-1604-bzxryead/shared/scratch/01035/tg802895/science/cybershake/Results/TEST/219/258 > >> srchost=UCR-HEP__ > >> top.ucr.edu destdir=/scratch/01035/tg802895/science/cybershake/Results/TEST/219/258 > >> desthost=gridftp.ranger.tacc.tera > >> grid.org provider=gsiftp > >> 2011-04-14 17:50:20,978-0500 DEBUG vdl:dostageout FILE_STAGE_OUT_START > >> srcname=Seismogram_TEST_219_258_35.grm srcdir= > >> postproc-20110414-1604-bzxryead/shared/scratch/01035/tg802895/science/cybershake/Results/TEST/219/258 > >> srchost=UCR-HEP > >> __top.ucr.edu destdir=/scratch/01035/tg802895/science/cybershake/Results/TEST/219/258 > >> desthost=gridftp.ranger.tacc.te > >> ragrid.org provider=gsiftp > >> > >> ----comment----- > >> I am not sure if these gsiftp ABORT calls are with the top.ucr.edu site > >> ---comment------ > >> > >> 2011-04-14 17:50:21,006-0500 DEBUG Reply 1st line: 350 OK. 
Send RNTO > >> with destination name. > >> 2011-04-14 17:50:21,006-0500 DEBUG FTPControlChannel Control channel > >> received: 350 OK. Send RNTO with destination nam > >> e. > >> 2011-04-14 17:50:21,006-0500 DEBUG FTPControlChannel Control channel > >> sending: ABOR > >> > >> 2011-04-14 17:50:21,006-0500 DEBUG Reply read 1st line > >> 2011-04-14 17:50:21,006-0500 DEBUG Reply 1st line: 350 OK. Send RNTO > >> with destination name. > >> 2011-04-14 17:50:21,006-0500 DEBUG FTPControlChannel Control channel > >> received: 350 OK. Send RNTO with destination nam > >> e. > >> 2011-04-14 17:50:21,006-0500 DEBUG FTPControlChannel Control channel > >> sending: ABOR > >> > >> 2011-04-14 17:50:21,006-0500 DEBUG Reply read 1st line > >> 2011-04-14 17:50:21,068-0500 DEBUG Reply 1st line: 226 Abort successful > >> > >> > >> > >> [aespinosa at communicado shared]$ MB/sec avg 2.39 MB/sec inst > >> > >> Ranger to Clemson: > >> $ globus-url-copy -vb > >> gsiftp://gridftp.ranger.tacc.teragrid.org/scratch/projects/tg/tera3d/CyberShake2007/ruptures/RuptureVariations_35_V2_3/218/260/218_260.txt.variation-s0000-h0000 > >> gsiftp://osg-gw.clemson.edu/common1/osg/data/engage/scec/testfile > >> Source: gsiftp://gridftp.ranger.tacc.teragrid.org/scratch/projects/tg/tera3d/CyberShake2007/ruptures/RuptureVariations_35_V2_3/218/260/ > >> Dest: gsiftp://osg-gw.clemson.edu/common1/osg/data/engage/scec/ > >> 218_260.txt.variation-s0000-h0000 -> testfile > >> > >> > >> error: globus_xio: Unable to connect to osg-gw.clemson.edu:2811 > >> globus_xio: System error in connect: Connection refused > >> globus_xio: A system call failed: Connection refused > >> > >> > >> Looks like the Swift log gave an exception for the clemson resource: > >> 2011-04-14 17:17:16,198-0500 DEBUG Reply end reached > >> 2011-04-14 17:17:16,198-0500 DEBUG FTPControlChannel Control channel > >> received: 500-Command failed. : globus_gridftp_s > >> erver_file.c:globus_l_gfs_file_send:2190: > >> 500-globus_l_gfs_file_open failed. 
> >> 500-globus_gridftp_server_file.c:globus_l_gfs_file_open:1694: > >> 500-globus_xio_register_open failed. > >> 500-globus_xio_file_driver.c:globus_l_xio_file_open:438: > >> 500-Unable to open file > >> /common1/osg/data/engage/scec/swift_scratch/postproc-20110414-1604-bzxryead/shared/scratch/01 > >> 035/tg802895/science/cybershake/Results/TEST/219/245/PeakVals_TEST_219_245_126.bsa > >> 500-globus_xio_file_driver.c:globus_l_xio_file_open:381: > >> 500-System error in open: No such file or directory > >> 500-globus_xio: A system call failed: No such file or directory > >> 500 End. > >> 2011-04-14 17:17:16,198-0500 DEBUG Local first reply bad: 500-Command > >> failed. : globus_gridftp_server_file.c:globus_l > >> _gfs_file_send:2190: > >> 500-globus_l_gfs_file_open failed. > >> 500-globus_gridftp_server_file.c:globus_l_gfs_file_open:1694: > >> 500-globus_xio_register_open failed. > >> 500-globus_xio_file_driver.c:globus_l_xio_file_open:438: > >> 500-Unable to open file > >> /common1/osg/data/engage/scec/swift_scratch/postproc-20110414-1604-bzxryead/shared/scratch/01 > >> 035/tg802895/science/cybershake/Results/TEST/219/245/PeakVals_TEST_219_245_126.bsa > >> 500-globus_xio_file_driver.c:globus_l_xio_file_open:381: > >> 500-System error in open: No such file or directory > >> 500-globus_xio: A system call failed: No such file or directory > >> 500 End. > >> 2011-04-14 17:17:16,198-0500 DEBUG Local category: 5 > >> 2011-04-14 17:17:16,199-0500 DEBUG FTPControlChannel slept 200 > >> 2011-04-14 17:17:16,202-0500 DEBUG TransferState intercepted exception > >> org.globus.ftp.exception.ServerException: Server refused performing > >> the request. Custom message: (error code 1) [Nested exception > >> message: Custom message: Unexpected reply: 500-Command failed. : > >> globus_gridftp_server_file.c:globus_l_gfs_file_send:2190: > >> 500-globus_l_gfs_file_open failed. > >> 500-globus_gridftp_server_file.c:globus_l_gfs_file_open:1694: > >> 500-globus_xio_register_open failed. 
> >> 500-globus_xio_file_driver.c:globus_l_xio_file_open:438: > >> 500-Unable to open file > >> /common1/osg/data/engage/scec/swift_scratch/postproc-20110414-1604-bzxryead/shared/scratch/01035/tg802895/science/cybershake/Results/TEST/219/245/PeakVals_TEST_219_245_126.bsa > >> 500-globus_xio_file_driver.c:globus_l_xio_file_open:381: > >> 500-System error in open: No such file or directory > >> 500-globus_xio: A system call failed: No such file or directory > >> 500 End.]. Nested exception is > >> org.globus.ftp.exception.UnexpectedReplyCodeException: Custom > >> message: Unexpected reply: 500-Command failed. : > >> globus_gridftp_server_file.c:globus_l_gfs_file_send:2190: > >> 500-globus_l_gfs_file_open failed. > >> 500-globus_gridftp_server_file.c:globus_l_gfs_file_open:1694: > >> 500-globus_xio_register_open failed. > >> 500-globus_xio_file_driver.c:globus_l_xio_file_open:438: > >> 500-Unable to open file > >> /common1/osg/data/engage/scec/swift_scratch/postproc-20110414-1604-bzxryead/shared/scratch/01035/tg802895/science/cybershake/Results/TEST/219/245/PeakVals_TEST_219_245_126.bsa > >> 500-globus_xio_file_driver.c:globus_l_xio_file_open:381: > >> 500-System error in open: No such file or directory > >> 500-globus_xio: A system call failed: No such file or directory > >> 500 End. 
> >> at org.globus.ftp.vanilla.TransferMonitor.run(TransferMonitor.java:195) > >> at org.globus.ftp.vanilla.TransferMonitor.start(TransferMonitor.java:109) > >> at org.globus.ftp.FTPClient.transferRunSingleThread(FTPClient.java:1479) > >> at org.globus.ftp.FTPClient.transfer(FTPClient.java:1378) > >> at org.globus.io.urlcopy.UrlCopy.thirdPartyTransfer(UrlCopy.java:739) > >> at org.globus.io.urlcopy.UrlCopy.copy(UrlCopy.java:493) > >> at org.globus.io.urlcopy.UrlCopy.run(UrlCopy.java:445) > >> at org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doThirdPartyTransfer(DelegatedFileTransferHandler.java:571) > >> at org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFileTransferHandler.java:465) > >> > >> > >> I expected the Swift terminal session to report a "Failed" job, but as > >> it runs now it looks like this: > >> > >> Progress: Stage in:79 Stage out:232 Finished successfully:130 > >> Progress: Stage in:79 Stage out:232 Finished successfully:130 > >> Progress: Stage in:79 Stage out:232 Finished successfully:130 > >> Progress: Stage in:79 Stage out:232 Finished successfully:130 > >> Progress: Stage in:79 Stage out:232 Finished successfully:130 > >> > >> Btw, i have lazy.errors=true > >> > >> 2011/4/14 Michael Wilde : > >> > ALlan, what I meant was: do you have any evidence that this current run is hung (either in a similar manner to the one we looked at closely this morning, or in a different manner)? > >> > > >> > In this mornings log, you could tell from plots of stagein and stageout events that many these events were not completing after something triggered an error. > >> > > >> > Do you have similar plots or evidence of hangs regarding this run and its log? 
> >> > > >> > I dont know from browsing the traces if one would *naturally* expect the transfer threads to all be waiting on input sockets most of the time, or if seeing all 4 threads waiting on sockets is indicative of data transfer being totally hung. > >> > > >> > Mihael, I assume you can tell much more from these traces? > >> > > >> > - Mike > >> > > >> > > >> > ----- Original Message ----- > >> >> Right now the logs only gives out messages about > >> >> AbstractKarajanStreamChannel. I set the org.globus.ftp package's > >> >> logging level to DEBUG, so entries should be reflected if there are > >> >> transfers being made. > >> >> > >> >> -Allan > >> >> > >> >> 2011/4/14 Michael Wilde : > >> >> > So you have 4 transfer threads and all 4 are waiting here: > >> >> > > >> >> > at java.net.SocketInputStream.socketRead0(Native Method) > >> >> > at > >> >> > java.net.SocketInputStream.read(SocketInputStream.java:129) > >> >> > > >> >> > (from throttle.transfers=4) > >> >> > > >> >> > Is this workflow hung, and if so, how are you determining that? Do > >> >> > you have another log plot of stagein and out? > >> >> > > >> >> > - Mike > >> >> > > >> >> > > >> >> > ----- Original Message ----- > >> >> >> Fresh traces (jstack and log) in > >> >> >> /home/aespinosa/workflows/cybershake/archive-runs/transfer-logging > >> >> >> . > >> >> >> The swift log is a snapshot of the workflow that is still running. > >> >> >> > >> >> >> -Allan > >> >> >> > >> >> >> 2011/4/14 Mihael Hategan : > >> >> >> > One immediate question that I have is what's up with the deadline > >> >> >> > passed > >> >> >> > messages? > >> >> >> > > >> >> >> > That happens when jobs run for at least twice their advertised > >> >> >> > walltime > >> >> >> > and for some reason the site doesn't seem to cancel them. This > >> >> >> > may > >> >> >> > be > >> >> >> > indicative of notifications getting lost. > >> >> >> > > >> >> >> > As for the transfers, I don't see all transfers hanging after > >> >> >> > that. 
> >> >> >> > I > >> >> >> > mean there are transfers that complete ok. Though things do seem > >> >> >> > to > >> >> >> > slow > >> >> >> > down quite a bit, so that looks like a problem. > >> >> >> > > >> >> >> > Let's see what in the stack traces. In the mean time, I will see > >> >> >> > what it > >> >> >> > takes to get transfer progress messages. > >> >> >> > > >> >> >> > Mihael > >> >> >> > > >> >> >> > > >> >> >> > On Thu, 2011-04-14 at 17:28 -0500, Michael Wilde wrote: > >> >> >> >> bri$ pwd > >> >> >> >> /home/aespinosa/workflows/cybershake/archive-runs/test > >> >> >> >> bri$ ls -lt > >> >> >> >> total 1844128 > >> >> >> >> -rw-r--r-- 1 aespinosa ci-users 0 Apr 14 14:21 max-duration.tmp > >> >> >> >> -rw-r--r-- 1 aespinosa ci-users 15 Apr 14 14:20 start-time.tmp > >> >> >> >> -rw-r--r-- 1 aespinosa ci-users 1433206 Apr 14 14:20 > >> >> >> >> stagein.event > >> >> >> >> -rw-r--r-- 1 aespinosa ci-users 2372737 Apr 14 14:19 > >> >> >> >> sort-preserve2.tmp > >> >> >> >> -rw-r--r-- 1 aespinosa ci-users 2372737 Apr 14 14:19 > >> >> >> >> sort-preserve.tmp > >> >> >> >> -rw-r--r-- 1 aespinosa ci-users 15 Apr 14 14:19 t.inf > >> >> >> >> -rw-r--r-- 1 aespinosa ci-users 2263727 Apr 14 12:51 > >> >> >> >> stagein.transition > >> >> >> >> -rw-r--r-- 1 aespinosa ci-users 8998897 Apr 14 12:31 stagein.log > >> >> >> >> -rw-r--r-- 1 aespinosa ci-users 92059 Apr 14 12:05 > >> >> >> >> dostageout.event > >> >> >> >> -rw-r--r-- 1 aespinosa ci-users 97442 Apr 14 11:51 > >> >> >> >> dostagein.event > >> >> >> >> -rw-r--r-- 1 aespinosa ci-users 2998 Apr 13 17:38 > >> >> >> >> dostagein.sorted-start.png > >> >> >> >> -rw-r--r-- 1 aespinosa ci-users 3080 Apr 13 17:38 > >> >> >> >> dostageout.sorted-start.png > >> >> >> >> -rw-r--r-- 1 aespinosa ci-users 3255 Apr 8 16:05 > >> >> >> >> execute2-total.png > >> >> >> >> -rw-r--r-- 1 aespinosa ci-users 1533974 Apr 8 14:46 > >> >> >> >> postproc-20110407-1438-i90jepr3.0.rlog > >> >> >> >> -rw-r--r-- 1 aespinosa ci-users 1868896768 
Apr 8 14:46 > >> >> >> >> postproc-20110407-1438-i90jepr3.log > >> >> >> >> drwxr-xr-x 2 aespinosa ci-users 32768 Apr 7 14:39 > >> >> >> >> postproc-20110407-1438-i90jepr3.d/ > >> >> >> >> bri$ > >> >> >> >> > >> >> >> >> runs, not run > >> >> >> >> > >> >> >> >> ALso see bridled: /tmp/mw1 > >> >> >> >> > >> >> >> >> ----- Original Message ----- > >> >> >> >> > [hategan at bridled tmp]$ cd > >> >> >> >> > ~aespinosa/workflows/cybershake/archive-run/test/ > >> >> >> >> > -bash: cd: > >> >> >> >> > /home/aespinosa/workflows/cybershake/archive-run/test/: No > >> >> >> >> > such file or directory > >> >> >> >> > > >> >> >> >> > On Thu, 2011-04-14 at 17:21 -0500, Allan Espinosa wrote: > >> >> >> >> > > ~aespinosa/workflows/cybershake/archive-run/test/postproc*.log > >> >> >> >> > > > >> >> >> >> > > 2011/4/14 Mihael Hategan : > >> >> >> >> > > > On Thu, 2011-04-14 at 15:57 -0500, Michael Wilde wrote: > >> >> >> >> > > >> While Allan continues to debug this, can you take a look > >> >> >> >> > > >> at > >> >> >> >> > > >> the > >> >> >> >> > > >> (huge) log? > >> >> >> >> > > > > >> >> >> >> > > > Where is this log? > >> >> >> >> > > > > >> >> >> >> > > > > >> >> > From hategan at mcs.anl.gov Fri Apr 15 20:04:13 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 15 Apr 2011 18:04:13 -0700 Subject: [Swift-devel] duplicated job submission in swift-0.92? In-Reply-To: <1301794624.12893.1.camel@blabla2.none> References: <357669173.48524.1301541122267.JavaMail.root@zimbra.anl.gov> <1301541697.23803.9.camel@blabla2.none> <1301596477.1319.0.camel@blabla2.none> <1301597554.1319.5.camel@blabla2.none> <1301794624.12893.1.camel@blabla2.none> Message-ID: <1302915853.7783.1.camel@blabla2.none> Turns out I was lying. This bug was also in trunk (not any more as of swift r4381). I didn't notice it because my local copy didn't have it. So I must have fixed it and have been aware of it in trunk, but never committed the fix. I apologize for the mess. 
Mihael On Sat, 2011-04-02 at 18:37 -0700, Mihael Hategan wrote: > On Thu, 2011-03-31 at 11:52 -0700, Mihael Hategan wrote: > > We decided the following: > > - I will revert the changes in the 0.92 branch > > done > > > - re-commit bug fixes that were committed after the merge > > done > > > - merge the 0.92 branch to trunk > > done > > > - fix the problems in trunk > > and done except the problem was in the branch. I think it was a manual > merge of mine gone wrong. Trunk should be clean. > > We should make a patch release. > > Mihael > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From hategan at mcs.anl.gov Fri Apr 15 20:12:38 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 15 Apr 2011 18:12:38 -0700 Subject: [Swift-devel] Re: deadlock on workflow: In-Reply-To: <1297999963.23095.0.camel@blabla2.none> References: <227721707.70260.1297988064430.JavaMail.root@zimbra.anl.gov> <1297999963.23095.0.camel@blabla2.none> Message-ID: <1302916358.7880.0.camel@blabla2.none> This is now fixed (sadly after the release). It was only in the branch, not in trunk. See https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=299 On Thu, 2011-02-17 at 19:32 -0800, Mihael Hategan wrote: > I agree. > > On Thu, 2011-02-17 at 18:14 -0600, Michael Wilde wrote: > > Thanks, Allan - good catch. *That* makes it worth fixing I feel, or at least diagnosing its likelihood. Mihael, do you agree? 
> > > > - Mike > > > > ----- Original Message ----- > > > ok, the deadlock is in branches/release-0.92 as well (swift-r4110 > > > cog-r3032) > > > > > > -Allan > > > > > > 2011/2/17 Allan Espinosa : > > > > Version > > > > > > > > swift-r3835 cog-r2988 > > > > > > > > see attached: > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From hategan at mcs.anl.gov Fri Apr 15 20:17:20 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 15 Apr 2011 18:17:20 -0700 Subject: [Swift-devel] provider staging vdl-int.staging.k doesn't honor wrapperlog.transfer.always In-Reply-To: References: Message-ID: <1302916640.7880.4.camel@blabla2.none> I've committed a fix. Justin's patch I think was partially solving the problem. The original semantics of this parameter was to always transfer the info file in the event of an application error, but not necessarily so when the application invocation succeeded. There was no easy fix. It required adding some staging "modes" (i.e. transfer only in case of error and/or only if file exists). Not all providers support this (only local and coaster), but then not all providers support staging to begin with. This is now in. See https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=295. Mihael On Mon, 2011-01-17 at 14:49 -0600, Justin M Wozniak wrote: > Try the attached patch- I will test it a bit more before committing. > > Thanks for the report. > > Justin > > On Mon, 17 Jan 2011, Allan Espinosa wrote: > > > vdl-int.staging.k line 239 > > > > There's no check for the swift config property. hence i'm > > alwaysgetting the wrapper logs in provider staging. 
> > > > -Allanb > > > > > > > > -- > Justin M Wozniak > _______________________________________________ Swift-devel mailing list Swift-devel at ci.uchicago.edu http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From hategan at mcs.anl.gov Sun Apr 17 03:25:20 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 17 Apr 2011 01:25:20 -0700 Subject: [Swift-devel] [Fwd: [pads-users] Mounting homes read-only on computes] Message-ID: <1303028720.11077.1.camel@blabla2.none> I guess we should allow worker logs to go in some other place than ~.globus/ Mihael -------- Forwarded Message -------- From: pads-users at ci.uchicago.edu Reply-to: pads-users at ci.uchicago.edu To: pads-users at ci.uchicago.edu Subject: [pads-users] Mounting homes read-only on computes Date: Sat, 16 Apr 2011 22:00:40 -0500 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 One of the things discussed as part of the post-mortem of today's home file server outage is mounting home directories read-only on the compute nodes. We do this on Beagle and haven't had any complaints for the most part. So I wanted to solicit you to see how disruptive that might be to your current PADS work. Homes would still be mounted read-write on the login machines, but they wouldn't be writable on the computes. The alternative is to push your writes to either GPFS or the local scratch filesystems. If we don't hear any major complaints about this, we're targeting next month's maintenance to make this live. What are your thoughts on this? 
-----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.0.14 (Darwin) iEYEARECAAYFAk2qV9gACgkQ4RgdOxQVi0CovACfW20YM0R8uexD5PeHoX6QGZc3 LOwAnR3Iv1tokCpM81T56kSGrLNuniKg =c5Y+ -----END PGP SIGNATURE----- _______________________________________________ pads-users mailing list pads-users at ci.uchicago.edu http://mail.ci.uchicago.edu/mailman/listinfo/pads-users From ketan at ci.uchicago.edu Sun Apr 17 18:18:38 2011 From: ketan at ci.uchicago.edu (Ketan Maheshwari) Date: Sun, 17 Apr 2011 18:18:38 -0500 Subject: [Swift-devel] (urgent) Fwd: modFtDock and Freed/Sosnick science on Beagle References: Message-ID: <094AEEEB-767B-4720-AF4C-144BC06AEAFB@ci.uchicago.edu> Hi, I have received a confirmation (see below) in response to our reservation request on Beagle for the modftdock runs. The reservation is due in about 18 hours now. The requirement to get jobs submitted into this reservation is to have the string '-l advres=modFTDock.47' as part of the job submission. Could someone indicate how I can do this from Swift? Thanks, Ketan Begin forwarded message: > From: "Ti Leggett" > Date: April 17, 2011 4:26:18 PM CDT > To: ketan at ci.uchicago.edu > Subject: Fwd: [beagle-planning] Fwd: modFtDock and Freed/Sosnick science on Beagle [Beagle ticket #12863] > Reply-To: beagle-support at ci.uchicago.edu > > You have a 50 node reservation that starts in 20 hours. You should submit your jobs with the '-l > advres=modFTDock.47' option to ensure you run against the reservation. If you finish early, > please release the reservation: releaseres modFTDock.47. > > > [leggett at sandbox:/soft/local-ops/reservations]$ showres modFTDock.47 > > ReservationID Type S Start End Duration N/P StartTime > > modFTDock.47 User - 20:28:10 2:20:28:10 2:00:00:00 50/1200 Mon Apr 18 12:52:38 > > 1 reservation located
From wozniak at mcs.anl.gov Sun Apr 17 21:16:16 2011 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Sun, 17 Apr 2011 21:16:16 -0500 (Central Daylight Time) Subject: [Swift-devel] (urgent) Fwd: modFtDock and Freed/Sosnick science on Beagle In-Reply-To: <094AEEEB-767B-4720-AF4C-144BC06AEAFB@ci.uchicago.edu> References: <094AEEEB-767B-4720-AF4C-144BC06AEAFB@ci.uchicago.edu> Message-ID: This should be similar to other recent modifications to the PBS provider. I will try to get this in by noon tomorrow. That's when this starts, right? Justin On Sun, 17 Apr 2011, Ketan Maheshwari wrote: > Hi, > > I have received a confirmation (see below) in response to our > reservation request on Beagle for the modftdock runs. The reservation is > due in about 18 hours now. > > The requirement to get jobs submitted into this reservation is to have > the string '-l advres=modFTDock.47' as part of the job submission. Could someone > indicate how I can do this from Swift? > > Thanks, > Ketan > > > Begin forwarded message: > >> From: "Ti Leggett" >> Date: April 17, 2011 4:26:18 PM CDT >> To: ketan at ci.uchicago.edu >> Subject: Fwd: [beagle-planning] Fwd: modFtDock and Freed/Sosnick science on Beagle [Beagle ticket #12863] >> Reply-To: beagle-support at ci.uchicago.edu >> >> You have a 50 node reservation that starts in 20 hours. You should submit your jobs with the '-l >> advres=modFTDock.47' option to ensure you run against the reservation. If you finish early, >> please release the reservation: releaseres modFTDock.47.
>> >> >> [leggett at sandbox:/soft/local-ops/reservations]$ showres modFTDock.47 >> >> ReservationID Type S Start End Duration N/P StartTime >> >> modFTDock.47 User - 20:28:10 2:20:28:10 2:00:00:00 50/1200 Mon Apr 18 12:52:38 >> >> 1 reservation located >> > > -- Justin M Wozniak From skenny at uchicago.edu Mon Apr 18 20:53:55 2011 From: skenny at uchicago.edu (Sarah Kenny) Date: Mon, 18 Apr 2011 18:53:55 -0700 Subject: [Swift-devel] [Fwd: [pads-users] Mounting homes read-only on computes] In-Reply-To: <1303028720.11077.1.camel@blabla2.none> References: <1303028720.11077.1.camel@blabla2.none> Message-ID: /var/tmp ? On Sun, Apr 17, 2011 at 1:25 AM, Mihael Hategan wrote: > I guess we should allow worker logs to go in some other place than > ~.globus/ > > Mihael > > -------- Forwarded Message -------- > From: pads-users at ci.uchicago.edu > Reply-to: pads-users at ci.uchicago.edu > To: pads-users at ci.uchicago.edu > Subject: [pads-users] Mounting homes read-only on computes > Date: Sat, 16 Apr 2011 22:00:40 -0500 > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > One of the things discussed as part of the post-mortem of today's home file > server outage is mounting home directories read-only on the compute nodes. > We do this on Beagle and haven't had any complaints for the most part. So I > wanted to solicit you to see how disruptive that might be to your current > PADS work. Homes would still be mounted read-write on the login machines, > but they wouldn't be writable on the computes. The alternative is to push > your writes to either GPFS or the local scratch filesystems. If we don't > hear any major complaints about this, we're targeting next month's > maintenance to make this live. What are your thoughts on this? 
> -----BEGIN PGP SIGNATURE----- > Version: GnuPG/MacGPG2 v2.0.14 (Darwin) > > iEYEARECAAYFAk2qV9gACgkQ4RgdOxQVi0CovACfW20YM0R8uexD5PeHoX6QGZc3 > LOwAnR3Iv1tokCpM81T56kSGrLNuniKg > =c5Y+ > -----END PGP SIGNATURE----- > _______________________________________________ > pads-users mailing list > pads-users at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/pads-users > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From hategan at mcs.anl.gov Mon Apr 18 21:28:12 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 18 Apr 2011 19:28:12 -0700 Subject: [Swift-devel] [Fwd: [pads-users] Mounting homes read-only on computes] In-Reply-To: References: <1303028720.11077.1.camel@blabla2.none> Message-ID: <1303180092.17707.0.camel@blabla2.none> It should be a configurable parameter. I was thinking more like gpfs, but it should be the user's choice. On Mon, 2011-04-18 at 18:53 -0700, Sarah Kenny wrote: > /var/tmp ? > > On Sun, Apr 17, 2011 at 1:25 AM, Mihael Hategan > wrote: > I guess we should allow worker logs to go in some other place > than > ~.globus/ > > Mihael > > -------- Forwarded Message -------- > From: pads-users at ci.uchicago.edu > Reply-to: pads-users at ci.uchicago.edu > To: pads-users at ci.uchicago.edu > Subject: [pads-users] Mounting homes read-only on computes > Date: Sat, 16 Apr 2011 22:00:40 -0500 > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > One of the things discussed as part of the post-mortem of > today's home file server outage is mounting home directories > read-only on the compute nodes. We do this on Beagle and > haven't had any complaints for the most part. So I wanted to > solicit you to see how disruptive that might be to your > current PADS work.
Homes would still be mounted read-write on > the login machines, but they wouldn't be writable on the > computes. The alternative is to push your writes to either > GPFS or the local scratch filesystems. If we don't hear any > major complaints about this, we're targeting next month's > maintenance to make this live. What are your thoughts on this? > -----BEGIN PGP SIGNATURE----- > Version: GnuPG/MacGPG2 v2.0.14 (Darwin) > > iEYEARECAAYFAk2qV9gACgkQ4RgdOxQVi0CovACfW20YM0R8uexD5PeHoX6QGZc3 > LOwAnR3Iv1tokCpM81T56kSGrLNuniKg > =c5Y+ > -----END PGP SIGNATURE----- > _______________________________________________ > pads-users mailing list > pads-users at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/pads-users > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From jonmon at utexas.edu Mon Apr 18 23:49:10 2011 From: jonmon at utexas.edu (Jonathan S Monette) Date: Mon, 18 Apr 2011 23:49:10 -0500 Subject: [Swift-devel] [Fwd: [pads-users] Mounting homes read-only on computes] In-Reply-To: <1303180092.17707.0.camel@blabla2.none> References: <1303028720.11077.1.camel@blabla2.none> <1303180092.17707.0.camel@blabla2.none> Message-ID: What about the work directory? On Apr 18, 2011 9:28 PM, "Mihael Hategan" wrote: > It should be a configurable parameter. I was thinking more like gpfs, > but it should be the user's choice. > > On Mon, 2011-04-18 at 18:53 -0700, Sarah Kenny wrote: >> /var/tmp ? 
>> >> On Sun, Apr 17, 2011 at 1:25 AM, Mihael Hategan >> wrote: >> I guess we should allow worker logs to go in some other place >> than >> ~.globus/ >> >> Mihael >> >> -------- Forwarded Message -------- >> From: pads-users at ci.uchicago.edu >> Reply-to: pads-users at ci.uchicago.edu >> To: pads-users at ci.uchicago.edu >> Subject: [pads-users] Mounting homes read-only on computes >> Date: Sat, 16 Apr 2011 22:00:40 -0500 >> >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> One of the things discussed as part of the post-mortem of >> today's home file server outage is mounting home directories >> read-only on the compute nodes. We do this on Beagle and >> haven't had any complaints for the most part. So I wanted to >> solicit you to see how disruptive that might be to your >> current PADS work. Homes would still be mounted read-write on >> the login machines, but they wouldn't be writable on the >> computes. The alternative is to push your writes to either >> GPFS or the local scratch filesystems. If we don't hear any >> major complaints about this, we're targeting next month's >> maintenance to make this live. What are your thoughts on this? >> -----BEGIN PGP SIGNATURE----- >> Version: GnuPG/MacGPG2 v2.0.14 (Darwin) >> >> iEYEARECAAYFAk2qV9gACgkQ4RgdOxQVi0CovACfW20YM0R8uexD5PeHoX6QGZc3 >> LOwAnR3Iv1tokCpM81T56kSGrLNuniKg >> =c5Y+ >> -----END PGP SIGNATURE----- >> _______________________________________________ >> pads-users mailing list >> pads-users at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/pads-users >> >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aespinosa at cs.uchicago.edu Mon Apr 18 23:55:29 2011 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Mon, 18 Apr 2011 23:55:29 -0500 Subject: [Swift-devel] [Fwd: [pads-users] Mounting homes read-only on computes] In-Reply-To: References: <1303028720.11077.1.camel@blabla2.none> <1303180092.17707.0.camel@blabla2.none> Message-ID: The work directory of a site and the client host may be on different machines. My guess is that it should be something in the etc/swift.properties file. 2011/4/18 Jonathan S Monette : > What about the work directory? > > On Apr 18, 2011 9:28 PM, "Mihael Hategan" wrote: >> It should be a configurable parameter. I was thinking more like gpfs, >> but it should be the user's choice. >> >> On Mon, 2011-04-18 at 18:53 -0700, Sarah Kenny wrote: >>> /var/tmp ? >>> >>> On Sun, Apr 17, 2011 at 1:25 AM, Mihael Hategan >>> wrote: >>> I guess we should allow worker logs to go in some other place >>> than >>> ~.globus/ >>> >>> Mihael >>> >>> -------- Forwarded Message -------- >>> From: pads-users at ci.uchicago.edu >>> Reply-to: pads-users at ci.uchicago.edu >>> To: pads-users at ci.uchicago.edu >>> Subject: [pads-users] Mounting homes read-only on computes >>> Date: Sat, 16 Apr 2011 22:00:40 -0500 >>> >>> -----BEGIN PGP SIGNED MESSAGE----- >>> Hash: SHA1 >>> >>> One of the things discussed as part of the post-mortem of >>> today's home file server outage is mounting home directories >>> read-only on the compute nodes. We do this on Beagle and >>> haven't had any complaints for the most part. So I wanted to >>> solicit you to see how disruptive that might be to your >>> current PADS work. Homes would still be mounted read-write on >>> the login machines, but they wouldn't be writable on the >>> computes. The alternative is to push your writes to either >>> GPFS or the local scratch filesystems. If we don't hear any >>> major complaints about this, we're targeting next month's >>> maintenance to make this live.
What are your thoughts on this? >>> -----BEGIN PGP SIGNATURE----- >>> Version: GnuPG/MacGPG2 v2.0.14 (Darwin) >>> >>> iEYEARECAAYFAk2qV9gACgkQ4RgdOxQVi0CovACfW20YM0R8uexD5PeHoX6QGZc3 >>> LOwAnR3Iv1tokCpM81T56kSGrLNuniKg >>> =c5Y+ >>> -----END PGP SIGNATURE----- -- Allan M. Espinosa PhD student, Computer Science University of Chicago From hategan at mcs.anl.gov Tue Apr 19 00:02:06 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 18 Apr 2011 22:02:06 -0700 Subject: [Swift-devel] [Fwd: [pads-users] Mounting homes read-only on computes] In-Reply-To: References: <1303028720.11077.1.camel@blabla2.none> <1303180092.17707.0.camel@blabla2.none> Message-ID: <1303189326.22310.1.camel@blabla2.none> You'd want that to be configurable for each site (each may have its SFS mounted somewhere else). That would suggest it might go in the sites file. On Mon, 2011-04-18 at 23:55 -0500, Allan Espinosa wrote: > The workdirectory of a site and client host maybe on different > machines. My guess is that it should be something in the > etc/swift.properties file > > 2011/4/18 Jonathan S Monette : > > What about the work directory? > > > > On Apr 18, 2011 9:28 PM, "Mihael Hategan" wrote: > >> It should be a configurable parameter. I was thinking more like gpfs, > >> but it should be the user's choice. > >> > >> On Mon, 2011-04-18 at 18:53 -0700, Sarah Kenny wrote: > >>> /var/tmp ? 
> >>> > >>> On Sun, Apr 17, 2011 at 1:25 AM, Mihael Hategan > >>> wrote: > >>> I guess we should allow worker logs to go in some other place > >>> than > >>> ~.globus/ > >>> > >>> Mihael > >>> > >>> -------- Forwarded Message -------- > >>> From: pads-users at ci.uchicago.edu > >>> Reply-to: pads-users at ci.uchicago.edu > >>> To: pads-users at ci.uchicago.edu > >>> Subject: [pads-users] Mounting homes read-only on computes > >>> Date: Sat, 16 Apr 2011 22:00:40 -0500 > >>> > >>> -----BEGIN PGP SIGNED MESSAGE----- > >>> Hash: SHA1 > >>> > >>> One of the things discussed as part of the post-mortem of > >>> today's home file server outage is mounting home directories > >>> read-only on the compute nodes. We do this on Beagle and > >>> haven't had any complaints for the most part. So I wanted to > >>> solicit you to see how disruptive that might be to your > >>> current PADS work. Homes would still be mounted read-write on > >>> the login machines, but they wouldn't be writable on the > >>> computes. The alternative is to push your writes to either > >>> GPFS or the local scratch filesystems. If we don't hear any > >>> major complaints about this, we're targeting next month's > >>> maintenance to make this live. What are your thoughts on this? > >>> -----BEGIN PGP SIGNATURE----- > >>> Version: GnuPG/MacGPG2 v2.0.14 (Darwin) > >>> > >>> iEYEARECAAYFAk2qV9gACgkQ4RgdOxQVi0CovACfW20YM0R8uexD5PeHoX6QGZc3 > >>> LOwAnR3Iv1tokCpM81T56kSGrLNuniKg > >>> =c5Y+ > >>> -----END PGP SIGNATURE----- > > > > From aespinosa at cs.uchicago.edu Tue Apr 19 00:05:06 2011 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Tue, 19 Apr 2011 00:05:06 -0500 Subject: [Swift-devel] [Fwd: [pads-users] Mounting homes read-only on computes] In-Reply-To: <1303189326.22310.1.camel@blabla2.none> References: <1303028720.11077.1.camel@blabla2.none> <1303180092.17707.0.camel@blabla2.none> <1303189326.22310.1.camel@blabla2.none> Message-ID: Oh right worker logs. 
I was thinking of the submit files (pbs or condor). sorry 2011/4/19 Mihael Hategan : > You'd want that to be configurable for each site (each may have its SFS > mounted somewhere else). That would suggest it might go in the sites > file. > > On Mon, 2011-04-18 at 23:55 -0500, Allan Espinosa wrote: >> The workdirectory of a site and client host maybe on different >> machines. My guess is that it should be something in the >> etc/swift.properties file >> >> 2011/4/18 Jonathan S Monette : >> > What about the work directory? >> > >> > On Apr 18, 2011 9:28 PM, "Mihael Hategan" wrote: >> >> It should be a configurable parameter. I was thinking more like gpfs, >> >> but it should be the user's choice. >> >> >> >> On Mon, 2011-04-18 at 18:53 -0700, Sarah Kenny wrote: >> >>> /var/tmp ? >> >>> >> >>> On Sun, Apr 17, 2011 at 1:25 AM, Mihael Hategan >> >>> wrote: >> >>> I guess we should allow worker logs to go in some other place >> >>> than >> >>> ~.globus/ >> >>> >> >>> Mihael >> >>> >> >>> -------- Forwarded Message -------- >> >>> From: pads-users at ci.uchicago.edu >> >>> Reply-to: pads-users at ci.uchicago.edu >> >>> To: pads-users at ci.uchicago.edu >> >>> Subject: [pads-users] Mounting homes read-only on computes >> >>> Date: Sat, 16 Apr 2011 22:00:40 -0500 >> >>> >> >>> -----BEGIN PGP SIGNED MESSAGE----- >> >>> Hash: SHA1 >> >>> >> >>> One of the things discussed as part of the post-mortem of >> >>> today's home file server outage is mounting home directories >> >>> read-only on the compute nodes. We do this on Beagle and >> >>> haven't had any complaints for the most part. So I wanted to >> >>> solicit you to see how disruptive that might be to your >> >>> current PADS work. Homes would still be mounted read-write on >> >>> the login machines, but they wouldn't be writable on the >> >>> computes. The alternative is to push your writes to either >> >>> GPFS or the local scratch filesystems.
If we don't hear any >> >>> major complaints about this, we're targeting next month's >> >>> maintenance to make this live. What are your thoughts on this? >> >>> -----BEGIN PGP SIGNATURE----- >> >>> Version: GnuPG/MacGPG2 v2.0.14 (Darwin) >> >>> >> >>> iEYEARECAAYFAk2qV9gACgkQ4RgdOxQVi0CovACfW20YM0R8uexD5PeHoX6QGZc3 >> >>> LOwAnR3Iv1tokCpM81T56kSGrLNuniKg >> >>> =c5Y+ >> >>> -----END PGP SIGNATURE----- >> >> > From hategan at mcs.anl.gov Tue Apr 19 00:15:55 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 18 Apr 2011 22:15:55 -0700 Subject: [Swift-devel] [Fwd: [pads-users] Mounting homes read-only on computes] In-Reply-To: References: <1303028720.11077.1.camel@blabla2.none> <1303180092.17707.0.camel@blabla2.none> <1303189326.22310.1.camel@blabla2.none> Message-ID: <1303190155.22557.9.camel@blabla2.none> On Tue, 2011-04-19 at 00:05 -0500, Allan Espinosa wrote: > Oh right worker logs. I was thinking of the submit files (pbs or > condor). sorry Crap. In general, the pbs scripts are only needed on the node that you submit from. So they should probably go in $TMP if we want to submit from the CNs, but that can probably be "hardcoded". The coaster worker script should still be readable on the compute nodes. But we may want to also make that a configurable option (though I'd wait for a cluster sufficiently messed up to require it). > > 2011/4/19 Mihael Hategan : > > You'd want that to be configurable for each site (each may have its SFS > > mounted somewhere else). That would suggest it might go in the sites > > file. > > > > On Mon, 2011-04-18 at 23:55 -0500, Allan Espinosa wrote: > >> The workdirectory of a site and client host maybe on different > >> machines. My guess is that it should be something in the > >> etc/swift.properties file > >> > >> 2011/4/18 Jonathan S Monette : > >> > What about the work directory? > >> > > >> > On Apr 18, 2011 9:28 PM, "Mihael Hategan" wrote: > >> >> It should be a configurable parameter.
I was thinking more like gpfs, > >> >> but it should be the user's choice. > >> >> > >> >> On Mon, 2011-04-18 at 18:53 -0700, Sarah Kenny wrote: > >> >>> /var/tmp ? > >> >>> > >> >>> On Sun, Apr 17, 2011 at 1:25 AM, Mihael Hategan > >> >>> wrote: > >> >>> I guess we should allow worker logs to go in some other place > >> >>> than > >> >>> ~.globus/ > >> >>> > >> >>> Mihael > >> >>> > >> >>> -------- Forwarded Message -------- > >> >>> From: pads-users at ci.uchicago.edu > >> >>> Reply-to: pads-users at ci.uchicago.edu > >> >>> To: pads-users at ci.uchicago.edu > >> >>> Subject: [pads-users] Mounting homes read-only on computes > >> >>> Date: Sat, 16 Apr 2011 22:00:40 -0500 > >> >>> > >> >>> -----BEGIN PGP SIGNED MESSAGE----- > >> >>> Hash: SHA1 > >> >>> > >> >>> One of the things discussed as part of the post-mortem of > >> >>> today's home file server outage is mounting home directories > >> >>> read-only on the compute nodes. We do this on Beagle and > >> >>> haven't had any complaints for the most part. So I wanted to > >> >>> solicit you to see how disruptive that might be to your > >> >>> current PADS work. Homes would still be mounted read-write on > >> >>> the login machines, but they wouldn't be writable on the > >> >>> computes. The alternative is to push your writes to either > >> >>> GPFS or the local scratch filesystems. If we don't hear any > >> >>> major complaints about this, we're targeting next month's > >> >>> maintenance to make this live. What are your thoughts on this? 
> >> >>> -----BEGIN PGP SIGNATURE----- > >> >>> Version: GnuPG/MacGPG2 v2.0.14 (Darwin) > >> >>> > >> >>> iEYEARECAAYFAk2qV9gACgkQ4RgdOxQVi0CovACfW20YM0R8uexD5PeHoX6QGZc3 > >> >>> LOwAnR3Iv1tokCpM81T56kSGrLNuniKg > >> >>> =c5Y+ > >> >>> -----END PGP SIGNATURE----- > >> > >> > > From skenny at uchicago.edu Wed Apr 20 12:16:54 2011 From: skenny at uchicago.edu (Sarah Kenny) Date: Wed, 20 Apr 2011 10:16:54 -0700 Subject: [Swift-devel] changelog and build instructions Message-ID: hi all, fyi, i put the latest changelog and instructions for building the binary release on the developer site https://sites.google.com/site/swiftdevel/home...let me know if you think they should live elsewhere. also, it looks like the "NOTE:" tag was not used much in svn logging...just another quick reminder that, if you want to make sure an svn commit comment makes it into the changelog please preface it with "NOTE:" :) ~sk -------------- next part -------------- An HTML attachment was scrubbed... URL: From dk0966 at cs.ship.edu Sat Apr 23 03:00:01 2011 From: dk0966 at cs.ship.edu (David Kelly) Date: Sat, 23 Apr 2011 04:00:01 -0400 Subject: [Swift-devel] Website documentation Message-ID: Hello, What is the current process for updating website documentation? I believe the MaintainingSwiftWebContent wiki may be out of date. I am trying to submit some corrections to the Swift tutorial. Previously I could modify it through www/ in SVN. Now I believe it is generated from swift/docs/tutorial.xml for each release. Is this correct? Is there still a cron job that runs daily and updates the site? I am not seeing the changes I made yesterday reflected on the website. Do I need to manually run /ci/www/projects/swift/update.sh? I do not have permissions to read or execute it at the moment. Thanks, David -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From wilde at mcs.anl.gov Sat Apr 23 07:33:27 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sat, 23 Apr 2011 07:33:27 -0500 (CDT) Subject: [Swift-devel] Website documentation In-Reply-To: Message-ID: <435023679.135358.1303562007576.JavaMail.root@zimbra.anl.gov> David, you're not in the vdl2-svn UNIX group, which you need in order to run this. I'll request that you be added, and do an svn-co of the tutorial and run update.sh for you in the meantime. - Mike ----- Original Message ----- Hello, What is the current process for updating website documentation? I believe the MaintainingSwiftWebContent wiki may be out of date. I am trying to submit some corrections to the Swift tutorial. Previously I could modify it through www/ in SVN. Now I believe it is generated from swift/docs/tutorial.xml for each release. Is this correct? Is there still a cron job that runs daily and updates the site? I am not seeing the changes I made yesterday reflected on the website. Do I need to manually run /ci/www/projects/swift/update.sh? I do not have permissions to read or execute it at the moment. Thanks, David _______________________________________________ Swift-devel mailing list Swift-devel at ci.uchicago.edu http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From wilde at mcs.anl.gov Sat Apr 23 09:14:49 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sat, 23 Apr 2011 09:14:49 -0500 (CDT) Subject: [Swift-devel] Website documentation In-Reply-To: <435023679.135358.1303562007576.JavaMail.root@zimbra.anl.gov> Message-ID: <1458023567.135441.1303568089035.JavaMail.root@zimbra.anl.gov> Looking deeper into this, I see that the permissions in /ci/www/projects/swift do not permit anyone in vdl2-svn to do svn-update and to run update.sh completely for the entire doc set. Justin, I think this occurred when you last worked on this directory. Can you see if update.sh works for you at the moment? That should push David's committed tutorial corrections to the live content. I suggest that rather than correct this, we push ahead updating the entire per-release doc set to asciidoc and then adjust update.sh and the /ci/www/projects/swift directory to match it. David, can you do this under bug 375, and propose the new structure with a README that outlines how all the directories and content-push scripts will be organized? Thanks, Mike ----- Original Message ----- David, you're not in the vdl2-svn UNIX group, which you need in order to run this. I'll request that you be added, and do an svn-co of the tutorial and run update.sh for you in the meantime. - Mike ----- Original Message ----- Hello, What is the current process for updating website documentation? I believe the MaintainingSwiftWebContent wiki may be out of date. I am trying to submit some corrections to the Swift tutorial. Previously I could modify it through www/ in SVN. Now I believe it is generated from swift/docs/tutorial.xml for each release. Is this correct? Is there still a cron job that runs daily and updates the site? I am not seeing the changes I made yesterday reflected on the website. Do I need to manually run /ci/www/projects/swift/update.sh? I do not have permissions to read or execute it at the moment.
Thanks, David _______________________________________________ Swift-devel mailing list Swift-devel at ci.uchicago.edu http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory _______________________________________________ Swift-devel mailing list Swift-devel at ci.uchicago.edu http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: From wozniak at mcs.anl.gov Sat Apr 23 12:00:07 2011 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Sat, 23 Apr 2011 12:00:07 -0500 (Central Daylight Time) Subject: [Swift-devel] Website documentation In-Reply-To: <1458023567.135441.1303568089035.JavaMail.root@zimbra.anl.gov> References: <1458023567.135441.1303568089035.JavaMail.root@zimbra.anl.gov> Message-ID: Yeah- I am trying to be more careful with umask. Please send me your error message and I will modify the chmods in the script accordingly. For the new structure, I recommend that we stop using svn to move the content and just push the simple documents. I would be fine with a cp-based method that only works from a CI system. On Sat, 23 Apr 2011, Michael Wilde wrote: > Looking deeper into this, I see that the permissions in /ci > /www/projects/swift do not permit anyone in vdl2-svn to do svn-update > and to run update.sh completely for the entire doc set. > > > Justin, I think this occurred when you last worked on this directory. > Can you see if update.sh works for you at the moment? That should push > David's committed tutorial corrections to the live content. 
> > > I suggest that rather than correct this, we push ahead updating the > entire per-release doc set to asciidoc and then adjust update.sh and the > /ci/www/projects/swift directory to match it. David, can you do this > under bug 375, and propose the new structure with a README that outlines > how all the directories and content-push scripts will be organized? > > > Thanks, > > > Mike > > > ----- Original Message ----- > > > > David, you're not in the vdl2-svn UNIX group, which you need in order to run this. I'll request that you be added, and do an svn-co of the tutorial and run update.sh for you in the meantime. > > > - Mike > > ----- Original Message ----- > > > Hello, > > > What is the current process for updating website documentation? I believe the MaintainingSwiftWebContent wiki may be out of date. > > > I am trying to submit some corrections to the Swift tutorial. Previously I could modify it through www/ in SVN. Now I believe it is generated from swift/docs/tutorial.xml for each release. Is this correct? Is there still a cron job that runs daily and updates the site? I am not seeing the changes I made yesterday reflected on the website. Do I need to manually run /ci/www/projects/swift/update.sh? I do not have permissions to read or execute it at the moment. > > > Thanks, > David > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > -- Justin M Wozniak From benc at hawaga.org.uk Sun Apr 24 07:12:48 2011 From: benc at hawaga.org.uk (Ben Clifford) Date: Sun, 24 Apr 2011 12:12:48 +0000 (GMT) Subject: [Swift-devel] co-ordination languages (CFP) Message-ID: Now there are these things called 'co-ordination languages', which as far as I can tell overlaps a lot with 'workflow languages' ... -- http://www.hawaga.org.uk/ben/ ---------- Forwarded message ---------- Date: Sun, 24 Apr 2011 13:53:36 +0200 From: "Mousavi, M." 
To: "types-announce at lists.seas.upenn.edu" , "concurrency at tue.nl" , "hol-info at lists.sourceforge.net" , "agda at lists.chalmers.se" , "isabelle-users at cl.cam.ac.uk" Subject: [Agda] CFP: Foundations of Coordination Languages and Software Architectures (Deadline: June 3) *************************************************** The 10th International Workshop on the Foundations of Coordination Languages and Software Architectures (FOCLASA 2011) A Satellite Workshop of CONCUR 2011 Aachen (Germany), September 10, 2011 http://foclasa.lcc.uma.es/ Submission Deadline: June 3rd, 2011 (Abstract: May 27th, 2011) *************************************************** Abstract ====== Computation nowadays is becoming inherently concurrent, either because of characteristics of the hardware (with multicore processors becoming omnipresent) or due to the ubiquitous presence of distributed systems (incarnated in the Internet). Computational systems are therefore typically distributed, concurrent, mobile, and often involve composition of heterogeneous components. To specify and reason about such systems and go beyond the functional correctness proofs, e.g., by supporting reusability and improving maintainability, approaches such as coordination languages and software architecture are recognised as fundamental. The goal of the FOCLASA workshop is to put together researchers and practitioners of the aforementioned fields, to share and identify common problems, and to devise general solutions in the context of coordination languages and software architectures. 
Topics of interest ============ Topics of interest include (but are not limited to): * Theoretical models (of coordination, of component composition, of open, concurrent, and distributed systems) * Specification, refinement, and analysis of software systems (architectures, patterns and styles, verification of functional and non-functional properties via logics or types) * Languages for interaction, coordination, architectures, and interface definition (syntax and semantics, implementation, usability, domain-specific languages) * Dynamic software architectures (mobile agents, self-organizing/adaptive/reconfigurable systems) * Tools and environments for the development of applications. In particular, practice, experience and methodologies from the following areas are solicited as well: * Service-Oriented computing * Multi-agent systems * Peer-to-peer systems * Grid computing * Component-based systems Invited Talk ======== Joe Armstrong, Ericsson, Sweden. Submissions ======== FOCLASA 2011 is a satellite workshop of the 22nd International Conference on Concurrency Theory (CONCUR 2011). It provides a venue where researchers and practitioners on the topics given below can meet, exchange ideas and problems, identify some of the key and fundamental issues related to coordination languages and software architecture, and explore together and disseminate solutions. Submissions must describe authors' original research work and their results. Description of work-in-progress with concrete results is also encouraged. The contributions should not exceed 15 pages formatted according to the style of the Electronic Proceedings in Theoretical Computer Science (EPTCS), and should be submitted as Portable Document Format (PDF) files using the EasyChair submission site: click here. 
Important Dates =========== Abstract submission: May 27th, 2011 Paper submission: June 3rd, 2011 Notification: July 4th, 2011 Final version due: July 18th, 2011 Workshop: September 10th, 2011 Submitting an abstract does not put any obligation on the authors to submit a full paper. Abstracts without an accompanying full paper by the paper submission deadline are automatically considered withdrawn; the authors are, however, encouraged to explicitly withdraw their abstract, if they decide not to submit a full paper. All submissions will be reviewed by an international program committee who will make a selection among the submissions based on the novelty, soundness and applicability of the presented ideas and results. Concurrent submission to other venues (conferences, workshops or journal) and submission of papers under consideration elsewhere are not allowed. A printed version of the proceedings will be distributed among participants during the workshop. The proceedings of the workshop will be published as a volume in the Electronic Proceedings in Theoretical Computer Science (EPTCS) series. Participants will give a presentation of their papers in twenty minutes, followed by a ten-minute round of questions and discussion on participants' work. Following the tradition of the past edition, a special issue of an international scientific journal will be devoted to FOCLASA 2011. Selected participants will be invited to submit an extended version of their papers after the workshop. These extended versions will be reviewed by an international program committee, which will decide on their final publication on the special issue. In the last few editions of FOCLASA, a special issue of Science of Computer Programming has been dedicated to this workshop and we plan to devote a special issue of the same journal to FOCLASA 2011. 
Program Committee Chairs ================== MohammadReza Mousavi Eindhoven University of Technology, The Netherlands António Ravara New University of Lisbon, Portugal Program Committee ============= Jonathan Aldrich, Carnegie Mellon University, USA Luis Barbosa, University of Minho, Portugal Bernhard Beckert, Karlsruhe Institute of Technology, Germany Antonio Brogi, University of Pisa, Italy Carlos Canal, University of Málaga, Spain Vittorio Cortellessa, University of L'Aquila, Italy Gregor Goessler, INRIA Grenoble - Rhône-Alpes, France Ludovic Henrio, INRIA Sophia Antipolis, France Paola Inverardi, Università dell'Aquila, Italy MohammadReza Mousavi, Eindhoven University of Technology, The Netherlands Jaco van de Pol, University of Twente, The Netherlands António Ravara, Technical University of Lisbon, Portugal Gwen Salaün, Grenoble INP - INRIA - LIG, France Carolyn Talcott, SRI International, USA Emilio Tuosto, University of Leicester, UK Mirko Viroli, University of Bologna, Italy _______________________________________________ Agda mailing list Agda at lists.chalmers.se https://lists.chalmers.se/mailman/listinfo/agda From dsk at ci.uchicago.edu Sun Apr 24 09:03:17 2011 From: dsk at ci.uchicago.edu (Daniel S. Katz) Date: Sun, 24 Apr 2011 09:03:17 -0500 Subject: [Swift-devel] co-ordination languages (CFP) In-Reply-To: References: Message-ID: <6BB69F00-5E34-43FC-80A4-03277526B5C6@ci.uchicago.edu> It's interesting that the word "workflow" itself doesn't appear in the call. Dan On Apr 24, 2011, at 7:12 AM, Ben Clifford wrote: > > Now there are these things called 'co-ordination languages', which as far > as I can tell overlaps a lot with 'workflow languages' ... > > -- > http://www.hawaga.org.uk/ben/ > > ---------- Forwarded message ---------- > Date: Sun, 24 Apr 2011 13:53:36 +0200 > From: "Mousavi, M."
> To: "types-announce at lists.seas.upenn.edu" , > "concurrency at tue.nl" , > "hol-info at lists.sourceforge.net" , > "agda at lists.chalmers.se" , > "isabelle-users at cl.cam.ac.uk" > Subject: [Agda] CFP: Foundations of Coordination Languages and Software > Architectures (Deadline: June 3) > > *************************************************** > The 10th International Workshop on the > > Foundations of Coordination Languages and Software Architectures (FOCLASA 2011) > > A Satellite Workshop of CONCUR 2011 > > Aachen (Germany), September 10, 2011 > > http://foclasa.lcc.uma.es/ > > Submission Deadline: June 3rd, 2011 > (Abstract: May 27th, 2011) > *************************************************** > > > Abstract > ====== > > Computation nowadays is becoming inherently concurrent, either because of characteristics of the hardware (with multicore processors becoming omnipresent) or due to the ubiquitous presence of distributed systems (incarnated in the Internet). Computational systems are therefore typically distributed, concurrent, mobile, and often involve composition of heterogeneous components. > > To specify and reason about such systems and go beyond the functional correctness proofs, e.g., by supporting reusability and improving maintainability, approaches such as coordination languages and software architecture are recognised as fundamental. > > The goal of the FOCLASA workshop is to put together researchers and practitioners of the aforementioned fields, to share and identify common problems, and to devise general solutions in the context of coordination languages and software architectures. 
> > > Topics of interest > ============ > > Topics of interest include (but are not limited to): > > * Theoretical models (of coordination, of component composition, of open, concurrent, and distributed systems) > * Specification, refinement, and analysis of software systems (architectures, patterns and styles, verification of functional and non-functional properties via logics or types) > * Languages for interaction, coordination, architectures, and interface definition (syntax and semantics, implementation, usability, domain-specific languages) > * Dynamic software architectures (mobile agents, self-organizing/adaptive/reconfigurable systems) > * Tools and environments for the development of applications. > > In particular, practice, experience and methodologies from the following areas are solicited as well: > * Service-Oriented computing > * Multi-agent systems > * Peer-to-peer systems > * Grid computing > * Component-based systems > > Invited Talk > ======== > > Joe Armstrong, Ericsson, Sweden. > > Submissions > ======== > > FOCLASA 2011 is a satellite workshop of the 22nd International Conference on Concurrency Theory (CONCUR 2011). It provides a venue where researchers and practitioners on the topics given below can meet, exchange ideas and problems, identify some of the key and fundamental issues related to coordination languages and software architecture, and explore together and disseminate solutions. > > Submissions must describe authors' original research work and their results. Description of work-in-progress with concrete results is also encouraged. The contributions should not exceed 15 pages formatted according to the style of the Electronic Proceedings in Theoretical Computer Science (EPTCS), and should be submitted as Portable Document Format (PDF) files using the EasyChair submission site: click here. 
> > Important Dates > =========== > > Abstract submission: May 27th, 2011 > > Paper submission: June 3rd, 2011 > > Notification: July 4th, 2011 > > Final version due: July 18th, 2011 > > Workshop: September 10th, 2011 > > Submitting an abstract does not put any obligation on the authors to submit a full paper. Abstracts without an accompanying full paper by the paper submission deadline are automatically considered withdrawn; the authors are, however, encouraged to explicitly withdraw their abstract, if they decide not to submit a full paper. > > All submissions will be reviewed by an international program committee who will make a selection among the submissions based on the novelty, soundness and applicability of the presented ideas and results. Concurrent submission to other venues (conferences, workshops or journal) and submission of papers under consideration elsewhere are not allowed. A printed version of the proceedings will be distributed among participants during the workshop. The proceedings of the workshop will be published as a volume in the Electronic Proceedings in Theoretical Computer Science (EPTCS) series. > > Participants will give a presentation of their papers in twenty minutes, followed by a ten-minute round of questions and discussion on participants' work. > > Following the tradition of the past edition, a special issue of an international scientific journal will be devoted to FOCLASA 2011. Selected participants will be invited to submit an extended version of their papers after the workshop. These extended versions will be reviewed by an international program committee, which will decide on their final publication on the special issue. In the last few editions of FOCLASA, a special issue of Science of Computer Programming has been dedicated to this workshop and we plan to devote a special issue of the same journal to FOCLASA 2011. 
> > Program Committee Chairs
> ==================
>
> MohammadReza Mousavi
> Eindhoven University of Technology, The Netherlands
>
>
> António Ravara
> New University of Lisbon, Portugal
>
> Program Committee
> =============
>
> Jonathan Aldrich, Carnegie Mellon University, USA
> Luis Barbosa, University of Minho, Portugal
> Bernhard Beckert, Karlsruhe Institute of Technology, Germany
> Antonio Brogi, University of Pisa, Italy
> Carlos Canal, University of Málaga, Spain
> Vittorio Cortellessa, University of L'Aquila, Italy
> Gregor Goessler, INRIA Grenoble - Rhône-Alpes, France
> Ludovic Henrio, INRIA Sophia Antipolis, France
> Paola Inverardi, Università dell'Aquila, Italy
> MohammadReza Mousavi, Eindhoven University of Technology, The Netherlands
> Jaco van de Pol, University of Twente, The Netherlands
> António Ravara, Technical University of Lisbon, Portugal
> Gwen Salaün, Grenoble INP - INRIA - LIG, France
> Carolyn Talcott, SRI International, USA
> Emilio Tuosto, University of Leicester, UK
> Mirko Viroli, University of Bologna, Italy
> _______________________________________________
> Agda mailing list
> Agda at lists.chalmers.se
> https://lists.chalmers.se/mailman/listinfo/agda
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

--
Daniel S. Katz
University of Chicago
(773) 834-7186 (voice)
(773) 834-3700 (fax)
d.katz at ieee.org or dsk at ci.uchicago.edu
http://www.ci.uchicago.edu/~dsk/

From foster at anl.gov Sun Apr 24 16:59:33 2011
From: foster at anl.gov (Ian Foster)
Date: Sun, 24 Apr 2011 16:59:33 -0500
Subject: [Swift-devel] co-ordination languages (CFP)
In-Reply-To: <6BB69F00-5E34-43FC-80A4-03277526B5C6@ci.uchicago.edu>
References: <6BB69F00-5E34-43FC-80A4-03277526B5C6@ci.uchicago.edu>
Message-ID:

This term was big in the 1980s/early 1990s, around the time of Linda. It seems to have dropped in popularity of late.
I've always thought of Swift as a coordination language rather than a workflow language. But I like the term "parallel scripting" best of all.

On Apr 24, 2011, at 9:03 AM, Daniel S. Katz wrote:
> It's interesting that the word "workflow" itself doesn't appear in the call.
>
> Dan
>
>
> On Apr 24, 2011, at 7:12 AM, Ben Clifford wrote:
>
>>
>> Now there are these things called 'co-ordination languages', which as far
>> as I can tell overlaps a lot with 'workflow languages' ...
>>
>> --
>> http://www.hawaga.org.uk/ben/
>
> --
> Daniel S.
Katz
> University of Chicago
> (773) 834-7186 (voice)
> (773) 834-3700 (fax)
> d.katz at ieee.org or dsk at ci.uchicago.edu
> http://www.ci.uchicago.edu/~dsk/
>
>
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

From iraicu at cs.iit.edu Mon Apr 25 15:26:29 2011
From: iraicu at cs.iit.edu (Ioan Raicu)
Date: Mon, 25 Apr 2011 15:26:29 -0500
Subject: [Swift-devel] Call for Participation: ACM High-Performance Parallel and Distributed Computing (HPDC) in San Jose, California, June 8-11, 2011
Message-ID: <4DB5D8F5.7060105@cs.iit.edu>

Call for Participation

The 20th International ACM Symposium on High-Performance Parallel and Distributed Computing 2011
San Jose, California, June 8-11, 2011
http://www.hpdc.org/2011/index.php

HPDC is the premier computer science conference for presenting new results relating to large scale high performance parallel and distributed systems used in both science and industry. For twenty years, HPDC has been at the center of new discoveries in clusters, grids, clouds, and parallel and multicore computers. HPDC is sponsored by the Association for Computing Machinery and the conference proceedings are published by the ACM Digital Library.

HPDC 2011 will take place June 8-11 at the San Jose Convention Center in San Jose, California, in conjunction with the ACM Federated Computing Research Conference (FCRC). The conference overview (http://www.hpdc.org/2011/overview.php), program (http://www.hpdc.org/2011/program.php), and registration (http://www.hpdc.org/2011/venue.php) are now available. The early bird registration period ends May 16th, 2011.
HPDC workshops will take place on Wednesday, June 8th; for more information, see http://www.hpdc.org/2011/workshops.php:

* 3DAPAS: Workshop on Dynamic Distributed Data-Intensive Applications, Programming Abstractions, and Systems
* DIDC: The Fourth International Workshop on Data-Intensive Distributed Computing
* ECMLS: The Second International Emerging Computational Methods for the Life Sciences Workshop
* LSAP: Workshop on Large-Scale System and Application Performance
* MapReduce: The Second International Workshop on MapReduce and its Applications
* ScienceCloud: 2nd Workshop on Scientific Cloud Computing
* VTDC: Virtual Technologies in Distributed Computing

A limited number of travel assistance grants will be available for students. Applications are due by May 1, 2011. For more information, please see http://www.hpdc.org/2011/studenttravel.php. To apply, please complete this form (https://spreadsheets.google.com/viewform?formkey=dHM1ZEhjeDE0R3A5ai1LSERjN2pQcnc6MQ) before May 1st.

HPDC and SIGMETRICS are running a student research poster session, which is accepting submissions until April 27th. For more information on the student research poster session, please see http://www.hpdc.org/2011/cfp-student.php.

--
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor, Illinois Institute of Technology (IIT)
Guest Research Faculty, Argonne National Laboratory (ANL)
=================================================================
Data-Intensive Distributed Systems Laboratory, CS/IIT
Distributed Systems Laboratory, MCS/ANL
=================================================================
Cel: 1-847-722-0876
Office: 1-312-567-5704
Email: iraicu at cs.iit.edu
Web: http://www.cs.iit.edu/~iraicu/
Web: http://datasys.cs.iit.edu/
=================================================================
=================================================================

From iraicu at cs.iit.edu Mon Apr 25 15:50:41 2011
From: iraicu at cs.iit.edu (Ioan Raicu)
Date: Mon, 25 Apr 2011 15:50:41 -0500
Subject: [Swift-devel] CFP: ACM HPDC/SIGMETRICS Student Research Posters Session, due April 27th 2011
Message-ID: <4DB5DEA1.90506@cs.iit.edu>

Call for Extended Abstracts

HPDC/SIGMETRICS 2011 Student Research Posters Session
San Jose, California, June 8, 2011
http://www.hpdc.org/2011/cfp-student.php
http://www.sigmetrics.org/sigmetrics2011/student_posters.shtml

HPDC and SIGMETRICS encourage students attending the two conferences to submit poster proposals showcasing their "work-in-progress". The goal of the poster session is to present students' research, provide an opportunity for informal discussion, and facilitate interaction between members of the two communities. Posters will be presented on the evening of Wednesday, June 8th.

The primary author(s) of the poster must be a student. Posters will be reviewed by members of the HPDC and SIGMETRICS organizing committees. At the event, posters must be presented by a student. Authors of accepted papers must not submit a poster of the work they present in the conference or affiliated workshops.
Topics of interest include, but are not limited to:

# Applications of parallel, distributed, and cloud computing
# Resource management, scheduling and load-balancing
# Performance modeling, simulation, measurement and prediction
# Network architectures, topology and routing, wireless networks
# Systems, networks, and architectures for high end computing
# Energy efficient computing systems
# Data intensive computing
# I/O, file systems, and data management
# Parallel and multicore issues and opportunities
# Virtualization of machines, networks, and storage
# Sensor networks, mobile devices, real-time systems
# Social networks, Internet servers, multimedia systems, web services

Both analytical and empirical studies are welcome in the student poster session.

Submission Guidelines: Student authors are asked to submit two-page abstracts in standard ACM format (for templates, see http://www.acm.org/sigs/publications/proceedings-templates) before the deadline below. All submissions should be in PDF format. Please submit your abstract at https://cmt.research.microsoft.com/HPDC_SIG2011

The accepted abstracts will appear in a special issue of ACM Performance Evaluation Review (PER).

Important Dates
# Submission deadline April 27, 2011
# Acceptance notifications May 9, 2011

Student Poster Session Chairs
# Giuliano Casale, Imperial College London
# Ioan Raicu, Illinois Institute of Technology & Argonne National Laboratory

Student Poster Session Program Committee
* Augustin Chaintreau, Columbia University
* Abhishek Chandra, University of Minnesota
* Alex Iosup - EWI, Delft University
* Keith Jackson, Lawrence Berkeley National Laboratory
* Manish Parashar, Rutgers University
* Y. C. Tay, National University of Singapore

--
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor, Illinois Institute of Technology (IIT)
Guest Research Faculty, Argonne National Laboratory (ANL)
=================================================================
Data-Intensive Distributed Systems Laboratory, CS/IIT
Distributed Systems Laboratory, MCS/ANL
=================================================================
Cel: 1-847-722-0876
Office: 1-312-567-5704
Email: iraicu at cs.iit.edu
Web: http://www.cs.iit.edu/~iraicu/
Web: http://datasys.cs.iit.edu/
=================================================================
=================================================================

From iraicu at cs.iit.edu Mon Apr 25 16:34:54 2011
From: iraicu at cs.iit.edu (Ioan Raicu)
Date: Mon, 25 Apr 2011 16:34:54 -0500
Subject: [Swift-devel] CFP: Springer Journal of Grid Computing, Special Issue on Data Intensive Computing in the Clouds
Message-ID: <4DB5E8FE.9090402@cs.iit.edu>

Special Issue on Data Intensive Computing in the Clouds
Springer Journal of Grid Computing
http://datasys.cs.iit.edu/events/JGC-DataCloud-2012/index.html

Applications and experiments in all areas of science are becoming increasingly complex and more demanding in terms of their computational and data requirements. Some applications generate data volumes reaching hundreds of terabytes and even petabytes. As scientific applications become more data intensive, the management of data resources and dataflow between the storage and compute resources is becoming the main bottleneck. Analyzing, visualizing, and disseminating these large data sets has become a major challenge, and data intensive computing is now considered the "fourth paradigm" in scientific discovery after empirical, theoretical, and computational scientific approaches.
The Special Issue on Data Intensive Computing in the Clouds will provide the scientific community with a dedicated forum, within the prestigious Springer Journal of Grid Computing, for presenting new research, development, and deployment efforts in running data-intensive computing workloads on Cloud Computing infrastructures. This special issue will focus on the use of cloud-based technologies to meet the new data intensive scientific challenges that are not well served by the current supercomputers, grids or compute-intensive clouds. We believe this venue will be an excellent place to help the community define the current state, determine future goals, and present architectures and services for future clouds supporting data intensive computing.

Topics
* Data-intensive cloud computing applications, characteristics, challenges
* Case studies of data intensive computing in the clouds
* Performance evaluation of data clouds, data grids, and data centers
* Energy-efficient data cloud design and management
* Data placement, scheduling, and interoperability in the clouds
* Accountability, QoS, and SLAs
* Data privacy and protection in a public cloud environment
* Distributed file systems for clouds
* Data streaming and parallelization
* New programming models for data-intensive cloud computing
* Scalability issues in clouds
* Social computing and massively social gaming
* 3D Internet and implications
* Future research challenges in data-intensive cloud computing

Important Dates
* Papers Due: July 15, 2011
* First Round Decisions: October 1, 2011
* Major Revisions if needed: November 1, 2011
* Second Round Decisions: December 1, 2011
* Minor Revisions if needed: January 13, 2012
* Final Decision: February 1, 2012
* Publication Date: June 2012

Paper Submission

Authors are invited to submit original and unpublished technical papers.
All submissions will be peer-reviewed and judged on correctness, originality, technical strength, significance, quality of presentation, and relevance to the special issue topics of interest. Submitted papers may not have appeared in or be under consideration for another workshop, conference or journal, nor may they be under review or submitted to another forum during the review process. Submitted papers may not exceed 20 single-spaced double-column pages using 10-point font on 8.5x11 inch pages (1" margins), including figures, tables, and references. Accepted papers will likely be between 15 and 20 pages, depending on a variety of factors. For more information on preparing papers, please see http://www.springer.com/computer/communication+networks/journal/10723, under "Instructions for Authors". The final papers (PDF format) must be submitted online at http://grid.edmgr.com/ before the deadline of July 15th, 2011 at 11:59PM PST. For any questions on the submission process, please email the guest editors at jgc-datacloud-2012 at datasys.cs.iit.edu.

Guest Editors

Special Issue Guest Editors
* Tevfik Kosar (tkosar at buffalo.edu), University at Buffalo
* Ioan Raicu (iraicu at cs.iit.edu), Illinois Institute of Technology & Argonne National Laboratory

Editors-in-Chief
* Peter Kacsuk, Hungarian Academy of Sciences
* Ian Foster, University of Chicago & Argonne National Laboratory

--
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor, Illinois Institute of Technology (IIT)
Guest Research Faculty, Argonne National Laboratory (ANL)
=================================================================
Data-Intensive Distributed Systems Laboratory, CS/IIT
Distributed Systems Laboratory, MCS/ANL
=================================================================
Cel: 1-847-722-0876
Office: 1-312-567-5704
Email: iraicu at cs.iit.edu
Web: http://www.cs.iit.edu/~iraicu/
Web: http://datasys.cs.iit.edu/
=================================================================
=================================================================

From dk0966 at cs.ship.edu Mon Apr 25 21:09:24 2011
From: dk0966 at cs.ship.edu (David Kelly)
Date: Mon, 25 Apr 2011 22:09:24 -0400
Subject: [Swift-devel] Website documentation
In-Reply-To:
References: <1458023567.135441.1303568089035.JavaMail.root@zimbra.anl.gov>
Message-ID:

Hello,

I created a first draft of the Swift tutorial in asciidoc. Here are the documents it created:

HTML: http://www.ci.uchicago.edu/~davidk/tutorial.html
PDF: http://www.ci.uchicago.edu/~davidk/tutorial.pdf
ASCIIDOC: http://www.ci.uchicago.edu/~davidk/tutorial.txt

There are still a few things I need to tweak, but overall I'm finding asciidoc pretty nice to work with.

The ticket that was created for this, 375, mentions that these documents should be created during ant dist. But I think most people do not have asciidoc installed by default, and including it with Swift will not be possible due to GPL licensing issues. Asciidoc itself is GPL, as are the various packages it depends on: source-highlight for code highlighting, and dblatex for generating PDFs.

I think Justin's suggestion of keeping document generation limited to a CI machine is the way to go.
If that makes sense to you all, I will start on a script to generate the documents, and continue working on manually converting the rest of the documentation material into asciidoc format. David On Sat, Apr 23, 2011 at 1:00 PM, Justin M Wozniak wrote: > > Yeah- I am trying to be more careful with umask. Please send me your error > message and I will modify the chmods in the script accordingly. > > For the new structure, I recommend that we stop using svn to move the > content and just push the simple documents. I would be fine with a cp-based > method that only works from a CI system. > > > On Sat, 23 Apr 2011, Michael Wilde wrote: > > Looking deeper into this, I see that the permissions in /ci >> /www/projects/swift do not permit anyone in vdl2-svn to do svn-update and to >> run update.sh completely for the entire doc set. >> >> >> Justin, I think this occurred when you last worked on this directory. Can >> you see if update.sh works for you at the moment? That should push David's >> committed tutorial corrections to the live content. >> >> >> I suggest that rather than correct this, we push ahead updating the entire >> per-release doc set to asciidoc and then adjust update.sh and the >> /ci/www/projects/swift directory to match it. David, can you do this under >> bug 375, and propose the new structure with a README that outlines how all >> the directories and content-push scripts will be organized? >> >> >> Thanks, >> >> >> Mike >> >> >> ----- Original Message ----- >> >> >> >> David, you're not in the vdl2-svn UNIX group, which you need in order to >> run this. I'll request that you be added, and do an svn-co of the tutorial >> and run update.sh for you in the meantime. >> >> >> - Mike >> >> ----- Original Message ----- >> >> >> Hello, >> >> >> What is the current process for updating website documentation? I believe >> the MaintainingSwiftWebContent wiki may be out of date. >> >> >> I am trying to submit some corrections to the Swift tutorial. 
Previously I
>> could modify it through www/ in SVN. Now I believe it is generated from
>> swift/docs/tutorial.xml for each release. Is this correct? Is there still a
>> cron job that runs daily and updates the site? I am not seeing the changes I
>> made yesterday reflected on the website. Do I need to manually run
>> /ci/www/projects/swift/update.sh? I do not have permissions to read or
>> execute it at the moment.
>>
>>
>> Thanks,
>> David
>>
>>
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>
>>
>>
>>
>>
> --
> Justin M Wozniak
>

From ketancmaheshwari at gmail.com Mon Apr 25 21:24:01 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Mon, 25 Apr 2011 21:24:01 -0500
Subject: [Swift-devel] Website documentation
In-Reply-To:
References: <1458023567.135441.1303568089035.JavaMail.root@zimbra.anl.gov>
Message-ID: <6375E689-2C2B-4521-A1BB-B10F67B702D4@gmail.com>

David,

How did you get the Java-like source highlighting working for Swift code? I was trying similar syntax but am getting this error on asciidoc compile:

filter non-zero exit code: source-highlight -f xhtml -s java: returned 127

--
Ketan

From dk0966 at cs.ship.edu Mon Apr 25 21:43:32 2011
From: dk0966 at cs.ship.edu (David Kelly)
Date: Mon, 25 Apr 2011 22:43:32 -0400
Subject: [Swift-devel] Website documentation
In-Reply-To: <6375E689-2C2B-4521-A1BB-B10F67B702D4@gmail.com>
References: <1458023567.135441.1303568089035.JavaMail.root@zimbra.anl.gov> <6375E689-2C2B-4521-A1BB-B10F67B702D4@gmail.com>
Message-ID:

To get that working, I downloaded and installed GNU source-highlight. It was kind of a pain to get compiled because it didn't work with the default boost libraries. Feel free to copy it from ~davidk/source-highlight.

I set the code style to Java, and for the most part that seems to work reasonably well on SwiftScript code.
At some point I should create an actual language definition file for Swift. David On Mon, Apr 25, 2011 at 10:24 PM, Ketan Maheshwari < ketancmaheshwari at gmail.com> wrote: > David, > > How did you get the java-like source highlighting working in Swift code? I > was trying similar syntax but getting this error on asciidoc compile: > > filter non-zero exit code: source-highlight -f xhtml -s java: returned 127 > > -- > Ketan > > On Apr 25, 2011, at 9:09 PM, David Kelly wrote: > > Hello, > > I created a first draft of the Swift tutorial in asciidoc. Here are the > documents it created: > > HTML: http://www.ci.uchicago.edu/~davidk/tutorial.html > PDF: http://www.ci.uchicago.edu/~davidk/tutorial.pdf > ASCIIDOC: http://www.ci.uchicago.edu/~davidk/tutorial.txt > > There are still a few things I need to tweak, but overall I'm finding > asciidoc pretty nice to work with. > > The ticket that was created for this, 375, mentions that these documents > should be created during ant dist. But I think most people do not have > asciidoc installed by default, and including it with Swift will not be > possible due to GPL licensing issues. Asciidoc itself is GPL, as well as the > various packages it depends on - source-highlight for code highlighting, and > dblatex for generating PDFs. > > I think Justin's suggestion of keeping document generation limited to a CI > machine is the way to go. If that makes sense to you all, I will start on a > script to generate the documents, and continue working on manually > converting the rest of the documentation material into asciidoc format. > > David > > On Sat, Apr 23, 2011 at 1:00 PM, Justin M Wozniak wrote: > >> >> Yeah- I am trying to be more careful with umask. Please send me your >> error message and I will modify the chmods in the script accordingly. >> >> For the new structure, I recommend that we stop using svn to move the >> content and just push the simple documents. 
I would be fine with a cp-based >> method that only works from a CI system. >> >> >> On Sat, 23 Apr 2011, Michael Wilde wrote: >> >> Looking deeper into this, I see that the permissions in /ci >>> /www/projects/swift do not permit anyone in vdl2-svn to do svn-update and to >>> run update.sh completely for the entire doc set. >>> >>> >>> Justin, I think this occurred when you last worked on this directory. Can >>> you see if update.sh works for you at the moment? That should push David's >>> committed tutorial corrections to the live content. >>> >>> >>> I suggest that rather than correct this, we push ahead updating the >>> entire per-release doc set to asciidoc and then adjust update.sh and the >>> /ci/www/projects/swift directory to match it. David, can you do this under >>> bug 375, and propose the new structure with a README that outlines how all >>> the directories and content-push scripts will be organized? >>> >>> >>> Thanks, >>> >>> >>> Mike >>> >>> >>> ----- Original Message ----- >>> >>> >>> >>> David, you're not in the vdl2-svn UNIX group, which you need in order to >>> run this. I'll request that you be added, and do an svn-co of the tutorial >>> and run update.sh for you in the meantime. >>> >>> >>> - Mike >>> >>> ----- Original Message ----- >>> >>> >>> Hello, >>> >>> >>> What is the current process for updating website documentation? I believe >>> the MaintainingSwiftWebContent wiki may be out of date. >>> >>> >>> I am trying to submit some corrections to the Swift tutorial. Previously >>> I could modify it through www/ in SVN. Now I believe it is generated from >>> swift/docs/tutorial.xml for each release. Is this correct? Is there still a >>> cron job that runs daily and updates the site? I am not seeing the changes I >>> made yesterday reflected on the website. Do I need to manually run >>> /ci/www/projects/swift/update.sh? I do not have permissions to read or >>> execute it at the moment. 
>>>
>>> Thanks,
>>> David
>>>
>>> _______________________________________________
>>> Swift-devel mailing list
>>> Swift-devel at ci.uchicago.edu
>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>
>> --
>> Justin M Wozniak
>>
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...

From dk0966 at cs.ship.edu Tue Apr 26 03:12:01 2011
From: dk0966 at cs.ship.edu (David Kelly)
Date: Tue, 26 Apr 2011 04:12:01 -0400
Subject: [Swift-devel] Suggestions for improvements to gensites
Message-ID:

A few people had started discussing this off-line, but I thought it would be useful to continue the conversation on swift-devel.

Does anyone have any ideas for improving or simplifying gensites? The way it works right now is described at http://www.ci.uchicago.edu/wiki/bin/view/SWFT/GenSites. I think this is powerful and flexible, but not necessarily easy for new users.

My idea to simplify the process is to allow new users to run 'gensites' with no command line options and without having to first create any configuration files. When they run it with no arguments, they are presented with a series of prompts. It will print out a list of all the templates available. They should see the commonly used machines right away. It will then ask for the name of their project, the queue, and the work directory.

Once it has the info, it will print a working sites.xml. The user would copy/paste that and run swift with -sites.file. That step would hopefully be well documented in the new user guide.

The prompts would be similar to what swiftconfig had, but a little simpler.

Regards,
David
-------------- next part --------------
An HTML attachment was scrubbed...
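The prompt-driven flow proposed above could be sketched roughly as follows. This is only an illustration in Python, not gensites itself: the template text, the `example-cluster` template name, and the `ask` and `render_sites` helpers are all hypothetical, and a real sites.xml needs more than the four fields shown.

```python
from string import Template
from xml.sax.saxutils import escape

# Hypothetical template set; real gensites templates and the full
# sites.xml layout are not reproduced here.
TEMPLATES = {
    "example-cluster": Template(
        '<config>\n'
        '  <pool handle="$site">\n'
        '    <profile namespace="globus" key="project">$project</profile>\n'
        '    <profile namespace="globus" key="queue">$queue</profile>\n'
        '    <workdirectory>$workdir</workdirectory>\n'
        '  </pool>\n'
        '</config>\n'
    ),
}

# The questions listed in the proposal: project, queue, work directory.
PROMPTS = [
    ("project", "Project name"),
    ("queue", "Queue"),
    ("workdir", "Work directory"),
]


def ask(ask_fn=input, out=print):
    """List the available templates, then collect one answer per prompt.

    ask_fn and out are injectable so the flow can run without a terminal.
    """
    out("Available templates: " + ", ".join(sorted(TEMPLATES)))
    answers = {"site": ask_fn("Template: ").strip()}
    for key, label in PROMPTS:
        answers[key] = ask_fn(label + ": ").strip()
    return answers


def render_sites(answers):
    """Fill the chosen template, escaping XML-special characters."""
    template = TEMPLATES[answers["site"]]  # KeyError for an unknown template
    safe = {k: escape(v, {'"': "&quot;"}) for k, v in answers.items()}
    return template.substitute(safe)
```

Interactively this would be used as `print(render_sites(ask()))`. Because rendering is kept separate from prompting, a script could also call `render_sites` directly with a dict of values and skip the prompts.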

From tim.g.armstrong at gmail.com Tue Apr 26 07:17:07 2011
From: tim.g.armstrong at gmail.com (Tim Armstrong)
Date: Tue, 26 Apr 2011 07:17:07 -0500
Subject: [Swift-devel] co-ordination languages (CFP)
In-Reply-To:
References: <6BB69F00-5E34-43FC-80A4-03277526B5C6@ci.uchicago.edu>
Message-ID:

I ended up using the term coordination language to describe Swift in my master's paper - it's a useful way to think about the problem.

If anyone is interested, this paper is quite helpful in motivating the term: http://portal.acm.org/citation.cfm?id=129635

- Tim

On Sun, Apr 24, 2011 at 4:59 PM, Ian Foster wrote:

> This term was big in the 1980s/early 1990s, around the time of Linda. It
> seems to have dropped in popularity of late.
>
> I've always thought of Swift as a coordination language rather than a
> workflow language. But I like the term "parallel scripting" best of all.
>
> On Apr 24, 2011, at 9:03 AM, Daniel S. Katz wrote:
>
> > It's interesting that the word "workflow" itself doesn't appear in the call.
> >
> > Dan
> >
> > On Apr 24, 2011, at 7:12 AM, Ben Clifford wrote:
> >
> >> Now there are these things called 'co-ordination languages', which as far
> >> as I can tell overlaps a lot with 'workflow languages' ...
> >>
> >> --
> >> http://www.hawaga.org.uk/ben/
> >>
> >> ---------- Forwarded message ----------
> >> Date: Sun, 24 Apr 2011 13:53:36 +0200
> >> From: "Mousavi, M."
> >> To: "types-announce at lists.seas.upenn.edu" < > types-announce at lists.seas.upenn.edu>, > >> "concurrency at tue.nl" , > >> "hol-info at lists.sourceforge.net" , > >> "agda at lists.chalmers.se" , > >> "isabelle-users at cl.cam.ac.uk" > >> Subject: [Agda] CFP: Foundations of Coordination Languages and Software > >> Architectures (Deadline: June 3) > >> > >> *************************************************** > >> The 10th International Workshop on the > >> > >> Foundations of Coordination Languages and Software Architectures > (FOCLASA 2011) > >> > >> A Satellite Workshop of CONCUR 2011 > >> > >> Aachen (Germany), September 10, 2011 > >> > >> http://foclasa.lcc.uma.es/ > >> > >> Submission Deadline: June 3rd, 2011 > >> (Abstract: May 27th, 2011) > >> *************************************************** > >> > >> > >> Abstract > >> ====== > >> > >> Computation nowadays is becoming inherently concurrent, either because > of characteristics of the hardware (with multicore processors becoming > omnipresent) or due to the ubiquitous presence of distributed systems > (incarnated in the Internet). Computational systems are therefore typically > distributed, concurrent, mobile, and often involve composition of > heterogeneous components. > >> > >> To specify and reason about such systems and go beyond the functional > correctness proofs, e.g., by supporting reusability and improving > maintainability, approaches such as coordination languages and software > architecture are recognised as fundamental. > >> > >> The goal of the FOCLASA workshop is to put together researchers and > practitioners of the aforementioned fields, to share and identify common > problems, and to devise general solutions in the context of coordination > languages and software architectures. 
> >> > >> > >> Topics of interest > >> ============ > >> > >> Topics of interest include (but are not limited to): > >> > >> * Theoretical models (of coordination, of component composition, > of open, concurrent, and distributed systems) > >> * Specification, refinement, and analysis of software systems > (architectures, patterns and styles, verification of functional and > non-functional properties via logics or types) > >> * Languages for interaction, coordination, architectures, and > interface definition (syntax and semantics, implementation, usability, > domain-specific languages) > >> * Dynamic software architectures (mobile agents, > self-organizing/adaptive/reconfigurable systems) > >> * Tools and environments for the development of applications. > >> > >> In particular, practice, experience and methodologies from the following > areas are solicited as well: > >> * Service-Oriented computing > >> * Multi-agent systems > >> * Peer-to-peer systems > >> * Grid computing > >> * Component-based systems > >> > >> Invited Talk > >> ======== > >> > >> Joe Armstrong, Ericsson, Sweden. > >> > >> Submissions > >> ======== > >> > >> FOCLASA 2011 is a satellite workshop of the 22nd International > Conference on Concurrency Theory (CONCUR 2011). It provides a venue where > researchers and practitioners on the topics given below can meet, exchange > ideas and problems, identify some of the key and fundamental issues related > to coordination languages and software architecture, and explore together > and disseminate solutions. > >> > >> Submissions must describe authors' original research work and their > results. Description of work-in-progress with concrete results is also > encouraged. The contributions should not exceed 15 pages formatted according > to the style of the Electronic Proceedings in Theoretical Computer Science > (EPTCS), and should be submitted as Portable Document Format (PDF) files > using the EasyChair submission site: click here. 
> >> > >> Important Dates > >> =========== > >> > >> Abstract submission: May 27th, 2011 > >> > >> Paper submission: June 3rd, 2011 > >> > >> Notification: July 4th, 2011 > >> > >> Final version due: July 18th, 2011 > >> > >> Workshop: September 10th, 2011 > >> > >> Submitting an abstract does not put any obligation on the authors to > submit a full paper. Abstracts without an accompanying full paper by the > paper submission deadline are automatically considered withdrawn; the > authors are, however, encouraged to explicitly withdraw their abstract, if > they decide not to submit a full paper. > >> > >> All submissions will be reviewed by an international program committee > who will make a selection among the submissions based on the novelty, > soundness and applicability of the presented ideas and results. Concurrent > submission to other venues (conferences, workshops or journal) and > submission of papers under consideration elsewhere are not allowed. A > printed version of the proceedings will be distributed among participants > during the workshop. The proceedings of the workshop will be published as a > volume in the Electronic Proceedings in Theoretical Computer Science (EPTCS) > series. > >> > >> Participants will give a presentation of their papers in twenty > minutes, followed by a ten-minute round of questions and discussion on > participants' work. > >> > >> Following the tradition of the past edition, a special issue of an > international scientific journal will be devoted to FOCLASA 2011. Selected > participants will be invited to submit an extended version of their papers > after the workshop. These extended versions will be reviewed by an > international program committee, which will decide on their final > publication on the special issue. In the last few editions of FOCLASA, a > special issue of Science of Computer Programming has been dedicated to this > workshop and we plan to devote a special issue of the same journal to > FOCLASA 2011. 
> >>
> >> Program Committee Chairs
> >> ==================
> >>
> >> MohammadReza Mousavi
> >> Eindhoven University of Technology, The Netherlands
> >>
> >> António Ravara
> >> New University of Lisbon, Portugal
> >>
> >> Program Committee
> >> =============
> >>
> >> Jonathan Aldrich, Carnegie Mellon University, USA
> >> Luis Barbosa, University of Minho, Portugal
> >> Bernhard Beckert, Karlsruhe Institute of Technology, Germany
> >> Antonio Brogi, University of Pisa, Italy
> >> Carlos Canal, University of Málaga, Spain
> >> Vittorio Cortellessa, University of L'Aquila, Italy
> >> Gregor Goessler, INRIA Grenoble - Rhône-Alpes, France
> >> Ludovic Henrio, INRIA Sophia Antipolis, France
> >> Paola Inverardi, Università dell'Aquila, Italy
> >> MohammadReza Mousavi, Eindhoven University of Technology, The Netherlands
> >> Jaco van de Pol, University of Twente, The Netherlands
> >> António Ravara, Technical University of Lisbon, Portugal
> >> Gwen Salaün, Grenoble INP - INRIA - LIG, France
> >> Carolyn Talcott, SRI International, USA
> >> Emilio Tuosto, University of Leicester, UK
> >> Mirko Viroli, University of Bologna, Italy
> >> _______________________________________________
> >> Agda mailing list
> >> Agda at lists.chalmers.se
> >> https://lists.chalmers.se/mailman/listinfo/agda
> >> _______________________________________________
> >> Swift-devel mailing list
> >> Swift-devel at ci.uchicago.edu
> >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >
> > --
> > Daniel S.
Katz
> > University of Chicago
> > (773) 834-7186 (voice)
> > (773) 834-3700 (fax)
> > d.katz at ieee.org or dsk at ci.uchicago.edu
> > http://www.ci.uchicago.edu/~dsk/
> >
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...

From ketancmaheshwari at gmail.com Tue Apr 26 09:43:21 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Tue, 26 Apr 2011 09:43:21 -0500
Subject: [Swift-devel] Website documentation
In-Reply-To:
References: <1458023567.135441.1303568089035.JavaMail.root@zimbra.anl.gov> <6375E689-2C2B-4521-A1BB-B10F67B702D4@gmail.com>
Message-ID: <0E66005F-E34F-4092-8C60-DD5FFC063C2B@gmail.com>

Works for me, thanks.

--Ketan

On Apr 25, 2011, at 9:43 PM, David Kelly wrote:

> To get that working, I downloaded and installed GNU source-highlight. It was kind of a pain to get compiled because it didn't work with the default Boost libraries. Feel free to copy it from ~davidk/source-highlight. I set the code style to Java, and for the most part that seems to work reasonably well on SwiftScript code. At some point I should create an actual language definition file for Swift.
>
> David
>
> On Mon, Apr 25, 2011 at 10:24 PM, Ketan Maheshwari wrote:
> David,
>
> How did you get the java-like source highlighting working in Swift code? I was trying similar syntax but getting this error on asciidoc compile:
>
> filter non-zero exit code: source-highlight -f xhtml -s java: returned 127
>
> --
> Ketan
>
> On Apr 25, 2011, at 9:09 PM, David Kelly wrote:
>
>> Hello,
>>
>> I created a first draft of the Swift tutorial in asciidoc.
Here are the documents it created: >> >> HTML: http://www.ci.uchicago.edu/~davidk/tutorial.html >> PDF: http://www.ci.uchicago.edu/~davidk/tutorial.pdf >> ASCIIDOC: http://www.ci.uchicago.edu/~davidk/tutorial.txt >> >> There are still a few things I need to tweak, but overall I'm finding asciidoc pretty nice to work with. >> >> The ticket that was created for this, 375, mentions that these documents should be created during ant dist. But I think most people do not have asciidoc installed by default, and including it with Swift will not be possible due to GPL licensing issues. Asciidoc itself is GPL, as well as the various packages it depends on - source-highlight for code highlighting, and dblatex for generating PDFs. >> >> I think Justin's suggestion of keeping document generation limited to a CI machine is the way to go. If that makes sense to you all, I will start on a script to generate the documents, and continue working on manually converting the rest of the documentation material into asciidoc format. >> >> David >> >> On Sat, Apr 23, 2011 at 1:00 PM, Justin M Wozniak wrote: >> >> Yeah- I am trying to be more careful with umask. Please send me your error message and I will modify the chmods in the script accordingly. >> >> For the new structure, I recommend that we stop using svn to move the content and just push the simple documents. I would be fine with a cp-based method that only works from a CI system. >> >> >> On Sat, 23 Apr 2011, Michael Wilde wrote: >> >> Looking deeper into this, I see that the permissions in /ci /www/projects/swift do not permit anyone in vdl2-svn to do svn-update and to run update.sh completely for the entire doc set. >> >> >> Justin, I think this occurred when you last worked on this directory. Can you see if update.sh works for you at the moment? That should push David's committed tutorial corrections to the live content. 
>> >> >> I suggest that rather than correct this, we push ahead updating the entire per-release doc set to asciidoc and then adjust update.sh and the /ci/www/projects/swift directory to match it. David, can you do this under bug 375, and propose the new structure with a README that outlines how all the directories and content-push scripts will be organized? >> >> >> Thanks, >> >> >> Mike >> >> >> ----- Original Message ----- >> >> >> >> David, you're not in the vdl2-svn UNIX group, which you need in order to run this. I'll request that you be added, and do an svn-co of the tutorial and run update.sh for you in the meantime. >> >> >> - Mike >> >> ----- Original Message ----- >> >> >> Hello, >> >> >> What is the current process for updating website documentation? I believe the MaintainingSwiftWebContent wiki may be out of date. >> >> >> I am trying to submit some corrections to the Swift tutorial. Previously I could modify it through www/ in SVN. Now I believe it is generated from swift/docs/tutorial.xml for each release. Is this correct? Is there still a cron job that runs daily and updates the site? I am not seeing the changes I made yesterday reflected on the website. Do I need to manually run /ci/www/projects/swift/update.sh? I do not have permissions to read or execute it at the moment. >> >> >> Thanks, >> David >> >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> >> >> >> >> >> -- >> Justin M Wozniak >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -------------- next part -------------- An HTML attachment was scrubbed... 
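The exit status 127 in the asciidoc filter error quoted above is the value a POSIX shell reports when it cannot find the command, which fits the fix that worked here (installing source-highlight). A small pre-flight check along these lines might catch that before asciidoc runs. The `missing_tools` helper is an illustration only, and the tool list is just the three programs named in this thread.

```python
import shutil

# Programs named in this thread as needed for the asciidoc build.
DOC_TOOLS = ["asciidoc", "source-highlight", "dblatex"]


def missing_tools(tools=DOC_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]
```

Calling `missing_tools()` before the build, and stopping with a message if the list is non-empty, would give a clearer failure than the bare filter exit code.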

From dk0966 at cs.ship.edu Tue Apr 26 15:24:10 2011
From: dk0966 at cs.ship.edu (David Kelly)
Date: Tue, 26 Apr 2011 16:24:10 -0400
Subject: [Swift-devel] Weekly ticket report
Message-ID:

Hello,

Mike has asked me to send an email to swift-devel every Tuesday to discuss the top three bugs I have been working on for the week.

Bug 375 - Convert user guide and tutorial to asciidoc
-----------------------------------------------------
I have compiled asciidoc and all of the dependent programs it uses to convert to text, perform code highlighting, and convert to PDF. I have manually converted the tutorial to asciidoc format and posted the results for others to see. I still need to convert the user guide and write a script to automate the conversion process.

Bug 331 - Add Basic Regression Tests for the twice-each bug
-----------------------------------------------------------
I have added a series of tests to more thoroughly test foreach. They include testing foreach multiple times with arrays that have been generated in a variety of ways. Committed to trunk 4418.

Bug 309 - Neatify gensites
--------------------------
The goal of this is to make gensites easier to use for new users. I have discussed some ideas for how this can be accomplished. I have sent an email to the list hoping to get some additional ideas on how gensites can be improved.
-------------- next part --------------
An HTML attachment was scrubbed...

From aespinosa at cs.uchicago.edu Tue Apr 26 15:45:10 2011
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Tue, 26 Apr 2011 15:45:10 -0500
Subject: [Swift-devel] Re: transfer-only workload worked!! (was Re: resuming discussion on the hung processes...)
In-Reply-To: <1303850202.16676.0.camel@blabla2.none>
References: <1303843049.11809.0.camel@blabla2.none> <1236699138.144999.1303847691652.JavaMail.root@zimbra.anl.gov> <1303850202.16676.0.camel@blabla2.none>
Message-ID:

Hi Mihael,

This is on the latest stable branch.
Here's the dump: 2011-04-25 11:45:35 Full thread dump Java HotSpot(TM) 64-Bit Server VM (17.0-b16 mixed mode): "Attach Listener" daemon prio=10 tid=0x0000000044cd2800 nid=0x4c5f waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE Locked ownable synchronizers: - None "Condor provider queue poller" daemon prio=10 tid=0x00002aabb86eb800 nid=0x3c7a sleeping[0x0000000043c1f000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.globus.cog.abstraction.impl.scheduler.common.AbstractQueuePoller.run(AbstractQueuePoller.java:76) at java.lang.Thread.run(Thread.java:619) Locked ownable synchronizers: - None "Scheduler" prio=10 tid=0x00002aabb8763800 nid=0x34c0 in Object.wait() [0x0000000041678000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at org.globus.cog.karajan.scheduler.LateBindingScheduler.sleep(LateBindingScheduler.java:305) at org.globus.cog.karajan.scheduler.LateBindingScheduler.run(LateBindingScheduler.java:258) - locked <0x00002aaab7ca50a0> (a org.griphyn.vdl.karajan.VDSAdaptiveScheduler) Locked ownable synchronizers: - None "Progress ticker" daemon prio=10 tid=0x00002aabb86d5000 nid=0x2c3f waiting on condition [0x0000000041577000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.griphyn.vdl.karajan.lib.RuntimeStats$ProgressTicker.run(RuntimeStats.java:137) Locked ownable synchronizers: - None "Restart Log Sync" daemon prio=10 tid=0x0000000044f15800 nid=0x2c38 in Object.wait() [0x000000004290c000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aaab7c0c808> (a org.globus.cog.karajan.workflow.nodes.restartLog.SyncThread) at java.lang.Object.wait(Object.java:485) at org.globus.cog.karajan.workflow.nodes.restartLog.SyncThread.run(SyncThread.java:45) - locked <0x00002aaab7c0c808> (a 
org.globus.cog.karajan.workflow.nodes.restartLog.SyncThread) Locked ownable synchronizers: - None "Overloaded Host Monitor" daemon prio=10 tid=0x00002aabb857b800 nid=0x2c33 sleeping[0x000000004280b000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.globus.cog.karajan.scheduler.OverloadedHostMonitor.run(OverloadedHostMonitor.java:47) Locked ownable synchronizers: - None "Timer-0" daemon prio=10 tid=0x00000000451a0000 nid=0x2c32 in Object.wait() [0x000000004270a000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.util.TimerThread.mainLoop(Timer.java:509) - locked <0x00002aaab7d01f10> (a java.util.TaskQueue) at java.util.TimerThread.run(Timer.java:462) Locked ownable synchronizers: - None "pool-1-thread-4" prio=10 tid=0x00000000452d9800 nid=0x2c17 in Object.wait() [0x0000000042508000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aaab3b8df68> (a edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock) at java.lang.Object.wait(Object.java:485) at edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:315) - locked <0x00002aaab3b8df68> (a edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock) at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470) at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:667) at java.lang.Thread.run(Thread.java:619) Locked ownable synchronizers: - None "pool-1-thread-3" prio=10 tid=0x0000000044ffc800 nid=0x2c16 in Object.wait() [0x0000000042407000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aaab3b8df68> (a edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock) at 
java.lang.Object.wait(Object.java:485) at edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:315) - locked <0x00002aaab3b8df68> (a edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock) at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470) at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:667) at java.lang.Thread.run(Thread.java:619) Locked ownable synchronizers: - None "pool-1-thread-2" prio=10 tid=0x00002aabc024a800 nid=0x2c15 in Object.wait() [0x0000000042306000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aaab3b8df68> (a edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock) at java.lang.Object.wait(Object.java:485) at edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:315) - locked <0x00002aaab3b8df68> (a edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock) at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470) at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:667) at java.lang.Thread.run(Thread.java:619) Locked ownable synchronizers: - None "pool-1-thread-1" prio=10 tid=0x00002aabb85f4000 nid=0x2c14 in Object.wait() [0x0000000042205000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aaab3b8df68> (a edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock) at java.lang.Object.wait(Object.java:485) at edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:315) - locked <0x00002aaab3b8df68> (a 
edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock) at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470) at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:667) at java.lang.Thread.run(Thread.java:619) Locked ownable synchronizers: - None "Low Memory Detector" daemon prio=10 tid=0x0000000044c72000 nid=0x2c12 runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE Locked ownable synchronizers: - None "CompilerThread1" daemon prio=10 tid=0x0000000044c70000 nid=0x2c11 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE Locked ownable synchronizers: - None "CompilerThread0" daemon prio=10 tid=0x0000000044c6a800 nid=0x2c10 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE Locked ownable synchronizers: - None "Signal Dispatcher" daemon prio=10 tid=0x0000000044c68800 nid=0x2c0f runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE Locked ownable synchronizers: - None "Finalizer" daemon prio=10 tid=0x0000000044c44000 nid=0x2c0e in Object.wait() [0x0000000041acb000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aaab4ec8920> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118) - locked <0x00002aaab4ec8920> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) Locked ownable synchronizers: - None "Reference Handler" daemon prio=10 tid=0x0000000044c42000 nid=0x2c0d in Object.wait() [0x000000004039b000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aaab4ec88a8> (a java.lang.ref.Reference$Lock) at java.lang.Object.wait(Object.java:485) at 
java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116) - locked <0x00002aaab4ec88a8> (a java.lang.ref.Reference$Lock) Locked ownable synchronizers: - None "main" prio=10 tid=0x0000000044be0000 nid=0x2c07 in Object.wait() [0x0000000040977000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aaab76845f0> (a org.griphyn.vdl.karajan.VDL2ExecutionContext) at java.lang.Object.wait(Object.java:485) at org.globus.cog.karajan.workflow.ExecutionContext.waitFor(ExecutionContext.java:261) - locked <0x00002aaab76845f0> (a org.griphyn.vdl.karajan.VDL2ExecutionContext) at org.griphyn.vdl.karajan.Loader.main(Loader.java:197) Locked ownable synchronizers: - None "VM Thread" prio=10 tid=0x0000000044c3d800 nid=0x2c0c runnable "GC task thread#0 (ParallelGC)" prio=10 tid=0x0000000044bf3000 nid=0x2c08 runnable "GC task thread#1 (ParallelGC)" prio=10 tid=0x0000000044bf5000 nid=0x2c09 runnable "GC task thread#2 (ParallelGC)" prio=10 tid=0x0000000044bf7000 nid=0x2c0a runnable "GC task thread#3 (ParallelGC)" prio=10 tid=0x0000000044bf8800 nid=0x2c0b runnable "VM Periodic Task Thread" prio=10 tid=0x0000000044c7d000 nid=0x2c13 waiting on condition JNI global references: 1093 Here's the last few lines of the resumefile: ... ... 
3-199:peak.36!gsiftp://gridftp.ranger.tacc.teragrid.org//scratch/01035/tg802895/science/cybershake/Results/TEST/219/206/PeakVals_TEST_219_206_36.bsa
13-199:peak.33!gsiftp://gridftp.ranger.tacc.teragrid.org//scratch/01035/tg802895/science/cybershake/Results/TEST/219/206/PeakVals_TEST_219_206_33.bsa
13-199:peak.34!gsiftp://gridftp.ranger.tacc.teragrid.org//scratch/01035/tg802895/science/cybershake/Results/TEST/219/206/PeakVals_TEST_219_206_34.bsa
13-199:peak.39!gsiftp://gridftp.ranger.tacc.teragrid.org//scratch/01035/tg802895/science/cybershake/Results/TEST/219/206/PeakVals_TEST_219_206_39.bsa
13-199:peak.37

2011/4/26 Mihael Hategan :
> On Tue, 2011-04-26 at 15:31 -0500, Allan Espinosa wrote:
>
>> > - does it run repeatedly without any user-visible errors?
>>
>> There's this problem where Swift is waiting to finish writing to the
>> resume file. But that's another issue that I would like to defer for
>> now.
>
> Can you send me a stack dump of that situation?

From jonmon at utexas.edu Tue Apr 26 15:54:48 2011
From: jonmon at utexas.edu (Jonathan S Monette)
Date: Tue, 26 Apr 2011 15:54:48 -0500
Subject: [Swift-devel] Suggestions for improvements to gensites
Message-ID:

While I like the idea of asking the user a series of questions and creating the sites.xml file from the answers, it would break the way I use gensites in my run script (or at least it would not run the way I want). Currently I run gensites from my run script so I can dynamically change some of the parameters in the sites.xml file without opening the file before each run.

So maybe gensites can be kept as the back end to this prompt-driven front end, but as something a new user probably won't run directly.

On Apr 26, 2011 3:12 AM, "David Kelly" wrote:
-------------- next part --------------
An HTML attachment was scrubbed...

From hategan at mcs.anl.gov Tue Apr 26 15:58:22 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 26 Apr 2011 13:58:22 -0700
Subject: [Swift-devel] Re: transfer-only workload worked!!
(was Re: resuming discussion on the hung processes...)
In-Reply-To:
References: <1303843049.11809.0.camel@blabla2.none> <1236699138.144999.1303847691652.JavaMail.root@zimbra.anl.gov> <1303850202.16676.0.camel@blabla2.none>
Message-ID: <1303851502.17414.1.camel@blabla2.none>

I think the issue is different. The thread that writes to the restart
log is idle.

Can I take a look at the swift log?

On Tue, 2011-04-26 at 15:45 -0500, Allan Espinosa wrote:
> Hi Mihael,
>
> This is on the latest stable branch. Here's the dump:
>
> 2011-04-25 11:45:35
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (17.0-b16 mixed mode):
>
> [...]

From jonmon at utexas.edu Tue Apr 26 16:04:33 2011
From: jonmon at utexas.edu (Jonathan S Monette)
Date: Tue, 26 Apr 2011 16:04:33 -0500
Subject: [Swift-devel] Suggestions for improvements to gensites
In-Reply-To:
References:
Message-ID:

Yes. That option would be nice.

On Apr 26, 2011 3:58 PM, "Ketan Maheshwari" wrote:
>
> I think, a '-i' option for interactive building of sites.xml when required might be useful for both dynamic and interactive usage of gensites. --Ketan
>
>> While I like the idea where the user is asked a series of questions and creates the sites.xml file from that, the way I use gensites in my run script will break (or not run how I want it).
Currently I run gensites from my run script so I can dynamically change some of the parameters in the sites.xml file without opening the file before each run. So maybe gensites can be kept as the backend to this query but something a new user probably won't run by himself.
>>
>> On Apr 26, 2011 3:12 AM, "David Kelly" wrote:
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

From aespinosa at cs.uchicago.edu Tue Apr 26 16:05:34 2011
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Tue, 26 Apr 2011 16:05:34 -0500
Subject: [Swift-devel] hanging resumes (was Re: transfer-only workload worked)
Message-ID:

The log is in /home/aespinosa/workflows/cybershake/archive-runs/test-completed/postproc-20110422-2320-mcusga23.log

The log reports that the stageout for the remaining job finished.

2011/4/26 Mihael Hategan :
> I think the issue is different. The thread that writes to the restart
> log is idle.
>
> Can I take a look at the swift log?
>
> On Tue, 2011-04-26 at 15:45 -0500, Allan Espinosa wrote:
>> Hi Mihael,
>>
>> This is on the latest stable branch.  Here's the dump:
>>
>> 2011-04-25 11:45:35
>> Full thread dump Java HotSpot(TM) 64-Bit Server VM (17.0-b16 mixed mode):
>>
>> [...]

--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago

From ketancmaheshwari at gmail.com Tue Apr 26 15:58:31 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Tue, 26 Apr 2011 15:58:31 -0500
Subject: [Swift-devel] Suggestions for improvements to gensites
In-Reply-To:
References:
Message-ID:

I think a '-i' option for interactive building of sites.xml when required might be useful for both dynamic and interactive usage of gensites. --Ketan

> While I like the idea where the user is asked a series of questions and creates the sites.xml file from that, the way I use gensites in my run script will break (or not run how I want it). Currently I run gensites from my run script so I can dynamically change some of the parameters in the sites.xml file without opening the file before each run. So maybe gensites can be kept as the backend to this query but something a new user probably won't run by himself.
>
> On Apr 26, 2011 3:12 AM, "David Kelly" wrote:

From wilde at mcs.anl.gov Wed Apr 27 13:13:36 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 27 Apr 2011 13:13:36 -0500 (CDT)
Subject: [Swift-devel] Meeting today - 4PM?
Message-ID: <437066270.148929.1303928016093.JavaMail.root@zimbra.anl.gov>

I can't join today's meeting, but hope that you will hold one.

- Mike

From wozniak at mcs.anl.gov Wed Apr 27 15:11:28 2011
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Wed, 27 Apr 2011 15:11:28 -0500 (CDT)
Subject: [Swift-devel] Meeting today - 4PM?
In-Reply-To: <437066270.148929.1303928016093.JavaMail.root@zimbra.anl.gov>
References: <437066270.148929.1303928016093.JavaMail.root@zimbra.anl.gov>
Message-ID:

I'll be on at that time.

On Wed, 27 Apr 2011, Michael Wilde wrote:

> I can't join today's meeting, but hope that you will hold one.
>
> - Mike

--
Justin M Wozniak

From skenny at uchicago.edu Wed Apr 27 15:30:37 2011
From: skenny at uchicago.edu (Sarah Kenny)
Date: Wed, 27 Apr 2011 13:30:37 -0700
Subject: [Swift-devel] Meeting today - 4PM?
In-Reply-To:
References: <437066270.148929.1303928016093.JavaMail.root@zimbra.anl.gov>
Message-ID:

ditto

On Wed, Apr 27, 2011 at 1:11 PM, Justin M Wozniak wrote:

>
> I'll be on at that time.
>
>
> On Wed, 27 Apr 2011, Michael Wilde wrote:
>
> I can't join today's meeting, but hope that you will hold one.
>>
>> - Mike
>>
> --
> Justin M Wozniak

From ketancmaheshwari at gmail.com Wed Apr 27 15:38:30 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Wed, 27 Apr 2011 15:38:30 -0500
Subject: [Swift-devel] Meeting today - 4PM?
In-Reply-To:
References: <437066270.148929.1303928016093.JavaMail.root@zimbra.anl.gov>
Message-ID: <5B6EC1CB-6270-478C-81A7-8600E66D715A@gmail.com>

I would be there too. --Ketan

On Apr 27, 2011, at 3:30 PM, Sarah Kenny wrote:

> ditto
>
> On Wed, Apr 27, 2011 at 1:11 PM, Justin M Wozniak wrote:
>
> I'll be on at that time.
>
>
> On Wed, 27 Apr 2011, Michael Wilde wrote:
>
> I can't join today's meeting, but hope that you will hold one.
>
> - Mike
>
> --
> Justin M Wozniak

From ketancmaheshwari at gmail.com Thu Apr 28 10:14:30 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Thu, 28 Apr 2011 10:14:30 -0500
Subject: [Swift-devel] ssh:pbs to beagle
Message-ID:

Hello,

Some context:
I am trying to submit a big run on Beagle using swift + coasters. However, a previous run is already underway on beagle. So, there are two difficulties running a new run from its login node:

1. Running another swift from the same jvm will result in chaos on the logs (As far as I know, please correct me if this is not the case anymore)

2. The login node is already under load because of my previous big run, which is still running.

/context

So, I am now trying to submit this big run from a remote host (bridled). I know this has been done on PADS using ssh:pbs, provider coaster. I tried a similar approach on a trial swift script but am getting an error.

The following is the error message:

[ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)

RunID: 20110428-1002-c8rvqhe6
Progress:
The application "cat" is not available in your tc.data catalog
Caused by: org.globus.cog.karajan.scheduler.NoSuchResourceException
Final status: Failed:1
The following errors have occurred:
1. The application "cat" is not available in your tc.data catalog

Attached are my .swift, sites.xml and tc.data files.

Could someone indicate whether what I am doing is doable and, if so, how I can correctly configure my sites and tc setup?

Thanks.
Ketan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: catsn.swift
Type: application/octet-stream
Size: 227 bytes
Desc: not available
URL:
-------------- next part --------------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tc.data Type: application/octet-stream Size: 155 bytes Desc: not available URL: -------------- next part -------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: beagle-ssh-pbs-coaster.xml Type: application/xml Size: 906 bytes Desc: not available URL: -------------- next part -------------- From wilde at mcs.anl.gov Thu Apr 28 10:20:14 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 28 Apr 2011 10:20:14 -0500 (CDT) Subject: [Swift-devel] ssh:pbs to beagle In-Reply-To: Message-ID: <610227401.1139.1304004014857.JavaMail.root@zimbra.anl.gov> The pool name in your sites file is pads-remote-pbs-coasters-ssh but you used pbs in your tc.data. - Mike ----- Original Message ----- > Hello, > > Some context: > I am trying to submit a big run on Beagle using swift + coasters. > However, a previous run is already underway on beagle. So, there are > two difficulties running a new run from its login node: > > 1. Running another swift from the same jvm will result in chaos on the > logs (As far as I know, please correct me if this is not the case > anymore) > > 2. Login node is already under load because of my running previous big > run. > > /context > > So, I am now trying to submit this big run from a remote host > (bridled). I know this has been done on PADS using ssh:pbs, provider > coaster. I tried the similar approach on a trial swift script but > getting error. > > Following is the error message: > > [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file > beagle-coaster-ssh-pbs.xml catsn.swift -n=1 > Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified > locally) > > RunID: 20110428-1002-c8rvqhe6 > Progress: > The application "cat" is not available in your tc.data catalog > Caused by: org.globus.cog.karajan.scheduler.NoSuchResourceException > Final status: Failed:1 > The following errors have occurred: > 1. 
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wilde at mcs.anl.gov Thu Apr 28 10:27:23 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 28 Apr 2011 10:27:23 -0500 (CDT)
Subject: [Swift-devel] ssh:pbs to beagle
In-Reply-To: <610227401.1139.1304004014857.JavaMail.root@zimbra.anl.gov>
Message-ID: <40965548.1193.1304004443666.JavaMail.root@zimbra.anl.gov>

Ketan, please file and fix a bugzilla ticket to add to the end of the message:

> 1. The application "cat" is not available in your tc.data catalog

the text "...at any site/pool in your sites file."

Would that have made it clear to you what the problem was?

- Mike
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From ketancmaheshwari at gmail.com Thu Apr 28 10:30:16 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Thu, 28 Apr 2011 10:30:16 -0500
Subject: [Swift-devel] ssh:pbs to beagle
In-Reply-To: <610227401.1139.1304004014857.JavaMail.root@zimbra.anl.gov>
References: <610227401.1139.1304004014857.JavaMail.root@zimbra.anl.gov>
Message-ID: <9401E398-4B74-4816-8D2F-4392AA782055@gmail.com>

Thanks, I made the change. However, now, I am getting the following on my stderr

===========
[ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)

RunID: 20110428-1022-n9s0k0e0
Progress:
[ketan]
Progress: Initializing site shared directory:1
[ketan] Progress: Initializing site shared directory:1
Progress: Initializing site shared directory:1
Progress: Initializing site shared directory:1
Progress: Initializing site shared directory:1
Progress: Initializing site shared directory:1
Progress: Initializing site shared directory:1
Progress: Initializing site shared directory:1
Progress: Initializing site shared directory:1
Progress: Initializing site shared directory:1
Progress: Initializing site shared directory:1
Progress: Initializing site shared directory:1
Progress: Initializing site shared directory:1
Progress: Initializing site shared directory:1
========

And from the log it seems some network transmission has failed:

2011-04-28 10:22:45,261-0500 INFO TransportProtocolCommon Sending SSH_MSG_SERVICE_REQUEST
2011-04-28 10:22:45,264-0500 INFO TransportProtocolCommon Received SSH_MSG_SERVICE_ACCEPT
2011-04-28 10:24:27,626-0500 INFO TransportProtocolCommon The Transport Protocol thread failed
java.io.IOException: The socket is EOF
    at com.sshtools.j2ssh.transport.TransportProtocolInputStream.readBufferedData(TransportProtocolInputStream.java:183)
    at com.sshtools.j2ssh.transport.TransportProtocolInputStream.readMessage(TransportProtocolInputStream.java:226)
    at com.sshtools.j2ssh.transport.TransportProtocolCommon.processMessages(TransportProtocolCommon.java:1440)
    at com.sshtools.j2ssh.transport.TransportProtocolCommon.startBinaryPacketProtocol(TransportProtocolCommon.java:1034)
    at com.sshtools.j2ssh.transport.TransportProtocolCommon.run(TransportProtocolCommon.java:393)
    at java.lang.Thread.run(Thread.java:662)

Any clues?
Ketan

From wilde at mcs.anl.gov Thu Apr 28 11:19:15 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 28 Apr 2011 11:19:15 -0500 (CDT)
Subject: [Swift-devel] ssh:pbs to beagle
In-Reply-To: <9401E398-4B74-4816-8D2F-4392AA782055@gmail.com>
Message-ID: <360740419.1626.1304007555965.JavaMail.root@zimbra.anl.gov>

Have you already run a simple hello-world swift test from communicado to bridled to make sure you have ssh configured correctly? I would do that first.

I'm not sure whether an ssh problem explains the output you posted.

- Mike
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From ketancmaheshwari at gmail.com Thu Apr 28 11:47:52 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Thu, 28 Apr 2011 11:47:52 -0500
Subject: [Swift-devel] ssh:pbs to beagle
In-Reply-To: <360740419.1626.1304007555965.JavaMail.root@zimbra.anl.gov>
References: <360740419.1626.1304007555965.JavaMail.root@zimbra.anl.gov>
Message-ID:

It does look like an ssh problem. I am getting the same stderr and log messages when trying to communicate from Bridled to Communicado.
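
One way to narrow down an ssh problem like this is to test the connection outside Swift first. The following is a minimal sketch, assuming OpenSSH is installed on the submit host; the function name `check_ssh` is illustrative and not part of Swift. Swift's ssh provider uses its own Java SSH library (the `com.sshtools.j2ssh` frames in the log) together with auth.defaults, so a passing check here only rules out basic network and key problems; it does not prove the Swift side is configured correctly.

```shell
#!/bin/sh
# check_ssh HOST [PORT]
# Try a non-interactive ssh login and run a no-op command.
# BatchMode=yes makes ssh fail instead of prompting for a password,
# and ConnectTimeout bounds how long a dead host can stall the check.
check_ssh() {
    host=$1
    port=${2:-22}
    ssh -p "$port" -o BatchMode=yes -o ConnectTimeout=10 "$host" true
}

# Example (host name is a placeholder):
#   check_ssh login.example.org && echo "non-interactive ssh works"
```

A non-zero exit status means the login could not complete without interaction, which is the situation a batch submission from Swift would also run into.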
Ketan

From wilde at mcs.anl.gov Thu Apr 28 12:00:24 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 28 Apr 2011 12:00:24 -0500 (CDT)
Subject: [Swift-devel] ssh:pbs to beagle
In-Reply-To:
Message-ID: <2099970718.1905.1304010024399.JavaMail.root@zimbra.anl.gov>

OK. Was there a cookbook on the ssh settings? Did you set up a $HOME/.ssh/auth.defaults per the user guide?

Here is an auth.defaults example. I'm not sure it's 100% correct, but it could serve as a base for you:

xlogin1.pads.ci.uchicago.edu.type=password
xlogin1.pads.ci.uchicago.edu.username=wilde

login.pads.ci.uchicago.edu.type=key
login.pads.ci.uchicago.edu.username=wilde
login.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa
# MAKE SURE mode=600!!!
login.pads.ci.uchicago.edu.passphrase=yourpassphrasehere

login1.pads.ci.uchicago.edu.type=key
login1.pads.ci.uchicago.edu.username=wilde
login1.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa
# MAKE SURE mode=600!!!
login1.pads.ci.uchicago.edu.passphrase=yourpassphrasehere

login.mcs.anl.gov.type=key
login.mcs.anl.gov.username=wilde
login.mcs.anl.gov.key=/home/wilde/.ssh/swift_rsa
# MAKE SURE mode=600!!!
login.mcs.anl.gov.passphrase=yourpassphrasehere

- Mike
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From hategan at mcs.anl.gov Thu Apr 28 12:20:57 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 28 Apr 2011 10:20:57 -0700
Subject: [Swift-devel] ssh:pbs to beagle
In-Reply-To:
References:
Message-ID: <1304011257.8599.1.camel@blabla2.none>

On Thu, 2011-04-28 at 10:14 -0500, Ketan Maheshwari wrote:
> Hello,
>
> Some context:
> I am trying to submit a big run on Beagle using swift + coasters.
> However, a previous run is already underway on beagle. So, there are
> two difficulties running a new run from its login node:
>
> 1. Running another swift from the same jvm will result in chaos on the
> logs (As far as I know, please correct me if this is not the case
> anymore)

Maybe, but I must ask, how would you get another swift instance in the same JVM as the first one?

>
> 2. Login node is already under load because of my running previous big run.

Shouldn't be under that much load, but that is something you can actually measure.
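
A minimal sketch of one way to take that measurement on a Linux login node. It assumes `/proc/loadavg` is available; the helper names `load1` and `ncpu` are illustrative and do not come from this thread. A 1-minute load average well below the CPU count suggests the node has headroom.

```shell
#!/bin/sh
# Print the 1-minute load average (first field of /proc/loadavg on Linux).
load1() {
    cut -d ' ' -f 1 /proc/loadavg
}

# Print the number of online CPUs, to put the load figure in proportion.
ncpu() {
    getconf _NPROCESSORS_ONLN
}

echo "1-min load: $(load1) on $(ncpu) CPUs"

# List Java processes (the Swift client runs in a JVM) with CPU %, memory %
# and elapsed time; the [j] trick keeps grep from matching itself.
ps -e -o pid,user,pcpu,pmem,etime,args | grep '[j]ava' || echo "no java processes found"
```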
Mihael

From hategan at mcs.anl.gov Thu Apr 28 12:22:42 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 28 Apr 2011 10:22:42 -0700
Subject: [Swift-devel] ssh:pbs to beagle
In-Reply-To: <2099970718.1905.1304010024399.JavaMail.root@zimbra.anl.gov>
References: <2099970718.1905.1304010024399.JavaMail.root@zimbra.anl.gov>
Message-ID: <1304011362.8599.2.camel@blabla2.none>

You could omit the passphrase and you'd be asked for it. I think.
> > >>>> > > >>>> Following is the error message: > > >>>> > > >>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc > > >>>> -sites.file > > >>>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1 > > >>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog > > >>>> modified > > >>>> locally) > > >>>> > > >>>> RunID: 20110428-1002-c8rvqhe6 > > >>>> Progress: > > >>>> The application "cat" is not available in your tc.data catalog > > >>>> Caused by: > > >>>> org.globus.cog.karajan.scheduler.NoSuchResourceException > > >>>> Final status: Failed:1 > > >>>> The following errors have occurred: > > >>>> 1. The application "cat" is not available in your tc.data catalog > > >>>> > > >>>> > > >>>> Attached are my .swift, sites.xml and tc.data files. > > >>>> > > >>>> Could someone indicate if what I am doing is doable and if so how > > >>>> can > > >>>> I correctly configure my sites and tc setup. > > >>>> > > >>>> Thanks. > > >>>> Ketan > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> _______________________________________________ > > >>>> Swift-devel mailing list > > >>>> Swift-devel at ci.uchicago.edu > > >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > >>> > > >>> -- > > >>> Michael Wilde > > >>> Computation Institute, University of Chicago > > >>> Mathematics and Computer Science Division > > >>> Argonne National Laboratory > > >>> > > > > > > -- > > > Michael Wilde > > > Computation Institute, University of Chicago > > > Mathematics and Computer Science Division > > > Argonne National Laboratory > > > > From ketancmaheshwari at gmail.com Thu Apr 28 13:01:18 2011 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Thu, 28 Apr 2011 13:01:18 -0500 Subject: [Swift-devel] ssh:pbs to beagle In-Reply-To: <2099970718.1905.1304010024399.JavaMail.root@zimbra.anl.gov> References: <2099970718.1905.1304010024399.JavaMail.root@zimbra.anl.gov> Message-ID: Hi, It looks better now. 
However, I am getting the following:

=====
[ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)

RunID: 20110428-1251-oi9theh8
Progress:
Progress: Stage in:1
Could not submit job
Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job
Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not start coaster service
Caused by: org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file (/tmp/x509up_u2006) not found.
Caused by: org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file (/tmp/x509up_u2006) not found.
Failed to transfer wrapper log from catsn-20110428-1251-oi9theh8/info/e on beagle-remote-pbs-coasters-ssh
=====

How do I specify "-nosec" on automatic coasters?

Ketan

On Apr 28, 2011, at 12:00 PM, Michael Wilde wrote:

> OK. Was there a cookbook on the ssh settings? Did you set up a $HOME/.ssh/auth.defaults per the user guide?
>
> Here is an auth.defaults example. Im not sure its 100% correct, but could serve as a base for you:
>
> xlogin1.pads.ci.uchicago.edu.type=password
> xlogin1.pads.ci.uchicago.edu.username=wilde
>
> login.pads.ci.uchicago.edu.type=key
> login.pads.ci.uchicago.edu.username=wilde
> login.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa
> login.pads.ci.uchicago.edu.passphrase=yourpassphrasehere # MAKE SURE mode=600!!!
>
> login1.pads.ci.uchicago.edu.type=key
> login1.pads.ci.uchicago.edu.username=wilde
> login1.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa
> login1.pads.ci.uchicago.edu.passphrase=yourpassphrasehere # MAKE SURE mode=600!!!
>
> login.mcs.anl.gov.type=key
> login.mcs.anl.gov.username=wilde
> login.mcs.anl.gov.key=/home/wilde/.ssh/swift_rsa
> login.mcs.anl.gov.passphrase=yourpassphrasehere # MAKE SURE mode=600!!!
>
> - Mike

From wilde at mcs.anl.gov Thu Apr 28 13:03:46 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 28 Apr 2011 13:03:46 -0500 (CDT)
Subject: [Swift-devel] ssh:pbs to beagle
In-Reply-To:
Message-ID: <1403032176.2307.1304013826306.JavaMail.root@zimbra.anl.gov>

For now - create a proxy using grid-proxy-init on the swift execution machine.
I think there is an option to set "no security" for this config but I can't recall where that is specified. Maybe swift.properties, I can't recall.

- Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From ketancmaheshwari at gmail.com Thu Apr 28 13:36:15 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Thu, 28 Apr 2011 13:36:15 -0500
Subject: [Swift-devel] ssh:pbs to beagle
In-Reply-To: <1403032176.2307.1304013826306.JavaMail.root@zimbra.anl.gov>
References: <1403032176.2307.1304013826306.JavaMail.root@zimbra.anl.gov>
Message-ID: <1952F257-772E-4C97-8AE8-0A73F8E9C7E8@gmail.com>

Ok, I got past CredentialException with grid-proxy-init, now I am facing this (note: I have turned on provider staging):

========
[ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)

RunID: 20110428-1332-llaa031f
Progress:
Could not start connection handler
java.io.EOFException
    at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61)
    at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65)
    at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127)
    at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147)
    at org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177)
    at org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30)
    at org.globus.cog.karajan.workflow.service.channels.GSSChannel.<init>(GSSChannel.java:47)
    at org.globus.cog.karajan.workflow.service.ConnectionHandler.<init>(ConnectionHandler.java:41)
    at org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63)
    at org.globus.net.BaseServer.run(BaseServer.java:247)
    at java.lang.Thread.run(Thread.java:662)
Progress: Submitted:1
Could not start connection handler
java.io.EOFException
    at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61)
    at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65)
    at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127)
    at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147)
    at org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177)
    at org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30)
    at org.globus.cog.karajan.workflow.service.channels.GSSChannel.<init>(GSSChannel.java:47)
    at org.globus.cog.karajan.workflow.service.ConnectionHandler.<init>(ConnectionHandler.java:41)
    at org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63)
    at org.globus.net.BaseServer.run(BaseServer.java:247)
    at java.lang.Thread.run(Thread.java:662)
Progress: Submitted:1
Exception in cat:
Arguments: [data.txt]
Host: beagle-remote-pbs-coasters-ssh
Directory: catsn-20110428-1332-llaa031f/jobs/b/cat-bxal1d9kTODO: outs
----

Caused by: Could not submit job
Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job
Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not start coaster service
Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Task ended before registration was received.
STDOUT:
STDERR:
Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 1
Final status: Failed:1
The following errors have occurred:
1. Job failed with an exit code of 1
========

From bridled to communicado, I see the following error:

**************
[ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file coaster-local-ssh-communicado.xml catsn.swift -n=1
Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)

RunID: 20110428-1335-k685b2ye
Progress:
Progress: Submitted:1
Progress: Active:1
Exception in cat:
Arguments: [data.txt]
Host: communicado-ssh
Directory: catsn-20110428-1335-k685b2ye/jobs/c/cat-coip1d9kTODO: outs
----

Caused by: Job failed with an exit code of 524
Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 524
Final status: Failed:1
The following errors have occurred:
1. Job failed with an exit code of 524
************

--
Ketan
>> Caused by: org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy >> file (/tmp/x509up_u2006) not found. >> Failed to transfer wrapper log from >> catsn-20110428-1251-oi9theh8/info/e on beagle-remote-pbs-coasters-ssh >> >> ===== >> >> How do I specify "-nosec" on automatic coasters? >> >> Ketan >> >> On Apr 28, 2011, at 12:00 PM, Michael Wilde wrote: >> >>> OK. Was there a cookbook on the ssh settings? Did you set up a >>> $HOME/.ssh/auth.defaults per the user guide? >>> >>> Here is an auth.defaults example. Im not sure its 100% correct, but >>> could serve as a base for you: >>> >>> xlogin1.pads.ci.uchicago.edu.type=password >>> xlogin1.pads.ci.uchicago.edu.username=wilde >>> >>> login.pads.ci.uchicago.edu.type=key >>> login.pads.ci.uchicago.edu.username=wilde >>> login.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa >>> login.pads.ci.uchicago.edu.passphrase=yourpassphrasehere # MAKE SURE >>> mode=600!!! >>> >>> login1.pads.ci.uchicago.edu.type=key >>> login1.pads.ci.uchicago.edu.username=wilde >>> login1.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa >>> login1.pads.ci.uchicago.edu.passphrase=yourpassphrasehere # MAKE >>> SURE mode=600!!! >>> >>> login.mcs.anl.gov.type=key >>> login.mcs.anl.gov.username=wilde >>> login.mcs.anl.gov.key=/home/wilde/.ssh/swift_rsa >>> login.mcs.anl.gov.passphrase=yourpassphrasehere # MAKE SURE >>> mode=600!!! >>> >>> - Mike >>> >>> ----- Original Message ----- >>>> It does look like an ssh problem. I am getting the same stderr and >>>> log >>>> messages on trying to communicate from Bridled to Communicado. >>>> >>>> Ketan >>>> >>>> On Apr 28, 2011, at 11:19 AM, Michael Wilde wrote: >>>> >>>>> Have you already run a simple hellow-world swift test from >>>>> communicado to bridled to make sure you have ssh configured >>>>> correctly? I would do that first. >>>>> >>>>> Im not sure if an ssh problem explains what you show below, or >>>>> not. 
>>>>> >>>>> - Mike >>>>> >>>>> ----- Original Message ----- >>>>>> Thanks, I made the change. However, now, I am getting the >>>>>> following >>>>>> on >>>>>> my stderr >>>>>> >>>>>> >>>>>> =========== >>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc >>>>>> -sites.file >>>>>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1 >>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog >>>>>> modified >>>>>> locally) >>>>>> >>>>>> RunID: 20110428-1022-n9s0k0e0 >>>>>> Progress: >>>>>> [ketan] >>>>>> Progress: Initializing site shared directory:1 >>>>>> [ketan] Progress: Initializing site shared directory:1 >>>>>> Progress: Initializing site shared directory:1 >>>>>> Progress: Initializing site shared directory:1 >>>>>> Progress: Initializing site shared directory:1 >>>>>> Progress: Initializing site shared directory:1 >>>>>> Progress: Initializing site shared directory:1 >>>>>> Progress: Initializing site shared directory:1 >>>>>> Progress: Initializing site shared directory:1 >>>>>> Progress: Initializing site shared directory:1 >>>>>> Progress: Initializing site shared directory:1 >>>>>> Progress: Initializing site shared directory:1 >>>>>> Progress: Initializing site shared directory:1 >>>>>> Progress: Initializing site shared directory:1 >>>>>> ======== >>>>>> >>>>>> And from the log it seems some network transmission has failed: >>>>>> >>>>>> 2011-04-28 10:22:45,261-0500 INFO TransportProtocolCommon Sending >>>>>> SSH_MSG_SERVICE_REQUEST >>>>>> 2011-04-28 10:22:45,264-0500 INFO TransportProtocolCommon >>>>>> Received >>>>>> SSH_MSG_SERVICE_ACCEPT >>>>>> 2011-04-28 10:24:27,626-0500 INFO TransportProtocolCommon The >>>>>> Transport Protocol thread failed >>>>>> java.io.IOException: The socket is EOF >>>>>> at >>>>>> com.sshtools.j2ssh.transport.TransportProtocolInputStream.readBufferedData(TransportProtocolInputStream.java:183) >>>>>> at >>>>>> 
com.sshtools.j2ssh.transport.TransportProtocolInputStream.readMessage(TransportProtocolInputStream.java:226) >>>>>> at >>>>>> com.sshtools.j2ssh.transport.TransportProtocolCommon.processMessages(TransportProtocolCommon.java:1440) >>>>>> at >>>>>> com.sshtools.j2ssh.transport.TransportProtocolCommon.startBinaryPacketProtocol(TransportProtocolCommon.java:1034) >>>>>> at >>>>>> com.sshtools.j2ssh.transport.TransportProtocolCommon.run(TransportProtocolCommon.java:393) >>>>>> at java.lang.Thread.run(Thread.java:662) >>>>>> >>>>>> >>>>>> Any clues? >>>>>> Ketan >>>>>> >>>>>> >>>>>> On Apr 28, 2011, at 10:20 AM, Michael Wilde wrote: >>>>>> >>>>>>> The pool name in your sites file is pads-remote-pbs-coasters-ssh >>>>>>> but >>>>>>> you used pbs in your tc.data. >>>>>>> >>>>>>> - Mike >>>>>>> >>>>>>> ----- Original Message ----- >>>>>>>> Hello, >>>>>>>> >>>>>>>> Some context: >>>>>>>> I am trying to submit a big run on Beagle using swift + >>>>>>>> coasters. >>>>>>>> However, a previous run is already underway on beagle. So, >>>>>>>> there >>>>>>>> are >>>>>>>> two difficulties running a new run from its login node: >>>>>>>> >>>>>>>> 1. Running another swift from the same jvm will result in chaos >>>>>>>> on >>>>>>>> the >>>>>>>> logs (As far as I know, please correct me if this is not the >>>>>>>> case >>>>>>>> anymore) >>>>>>>> >>>>>>>> 2. Login node is already under load because of my running >>>>>>>> previous >>>>>>>> big >>>>>>>> run. >>>>>>>> >>>>>>>> /context >>>>>>>> >>>>>>>> So, I am now trying to submit this big run from a remote host >>>>>>>> (bridled). I know this has been done on PADS using ssh:pbs, >>>>>>>> provider >>>>>>>> coaster. I tried the similar approach on a trial swift script >>>>>>>> but >>>>>>>> getting error. 
>>>>>>>> >>>>>>>> Following is the error message: >>>>>>>> >>>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc >>>>>>>> -sites.file >>>>>>>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1 >>>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog >>>>>>>> modified >>>>>>>> locally) >>>>>>>> >>>>>>>> RunID: 20110428-1002-c8rvqhe6 >>>>>>>> Progress: >>>>>>>> The application "cat" is not available in your tc.data catalog >>>>>>>> Caused by: >>>>>>>> org.globus.cog.karajan.scheduler.NoSuchResourceException >>>>>>>> Final status: Failed:1 >>>>>>>> The following errors have occurred: >>>>>>>> 1. The application "cat" is not available in your tc.data >>>>>>>> catalog >>>>>>>> >>>>>>>> >>>>>>>> Attached are my .swift, sites.xml and tc.data files. >>>>>>>> >>>>>>>> Could someone indicate if what I am doing is doable and if so >>>>>>>> how >>>>>>>> can >>>>>>>> I correctly configure my sites and tc setup. >>>>>>>> >>>>>>>> Thanks. >>>>>>>> Ketan >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Swift-devel mailing list >>>>>>>> Swift-devel at ci.uchicago.edu >>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>>> >>>>>>> -- >>>>>>> Michael Wilde >>>>>>> Computation Institute, University of Chicago >>>>>>> Mathematics and Computer Science Division >>>>>>> Argonne National Laboratory >>>>>>> >>>>> >>>>> -- >>>>> Michael Wilde >>>>> Computation Institute, University of Chicago >>>>> Mathematics and Computer Science Division >>>>> Argonne National Laboratory >>>>> >>> >>> -- >>> Michael Wilde >>> Computation Institute, University of Chicago >>> Mathematics and Computer Science Division >>> Argonne National Laboratory >>> > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > From wilde at mcs.anl.gov Thu Apr 28 13:46:41 2011 From: wilde at 
mcs.anl.gov (Michael Wilde) Date: Thu, 28 Apr 2011 13:46:41 -0500 (CDT) Subject: [Swift-devel] ssh:pbs to beagle In-Reply-To: <1952F257-772E-4C97-8AE8-0A73F8E9C7E8@gmail.com> Message-ID: <1631113895.3244.1304016401370.JavaMail.root@zimbra.anl.gov> Now I think you need to create the same proxy on the Beagle side. For starters, just try copying your proxy file from /tmp on communicado to /tmp on the Beagle login node on which you are running Swift. Later you can do this by creating a proxy on the Beagle size using grid-proxy-init, but you'll need to install CA certs there. Also, have you considered running a passive coaster server on the communicado side, and just having Beagle worker.pl scripts connect back to it? - Mike ----- Original Message ----- > Ok, I got past CredentialException with grid-proxy-init, now I am > facing this (note: I have turned on provider staging) : > > ======== > [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file > beagle-coaster-ssh-pbs.xml catsn.swift -n=1 > Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified > locally) > > RunID: 20110428-1332-llaa031f > Progress: > Could not start connection handler > java.io.EOFException > at > org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61) > at > org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65) > at > org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127) > at > org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147) > at > org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177) > at > org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30) > at > org.globus.cog.karajan.workflow.service.channels.GSSChannel.(GSSChannel.java:47) > at > org.globus.cog.karajan.workflow.service.ConnectionHandler.(ConnectionHandler.java:41) > at > 
org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63) > at org.globus.net.BaseServer.run(BaseServer.java:247) > at java.lang.Thread.run(Thread.java:662) > Progress: Submitted:1 > Could not start connection handler > java.io.EOFException > at > org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61) > at > org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65) > at > org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127) > at > org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147) > at > org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177) > at > org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30) > at > org.globus.cog.karajan.workflow.service.channels.GSSChannel.(GSSChannel.java:47) > at > org.globus.cog.karajan.workflow.service.ConnectionHandler.(ConnectionHandler.java:41) > at > org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63) > at org.globus.net.BaseServer.run(BaseServer.java:247) > at java.lang.Thread.run(Thread.java:662) > Progress: Submitted:1 > Exception in cat: > Arguments: [data.txt] > Host: beagle-remote-pbs-coasters-ssh > Directory: catsn-20110428-1332-llaa031f/jobs/b/cat-bxal1d9kTODO: outs > ---- > > Caused by: Could not submit job > Caused by: > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > Could not submit job > Caused by: > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > Could not start coaster service > Caused by: > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > Task ended before registration was received. > STDOUT: > STDERR: > Caused by: > org.globus.cog.abstraction.impl.common.execution.JobException: Job > failed with an exit code of 1 > Final status: Failed:1 > The following errors have occurred: > 1. 
Job failed with an exit code of 1 > > ======== > > > From bridled to communicado, I see the following error: > > ************** > [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file > coaster-local-ssh-communicado.xml catsn.swift -n=1 > Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified > locally) > > RunID: 20110428-1335-k685b2ye > Progress: > Progress: Submitted:1 > Progress: Active:1 > Exception in cat: > Arguments: [data.txt] > Host: communicado-ssh > Directory: catsn-20110428-1335-k685b2ye/jobs/c/cat-coip1d9kTODO: outs > ---- > > Caused by: Job failed with an exit code of 524 > Caused by: > org.globus.cog.abstraction.impl.common.execution.JobException: Job > failed with an exit code of 524 > Final status: Failed:1 > The following errors have occurred: > 1. Job failed with an exit code of 524 > > ************ > > > -- > Ketan > > > > > On Apr 28, 2011, at 1:03 PM, Michael Wilde wrote: > > > For now - create a proxy using grid-proxy-init on the swift > > execution machine. > > I think there is an option to set "no security" for this config but > > I can't recall where that is specified. Maybe swift.properties, I > > can't recall. > > > > - Mike > > > > ----- Original Message ----- > >> Hi, > >> > >> It looks better now.
However, I am getting the following: > >> > >> ===== > >> > >> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc > >> -sites.file > >> beagle-coaster-ssh-pbs.xml catsn.swift -n=1 > >> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog > >> modified > >> locally) > >> > >> RunID: 20110428-1251-oi9theh8 > >> Progress: > >> Progress: Stage in:1 > >> Could not submit job > >> Caused by: > >> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > >> Could not submit job > >> Caused by: > >> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > >> Could not start coaster service > >> Caused by: > >> org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: > >> org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file > >> (/tmp/x509up_u2006) not found. > >> Caused by: org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] > >> Proxy > >> file (/tmp/x509up_u2006) not found. > >> Failed to transfer wrapper log from > >> catsn-20110428-1251-oi9theh8/info/e on > >> beagle-remote-pbs-coasters-ssh > >> > >> ===== > >> > >> How do I specify "-nosec" on automatic coasters? > >> > >> Ketan > >> > >> On Apr 28, 2011, at 12:00 PM, Michael Wilde wrote: > >> > >>> OK. Was there a cookbook on the ssh settings? Did you set up a > >>> $HOME/.ssh/auth.defaults per the user guide? > >>> > >>> Here is an auth.defaults example. I'm not sure it's 100% correct, > >>> but > >>> it could serve as a base for you: > >>> > >>> xlogin1.pads.ci.uchicago.edu.type=password > >>> xlogin1.pads.ci.uchicago.edu.username=wilde > >>> > >>> login.pads.ci.uchicago.edu.type=key > >>> login.pads.ci.uchicago.edu.username=wilde > >>> login.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa > >>> login.pads.ci.uchicago.edu.passphrase=yourpassphrasehere # MAKE > >>> SURE > >>> mode=600!!!
> >>> > >>> login1.pads.ci.uchicago.edu.type=key > >>> login1.pads.ci.uchicago.edu.username=wilde > >>> login1.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa > >>> login1.pads.ci.uchicago.edu.passphrase=yourpassphrasehere # MAKE > >>> SURE mode=600!!! > >>> > >>> login.mcs.anl.gov.type=key > >>> login.mcs.anl.gov.username=wilde > >>> login.mcs.anl.gov.key=/home/wilde/.ssh/swift_rsa > >>> login.mcs.anl.gov.passphrase=yourpassphrasehere # MAKE SURE > >>> mode=600!!! > >>> > >>> - Mike > >>> > >>> ----- Original Message ----- > >>>> It does look like an ssh problem. I am getting the same stderr > >>>> and > >>>> log > >>>> messages on trying to communicate from Bridled to Communicado. > >>>> > >>>> Ketan > >>>> > >>>> On Apr 28, 2011, at 11:19 AM, Michael Wilde wrote: > >>>> > >>>>> Have you already run a simple hello-world swift test from > >>>>> communicado to bridled to make sure you have ssh configured > >>>>> correctly? I would do that first. > >>>>> > >>>>> I'm not sure if an ssh problem explains what you show below, or > >>>>> not. > >>>>> > >>>>> - Mike > >>>>> > >>>>> ----- Original Message ----- > >>>>>> Thanks, I made the change.
However, now, I am getting the > >>>>>> following > >>>>>> on > >>>>>> my stderr > >>>>>> > >>>>>> > >>>>>> =========== > >>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc > >>>>>> -sites.file > >>>>>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1 > >>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog > >>>>>> modified > >>>>>> locally) > >>>>>> > >>>>>> RunID: 20110428-1022-n9s0k0e0 > >>>>>> Progress: > >>>>>> [ketan] > >>>>>> Progress: Initializing site shared directory:1 > >>>>>> [ketan] Progress: Initializing site shared directory:1 > >>>>>> Progress: Initializing site shared directory:1 > >>>>>> Progress: Initializing site shared directory:1 > >>>>>> Progress: Initializing site shared directory:1 > >>>>>> Progress: Initializing site shared directory:1 > >>>>>> Progress: Initializing site shared directory:1 > >>>>>> Progress: Initializing site shared directory:1 > >>>>>> Progress: Initializing site shared directory:1 > >>>>>> Progress: Initializing site shared directory:1 > >>>>>> Progress: Initializing site shared directory:1 > >>>>>> Progress: Initializing site shared directory:1 > >>>>>> Progress: Initializing site shared directory:1 > >>>>>> Progress: Initializing site shared directory:1 > >>>>>> ======== > >>>>>> > >>>>>> And from the log it seems some network transmission has failed: > >>>>>> > >>>>>> 2011-04-28 10:22:45,261-0500 INFO TransportProtocolCommon > >>>>>> Sending > >>>>>> SSH_MSG_SERVICE_REQUEST > >>>>>> 2011-04-28 10:22:45,264-0500 INFO TransportProtocolCommon > >>>>>> Received > >>>>>> SSH_MSG_SERVICE_ACCEPT > >>>>>> 2011-04-28 10:24:27,626-0500 INFO TransportProtocolCommon The > >>>>>> Transport Protocol thread failed > >>>>>> java.io.IOException: The socket is EOF > >>>>>> at > >>>>>> com.sshtools.j2ssh.transport.TransportProtocolInputStream.readBufferedData(TransportProtocolInputStream.java:183) > >>>>>> at > >>>>>> 
com.sshtools.j2ssh.transport.TransportProtocolInputStream.readMessage(TransportProtocolInputStream.java:226) > >>>>>> at > >>>>>> com.sshtools.j2ssh.transport.TransportProtocolCommon.processMessages(TransportProtocolCommon.java:1440) > >>>>>> at > >>>>>> com.sshtools.j2ssh.transport.TransportProtocolCommon.startBinaryPacketProtocol(TransportProtocolCommon.java:1034) > >>>>>> at > >>>>>> com.sshtools.j2ssh.transport.TransportProtocolCommon.run(TransportProtocolCommon.java:393) > >>>>>> at java.lang.Thread.run(Thread.java:662) > >>>>>> > >>>>>> > >>>>>> Any clues? > >>>>>> Ketan > >>>>>> > >>>>>> > >>>>>> On Apr 28, 2011, at 10:20 AM, Michael Wilde wrote: > >>>>>> > >>>>>>> The pool name in your sites file is > >>>>>>> pads-remote-pbs-coasters-ssh > >>>>>>> but > >>>>>>> you used pbs in your tc.data. > >>>>>>> > >>>>>>> - Mike > >>>>>>> > >>>>>>> ----- Original Message ----- > >>>>>>>> Hello, > >>>>>>>> > >>>>>>>> Some context: > >>>>>>>> I am trying to submit a big run on Beagle using swift + > >>>>>>>> coasters. > >>>>>>>> However, a previous run is already underway on beagle. So, > >>>>>>>> there > >>>>>>>> are > >>>>>>>> two difficulties running a new run from its login node: > >>>>>>>> > >>>>>>>> 1. Running another swift from the same jvm will result in > >>>>>>>> chaos > >>>>>>>> in > >>>>>>>> the > >>>>>>>> logs (As far as I know, please correct me if this is not the > >>>>>>>> case > >>>>>>>> anymore) > >>>>>>>> > >>>>>>>> 2. Login node is already under load because of my running > >>>>>>>> previous > >>>>>>>> big > >>>>>>>> run. > >>>>>>>> > >>>>>>>> /context > >>>>>>>> > >>>>>>>> So, I am now trying to submit this big run from a remote host > >>>>>>>> (bridled). I know this has been done on PADS using ssh:pbs, > >>>>>>>> provider > >>>>>>>> coaster. I tried a similar approach on a trial swift script > >>>>>>>> but > >>>>>>>> am getting an error.
> >>>>>>>> > >>>>>>>> Following is the error message: > >>>>>>>> > >>>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc > >>>>>>>> -sites.file > >>>>>>>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1 > >>>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog > >>>>>>>> modified > >>>>>>>> locally) > >>>>>>>> > >>>>>>>> RunID: 20110428-1002-c8rvqhe6 > >>>>>>>> Progress: > >>>>>>>> The application "cat" is not available in your tc.data > >>>>>>>> catalog > >>>>>>>> Caused by: > >>>>>>>> org.globus.cog.karajan.scheduler.NoSuchResourceException > >>>>>>>> Final status: Failed:1 > >>>>>>>> The following errors have occurred: > >>>>>>>> 1. The application "cat" is not available in your tc.data > >>>>>>>> catalog > >>>>>>>> > >>>>>>>> > >>>>>>>> Attached are my .swift, sites.xml and tc.data files. > >>>>>>>> > >>>>>>>> Could someone indicate if what I am doing is doable and if so > >>>>>>>> how > >>>>>>>> can > >>>>>>>> I correctly configure my sites and tc setup. > >>>>>>>> > >>>>>>>> Thanks. 
> >>>>>>>> Ketan > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> _______________________________________________ > >>>>>>>> Swift-devel mailing list > >>>>>>>> Swift-devel at ci.uchicago.edu > >>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>>>>>> > >>>>>>> -- > >>>>>>> Michael Wilde > >>>>>>> Computation Institute, University of Chicago > >>>>>>> Mathematics and Computer Science Division > >>>>>>> Argonne National Laboratory > >>>>>>> > >>>>> > >>>>> -- > >>>>> Michael Wilde > >>>>> Computation Institute, University of Chicago > >>>>> Mathematics and Computer Science Division > >>>>> Argonne National Laboratory > >>>>> > >>> > >>> -- > >>> Michael Wilde > >>> Computation Institute, University of Chicago > >>> Mathematics and Computer Science Division > >>> Argonne National Laboratory > >>> > > > > -- > > Michael Wilde > > Computation Institute, University of Chicago > > Mathematics and Computer Science Division > > Argonne National Laboratory > > -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wilde at mcs.anl.gov Thu Apr 28 14:11:56 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 28 Apr 2011 14:11:56 -0500 (CDT) Subject: [Swift-devel] ssh:pbs to beagle In-Reply-To: <1631113895.3244.1304016401370.JavaMail.root@zimbra.anl.gov> Message-ID: <1248059492.3469.1304017916874.JavaMail.root@zimbra.anl.gov> As far as I can tell from the swift-devel archives, the only feature for disabling coaster security is the -nosec option of the coaster-service command. - Mike ----- Original Message ----- > Now I think you need to create the same proxy on the Beagle side. For > starters, just try copying your proxy file from /tmp on communicado to > /tmp on the Beagle login node on which you are running Swift. 
-- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From ketancmaheshwari at gmail.com Thu Apr 28 14:14:00 2011 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Thu, 28 Apr 2011 14:14:00 -0500 Subject: [Swift-devel] ssh:pbs to beagle In-Reply-To: <1248059492.3469.1304017916874.JavaMail.root@zimbra.anl.gov> References: <1248059492.3469.1304017916874.JavaMail.root@zimbra.anl.gov> Message-ID: Ok, I am trying a manual coaster setup from bridled (service, swift)
to beagle (worker.pl). --Ketan On Apr 28, 2011, at 2:11 PM, Michael Wilde wrote: > As far as I can tell from the swift-devel archives, the only feature for disabling coaster security is the -nosec option of the coaster-service command. > > - Mike > > > ----- Original Message ----- >> Now I think you need to create the same proxy on the Beagle side. For >> starters, just try copying your proxy file from /tmp on communicado to >> /tmp on the Beagle login node on which you are running Swift. Later >> you can do this by creating a proxy on the Beagle size using >> grid-proxy-init, but you'll need to install CA certs there. >> >> Also, have you considered running a passive coaster server on the >> communicado side, and just having Beagle worker.pl scripts connect >> back to it? >> >> - Mike >> >> ----- Original Message ----- >>> Ok, I got past CredentialException with grid-proxy-init, now I am >>> facing this (note: I have turned on provider staging) : >>> >>> ======== >>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc >>> -sites.file >>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1 >>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog >>> modified >>> locally) >>> >>> RunID: 20110428-1332-llaa031f >>> Progress: >>> Could not start connection handler >>> java.io.EOFException >>> at >>> org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61) >>> at >>> org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65) >>> at >>> org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127) >>> at >>> org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147) >>> at >>> org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177) >>> at >>> org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30) >>> at >>> org.globus.cog.karajan.workflow.service.channels.GSSChannel.(GSSChannel.java:47) >>> at >>> 
org.globus.cog.karajan.workflow.service.ConnectionHandler.(ConnectionHandler.java:41) >>> at >>> org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63) >>> at org.globus.net.BaseServer.run(BaseServer.java:247) >>> at java.lang.Thread.run(Thread.java:662) >>> Progress: Submitted:1 >>> Could not start connection handler >>> java.io.EOFException >>> at >>> org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61) >>> at >>> org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65) >>> at >>> org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127) >>> at >>> org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147) >>> at >>> org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177) >>> at >>> org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30) >>> at >>> org.globus.cog.karajan.workflow.service.channels.GSSChannel.(GSSChannel.java:47) >>> at >>> org.globus.cog.karajan.workflow.service.ConnectionHandler.(ConnectionHandler.java:41) >>> at >>> org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63) >>> at org.globus.net.BaseServer.run(BaseServer.java:247) >>> at java.lang.Thread.run(Thread.java:662) >>> Progress: Submitted:1 >>> Exception in cat: >>> Arguments: [data.txt] >>> Host: beagle-remote-pbs-coasters-ssh >>> Directory: catsn-20110428-1332-llaa031f/jobs/b/cat-bxal1d9kTODO: >>> outs >>> ---- >>> >>> Caused by: Could not submit job >>> Caused by: >>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: >>> Could not submit job >>> Caused by: >>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: >>> Could not start coaster service >>> Caused by: >>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: >>> Task ended before registration was received. 
>>> STDOUT:
>>> STDERR:
>>> Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 1
>>> Final status: Failed:1
>>> The following errors have occurred:
>>> 1. Job failed with an exit code of 1
>>>
>>> ========
>>>
>>>
>>> From bridled to communicado, I see the following error:
>>>
>>> **************
>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file coaster-local-ssh-communicado.xml catsn.swift -n=1
>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
>>>
>>> RunID: 20110428-1335-k685b2ye
>>> Progress:
>>> Progress: Submitted:1
>>> Progress: Active:1
>>> Exception in cat:
>>> Arguments: [data.txt]
>>> Host: communicado-ssh
>>> Directory: catsn-20110428-1335-k685b2ye/jobs/c/cat-coip1d9kTODO: outs
>>> ----
>>>
>>> Caused by: Job failed with an exit code of 524
>>> Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 524
>>> Final status: Failed:1
>>> The following errors have occurred:
>>> 1. Job failed with an exit code of 524
>>>
>>> ************
>>>
>>>
>>> --
>>> Ketan
>>>
>>>
>>>
>>>
>>> On Apr 28, 2011, at 1:03 PM, Michael Wilde wrote:
>>>
>>>> For now - create a proxy using grid-proxy-init on the swift execution machine.
>>>> I think there is an option to set "no security" for this config but I cant recall where that is specified. Maybe swift.properties, I cant recall.
>>>>
>>>> - Mike
>>>>
>>>> ----- Original Message -----
>>>>> Hi,
>>>>>
>>>>> It looks better now. However, I am getting the following:
>>>>>
>>>>> =====
>>>>>
>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
>>>>>
>>>>> RunID: 20110428-1251-oi9theh8
>>>>> Progress:
>>>>> Progress: Stage in:1
>>>>> Could not submit job
>>>>> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job
>>>>> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not start coaster service
>>>>> Caused by: org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file (/tmp/x509up_u2006) not found.
>>>>> Caused by: org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file (/tmp/x509up_u2006) not found.
>>>>> Failed to transfer wrapper log from catsn-20110428-1251-oi9theh8/info/e on beagle-remote-pbs-coasters-ssh
>>>>>
>>>>> =====
>>>>>
>>>>> How do I specify "-nosec" on automatic coasters?
>>>>>
>>>>> Ketan
>>>>>
>>>>> On Apr 28, 2011, at 12:00 PM, Michael Wilde wrote:
>>>>>
>>>>>> OK. Was there a cookbook on the ssh settings? Did you set up a $HOME/.ssh/auth.defaults per the user guide?
>>>>>>
>>>>>> Here is an auth.defaults example. Im not sure its 100% correct, but could serve as a base for you:
>>>>>>
>>>>>> xlogin1.pads.ci.uchicago.edu.type=password
>>>>>> xlogin1.pads.ci.uchicago.edu.username=wilde
>>>>>>
>>>>>> login.pads.ci.uchicago.edu.type=key
>>>>>> login.pads.ci.uchicago.edu.username=wilde
>>>>>> login.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa
>>>>>> login.pads.ci.uchicago.edu.passphrase=yourpassphrasehere # MAKE SURE mode=600!!!
>>>>>>
>>>>>> login1.pads.ci.uchicago.edu.type=key
>>>>>> login1.pads.ci.uchicago.edu.username=wilde
>>>>>> login1.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa
>>>>>> login1.pads.ci.uchicago.edu.passphrase=yourpassphrasehere # MAKE SURE mode=600!!!
>>>>>>
>>>>>> login.mcs.anl.gov.type=key
>>>>>> login.mcs.anl.gov.username=wilde
>>>>>> login.mcs.anl.gov.key=/home/wilde/.ssh/swift_rsa
>>>>>> login.mcs.anl.gov.passphrase=yourpassphrasehere # MAKE SURE mode=600!!!
>>>>>>
>>>>>> - Mike
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> It does look like an ssh problem. I am getting the same stderr and log messages on trying to communicate from Bridled to Communicado.
>>>>>>>
>>>>>>> Ketan
>>>>>>>
>>>>>>> On Apr 28, 2011, at 11:19 AM, Michael Wilde wrote:
>>>>>>>
>>>>>>>> Have you already run a simple hellow-world swift test from communicado to bridled to make sure you have ssh configured correctly? I would do that first.
>>>>>>>>
>>>>>>>> Im not sure if an ssh problem explains what you show below, or not.
>>>>>>>>
>>>>>>>> - Mike
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>>> Thanks, I made the change. However, now, I am getting the following on my stderr
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ===========
>>>>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
>>>>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
>>>>>>>>>
>>>>>>>>> RunID: 20110428-1022-n9s0k0e0
>>>>>>>>> Progress:
>>>>>>>>> [ketan]
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> [ketan] Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> Progress: Initializing site shared directory:1
>>>>>>>>> ========
>>>>>>>>>
>>>>>>>>> And from the log it seems some network transmission has failed:
>>>>>>>>>
>>>>>>>>> 2011-04-28 10:22:45,261-0500 INFO TransportProtocolCommon Sending SSH_MSG_SERVICE_REQUEST
>>>>>>>>> 2011-04-28 10:22:45,264-0500 INFO TransportProtocolCommon Received SSH_MSG_SERVICE_ACCEPT
>>>>>>>>> 2011-04-28 10:24:27,626-0500 INFO TransportProtocolCommon The Transport Protocol thread failed
>>>>>>>>> java.io.IOException: The socket is EOF
>>>>>>>>> at com.sshtools.j2ssh.transport.TransportProtocolInputStream.readBufferedData(TransportProtocolInputStream.java:183)
>>>>>>>>> at com.sshtools.j2ssh.transport.TransportProtocolInputStream.readMessage(TransportProtocolInputStream.java:226)
>>>>>>>>> at com.sshtools.j2ssh.transport.TransportProtocolCommon.processMessages(TransportProtocolCommon.java:1440)
>>>>>>>>> at com.sshtools.j2ssh.transport.TransportProtocolCommon.startBinaryPacketProtocol(TransportProtocolCommon.java:1034)
>>>>>>>>> at com.sshtools.j2ssh.transport.TransportProtocolCommon.run(TransportProtocolCommon.java:393)
>>>>>>>>> at java.lang.Thread.run(Thread.java:662)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Any clues?
>>>>>>>>> Ketan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Apr 28, 2011, at 10:20 AM, Michael Wilde wrote:
>>>>>>>>>
>>>>>>>>>> The pool name in your sites file is pads-remote-pbs-coasters-ssh but you used pbs in your tc.data.
>>>>>>>>>>
>>>>>>>>>> - Mike
>>>>>>>>>>
>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> Some context:
>>>>>>>>>>> I am trying to submit a big run on Beagle using swift + coasters. However, a previous run is already underway on beagle. So, there are two difficulties running a new run from its login node:
>>>>>>>>>>>
>>>>>>>>>>> 1. Running another swift from the same jvm will result in chaos on the logs (As far as I know, please correct me if this is not the case anymore)
>>>>>>>>>>>
>>>>>>>>>>> 2. Login node is already under load because of my running previous big run.
>>>>>>>>>>>
>>>>>>>>>>> /context
>>>>>>>>>>>
>>>>>>>>>>> So, I am now trying to submit this big run from a remote host (bridled). I know this has been done on PADS using ssh:pbs, provider coaster. I tried the similar approach on a trial swift script but getting error.
>>>>>>>>>>>
>>>>>>>>>>> Following is the error message:
>>>>>>>>>>>
>>>>>>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
>>>>>>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
>>>>>>>>>>>
>>>>>>>>>>> RunID: 20110428-1002-c8rvqhe6
>>>>>>>>>>> Progress:
>>>>>>>>>>> The application "cat" is not available in your tc.data catalog
>>>>>>>>>>> Caused by: org.globus.cog.karajan.scheduler.NoSuchResourceException
>>>>>>>>>>> Final status: Failed:1
>>>>>>>>>>> The following errors have occurred:
>>>>>>>>>>> 1. The application "cat" is not available in your tc.data catalog
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Attached are my .swift, sites.xml and tc.data files.
>>>>>>>>>>>
>>>>>>>>>>> Could someone indicate if what I am doing is doable and if so how can I correctly configure my sites and tc setup.
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>> Ketan
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Swift-devel mailing list
>>>>>>>>>>> Swift-devel at ci.uchicago.edu
>>>>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Michael Wilde
>>>>>>>>>> Computation Institute, University of Chicago
>>>>>>>>>> Mathematics and Computer Science Division
>>>>>>>>>> Argonne National Laboratory

From hategan at mcs.anl.gov Thu Apr 28 14:17:10 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 28 Apr 2011 12:17:10 -0700
Subject: [Swift-devel] ssh:pbs to beagle
In-Reply-To: <1952F257-772E-4C97-8AE8-0A73F8E9C7E8@gmail.com>
References: <1403032176.2307.1304013826306.JavaMail.root@zimbra.anl.gov> <1952F257-772E-4C97-8AE8-0A73F8E9C7E8@gmail.com>
Message-ID: <1304018230.10193.0.camel@blabla2.none>

What does your sites file
look like?

On Thu, 2011-04-28 at 13:36 -0500, Ketan Maheshwari wrote:
> Ok, I got past CredentialException with grid-proxy-init, now I am facing this (note: I have turned on provider staging) :
>
> ========
> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
>
> RunID: 20110428-1332-llaa031f
> Progress:
> Could not start connection handler
> java.io.EOFException
> at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61)
> at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65)
> at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127)
> at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147)
> at org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177)
> at org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30)
> at org.globus.cog.karajan.workflow.service.channels.GSSChannel.<init>(GSSChannel.java:47)
> at org.globus.cog.karajan.workflow.service.ConnectionHandler.<init>(ConnectionHandler.java:41)
> at org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63)
> at org.globus.net.BaseServer.run(BaseServer.java:247)
> at java.lang.Thread.run(Thread.java:662)
> Progress: Submitted:1
> Could not start connection handler
> java.io.EOFException
> at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61)
> at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65)
> at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127)
> at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147)
> at org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177)
> at org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30)
> at org.globus.cog.karajan.workflow.service.channels.GSSChannel.<init>(GSSChannel.java:47)
> at org.globus.cog.karajan.workflow.service.ConnectionHandler.<init>(ConnectionHandler.java:41)
> at org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63)
> at org.globus.net.BaseServer.run(BaseServer.java:247)
> at java.lang.Thread.run(Thread.java:662)
> Progress: Submitted:1
> Exception in cat:
> Arguments: [data.txt]
> Host: beagle-remote-pbs-coasters-ssh
> Directory: catsn-20110428-1332-llaa031f/jobs/b/cat-bxal1d9kTODO: outs
> ----
>
> Caused by: Could not submit job
> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job
> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not start coaster service
> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Task ended before registration was received.
> STDOUT:
> STDERR:
> Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 1
> Final status: Failed:1
> The following errors have occurred:
> 1. Job failed with an exit code of 1
>
> ========
>
>
> From bridled to communicado, I see the following error:
>
> **************
> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file coaster-local-ssh-communicado.xml catsn.swift -n=1
> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
>
> RunID: 20110428-1335-k685b2ye
> Progress:
> Progress: Submitted:1
> Progress: Active:1
> Exception in cat:
> Arguments: [data.txt]
> Host: communicado-ssh
> Directory: catsn-20110428-1335-k685b2ye/jobs/c/cat-coip1d9kTODO: outs
> ----
>
> Caused by: Job failed with an exit code of 524
> Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 524
> Final status: Failed:1
> The following errors have occurred:
> 1. Job failed with an exit code of 524
>
> ************
>
>
> --
> Ketan

From ketancmaheshwari at gmail.com Thu Apr 28 14:21:00 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Thu, 28 Apr 2011 14:21:00 -0500
Subject: [Swift-devel] ssh:pbs to beagle
In-Reply-To: <1304018230.10193.0.camel@blabla2.none>
References: <1403032176.2307.1304013826306.JavaMail.root@zimbra.anl.gov> <1952F257-772E-4C97-8AE8-0A73F8E9C7E8@gmail.com> <1304018230.10193.0.camel@blabla2.none>
Message-ID:

On Apr 28, 2011, at 2:17 PM, Mihael Hategan wrote:
> What does your sites file look like?
** For beagle **
CI-CCR000013 24:cray:pack 24 1000 1 1 1 .63 10000 $HOME/swift.workdir

** for communicado **
.63 10000 $HOME/swift.workdir
>>>>>>>>>> Ketan
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Michael Wilde
>>>>>>>>> Computation Institute, University of Chicago
>>>>>>>>> Mathematics and Computer Science Division
>>>>>>>>> Argonne National Laboratory

From hategan at mcs.anl.gov Thu Apr 28 14:26:34 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 28 Apr 2011 12:26:34 -0700
Subject: [Swift-devel] ssh:pbs to beagle
In-Reply-To:
References: <1403032176.2307.1304013826306.JavaMail.root@zimbra.anl.gov> <1952F257-772E-4C97-8AE8-0A73F8E9C7E8@gmail.com> <1304018230.10193.0.camel@blabla2.none>
Message-ID: <1304018794.10471.1.camel@blabla2.none>

That EOFException doesn't make much sense.

On beagle you should have something called coaster.log in
~/.globus/coasters.

Can you post a link to that?

Mihael

On Thu, 2011-04-28 at 14:21 -0500, Ketan Maheshwari wrote:
> On Apr 28, 2011, at 2:17 PM, Mihael Hategan wrote:
>
> > What does your sites file look like?
> > ** For beagle ** > > > > > > CI-CCR000013 > > 24:cray:pack > > 24 > 1000 > 1 > 1 > 1 > > .63 > 10000 > > > $HOME/swift.workdir > > > > > > ** for communicado ** > > > > > > > .63 > 10000 > > > $HOME/swift.workdir > > > > > > > > > On Thu, 2011-04-28 at 13:36 -0500, Ketan Maheshwari wrote: > >> Ok, I got past CredentialException with grid-proxy-init, now I am facing this (note: I have turned on provider staging) : > >> > >> ======== > >> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1 > >> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally) > >> > >> RunID: 20110428-1332-llaa031f > >> Progress: > >> Could not start connection handler > >> java.io.EOFException > >> at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61) > >> at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65) > >> at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127) > >> at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147) > >> at org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177) > >> at org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30) > >> at org.globus.cog.karajan.workflow.service.channels.GSSChannel.(GSSChannel.java:47) > >> at org.globus.cog.karajan.workflow.service.ConnectionHandler.(ConnectionHandler.java:41) > >> at org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63) > >> at org.globus.net.BaseServer.run(BaseServer.java:247) > >> at java.lang.Thread.run(Thread.java:662) > >> Progress: Submitted:1 > >> Could not start connection handler > >> java.io.EOFException > >> at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61) > >> at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65) > >> at 
org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127) > >> at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147) > >> at org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177) > >> at org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30) > >> at org.globus.cog.karajan.workflow.service.channels.GSSChannel.(GSSChannel.java:47) > >> at org.globus.cog.karajan.workflow.service.ConnectionHandler.(ConnectionHandler.java:41) > >> at org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63) > >> at org.globus.net.BaseServer.run(BaseServer.java:247) > >> at java.lang.Thread.run(Thread.java:662) > >> Progress: Submitted:1 > >> Exception in cat: > >> Arguments: [data.txt] > >> Host: beagle-remote-pbs-coasters-ssh > >> Directory: catsn-20110428-1332-llaa031f/jobs/b/cat-bxal1d9kTODO: outs > >> ---- > >> > >> Caused by: Could not submit job > >> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job > >> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not start coaster service > >> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Task ended before registration was received. > >> STDOUT: > >> STDERR: > >> Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 1 > >> Final status: Failed:1 > >> The following errors have occurred: > >> 1. 
Job failed with an exit code of 1 > >> > >> ======== > >> > >> > >> From bridled to communicado, I see the following error: > >> > >> ************** > >> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file coaster-local-ssh-communicado.xml catsn.swift -n=1 > >> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally) > >> > >> RunID: 20110428-1335-k685b2ye > >> Progress: > >> Progress: Submitted:1 > >> Progress: Active:1 > >> Exception in cat: > >> Arguments: [data.txt] > >> Host: communicado-ssh > >> Directory: catsn-20110428-1335-k685b2ye/jobs/c/cat-coip1d9kTODO: outs > >> ---- > >> > >> Caused by: Job failed with an exit code of 524 > >> Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 524 > >> Final status: Failed:1 > >> The following errors have occurred: > >> 1. Job failed with an exit code of 524 > >> > >> ************ > >> > >> > >> -- > >> Ketan > >> > >> > >> > >> > >> On Apr 28, 2011, at 1:03 PM, Michael Wilde wrote: > >> > >>> For now - create a proxy using grid-proxy-init on the swift execution machine. > >>> I think there is an option to set "no security" for this config but I cant recall where that is specified. Maybe swift.properties, I cant recall. > >>> > >>> - Mike > >>> > >>> ----- Original Message ----- > >>>> Hi, > >>>> > >>>> It looks better now. 
However, I am getting the following:
> >>>>
> >>>> =====
> >>>>
> >>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
> >>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
> >>>>
> >>>> RunID: 20110428-1251-oi9theh8
> >>>> Progress:
> >>>> Progress: Stage in:1
> >>>> Could not submit job
> >>>> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job
> >>>> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not start coaster service
> >>>> Caused by: org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file (/tmp/x509up_u2006) not found.
> >>>> Caused by: org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file (/tmp/x509up_u2006) not found.
> >>>> Failed to transfer wrapper log from catsn-20110428-1251-oi9theh8/info/e on beagle-remote-pbs-coasters-ssh
> >>>>
> >>>> =====
> >>>>
> >>>> How do I specify "-nosec" on automatic coasters?
> >>>>
> >>>> Ketan
> >>>>
> >>>> On Apr 28, 2011, at 12:00 PM, Michael Wilde wrote:
> >>>>
> >>>>> OK. Was there a cookbook on the ssh settings? Did you set up a
> >>>>> $HOME/.ssh/auth.defaults per the user guide?
> >>>>>
> >>>>> Here is an auth.defaults example. I'm not sure it's 100% correct, but
> >>>>> it could serve as a base for you:
> >>>>>
> >>>>> xlogin1.pads.ci.uchicago.edu.type=password
> >>>>> xlogin1.pads.ci.uchicago.edu.username=wilde
> >>>>>
> >>>>> login.pads.ci.uchicago.edu.type=key
> >>>>> login.pads.ci.uchicago.edu.username=wilde
> >>>>> login.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa
> >>>>> login.pads.ci.uchicago.edu.passphrase=yourpassphrasehere # MAKE SURE mode=600!!!
> >>>>>
> >>>>> login1.pads.ci.uchicago.edu.type=key
> >>>>> login1.pads.ci.uchicago.edu.username=wilde
> >>>>> login1.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa
> >>>>> login1.pads.ci.uchicago.edu.passphrase=yourpassphrasehere # MAKE SURE mode=600!!!
> >>>>>
> >>>>> login.mcs.anl.gov.type=key
> >>>>> login.mcs.anl.gov.username=wilde
> >>>>> login.mcs.anl.gov.key=/home/wilde/.ssh/swift_rsa
> >>>>> login.mcs.anl.gov.passphrase=yourpassphrasehere # MAKE SURE mode=600!!!
> >>>>>
> >>>>> - Mike

From ketancmaheshwari at gmail.com Thu Apr 28 14:29:50 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Thu, 28 Apr 2011 14:29:50 -0500
Subject: [Swift-devel] ssh:pbs to beagle
In-Reply-To: <1304018794.10471.1.camel@blabla2.none>
References: <1403032176.2307.1304013826306.JavaMail.root@zimbra.anl.gov> <1952F257-772E-4C97-8AE8-0A73F8E9C7E8@gmail.com> <1304018230.10193.0.camel@blabla2.none> <1304018794.10471.1.camel@blabla2.none>
Message-ID:

They are here: /home/ketan/.globus/coasters

On Apr 28, 2011, at 2:26 PM, Mihael Hategan wrote:

> That EOFException doesn't make much sense.
> > On beagle you should have something called coaster.log in > ~/.globus/coasters. > > Can post a link to that? > > Mihael > > On Thu, 2011-04-28 at 14:21 -0500, Ketan Maheshwari wrote: >> On Apr 28, 2011, at 2:17 PM, Mihael Hategan wrote: >> >>> What does your sites file look like? >> >> ** For beagle ** >> >> >> >> >> >> CI-CCR000013 >> >> 24:cray:pack >> >> 24 >> 1000 >> 1 >> 1 >> 1 >> >> .63 >> 10000 >> >> >> $HOME/swift.workdir >> >> >> >> >> >> ** for communicado ** >> >> >> >> >> >> >> .63 >> 10000 >> >> >> $HOME/swift.workdir >> >> >> >> >> >>> >>> On Thu, 2011-04-28 at 13:36 -0500, Ketan Maheshwari wrote: >>>> Ok, I got past CredentialException with grid-proxy-init, now I am facing this (note: I have turned on provider staging) : >>>> >>>> ======== >>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1 >>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally) >>>> >>>> RunID: 20110428-1332-llaa031f >>>> Progress: >>>> Could not start connection handler >>>> java.io.EOFException >>>> at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61) >>>> at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65) >>>> at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127) >>>> at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147) >>>> at org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177) >>>> at org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30) >>>> at org.globus.cog.karajan.workflow.service.channels.GSSChannel.(GSSChannel.java:47) >>>> at org.globus.cog.karajan.workflow.service.ConnectionHandler.(ConnectionHandler.java:41) >>>> at org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63) >>>> at org.globus.net.BaseServer.run(BaseServer.java:247) >>>> at 
java.lang.Thread.run(Thread.java:662) >>>> Progress: Submitted:1 >>>> Could not start connection handler >>>> java.io.EOFException >>>> at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61) >>>> at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65) >>>> at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127) >>>> at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147) >>>> at org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177) >>>> at org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30) >>>> at org.globus.cog.karajan.workflow.service.channels.GSSChannel.(GSSChannel.java:47) >>>> at org.globus.cog.karajan.workflow.service.ConnectionHandler.(ConnectionHandler.java:41) >>>> at org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63) >>>> at org.globus.net.BaseServer.run(BaseServer.java:247) >>>> at java.lang.Thread.run(Thread.java:662) >>>> Progress: Submitted:1 >>>> Exception in cat: >>>> Arguments: [data.txt] >>>> Host: beagle-remote-pbs-coasters-ssh >>>> Directory: catsn-20110428-1332-llaa031f/jobs/b/cat-bxal1d9kTODO: outs >>>> ---- >>>> >>>> Caused by: Could not submit job >>>> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job >>>> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not start coaster service >>>> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Task ended before registration was received. >>>> STDOUT: >>>> STDERR: >>>> Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 1 >>>> Final status: Failed:1 >>>> The following errors have occurred: >>>> 1. 
Job failed with an exit code of 1 >>>> >>>> ======== >>>> >>>> >>>> From bridled to communicado, I see the following error: >>>> >>>> ************** >>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file coaster-local-ssh-communicado.xml catsn.swift -n=1 >>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally) >>>> >>>> RunID: 20110428-1335-k685b2ye >>>> Progress: >>>> Progress: Submitted:1 >>>> Progress: Active:1 >>>> Exception in cat: >>>> Arguments: [data.txt] >>>> Host: communicado-ssh >>>> Directory: catsn-20110428-1335-k685b2ye/jobs/c/cat-coip1d9kTODO: outs >>>> ---- >>>> >>>> Caused by: Job failed with an exit code of 524 >>>> Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 524 >>>> Final status: Failed:1 >>>> The following errors have occurred: >>>> 1. Job failed with an exit code of 524 >>>> >>>> ************ >>>> >>>> >>>> -- >>>> Ketan >>>> >>>> >>>> >>>> >>>> On Apr 28, 2011, at 1:03 PM, Michael Wilde wrote: >>>> >>>>> For now - create a proxy using grid-proxy-init on the swift execution machine. >>>>> I think there is an option to set "no security" for this config but I cant recall where that is specified. Maybe swift.properties, I cant recall. >>>>> >>>>> - Mike >>>>> >>>>> ----- Original Message ----- >>>>>> Hi, >>>>>> >>>>>> It looks better now. 
However, I am getting the following: >>>>>> >>>>>> ===== >>>>>> >>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file >>>>>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1 >>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified >>>>>> locally) >>>>>> >>>>>> RunID: 20110428-1251-oi9theh8 >>>>>> Progress: >>>>>> Progress: Stage in:1 >>>>>> Could not submit job >>>>>> Caused by: >>>>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: >>>>>> Could not submit job >>>>>> Caused by: >>>>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: >>>>>> Could not start coaster service >>>>>> Caused by: >>>>>> org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: >>>>>> org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file >>>>>> (/tmp/x509up_u2006) not found. >>>>>> Caused by: org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy >>>>>> file (/tmp/x509up_u2006) not found. >>>>>> Failed to transfer wrapper log from >>>>>> catsn-20110428-1251-oi9theh8/info/e on beagle-remote-pbs-coasters-ssh >>>>>> >>>>>> ===== >>>>>> >>>>>> How do I specify "-nosec" on automatic coasters? >>>>>> >>>>>> Ketan >>>>>> >>>>>> On Apr 28, 2011, at 12:00 PM, Michael Wilde wrote: >>>>>> >>>>>>> OK. Was there a cookbook on the ssh settings? Did you set up a >>>>>>> $HOME/.ssh/auth.defaults per the user guide? >>>>>>> >>>>>>> Here is an auth.defaults example. Im not sure its 100% correct, but >>>>>>> could serve as a base for you: >>>>>>> >>>>>>> xlogin1.pads.ci.uchicago.edu.type=password >>>>>>> xlogin1.pads.ci.uchicago.edu.username=wilde >>>>>>> >>>>>>> login.pads.ci.uchicago.edu.type=key >>>>>>> login.pads.ci.uchicago.edu.username=wilde >>>>>>> login.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa >>>>>>> login.pads.ci.uchicago.edu.passphrase=yourpassphrasehere # MAKE SURE >>>>>>> mode=600!!! 
>>>>>>> >>>>>>> login1.pads.ci.uchicago.edu.type=key >>>>>>> login1.pads.ci.uchicago.edu.username=wilde >>>>>>> login1.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa >>>>>>> login1.pads.ci.uchicago.edu.passphrase=yourpassphrasehere # MAKE >>>>>>> SURE mode=600!!! >>>>>>> >>>>>>> login.mcs.anl.gov.type=key >>>>>>> login.mcs.anl.gov.username=wilde >>>>>>> login.mcs.anl.gov.key=/home/wilde/.ssh/swift_rsa >>>>>>> login.mcs.anl.gov.passphrase=yourpassphrasehere # MAKE SURE >>>>>>> mode=600!!! >>>>>>> >>>>>>> - Mike >>>>>>> >>>>>>> ----- Original Message ----- >>>>>>>> It does look like an ssh problem. I am getting the same stderr and >>>>>>>> log >>>>>>>> messages on trying to communicate from Bridled to Communicado. >>>>>>>> >>>>>>>> Ketan >>>>>>>> >>>>>>>> On Apr 28, 2011, at 11:19 AM, Michael Wilde wrote: >>>>>>>> >>>>>>>>> Have you already run a simple hellow-world swift test from >>>>>>>>> communicado to bridled to make sure you have ssh configured >>>>>>>>> correctly? I would do that first. >>>>>>>>> >>>>>>>>> Im not sure if an ssh problem explains what you show below, or >>>>>>>>> not. >>>>>>>>> >>>>>>>>> - Mike >>>>>>>>> >>>>>>>>> ----- Original Message ----- >>>>>>>>>> Thanks, I made the change. 
However, now, I am getting the >>>>>>>>>> following >>>>>>>>>> on >>>>>>>>>> my stderr >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> =========== >>>>>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc >>>>>>>>>> -sites.file >>>>>>>>>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1 >>>>>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog >>>>>>>>>> modified >>>>>>>>>> locally) >>>>>>>>>> >>>>>>>>>> RunID: 20110428-1022-n9s0k0e0 >>>>>>>>>> Progress: >>>>>>>>>> [ketan] >>>>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>>>> [ketan] Progress: Initializing site shared directory:1 >>>>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>>>> ======== >>>>>>>>>> >>>>>>>>>> And from the log it seems some network transmission has failed: >>>>>>>>>> >>>>>>>>>> 2011-04-28 10:22:45,261-0500 INFO TransportProtocolCommon Sending >>>>>>>>>> SSH_MSG_SERVICE_REQUEST >>>>>>>>>> 2011-04-28 10:22:45,264-0500 INFO TransportProtocolCommon >>>>>>>>>> Received >>>>>>>>>> SSH_MSG_SERVICE_ACCEPT >>>>>>>>>> 2011-04-28 10:24:27,626-0500 INFO TransportProtocolCommon The >>>>>>>>>> Transport Protocol thread failed >>>>>>>>>> java.io.IOException: The socket is EOF >>>>>>>>>> at >>>>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolInputStream.readBufferedData(TransportProtocolInputStream.java:183) >>>>>>>>>> at >>>>>>>>>> 
com.sshtools.j2ssh.transport.TransportProtocolInputStream.readMessage(TransportProtocolInputStream.java:226) >>>>>>>>>> at >>>>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolCommon.processMessages(TransportProtocolCommon.java:1440) >>>>>>>>>> at >>>>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolCommon.startBinaryPacketProtocol(TransportProtocolCommon.java:1034) >>>>>>>>>> at >>>>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolCommon.run(TransportProtocolCommon.java:393) >>>>>>>>>> at java.lang.Thread.run(Thread.java:662) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Any clues? >>>>>>>>>> Ketan >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Apr 28, 2011, at 10:20 AM, Michael Wilde wrote: >>>>>>>>>> >>>>>>>>>>> The pool name in your sites file is pads-remote-pbs-coasters-ssh >>>>>>>>>>> but >>>>>>>>>>> you used pbs in your tc.data. >>>>>>>>>>> >>>>>>>>>>> - Mike >>>>>>>>>>> >>>>>>>>>>> ----- Original Message ----- >>>>>>>>>>>> Hello, >>>>>>>>>>>> >>>>>>>>>>>> Some context: >>>>>>>>>>>> I am trying to submit a big run on Beagle using swift + >>>>>>>>>>>> coasters. >>>>>>>>>>>> However, a previous run is already underway on beagle. So, >>>>>>>>>>>> there >>>>>>>>>>>> are >>>>>>>>>>>> two difficulties running a new run from its login node: >>>>>>>>>>>> >>>>>>>>>>>> 1. Running another swift from the same jvm will result in chaos >>>>>>>>>>>> on >>>>>>>>>>>> the >>>>>>>>>>>> logs (As far as I know, please correct me if this is not the >>>>>>>>>>>> case >>>>>>>>>>>> anymore) >>>>>>>>>>>> >>>>>>>>>>>> 2. Login node is already under load because of my running >>>>>>>>>>>> previous >>>>>>>>>>>> big >>>>>>>>>>>> run. >>>>>>>>>>>> >>>>>>>>>>>> /context >>>>>>>>>>>> >>>>>>>>>>>> So, I am now trying to submit this big run from a remote host >>>>>>>>>>>> (bridled). I know this has been done on PADS using ssh:pbs, >>>>>>>>>>>> provider >>>>>>>>>>>> coaster. I tried the similar approach on a trial swift script >>>>>>>>>>>> but >>>>>>>>>>>> getting error. 
>>>>>>>>>>>> >>>>>>>>>>>> Following is the error message: >>>>>>>>>>>> >>>>>>>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc >>>>>>>>>>>> -sites.file >>>>>>>>>>>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1 >>>>>>>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog >>>>>>>>>>>> modified >>>>>>>>>>>> locally) >>>>>>>>>>>> >>>>>>>>>>>> RunID: 20110428-1002-c8rvqhe6 >>>>>>>>>>>> Progress: >>>>>>>>>>>> The application "cat" is not available in your tc.data catalog >>>>>>>>>>>> Caused by: >>>>>>>>>>>> org.globus.cog.karajan.scheduler.NoSuchResourceException >>>>>>>>>>>> Final status: Failed:1 >>>>>>>>>>>> The following errors have occurred: >>>>>>>>>>>> 1. The application "cat" is not available in your tc.data >>>>>>>>>>>> catalog >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Attached are my .swift, sites.xml and tc.data files. >>>>>>>>>>>> >>>>>>>>>>>> Could someone indicate if what I am doing is doable and if so >>>>>>>>>>>> how >>>>>>>>>>>> can >>>>>>>>>>>> I correctly configure my sites and tc setup. >>>>>>>>>>>> >>>>>>>>>>>> Thanks. 
>>>>>>>>>>>> Ketan
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Swift-devel mailing list
>>>>>>>>>>>> Swift-devel at ci.uchicago.edu
>>>>>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Michael Wilde
>>>>>>>>>>> Computation Institute, University of Chicago
>>>>>>>>>>> Mathematics and Computer Science Division
>>>>>>>>>>> Argonne National Laboratory

From wilde at mcs.anl.gov Thu Apr 28 14:32:31 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 28 Apr 2011 14:32:31 -0500 (CDT)
Subject: [Swift-devel] ssh:pbs to beagle
In-Reply-To:
Message-ID: <2000840687.3799.1304019151105.JavaMail.root@zimbra.anl.gov>

What is your communicado pool trying to test? If that's to run, e.g., bridled to communicado, I think jobmanager should be jobmanager="ssh:local" ???

- Mike

----- Original Message -----
> On Apr 28, 2011, at 2:17 PM, Mihael Hategan wrote:
>
> > What does your sites file look like?
> > ** For beagle ** > > > > > jobmanager="ssh:pbs"/> > CI-CCR000013 > > 24:cray:pack > > 24 > 1000 > 1 > 1 > 1 > > .63 > 10000 > > > $HOME/swift.workdir > > > > > > ** for communicado ** > > > > > jobmanager="ssh:ssh"/> > > .63 > 10000 > > > $HOME/swift.workdir > > > > > > > > > On Thu, 2011-04-28 at 13:36 -0500, Ketan Maheshwari wrote: > >> Ok, I got past CredentialException with grid-proxy-init, now I am > >> facing this (note: I have turned on provider staging) : > >> > >> ======== > >> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc > >> -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1 > >> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog > >> modified locally) > >> > >> RunID: 20110428-1332-llaa031f > >> Progress: > >> Could not start connection handler > >> java.io.EOFException > >> at > >> org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61) > >> at > >> org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65) > >> at > >> org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127) > >> at > >> org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147) > >> at > >> org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177) > >> at > >> org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30) > >> at > >> org.globus.cog.karajan.workflow.service.channels.GSSChannel.(GSSChannel.java:47) > >> at > >> org.globus.cog.karajan.workflow.service.ConnectionHandler.(ConnectionHandler.java:41) > >> at > >> org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63) > >> at org.globus.net.BaseServer.run(BaseServer.java:247) > >> at java.lang.Thread.run(Thread.java:662) > >> Progress: Submitted:1 > >> Could not start connection handler > >> java.io.EOFException > >> at > >> 
org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61) > >> at > >> org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65) > >> at > >> org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127) > >> at > >> org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147) > >> at > >> org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177) > >> at > >> org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30) > >> at > >> org.globus.cog.karajan.workflow.service.channels.GSSChannel.(GSSChannel.java:47) > >> at > >> org.globus.cog.karajan.workflow.service.ConnectionHandler.(ConnectionHandler.java:41) > >> at > >> org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63) > >> at org.globus.net.BaseServer.run(BaseServer.java:247) > >> at java.lang.Thread.run(Thread.java:662) > >> Progress: Submitted:1 > >> Exception in cat: > >> Arguments: [data.txt] > >> Host: beagle-remote-pbs-coasters-ssh > >> Directory: catsn-20110428-1332-llaa031f/jobs/b/cat-bxal1d9kTODO: > >> outs > >> ---- > >> > >> Caused by: Could not submit job > >> Caused by: > >> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > >> Could not submit job > >> Caused by: > >> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > >> Could not start coaster service > >> Caused by: > >> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > >> Task ended before registration was received. > >> STDOUT: > >> STDERR: > >> Caused by: > >> org.globus.cog.abstraction.impl.common.execution.JobException: Job > >> failed with an exit code of 1 > >> Final status: Failed:1 > >> The following errors have occurred: > >> 1. 
Job failed with an exit code of 1 > >> > >> ======== > >> > >> > >> From bridled to communicado, I see the following error: > >> > >> ************** > >> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc > >> -sites.file coaster-local-ssh-communicado.xml catsn.swift -n=1 > >> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog > >> modified locally) > >> > >> RunID: 20110428-1335-k685b2ye > >> Progress: > >> Progress: Submitted:1 > >> Progress: Active:1 > >> Exception in cat: > >> Arguments: [data.txt] > >> Host: communicado-ssh > >> Directory: catsn-20110428-1335-k685b2ye/jobs/c/cat-coip1d9kTODO: > >> outs > >> ---- > >> > >> Caused by: Job failed with an exit code of 524 > >> Caused by: > >> org.globus.cog.abstraction.impl.common.execution.JobException: Job > >> failed with an exit code of 524 > >> Final status: Failed:1 > >> The following errors have occurred: > >> 1. Job failed with an exit code of 524 > >> > >> ************ > >> > >> > >> -- > >> Ketan > >> > >> > >> > >> > >> On Apr 28, 2011, at 1:03 PM, Michael Wilde wrote: > >> > >>> For now - create a proxy using grid-proxy-init on the swift > >>> execution machine. > >>> I think there is an option to set "no security" for this config > >>> but I cant recall where that is specified. Maybe swift.properties, > >>> I cant recall. > >>> > >>> - Mike > >>> > >>> ----- Original Message ----- > >>>> Hi, > >>>> > >>>> It looks better now. 
However, I am getting the following: > >>>> > >>>> ===== > >>>> > >>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc > >>>> -sites.file > >>>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1 > >>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog > >>>> modified > >>>> locally) > >>>> > >>>> RunID: 20110428-1251-oi9theh8 > >>>> Progress: > >>>> Progress: Stage in:1 > >>>> Could not submit job > >>>> Caused by: > >>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > >>>> Could not submit job > >>>> Caused by: > >>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > >>>> Could not start coaster service > >>>> Caused by: > >>>> org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: > >>>> org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file > >>>> (/tmp/x509up_u2006) not found. > >>>> Caused by: org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] > >>>> Proxy > >>>> file (/tmp/x509up_u2006) not found. > >>>> Failed to transfer wrapper log from > >>>> catsn-20110428-1251-oi9theh8/info/e on > >>>> beagle-remote-pbs-coasters-ssh > >>>> > >>>> ===== > >>>> > >>>> How do I specify "-nosec" on automatic coasters? > >>>> > >>>> Ketan > >>>> > >>>> On Apr 28, 2011, at 12:00 PM, Michael Wilde wrote: > >>>> > >>>>> OK. Was there a cookbook on the ssh settings? Did you set up a > >>>>> $HOME/.ssh/auth.defaults per the user guide? > >>>>> > >>>>> Here is an auth.defaults example. Im not sure its 100% correct, > >>>>> but > >>>>> could serve as a base for you: > >>>>> > >>>>> xlogin1.pads.ci.uchicago.edu.type=password > >>>>> xlogin1.pads.ci.uchicago.edu.username=wilde > >>>>> > >>>>> login.pads.ci.uchicago.edu.type=key > >>>>> login.pads.ci.uchicago.edu.username=wilde > >>>>> login.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa > >>>>> login.pads.ci.uchicago.edu.passphrase=yourpassphrasehere # MAKE > >>>>> SURE > >>>>> mode=600!!! 
> >>>>> > >>>>> login1.pads.ci.uchicago.edu.type=key > >>>>> login1.pads.ci.uchicago.edu.username=wilde > >>>>> login1.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa > >>>>> login1.pads.ci.uchicago.edu.passphrase=yourpassphrasehere # MAKE > >>>>> SURE mode=600!!! > >>>>> > >>>>> login.mcs.anl.gov.type=key > >>>>> login.mcs.anl.gov.username=wilde > >>>>> login.mcs.anl.gov.key=/home/wilde/.ssh/swift_rsa > >>>>> login.mcs.anl.gov.passphrase=yourpassphrasehere # MAKE SURE > >>>>> mode=600!!! > >>>>> > >>>>> - Mike > >>>>> > >>>>> ----- Original Message ----- > >>>>>> It does look like an ssh problem. I am getting the same stderr > >>>>>> and > >>>>>> log > >>>>>> messages on trying to communicate from Bridled to Communicado. > >>>>>> > >>>>>> Ketan > >>>>>> > >>>>>> On Apr 28, 2011, at 11:19 AM, Michael Wilde wrote: > >>>>>> > >>>>>>> Have you already run a simple hellow-world swift test from > >>>>>>> communicado to bridled to make sure you have ssh configured > >>>>>>> correctly? I would do that first. > >>>>>>> > >>>>>>> Im not sure if an ssh problem explains what you show below, or > >>>>>>> not. > >>>>>>> > >>>>>>> - Mike > >>>>>>> > >>>>>>> ----- Original Message ----- > >>>>>>>> Thanks, I made the change. 
However, now, I am getting the > >>>>>>>> following > >>>>>>>> on > >>>>>>>> my stderr > >>>>>>>> > >>>>>>>> > >>>>>>>> =========== > >>>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc > >>>>>>>> -sites.file > >>>>>>>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1 > >>>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog > >>>>>>>> modified > >>>>>>>> locally) > >>>>>>>> > >>>>>>>> RunID: 20110428-1022-n9s0k0e0 > >>>>>>>> Progress: > >>>>>>>> [ketan] > >>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>> [ketan] Progress: Initializing site shared directory:1 > >>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>> ======== > >>>>>>>> > >>>>>>>> And from the log it seems some network transmission has > >>>>>>>> failed: > >>>>>>>> > >>>>>>>> 2011-04-28 10:22:45,261-0500 INFO TransportProtocolCommon > >>>>>>>> Sending > >>>>>>>> SSH_MSG_SERVICE_REQUEST > >>>>>>>> 2011-04-28 10:22:45,264-0500 INFO TransportProtocolCommon > >>>>>>>> Received > >>>>>>>> SSH_MSG_SERVICE_ACCEPT > >>>>>>>> 2011-04-28 10:24:27,626-0500 INFO TransportProtocolCommon The > >>>>>>>> Transport Protocol thread failed > >>>>>>>> java.io.IOException: The socket is EOF > >>>>>>>> at > >>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolInputStream.readBufferedData(TransportProtocolInputStream.java:183) > >>>>>>>> 
at > >>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolInputStream.readMessage(TransportProtocolInputStream.java:226) > >>>>>>>> at > >>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolCommon.processMessages(TransportProtocolCommon.java:1440) > >>>>>>>> at > >>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolCommon.startBinaryPacketProtocol(TransportProtocolCommon.java:1034) > >>>>>>>> at > >>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolCommon.run(TransportProtocolCommon.java:393) > >>>>>>>> at java.lang.Thread.run(Thread.java:662) > >>>>>>>> > >>>>>>>> > >>>>>>>> Any clues? > >>>>>>>> Ketan > >>>>>>>> > >>>>>>>> > >>>>>>>> On Apr 28, 2011, at 10:20 AM, Michael Wilde wrote: > >>>>>>>> > >>>>>>>>> The pool name in your sites file is > >>>>>>>>> pads-remote-pbs-coasters-ssh > >>>>>>>>> but > >>>>>>>>> you used pbs in your tc.data. > >>>>>>>>> > >>>>>>>>> - Mike > >>>>>>>>> > >>>>>>>>> ----- Original Message ----- > >>>>>>>>>> Hello, > >>>>>>>>>> > >>>>>>>>>> Some context: > >>>>>>>>>> I am trying to submit a big run on Beagle using swift + > >>>>>>>>>> coasters. > >>>>>>>>>> However, a previous run is already underway on beagle. So, > >>>>>>>>>> there > >>>>>>>>>> are > >>>>>>>>>> two difficulties running a new run from its login node: > >>>>>>>>>> > >>>>>>>>>> 1. Running another swift from the same jvm will result in > >>>>>>>>>> chaos > >>>>>>>>>> on > >>>>>>>>>> the > >>>>>>>>>> logs (As far as I know, please correct me if this is not > >>>>>>>>>> the > >>>>>>>>>> case > >>>>>>>>>> anymore) > >>>>>>>>>> > >>>>>>>>>> 2. Login node is already under load because of my running > >>>>>>>>>> previous > >>>>>>>>>> big > >>>>>>>>>> run. > >>>>>>>>>> > >>>>>>>>>> /context > >>>>>>>>>> > >>>>>>>>>> So, I am now trying to submit this big run from a remote > >>>>>>>>>> host > >>>>>>>>>> (bridled). I know this has been done on PADS using ssh:pbs, > >>>>>>>>>> provider > >>>>>>>>>> coaster. 
I tried the similar approach on a trial swift script but getting error.
> >>>>>>>>>>
> >>>>>>>>>> Following is the error message:
> >>>>>>>>>>
> >>>>>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
> >>>>>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
> >>>>>>>>>>
> >>>>>>>>>> RunID: 20110428-1002-c8rvqhe6
> >>>>>>>>>> Progress:
> >>>>>>>>>> The application "cat" is not available in your tc.data catalog
> >>>>>>>>>> Caused by: org.globus.cog.karajan.scheduler.NoSuchResourceException
> >>>>>>>>>> Final status: Failed:1
> >>>>>>>>>> The following errors have occurred:
> >>>>>>>>>> 1. The application "cat" is not available in your tc.data catalog
> >>>>>>>>>>
> >>>>>>>>>> Attached are my .swift, sites.xml and tc.data files.
> >>>>>>>>>>
> >>>>>>>>>> Could someone indicate if what I am doing is doable and if so how can I correctly configure my sites and tc setup.
> >>>>>>>>>>
> >>>>>>>>>> Thanks.
> >>>>>>>>>> Ketan
> >>>>>>>>>>
> >>>>>>>>>> _______________________________________________
> >>>>>>>>>> Swift-devel mailing list
> >>>>>>>>>> Swift-devel at ci.uchicago.edu
> >>>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Michael Wilde
> >>>>>>>>> Computation Institute, University of Chicago
> >>>>>>>>> Mathematics and Computer Science Division
> >>>>>>>>> Argonne National Laboratory

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From hategan at mcs.anl.gov Thu Apr 28 14:34:13 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 28 Apr 2011 12:34:13 -0700
Subject: [Swift-devel] ssh:pbs to beagle
In-Reply-To:
References: <1403032176.2307.1304013826306.JavaMail.root@zimbra.anl.gov> <1952F257-772E-4C97-8AE8-0A73F8E9C7E8@gmail.com> <1304018230.10193.0.camel@blabla2.none> <1304018794.10471.1.camel@blabla2.none>
Message-ID: <1304019253.10893.0.camel@blabla2.none>

You have a bunch of unknown CA errors in there.

You should have the CA public key for your proxy in ~/.globus/certificates (on both client and server machines).

Mihael

On Thu, 2011-04-28 at 14:29 -0500, Ketan Maheshwari wrote:
> They are here : /home/ketan/.globus/coasters
>
> On Apr 28, 2011, at 2:26 PM, Mihael Hategan wrote:
>
> > That EOFException doesn't make much sense.
> >
> > On beagle you should have something called coaster.log in
> > ~/.globus/coasters.
> >
> > Can post a link to that?
> >
> > Mihael
> >
> > On Thu, 2011-04-28 at 14:21 -0500, Ketan Maheshwari wrote:
> >> On Apr 28, 2011, at 2:17 PM, Mihael Hategan wrote:
> >>
> >>> What does your sites file look like?
> >>
> >> ** For beagle **
> >>
> >> CI-CCR000013
> >>
> >> 24:cray:pack
> >>
> >> 24
> >> 1000
> >> 1
> >> 1
> >> 1
> >>
> >> .63
> >> 10000
> >>
> >> $HOME/swift.workdir
> >>
> >> ** for communicado **
> >>
> >> .63
> >> 10000
> >>
> >> $HOME/swift.workdir
> >>
> >>>
> >>> On Thu, 2011-04-28 at 13:36 -0500, Ketan Maheshwari wrote:
> >>>> Ok, I got past CredentialException with grid-proxy-init, now I am facing this (note: I have turned on provider staging) :
> >>>>
> >>>> ========
> >>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
> >>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
> >>>>
> >>>> RunID: 20110428-1332-llaa031f
> >>>> Progress:
> >>>> Could not start connection handler
> >>>> java.io.EOFException
> >>>> at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61)
> >>>> at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65)
> >>>> at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127)
> >>>> at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147)
> >>>> at
org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177) > >>>> at org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30) > >>>> at org.globus.cog.karajan.workflow.service.channels.GSSChannel.(GSSChannel.java:47) > >>>> at org.globus.cog.karajan.workflow.service.ConnectionHandler.(ConnectionHandler.java:41) > >>>> at org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63) > >>>> at org.globus.net.BaseServer.run(BaseServer.java:247) > >>>> at java.lang.Thread.run(Thread.java:662) > >>>> Progress: Submitted:1 > >>>> Could not start connection handler > >>>> java.io.EOFException > >>>> at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61) > >>>> at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65) > >>>> at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127) > >>>> at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147) > >>>> at org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177) > >>>> at org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30) > >>>> at org.globus.cog.karajan.workflow.service.channels.GSSChannel.(GSSChannel.java:47) > >>>> at org.globus.cog.karajan.workflow.service.ConnectionHandler.(ConnectionHandler.java:41) > >>>> at org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63) > >>>> at org.globus.net.BaseServer.run(BaseServer.java:247) > >>>> at java.lang.Thread.run(Thread.java:662) > >>>> Progress: Submitted:1 > >>>> Exception in cat: > >>>> Arguments: [data.txt] > >>>> Host: beagle-remote-pbs-coasters-ssh > >>>> Directory: catsn-20110428-1332-llaa031f/jobs/b/cat-bxal1d9kTODO: outs > >>>> ---- > >>>> > >>>> Caused by: Could not submit job > >>>> Caused by: 
org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job > >>>> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not start coaster service > >>>> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Task ended before registration was received. > >>>> STDOUT: > >>>> STDERR: > >>>> Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 1 > >>>> Final status: Failed:1 > >>>> The following errors have occurred: > >>>> 1. Job failed with an exit code of 1 > >>>> > >>>> ======== > >>>> > >>>> > >>>> From bridled to communicado, I see the following error: > >>>> > >>>> ************** > >>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file coaster-local-ssh-communicado.xml catsn.swift -n=1 > >>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally) > >>>> > >>>> RunID: 20110428-1335-k685b2ye > >>>> Progress: > >>>> Progress: Submitted:1 > >>>> Progress: Active:1 > >>>> Exception in cat: > >>>> Arguments: [data.txt] > >>>> Host: communicado-ssh > >>>> Directory: catsn-20110428-1335-k685b2ye/jobs/c/cat-coip1d9kTODO: outs > >>>> ---- > >>>> > >>>> Caused by: Job failed with an exit code of 524 > >>>> Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 524 > >>>> Final status: Failed:1 > >>>> The following errors have occurred: > >>>> 1. Job failed with an exit code of 524 > >>>> > >>>> ************ > >>>> > >>>> > >>>> -- > >>>> Ketan > >>>> > >>>> > >>>> > >>>> > >>>> On Apr 28, 2011, at 1:03 PM, Michael Wilde wrote: > >>>> > >>>>> For now - create a proxy using grid-proxy-init on the swift execution machine. > >>>>> I think there is an option to set "no security" for this config but I cant recall where that is specified. Maybe swift.properties, I cant recall. 
> >>>>> > >>>>> - Mike > >>>>> > >>>>> ----- Original Message ----- > >>>>>> Hi, > >>>>>> > >>>>>> It looks better now. However, I am getting the following: > >>>>>> > >>>>>> ===== > >>>>>> > >>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file > >>>>>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1 > >>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified > >>>>>> locally) > >>>>>> > >>>>>> RunID: 20110428-1251-oi9theh8 > >>>>>> Progress: > >>>>>> Progress: Stage in:1 > >>>>>> Could not submit job > >>>>>> Caused by: > >>>>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > >>>>>> Could not submit job > >>>>>> Caused by: > >>>>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > >>>>>> Could not start coaster service > >>>>>> Caused by: > >>>>>> org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: > >>>>>> org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file > >>>>>> (/tmp/x509up_u2006) not found. > >>>>>> Caused by: org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy > >>>>>> file (/tmp/x509up_u2006) not found. > >>>>>> Failed to transfer wrapper log from > >>>>>> catsn-20110428-1251-oi9theh8/info/e on beagle-remote-pbs-coasters-ssh > >>>>>> > >>>>>> ===== > >>>>>> > >>>>>> How do I specify "-nosec" on automatic coasters? > >>>>>> > >>>>>> Ketan > >>>>>> > >>>>>> On Apr 28, 2011, at 12:00 PM, Michael Wilde wrote: > >>>>>> > >>>>>>> OK. Was there a cookbook on the ssh settings? Did you set up a > >>>>>>> $HOME/.ssh/auth.defaults per the user guide? > >>>>>>> > >>>>>>> Here is an auth.defaults example. 
Im not sure its 100% correct, but > >>>>>>> could serve as a base for you: > >>>>>>> > >>>>>>> xlogin1.pads.ci.uchicago.edu.type=password > >>>>>>> xlogin1.pads.ci.uchicago.edu.username=wilde > >>>>>>> > >>>>>>> login.pads.ci.uchicago.edu.type=key > >>>>>>> login.pads.ci.uchicago.edu.username=wilde > >>>>>>> login.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa > >>>>>>> login.pads.ci.uchicago.edu.passphrase=yourpassphrasehere # MAKE SURE > >>>>>>> mode=600!!! > >>>>>>> > >>>>>>> login1.pads.ci.uchicago.edu.type=key > >>>>>>> login1.pads.ci.uchicago.edu.username=wilde > >>>>>>> login1.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa > >>>>>>> login1.pads.ci.uchicago.edu.passphrase=yourpassphrasehere # MAKE > >>>>>>> SURE mode=600!!! > >>>>>>> > >>>>>>> login.mcs.anl.gov.type=key > >>>>>>> login.mcs.anl.gov.username=wilde > >>>>>>> login.mcs.anl.gov.key=/home/wilde/.ssh/swift_rsa > >>>>>>> login.mcs.anl.gov.passphrase=yourpassphrasehere # MAKE SURE > >>>>>>> mode=600!!! > >>>>>>> > >>>>>>> - Mike > >>>>>>> > >>>>>>> ----- Original Message ----- > >>>>>>>> It does look like an ssh problem. I am getting the same stderr and > >>>>>>>> log > >>>>>>>> messages on trying to communicate from Bridled to Communicado. > >>>>>>>> > >>>>>>>> Ketan > >>>>>>>> > >>>>>>>> On Apr 28, 2011, at 11:19 AM, Michael Wilde wrote: > >>>>>>>> > >>>>>>>>> Have you already run a simple hellow-world swift test from > >>>>>>>>> communicado to bridled to make sure you have ssh configured > >>>>>>>>> correctly? I would do that first. > >>>>>>>>> > >>>>>>>>> Im not sure if an ssh problem explains what you show below, or > >>>>>>>>> not. > >>>>>>>>> > >>>>>>>>> - Mike > >>>>>>>>> > >>>>>>>>> ----- Original Message ----- > >>>>>>>>>> Thanks, I made the change. 
However, now, I am getting the > >>>>>>>>>> following > >>>>>>>>>> on > >>>>>>>>>> my stderr > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> =========== > >>>>>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc > >>>>>>>>>> -sites.file > >>>>>>>>>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1 > >>>>>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog > >>>>>>>>>> modified > >>>>>>>>>> locally) > >>>>>>>>>> > >>>>>>>>>> RunID: 20110428-1022-n9s0k0e0 > >>>>>>>>>> Progress: > >>>>>>>>>> [ketan] > >>>>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>>>> [ketan] Progress: Initializing site shared directory:1 > >>>>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>>>> ======== > >>>>>>>>>> > >>>>>>>>>> And from the log it seems some network transmission has failed: > >>>>>>>>>> > >>>>>>>>>> 2011-04-28 10:22:45,261-0500 INFO TransportProtocolCommon Sending > >>>>>>>>>> SSH_MSG_SERVICE_REQUEST > >>>>>>>>>> 2011-04-28 10:22:45,264-0500 INFO TransportProtocolCommon > >>>>>>>>>> Received > >>>>>>>>>> SSH_MSG_SERVICE_ACCEPT > >>>>>>>>>> 2011-04-28 10:24:27,626-0500 INFO TransportProtocolCommon The > >>>>>>>>>> Transport Protocol thread failed > >>>>>>>>>> java.io.IOException: The socket is EOF > >>>>>>>>>> at > >>>>>>>>>> 
com.sshtools.j2ssh.transport.TransportProtocolInputStream.readBufferedData(TransportProtocolInputStream.java:183) > >>>>>>>>>> at > >>>>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolInputStream.readMessage(TransportProtocolInputStream.java:226) > >>>>>>>>>> at > >>>>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolCommon.processMessages(TransportProtocolCommon.java:1440) > >>>>>>>>>> at > >>>>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolCommon.startBinaryPacketProtocol(TransportProtocolCommon.java:1034) > >>>>>>>>>> at > >>>>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolCommon.run(TransportProtocolCommon.java:393) > >>>>>>>>>> at java.lang.Thread.run(Thread.java:662) > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Any clues? > >>>>>>>>>> Ketan > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> On Apr 28, 2011, at 10:20 AM, Michael Wilde wrote: > >>>>>>>>>> > >>>>>>>>>>> The pool name in your sites file is pads-remote-pbs-coasters-ssh > >>>>>>>>>>> but > >>>>>>>>>>> you used pbs in your tc.data. > >>>>>>>>>>> > >>>>>>>>>>> - Mike > >>>>>>>>>>> > >>>>>>>>>>> ----- Original Message ----- > >>>>>>>>>>>> Hello, > >>>>>>>>>>>> > >>>>>>>>>>>> Some context: > >>>>>>>>>>>> I am trying to submit a big run on Beagle using swift + > >>>>>>>>>>>> coasters. > >>>>>>>>>>>> However, a previous run is already underway on beagle. So, > >>>>>>>>>>>> there > >>>>>>>>>>>> are > >>>>>>>>>>>> two difficulties running a new run from its login node: > >>>>>>>>>>>> > >>>>>>>>>>>> 1. Running another swift from the same jvm will result in chaos > >>>>>>>>>>>> on > >>>>>>>>>>>> the > >>>>>>>>>>>> logs (As far as I know, please correct me if this is not the > >>>>>>>>>>>> case > >>>>>>>>>>>> anymore) > >>>>>>>>>>>> > >>>>>>>>>>>> 2. Login node is already under load because of my running > >>>>>>>>>>>> previous > >>>>>>>>>>>> big > >>>>>>>>>>>> run. 
> >>>>>>>>>>>> > >>>>>>>>>>>> /context > >>>>>>>>>>>> > >>>>>>>>>>>> So, I am now trying to submit this big run from a remote host > >>>>>>>>>>>> (bridled). I know this has been done on PADS using ssh:pbs, > >>>>>>>>>>>> provider > >>>>>>>>>>>> coaster. I tried the similar approach on a trial swift script > >>>>>>>>>>>> but > >>>>>>>>>>>> getting error. > >>>>>>>>>>>> > >>>>>>>>>>>> Following is the error message: > >>>>>>>>>>>> > >>>>>>>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc > >>>>>>>>>>>> -sites.file > >>>>>>>>>>>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1 > >>>>>>>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog > >>>>>>>>>>>> modified > >>>>>>>>>>>> locally) > >>>>>>>>>>>> > >>>>>>>>>>>> RunID: 20110428-1002-c8rvqhe6 > >>>>>>>>>>>> Progress: > >>>>>>>>>>>> The application "cat" is not available in your tc.data catalog > >>>>>>>>>>>> Caused by: > >>>>>>>>>>>> org.globus.cog.karajan.scheduler.NoSuchResourceException > >>>>>>>>>>>> Final status: Failed:1 > >>>>>>>>>>>> The following errors have occurred: > >>>>>>>>>>>> 1. The application "cat" is not available in your tc.data > >>>>>>>>>>>> catalog > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Attached are my .swift, sites.xml and tc.data files. > >>>>>>>>>>>> > >>>>>>>>>>>> Could someone indicate if what I am doing is doable and if so > >>>>>>>>>>>> how > >>>>>>>>>>>> can > >>>>>>>>>>>> I correctly configure my sites and tc setup. > >>>>>>>>>>>> > >>>>>>>>>>>> Thanks. 
> >>>>>>>>>>>> Ketan
> >>>>>>>>>>>>
> >>>>>>>>>>>> _______________________________________________
> >>>>>>>>>>>> Swift-devel mailing list
> >>>>>>>>>>>> Swift-devel at ci.uchicago.edu
> >>>>>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Michael Wilde
> >>>>>>>>>>> Computation Institute, University of Chicago
> >>>>>>>>>>> Mathematics and Computer Science Division
> >>>>>>>>>>> Argonne National Laboratory
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Michael Wilde
> >>>>>>>>> Computation Institute, University of Chicago
> >>>>>>>>> Mathematics and Computer Science Division
> >>>>>>>>> Argonne National Laboratory
> >>>>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Michael Wilde
> >>>>>>> Computation Institute, University of Chicago
> >>>>>>> Mathematics and Computer Science Division
> >>>>>>> Argonne National Laboratory
> >>>>>>>
> >>>>>
> >>>>> --
> >>>>> Michael Wilde
> >>>>> Computation Institute, University of Chicago
> >>>>> Mathematics and Computer Science Division
> >>>>> Argonne National Laboratory
> >>>>>
> >>>>
> >>>> _______________________________________________
> >>>> Swift-devel mailing list
> >>>> Swift-devel at ci.uchicago.edu
> >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >>>
> >>
> >
>

From wilde at mcs.anl.gov Thu Apr 28 14:49:52 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 28 Apr 2011 14:49:52 -0500 (CDT)
Subject: [Swift-devel] ssh:pbs to beagle
In-Reply-To: <1304019253.10893.0.camel@blabla2.none>
Message-ID: <752184432.4002.1304020192687.JavaMail.root@zimbra.anl.gov>

Copying these might work for you, Ketan:

com$ env | grep 509
X509_CERT_DIR=/home/wilde/TRUSTEDCA
X509_CADIR=/home/wilde/TRUSTEDCA
com$

----- Original Message -----
> You have a bunch of unknown CA errors in there.
> > You should have the CA public key for your proxy in > ~/.globus/certificates (on both client and server machines). > > Mihael > > On Thu, 2011-04-28 at 14:29 -0500, Ketan Maheshwari wrote: > > They are here : /home/ketan/.globus/coasters > > > > > > On Apr 28, 2011, at 2:26 PM, Mihael Hategan wrote: > > > > > That EOFException doesn't make much sense. > > > > > > On beagle you should have something called coaster.log in > > > ~/.globus/coasters. > > > > > > Can post a link to that? > > > > > > Mihael > > > > > > On Thu, 2011-04-28 at 14:21 -0500, Ketan Maheshwari wrote: > > >> On Apr 28, 2011, at 2:17 PM, Mihael Hategan wrote: > > >> > > >>> What does your sites file look like? > > >> > > >> ** For beagle ** > > >> > > >> > > >> > > >> > > >> > >> url="login1.beagle.ci.uchicago.edu" jobmanager="ssh:pbs"/> > > >> > >> key="project">CI-CCR000013 > > >> > > >> 24:cray:pack > > >> > > >> 24 > > >> 1000 > > >> 1 > > >> 1 > > >> 1 > > >> > > >> .63 > > >> > >> key="initialScore">10000 > > >> > > >> > >> /> > > >> $HOME/swift.workdir > > >> > > >> > > >> > > >> > > >> > > >> ** for communicado ** > > >> > > >> > > >> > > >> > > >> > >> url="communicado.ci.uchicago.edu" jobmanager="ssh:ssh"/> > > >> > > >> .63 > > >> > >> key="initialScore">10000 > > >> > > >> > >> /> > > >> $HOME/swift.workdir > > >> > > >> > > >> > > >> > > >> > > >>> > > >>> On Thu, 2011-04-28 at 13:36 -0500, Ketan Maheshwari wrote: > > >>>> Ok, I got past CredentialException with grid-proxy-init, now I > > >>>> am facing this (note: I have turned on provider staging) : > > >>>> > > >>>> ======== > > >>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc > > >>>> -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1 > > >>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog > > >>>> modified locally) > > >>>> > > >>>> RunID: 20110428-1332-llaa031f > > >>>> Progress: > > >>>> Could not start connection handler > > >>>> java.io.EOFException > > >>>> at > > >>>> 
org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61) > > >>>> at > > >>>> org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65) > > >>>> at > > >>>> org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127) > > >>>> at > > >>>> org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147) > > >>>> at > > >>>> org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177) > > >>>> at > > >>>> org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30) > > >>>> at > > >>>> org.globus.cog.karajan.workflow.service.channels.GSSChannel.(GSSChannel.java:47) > > >>>> at > > >>>> org.globus.cog.karajan.workflow.service.ConnectionHandler.(ConnectionHandler.java:41) > > >>>> at > > >>>> org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63) > > >>>> at org.globus.net.BaseServer.run(BaseServer.java:247) > > >>>> at java.lang.Thread.run(Thread.java:662) > > >>>> Progress: Submitted:1 > > >>>> Could not start connection handler > > >>>> java.io.EOFException > > >>>> at > > >>>> org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61) > > >>>> at > > >>>> org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65) > > >>>> at > > >>>> org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127) > > >>>> at > > >>>> org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147) > > >>>> at > > >>>> org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177) > > >>>> at > > >>>> org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30) > > >>>> at > > >>>> org.globus.cog.karajan.workflow.service.channels.GSSChannel.(GSSChannel.java:47) > > >>>> at > > >>>> org.globus.cog.karajan.workflow.service.ConnectionHandler.(ConnectionHandler.java:41) > > >>>> at > > >>>> 
org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63) > > >>>> at org.globus.net.BaseServer.run(BaseServer.java:247) > > >>>> at java.lang.Thread.run(Thread.java:662) > > >>>> Progress: Submitted:1 > > >>>> Exception in cat: > > >>>> Arguments: [data.txt] > > >>>> Host: beagle-remote-pbs-coasters-ssh > > >>>> Directory: > > >>>> catsn-20110428-1332-llaa031f/jobs/b/cat-bxal1d9kTODO: outs > > >>>> ---- > > >>>> > > >>>> Caused by: Could not submit job > > >>>> Caused by: > > >>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > > >>>> Could not submit job > > >>>> Caused by: > > >>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > > >>>> Could not start coaster service > > >>>> Caused by: > > >>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > > >>>> Task ended before registration was received. > > >>>> STDOUT: > > >>>> STDERR: > > >>>> Caused by: > > >>>> org.globus.cog.abstraction.impl.common.execution.JobException: > > >>>> Job failed with an exit code of 1 > > >>>> Final status: Failed:1 > > >>>> The following errors have occurred: > > >>>> 1. 
Job failed with an exit code of 1 > > >>>> > > >>>> ======== > > >>>> > > >>>> > > >>>> From bridled to communicado, I see the following error: > > >>>> > > >>>> ************** > > >>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc > > >>>> -sites.file coaster-local-ssh-communicado.xml catsn.swift -n=1 > > >>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog > > >>>> modified locally) > > >>>> > > >>>> RunID: 20110428-1335-k685b2ye > > >>>> Progress: > > >>>> Progress: Submitted:1 > > >>>> Progress: Active:1 > > >>>> Exception in cat: > > >>>> Arguments: [data.txt] > > >>>> Host: communicado-ssh > > >>>> Directory: > > >>>> catsn-20110428-1335-k685b2ye/jobs/c/cat-coip1d9kTODO: outs > > >>>> ---- > > >>>> > > >>>> Caused by: Job failed with an exit code of 524 > > >>>> Caused by: > > >>>> org.globus.cog.abstraction.impl.common.execution.JobException: > > >>>> Job failed with an exit code of 524 > > >>>> Final status: Failed:1 > > >>>> The following errors have occurred: > > >>>> 1. Job failed with an exit code of 524 > > >>>> > > >>>> ************ > > >>>> > > >>>> > > >>>> -- > > >>>> Ketan > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> On Apr 28, 2011, at 1:03 PM, Michael Wilde wrote: > > >>>> > > >>>>> For now - create a proxy using grid-proxy-init on the swift > > >>>>> execution machine. > > >>>>> I think there is an option to set "no security" for this > > >>>>> config but I cant recall where that is specified. Maybe > > >>>>> swift.properties, I cant recall. > > >>>>> > > >>>>> - Mike > > >>>>> > > >>>>> ----- Original Message ----- > > >>>>>> Hi, > > >>>>>> > > >>>>>> It looks better now. 
However, I am getting the following: > > >>>>>> > > >>>>>> ===== > > >>>>>> > > >>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc > > >>>>>> -sites.file > > >>>>>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1 > > >>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog > > >>>>>> modified > > >>>>>> locally) > > >>>>>> > > >>>>>> RunID: 20110428-1251-oi9theh8 > > >>>>>> Progress: > > >>>>>> Progress: Stage in:1 > > >>>>>> Could not submit job > > >>>>>> Caused by: > > >>>>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > > >>>>>> Could not submit job > > >>>>>> Caused by: > > >>>>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > > >>>>>> Could not start coaster service > > >>>>>> Caused by: > > >>>>>> org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: > > >>>>>> org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy > > >>>>>> file > > >>>>>> (/tmp/x509up_u2006) not found. > > >>>>>> Caused by: org.globus.gsi.GlobusCredentialException: > > >>>>>> [JGLOBUS-5] Proxy > > >>>>>> file (/tmp/x509up_u2006) not found. > > >>>>>> Failed to transfer wrapper log from > > >>>>>> catsn-20110428-1251-oi9theh8/info/e on > > >>>>>> beagle-remote-pbs-coasters-ssh > > >>>>>> > > >>>>>> ===== > > >>>>>> > > >>>>>> How do I specify "-nosec" on automatic coasters? > > >>>>>> > > >>>>>> Ketan > > >>>>>> > > >>>>>> On Apr 28, 2011, at 12:00 PM, Michael Wilde wrote: > > >>>>>> > > >>>>>>> OK. Was there a cookbook on the ssh settings? Did you set up > > >>>>>>> a > > >>>>>>> $HOME/.ssh/auth.defaults per the user guide? > > >>>>>>> > > >>>>>>> Here is an auth.defaults example. 
Im not sure its 100% > > >>>>>>> correct, but > > >>>>>>> could serve as a base for you: > > >>>>>>> > > >>>>>>> xlogin1.pads.ci.uchicago.edu.type=password > > >>>>>>> xlogin1.pads.ci.uchicago.edu.username=wilde > > >>>>>>> > > >>>>>>> login.pads.ci.uchicago.edu.type=key > > >>>>>>> login.pads.ci.uchicago.edu.username=wilde > > >>>>>>> login.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa > > >>>>>>> login.pads.ci.uchicago.edu.passphrase=yourpassphrasehere # > > >>>>>>> MAKE SURE > > >>>>>>> mode=600!!! > > >>>>>>> > > >>>>>>> login1.pads.ci.uchicago.edu.type=key > > >>>>>>> login1.pads.ci.uchicago.edu.username=wilde > > >>>>>>> login1.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa > > >>>>>>> login1.pads.ci.uchicago.edu.passphrase=yourpassphrasehere # > > >>>>>>> MAKE > > >>>>>>> SURE mode=600!!! > > >>>>>>> > > >>>>>>> login.mcs.anl.gov.type=key > > >>>>>>> login.mcs.anl.gov.username=wilde > > >>>>>>> login.mcs.anl.gov.key=/home/wilde/.ssh/swift_rsa > > >>>>>>> login.mcs.anl.gov.passphrase=yourpassphrasehere # MAKE SURE > > >>>>>>> mode=600!!! > > >>>>>>> > > >>>>>>> - Mike > > >>>>>>> > > >>>>>>> ----- Original Message ----- > > >>>>>>>> It does look like an ssh problem. I am getting the same > > >>>>>>>> stderr and > > >>>>>>>> log > > >>>>>>>> messages on trying to communicate from Bridled to > > >>>>>>>> Communicado. > > >>>>>>>> > > >>>>>>>> Ketan > > >>>>>>>> > > >>>>>>>> On Apr 28, 2011, at 11:19 AM, Michael Wilde wrote: > > >>>>>>>> > > >>>>>>>>> Have you already run a simple hellow-world swift test from > > >>>>>>>>> communicado to bridled to make sure you have ssh > > >>>>>>>>> configured > > >>>>>>>>> correctly? I would do that first. > > >>>>>>>>> > > >>>>>>>>> Im not sure if an ssh problem explains what you show > > >>>>>>>>> below, or > > >>>>>>>>> not. > > >>>>>>>>> > > >>>>>>>>> - Mike > > >>>>>>>>> > > >>>>>>>>> ----- Original Message ----- > > >>>>>>>>>> Thanks, I made the change. 
However, now, I am getting the > > >>>>>>>>>> following > > >>>>>>>>>> on > > >>>>>>>>>> my stderr > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> =========== > > >>>>>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc > > >>>>>>>>>> -sites.file > > >>>>>>>>>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1 > > >>>>>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 > > >>>>>>>>>> (cog > > >>>>>>>>>> modified > > >>>>>>>>>> locally) > > >>>>>>>>>> > > >>>>>>>>>> RunID: 20110428-1022-n9s0k0e0 > > >>>>>>>>>> Progress: > > >>>>>>>>>> [ketan] > > >>>>>>>>>> Progress: Initializing site shared directory:1 > > >>>>>>>>>> [ketan] Progress: Initializing site shared directory:1 > > >>>>>>>>>> Progress: Initializing site shared directory:1 > > >>>>>>>>>> Progress: Initializing site shared directory:1 > > >>>>>>>>>> Progress: Initializing site shared directory:1 > > >>>>>>>>>> Progress: Initializing site shared directory:1 > > >>>>>>>>>> Progress: Initializing site shared directory:1 > > >>>>>>>>>> Progress: Initializing site shared directory:1 > > >>>>>>>>>> Progress: Initializing site shared directory:1 > > >>>>>>>>>> Progress: Initializing site shared directory:1 > > >>>>>>>>>> Progress: Initializing site shared directory:1 > > >>>>>>>>>> Progress: Initializing site shared directory:1 > > >>>>>>>>>> Progress: Initializing site shared directory:1 > > >>>>>>>>>> Progress: Initializing site shared directory:1 > > >>>>>>>>>> ======== > > >>>>>>>>>> > > >>>>>>>>>> And from the log it seems some network transmission has > > >>>>>>>>>> failed: > > >>>>>>>>>> > > >>>>>>>>>> 2011-04-28 10:22:45,261-0500 INFO TransportProtocolCommon > > >>>>>>>>>> Sending > > >>>>>>>>>> SSH_MSG_SERVICE_REQUEST > > >>>>>>>>>> 2011-04-28 10:22:45,264-0500 INFO TransportProtocolCommon > > >>>>>>>>>> Received > > >>>>>>>>>> SSH_MSG_SERVICE_ACCEPT > > >>>>>>>>>> 2011-04-28 10:24:27,626-0500 INFO TransportProtocolCommon > > >>>>>>>>>> The > > >>>>>>>>>> Transport Protocol thread failed 
> > >>>>>>>>>> java.io.IOException: The socket is EOF > > >>>>>>>>>> at > > >>>>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolInputStream.readBufferedData(TransportProtocolInputStream.java:183) > > >>>>>>>>>> at > > >>>>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolInputStream.readMessage(TransportProtocolInputStream.java:226) > > >>>>>>>>>> at > > >>>>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolCommon.processMessages(TransportProtocolCommon.java:1440) > > >>>>>>>>>> at > > >>>>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolCommon.startBinaryPacketProtocol(TransportProtocolCommon.java:1034) > > >>>>>>>>>> at > > >>>>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolCommon.run(TransportProtocolCommon.java:393) > > >>>>>>>>>> at java.lang.Thread.run(Thread.java:662) > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> Any clues? > > >>>>>>>>>> Ketan > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> On Apr 28, 2011, at 10:20 AM, Michael Wilde wrote: > > >>>>>>>>>> > > >>>>>>>>>>> The pool name in your sites file is > > >>>>>>>>>>> pads-remote-pbs-coasters-ssh > > >>>>>>>>>>> but > > >>>>>>>>>>> you used pbs in your tc.data. > > >>>>>>>>>>> > > >>>>>>>>>>> - Mike > > >>>>>>>>>>> > > >>>>>>>>>>> ----- Original Message ----- > > >>>>>>>>>>>> Hello, > > >>>>>>>>>>>> > > >>>>>>>>>>>> Some context: > > >>>>>>>>>>>> I am trying to submit a big run on Beagle using swift + > > >>>>>>>>>>>> coasters. > > >>>>>>>>>>>> However, a previous run is already underway on beagle. > > >>>>>>>>>>>> So, > > >>>>>>>>>>>> there > > >>>>>>>>>>>> are > > >>>>>>>>>>>> two difficulties running a new run from its login node: > > >>>>>>>>>>>> > > >>>>>>>>>>>> 1. Running another swift from the same jvm will result > > >>>>>>>>>>>> in chaos > > >>>>>>>>>>>> on > > >>>>>>>>>>>> the > > >>>>>>>>>>>> logs (As far as I know, please correct me if this is > > >>>>>>>>>>>> not the > > >>>>>>>>>>>> case > > >>>>>>>>>>>> anymore) > > >>>>>>>>>>>> > > >>>>>>>>>>>> 2. 
Login node is already under load because of my > > >>>>>>>>>>>> running > > >>>>>>>>>>>> previous > > >>>>>>>>>>>> big > > >>>>>>>>>>>> run. > > >>>>>>>>>>>> > > >>>>>>>>>>>> /context > > >>>>>>>>>>>> > > >>>>>>>>>>>> So, I am now trying to submit this big run from a > > >>>>>>>>>>>> remote host > > >>>>>>>>>>>> (bridled). I know this has been done on PADS using > > >>>>>>>>>>>> ssh:pbs, > > >>>>>>>>>>>> provider > > >>>>>>>>>>>> coaster. I tried the similar approach on a trial swift > > >>>>>>>>>>>> script > > >>>>>>>>>>>> but > > >>>>>>>>>>>> getting error. > > >>>>>>>>>>>> > > >>>>>>>>>>>> Following is the error message: > > >>>>>>>>>>>> > > >>>>>>>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file > > >>>>>>>>>>>> tc > > >>>>>>>>>>>> -sites.file > > >>>>>>>>>>>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1 > > >>>>>>>>>>>> Swift svn swift-r4252 (swift modified locally) > > >>>>>>>>>>>> cog-r3088 (cog > > >>>>>>>>>>>> modified > > >>>>>>>>>>>> locally) > > >>>>>>>>>>>> > > >>>>>>>>>>>> RunID: 20110428-1002-c8rvqhe6 > > >>>>>>>>>>>> Progress: > > >>>>>>>>>>>> The application "cat" is not available in your tc.data > > >>>>>>>>>>>> catalog > > >>>>>>>>>>>> Caused by: > > >>>>>>>>>>>> org.globus.cog.karajan.scheduler.NoSuchResourceException > > >>>>>>>>>>>> Final status: Failed:1 > > >>>>>>>>>>>> The following errors have occurred: > > >>>>>>>>>>>> 1. The application "cat" is not available in your > > >>>>>>>>>>>> tc.data > > >>>>>>>>>>>> catalog > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>>> Attached are my .swift, sites.xml and tc.data files. > > >>>>>>>>>>>> > > >>>>>>>>>>>> Could someone indicate if what I am doing is doable and > > >>>>>>>>>>>> if so > > >>>>>>>>>>>> how > > >>>>>>>>>>>> can > > >>>>>>>>>>>> I correctly configure my sites and tc setup. > > >>>>>>>>>>>> > > >>>>>>>>>>>> Thanks. 
> > >>>>>>>>>>>> Ketan
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> _______________________________________________
> > >>>>>>>>>>>> Swift-devel mailing list
> > >>>>>>>>>>>> Swift-devel at ci.uchicago.edu
> > >>>>>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > >>>>>>>>>>>
> > >>>>>>>>>>> --
> > >>>>>>>>>>> Michael Wilde
> > >>>>>>>>>>> Computation Institute, University of Chicago
> > >>>>>>>>>>> Mathematics and Computer Science Division
> > >>>>>>>>>>> Argonne National Laboratory
> > >>>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> --
> > >>>>>>>>> Michael Wilde
> > >>>>>>>>> Computation Institute, University of Chicago
> > >>>>>>>>> Mathematics and Computer Science Division
> > >>>>>>>>> Argonne National Laboratory
> > >>>>>>>>>
> > >>>>>>>
> > >>>>>>> --
> > >>>>>>> Michael Wilde
> > >>>>>>> Computation Institute, University of Chicago
> > >>>>>>> Mathematics and Computer Science Division
> > >>>>>>> Argonne National Laboratory
> > >>>>>>>
> > >>>>>
> > >>>>> --
> > >>>>> Michael Wilde
> > >>>>> Computation Institute, University of Chicago
> > >>>>> Mathematics and Computer Science Division
> > >>>>> Argonne National Laboratory
> > >>>>>
> > >>>>
> > >>>> _______________________________________________
> > >>>> Swift-devel mailing list
> > >>>> Swift-devel at ci.uchicago.edu
> > >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > >>>
> > >>
> > >
> >
>

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From ketancmaheshwari at gmail.com Thu Apr 28 15:00:04 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Thu, 28 Apr 2011 15:00:04 -0500
Subject: [Swift-devel] ssh:pbs to beagle
In-Reply-To: <752184432.4002.1304020192687.JavaMail.root@zimbra.anl.gov>
References:
<752184432.4002.1304020192687.JavaMail.root@zimbra.anl.gov>
Message-ID: <7095F97B-6781-49E5-BD68-4F23C9D81487@gmail.com>

The EOFException persists. However, going from bridled to communicado I get this one:

[ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file coaster-local-ssh-communicado.xml catsn.swift -n=1
Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)

RunID: 20110428-1457-r7bzx1ha
Progress:
Progress: Active:1
Exception in cat:
Arguments: [data.txt]
Host: communicado-ssh
Directory: catsn-20110428-1457-r7bzx1ha/jobs/t/cat-tlf05d9kTODO: outs
----

Caused by: Job failed with an exit code of 524
Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 524
Final status: Failed:1
The following errors have occurred:
1. Job failed with an exit code of 524

Any clue what could be causing it?

Ketan

On Apr 28, 2011, at 2:49 PM, Michael Wilde wrote:

> Copying these might work for you, Ketan:
>
> com$ env | grep 509
> X509_CERT_DIR=/home/wilde/TRUSTEDCA
> X509_CADIR=/home/wilde/TRUSTEDCA
> com$
>
> ----- Original Message -----
>> You have a bunch of unknown CA errors in there.
>>
>> You should have the CA public key for your proxy in
>> ~/.globus/certificates (on both client and server machines).
>>
>> Mihael
>>
>> On Thu, 2011-04-28 at 14:29 -0500, Ketan Maheshwari wrote:
>>> They are here : /home/ketan/.globus/coasters
>>>
>>> On Apr 28, 2011, at 2:26 PM, Mihael Hategan wrote:
>>>
>>>> That EOFException doesn't make much sense.
>>>>
>>>> On beagle you should have something called coaster.log in
>>>> ~/.globus/coasters.
>>>>
>>>> Can post a link to that?
>>>>
>>>> Mihael
>>>>
>>>> On Thu, 2011-04-28 at 14:21 -0500, Ketan Maheshwari wrote:
>>>>> On Apr 28, 2011, at 2:17 PM, Mihael Hategan wrote:
>>>>>
>>>>>> What does your sites file look like?
>>>>>>>>>>>>>>> Ketan
>

From wilde at mcs.anl.gov Thu Apr 28 15:08:28 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 28 Apr 2011 15:08:28 -0500 (CDT)
Subject: [Swift-devel] ssh:pbs to beagle
In-Reply-To: <7095F97B-6781-49E5-BD68-4F23C9D81487@gmail.com>
Message-ID: <1097569199.4180.1304021308603.JavaMail.root@zimbra.anl.gov>

524 is most likely an error exit code generated by worker.pl. You can typically find the reason by searching the worker.pl source for that number.

- Mike

----- Original Message -----
> The EOFException persists.
>
> However, on bridled-communicado I get this one:
>
> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file coaster-local-ssh-communicado.xml catsn.swift -n=1
> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
>
> RunID: 20110428-1457-r7bzx1ha
> Progress:
> Progress: Active:1
> Exception in cat:
> Arguments: [data.txt]
> Host: communicado-ssh
> Directory: catsn-20110428-1457-r7bzx1ha/jobs/t/cat-tlf05d9kTODO: outs
> ----
>
> Caused by: Job failed with an exit code of 524
> Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 524
> Final status: Failed:1
> The following errors have occurred:
> 1. Job failed with an exit code of 524
>
> Any clue what it could be due to?
>
> Ketan
>
> On Apr 28, 2011, at 2:49 PM, Michael Wilde wrote:
>
> > Copying these might work for you, Ketan:
> >
> > com$ env | grep 509
> > X509_CERT_DIR=/home/wilde/TRUSTEDCA
> > X509_CADIR=/home/wilde/TRUSTEDCA
> > com$
> >
> > ----- Original Message -----
> >> You have a bunch of unknown CA errors in there.
> >>
> >> You should have the CA public key for your proxy in
> >> ~/.globus/certificates (on both client and server machines).
> >>
> >> Mihael
> >>
> >> On Thu, 2011-04-28 at 14:29 -0500, Ketan Maheshwari wrote:
> >>> They are here: /home/ketan/.globus/coasters
> >>>
> >>> On Apr 28, 2011, at 2:26 PM, Mihael Hategan wrote:
> >>>
> >>>> That EOFException doesn't make much sense.
> >>>>
> >>>> On beagle you should have something called coaster.log in
> >>>> ~/.globus/coasters.
> >>>>
> >>>> Can you post a link to that?
> >>>> > >>>> Mihael > >>>> > >>>> On Thu, 2011-04-28 at 14:21 -0500, Ketan Maheshwari wrote: > >>>>> On Apr 28, 2011, at 2:17 PM, Mihael Hategan wrote: > >>>>> > >>>>>> What does your sites file look like? > >>>>> > >>>>> ** For beagle ** > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> >>>>> url="login1.beagle.ci.uchicago.edu" jobmanager="ssh:pbs"/> > >>>>> >>>>> key="project">CI-CCR000013 > >>>>> > >>>>> 24:cray:pack > >>>>> > >>>>> 24 > >>>>> 1000 > >>>>> 1 > >>>>> 1 > >>>>> 1 > >>>>> > >>>>> .63 > >>>>> >>>>> key="initialScore">10000 > >>>>> > >>>>> >>>>> /> > >>>>> $HOME/swift.workdir > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> ** for communicado ** > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> >>>>> url="communicado.ci.uchicago.edu" jobmanager="ssh:ssh"/> > >>>>> > >>>>> .63 > >>>>> >>>>> key="initialScore">10000 > >>>>> > >>>>> >>>>> /> > >>>>> $HOME/swift.workdir > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>>> > >>>>>> On Thu, 2011-04-28 at 13:36 -0500, Ketan Maheshwari wrote: > >>>>>>> Ok, I got past CredentialException with grid-proxy-init, now I > >>>>>>> am facing this (note: I have turned on provider staging) : > >>>>>>> > >>>>>>> ======== > >>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc > >>>>>>> -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1 > >>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog > >>>>>>> modified locally) > >>>>>>> > >>>>>>> RunID: 20110428-1332-llaa031f > >>>>>>> Progress: > >>>>>>> Could not start connection handler > >>>>>>> java.io.EOFException > >>>>>>> at > >>>>>>> org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61) > >>>>>>> at > >>>>>>> org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65) > >>>>>>> at > >>>>>>> org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127) > >>>>>>> at > >>>>>>> org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147) > >>>>>>> at > >>>>>>> 
org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177) > >>>>>>> at > >>>>>>> org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30) > >>>>>>> at > >>>>>>> org.globus.cog.karajan.workflow.service.channels.GSSChannel.(GSSChannel.java:47) > >>>>>>> at > >>>>>>> org.globus.cog.karajan.workflow.service.ConnectionHandler.(ConnectionHandler.java:41) > >>>>>>> at > >>>>>>> org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63) > >>>>>>> at org.globus.net.BaseServer.run(BaseServer.java:247) > >>>>>>> at java.lang.Thread.run(Thread.java:662) > >>>>>>> Progress: Submitted:1 > >>>>>>> Could not start connection handler > >>>>>>> java.io.EOFException > >>>>>>> at > >>>>>>> org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61) > >>>>>>> at > >>>>>>> org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65) > >>>>>>> at > >>>>>>> org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127) > >>>>>>> at > >>>>>>> org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147) > >>>>>>> at > >>>>>>> org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177) > >>>>>>> at > >>>>>>> org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30) > >>>>>>> at > >>>>>>> org.globus.cog.karajan.workflow.service.channels.GSSChannel.(GSSChannel.java:47) > >>>>>>> at > >>>>>>> org.globus.cog.karajan.workflow.service.ConnectionHandler.(ConnectionHandler.java:41) > >>>>>>> at > >>>>>>> org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63) > >>>>>>> at org.globus.net.BaseServer.run(BaseServer.java:247) > >>>>>>> at java.lang.Thread.run(Thread.java:662) > >>>>>>> Progress: Submitted:1 > >>>>>>> Exception in cat: > >>>>>>> Arguments: [data.txt] > >>>>>>> Host: beagle-remote-pbs-coasters-ssh > >>>>>>> 
Directory: > >>>>>>> catsn-20110428-1332-llaa031f/jobs/b/cat-bxal1d9kTODO: outs > >>>>>>> ---- > >>>>>>> > >>>>>>> Caused by: Could not submit job > >>>>>>> Caused by: > >>>>>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > >>>>>>> Could not submit job > >>>>>>> Caused by: > >>>>>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > >>>>>>> Could not start coaster service > >>>>>>> Caused by: > >>>>>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > >>>>>>> Task ended before registration was received. > >>>>>>> STDOUT: > >>>>>>> STDERR: > >>>>>>> Caused by: > >>>>>>> org.globus.cog.abstraction.impl.common.execution.JobException: > >>>>>>> Job failed with an exit code of 1 > >>>>>>> Final status: Failed:1 > >>>>>>> The following errors have occurred: > >>>>>>> 1. Job failed with an exit code of 1 > >>>>>>> > >>>>>>> ======== > >>>>>>> > >>>>>>> > >>>>>>> From bridled to communicado, I see the following error: > >>>>>>> > >>>>>>> ************** > >>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc > >>>>>>> -sites.file coaster-local-ssh-communicado.xml catsn.swift -n=1 > >>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog > >>>>>>> modified locally) > >>>>>>> > >>>>>>> RunID: 20110428-1335-k685b2ye > >>>>>>> Progress: > >>>>>>> Progress: Submitted:1 > >>>>>>> Progress: Active:1 > >>>>>>> Exception in cat: > >>>>>>> Arguments: [data.txt] > >>>>>>> Host: communicado-ssh > >>>>>>> Directory: > >>>>>>> catsn-20110428-1335-k685b2ye/jobs/c/cat-coip1d9kTODO: outs > >>>>>>> ---- > >>>>>>> > >>>>>>> Caused by: Job failed with an exit code of 524 > >>>>>>> Caused by: > >>>>>>> org.globus.cog.abstraction.impl.common.execution.JobException: > >>>>>>> Job failed with an exit code of 524 > >>>>>>> Final status: Failed:1 > >>>>>>> The following errors have occurred: > >>>>>>> 1. 
Job failed with an exit code of 524 > >>>>>>> > >>>>>>> ************ > >>>>>>> > >>>>>>> > >>>>>>> -- > >>>>>>> Ketan > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> On Apr 28, 2011, at 1:03 PM, Michael Wilde wrote: > >>>>>>> > >>>>>>>> For now - create a proxy using grid-proxy-init on the swift > >>>>>>>> execution machine. > >>>>>>>> I think there is an option to set "no security" for this > >>>>>>>> config but I cant recall where that is specified. Maybe > >>>>>>>> swift.properties, I cant recall. > >>>>>>>> > >>>>>>>> - Mike > >>>>>>>> > >>>>>>>> ----- Original Message ----- > >>>>>>>>> Hi, > >>>>>>>>> > >>>>>>>>> It looks better now. However, I am getting the following: > >>>>>>>>> > >>>>>>>>> ===== > >>>>>>>>> > >>>>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc > >>>>>>>>> -sites.file > >>>>>>>>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1 > >>>>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 > >>>>>>>>> (cog > >>>>>>>>> modified > >>>>>>>>> locally) > >>>>>>>>> > >>>>>>>>> RunID: 20110428-1251-oi9theh8 > >>>>>>>>> Progress: > >>>>>>>>> Progress: Stage in:1 > >>>>>>>>> Could not submit job > >>>>>>>>> Caused by: > >>>>>>>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > >>>>>>>>> Could not submit job > >>>>>>>>> Caused by: > >>>>>>>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > >>>>>>>>> Could not start coaster service > >>>>>>>>> Caused by: > >>>>>>>>> org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: > >>>>>>>>> org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy > >>>>>>>>> file > >>>>>>>>> (/tmp/x509up_u2006) not found. > >>>>>>>>> Caused by: org.globus.gsi.GlobusCredentialException: > >>>>>>>>> [JGLOBUS-5] Proxy > >>>>>>>>> file (/tmp/x509up_u2006) not found. 
> >>>>>>>>> Failed to transfer wrapper log from > >>>>>>>>> catsn-20110428-1251-oi9theh8/info/e on > >>>>>>>>> beagle-remote-pbs-coasters-ssh > >>>>>>>>> > >>>>>>>>> ===== > >>>>>>>>> > >>>>>>>>> How do I specify "-nosec" on automatic coasters? > >>>>>>>>> > >>>>>>>>> Ketan > >>>>>>>>> > >>>>>>>>> On Apr 28, 2011, at 12:00 PM, Michael Wilde wrote: > >>>>>>>>> > >>>>>>>>>> OK. Was there a cookbook on the ssh settings? Did you set > >>>>>>>>>> up > >>>>>>>>>> a > >>>>>>>>>> $HOME/.ssh/auth.defaults per the user guide? > >>>>>>>>>> > >>>>>>>>>> Here is an auth.defaults example. Im not sure its 100% > >>>>>>>>>> correct, but > >>>>>>>>>> could serve as a base for you: > >>>>>>>>>> > >>>>>>>>>> xlogin1.pads.ci.uchicago.edu.type=password > >>>>>>>>>> xlogin1.pads.ci.uchicago.edu.username=wilde > >>>>>>>>>> > >>>>>>>>>> login.pads.ci.uchicago.edu.type=key > >>>>>>>>>> login.pads.ci.uchicago.edu.username=wilde > >>>>>>>>>> login.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa > >>>>>>>>>> login.pads.ci.uchicago.edu.passphrase=yourpassphrasehere # > >>>>>>>>>> MAKE SURE > >>>>>>>>>> mode=600!!! > >>>>>>>>>> > >>>>>>>>>> login1.pads.ci.uchicago.edu.type=key > >>>>>>>>>> login1.pads.ci.uchicago.edu.username=wilde > >>>>>>>>>> login1.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa > >>>>>>>>>> login1.pads.ci.uchicago.edu.passphrase=yourpassphrasehere # > >>>>>>>>>> MAKE > >>>>>>>>>> SURE mode=600!!! > >>>>>>>>>> > >>>>>>>>>> login.mcs.anl.gov.type=key > >>>>>>>>>> login.mcs.anl.gov.username=wilde > >>>>>>>>>> login.mcs.anl.gov.key=/home/wilde/.ssh/swift_rsa > >>>>>>>>>> login.mcs.anl.gov.passphrase=yourpassphrasehere # MAKE SURE > >>>>>>>>>> mode=600!!! > >>>>>>>>>> > >>>>>>>>>> - Mike > >>>>>>>>>> > >>>>>>>>>> ----- Original Message ----- > >>>>>>>>>>> It does look like an ssh problem. I am getting the same > >>>>>>>>>>> stderr and > >>>>>>>>>>> log > >>>>>>>>>>> messages on trying to communicate from Bridled to > >>>>>>>>>>> Communicado. 
> >>>>>>>>>>> > >>>>>>>>>>> Ketan > >>>>>>>>>>> > >>>>>>>>>>> On Apr 28, 2011, at 11:19 AM, Michael Wilde wrote: > >>>>>>>>>>> > >>>>>>>>>>>> Have you already run a simple hellow-world swift test > >>>>>>>>>>>> from > >>>>>>>>>>>> communicado to bridled to make sure you have ssh > >>>>>>>>>>>> configured > >>>>>>>>>>>> correctly? I would do that first. > >>>>>>>>>>>> > >>>>>>>>>>>> Im not sure if an ssh problem explains what you show > >>>>>>>>>>>> below, or > >>>>>>>>>>>> not. > >>>>>>>>>>>> > >>>>>>>>>>>> - Mike > >>>>>>>>>>>> > >>>>>>>>>>>> ----- Original Message ----- > >>>>>>>>>>>>> Thanks, I made the change. However, now, I am getting > >>>>>>>>>>>>> the > >>>>>>>>>>>>> following > >>>>>>>>>>>>> on > >>>>>>>>>>>>> my stderr > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> =========== > >>>>>>>>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file > >>>>>>>>>>>>> tc > >>>>>>>>>>>>> -sites.file > >>>>>>>>>>>>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1 > >>>>>>>>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 > >>>>>>>>>>>>> (cog > >>>>>>>>>>>>> modified > >>>>>>>>>>>>> locally) > >>>>>>>>>>>>> > >>>>>>>>>>>>> RunID: 20110428-1022-n9s0k0e0 > >>>>>>>>>>>>> Progress: > >>>>>>>>>>>>> [ketan] > >>>>>>>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>>>>>>> [ketan] Progress: Initializing site shared directory:1 > >>>>>>>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>>>>>>> Progress: Initializing site shared 
directory:1 > >>>>>>>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>>>>>>> ======== > >>>>>>>>>>>>> > >>>>>>>>>>>>> And from the log it seems some network transmission has > >>>>>>>>>>>>> failed: > >>>>>>>>>>>>> > >>>>>>>>>>>>> 2011-04-28 10:22:45,261-0500 INFO > >>>>>>>>>>>>> TransportProtocolCommon > >>>>>>>>>>>>> Sending > >>>>>>>>>>>>> SSH_MSG_SERVICE_REQUEST > >>>>>>>>>>>>> 2011-04-28 10:22:45,264-0500 INFO > >>>>>>>>>>>>> TransportProtocolCommon > >>>>>>>>>>>>> Received > >>>>>>>>>>>>> SSH_MSG_SERVICE_ACCEPT > >>>>>>>>>>>>> 2011-04-28 10:24:27,626-0500 INFO > >>>>>>>>>>>>> TransportProtocolCommon > >>>>>>>>>>>>> The > >>>>>>>>>>>>> Transport Protocol thread failed > >>>>>>>>>>>>> java.io.IOException: The socket is EOF > >>>>>>>>>>>>> at > >>>>>>>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolInputStream.readBufferedData(TransportProtocolInputStream.java:183) > >>>>>>>>>>>>> at > >>>>>>>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolInputStream.readMessage(TransportProtocolInputStream.java:226) > >>>>>>>>>>>>> at > >>>>>>>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolCommon.processMessages(TransportProtocolCommon.java:1440) > >>>>>>>>>>>>> at > >>>>>>>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolCommon.startBinaryPacketProtocol(TransportProtocolCommon.java:1034) > >>>>>>>>>>>>> at > >>>>>>>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolCommon.run(TransportProtocolCommon.java:393) > >>>>>>>>>>>>> at java.lang.Thread.run(Thread.java:662) > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> Any clues? > >>>>>>>>>>>>> Ketan > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Apr 28, 2011, at 10:20 AM, Michael Wilde wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>>> The pool name in your sites file is > >>>>>>>>>>>>>> pads-remote-pbs-coasters-ssh > >>>>>>>>>>>>>> but > >>>>>>>>>>>>>> you used pbs in your tc.data. 
> >>>>>>>>>>>>>> > >>>>>>>>>>>>>> - Mike > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> ----- Original Message ----- > >>>>>>>>>>>>>>> Hello, > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Some context: > >>>>>>>>>>>>>>> I am trying to submit a big run on Beagle using swift > >>>>>>>>>>>>>>> + > >>>>>>>>>>>>>>> coasters. > >>>>>>>>>>>>>>> However, a previous run is already underway on beagle. > >>>>>>>>>>>>>>> So, > >>>>>>>>>>>>>>> there > >>>>>>>>>>>>>>> are > >>>>>>>>>>>>>>> two difficulties running a new run from its login > >>>>>>>>>>>>>>> node: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> 1. Running another swift from the same jvm will result > >>>>>>>>>>>>>>> in chaos > >>>>>>>>>>>>>>> on > >>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>> logs (As far as I know, please correct me if this is > >>>>>>>>>>>>>>> not the > >>>>>>>>>>>>>>> case > >>>>>>>>>>>>>>> anymore) > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> 2. Login node is already under load because of my > >>>>>>>>>>>>>>> running > >>>>>>>>>>>>>>> previous > >>>>>>>>>>>>>>> big > >>>>>>>>>>>>>>> run. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> /context > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> So, I am now trying to submit this big run from a > >>>>>>>>>>>>>>> remote host > >>>>>>>>>>>>>>> (bridled). I know this has been done on PADS using > >>>>>>>>>>>>>>> ssh:pbs, > >>>>>>>>>>>>>>> provider > >>>>>>>>>>>>>>> coaster. I tried the similar approach on a trial swift > >>>>>>>>>>>>>>> script > >>>>>>>>>>>>>>> but > >>>>>>>>>>>>>>> getting error. 
> >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Following is the error message: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file > >>>>>>>>>>>>>>> tc > >>>>>>>>>>>>>>> -sites.file > >>>>>>>>>>>>>>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1 > >>>>>>>>>>>>>>> Swift svn swift-r4252 (swift modified locally) > >>>>>>>>>>>>>>> cog-r3088 (cog > >>>>>>>>>>>>>>> modified > >>>>>>>>>>>>>>> locally) > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> RunID: 20110428-1002-c8rvqhe6 > >>>>>>>>>>>>>>> Progress: > >>>>>>>>>>>>>>> The application "cat" is not available in your tc.data > >>>>>>>>>>>>>>> catalog > >>>>>>>>>>>>>>> Caused by: > >>>>>>>>>>>>>>> org.globus.cog.karajan.scheduler.NoSuchResourceException > >>>>>>>>>>>>>>> Final status: Failed:1 > >>>>>>>>>>>>>>> The following errors have occurred: > >>>>>>>>>>>>>>> 1. The application "cat" is not available in your > >>>>>>>>>>>>>>> tc.data > >>>>>>>>>>>>>>> catalog > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Attached are my .swift, sites.xml and tc.data files. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Could someone indicate if what I am doing is doable > >>>>>>>>>>>>>>> and > >>>>>>>>>>>>>>> if so > >>>>>>>>>>>>>>> how > >>>>>>>>>>>>>>> can > >>>>>>>>>>>>>>> I correctly configure my sites and tc setup. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thanks. 
> >>>>>>>>>>>>>>> Ketan
>

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From ketancmaheshwari at gmail.com Thu Apr 28 15:09:46 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Thu, 28 Apr 2011 15:09:46 -0500
Subject: [Swift-devel] ssh:pbs to beagle
In-Reply-To: <2000840687.3799.1304019151105.JavaMail.root@zimbra.anl.gov>
References: <2000840687.3799.1304019151105.JavaMail.root@zimbra.anl.gov>
Message-ID: <503668BD-A2E2-4830-B6C9-82828D11807B@gmail.com>

On Apr 28, 2011, at 2:32 PM, Michael Wilde wrote:

> What is your communicado pool trying to test?
>
> If that's to run e.g. bridled to communicado, I think jobmanager should be jobmanager="ssh:local" ???

I am on bridled and want to run the coaster service on bridled (so local) and the workers on communicado (ssh). That is why I have jobmanager=local:ssh.

>
> - Mike
>
> ----- Original Message -----
>> On Apr 28, 2011, at 2:17 PM, Mihael Hategan wrote:
>>
>>> What does your sites file look like?
>>
>> ** For beagle **
>>
>> jobmanager="ssh:pbs"/>
>> CI-CCR000013
>>
>> 24:cray:pack
>>
>> 24
>> 1000
>> 1
>> 1
>> 1
>>
>> .63
>> 10000
>>
>> $HOME/swift.workdir
>>
>> ** for communicado **
>>
>> jobmanager="ssh:ssh"/>
>>
>> .63
>> 10000
>>
>> $HOME/swift.workdir
>>
>>>
>>> On Thu, 2011-04-28 at 13:36 -0500, Ketan Maheshwari wrote:
>>>> Ok, I got past the CredentialException with grid-proxy-init; now I am
>>>> facing this (note: I have turned on provider staging):
>>>>
>>>> ========
>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1
>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
>>>>
>>>> RunID: 20110428-1332-llaa031f
>>>> Progress:
>>>> Could not start connection handler
>>>> java.io.EOFException
>>>> at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61)
>>>> at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65)
>>>> at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127)
>>>> at >>>>
org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147) >>>> at >>>> org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177) >>>> at >>>> org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30) >>>> at >>>> org.globus.cog.karajan.workflow.service.channels.GSSChannel.(GSSChannel.java:47) >>>> at >>>> org.globus.cog.karajan.workflow.service.ConnectionHandler.(ConnectionHandler.java:41) >>>> at >>>> org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63) >>>> at org.globus.net.BaseServer.run(BaseServer.java:247) >>>> at java.lang.Thread.run(Thread.java:662) >>>> Progress: Submitted:1 >>>> Could not start connection handler >>>> java.io.EOFException >>>> at >>>> org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61) >>>> at >>>> org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65) >>>> at >>>> org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127) >>>> at >>>> org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147) >>>> at >>>> org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177) >>>> at >>>> org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30) >>>> at >>>> org.globus.cog.karajan.workflow.service.channels.GSSChannel.(GSSChannel.java:47) >>>> at >>>> org.globus.cog.karajan.workflow.service.ConnectionHandler.(ConnectionHandler.java:41) >>>> at >>>> org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63) >>>> at org.globus.net.BaseServer.run(BaseServer.java:247) >>>> at java.lang.Thread.run(Thread.java:662) >>>> Progress: Submitted:1 >>>> Exception in cat: >>>> Arguments: [data.txt] >>>> Host: beagle-remote-pbs-coasters-ssh >>>> Directory: catsn-20110428-1332-llaa031f/jobs/b/cat-bxal1d9kTODO: >>>> outs >>>> ---- >>>> >>>> Caused by: Could 
not submit job >>>> Caused by: >>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: >>>> Could not submit job >>>> Caused by: >>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: >>>> Could not start coaster service >>>> Caused by: >>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: >>>> Task ended before registration was received. >>>> STDOUT: >>>> STDERR: >>>> Caused by: >>>> org.globus.cog.abstraction.impl.common.execution.JobException: Job >>>> failed with an exit code of 1 >>>> Final status: Failed:1 >>>> The following errors have occurred: >>>> 1. Job failed with an exit code of 1 >>>> >>>> ======== >>>> >>>> >>>> From bridled to communicado, I see the following error: >>>> >>>> ************** >>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc >>>> -sites.file coaster-local-ssh-communicado.xml catsn.swift -n=1 >>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog >>>> modified locally) >>>> >>>> RunID: 20110428-1335-k685b2ye >>>> Progress: >>>> Progress: Submitted:1 >>>> Progress: Active:1 >>>> Exception in cat: >>>> Arguments: [data.txt] >>>> Host: communicado-ssh >>>> Directory: catsn-20110428-1335-k685b2ye/jobs/c/cat-coip1d9kTODO: >>>> outs >>>> ---- >>>> >>>> Caused by: Job failed with an exit code of 524 >>>> Caused by: >>>> org.globus.cog.abstraction.impl.common.execution.JobException: Job >>>> failed with an exit code of 524 >>>> Final status: Failed:1 >>>> The following errors have occurred: >>>> 1. Job failed with an exit code of 524 >>>> >>>> ************ >>>> >>>> >>>> -- >>>> Ketan >>>> >>>> >>>> >>>> >>>> On Apr 28, 2011, at 1:03 PM, Michael Wilde wrote: >>>> >>>>> For now - create a proxy using grid-proxy-init on the swift >>>>> execution machine. >>>>> I think there is an option to set "no security" for this config >>>>> but I cant recall where that is specified. Maybe swift.properties, >>>>> I cant recall. 
>>>>> >>>>> - Mike >>>>> >>>>> ----- Original Message ----- >>>>>> Hi, >>>>>> >>>>>> It looks better now. However, I am getting the following: >>>>>> >>>>>> ===== >>>>>> >>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc >>>>>> -sites.file >>>>>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1 >>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog >>>>>> modified >>>>>> locally) >>>>>> >>>>>> RunID: 20110428-1251-oi9theh8 >>>>>> Progress: >>>>>> Progress: Stage in:1 >>>>>> Could not submit job >>>>>> Caused by: >>>>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: >>>>>> Could not submit job >>>>>> Caused by: >>>>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: >>>>>> Could not start coaster service >>>>>> Caused by: >>>>>> org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: >>>>>> org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file >>>>>> (/tmp/x509up_u2006) not found. >>>>>> Caused by: org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] >>>>>> Proxy >>>>>> file (/tmp/x509up_u2006) not found. >>>>>> Failed to transfer wrapper log from >>>>>> catsn-20110428-1251-oi9theh8/info/e on >>>>>> beagle-remote-pbs-coasters-ssh >>>>>> >>>>>> ===== >>>>>> >>>>>> How do I specify "-nosec" on automatic coasters? >>>>>> >>>>>> Ketan >>>>>> >>>>>> On Apr 28, 2011, at 12:00 PM, Michael Wilde wrote: >>>>>> >>>>>>> OK. Was there a cookbook on the ssh settings? Did you set up a >>>>>>> $HOME/.ssh/auth.defaults per the user guide? >>>>>>> >>>>>>> Here is an auth.defaults example. 
Im not sure its 100% correct, >>>>>>> but >>>>>>> could serve as a base for you: >>>>>>> >>>>>>> xlogin1.pads.ci.uchicago.edu.type=password >>>>>>> xlogin1.pads.ci.uchicago.edu.username=wilde >>>>>>> >>>>>>> login.pads.ci.uchicago.edu.type=key >>>>>>> login.pads.ci.uchicago.edu.username=wilde >>>>>>> login.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa >>>>>>> login.pads.ci.uchicago.edu.passphrase=yourpassphrasehere # MAKE >>>>>>> SURE >>>>>>> mode=600!!! >>>>>>> >>>>>>> login1.pads.ci.uchicago.edu.type=key >>>>>>> login1.pads.ci.uchicago.edu.username=wilde >>>>>>> login1.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa >>>>>>> login1.pads.ci.uchicago.edu.passphrase=yourpassphrasehere # MAKE >>>>>>> SURE mode=600!!! >>>>>>> >>>>>>> login.mcs.anl.gov.type=key >>>>>>> login.mcs.anl.gov.username=wilde >>>>>>> login.mcs.anl.gov.key=/home/wilde/.ssh/swift_rsa >>>>>>> login.mcs.anl.gov.passphrase=yourpassphrasehere # MAKE SURE >>>>>>> mode=600!!! >>>>>>> >>>>>>> - Mike >>>>>>> >>>>>>> ----- Original Message ----- >>>>>>>> It does look like an ssh problem. I am getting the same stderr >>>>>>>> and >>>>>>>> log >>>>>>>> messages on trying to communicate from Bridled to Communicado. >>>>>>>> >>>>>>>> Ketan >>>>>>>> >>>>>>>> On Apr 28, 2011, at 11:19 AM, Michael Wilde wrote: >>>>>>>> >>>>>>>>> Have you already run a simple hellow-world swift test from >>>>>>>>> communicado to bridled to make sure you have ssh configured >>>>>>>>> correctly? I would do that first. >>>>>>>>> >>>>>>>>> Im not sure if an ssh problem explains what you show below, or >>>>>>>>> not. >>>>>>>>> >>>>>>>>> - Mike >>>>>>>>> >>>>>>>>> ----- Original Message ----- >>>>>>>>>> Thanks, I made the change. 
However, now, I am getting the >>>>>>>>>> following >>>>>>>>>> on >>>>>>>>>> my stderr >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> =========== >>>>>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc >>>>>>>>>> -sites.file >>>>>>>>>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1 >>>>>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog >>>>>>>>>> modified >>>>>>>>>> locally) >>>>>>>>>> >>>>>>>>>> RunID: 20110428-1022-n9s0k0e0 >>>>>>>>>> Progress: >>>>>>>>>> [ketan] >>>>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>>>> [ketan] Progress: Initializing site shared directory:1 >>>>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>>>> ======== >>>>>>>>>> >>>>>>>>>> And from the log it seems some network transmission has >>>>>>>>>> failed: >>>>>>>>>> >>>>>>>>>> 2011-04-28 10:22:45,261-0500 INFO TransportProtocolCommon >>>>>>>>>> Sending >>>>>>>>>> SSH_MSG_SERVICE_REQUEST >>>>>>>>>> 2011-04-28 10:22:45,264-0500 INFO TransportProtocolCommon >>>>>>>>>> Received >>>>>>>>>> SSH_MSG_SERVICE_ACCEPT >>>>>>>>>> 2011-04-28 10:24:27,626-0500 INFO TransportProtocolCommon The >>>>>>>>>> Transport Protocol thread failed >>>>>>>>>> java.io.IOException: The socket is EOF >>>>>>>>>> at >>>>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolInputStream.readBufferedData(TransportProtocolInputStream.java:183) >>>>>>>>>> 
at >>>>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolInputStream.readMessage(TransportProtocolInputStream.java:226) >>>>>>>>>> at >>>>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolCommon.processMessages(TransportProtocolCommon.java:1440) >>>>>>>>>> at >>>>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolCommon.startBinaryPacketProtocol(TransportProtocolCommon.java:1034) >>>>>>>>>> at >>>>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolCommon.run(TransportProtocolCommon.java:393) >>>>>>>>>> at java.lang.Thread.run(Thread.java:662) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Any clues? >>>>>>>>>> Ketan >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Apr 28, 2011, at 10:20 AM, Michael Wilde wrote: >>>>>>>>>> >>>>>>>>>>> The pool name in your sites file is >>>>>>>>>>> pads-remote-pbs-coasters-ssh >>>>>>>>>>> but >>>>>>>>>>> you used pbs in your tc.data. >>>>>>>>>>> >>>>>>>>>>> - Mike >>>>>>>>>>> >>>>>>>>>>> ----- Original Message ----- >>>>>>>>>>>> Hello, >>>>>>>>>>>> >>>>>>>>>>>> Some context: >>>>>>>>>>>> I am trying to submit a big run on Beagle using swift + >>>>>>>>>>>> coasters. >>>>>>>>>>>> However, a previous run is already underway on beagle. So, >>>>>>>>>>>> there >>>>>>>>>>>> are >>>>>>>>>>>> two difficulties running a new run from its login node: >>>>>>>>>>>> >>>>>>>>>>>> 1. Running another swift from the same jvm will result in >>>>>>>>>>>> chaos >>>>>>>>>>>> on >>>>>>>>>>>> the >>>>>>>>>>>> logs (As far as I know, please correct me if this is not >>>>>>>>>>>> the >>>>>>>>>>>> case >>>>>>>>>>>> anymore) >>>>>>>>>>>> >>>>>>>>>>>> 2. Login node is already under load because of my running >>>>>>>>>>>> previous >>>>>>>>>>>> big >>>>>>>>>>>> run. >>>>>>>>>>>> >>>>>>>>>>>> /context >>>>>>>>>>>> >>>>>>>>>>>> So, I am now trying to submit this big run from a remote >>>>>>>>>>>> host >>>>>>>>>>>> (bridled). I know this has been done on PADS using ssh:pbs, >>>>>>>>>>>> provider >>>>>>>>>>>> coaster. 
I tried the similar approach on a trial swift >>>>>>>>>>>> script >>>>>>>>>>>> but >>>>>>>>>>>> getting error. >>>>>>>>>>>> >>>>>>>>>>>> Following is the error message: >>>>>>>>>>>> >>>>>>>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc >>>>>>>>>>>> -sites.file >>>>>>>>>>>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1 >>>>>>>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 >>>>>>>>>>>> (cog >>>>>>>>>>>> modified >>>>>>>>>>>> locally) >>>>>>>>>>>> >>>>>>>>>>>> RunID: 20110428-1002-c8rvqhe6 >>>>>>>>>>>> Progress: >>>>>>>>>>>> The application "cat" is not available in your tc.data >>>>>>>>>>>> catalog >>>>>>>>>>>> Caused by: >>>>>>>>>>>> org.globus.cog.karajan.scheduler.NoSuchResourceException >>>>>>>>>>>> Final status: Failed:1 >>>>>>>>>>>> The following errors have occurred: >>>>>>>>>>>> 1. The application "cat" is not available in your tc.data >>>>>>>>>>>> catalog >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Attached are my .swift, sites.xml and tc.data files. >>>>>>>>>>>> >>>>>>>>>>>> Could someone indicate if what I am doing is doable and if >>>>>>>>>>>> so >>>>>>>>>>>> how >>>>>>>>>>>> can >>>>>>>>>>>> I correctly configure my sites and tc setup. >>>>>>>>>>>> >>>>>>>>>>>> Thanks. 
>>>>>>>>>>>> Ketan
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Swift-devel mailing list
>>>>>>>>>>>> Swift-devel at ci.uchicago.edu
>>>>>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Michael Wilde
>>>>>>>>>>> Computation Institute, University of Chicago
>>>>>>>>>>> Mathematics and Computer Science Division
>>>>>>>>>>> Argonne National Laboratory
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Michael Wilde
>>>>>>>>> Computation Institute, University of Chicago
>>>>>>>>> Mathematics and Computer Science Division
>>>>>>>>> Argonne National Laboratory
>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Michael Wilde
>>>>>>> Computation Institute, University of Chicago
>>>>>>> Mathematics and Computer Science Division
>>>>>>> Argonne National Laboratory
>>>>>>>
>>>>>
>>>>> --
>>>>> Michael Wilde
>>>>> Computation Institute, University of Chicago
>>>>> Mathematics and Computer Science Division
>>>>> Argonne National Laboratory
>>>>>
>>>>
>>>> _______________________________________________
>>>> Swift-devel mailing list
>>>> Swift-devel at ci.uchicago.edu
>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>
>>>
>
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>

From hategan at mcs.anl.gov Thu Apr 28 15:15:45 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 28 Apr 2011 13:15:45 -0700
Subject: [Swift-devel] ssh:pbs to beagle
In-Reply-To: <752184432.4002.1304020192687.JavaMail.root@zimbra.anl.gov>
References: <752184432.4002.1304020192687.JavaMail.root@zimbra.anl.gov>
Message-ID: <1304021745.11917.0.camel@blabla2.none>

On Thu, 2011-04-28 at 14:49 -0500, Michael Wilde wrote:
> Copying these might work for you, Ketan:
>
> com$ env | grep 509
> X509_CERT_DIR=/home/wilde/TRUSTEDCA
> X509_CADIR=/home/wilde/TRUSTEDCA
> com$

That might take effect on the client, but those environment variables
won't automatically make it on the service side.

You should copy those into .globus/certificates.

From ketancmaheshwari at gmail.com Thu Apr 28 15:35:06 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Thu, 28 Apr 2011 15:35:06 -0500
Subject: [Swift-devel] ssh:pbs to beagle
In-Reply-To: <1304021745.11917.0.camel@blabla2.none>
References: <752184432.4002.1304020192687.JavaMail.root@zimbra.anl.gov> <1304021745.11917.0.camel@blabla2.none>
Message-ID: <205B61FA-9A29-4F9E-BE46-73B9571AE7E8@gmail.com>

Mihael,

Both client (bridled) and server (communicado, beagle) share the /home

Ketan

On Apr 28, 2011, at 3:15 PM, Mihael Hategan wrote:

> On Thu, 2011-04-28 at 14:49 -0500, Michael Wilde wrote:
>> Copying these might work for you, Ketan:
>>
>> com$ env | grep 509
>> X509_CERT_DIR=/home/wilde/TRUSTEDCA
>> X509_CADIR=/home/wilde/TRUSTEDCA
>> com$
>
> That might take effect on the client, but those environment variables
> won't automatically make it on the service side.
>
> You should copy those into .globus/certificates.
>
>

From hategan at mcs.anl.gov Thu Apr 28 15:38:31 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 28 Apr 2011 13:38:31 -0700
Subject: [Swift-devel] ssh:pbs to beagle
In-Reply-To: <205B61FA-9A29-4F9E-BE46-73B9571AE7E8@gmail.com>
References: <752184432.4002.1304020192687.JavaMail.root@zimbra.anl.gov> <1304021745.11917.0.camel@blabla2.none> <205B61FA-9A29-4F9E-BE46-73B9571AE7E8@gmail.com>
Message-ID: <1304023111.12665.1.camel@blabla2.none>

On Thu, 2011-04-28 at 15:35 -0500, Ketan Maheshwari wrote:
> Mihael,
>
> Both client (bridled) and server (communicado, beagle) share the /home

But not environment variables.
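
On the earlier suggestion in this thread to copy "those" into .globus/certificates: a minimal shell sketch of one way to do that copy, reading "those" as the CA files in the directory that X509_CERT_DIR points to, and assuming the destination is $HOME/.globus/certificates for the account the service runs under. The helper name copy_trusted_cas is made up for illustration; it is not a Swift or Globus command.

```shell
#!/bin/sh
# Sketch only: copy trusted CA files into a per-user certificates directory.
# copy_trusted_cas is a hypothetical helper name, not part of Swift or Globus.
copy_trusted_cas() {
    src=$1    # directory holding the CA files, e.g. the value of X509_CERT_DIR
    dest=$2   # target directory, e.g. $HOME/.globus/certificates

    if [ ! -d "$src" ]; then
        echo "CA directory not found: $src" >&2
        return 1
    fi

    mkdir -p "$dest" || return 1

    # Copy regular files only, preserving mode and timestamps.
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        cp -p "$f" "$dest"/ || return 1
    done
}
```

It could be called as copy_trusted_cas "$X509_CERT_DIR" "$HOME/.globus/certificates". With a shared /home, running it once on either host should be enough, since the files then no longer depend on the environment of the process that starts the service.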
From ketancmaheshwari at gmail.com Thu Apr 28 15:42:17 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Thu, 28 Apr 2011 15:42:17 -0500
Subject: [Swift-devel] ssh:pbs to beagle
In-Reply-To: <1304023111.12665.1.camel@blabla2.none>
References: <752184432.4002.1304020192687.JavaMail.root@zimbra.anl.gov> <1304021745.11917.0.camel@blabla2.none> <205B61FA-9A29-4F9E-BE46-73B9571AE7E8@gmail.com> <1304023111.12665.1.camel@blabla2.none>
Message-ID:

On Apr 28, 2011, at 3:38 PM, Mihael Hategan wrote:

> On Thu, 2011-04-28 at 15:35 -0500, Ketan Maheshwari wrote:
>> Mihael,
>>
>> Both client (bridled) and server (communicado, beagle) share the /home
>
> But not environment variables.

Environment variables too as I put those lines in my .bashrc

beagle

ketan at login2:~> env | grep 509
X509_CERT_DIR=/home/wilde/TRUSTEDCA
X509_CADIR=/home/wilde/TRUSTEDCA
X509_USER_CERT=/home/ketan/.globus/usercert.pem
X509_USER_KEY=/home/ketan/.globus/userkey.pem

communicado/bridled

[ketan at bridled catsn.works]$ env | grep 509
X509_CERT_DIR=/home/wilde/TRUSTEDCA
X509_CADIR=/home/wilde/TRUSTEDCA
X509_USER_CERT=/home/ketan/.globus/usercert.pem
X509_USER_KEY=/home/ketan/.globus/userkey.pem

From hategan at mcs.anl.gov Thu Apr 28 15:57:20 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 28 Apr 2011 13:57:20 -0700
Subject: [Swift-devel] ssh:pbs to beagle
In-Reply-To:
References: <752184432.4002.1304020192687.JavaMail.root@zimbra.anl.gov> <1304021745.11917.0.camel@blabla2.none> <205B61FA-9A29-4F9E-BE46-73B9571AE7E8@gmail.com> <1304023111.12665.1.camel@blabla2.none>
Message-ID: <1304024240.13289.0.camel@blabla2.none>

ok. good.

On Thu, 2011-04-28 at 15:42 -0500, Ketan Maheshwari wrote:
> On Apr 28, 2011, at 3:38 PM, Mihael Hategan wrote:
>
> > On Thu, 2011-04-28 at 15:35 -0500, Ketan Maheshwari wrote:
> >> Mihael,
> >>
> >> Both client (bridled) and server (communicado, beagle) share the /home
> >
> > But not environment variables.
>
> Environment variables too as I put those lines in my .bashrc
>
> beagle
>
> ketan at login2:~> env | grep 509
> X509_CERT_DIR=/home/wilde/TRUSTEDCA
> X509_CADIR=/home/wilde/TRUSTEDCA
> X509_USER_CERT=/home/ketan/.globus/usercert.pem
> X509_USER_KEY=/home/ketan/.globus/userkey.pem
>
> communicado/bridled
>
> [ketan at bridled catsn.works]$ env | grep 509
> X509_CERT_DIR=/home/wilde/TRUSTEDCA
> X509_CADIR=/home/wilde/TRUSTEDCA
> X509_USER_CERT=/home/ketan/.globus/usercert.pem
> X509_USER_KEY=/home/ketan/.globus/userkey.pem
>

From ketancmaheshwari at gmail.com Thu Apr 28 16:51:06 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Thu, 28 Apr 2011 16:51:06 -0500
Subject: [Swift-devel] ssh:pbs to beagle
In-Reply-To: <1304024240.13289.0.camel@blabla2.none>
References: <752184432.4002.1304020192687.JavaMail.root@zimbra.anl.gov> <1304021745.11917.0.camel@blabla2.none> <205B61FA-9A29-4F9E-BE46-73B9571AE7E8@gmail.com> <1304023111.12665.1.camel@blabla2.none> <1304024240.13289.0.camel@blabla2.none>
Message-ID:

As an alternative for the timebeing, I am planning to use a different jvm from beagle login node to run swift and take advantage of the beagle reservation.

Do you anticipate any possible issues with this?

--
Ketan

On Apr 28, 2011, at 3:57 PM, Mihael Hategan wrote:

> ok. good.
>
> On Thu, 2011-04-28 at 15:42 -0500, Ketan Maheshwari wrote:
>> On Apr 28, 2011, at 3:38 PM, Mihael Hategan wrote:
>>
>>> On Thu, 2011-04-28 at 15:35 -0500, Ketan Maheshwari wrote:
>>>> Mihael,
>>>>
>>>> Both client (bridled) and server (communicado, beagle) share the /home
>>>
>>> But not environment variables.
>>
>> Environment variables too as I put those lines in my .bashrc
>>
>> beagle
>>
>> ketan at login2:~> env | grep 509
>> X509_CERT_DIR=/home/wilde/TRUSTEDCA
>> X509_CADIR=/home/wilde/TRUSTEDCA
>> X509_USER_CERT=/home/ketan/.globus/usercert.pem
>> X509_USER_KEY=/home/ketan/.globus/userkey.pem
>>
>> communicado/bridled
>>
>> [ketan at bridled catsn.works]$ env | grep 509
>> X509_CERT_DIR=/home/wilde/TRUSTEDCA
>> X509_CADIR=/home/wilde/TRUSTEDCA
>> X509_USER_CERT=/home/ketan/.globus/usercert.pem
>> X509_USER_KEY=/home/ketan/.globus/userkey.pem
>>
>
>

From hategan at mcs.anl.gov Thu Apr 28 16:56:30 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 28 Apr 2011 14:56:30 -0700
Subject: [Swift-devel] ssh:pbs to beagle
In-Reply-To:
References: <752184432.4002.1304020192687.JavaMail.root@zimbra.anl.gov> <1304021745.11917.0.camel@blabla2.none> <205B61FA-9A29-4F9E-BE46-73B9571AE7E8@gmail.com> <1304023111.12665.1.camel@blabla2.none> <1304024240.13289.0.camel@blabla2.none>
Message-ID: <1304027790.16017.0.camel@blabla2.none>

On Thu, 2011-04-28 at 16:51 -0500, Ketan Maheshwari wrote:
> As an alternative for the timebeing, I am planning to use a different jvm from beagle login node to run swift and take advantage of the beagle reservation.
>
> Do you anticipate any possible issues with this?

I would watch what top says from time to time.

From wilde at mcs.anl.gov Fri Apr 29 08:52:50 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Fri, 29 Apr 2011 08:52:50 -0500 (CDT)
Subject: [Swift-devel] Approaches to running Swift off-head-node
In-Reply-To: <289195611.6323.1304084437649.JavaMail.root@zimbra.anl.gov>
Message-ID: <1167290870.6365.1304085170873.JavaMail.root@zimbra.anl.gov>

Ketan, for Beagle use, but also for general cluster use, can you draw a few diagrams to show the alternatives, and consider how to implement them?

1. swift cmd on head node
   a) cluster scheduler provider (PBS, SGE, etc)
   b) coaster provider over cluster scheduler provider

2. swift cmd on external host
   a) submits jobs via GRAM
      i) to cluster scheduler
      ii) to coasters over cluster scheduler
   b) submits jobs via SSH
   c) submits jobs to factory-managed coaster workers

3. swift cmd on compute node
   a) submits jobs as if on head node
   b) submits jobs to factory-managed coaster workers

In some of the configurations above there are additional variants based on where the coaster service runs and how it's started.

Note that the desire to keep resource-intensive processes off the login nodes applies to all clusters, not just Beagle. (The swift command and even the coaster service can be resource-intensive when running highly parallel scripts with high task rates).

We should select a small subset of the possible configs to implement, test, document and support for users.

For Beagle, we started with 1b. (1a is not viable, as it's unable to readily utilize multicore nodes). The email thread from yesterday was around approach 2b.

I'd suggest considering how to do 2c, as it's similar to what Allan has run on OSG and might result in a common "coaster factory" script with common logic for how to control the level of factory worker job submission.

For now, though, it seems best to continue with the "2b" approach (external-ssh-coasters) to see how well it works. It's got the current obstacle of

- Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory