From bugzilla-daemon at mcs.anl.gov Thu May 1 08:07:43 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Thu, 1 May 2008 08:07:43 -0500 (CDT)
Subject: [Swift-devel] [Bug 133] New: PBS walltime violation poor reporting
Message-ID:

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=133

Summary: PBS walltime violation poor reporting
Product: Swift
Version: unspecified
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Documentation
AssignedTo: benc at hawaga.org.uk
ReportedBy: benc at hawaga.org.uk

PBS Walltime violations are poorly reported (or rather, not reported to the user at all) (at least when running through GRAM, perhaps with PBS directly)

It looks like in VDS1 PBS walltime violations were reported on stderr from the job submission. In Swift, I think that stderr is not used (instead there is a separate stderr staging mechanism for application executables); it might be useful to report job stderr to the user sometimes.

--
Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You reported the bug, or are watching the reporter.
You are the assignee for the bug, or are watching the assignee.

From benc at hawaga.org.uk Thu May 1 14:35:07 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 1 May 2008 19:35:07 +0000 (GMT)
Subject: [Swift-devel] introduce swift gsoc student Milena
Message-ID:

Swift got a Google Summer-of-Code student as part of the Globus mentoring organisation. Our student is Milena Nikolic in Serbia, who is going to work on 'Type checking and Inference for Swift'.

Milena's introductory globus blog posting is here:
http://globus-gsoc.blogspot.com/2008/05/type-checking-and-inference-for.html

Welcome!
--

From bugzilla-daemon at mcs.anl.gov Thu May 1 15:11:18 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Thu, 1 May 2008 15:11:18 -0500 (CDT)
Subject: [Swift-devel] [Bug 134] New: Unicode in source text string literals not passed through
Message-ID:

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=134

Summary: Unicode in source text string literals not passed through
Product: Swift
Version: unspecified
Platform: Macintosh
OS/Version: Mac OS
Status: NEW
Severity: minor
Priority: P2
Component: SwiftScript language
AssignedTo: benc at hawaga.org.uk
ReportedBy: benc at hawaga.org.uk

Non-ASCII unicode in the source text does not get passed through to the XML form correctly. For example, in the following, the XML intermediate form of this consists of a number of strange characters instead of the Japanese text below.

echo "??????" stdout=@filename(t);

From bugzilla-daemon at mcs.anl.gov Thu May 1 15:14:18 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Thu, 1 May 2008 15:14:18 -0500 (CDT)
Subject: [Swift-devel] [Bug 134] Unicode in source text string literals not passed through
In-Reply-To:
Message-ID: <20080501201418.7AB79164CF@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=134

------- Comment #1 from hategan at mcs.anl.gov 2008-05-01 15:14 -------
Do you know what layer is to blame?
From bugzilla-daemon at mcs.anl.gov Thu May 1 15:25:49 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Thu, 1 May 2008 15:25:49 -0500 (CDT)
Subject: [Swift-devel] [Bug 135] New: Unicode in source text string literals not passed through
Message-ID:

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=135

Summary: Unicode in source text string literals not passed through
Product: Swift
Version: unspecified
Platform: Macintosh
OS/Version: Mac OS
Status: NEW
Severity: minor
Priority: P2
Component: SwiftScript language
AssignedTo: benc at hawaga.org.uk
ReportedBy: benc at hawaga.org.uk

Non-ASCII unicode in the source text does not get passed through to the XML form correctly. For example, in the following, the XML intermediate form of this consists of a number of strange characters instead of the Japanese text below.

echo "??????" stdout=@filename(t);
From bugzilla-daemon at mcs.anl.gov Thu May 1 15:26:31 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Thu, 1 May 2008 15:26:31 -0500 (CDT)
Subject: [Swift-devel] [Bug 135] Unicode in source text string literals not passed through
In-Reply-To:
Message-ID: <20080501202631.D3BD3164BB@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=135

benc at hawaga.org.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE

------- Comment #1 from benc at hawaga.org.uk 2008-05-01 15:26 -------
*** This bug has been marked as a duplicate of 134 ***

From bugzilla-daemon at mcs.anl.gov Thu May 1 15:26:31 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Thu, 1 May 2008 15:26:31 -0500 (CDT)
Subject: [Swift-devel] [Bug 134] Unicode in source text string literals not passed through
In-Reply-To:
Message-ID: <20080501202631.EAC58164EC@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=134

------- Comment #2 from benc at hawaga.org.uk 2008-05-01 15:26 -------
*** Bug 135 has been marked as a duplicate of this bug. ***
From bugzilla-daemon at mcs.anl.gov Thu May 1 15:38:00 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Thu, 1 May 2008 15:38:00 -0500 (CDT)
Subject: [Swift-devel] [Bug 134] Unicode in source text string literals not passed through
In-Reply-To:
Message-ID: <20080501203800.4D2B2164BB@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=134

------- Comment #3 from benc at hawaga.org.uk 2008-05-01 15:38 -------
my initial suspicion is that the use of FileInputStream and not having an explicit specification of an encoding is defaulting to some non-UTF-8 encoding (at least on my development machine).

From bugzilla-daemon at mcs.anl.gov Thu May 1 16:39:23 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Thu, 1 May 2008 16:39:23 -0500 (CDT)
Subject: [Swift-devel] [Bug 87] quoting parse failure.
In-Reply-To:
Message-ID: <20080501213923.76291164CF@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=87

benc at hawaga.org.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

------- Comment #1 from benc at hawaga.org.uk 2008-05-01 16:39 -------
swift r1845 makes " work for local execution at least
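The suspicion in comment #3 on Bug 134 (source bytes read without an explicit encoding, and therefore decoded with a non-UTF-8 platform default) can be illustrated outside Java. What follows is only a Python sketch of the effect, not the Swift compiler's code. The sample string, the helper names `read_source` and `demo`, and the use of Latin-1 as a stand-in for whatever the platform default was on the reporter's machine are all assumptions made for illustration.

```python
import os
import tempfile

# Stand-in Japanese text: the literal from the original bug report did not
# survive the archive, so this string is purely illustrative.
SAMPLE = 'echo "\u3053\u3093\u306b\u3061\u306f"'


def read_source(path, encoding):
    """Read a source file, decoding its bytes with the given encoding."""
    with open(path, "r", encoding=encoding) as handle:
        return handle.read()


def demo():
    """Write SAMPLE as UTF-8, then read it back with two different decoders."""
    fd, path = tempfile.mkstemp(suffix=".swift")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(SAMPLE)
        # An explicit UTF-8 decoder round-trips the text.
        good = read_source(path, "utf-8")
        # A mismatched single-byte decoder (assumed stand-in for the
        # platform default) turns every UTF-8 byte into its own character.
        bad = read_source(path, "latin-1")
    finally:
        os.remove(path)
    return good, bad


if __name__ == "__main__":
    good, bad = demo()
    print(good == SAMPLE, bad == SAMPLE)
```

If the suspicion is right, the analogous change on the Java side would be to wrap the `FileInputStream` in an `InputStreamReader` constructed with an explicit `"UTF-8"` charset name rather than relying on the default.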
From benc at hawaga.org.uk Fri May 2 03:42:41 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 2 May 2008 08:42:41 +0000 (GMT)
Subject: [Swift-devel] an actual remote coaster execution success
Message-ID:

As of about 24h ago, I was still having problems running coasters to TG NCSA; however, I successfully ran a couple of tests from Swift through gram2 to the condor pool accessible at fletch.bsd.uchicago.edu.

ooOOoo.

--

From bugzilla-daemon at mcs.anl.gov Fri May 2 04:25:24 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 2 May 2008 04:25:24 -0500 (CDT)
Subject: [Swift-devel] [Bug 136] New: CLASSPATH construction order
Message-ID:

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=136

Summary: CLASSPATH construction order
Product: Swift
Version: unspecified
Platform: Macintosh
OS/Version: Mac OS
Status: NEW
Severity: normal
Priority: P2
Component: General
AssignedTo: hategan at mcs.anl.gov
ReportedBy: benc at hawaga.org.uk

The bin/swift wrapper currently constructs a classpath for swift automatically, as whatever is already on the classpath in the current environment followed by all of the swift classes.

This order seems to cause more problems than it solves - specifically, when there are overlapping classes specified in the environment (which I have seen with falkon and pegasus users, and potentially is also a problem for people with the Globus Toolkit installed).

I think it would be better to construct the classpath the other way round.

(I initially commented about this on swift-devel: http://mail.ci.uchicago.edu/pipermail/swift-devel/2008-April/002956.html)
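The ordering problem in Bug 136 comes down to first-match-wins lookup: when two classpath entries provide a class with the same name, the copy in the entry that appears earlier is the one that gets loaded. A small Python simulation of that rule shows why putting the Swift entries first would change which copy is used. The jar names, class names and the `resolve` helper below are entirely hypothetical and are not taken from bin/swift.

```python
def resolve(class_name, classpath):
    """Return the first classpath entry that provides class_name, or None.

    classpath is an ordered list of (entry_name, set_of_class_names) pairs,
    searched front to back, mimicking first-match-wins lookup.
    """
    for entry_name, classes in classpath:
        if class_name in classes:
            return entry_name
    return None


# Hypothetical entries already present in the user's environment.
ENV_ENTRIES = [
    ("env/other-tool.jar", {"org.example.Shared", "org.example.OtherTool"}),
]

# Hypothetical entries shipped with Swift; one class name overlaps.
SWIFT_ENTRIES = [
    ("swift/lib/shared.jar", {"org.example.Shared"}),
    ("swift/lib/swift.jar", {"org.example.SwiftMain"}),
]

if __name__ == "__main__":
    # Order described in the report: environment first, then Swift.
    print(resolve("org.example.Shared", ENV_ENTRIES + SWIFT_ENTRIES))
    # Proposed order: Swift first, then environment.
    print(resolve("org.example.Shared", SWIFT_ENTRIES + ENV_ENTRIES))
```

With the environment first, the overlapping class resolves to the environment's copy; with the order reversed, Swift's own copy takes precedence, which is the behaviour the report argues for.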
From bugzilla-daemon at mcs.anl.gov Fri May 2 07:12:11 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 2 May 2008 07:12:11 -0500 (CDT)
Subject: [Swift-devel] [Bug 137] New: undeclared procedures are not detected until runtime
Message-ID:

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=137

Summary: undeclared procedures are not detected until runtime
Product: Swift
Version: unspecified
Platform: Macintosh
OS/Version: Mac OS
Status: NEW
Severity: normal
Priority: P2
Component: SwiftScript language
AssignedTo: benc at hawaga.org.uk
ReportedBy: benc at hawaga.org.uk

Calling an undeclared procedure is not detected until runtime. It should be detected at compile time.

For example, an error message like this comes out of the Karajan runtime layer rather than the compiler:

RunID: 20080502-1309-kcenkzre
Progress:
Execution failed:
'greeting' is not defined.

From bugzilla-daemon at mcs.anl.gov Fri May 2 07:34:46 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 2 May 2008 07:34:46 -0500 (CDT)
Subject: [Swift-devel] [Bug 87] quoting parse failure.
In-Reply-To:
Message-ID: <20080502123446.9F66D164BB@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=87

benc at hawaga.org.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

------- Comment #2 from benc at hawaga.org.uk 2008-05-02 07:34 -------
revisions up to and including r1851 continue to fix and test some quoting behaviour.
From bugzilla-daemon at mcs.anl.gov Fri May 2 11:31:53 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 2 May 2008 11:31:53 -0500 (CDT)
Subject: [Swift-devel] [Bug 138] New: spaces in filenames sometimes don't work right with GRAM2+Condor
Message-ID:

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=138

Summary: spaces in filenames sometimes don't work right with GRAM2+Condor
Product: Swift
Version: unspecified
Platform: Macintosh
OS/Version: Mac OS
Status: NEW
Severity: normal
Priority: P2
Component: General
AssignedTo: hategan at mcs.anl.gov
ReportedBy: benc at hawaga.org.uk

Using the site catalog in tests/sites/fletch-condor-gram2.xml, tests/language-behaviour/141-space-in-filename.swift works; tests/language-behaviour/142-space-and-quotes.swift does not, failing with the error:

> Caused by: The following output files were not created by the application: 142-space-and-quotes. space .out

The 142 test has particularly unusual quoting, but works with PBS, fork and local execution.

I think it's likely that this is a jobmanager-condor problem, based on previous problems like this.

From hategan at mcs.anl.gov Fri May 2 17:38:34 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 02 May 2008 17:38:34 -0500
Subject: [Swift-devel] replication/recall of jobs from slow queues
Message-ID: <1209767914.28036.5.camel@localhost>

There's some code as of r1869 to deal with the situation. It is disabled by default, but can be enabled through swift.properties.
In theory it works like this: if a job sits in a queue for more than replication.min.queue.time and more than 3*average_queue_time (which is measured from other jobs), then a second replica of the same job is created. The process continues until one of the replicas gets to the active state, after which all other jobs are canceled.

I didn't have time to test this much (given that it's not very easy to test), so probably there will be problems.

From bugzilla-daemon at mcs.anl.gov Fri May 2 18:05:57 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 2 May 2008 18:05:57 -0500 (CDT)
Subject: [Swift-devel] [Bug 138] spaces in filenames sometimes don't work right with GRAM2+Condor
In-Reply-To:
Message-ID: <20080502230557.0FAD7164BB@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=138

------- Comment #1 from hategan at mcs.anl.gov 2008-05-02 18:05 -------
Odd. The wrapper was updated to deal with such things. Do you have the info files?

From benc at hawaga.org.uk Sat May 3 03:39:18 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Sat, 3 May 2008 08:39:18 +0000 (GMT)
Subject: [Swift-devel] nightly build failures
Message-ID:

In the NMI nightly builds, in swift r1870 cog r1999:

> Fatal: Class not found: org.griphyn.vdl.karajan.lib.HostProperty

I don't see it in r1999 modules/karajan/ on my dev machine.
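The replication rule described in the "replication/recall of jobs from slow queues" message above can be restated as a short Python sketch. This is not the code from r1869: the function names, the use of seconds as the unit, and the string state labels are assumptions made for illustration. Only the two conditions (longer than replication.min.queue.time and longer than three times the average queue time) and the cancel-the-rest behaviour come from the message.

```python
def should_replicate(queued_seconds, min_queue_time, average_queue_time):
    """Decide whether a queued job should get another replica.

    A replica is created only when the job has waited longer than the
    configured minimum (replication.min.queue.time) AND longer than three
    times the average queue time measured from other jobs.
    """
    return (queued_seconds > min_queue_time
            and queued_seconds > 3 * average_queue_time)


def replicas_to_cancel(replica_states):
    """Return the replicas to cancel once one of them becomes active.

    replica_states maps a replica name to "queued" or "active" (hypothetical
    labels). While nothing is active, nothing is cancelled.
    """
    active = [name for name, state in replica_states.items()
              if state == "active"]
    if not active:
        return []
    winner = active[0]
    return [name for name in replica_states if name != winner]


if __name__ == "__main__":
    print(should_replicate(700, min_queue_time=60, average_queue_time=200))
    print(replicas_to_cancel({"r1": "queued", "r2": "active", "r3": "queued"}))
```

Requiring both conditions means a site whose queue is slow for everyone does not trigger replication merely because the minimum was exceeded; the job must also be waiting unusually long relative to its peers.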
--

From hategan at mcs.anl.gov Sat May 3 08:28:47 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sat, 03 May 2008 08:28:47 -0500
Subject: [Swift-devel] nightly build failures
In-Reply-To:
References:
Message-ID: <1209821327.2328.2.camel@localhost>

This introduced it:
http://www.ci.uchicago.edu/trac/swift/changeset/1867

And this reverts it:
http://www.ci.uchicago.edu/trac/swift/changeset/1871

On Sat, 2008-05-03 at 08:39 +0000, Ben Clifford wrote:
> In the NMI nightly builds, in swift r1870 cog r1999:
>
> > Fatal: Class not found: org.griphyn.vdl.karajan.lib.HostProperty
>
> I don't see it in r1999 modules/karajan/ on my dev machine.
>

From hategan at mcs.anl.gov Sun May 4 16:30:47 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 04 May 2008 16:30:47 -0500
Subject: [Swift-devel] Re: arrays of structures and building them in SWIFT
In-Reply-To: <32078b1e0805040954q5cf80eeeg965d6145b2c0687e@mail.gmail.com>
References: <32078b1e0805040954q5cf80eeeg965d6145b2c0687e@mail.gmail.com>
Message-ID: <1209936647.4842.7.camel@localhost>

Right. The problem here is that you can't reasonably "assign" file names to data on the fly.

What do your files look like? In other words, what are your input files? (I get the ones in the graph, but you mention an array, so I figured there's more).

Mihael

On Sun, 2008-05-04 at 11:54 -0500, Uri Hasson wrote:
> Hi Mihael,
>
> I'm trying to write an MRAC and so am finalizing up some swift
> routines...
>
> One thing I can't seem to manage is to declare a complex type that
> will contain two types (both of which are files), assign data to that
> type, and then pass that type to a call. I think my problem is that I
> don't know how to set up an array of structures and assign it values
> that are file locations.
>
> If you have a sec, could you advise on this?
> For starters, a graph of my current, inelegant workflow can be seen
> in
> /disks/gpfs/fmri/cnari/swift/projects/uhasson/AFNIflows/SNR/snrgraph1.png
>
> and the script that generates it is:
> /disks/gpfs/fmri/cnari/swift/projects/uhasson/AFNIflows/SNR/AFNIsnrV2.swift
>
> All the procedures I write are very simple: they take *.HEAD and
> *.BRIK file pairs as input (a pair defines a brain dataset), and
> output a *.HEAD and *.BRIK file as output.
>
> Currently, I am stating each HEAD and BRIK file separately as
> arguments, e.g., AFNI_mean (string baseName, file headFile, file
> brikFile ...)
>
> What I want to do is create a type like
> type AFNI_obj{
> file head;
> file brik;
> };
>
> assign the *.HEAD and *.BRIK file to that type and then pass that
> complex type like this: AFNI_mean (string baseName, AFNI_obj t, ...)
>
> I've tried doing this with Sarah, but we couldn't get it to work.
> If I do the csv mapper way, I can do it (AFNIsnrV4.swift in same dir)
> but if I try constructing the filenames on the fly within the script
> (feeble attempts in AFNIsnrV3.swift), it doesn't work.
>
> I think what I don't understand is how to manually set up an array of
> structures similarly to what the csv mapper does.
> If meeting in person is faster, would be very glad to meet.
>
> Any advice,
> Much appreciated.
>
> Uri
>

From benc at hawaga.org.uk Sun May 4 17:05:10 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Sun, 4 May 2008 22:05:10 +0000 (GMT)
Subject: [Swift-devel] Re: arrays of structures and building them in SWIFT
In-Reply-To: <1209936647.4842.7.camel@localhost>
References: <32078b1e0805040954q5cf80eeeg965d6145b2c0687e@mail.gmail.com> <1209936647.4842.7.camel@localhost>
Message-ID:

The v3 code probably has a very similar problem to that expressed in bug 123, which I think is that mapper parameters don't interact well with runtime constructed datasets.

Perhaps this would be doable with a custom mapper that mapped an array inp declared as:

type t { file head; file brik; }
t inp[][];

such that:

inputs/Snnnn.runbyrun.reg_TS_run-.rrrrrr+orig.sssss

is mapped to:

inp[nnn][rrrr].sssss

That is, rather than feeding a list of nnn and rrr and then attempting to construct filenames, do it the other way round - use the presence of files in the input/ directory to cause a data structure to be constructed.

That would be straightforward, I think, if you want to process all files that look like:

inputs/S*.runbyrun.reg_TS_run-*+orig.*

rather than some subset of those.

So one question is: do you want to process all files that look like:
inputs/S*.runbyrun.reg_TS_run-*+orig.*
or do you want to process only a subset? And if so, what is the longer term goal for selecting the subset (you have "05" and 1 hard-coded at the moment, but I guess there's some intention to do otherwise eventually)?

From benc at hawaga.org.uk Sun May 4 17:31:03 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Sun, 4 May 2008 22:31:03 +0000 (GMT)
Subject: [Swift-devel] Re: arrays of structures and building them in SWIFT
In-Reply-To: <32078b1e0805041527n7531f72fh1f8df2391d7d1f93@mail.gmail.com>
References: <32078b1e0805040954q5cf80eeeg965d6145b2c0687e@mail.gmail.com> <1209936647.4842.7.camel@localhost> <32078b1e0805041527n7531f72fh1f8df2391d7d1f93@mail.gmail.com>
Message-ID:

> E.g., if the directory contains
>
> file1.HEAD file1.BRIK, file2.HEAD file2.BRIK
>
> The mapper will map files to my complex type:
> mytype t;
> t[0].head = file1.HEAD
> t[0].brik = file1.BRIK
> t[1].head = file2.HEAD
> t[1].brik = file2.BRIK
> and I will be able to pass this to calls as MyCall(t[0])
> [to the best of my understanding, I can achieve this now with csv mapper,
> but not via any other way unless I write a custom mapper, java?]

You should be able to map this with simple_mapper too. Something like this:

type t { file HEAD; file BRIK; }
t inp[] ;

> But this doesn't take care of the other use of types that I planned on,
> which is using them as a placeholder for results of calls.

This bit I'll think about - similar, but not the same, stuff has been done before.
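The directory-driven idea in this thread (let the files that are present determine the data structure, rather than constructing filenames in the script) can be sketched in Python. This is not the simple_mapper or csv mapper implementation; the function name `map_pairs` and the record layout with lowercase `head` and `brik` keys are assumptions chosen to mirror the `t[0].head` / `t[0].brik` example quoted above.

```python
from collections import defaultdict


def map_pairs(filenames):
    """Group NAME.HEAD / NAME.BRIK files into an array of paired records.

    Returns a list of dicts, each with 'head' and 'brik' keys, ordered by
    the shared stem so that indices are stable between runs. Files with
    other extensions, and stems missing one half of the pair, are skipped.
    """
    by_stem = defaultdict(dict)
    for name in filenames:
        stem, dot, ext = name.rpartition(".")
        if dot and ext in ("HEAD", "BRIK"):
            by_stem[stem][ext.lower()] = name
    return [by_stem[stem] for stem in sorted(by_stem)
            if len(by_stem[stem]) == 2]


if __name__ == "__main__":
    listing = ["file2.BRIK", "file1.HEAD", "file1.BRIK", "file2.HEAD"]
    for index, record in enumerate(map_pairs(listing)):
        print(index, record["head"], record["brik"])
```

Sorting by stem is a design choice of this sketch: it makes element 0 always refer to the same dataset regardless of the order in which the directory listing is returned.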
--

From uhasson at gmail.com Sun May 4 17:27:21 2008
From: uhasson at gmail.com (Uri Hasson)
Date: Sun, 4 May 2008 17:27:21 -0500
Subject: [Swift-devel] Re: arrays of structures and building them in SWIFT
In-Reply-To:
References: <32078b1e0805040954q5cf80eeeg965d6145b2c0687e@mail.gmail.com> <1209936647.4842.7.camel@localhost>
Message-ID: <32078b1e0805041527n7531f72fh1f8df2391d7d1f93@mail.gmail.com>

Hi Ben, All,

I was thinking of something which is along the lines of the solution you mention: i.e., I will have the files I need in a directory, and attempt to map via regexp.

E.g., if the directory contains

file1.HEAD file1.BRIK, file2.HEAD file2.BRIK

The mapper will map files to my complex type:
mytype t;
t[0].head = file1.HEAD
t[0].brik = file1.BRIK
t[1].head = file2.HEAD
t[1].brik = file2.BRIK
and I will be able to pass this to calls as MyCall(t[0])
[to the best of my understanding, I can achieve this now with csv mapper, but not via any other way unless I write a custom mapper, java?]

But this doesn't take care of the other use of types that I planned on, which is using them as a placeholder for results of calls.
I.e., if a call generates a file4.HEAD and file4.BRIK file as output, it would be nice to have the ability to say
mytype result = MyCall(myarray[0])
and then automatically get 'result' populated
result.head == HEAD result of MyCall
result.brik == BRIK result of MyCall

From the viewpoint of the workflow, what this will look like is that the functions took mytype as input and gave mytype as output rather than specific filenames.
I'm new to SWIFT so might be confusing design principles.

Best,
Uri
From benc at hawaga.org.uk Sun May 4 17:43:54 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Sun, 4 May 2008 22:43:54 +0000 (GMT)
Subject: [Swift-devel] replication/recall of jobs from slow queues
In-Reply-To: <1209767914.28036.5.camel@localhost>
References: <1209767914.28036.5.camel@localhost>
Message-ID:

On Fri, 2 May 2008, Mihael Hategan wrote:

> I didn't have time to test this much (given that it's not very easy to
> test), so probably there will be problems.

One way I was thinking of testing on a real site is to set profile keys so that jobs go into a condor pool with a requirement to not run for a specified time after submission (I think that is expressible in the classad language). That should give reproducible at-least-one-resubmission behaviour.

--

From hategan at mcs.anl.gov Sun May 4 17:50:24 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 04 May 2008 17:50:24 -0500
Subject: [Swift-devel] replication/recall of jobs from slow queues
In-Reply-To:
References: <1209767914.28036.5.camel@localhost>
Message-ID: <1209941424.7274.0.camel@localhost>

That makes sense. I think PBS has similar capabilities, but I'm not sure how one would express that in cog/gram.

On Sun, 2008-05-04 at 22:43 +0000, Ben Clifford wrote:
> On Fri, 2 May 2008, Mihael Hategan wrote:
>
> > I didn't have time to test this much (given that it's not very easy to
> > test), so probably there will be problems.
>
> One way I was thinking of testing on a real site is to set profile keys so
> that jobs go into a condor pool with a requirement to not run for a
> specified time after submission (I think that is expressible in the
> classad language). That should give reproducible at-least-one-resubmission
> behaviour.
>

From hategan at mcs.anl.gov Sun May 4 18:10:02 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 04 May 2008 18:10:02 -0500
Subject: [Swift-devel] Re: arrays of structures and building them in SWIFT
In-Reply-To: <32078b1e0805041527n7531f72fh1f8df2391d7d1f93@mail.gmail.com>
References: <32078b1e0805040954q5cf80eeeg965d6145b2c0687e@mail.gmail.com> <1209936647.4842.7.camel@localhost> <32078b1e0805041527n7531f72fh1f8df2391d7d1f93@mail.gmail.com>
Message-ID: <1209942602.7274.8.camel@localhost>

> But this doesn't take care of the other use of types that I planned
> on, which is using them as a placeholder for results of calls.
> I.e., if a call generates a file4.HEAD and file4.BRIK file as output,
> it would be nice to have the ability to say
> mytype result = MyCall(myarray[0])
> and then automatically get 'result' populated
> result.head == HEAD result of MyCall
> result.brik == BRIK result of MyCall

You can do that with the simple mapper:

mytype result ;

You'll then get result.HEAD and result.BRIK.

>
> From the viewpoint of the workflow, what this will look like is that
> the functions took mytype as input and gave mytype as output rather
> than specific filenames.
> I'm new to the SWIFT so might be confusing design principles.
>
> Best,
> Uri
> > >
> > > Uri
> > >
> > >
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >
> >
>

From uhasson at gmail.com Sun May 4 18:30:51 2008
From: uhasson at gmail.com (Uri Hasson)
Date: Sun, 4 May 2008 18:30:51 -0500
Subject: [Swift-devel] Re: arrays of structures and building them in SWIFT
In-Reply-To:
References: <32078b1e0805040954q5cf80eeeg965d6145b2c0687e@mail.gmail.com>
    <1209936647.4842.7.camel@localhost>
    <32078b1e0805041527n7531f72fh1f8df2391d7d1f93@mail.gmail.com>
Message-ID: <32078b1e0805041630r2eb8a906u8d97678608a02d12@mail.gmail.com>

Hi all, thanks for the Sunday help.

I tried the simple mapper, but no go.
The short buggy SWIFT script is here:
http://www.ci.uchicago.edu/~uhasson/AFNIsnrV3.swift

The input directory contains

S05.runbyrun.reg_TS_run-1+orig.BRIK
S05.runbyrun.reg_TS_run-1+orig.HEAD
S05.runbyrun.reg_TS_run-2+orig.BRIK
S05.runbyrun.reg_TS_run-2+orig.HEAD
S05.runbyrun.reg_TS_run-3+orig.BRIK
S05.runbyrun.reg_TS_run-3+orig.HEAD

I hoped that as a result of the simple_mapper,
sruns[0].HEAD==S05.runbyrun.reg_TS_run-1+orig.HEAD
sruns[0].BRIK==S05.runbyrun.reg_TS_run-1+orig.BRIK
(I tried reading via both prefix and pattern methods in the mapper)

However, I get an error:

Execution failed:
        org.griphyn.vdl.mapping.InvalidPathException: Invalid path ([0]) for type AFNI_obj[]
log file: /disks/gpfs/fmri/cnari/swift/projects/uhasson/AFNIflows/SNR/AFNIsnrV3-20080504-1826-37q86081.log

Also, is there any print() or trace() command I could use to see whether
sruns was assigned the correct values?
I tried print(@filenames(sruns[0])),
print(@filename(sruns[0].HEAD))
but am not getting anything meaningful.

Best,
Uri

On Sun, May 4, 2008 at 5:31 PM, Ben Clifford wrote:

>
> > E.g., if the directory contains
> >
> > file1.HEAD file1.BRIK, file2.HEAD file2.BRIK
> >
> > The mapper will map files to my complex type:
> > mytype t;
> > t[0].head = file1.HEAD
> > t[0].brik = file1.BRIK
> > t[1].head = file2.HEAD
> > t[1].brik = file2.BRIK
> > and I will be able to pass this to calls as MyCall(t[0])
> > [to the best of my understanding, I can achieve this now with csv mapper,
> > but not via any other way unless I write a custom mapper, java?]
>
> You should be able to map this with simple_mapper too. Something like
> this:
>
> type t { file HEAD; file BRIK; }
> t inp[] ;
>
> > But this doesn't take care of the other use of types that I planned on,
> > which is using them as a placeholder for results of calls.
>
> This bit I'll think about - similar, but not the same, stuff has been done
> before.
>
> --
>

From benc at hawaga.org.uk Sun May 4 18:36:37 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Sun, 4 May 2008 23:36:37 +0000 (GMT)
Subject: [Swift-devel] Re: arrays of structures and building them in SWIFT
In-Reply-To: <32078b1e0805041630r2eb8a906u8d97678608a02d12@mail.gmail.com>
References: <32078b1e0805040954q5cf80eeeg965d6145b2c0687e@mail.gmail.com>
    <1209936647.4842.7.camel@localhost>
    <32078b1e0805041527n7531f72fh1f8df2391d7d1f93@mail.gmail.com>
    <32078b1e0805041630r2eb8a906u8d97678608a02d12@mail.gmail.com>
Message-ID:

> Also, is there any print() or trace() command I could use to see whether
> sruns was assigned the correct values?
> I tried print(@filenames(sruns[0])),
> print(@filename(sruns[0].HEAD))
> but am not getting anything meaningful.

Using trace rather than print should give better results there.

Though if you're getting an invalid path exception accessing pieces of
sruns, that might happen in such a line too.
--

From benc at hawaga.org.uk Sun May 4 18:53:02 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Sun, 4 May 2008 23:53:02 +0000 (GMT)
Subject: [Swift-devel] Re: arrays of structures and building them in SWIFT
In-Reply-To: <32078b1e0805041630r2eb8a906u8d97678608a02d12@mail.gmail.com>
References: <32078b1e0805040954q5cf80eeeg965d6145b2c0687e@mail.gmail.com>
    <1209936647.4842.7.camel@localhost>
    <32078b1e0805041527n7531f72fh1f8df2391d7d1f93@mail.gmail.com>
    <32078b1e0805041630r2eb8a906u8d97678608a02d12@mail.gmail.com>
Message-ID:

ok try this:

put this in afnimapper and chmod a+x it:

#!/bin/bash

# S05.runbyrun.reg_TS_run-2+orig.HEAD

ls S*.runbyrun.reg_TS_run-*orig* | sed 's/^\(S05.runbyrun.reg_TS_run-\([0-9]*\)+orig.\(.*\)\)$/[\2].\3 \1/'

then map srun like this:

AFNI_obj srun[] ;

You can say something like this:

trace(@filename(srun[3].BRIK));

which will give you this trace:

SwiftScript trace: S05.runbyrun.reg_TS_run-3+orig.BRIK

--

From uhasson at gmail.com Sun May 4 19:18:31 2008
From: uhasson at gmail.com (Uri Hasson)
Date: Sun, 4 May 2008 19:18:31 -0500
Subject: [Swift-devel] Re: arrays of structures and building them in SWIFT
In-Reply-To:
References: <32078b1e0805040954q5cf80eeeg965d6145b2c0687e@mail.gmail.com>
    <1209936647.4842.7.camel@localhost>
    <32078b1e0805041527n7531f72fh1f8df2391d7d1f93@mail.gmail.com>
    <32078b1e0805041630r2eb8a906u8d97678608a02d12@mail.gmail.com>
Message-ID: <32078b1e0805041718i6f37e4b0ic2d3bc5cd6e62ad7@mail.gmail.com>

This solves the problem.
Posted final version of .swift script to
http://www.ci.uchicago.edu/~uhasson/AFNIsnrV3.swift

Many thanks,
Uri

On Sun, May 4, 2008 at 6:53 PM, Ben Clifford wrote:

> ok try this:
>
> put this in afnimapper and chmod a+x it:
>
> #!/bin/bash
>
> # S05.runbyrun.reg_TS_run-2+orig.HEAD
>
> ls S*.runbyrun.reg_TS_run-*orig* | sed
> 's/^\(S05.runbyrun.reg_TS_run-\([0-9]*\)+orig.\(.*\)\)$/[\2].\3 \1/'
>
> then map srun like this:
>
> AFNI_obj srun[] ;
>
> You can say something like this:
>
> trace(@filename(srun[3].BRIK));
>
> which will give you this trace:
> SwiftScript trace: S05.runbyrun.reg_TS_run-3+orig.BRIK
>
> --
>

From benc at hawaga.org.uk Sun May 4 20:35:49 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 5 May 2008 01:35:49 +0000 (GMT)
Subject: [Swift-devel] Re: arrays of structures and building them in SWIFT
In-Reply-To: <32078b1e0805041718i6f37e4b0ic2d3bc5cd6e62ad7@mail.gmail.com>
References: <32078b1e0805040954q5cf80eeeg965d6145b2c0687e@mail.gmail.com>
    <1209936647.4842.7.camel@localhost>
    <32078b1e0805041527n7531f72fh1f8df2391d7d1f93@mail.gmail.com>
    <32078b1e0805041630r2eb8a906u8d97678608a02d12@mail.gmail.com>
    <32078b1e0805041718i6f37e4b0ic2d3bc5cd6e62ad7@mail.gmail.com>
Message-ID:

> This solves the problem.

ok good. You should be able to change the 05 into another array index too,
through a similar looking regexp (or something that is actually
readable/maintainable...)

--

From hategan at mcs.anl.gov Mon May 5 01:51:04 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 05 May 2008 01:51:04 -0500
Subject: [Swift-devel] replication/recall of jobs from slow queues
In-Reply-To: <1209941424.7274.0.camel@localhost>
References: <1209767914.28036.5.camel@localhost>
    <1209941424.7274.0.camel@localhost>
Message-ID: <1209970264.18359.1.camel@localhost>

And then there's the "pbs simulator" I have.
Anyway, the point is it (replication) doesn't work just yet (as one would easily have suspected). Stay tuned. On Sun, 2008-05-04 at 17:50 -0500, Mihael Hategan wrote: > That makes sense. I think PBS has similar capabilities, but I'm not sure > how one would express that in cog/gram. > > On Sun, 2008-05-04 at 22:43 +0000, Ben Clifford wrote: > > On Fri, 2 May 2008, Mihael Hategan wrote: > > > > > I didn't have time to test this much (given that it's not very easy to > > > test), so probably there will be problems. > > > > One way I was thinking of testing on a real site is to set profile keys so > > that jobs go into a condor pool with a requirement to not run for a > > specified time after submission (I think that is expressible in the > > classad language). That should give reproducible at-least-one-resubmission > > behaviour. > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From uhasson at gmail.com Mon May 5 20:59:51 2008 From: uhasson at gmail.com (Uri Hasson) Date: Mon, 5 May 2008 20:59:51 -0500 Subject: [Swift-devel] mapping files from directories Message-ID: <32078b1e0805051859i6bbd5d2dna0565bc013ba9d36@mail.gmail.com> Hello all, I seem to be able to map files when no directory is stated as part of the prefix, but this mapping fails when path is included. Works: AFNIobj inp; Breaks: AFNIobj inp; The first succeeds [SwiftScript trace: ts.005.1+orig.BRIK SwiftScript trace: ts.005.1+orig.HEAD] and the second gives: java.lang.RuntimeException: Data set initialization failed for org.griphyn.vdl.mapping.RootDataNode identifier tag:benc at ci.uchicago.edu,2008:swift:dataset:20080505-2056-s3bk01i8:720000000047 with no value at dataset=inp (closed). Missing required field: HEAD Only difference is file location. 
Because I intend to map files existing in various directories in the file system (or even in an 'inputs' dir underneath the current working swift dir), understanding this would be helpful. Thanks, Uri On Sun, May 4, 2008 at 8:35 PM, Ben Clifford wrote: > > > This solves the problem. > > ok good. You should be able to change the 05 into another array index too, > through a similar looking regexp (or something that is actually > readable/maintainable...) > > -- > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Mon May 5 21:01:39 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 05 May 2008 21:01:39 -0500 Subject: [Swift-devel] Re: mapping files from directories In-Reply-To: <32078b1e0805051859i6bbd5d2dna0565bc013ba9d36@mail.gmail.com> References: <32078b1e0805051859i6bbd5d2dna0565bc013ba9d36@mail.gmail.com> Message-ID: <1210039299.2395.0.camel@localhost> I think you need to use the location parameter: inp ; On Mon, 2008-05-05 at 20:59 -0500, Uri Hasson wrote: > Hello all, > > I seem to be able to map files when no directory is stated as part of > the prefix, but this mapping fails when path is included. > > > Works: > AFNIobj > inp; > > Breaks: > AFNIobj > inp; > > The first succeeds [SwiftScript trace: ts.005.1+orig.BRIK SwiftScript > trace: ts.005.1+orig.HEAD] > and the second gives: > java.lang.RuntimeException: Data set initialization failed for > org.griphyn.vdl.mapping.RootDataNode identifier > tag:benc at ci.uchicago.edu,2008:swift:dataset:20080505-2056-s3bk01i8:720000000047 with no value at dataset=inp (closed). Missing required field: HEAD > > Only difference is file location. > > Because I intend to map files existing in various directories in the > file system (or even in an 'inputs' dir underneath the current working > swift dir), understanding this would be helpful. > > Thanks, > Uri > > > On Sun, May 4, 2008 at 8:35 PM, Ben Clifford > wrote: > > > This solves the problem. 
>
>         ok good. You should be able to change the 05 into another
>         array index too, through a similar looking regexp (or
>         something that is actually readable/maintainable...)
>
>         --
>
>

From benc at hawaga.org.uk Tue May 6 05:46:52 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 6 May 2008 10:46:52 +0000 (GMT)
Subject: [Swift-devel] provider-condor
Message-ID:

I spent a few minutes seeing if provider-condor would work with swift
out-of-the-box. It doesn't. No real high priority need, but I thought it
might be interesting to know if it works or not.

Caused by:
        org.globus.cog.karajan.workflow.futures.FutureVariableArguments
Caused by:
        java.lang.ClassCastException: org.globus.cog.karajan.workflow.futures.FutureVariableArguments
        at org.globus.cog.abstraction.impl.execution.condor.DescriptionFileGenerator.constructDescriptionFile(DescriptionFileGenerator.java:90)
        at org.globus.cog.abstraction.impl.execution.condor.DescriptionFileGenerator.generate(DescriptionFileGenerator.java:26)
        at org.globus.cog.abstraction.impl.execution.condor.JobSubmissionTaskHandler.run(JobSubmissionTaskHandler.java:96)
        at java.lang.Thread.run(Thread.java:595)

--

From hategan at mcs.anl.gov Tue May 6 09:21:56 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 06 May 2008 09:21:56 -0500
Subject: [Swift-devel] provider-condor
In-Reply-To:
References:
Message-ID: <1210083717.10355.0.camel@localhost>

That ain't right. A karajan future shouldn't be able to make it into a
task description.

On Tue, 2008-05-06 at 10:46 +0000, Ben Clifford wrote:
> I spent a few minutes seeing if provider-condor would work with swift
> out-of-the-box. It doesn't. No real high priority need, but I thought it
> might be interesting to know if it works or not.
>
> Caused by:
>         org.globus.cog.karajan.workflow.futures.FutureVariableArguments
> Caused by:
>         java.lang.ClassCastException: org.globus.cog.karajan.workflow.futures.FutureVariableArguments
>         at org.globus.cog.abstraction.impl.execution.condor.DescriptionFileGenerator.constructDescriptionFile(DescriptionFileGenerator.java:90)
>         at org.globus.cog.abstraction.impl.execution.condor.DescriptionFileGenerator.generate(DescriptionFileGenerator.java:26)
>         at org.globus.cog.abstraction.impl.execution.condor.JobSubmissionTaskHandler.run(JobSubmissionTaskHandler.java:96)
>         at java.lang.Thread.run(Thread.java:595)
>
>

From iraicu at cs.uchicago.edu Tue May 6 13:08:29 2008
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Tue, 06 May 2008 13:08:29 -0500
Subject: [Swift-devel] [Fwd: 4th IEEE International Conference on eScience]
Message-ID: <48209E9D.5010302@cs.uchicago.edu>

Here is a good conference to showcase scientific applications, and it's
relatively local, in Indianapolis, Indiana!

Ioan

-------- Original Message --------
Subject: 4th IEEE International Conference on eScience
Date: Tue, 6 May 2008 11:00:09 -0700
From: news at teragrid.org

A new message has been posted to TeraGrid News.

Categories: Science Gateways

Organizing committees of the 4th International IEEE Computer Society
Technical Committee on Scalable Computing eScience 2008 Conference are now
accepting papers and proposals for tutorials; posters, exhibits, and demos;
and workshops and special sessions. The conference is being held in
partnership with the Microsoft Research eScience Workshop and is hosted by
Indiana University.
Conference Dates: December 7-12, 2008 Conference Location: University Place Conference Center, Indiana University/Purdue University (IUPUI) Campus, Indianapolis, Indiana Submission Deadlines: * Papers & Tutorials: July 20, 2008 * Posters, Exhibits, and Demos: September 14, 2008 * Workshops and Special Sessions: June 20, 2008 For submission guidelines and more information visit the conference Web site: http://escience2008.iu.edu . Topics of interest cover applications and technologies related to e-Science and grid and cloud computing. They include, but are not limited to, the following: * Application development environments * Autonomic, real-time, and self-organizing grids * Cloud computing and storage * Collaborative science models and techniques * Enabling technologies: Internet and Web services * e-Science for applications including physics, biology, astronomy, chemistry, finance, engineering, and the humanities * Grid economy and business models * Problem-solving environments * Programming paradigms and models * Resource management and scheduling * Security challenges for grids and e-Science * Sensor networks and environmental observatories * Service-oriented grid architectures * Virtual instruments and data access management * Virtualization for technical computing * Web 2.0 technology and services for e-Science Sponsors Include: * IEEE Computer Society Committee on Scalable Computing * Microsoft Research * Pervasive Technology Labs at Indiana University * Indiana University School of Informatics * Louisiana State University Center for Computation and Technology Conference Leadership: General Chairs Geoffrey Fox, Indiana University, United States Dennis Gannon, Indiana University, United States Program Chair Anne Trefethen, University of Oxford, United Kingdom Program Vice-Chair David Wallom, University of Oxford, United Kingdom Workshops Chair Ken Chiu, State University of New York, United States Tutorials Chair Krishna Madhavan, Clemson University, United 
States Exhibits, Demos, and Posters Chair Daniel S. Katz, Louisiana State University, United States Exhibits, Demos, and Posters Vice-Chair Shantenu Jha, Louisiana State University, United States Education, Diversity, and Broadening Participation Chair Alex Ramirez, Hispanic Association of Colleges and Universities Communication and Outreach Chair Daphne Siefert-Herron, Indiana University, United States Microsoft e-Science Conference Chair Kristin Tolle, Microsoft, United States Conference Manager Therese Miller, Indiana University, United States Posted on 06-MAY-2008 10:58 US Pacific Time by Nancy Wilkins-Diehr _______________________________________________________________ This message was generated by TG News v2.0. You can also view it here . To view RSS options, unsubscribe or change the categories to which you are subscribed, please visit Subscription Management . -- =================================================== Ioan Raicu Ph.D. Candidate =================================================== Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 =================================================== Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dev.globus.org/wiki/Incubator/Falkon http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page =================================================== =================================================== -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From iraicu at cs.uchicago.edu Tue May 6 12:55:16 2008 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Tue, 06 May 2008 12:55:16 -0500 Subject: [Swift-devel] [Fwd: Re: [Dsl-seminar] "Towards Loosely-Coupled Programming on Petascale Systems"] Message-ID: <48209B84.8070209@cs.uchicago.edu> Hi all, I just thought this talk might be interesting to the Swift and Falkon community, feel free to come by today at 4:30PM in RI405 at UChicago for the talk. Also, sorry for the short notice, especially if you aren't on the UChicago campus. Ioan -------- Original Message -------- Subject: Re: [Dsl-seminar] tomorrow's seminar on "Towards Loosely-Coupled Programming on Petascale Systems" Date: Tue, 06 May 2008 11:36:38 -0500 From: Ioan Raicu Reply-To: iraicu at cs.uchicago.edu Organization: University of Chicago, Computer Science Department To: dsl-seminar at cs.uchicago.edu References: <481F399A.1030204 at cs.uchicago.edu> Hi all, This is just a friendly reminder for today's seminar, that will take place at 4:30PM in RI405. See you at the seminar, Ioan PS: We'll be having some pizza today, instead of the usual chips and cookies :) Ioan Raicu wrote: > Hi all, > Tomorrow Zhao will present some of our recent work on running > large-scale loosely-coupled applications on the latest IBM BlueGene/P > and the SiCortex systems. This is work we just submitted for review to > SC08. > > The talk abstract is: > We have extended the Falkon lightweight task execution framework to make > loosely coupled programming on petascale systems a practical and useful > programming model. In this work we study and measure the performance > factors involved in applying this approach to enable the utilization of > petascale systems by a broader user community, and with greater ease. > Our work enables the execution of highly parallel computations composed > of loosely coupled serial jobs with no modifications to the respective > applications. 
> This approach allows new, and potentially far
> larger, classes of application to leverage systems such as the IBM Blue
> Gene/P supercomputer and similar emerging petascale architectures. We
> present here the challenges of I/O performance encountered in making
> this model practical, and show results using both micro-benchmarks and
> real applications on two large-scale systems, the BG/P and the SiCortex
> SC5832. Our benchmarks show that we can scale to thousands of processors
> with high efficiency, and can achieve thousands of tasks/sec sustained
> execution rates for parallel workloads of ordinary serial applications.
> We measured applications from two domains, economic energy modeling and
> molecular dynamics. Both show excellent speedup and efficiency as they
> scale to thousands of processors.
>
> See you tomorrow (Tuesday) at 4:30PM in RI405,
> Ioan
>
>

--
===================================================
Ioan Raicu
Ph.D. Candidate
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dev.globus.org/wiki/Incubator/Falkon
       http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================
===================================================

_______________________________________________
DSL-seminar mailing list
DSL-seminar at mailman.cs.uchicago.edu
https://mailman.cs.uchicago.edu/mailman/listinfo/dsl-seminar

--
===================================================
Ioan Raicu
Ph.D. Candidate
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E.
58th Street, Ryerson Hall Chicago, IL 60637 =================================================== Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dev.globus.org/wiki/Incubator/Falkon http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page =================================================== =================================================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From bugzilla-daemon at mcs.anl.gov Wed May 7 07:27:49 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Wed, 7 May 2008 07:27:49 -0500 (CDT) Subject: [Swift-devel] [Bug 129] ENV profiles using GRAM2 cause console output of environment variable value In-Reply-To: Message-ID: <20080507122749.89027164BB@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=129 benc at hawaga.org.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from benc at hawaga.org.uk 2008-05-07 07:27 ------- fixed in cog r2001 -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at mcs.anl.gov Wed May 7 09:05:19 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Wed, 7 May 2008 09:05:19 -0500 (CDT) Subject: [Swift-devel] [Bug 138] spaces in filenames sometimes don't work right with GRAM2+Condor In-Reply-To: Message-ID: <20080507140519.69360164CF@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=138 ------- Comment #2 from benc at hawaga.org.uk 2008-05-07 09:05 ------- On CI GPFS: /disks/gpfs/swift/benc/142-space-and-quotes-20080507-1437-yyi0u10g is run with condor 142-space-and-quotes-20080507-1422-u55vj452 is run (successfully) with fork In the job directories run by Condor, three output files were created: 142-space-and-quotes-20080507-1437-yyi0u10g/jobs/3/touch-3gia7bsi/142-space-and-quotes. 142-space-and-quotes-20080507-1437-yyi0u10g/jobs/3/touch-3gia7bsi/space 142-space-and-quotes-20080507-1437-yyi0u10g/jobs/3/touch-3gia7bsi/.out instead of a single output file that is what should have appeared: 142-space-and-quotes-20080507-1422-u55vj452/shared/142-space-and-quotes. space .out Not clear why that fails in that way, but test 141, which attempts to create a file "141-space-in-filename.space here.out" succeeds -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. From bugzilla-daemon at mcs.anl.gov Wed May 7 09:11:02 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Wed, 7 May 2008 09:11:02 -0500 (CDT) Subject: [Swift-devel] [Bug 138] spaces in filenames sometimes don't work right with GRAM2+Condor In-Reply-To: Message-ID: <20080507141102.1F4F6164CF@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=138 ------- Comment #3 from benc at hawaga.org.uk 2008-05-07 09:11 ------- I see what a difference between the two tests is. 
141 creates a file with a space in the name by directing stdout to that
file. A stdout filename with spaces in it appears to be passed ok.

142 creates files with spaces in the name by passing the name as a
commandline parameter to touch.

--
Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You reported the bug, or are watching the reporter.

From bugzilla-daemon at mcs.anl.gov Wed May 7 09:12:49 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Wed, 7 May 2008 09:12:49 -0500 (CDT)
Subject: [Swift-devel] [Bug 137] undeclared procedures are not detected until runtime
In-Reply-To:
Message-ID: <20080507141249.85464164CF@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=137

------- Comment #1 from benc at hawaga.org.uk 2008-05-07 09:12 -------
I suggested to Milena, my Google Summer of Code student, that she look at
this.

--
Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You reported the bug, or are watching the reporter.
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at mcs.anl.gov Wed May 7 10:37:59 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Wed, 7 May 2008 10:37:59 -0500 (CDT)
Subject: [Swift-devel] [Bug 138] spaces in filenames sometimes don't work right with GRAM2+Condor
In-Reply-To:
Message-ID: <20080507153759.D0AC9164CF@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=138

------- Comment #4 from benc at hawaga.org.uk 2008-05-07 10:37 -------
I compared two info files:

142-space-and-quotes-20080507-1422-u55vj452/info/m/touch-mugn6bsi-info
for fork and
142-space-and-quotes-20080507-1437-yyi0u10g/info/3/touch-3gia7bsi-info
for condor

using diff.
In the header, the ARGS and OUTF variables as reported by the info logging have the same (byte-wise) value in both (though that doesn't mean that they actually have the same value - some stripping of something could be happening in the info logging) -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. From bugzilla-daemon at mcs.anl.gov Wed May 7 10:45:30 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Wed, 7 May 2008 10:45:30 -0500 (CDT) Subject: [Swift-devel] [Bug 139] New: simple_mapper variable/optional padding Message-ID: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=139 Summary: simple_mapper variable/optional padding Product: Swift Version: unspecified Platform: Macintosh OS/Version: Mac OS Status: NEW Severity: normal Priority: P2 Component: General AssignedTo: benc at hawaga.org.uk ReportedBy: benc at hawaga.org.uk Allow padding of array index fields - rather than fixing at 4, allow this to be specified or turned off with a parameter. For example, padding=4 to get padding to 4 digits (the present behaviour), or padding=0 to disable padding. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. You are the assignee for the bug, or are watching the assignee. 
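
A rough illustration of the padding behaviour requested in bug 139 above. This is only a shell sketch of the idea, not the simple_mapper code itself; the function name pad_index and the "data." / ".txt" file name parts are made up for the example. It assumes plain decimal indices, with a width of 0 meaning "no padding".

```shell
#!/bin/sh
# Hypothetical helper (not part of Swift): format an array index the way a
# "padding" parameter might, e.g. padding=4 gives 0007 and padding=0 gives 7.
pad_index() {
    index=$1
    width=$2
    if [ "$width" -gt 0 ]; then
        # Zero-pad the index to the requested number of digits.
        printf "%0${width}d\n" "$index"
    else
        # padding=0: emit the index unchanged.
        printf '%d\n' "$index"
    fi
}

# Example file names built from a made-up prefix and suffix.
echo "data.$(pad_index 7 4).txt"
echo "data.$(pad_index 7 0).txt"
```

With padding=4 the first name comes out with a four-digit index (the present fixed behaviour described in the bug); with padding=0 the index is used as-is. An index that already has more digits than the width is left untouched.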

From bugzilla-daemon at mcs.anl.gov Wed May 7 10:45:42 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Wed, 7 May 2008 10:45:42 -0500 (CDT)
Subject: [Swift-devel] [Bug 139] simple_mapper variable/optional padding
In-Reply-To:
Message-ID: <20080507154542.02113164CF@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=139

benc at hawaga.org.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

--
Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You reported the bug, or are watching the reporter.
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at mcs.anl.gov Wed May 7 11:51:02 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Wed, 7 May 2008 11:51:02 -0500 (CDT)
Subject: [Swift-devel] [Bug 97] SwiftScript parser seems to get upset with mappers that have no parameters.
In-Reply-To:
Message-ID: <20080507165102.6578B164BB@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=97

benc at hawaga.org.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #1 from benc at hawaga.org.uk 2008-05-07 11:51 -------
I just tested this again, and it appears to work with r1831.

Rather more mysteriously, there's a compiler test for this,
tests/language/016-mapper-noparam.swift, that appears to have been around
since r597. So it's not clear what was going on when this bug was submitted.

--
Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You reported the bug, or are watching the reporter.
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at mcs.anl.gov Wed May 7 14:02:54 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Wed, 7 May 2008 14:02:54 -0500 (CDT) Subject: [Swift-devel] [Bug 139] simple_mapper variable/optional padding In-Reply-To: Message-ID: <20080507190254.2A013164BB@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=139 benc at hawaga.org.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED ------- Comment #1 from benc at hawaga.org.uk 2008-05-07 14:02 ------- r1939 introduces a padding parameter, but it must be a string at the moment - something doesn't work right in the argument processing so it can't take integers. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at mcs.anl.gov Wed May 7 14:07:36 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Wed, 7 May 2008 14:07:36 -0500 (CDT) Subject: [Swift-devel] [Bug 116] simple_mapper handling of numbered files in arrays broken In-Reply-To: Message-ID: <20080507190736.9F93F164BB@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=116 ------- Comment #2 from benc at hawaga.org.uk 2008-05-07 14:07 ------- wrt Tibi's comment (which is not the same as the body of this bug) - see bug 139, commit r1939 -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. 
From benc at hawaga.org.uk Wed May 7 16:06:57 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 7 May 2008 21:06:57 +0000 (GMT) Subject: [Swift-devel] Attempted to close nonexistent channel buffers Message-ID: Recently I've seen this message: > Execution failed: > Attempted to close nonexistent channel buffers on occasion, when a procedure is invoked with the wrong number of parameters. I hope that this will get caught at compile time later on when some more compile-time type-checking work is done; but here's a note on it for the archives and Google. -- From hategan at mcs.anl.gov Wed May 7 18:24:51 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 07 May 2008 18:24:51 -0500 Subject: [Swift-devel] Attempted to close nonexistent channel buffers In-Reply-To: References: Message-ID: <1210202691.21492.11.camel@localhost> On Wed, 2008-05-07 at 21:06 +0000, Ben Clifford wrote: > Recently I've seen this message: > > > Execution failed: > > Attempted to close nonexistent channel buffers > > on occasion, when a procedure is invoked with the wrong number of > parameters. Noted. > > I hope that this will get caught at compile time later on when some more > compile-time type-checking work is done; but here's a note on it for the > archives and Google. 
> From benc at hawaga.org.uk Wed May 7 18:35:56 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 7 May 2008 23:35:56 +0000 (GMT) Subject: [Swift-devel] Cannot parse the given RSL Message-ID: Submitting through GRAM2, the following is happening when submitting (e.g. to fletch-fork or tguc-pbs), for both me and Mike Kubal: 2008-05-08 00:33:08,153+0100 DEBUG vdl:execute2 APPLICATION_EXCEPTION jobid=touch-hp43vbsi - Application exception: Cannot parse the given RSL vdl:execute @ vdl-int.k, line: 390 sys:sequential @ vdl-int.k, line: 382 sys:try @ vdl-int.k, line: 381 task:allocatehost @ vdl-int.k, line: 360 vdl:execute2 @ execute-default.k, line: 23 sys:parallelfor @ execute-default.k, line: 21 sys:restartonerror @ execute-default.k, line: 17 [...] This has been introduced in the past 12 hours I think (though I'm not entirely sure - I don't have full automated regression tests for GRAM running). -- From benc at hawaga.org.uk Wed May 7 19:02:17 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 8 May 2008 00:02:17 +0000 (GMT) Subject: [Swift-devel] gram4.2 and swift Message-ID: Earlier I talked to Martin a bit about submitting to GRAM4.2. It sounds like my previous understanding is true, that the gram4.0 and gram4.2 client code is not classpath-compatible - there are classes with the same name in both client codebases whose contents differ, one version for submitting to gram4.0 and the other for submitting to gram4.2. Classloader fun awaits! -- From tfreeman at mcs.anl.gov Wed May 7 19:13:22 2008 From: tfreeman at mcs.anl.gov (Tim Freeman) Date: Wed, 7 May 2008 19:13:22 -0500 Subject: [Swift-devel] gram4.2 and swift In-Reply-To: References: Message-ID: <20080507191322.19c6867b.tfreeman@mcs.anl.gov> On Thu, 8 May 2008 00:02:17 +0000 (GMT) Ben Clifford wrote: > > Earlier I talked to Martin a bit about submitting to GRAM4.2. 
> > It sounds like my previous understanding is true, that the gram4.0 and > gram4.2 client code is not classpath-compatible - there are classes with > the same name in both client codebases that are different between the two > client codebases, in one for submitting to gram4.0 and in another for > submitting to gram4.2 > > Classloader fun awaits! Classloaders are fun... but maybe this will "just work" for the situation? http://code.google.com/p/jarjar/ Tim > > -- > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From hategan at mcs.anl.gov Wed May 7 19:28:21 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 07 May 2008 19:28:21 -0500 Subject: [Swift-devel] gram4.2 and swift In-Reply-To: References: Message-ID: <1210206501.26793.2.camel@localhost> On Thu, 2008-05-08 at 00:02 +0000, Ben Clifford wrote: > Earlier I talked to Martin a bit about submitting to GRAM4.2. > > It sounds like my previous understanding is true, that the gram4.0 and > gram4.2 client code is not classpath-compatible - there are classes with > the same name in both client codebases that are different between the two > client codebases, in one for submitting to gram4.0 and in another for > submitting to gram4.2 > > Classloader fun awaits! Yeah. That was fun the first time (gt3.0, gt3.2, gt4). There's also the option of compile-time selection, though that would prevent one from having both 4.0 and 4.2 services being used in the same run. > From benc at hawaga.org.uk Wed May 7 19:31:26 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 8 May 2008 00:31:26 +0000 (GMT) Subject: [Swift-devel] tests/sites/ Message-ID: I just did some reconfiguration of the tests/sites/ site tests (in r1940), so that any swift developer who has access to OSG and CI resources and TeraGrid DAC TG-CCR080002N (the Swift development DAC) should be able to run them. 
In practice that means me and mihael, rather than just me. What I did was make mode +t working directories on each site, usually under my home directory, and set each site test to point there. So to run: 1. put the Swift to be tested on your path 2. make a proxy 3. cd tests/sites 4. ./run-all 5. go for a two hour hike or otherwise occupy self 6. return to find list of sites which failed on stdout, at end of run. At the moment, it doesn't keep logs for sites because of the way it uses the other bits of the test suite, but you can run an individual site by typing: ./run-site tguc-fork-gram4.xml (for example) which will leave all the logs in ../language-behaviour/ -- From benc at hawaga.org.uk Wed May 7 19:53:01 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 8 May 2008 00:53:01 +0000 (GMT) Subject: [Swift-devel] gram4.2 and swift In-Reply-To: <1210206501.26793.2.camel@localhost> References: <1210206501.26793.2.camel@localhost> Message-ID: On Wed, 7 May 2008, Mihael Hategan wrote: > There's also the option of compile-time selection, though that would > prevent one from having both 4.0 and 4.2 services being used in the same > run. That is undesirable to me - it puts a flag-day in place, that being the day we choose to favour 4.2 over 4.0 in our releases. On the other hand, I would guess it is unlikely that 4.2 will see appreciable deployment in production settings in the immediate future, so this isn't something that needs sorting right away. 
-- From hategan at mcs.anl.gov Wed May 7 20:21:08 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 07 May 2008 20:21:08 -0500 Subject: [Swift-devel] gram4.2 and swift In-Reply-To: References: <1210206501.26793.2.camel@localhost> Message-ID: <1210209668.27373.3.camel@localhost> On Thu, 2008-05-08 at 00:53 +0000, Ben Clifford wrote: > On Wed, 7 May 2008, Mihael Hategan wrote: > > > There's also the option of compile-time selection, though that would > > prevent one from having both 4.0 and 4.2 services being used in the same > > run. > > That is undesirable to me - it puts a flag-day in place, that being the > day we choose to favour 4.2 over 4.0 in our releases. Not necessarily. We distribute both. We let the users choose, depending on their services. Of course, I disliked this when it had to be done for gt3X, and I equally dislike it now, so we'll probably go through the buggy and problematic path of funny class loader stuff. > > On the other hand, I would guess it is unlikely that 4.2 will see > appreciable deployment in production settings in the immediate future, so > this isn't something that needs sorting right away. > From hategan at mcs.anl.gov Wed May 7 20:23:36 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 07 May 2008 20:23:36 -0500 Subject: [Swift-devel] gram4.2 and swift In-Reply-To: <20080507191322.19c6867b.tfreeman@mcs.anl.gov> References: <20080507191322.19c6867b.tfreeman@mcs.anl.gov> Message-ID: <1210209816.27373.7.camel@localhost> On Wed, 2008-05-07 at 19:13 -0500, Tim Freeman wrote: > On Thu, 8 May 2008 00:02:17 +0000 (GMT) > Ben Clifford wrote: > > > > > Earlier I talked to Martin a bit about submitting to GRAM4.2. 
> > > > It sounds like my previous understanding is true, that the gram4.0 and > > gram4.2 client code is not classpath-compatible - there are classes with > > the same name in both client codebases that are different between the two > > client codebases, in one for submitting to gram4.0 and in another for > > submitting to gram4.2 > > > > Classloader fun awaits! > > Classloaders are fun... but maybe this will "just work" for the situation? > > http://code.google.com/p/jarjar/ Probably not. We want both 4.0 and 4.2 to be in the (a?) classpath at the same time. Also, some signed security libraries won't be very happy about being repacked. > > Tim > > > > > > -- > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From hategan at mcs.anl.gov Wed May 7 20:28:12 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 07 May 2008 20:28:12 -0500 Subject: [Swift-devel] Cannot parse the given RSL In-Reply-To: References: Message-ID: <1210210092.27373.8.camel@localhost> On Wed, 2008-05-07 at 23:35 +0000, Ben Clifford wrote: > Submitting through GRAM2, the following is happening when submitting (eg > to fletch-fork or tguc-pbs) both to me and mike kubal: > > 2008-05-08 00:33:08,153+0100 DEBUG vdl:execute2 APPLICATION_EXCEPTION > jobid=touch-hp43vbsi - Application exception: Cannot parse the given RSL > vdl:execute @ vdl-int.k, line: 390 > sys:sequential @ vdl-int.k, line: 382 > sys:try @ vdl-int.k, line: 381 > task:allocatehost @ vdl-int.k, line: 360 > vdl:execute2 @ execute-default.k, line: 23 > sys:parallelfor @ execute-default.k, line: 21 > sys:restartonerror @ execute-default.k, line: 17 > [...] 
> > This has been introduced in the past 12 hours I think (though I'm entirely > sure I don't quite see what could have caused it. Do you have a full stack trace? > - I don't have full automated regression tests for GRAM running). > From hategan at mcs.anl.gov Wed May 7 23:05:51 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 07 May 2008 23:05:51 -0500 Subject: [Swift-devel] replication/recall of jobs from slow queues In-Reply-To: <1209970264.18359.1.camel@localhost> References: <1209767914.28036.5.camel@localhost> <1209941424.7274.0.camel@localhost> <1209970264.18359.1.camel@localhost> Message-ID: <1210219551.10331.2.camel@localhost> Ok. Second attempt. swift r1944, cog r2003. This time it got some testing. There is still some trouble and that is a possible race condition between the time a job becomes active and its replicas are canceled. It may very well happen that if canceling takes a sufficient amount of time, more than one job will complete causing who knows what. On Mon, 2008-05-05 at 01:51 -0500, Mihael Hategan wrote: > And then there's the "pbs simulator" I have. Anyway, the point is it > (replication) doesn't work just yet (as one would easily have > suspected). Stay tuned. > > On Sun, 2008-05-04 at 17:50 -0500, Mihael Hategan wrote: > > That makes sense. I think PBS has similar capabilities, but I'm not sure > > how one would express that in cog/gram. > > > > On Sun, 2008-05-04 at 22:43 +0000, Ben Clifford wrote: > > > On Fri, 2 May 2008, Mihael Hategan wrote: > > > > > > > I didn't have time to test this much (given that it's not very easy to > > > > test), so probably there will be problems. > > > > > > One way I was thinking of testing on a real site is to set profile keys so > > > that jobs go into a condor pool with a requirement to not run for a > > > specified time after submission (I think that is expressible in the > > > classad language). That should give reproducible at-least-one-resubmission > > > behaviour. 
> > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From bugzilla-daemon at mcs.anl.gov Thu May 8 03:50:03 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Thu, 8 May 2008 03:50:03 -0500 (CDT) Subject: [Swift-devel] [Bug 138] spaces in filenames sometimes don't work right with GRAM2+Condor In-Reply-To: Message-ID: <20080508085003.B5F0A164BB@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=138 ------- Comment #5 from hategan at mcs.anl.gov 2008-05-08 03:50 ------- Presumably the eternal condor problem with spaces. I've added a line in the wrapper to log argc. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. From bugzilla-daemon at mcs.anl.gov Thu May 8 05:13:40 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Thu, 8 May 2008 05:13:40 -0500 (CDT) Subject: [Swift-devel] [Bug 134] Unicode in source text string literals not passed through In-Reply-To: Message-ID: <20080508101340.57359164CF@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=134 ------- Comment #4 from hategan at mcs.anl.gov 2008-05-08 05:13 ------- I think there are a few more problems there. file 入 <"in.txt">; file out <"out.txt">; out = echo(入); Produces: ... 
Caused by: line 9:6: unexpected char: 0xE5 at org.globus.swift.parser.SwiftScriptLexer.nextToken(SwiftScriptLexer.java:284) at antlr.TokenBuffer.fill(TokenBuffer.java:69) at antlr.TokenBuffer.LA(TokenBuffer.java:80) at antlr.LLkParser.LA(LLkParser.java:52) at org.globus.swift.parser.SwiftScriptParser.topLevelStatement(SwiftScriptParser.java:190) at org.globus.swift.parser.SwiftScriptParser.program(SwiftScriptParser.java:107) at org.griphyn.vdl.toolkit.VDLt2VDLx.compile(VDLt2VDLx.java:63) ... 3 more -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at mcs.anl.gov Thu May 8 05:15:51 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Thu, 8 May 2008 05:15:51 -0500 (CDT) Subject: [Swift-devel] [Bug 134] Unicode in source text string literals not passed through In-Reply-To: Message-ID: <20080508101551.23D8C164CF@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=134 ------- Comment #5 from hategan at mcs.anl.gov 2008-05-08 05:15 ------- Seems like Bugzilla has some problems, too. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. You are the assignee for the bug, or are watching the assignee. 
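One possible reading of the 0xE5 in the lexer error from comment #4, offered as an assumption rather than a diagnosis: 0xE5 is the first byte of the UTF-8 encoding of the identifier character used in that test, so the lexer may be seeing the source one byte at a time instead of as decoded UTF-8. The Python sketch below only illustrates that byte-level effect. misdecode is a hypothetical helper, and Latin-1 is an assumed stand-in for whatever single-byte decoding is actually in play.

```python
def misdecode(text, wrong_encoding="latin-1"):
    """Encode text as UTF-8, then decode the bytes with a single-byte charset.

    Hypothetical helper: it mimics a reader that treats each UTF-8 byte
    as one character, which is an assumption about the failure mode.
    """
    return text.encode("utf-8").decode(wrong_encoding)


# U+5165 is the character used as a variable name in comment #4.
garbled = misdecode("\u5165")
first_code_point = ord(garbled[0])

if __name__ == "__main__":
    print(hex(first_code_point), len(garbled))
```

If this reading is right, one three-byte character reaches the lexer as three separate characters, the first of which it rejects.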
From bugzilla-daemon at mcs.anl.gov Thu May 8 05:22:26 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Thu, 8 May 2008 05:22:26 -0500 (CDT) Subject: [Swift-devel] [Bug 134] Unicode in source text string literals not passed through In-Reply-To: Message-ID: <20080508102226.E2481164BB@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=134 ------- Comment #6 from benc at hawaga.org.uk 2008-05-08 05:22 ------- In token names, non-ASCII will encounter two problems at least - first the same encoding problem as for literal strings; and secondly definition of tokens, which is a fairly restrictive listing of which characters are allowed. Identifier-wise the rules used in java might be something to aim for: http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html#40625 (and yes bugzilla has problems - the string I typed in the original report appears (for me) as "?????" not the elementary japanese. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. You are the assignee for the bug, or are watching the assignee. From benc at hawaga.org.uk Thu May 8 05:27:55 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 8 May 2008 10:27:55 +0000 (GMT) Subject: [Swift-devel] Cannot parse the given RSL In-Reply-To: <1210210092.27373.8.camel@localhost> References: <1210210092.27373.8.camel@localhost> Message-ID: On Wed, 7 May 2008, Mihael Hategan wrote: > I don't quite see what could have caused it. Do you have a full stack > trace? 
Here's a full log file: ci:/home/benc/public_html/tmp/061-cattwo-20080508-1123-xxmza6ad.log -- From bugzilla-daemon at mcs.anl.gov Thu May 8 09:10:41 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Thu, 8 May 2008 09:10:41 -0500 (CDT) Subject: [Swift-devel] [Bug 140] New: GRAM4 execution with incorrect local host address detection fails with hang rather than error. Message-ID: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=140 Summary: GRAM4 execution with incorrect local host address detection fails with hang rather than error. Product: Swift Version: unspecified Platform: Macintosh OS/Version: Mac OS Status: NEW Severity: normal Priority: P2 Component: General AssignedTo: hategan at mcs.anl.gov ReportedBy: benc at hawaga.org.uk When a GRAM4 job submission requests that status notifications go to the wrong place (e.g. the wrong IP address), the run hangs rather than failing with an error. This shows up with the common OS configuration in which the hostname maps to 127.0.0.1 in /etc/hosts. A workaround is to remove that 127.0.0.1 mapping. The gram command-line client does not suffer in this situation. If GLOBUS_HOSTNAME is set to a hostname, name->ip resolution of it appears to happen on the local system, rather than the value being used directly to refer to self. This does not seem to be in line with the use of GLOBUS_HOSTNAME that I am used to (as in, give up trying to figure out how others can refer to you - I am telling you in this variable).
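The /etc/hosts situation described in bug 140 can be detected up front. The following Python sketch is only an illustration under stated assumptions, not how Swift, CoG or the GRAM4 client determines its callback address. The names resolves_to_loopback and check_callback_host are hypothetical, and the resolver argument exists so that the check can be exercised without touching the real system configuration.

```python
import ipaddress
import socket


def resolves_to_loopback(hostname, resolver=socket.gethostbyname):
    """Return True if hostname resolves to a loopback address such as 127.0.0.1."""
    address = resolver(hostname)
    return ipaddress.ip_address(address).is_loopback


def check_callback_host(hostname, resolver=socket.gethostbyname):
    """Fail fast instead of hanging when remote notifications could never arrive.

    Hypothetical pre-submission check: a remote service cannot deliver
    status notifications to a loopback address on the client.
    """
    if resolves_to_loopback(hostname, resolver):
        raise RuntimeError(
            "%s resolves to a loopback address; remote services cannot "
            "send notifications there" % hostname
        )
    return hostname


if __name__ == "__main__":
    # Report what the local hostname resolves to on this machine.
    name = socket.gethostname()
    print(name, "loopback:", resolves_to_loopback(name))
```

Raising an error before submission is one way to turn the silent hang into a visible failure; whether that fits the client's design is a separate question.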
From bugzilla-daemon at mcs.anl.gov Thu May 8 09:16:24 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Thu, 8 May 2008 09:16:24 -0500 (CDT) Subject: [Swift-devel] [Bug 140] assorted GRAM4 own-hostname problems In-Reply-To: Message-ID: <20080508141624.24055164CF@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=140 benc at hawaga.org.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Summary|GRAM4 execution with |assorted GRAM4 own-hostname |incorrect local host address|problems |detection fails with hang | |rather than error. | -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. From benc at hawaga.org.uk Thu May 8 10:10:26 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 8 May 2008 15:10:26 +0000 (GMT) Subject: [Swift-devel] Cannot parse the given RSL In-Reply-To: <1210210092.27373.8.camel@localhost> References: <1210210092.27373.8.camel@localhost> Message-ID: cog r1998 swift r1852 does not have that problem. cog r1999 swift 11871 does have the problem. In between those lie a sequence of related commits that I have not dug further into. -- From benc at hawaga.org.uk Thu May 8 12:22:08 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 8 May 2008 17:22:08 +0000 (GMT) Subject: [Swift-devel] frequent/on-commit builds Message-ID: I've increased the frequency of on-commit builds at the NMI build and test system to every 20 minutes - that means every 20 minutes, if there has been a commit in the past 20 minute period, the test will be launched; so Swift may be built and tested in local mode up to three times per hour when there is a lot of swift or cog commit activity. These tests are what you get when you run tests/run. 
I've also configured one of my own machines to run the site tests based on the same commit-in-last-20-minute trigger. That should increase the coverage of automatic testing to deal with gram2 and gram4 to a variety of real sites (and will also facilitate testing cog coasters). These tests are what you get when you run tests/sites/run-all. This all generates a bunch of email with results that goes to me. If anyone else is interested in getting a copy of it all, let me know. -- From benc at hawaga.org.uk Mon May 12 04:21:50 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 12 May 2008 09:21:50 +0000 (GMT) Subject: [Swift-devel] provider-condor In-Reply-To: References: Message-ID: When I added an extra layer of exception wrapping to poke at the GRAM2/Cannot parse the user defined attributes problem, I see a similar ClassCastException / futurevariablearguments error... mmm temporal seepage (the trace I see with gram2 is listed at the end of this message after the quote) > I spent a few minutes seeing if provider-condor would work with swift > out-of-the-box. It doesn't. No real high priority need, but I thought it > might be interesting to know if it works or not. 
> > Caused by: > org.globus.cog.karajan.workflow.futures.FutureVariableArguments > Caused by: > java.lang.ClassCastException: > org.globus.cog.karajan.workflow.futures.FutureVariableArguments > at > org.globus.cog.abstraction.impl.execution.condor.DescriptionFileGenerator.constructDescriptionFile(DescriptionFileGenerator.java:90) > at > org.globus.cog.abstraction.impl.execution.condor.DescriptionFileGenerator.generate(DescriptionFileGenerator.java:26) > at > org.globus.cog.abstraction.impl.execution.condor.JobSubmissionTaskHandler.run(JobSubmissionTaskHandler.java:96) > at java.lang.Thread.run(Thread.java:595) Swift svn swift-r1950 (Swift modified locally) cog-r2003 (CoG modified locally) RunID: 20080512-1017-yrz2lf16 Progress: cat started Progress: Stage in:1 cat failed Execution failed: Cannot parse the given RSL Caused by: Cannot parse the user defined attributes Caused by: java.lang.ClassCastException: org.globus.cog.karajan.workflow.futures.FutureVariableArguments at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.prepareSpecification(JobSubmissionTaskHandler.java:424) at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submit(JobSubmissionTaskHandler.java:90) at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:54) at org.globus.cog.karajan.scheduler.submitQueue.NonBlockingSubmit.run(NonBlockingSubmit.java:86) at edu.emory.mathcs.backport.java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:431) at edu.emory.mathcs.backport.java.util.concurrent.FutureTask.run(FutureTask.java:166) at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:643) at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:668) at java.lang.Thread.run(Thread.java:613) -- From hategan at mcs.anl.gov Mon May 12 11:51:21 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 12 May 2008 
11:51:21 -0500 Subject: [Swift-devel] Cannot parse the given RSL In-Reply-To: References: <1210210092.27373.8.camel@localhost> Message-ID: <1210611081.3860.0.camel@localhost> Should be fixed in cog r2005. On Thu, 2008-05-08 at 15:10 +0000, Ben Clifford wrote: > cog r1998 swift r1852 does not have that problem. > > cog r1999 swift 11871 does have the problem. > > In between those lie a sequence of related commits that I have not dug > further into. > From benc at hawaga.org.uk Mon May 12 12:24:51 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 12 May 2008 17:24:51 +0000 (GMT) Subject: [Swift-devel] Cannot parse the given RSL In-Reply-To: <1210611081.3860.0.camel@localhost> References: <1210210092.27373.8.camel@localhost> <1210611081.3860.0.camel@localhost> Message-ID: On Mon, 12 May 2008, Mihael Hategan wrote: > Should be fixed in cog r2005. A few tests from the sites test suite have run and those seem to be succeeding now where they were previously failing. So this looks good so far. -- From bugzilla-daemon at mcs.anl.gov Mon May 12 14:22:11 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 12 May 2008 14:22:11 -0500 (CDT) Subject: [Swift-devel] [Bug 136] CLASSPATH construction order In-Reply-To: Message-ID: <20080512192211.344C6164CF@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=136 hategan at mcs.anl.gov changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED ------- Comment #1 from hategan at mcs.anl.gov 2008-05-12 14:22 ------- Order changed in cog r2007. We should probably file cog bugs in the cog bugzilla. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. 
From bugzilla-daemon at mcs.anl.gov Mon May 12 15:11:04 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 12 May 2008 15:11:04 -0500 (CDT) Subject: [Swift-devel] [Bug 140] assorted GRAM4 own-hostname problems In-Reply-To: Message-ID: <20080512201104.C7DB4164CF@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=140 ------- Comment #1 from hategan at mcs.anl.gov 2008-05-12 15:11 ------- So it boils down to whether the client does or does not do a DNS resolution for GLOBUS_HOSTNAME before using that as a callback address. I think that's a fair point, though it may take a while to get fixed. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. From bugzilla-daemon at mcs.anl.gov Mon May 12 18:22:31 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 12 May 2008 18:22:31 -0500 (CDT) Subject: [Swift-devel] [Bug 140] assorted GRAM4 own-hostname problems In-Reply-To: Message-ID: <20080512232231.B008C164CF@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=140 hategan at mcs.anl.gov changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED ------- Comment #2 from hategan at mcs.anl.gov 2008-05-12 18:22 ------- I fixed a bunch of things. One is that the swift ip setting has been deprecated in favor of "hostname", which can also be used to specify a numeric ip. Second one is the gt2 provider and gass servers, even though that's not used by swift. And the third would be different settings in the ws-gram client, and that is to disable DNS resolutions. It seems like a general setting, which may have unintended side effects, so testing should be done. For me it seems to work so far. 
-- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. From bugzilla-daemon at mcs.anl.gov Mon May 12 20:09:08 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 12 May 2008 20:09:08 -0500 (CDT) Subject: [Swift-devel] [Bug 138] spaces in filenames sometimes don't work right with GRAM2+Condor In-Reply-To: Message-ID: <20080513010908.73107164BB@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=138 ------- Comment #6 from hategan at mcs.anl.gov 2008-05-12 20:09 ------- Doesn't look like a backspace before a space does much. Funny. That's properly quoted/escaped. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. From benc at hawaga.org.uk Tue May 13 08:13:02 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 13 May 2008 13:13:02 +0000 (GMT) Subject: [Swift-devel] provider-condor In-Reply-To: References: Message-ID: On Tue, 6 May 2008, Ben Clifford wrote: > I spent a few minutes seeing if provider-condor would work with swift > out-of-the-box. It doesn't. But for the most part, it does now. The previous problem turned out to not be directly related to the condor provider and has been fixed elsewhere in the stack. The provider gets confused by quotes and spaces in arguments, but runs other site tests OK on fletch.bsd.uchicago.edu. 
This probably can be useful in two situations: i) with cog coasters, where it will provide a way to submit per-node workers without having to go back through GRAM, on clusters which use Condor as their LRM (eg lots of the Open Science Grid) ii) running on a login node of a condor-based cluster without using GRAM (so very much the same situations as provider-pbs is/could be used) I've added a build option -Dwith-provider-condor=true to enable compiling of provider-condor (though it probably could go in by default) (swift svn r1960) -- From iraicu at cs.uchicago.edu Tue May 13 19:45:18 2008 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Tue, 13 May 2008 19:45:18 -0500 Subject: [Swift-devel] Falkon and Swift talk at GlobusWorld08 Message-ID: <482A361E.60101@cs.uchicago.edu> Hi all, In case any of you are at GlobusWorld08 in Oakland California this week, I just wanted to point out that I will be giving a short talk tomorrow on Swift and Falkon. My slides are at http://people.cs.uchicago.edu/~iraicu/presentations/2008_Falkon_Swift_GlobusWorld08_5-14-08.pdf. If any of you are at the conference, it would be great to have you in the audience! Cheers, Ioan -- =================================================== Ioan Raicu Ph.D. Candidate =================================================== Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 
58th Street, Ryerson Hall Chicago, IL 60637 =================================================== Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dev.globus.org/wiki/Incubator/Falkon http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page =================================================== =================================================== From benc at hawaga.org.uk Wed May 14 06:40:17 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 14 May 2008 11:40:17 +0000 (GMT) Subject: [Swift-devel] coaster / Bad file descriptor Message-ID: Here's an error I haven't seen before. This is trying to run coasters with tests/sites/coaster/tgncsa-hg-coaster-pbs-gram2.xml The first bit of the outputting trace is here: Execution failed: Cannot submit job: Bad file descriptor org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Cannot submit job: Bad file descriptor at org.globus.cog.abstraction.impl.scheduler.pbs.execution.JobSubmissionTaskHandler.submit(JobSubmissionTaskHandler.java:75) I put the log file at http://www.ci.uchicago.edu/~benc/tmp/bad-fd-061-cattwo-20080514-1231-7uzfzn8b.log This is using gt2:pbs (so should be using PBS local submission on the head node). I get the same error three times in a row (this site definition has never worked, for one reason or another). This same build can run the site tests ok to fletch using gt2:gt2, tests/sites/coaster/fletch-coaster-condor-gram2.xml. -- From benc at hawaga.org.uk Wed May 14 06:45:10 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 14 May 2008 11:45:10 +0000 (GMT) Subject: [Swift-devel] Re: coaster / Bad file descriptor In-Reply-To: References: Message-ID: Using gt2:gt2:pbs continues to fail as previously described with the profile key for project ID not being passed through; so I could imagine that being a similar problem for direct PBS submission. 
-- From iraicu at cs.uchicago.edu Thu May 15 12:39:02 2008 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Thu, 15 May 2008 12:39:02 -0500 Subject: [Swift-devel] [Fwd: [gateways] [Fwd: IEEE eScience 2008 Call for Participation]] Message-ID: <482C7536.9010002@cs.uchicago.edu> FYI: Here is a eScience local conference coming up. Ioan -------- Original Message -------- Subject: [gateways] [Fwd: IEEE eScience 2008 Call for Participation] Date: Tue, 06 May 2008 10:09:47 -0400 From: Marlon Pierce Reply-To: Marlon Pierce To: gateways at teragrid.org Please feel free to redistribute. -------- Original Message -------- Subject: IEEE eScience 2008 Call for Participation Date: Tue, 6 May 2008 10:00:36 -0400 From: Siefert-Herron, Daphne Marie To: allptl-l at indiana.edu Organizing committees of the 4th International IEEE Computer Society Technical Committee on Scalable Computing eScience 2008 Conference are now accepting papers and proposals for tutorials; posters, exhibits, and demos; and workshops and special sessions. The conference is being held in partnership with the Microsoft Research eScience Workshop and is hosted by Indiana University. Conference Dates: December 7-12, 2008 Conference Location: University Place Conference Center, Indiana University/Purdue University (IUPUI) Campus, Indianapolis, Indiana Submission Deadlines: * Papers & Tutorials: July 20, 2008 * Posters, Exhibits, and Demos: September 14, 2008 * Workshops and Special Sessions: June 20, 2008 For submission guidelines and more information visit the conference Web site: http://escience2008.iu.edu . Topics of interest cover applications and technologies related to e-Science and grid and cloud computing. 
They include, but are not limited to, the following: * Application development environments * Autonomic, real-time, and self-organizing grids * Cloud computing and storage * Collaborative science models and techniques * Enabling technologies: Internet and Web services * e-Science for applications including physics, biology, astronomy, chemistry, finance, engineering, and the humanities * Grid economy and business models * Problem-solving environments * Programming paradigms and models * Resource management and scheduling * Security challenges for grids and e-Science * Sensor networks and environmental observatories * Service-oriented grid architectures * Virtual instruments and data access management * Virtualization for technical computing * Web 2.0 technology and services for e-Science Sponsors Include: * IEEE Computer Society Committee on Scalable Computing * Microsoft Research * Pervasive Technology Labs at Indiana University * Indiana University School of Informatics * Louisiana State University Center for Computation and Technology Conference Leadership: General Chairs Geoffrey Fox, Indiana University, United States Dennis Gannon, Indiana University, United States Program Chair Anne Trefethen, University of Oxford, United Kingdom Program Vice-Chair David Wallom, University of Oxford, United Kingdom Workshops Chair Ken Chiu, State University of New York, United States Tutorials Chair Krishna Madhavan, Clemson University, United States Exhibits, Demos, and Posters Chair Daniel S. 
Katz, Louisiana State University, United States Exhibits, Demos, and Posters Vice-Chair Shantenu Jha, Louisiana State University, United States Education, Diversity, and Broadening Participation Chair Alex Ramirez, Hispanic Association of Colleges and Universities Communication and Outreach Chair Daphne Siefert-Herron, Indiana University, United States Microsoft e-Science Conference Chair Kristin Tolle, Microsoft, United States Conference Manager Therese Miller, Indiana University, United States -- =================================================== Ioan Raicu Ph.D. Candidate =================================================== Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 =================================================== Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dev.globus.org/wiki/Incubator/Falkon http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page =================================================== =================================================== From hategan at mcs.anl.gov Thu May 15 13:29:11 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 15 May 2008 13:29:11 -0500 Subject: [Swift-devel] Re: coaster / Bad file descriptor In-Reply-To: References: Message-ID: <1210876151.1533.8.camel@localhost> Well, my DN mapping on that machine got changed to "ginuser", which is something that happened on UC, too, but got fixed by Ti. So I need to send mail to support. On Wed, 2008-05-14 at 11:45 +0000, Ben Clifford wrote: > Using gt2:gt2:pbs continues to fail as previously described with the > profile key for project ID not being passed through; so I could imagine > that being a similar problem for direct PBS submission. 
From bugzilla-daemon at mcs.anl.gov Thu May 15 16:16:27 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Thu, 15 May 2008 16:16:27 -0500 (CDT) Subject: [Swift-devel] [Bug 141] New: stdin, stdout, stderrs should be able to take mapped files Message-ID: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=141 Summary: stdin, stdout, stderrs should be able to take mapped files Product: Swift Version: unspecified Platform: All OS/Version: Mac OS Status: NEW Severity: enhancement Priority: P2 Component: SwiftScript language AssignedTo: benc at hawaga.org.uk ReportedBy: benc at hawaga.org.uk App blocks take filenames like this: stdin=@filename(f) but in practice those filenames always seem to come from return parameters; so it would be nice to be able to specify: stdin=f with filename extraction happening automatically. Because string and mapped types tend to be non-overlapping, this can be implemented in a backwards compatible fashion. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at mcs.anl.gov Thu May 15 16:19:58 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Thu, 15 May 2008 16:19:58 -0500 (CDT) Subject: [Swift-devel] [Bug 141] stdin, stdout, stderrs should be able to take mapped files In-Reply-To: Message-ID: <20080515211958.81D14164CF@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=141 ------- Comment #1 from hategan at mcs.anl.gov 2008-05-15 16:19 ------- I think we already have stdin=@f -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. You are the assignee for the bug, or are watching the assignee. 
From benc at hawaga.org.uk Sun May 18 12:45:36 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Sun, 18 May 2008 17:45:36 +0000 (GMT) Subject: [Swift-devel] gram4 detecting own host identity Message-ID: (I just sent this by accident to the secret SWFT list - it was intended for this list, swift-devel) A couple (or maybe three) times in the past few days I've seen the GRAM4 submission be extremely bad at detecting its own identity to use in WS-Notification subscriptions - it seems to end up using 'localhost' rather than the global-scope hostname... Here is one from andrew.bsd.uchicago.edu, for example (it also happens on wiggum and on my laptop): https://localhost:50000/ wsrf/services/NotificationConsumerService [...] These machines, as far as any sane unix person would think, are correctly identifying themselves with the hostname command, have proper forward and reverse DNS set up, and the like; pretty much everything else can figure out the local identity correctly; so the workaround 'set GLOBUS_HOSTNAME to your hostname' seems unnecessarily complicated in these cases. -- From foster at mcs.anl.gov Sun May 18 15:01:09 2008 From: foster at mcs.anl.gov (Ian Foster) Date: Sun, 18 May 2008 15:01:09 -0500 Subject: [Swift-devel] gram4 detecting own host identity In-Reply-To: References: Message-ID: <792312DA-86DC-434D-A082-43985BB38C37@mcs.anl.gov> Ben: I'd recommend talking to your colleagues in the GRAM team--Stu Martin and Martin Feller in particular. Ian. On May 18, 2008, at 12:45 PM, Ben Clifford wrote: > > (I just sent this by accident to the secret SWFT list - it was > intended > for this list, swift-devel) > > A couple (or maybe three) times in the past few days I've seen the > GRAM4 > submission be extremely bad at detecting its own identity to use in > WS-Notification subscriptions - it seems to end up using 'localhost' > rather than the global-scope hostname... 
> > Here is one from andrew.bsd.uchicago.edu, for example (it also > happens on > wiggum and on my laptop): > > https://localhost:50000/ > wsrf/services/NotificationConsumerService > > [...] > > These machines, as far as any sane unix person would think, are > correctly > identifying themselves with the hostname command, have proper > forward and > reverse DNS set up, and the like; pretty much everything else can > figure > out the local identity correctly; so the workaround 'set > GLOBUS_HOSTNAME > to your hostname' seems unnecessarily complicated in these cases. > > -- > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From feller at mcs.anl.gov Sun May 18 16:47:01 2008 From: feller at mcs.anl.gov (Martin Feller) Date: Sun, 18 May 2008 16:47:01 -0500 (CDT) Subject: [Swift-devel] gram4 detecting own host identity In-Reply-To: <26047202.2601211146986265.JavaMail.root@zimbra> Message-ID: <23102591.2651211147221077.JavaMail.root@zimbra> Ben: I assume you mean the NotificationConsumer in the context of a subscription for job status notifications, which is created on the client-side and used in a job submission to Gram4 to create a subscription resource, right? How is the NotificationConsumer with the defective address created on the client-side: using Gram4's GramJob, or by other means? I think in any event it goes down to Java WS Core, since it's there where the address of the NotificationConsumer is generated. (NotificationConsumerManager.createNotificationConsumer() -> ClientNotificationConsumerManager.getURL() -> ServiceContainer.getURL() and finally the host seems to come from ServiceHost.getHost()) Gram4 does not set addresses in endpoints of NotificationConsumers itself. Maybe Rachana can shed some light on this, or correct me if I'm wrong. Does setting GLOBUS_HOSTNAME solve the issue? 
I know, you mentioned that you think this complicates the matter, but for debugging ... Is this 4.0 or upcoming 4.2 specific, or do you experience that problem for both? Martin ----- Original Message ----- From: "Ian Foster" To: "Ben Clifford" Cc: swift-devel at ci.uchicago.edu, "Stuart Martin" , "Martin Feller" Sent: Sunday, May 18, 2008 3:01:09 PM GMT -06:00 US/Canada Central Subject: Re: [Swift-devel] gram4 detecting own host identity Ben: I'd recommend talking to your colleagues in the GRAM team--Stu Martin and Martin Feller in particular. Ian. On May 18, 2008, at 12:45 PM, Ben Clifford wrote: > > (I just sent this by accident to the secret SWFT list - it was > intended > for this list, swift-devel) > > A couple (or maybe three) times in the past few days I've seen the > GRAM4 > submission be extremely bad at detecting its own identity to use in > WS-Notification subscriptions - it seems to end up using 'localhost' > rather than the global-scope hostname... > > Here is one from andrew.bsd.uchicago.edu, for example (it also > happens on > wiggum and on my laptop): > > https://localhost:50000/ > wsrf/services/NotificationConsumerService > > [...] > > These machines, as far as any sane unix person would think, are > correctly > identifying themselves with the hostname command, have proper > forward and > reverse DNS set up, and the like; pretty much everything else can > figure > out the local identity correctly; so the workaround 'set > GLOBUS_HOSTNAME > to your hostname' seems unnecessarily complicated in these cases. 
> > -- > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From hategan at mcs.anl.gov Sun May 18 20:15:01 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 18 May 2008 20:15:01 -0500 Subject: [Swift-devel] gram4 detecting own host identity In-Reply-To: <23102591.2651211147221077.JavaMail.root@zimbra> References: <23102591.2651211147221077.JavaMail.root@zimbra> Message-ID: <1211159701.24962.2.camel@localhost> On Sun, 2008-05-18 at 16:47 -0500, Martin Feller wrote: > Ben: > > I assume you mean the NotificationConsumer in the context of a subscription for job status > notifications, which is created on the client-side and used in a job submission to Gram4 to > create a subscription resource, right? > How is the NotificationConsumer with the defective address created on the client-side: > using Gram4's GramJob, or by other means? > > I think in any event it goes down to Java WS Core, since it's there where the address > of the NotificationConsumer is generated. > (NotificationConsumerManager.createNotificationConsumer() -> > ClientNotificationConsumerManager.getURL() -> ServiceContainer.getURL() > and finally the host seems to come from ServiceHost.getHost()) > Gram4 does not set addresses in endpoints of NotificationConsumers itself. Eventually the thing ends up in the jglobus CoGProperties.getHostname() or CoGProperties.getIP(). It's somewhat of a known problem. Java is poor at figuring out this kind of information. Ben, can you do an ifconfig -a (or ask Andrew) on that machine? > > Maybe Rachana can shed some light on this, or correct me if I'm wrong. > > Does setting GLOBUS_HOSTNAME solve the issue? I know, you mentioned that you think > this complicates the matter, but for debugging ... > > Is this 4.0 or upcoming 4.2 specific, or do you experience that problem for both?
> > Martin > > > ----- Original Message ----- > From: "Ian Foster" > To: "Ben Clifford" > Cc: swift-devel at ci.uchicago.edu, "Stuart Martin" , "Martin Feller" > Sent: Sunday, May 18, 2008 3:01:09 PM GMT -06:00 US/Canada Central > Subject: Re: [Swift-devel] gram4 detecting own host identity > > Ben: > > I'd recommend talking to your colleagues in the GRAM team--Stu Martin > and Martin Feller in particular. > > Ian. > > On May 18, 2008, at 12:45 PM, Ben Clifford wrote: > > > > > (I just sent this by accident to the secret SWFT list - it was > > intended > > for this list, swift-devel) > > > > A couple (or maybe three) times in the past few days I've seen the > > GRAM4 > > submission be extremely bad at detecting its own identity to use in > > WS-Notification subscriptions - it seems to end up using 'localhost' > > rather than the global-scope hostname... > > > > Here is one from andrew.bsd.uchicago.edu, for example (it also > > happens on > > wiggum and on my laptop): > > > > https://localhost:50000/ > > wsrf/services/NotificationConsumerService > > > > [...] > > > > These machines, as far as any sane unix person would think, are > > correctly > > identifying themselves with the hostname command, have proper > > forward and > > reverse DNS set up, and the like; pretty much everything else can > > figure > > out the local identity correctly; so the workaround 'set > > GLOBUS_HOSTNAME > > to your hostname' seems unnecessarily complicated in these cases. 
> > > > -- > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From feller at mcs.anl.gov Sun May 18 20:59:04 2008 From: feller at mcs.anl.gov (Martin Feller) Date: Sun, 18 May 2008 20:59:04 -0500 (CDT) Subject: [Swift-devel] gram4 detecting own host identity In-Reply-To: <1211159701.24962.2.camel@localhost> Message-ID: <32862213.9911211162344581.JavaMail.root@zimbra> Maybe also interesting: from Ben's description it seems that this happens only sometimes, not always. I don't know the details of hostname settings and hostname updates in Linux well enough, but this sounds a bit odd. ----- Original Message ----- From: "Mihael Hategan" To: "Martin Feller" Cc: "Ben Clifford" , "Rachana Ananthakrishnan" , swift-devel at ci.uchicago.edu Sent: Sunday, May 18, 2008 8:15:01 PM GMT -06:00 US/Canada Central Subject: Re: [Swift-devel] gram4 detecting own host identity On Sun, 2008-05-18 at 16:47 -0500, Martin Feller wrote: > Ben: > > I assume you mean the NotificationConsumer in the context of a subscription for job status > notifications, which is created on the client-side and used in a job submission to Gram4 to > create a subscription resource, right? > How is the NotificationConsumer with the defective address created on the client-side: > using Gram4's GramJob, or by other means? > > I think in any event it goes down to Java WS Core, since it's there where the address > of the NotificationConsumer is generated.
> (NotificationConsumerManager.createNotificationConsumer() -> > ClientNotificationConsumerManager.getURL() -> ServiceContainer.getURL() > and finally the host seems to come from ServiceHost.getHost()) > Gram4 does not set addresses in endpoints of NotificationConsumers itself. Eventually the thing ends in the jglobus CoGProperties.getHostname() or CoGProperties.getIP(). It's somewhat of a known problem. Java is poor at figuring such kind of information. Ben, can you do an ifconfig -a (or ask Andrew) on that machin? > > Maybe Rachana can shed some light on this, or correct me if I'm wrong. > > Does setting GLOBUS_HOSTNAME solve the issue? I know, you mentioned that you think > this complicates the matter, but for debugging ... > > Is this 4.0 or upcoming 4.2 specific, or do you experience that problem for both? > > Martin > > > ----- Original Message ----- > From: "Ian Foster" > To: "Ben Clifford" > Cc: swift-devel at ci.uchicago.edu, "Stuart Martin" , "Martin Feller" > Sent: Sunday, May 18, 2008 3:01:09 PM GMT -06:00 US/Canada Central > Subject: Re: [Swift-devel] gram4 detecting own host identity > > Ben: > > I'd recommend talking to your colleagues in the GRAM team--Stu Martin > and Martin Feller in particular. > > Ian. > > On May 18, 2008, at 12:45 PM, Ben Clifford wrote: > > > > > (I just sent this by accident to the secret SWFT list - it was > > intended > > for this list, swift-devel) > > > > A couple (or maybe three) times in the past few days I've seen the > > GRAM4 > > submission be extremely bad at detecting its own identity to use in > > WS-Notification subscriptions - it seems to end up using 'localhost' > > rather than the global-scope hostname... > > > > Here is one from andrew.bsd.uchicago.edu, for example (it also > > happens on > > wiggum and on my laptop): > > > > https://localhost:50000/ > > wsrf/services/NotificationConsumerService > > > > [...] 
> > > > These machines, as far as any sane unix person would think, are > > correctly > > identifying themselves with the hostname command, have proper > > forward and > > reverse DNS set up, and the like; pretty much everything else can > > figure > > out the local identity correctly; so the workaround 'set > > GLOBUS_HOSTNAME > > to your hostname' seems unnecessarily complicated in these cases. > > > > -- > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From benc at hawaga.org.uk Mon May 19 04:20:53 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 19 May 2008 09:20:53 +0000 (GMT) Subject: [Swift-devel] gram4 detecting own host identity In-Reply-To: <1211159701.24962.2.camel@localhost> References: <23102591.2651211147221077.JavaMail.root@zimbra> <1211159701.24962.2.camel@localhost> Message-ID: maybe we should put if $GLOBUS_HOSTNAME == "" and $HOSTNAME != "" then GLOBUS_HOSTNAME=HOSTNAME in the startup script. In the four machines I just looked at, that gives the correct globally unique name. -- From benc at soju.hawaga.org.uk Sun May 18 16:48:38 2008 From: benc at soju.hawaga.org.uk (Ben Clifford) Date: Sun, 18 May 2008 22:48:38 +0100 (BST) Subject: [Swift-devel] gram4 detecting own host identity In-Reply-To: <23102591.2651211147221077.JavaMail.root@zimbra> References: <23102591.2651211147221077.JavaMail.root@zimbra> Message-ID: On Sun, 18 May 2008, Martin Feller wrote: > Does setting GLOBUS_HOSTNAME solve the issue? It does. > Is this 4.0 or upcoming 4.2 specific, or do you experience that problem for both? 
4.0 (4.2 support is a-ways off) -- From benc at hawaga.org.uk Mon May 19 07:16:14 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 19 May 2008 12:16:14 +0000 (GMT) Subject: [Swift-devel] gram4 detecting own host identity In-Reply-To: References: <23102591.2651211147221077.JavaMail.root@zimbra> <1211159701.24962.2.camel@localhost> Message-ID: On Mon, 19 May 2008, Ben Clifford wrote: [...] > then GLOBUS_HOSTNAME=HOSTNAME [...] > In the four machines I just looked at, that gives the correct globally > unique name. I found another that it is not set correctly on. However, if this is a better way to do it on average than Java can do in real life deployments, then it is probably what should happen. -- From benc at hawaga.org.uk Mon May 19 08:18:55 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 19 May 2008 13:18:55 +0000 (GMT) Subject: [Swift-devel] gram4 detecting own host identity In-Reply-To: References: <23102591.2651211147221077.JavaMail.root@zimbra> <1211159701.24962.2.camel@localhost> Message-ID: On Mon, 19 May 2008, Ben Clifford wrote: > > then GLOBUS_HOSTNAME=HOSTNAME > However, if this is a better way to do it on average than Java can do in > real life deployments, then it is probably what should happen. This happens as of swift r1981, in the script wrapper that wraps the JVM invocation. 
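For reference, the fallback discussed in this thread can be sketched in POSIX shell. This is only an illustration under stated assumptions, not the code committed in r1981: the function name pick_globus_hostname is hypothetical, and the sketch assumes the wrapper script can see both GLOBUS_HOSTNAME and HOSTNAME in its environment (HOSTNAME is set by bash, but not by every shell).

```shell
#!/bin/sh
# Hypothetical helper (not taken from the Swift or CoG sources).
# Prints the hostname to advertise: an explicitly set GLOBUS_HOSTNAME wins,
# otherwise HOSTNAME is used, otherwise nothing is printed.
pick_globus_hostname() {
    explicit="$1"   # current GLOBUS_HOSTNAME, possibly empty
    fallback="$2"   # current HOSTNAME, possibly empty
    if [ -n "$explicit" ]; then
        printf '%s\n' "$explicit"
    elif [ -n "$fallback" ]; then
        printf '%s\n' "$fallback"
    fi
}

# Assumption: this runs before the JVM option list is assembled, so that
# the chosen value can still be passed on to Java.
resolved=$(pick_globus_hostname "${GLOBUS_HOSTNAME:-}" "${HOSTNAME:-}")
if [ -n "$resolved" ]; then
    GLOBUS_HOSTNAME=$resolved
    export GLOBUS_HOSTNAME
fi
```

If neither variable is set, the sketch leaves GLOBUS_HOSTNAME untouched, so Java falls back to its own detection exactly as before.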
-- From bugzilla-daemon at mcs.anl.gov Mon May 19 10:06:11 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 19 May 2008 10:06:11 -0500 (CDT) Subject: [Swift-devel] [Bug 107] restarts broken (by generalisation of data file handling) In-Reply-To: Message-ID: <20080519150611.86B82164CF@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=107 benc at hawaga.org.uk changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |skenny at uchicago.edu, swift- | |devel at ci.uchicago.edu ------- Comment #4 from benc at hawaga.org.uk 2008-05-19 10:05 ------- r1982 introduces another restart test, tests/misc/restart2.sh This test does not pass, so I have not put it in the automatic runs. It can be run by calling ./restart2.sh from the tests/misc/ directory. Superficially what appears to cause a problem is when a file is mapped inside a procedure; so then multiple data sets have the same variable name (one per invocation of that procedure); and then on restart, the fact that any invocation of that procedure has completed and successfully produced output for that variable means that all invocations of that procedure are treated as successfully completed. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. 
From hategan at mcs.anl.gov Mon May 19 14:59:50 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 19 May 2008 14:59:50 -0500 Subject: [Swift-devel] gram4 detecting own host identity In-Reply-To: References: <23102591.2651211147221077.JavaMail.root@zimbra> <1211159701.24962.2.camel@localhost> Message-ID: <1211227190.1114.6.camel@localhost> On Mon, 2008-05-19 at 13:18 +0000, Ben Clifford wrote: > On Mon, 19 May 2008, Ben Clifford wrote: > > > > then GLOBUS_HOSTNAME=HOSTNAME > > > However, if this is a better way to do it on average than Java can do in > > real life deployments, then it is probably what should happen. > > This happens as of swift r1981, in the script wrapper that wraps the JVM > invocation. I reverted that because it was changing GLOBUS_HOSTNAME after the OPTIONS were updated with GLOBUS_HOSTNAME, so it wasn't doing much. JGlobus looks at two properties when trying to figure the hostname out: "GLOBUS_HOSTNAME" first, and then "hostname". So I made the cog script templates do the hostname thing (cog r2022). > From bugzilla-daemon at mcs.anl.gov Mon May 19 15:23:44 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 19 May 2008 15:23:44 -0500 (CDT) Subject: [Swift-devel] [Bug 107] restarts broken (by generalisation of data file handling) In-Reply-To: Message-ID: <20080519202344.5C263164EC@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=107 ------- Comment #5 from hategan at mcs.anl.gov 2008-05-19 15:23 ------- In r1985 I switched back from logging based on variable names to logging based on files. This fixes the latest problem, but will not recognize as done variables mapped by the concurrent mapper. I think it is better this way. In order to solve the problem in a way that satisfies all scenarios, I need to think some more, but in principle a way to "address" variables in swift based on both lexical location and run-time coordinates is necessary. 
-- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From benc at hawaga.org.uk Mon May 19 16:45:38 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 19 May 2008 21:45:38 +0000 (GMT) Subject: [Swift-devel] gram4 detecting own host identity In-Reply-To: <1211227190.1114.6.camel@localhost> References: <23102591.2651211147221077.JavaMail.root@zimbra> <1211159701.24962.2.camel@localhost> <1211227190.1114.6.camel@localhost> Message-ID: On Mon, 19 May 2008, Mihael Hategan wrote: > I reverted that because it was changing GLOBUS_HOSTNAME after the > OPTIONS were updated with GLOBUS_HOSTNAME, so it wasn't doing much. Ooops. I tested it before committing, but I think I forgot what it was that I was trying to implement when doing the testing, and managed to successfully test only that explicitly setting GLOBUS_HOSTNAME still works. -- From benc at hawaga.org.uk Mon May 19 16:58:16 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 19 May 2008 21:58:16 +0000 (GMT) Subject: [Swift-devel] gram4 detecting own host identity In-Reply-To: <1211227190.1114.6.camel@localhost> References: <23102591.2651211147221077.JavaMail.root@zimbra> <1211159701.24962.2.camel@localhost> <1211227190.1114.6.camel@localhost> Message-ID: On Mon, 19 May 2008, Mihael Hategan wrote: > JGlobus looks at two properties when trying to figure the hostname out: > "GLOBUS_HOSTNAME" first, and then "hostname". So I made the cog script > templates do the hostname thing (cog r2022).
I see the: updateOptions "$HOSTNAME" "hostname" but it appears to be doing about as much as my earlier commit, in as much as I still see: https://localhost:59264/ -- From hategan at mcs.anl.gov Mon May 19 17:00:20 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 19 May 2008 17:00:20 -0500 Subject: [Swift-devel] gram4 detecting own host identity In-Reply-To: References: <23102591.2651211147221077.JavaMail.root@zimbra> <1211159701.24962.2.camel@localhost> <1211227190.1114.6.camel@localhost> Message-ID: <1211234420.23391.0.camel@localhost> can you dump the java command line and send it? On Mon, 2008-05-19 at 21:58 +0000, Ben Clifford wrote: > On Mon, 19 May 2008, Mihael Hategan wrote: > > > JGlobus looks at two properties when trying to figure the hostname out: > > "GLOBUS_HOSTNAME" first, and then "hostname". So I made the cog script > > templates do the hostname thing (cog r2022). > > I see the: > > updateOptions "$HOSTNAME" "hostname" > > but it appears to be doing about as much as my earlier commit, in as much > as I still see: > > https://localhost:59264/ > From benc at hawaga.org.uk Mon May 19 17:02:28 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 19 May 2008 22:02:28 +0000 (GMT) Subject: [Swift-devel] gram4 detecting own host identity In-Reply-To: <1211234420.23391.0.camel@localhost> References: <23102591.2651211147221077.JavaMail.root@zimbra> <1211159701.24962.2.camel@localhost> <1211227190.1114.6.camel@localhost> <1211234420.23391.0.camel@localhost> Message-ID: On Mon, 19 May 2008, Mihael Hategan wrote: > can you dump the java command line and send it? java -Djava.endorsed.dirs=/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/endorsed -DUID=501 -Dhostname=soju.hawaga.org.uk -DCOG_INSTALL_PATH=/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/.. -Dvds.home=/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/.. -Dswift.home=/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/.. 
-Djava.security.egd=file:///dev/urandom -classpath /Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../etc:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../libexec:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/addressing-1.0.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/ant.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/antlr-2.7.5.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/axis-url.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/axis.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/backport-util-concurrent.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/castor-0.9.6.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/cog-abstraction-common-2.2.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/cog-axis.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/cog-grapheditor-0.47.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/cog-jglobus-dev-080222.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/cog-karajan-0.36-dev.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/cog-provider-clref-gt4_0_0.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/cog-provider-condor-2.0.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/cog-provider-dcache-0.1.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/cog-provider-gt2-2.3.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/cog-provider-gt4_0_0-2.4.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/cog-provider-local-2.1.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/cog-provider-localscheduler-0.2.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/cog-provider-ssh-2.3.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/cog-provider-webdav-2.1.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/cog-resources-1.0.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/cog-trap-1.0.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/cog-url.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/cog-util-0.92.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/cog-vdsk-svn.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/commonj.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/commons-beanutils.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/commons-collections-3.0.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/commons-digester.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/commons-discovery.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/commons-httpclient.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/commons-logging-1.1.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/concurrent.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/cryptix-asn1.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/cryptix.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/cryptix32.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/globus_delegation_service.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/globus_delegation_stubs.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/globus_wsrf_mds_aggregator_stubs.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/globus_wsrf_rendezvous_service.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/globus_wsrf_rendezvous_stubs.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/globus_wsrf_rft_stubs.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/gram-client.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/gram-stubs.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/gram-utils.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/gvds.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/j2ssh-common-0.2.2.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/j2ssh-core-0.2.2-patched.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/jakarta-regexp-1.2.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/jakarta-slide-webdavlib-2.0.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/jaxrpc.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/jce-jdk13-131.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/jgss.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/jsr173_1.0_api.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/jug-lgpl-2.0.0.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/junit.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/log4j-1.2.8.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/naming-common.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/naming-factory.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/naming-java.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/naming-resources.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/opensaml.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/puretls.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/resolver.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/saaj.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/stringtemplate.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/vdldefinitions.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/wsdl4j.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/wsrf_core.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/wsrf_core_stubs.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/wsrf_mds_index_stubs.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/wsrf_mds_usefulrp_schema_stubs.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/wsrf_provider_jce.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/wsrf_tools.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/wss4j.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/xalan.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/xbean.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/xbean_xpath.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/xercesImpl.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/xml-apis.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/xmlsec.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/xpp3-1.1.3.4d_b4_min.jar:/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/xstream-1.1.1-patched.jar: org.griphyn.vdl.karajan.Loader '-sites.file' '../sites/tguc-fork-gram4.xml' '-tc.file' 'tmp.tc.data.sites' '061-cattwo.swift' -- From hategan at mcs.anl.gov Mon May 19 17:30:34 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 19 May 2008 17:30:34 -0500 Subject: [Swift-devel] gram4 detecting own host identity In-Reply-To: References: <23102591.2651211147221077.JavaMail.root@zimbra> <1211159701.24962.2.camel@localhost> <1211227190.1114.6.camel@localhost> <1211234420.23391.0.camel@localhost> Message-ID: <1211236234.23531.5.camel@localhost> My bad. JGlobus looks at its own hostname property, not the system property. Fixed in cog r2023. Unfortunately, jglobus also resolves it before building the gass server url. One possible "solution" there is to set the org.globus.ip system property with the hostname. That does not get resolved, but it's hackish. On Mon, 2008-05-19 at 22:02 +0000, Ben Clifford wrote: > > On Mon, 19 May 2008, Mihael Hategan wrote: > > > can you dump the java command line and send it? > > java > -Djava.endorsed.dirs=/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/../lib/endorsed > -DUID=501 -Dhostname=soju.hawaga.org.uk > -DCOG_INSTALL_PATH=/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/.. > -Dvds.home=/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/.. 
> -Dswift.home=/Users/benc/work/cog/modules/vdsk/dist/vdsk-svn/bin/.. > -Djava.security.egd=file:///dev/urandom -classpath > [...] > org.griphyn.vdl.karajan.Loader '-sites.file' > '../sites/tguc-fork-gram4.xml' '-tc.file' 'tmp.tc.data.sites' > '061-cattwo.swift' > From benc at hawaga.org.uk Mon May 19 17:33:23 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 19 May 2008 22:33:23 +0000 (GMT) Subject: [Swift-devel] gram4 detecting own host identity In-Reply-To: <1211236234.23531.5.camel@localhost> References: <23102591.2651211147221077.JavaMail.root@zimbra> <1211159701.24962.2.camel@localhost> <1211227190.1114.6.camel@localhost> <1211234420.23391.0.camel@localhost> <1211236234.23531.5.camel@localhost> Message-ID: On Mon, 19 May 2008, Mihael Hategan wrote: > My bad. JGlobus looks at its own hostname property, not the system > property. Fixed in cog r2023. yeah, that seems to work for GRAM4. I haven't tried any of the recent changes with the coasters, which was another source of trouble. 
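The lookup order discussed in this thread can be sketched roughly as below. This is only an illustration of the behaviour described in the messages above, not the JGlobus code: the function name pick_hostname, the two dictionaries, and the "localhost" fallback are all assumptions made for the example (the fallback is chosen to match the https://localhost:59264/ URL that was reported).

```python
def pick_hostname(library_props, system_props, default="localhost"):
    """Illustrative lookup order only; not the actual JGlobus code.

    Assumes both keys live in the library's own property set:
    GLOBUS_HOSTNAME is consulted first, then hostname.
    system_props is accepted but deliberately never read, to model
    why passing -Dhostname=... on the java command line had no effect.
    """
    for key in ("GLOBUS_HOSTNAME", "hostname"):
        value = library_props.get(key)
        if value:
            return value
    # Assumed fallback, chosen to match the reported localhost URL.
    return default


# The hostname is set only as a JVM-style system property: it is ignored.
print(pick_hostname({}, {"hostname": "soju.hawaga.org.uk"}))  # localhost

# The hostname is set in the library's own properties: it is used.
print(pick_hostname({"hostname": "soju.hawaga.org.uk"}, {}))  # soju.hawaga.org.uk
```

Under these assumptions, the first call shows the symptom from the dumped command line and the second shows the situation after the value is put where the library actually looks.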
-- From bugzilla-daemon at mcs.anl.gov Mon May 19 17:51:13 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 19 May 2008 17:51:13 -0500 (CDT) Subject: [Swift-devel] [Bug 107] restarts broken (by generalisation of data file handling) In-Reply-To: Message-ID: <20080519225113.86385164EC@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=107 ------- Comment #6 from benc at hawaga.org.uk 2008-05-19 17:51 ------- r1985 breaks tests/language-behaviour/075-array-mapper to recreate: cd tests/language-behaviour ./run 075-array-mapper.swift -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at mcs.anl.gov Mon May 19 18:16:50 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 19 May 2008 18:16:50 -0500 (CDT) Subject: [Swift-devel] [Bug 107] restarts broken (by generalisation of data file handling) In-Reply-To: Message-ID: <20080519231650.51292164EC@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=107 ------- Comment #7 from benc at hawaga.org.uk 2008-05-19 18:16 ------- comment #6 might be that there is a bug in the array mapper or some related code that has not been previously noted - some of the logging for the DSHandles involved looks weird. I am poking round a bit more. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. 
From bugzilla-daemon at mcs.anl.gov Mon May 19 18:36:16 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 19 May 2008 18:36:16 -0500 (CDT) Subject: [Swift-devel] [Bug 107] restarts broken (by generalisation of data file handling) In-Reply-To: Message-ID: <20080519233616.9A3AF164EC@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=107 ------- Comment #8 from hategan at mcs.anl.gov 2008-05-19 18:35 ------- Actually it was IsLogged that was broken. Fixed in r1988. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at mcs.anl.gov Mon May 19 19:24:09 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 19 May 2008 19:24:09 -0500 (CDT) Subject: [Swift-devel] [Bug 107] restarts broken (by generalisation of data file handling) In-Reply-To: Message-ID: <20080520002409.C862A164CF@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=107 ------- Comment #9 from benc at hawaga.org.uk 2008-05-19 19:23 ------- tests/language-behaviour/075-array-mapper passes, as does tests/misc/restart2. hopefully skenny can try the real life workflow that motivated tests/misc/restart2 for some real life confirmation of behaviour -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. 
From benc at hawaga.org.uk Tue May 20 10:18:17 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 20 May 2008 15:18:17 +0000 (GMT) Subject: [Swift-devel] file URLs Message-ID: This: messagefile outfile <"file://localhost/145-url.out">; creates a file in `pwd` (see tests/language-behaviour/145-url) This: messagefile outfile <"file:///146-url.out">; creates a file in / This seems undesirably inconsistent. (the latter is the behaviour that I'd expect from almost any URL processing code apart from CoG; the former is consistent with my understanding of CoG's interpretation of the woefully inadequate file: URI spec) -- From hategan at mcs.anl.gov Tue May 20 10:23:04 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 20 May 2008 10:23:04 -0500 Subject: [Swift-devel] file URLs In-Reply-To: References: Message-ID: <1211296984.390.0.camel@localhost> On Tue, 2008-05-20 at 15:18 +0000, Ben Clifford wrote: > This: > > messagefile outfile <"file://localhost/145-url.out">; > > creates a file in `pwd` (see tests/language-behaviour/145-url) > > This: > > messagefile outfile <"file:///146-url.out">; > > creates a file in / > > This seems undesirably inconsistent. > > (the latter is the behaviour that I'd expect from almost any URL > processing code apart from CoG; the former is consistent with my > understanding of CoG's interpretation of the woefully inadequate file: > URI spec) So what would you like it to do? > From mikekubal at yahoo.com Tue May 20 16:46:43 2008 From: mikekubal at yahoo.com (Mike Kubal) Date: Tue, 20 May 2008 14:46:43 -0700 (PDT) Subject: [Swift-devel] proper way to run swift on resource using condor? Message-ID: <839419.55851.qm@web52301.mail.re2.yahoo.com> I have been using some 'work around' methods to use Swift to run jobs on the Purdue resource that uses Condor. Is there a proper way? 
Thanks, Mike From foster at mcs.anl.gov Tue May 20 16:48:01 2008 From: foster at mcs.anl.gov (Ian Foster) Date: Tue, 20 May 2008 16:48:01 -0500 Subject: [Swift-devel] proper way to run swift on resource using condor? In-Reply-To: <839419.55851.qm@web52301.mail.re2.yahoo.com> References: <839419.55851.qm@web52301.mail.re2.yahoo.com> Message-ID: Can't you just submit via GRAM, as usual? On May 20, 2008, at 4:46 PM, Mike Kubal wrote: > I have been using some 'work around' methods to use > Swift to run jobs on the Purdue resource that uses > Condor. Is there a proper way? > > Thanks, > > Mike > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From hategan at mcs.anl.gov Tue May 20 16:51:15 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 20 May 2008 16:51:15 -0500 Subject: [Swift-devel] proper way to run swift on resource using condor? In-Reply-To: <839419.55851.qm@web52301.mail.re2.yahoo.com> References: <839419.55851.qm@web52301.mail.re2.yahoo.com> Message-ID: <1211320275.11229.1.camel@localhost> Are you referring to the arguments-with-spaces issue? On Tue, 2008-05-20 at 14:46 -0700, Mike Kubal wrote: > I have been using some 'work around' methods to use > Swift to run jobs on the Purdue resource that uses > Condor. Is there a proper way? > > Thanks, > > Mike > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From mikekubal at yahoo.com Tue May 20 21:23:15 2008 From: mikekubal at yahoo.com (Mike Kubal) Date: Tue, 20 May 2008 19:23:15 -0700 (PDT) Subject: [Swift-devel] proper way to run swift on resource using condor? In-Reply-To: <1211320275.11229.1.camel@localhost> Message-ID: <384912.26453.qm@web52311.mail.re2.yahoo.com> I've been having swift invoke a perl script on Purdue. 
The perl script writes the descriptor file that is passed as an argument to the condor_submit command within the perl script. The condor_submit command actually launches the job on a compute node. Every 30 seconds the perl script checks to see if the launched job has finished. When the launched job finishes the perl script ends, and the result files are returned via swift. The problem with this approach is that I am either piling many perl scripts onto the logon node or wasting a compute node to just check the status of the job. I have experimented with having swift call condor_submit directly with the idea of having a single separate swift function check the status of all jobs and return the result files when finished. The problem here is that once the job has been launched successfully, swift removes the job sub-directory where the results of the job launched via condor_submit are written. Suggestions welcome, Mike --- Mihael Hategan wrote: > Are you referring to the arguments-with-spaces > issue? > > On Tue, 2008-05-20 at 14:46 -0700, Mike Kubal wrote: > > I have been using some 'work around' methods to > use > > Swift to run jobs on the Purdue resource that > uses > > Condor. Is there a proper way? > > > > Thanks, > > > > Mike > > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From foster at mcs.anl.gov Tue May 20 21:31:33 2008 From: foster at mcs.anl.gov (Ian Foster) Date: Tue, 20 May 2008 21:31:33 -0500 Subject: [Swift-devel] proper way to run swift on resource using condor? In-Reply-To: <384912.26453.qm@web52311.mail.re2.yahoo.com> References: <384912.26453.qm@web52311.mail.re2.yahoo.com> Message-ID: Again, why don't you use a GRAM submit as you would at any other TG site? On May 20, 2008, at 9:23 PM, Mike Kubal wrote: > I've been having swift invoke a perl script on Purdue. 
> The perl script writes the the descriptor file that is > passed as an argument to the condor_submit command > within the perl script. The condor_submit command > actually launches the job on a compute node. Every 30 > seconds the perl script checks to see if the launched > job has finished. When the launched job finishes the > perl script ends, and the result files are returned > via swift. The problem with this approach is that I > am either piling many perl scripts onto the logon node > or wasting a compute node to just check the status of > the job. > > I have experimented with having swift call > condor_submit directly with the idea of having a > single separate swift function check the status of all > jobs and returned the result files when finished. The > problem here is that once the the job has been > launched successfully, swift removes the job > sub-directory where the results of the job launched > via condor_submit are written. > > Suggestions welcome, > > Mike > > --- Mihael Hategan wrote: > >> Are you referring to the arguments-with-spaces >> issue? >> >> On Tue, 2008-05-20 at 14:46 -0700, Mike Kubal wrote: >>> I have been using some 'work around' methods to >> use >>> Swift to run jobs on the Purdue resource that >> uses >>> Condor. Is there a proper way? >>> >>> Thanks, >>> >>> Mike >>> >>> >>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> >> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> >> > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From benc at hawaga.org.uk Wed May 21 05:54:10 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 21 May 2008 10:54:10 +0000 (GMT) Subject: [Swift-devel] proper way to run swift on resource using condor? 
In-Reply-To: <384912.26453.qm@web52311.mail.re2.yahoo.com> References: <384912.26453.qm@web52311.mail.re2.yahoo.com> Message-ID: You should be able to use GRAM2 for this, using the details on here: http://www.teragrid.org/userinfo/hardware/resources.php?type=compute&select=single&id=16 or here: http://www.teragrid.org/userinfo/jobs/gram.php However, it appears to not work at the moment (either with Swift or with a simple commandline submission globus-job-run tg-condor.purdue.teragrid.org/jobmanager-fork /bin/hostname) I'll poke the teragrid helpdesk and CC you. -- From wilde at mcs.anl.gov Wed May 21 07:50:56 2008 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 21 May 2008 07:50:56 -0500 Subject: [Swift-devel] proper way to run swift on resource using condor? In-Reply-To: References: <384912.26453.qm@web52311.mail.re2.yahoo.com> Message-ID: <48341AB0.9010200@mcs.anl.gov> Hi Mike, Let me try to clarify. I think you used scripts to generate Condor submit files when you used the Purdue site without Swift. Using GRAM, you could have written scripts that were independent of the site scheduler: they would work on Condor, PBS, LSF etc. That's the purpose of GRAM, to provide a scheduler-independent means of submitting jobs. One can use GRAM through Condor-G as well: you write submit files, Condor sends jobs to GRAM, and GRAM sends them to the local scheduler. Swift talks (mainly) to GRAM, so when you use Swift, you're also scheduler-independent. And in most cases, that's the way we suggest running Swift. - Mike On 5/20/08 9:31 PM, Ian Foster wrote: > Again, why don't you use a GRAM submit as you would at any other TG site? > > > On May 20, 2008, at 9:23 PM, Mike Kubal wrote: > >> I've been having swift invoke a perl script on Purdue. >> The perl script writes the descriptor file that is >> passed as an argument to the condor_submit command >> within the perl script. The condor_submit command >> actually launches the job on a compute node. 
Every 30 >> seconds the perl script checks to see if the launched >> job has finished. When the launched job finishes the >> perl script ends, and the result files are returned >> via swift. The problem with this approach is that I >> am either piling many perl scripts onto the logon node >> or wasting a compute node to just check the status of >> the job. >> >> I have experimented with having swift call >> condor_submit directly with the idea of having a >> single separate swift function check the status of all >> jobs and returned the result files when finished. The >> problem here is that once the the job has been >> launched successfully, swift removes the job >> sub-directory where the results of the job launched >> via condor_submit are written. >> >> Suggestions welcome, >> >> Mike >> >> --- Mihael Hategan wrote: >> >>> Are you referring to the arguments-with-spaces >>> issue? >>> >>> On Tue, 2008-05-20 at 14:46 -0700, Mike Kubal wrote: >>>> I have been using some 'work around' methods to >>> use >>>> Swift to run jobs on the Purdue resource that >>> uses >>>> Condor. Is there a proper way? >>>> >>>> Thanks, >>>> >>>> Mike >>>> >>>> >>>> >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> >>> >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> >>> >> >> >> >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From benc at hawaga.org.uk Wed May 21 09:37:36 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 21 May 2008 14:37:36 +0000 (GMT) Subject: [Swift-devel] proper way to run swift on resource using condor? 
In-Reply-To: References: <384912.26453.qm@web52311.mail.re2.yahoo.com> Message-ID: > You should be able to use GRAM2 for this, using the details on here: [...] > However, it appears to not work at the moment TG people at Purdue have fixed this now. Have a look at tests/sites/tgpurdue-condor-gram2.xml (in swift svn >=r1995) That is a sites file that I use (as of a few minutes ago) for running my semi-automated site tests; it passes all tests apart from ones with spaces in filenames (which is a problem with gram+condor in general). You will need to change the project ID (or remove that entirely if you have a default teragrid project) and change the workdirectory to something of your own. Run examples/vdsk/first.swift to check it works. You should see from that that it works basically the same as submitting to PBS on other TeraGrid sites, with no need to do Condor-specific things at the far side. -- From smartin at mcs.anl.gov Wed May 21 09:48:02 2008 From: smartin at mcs.anl.gov (Stuart Martin) Date: Wed, 21 May 2008 09:48:02 -0500 Subject: [Swift-devel] proper way to run swift on resource using condor? In-Reply-To: References: <384912.26453.qm@web52311.mail.re2.yahoo.com> Message-ID: <80B4361E-B53F-4855-AE21-3ADE5392B4F0@mcs.anl.gov> Looking at the kit registrations for remote compute kits, Purdue has WS GRAM (version 4.0.1, very old) registered. But maybe it is not running. Seems the kit registration should only say a version. Then the actual contact information (host:port/service path) should only be listed when the container is up and running. I think JP has made the latest remote compute kit available that includes WS GRAM from 4.0.7 + an rft patch. So we should ask Purdue to install that. Does that make sense, JP? 
http://info.teragrid.org:8080/webmds/webmds?info=tgislocal&xsl=kitsregistration

Name               Type           Version  Endpoint
prews-gram-condor  prews-gram     4.0.1    tg-condor.purdue.teragrid.org:2119/jobmanager-condor
globus-mds         globus-mds4    4.0.1    https://tg-condor.purdue.teragrid.org:8443/wsrf/services/DefaultIndexService
ws-delegation      ws-delegation  4.0.1    https://tg-condor.purdue.teragrid.org:8443/wsrf/services/DelegationService
ws-gram/Condor     ws-gram        4.0.1    https://tg-condor.purdue.teragrid.org:8443/wsrf/services/ManagedJobFactoryService
globus-mds-auth    globus-mds4    4.0.1    https://tg-condor.purdue.teragrid.org:8448/wsrf/services/DefaultIndexService
ws-gram/Fork       ws-gram        4.0.1    https://tg-condor.purdue.teragrid.org:8443/wsrf/services/ManagedJobFactoryService
prews-gram-fork    prews-gram     4.0.1    tg-condor.purdue.teragrid.org:2119/jobmanager-fork

On May 21, 2008, at May 21, 5:54 AM, Ben Clifford wrote: > You should be able to use GRAM2 for this, using the details on here: > > http://www.teragrid.org/userinfo/hardware/resources.php?type=compute&select=single&id=16 > > or here: > > http://www.teragrid.org/userinfo/jobs/gram.php > > However, it appears to not work at the moment (either with Swift or > with a > simple commandline submission globus-job-run > tg-condor.purdue.teragrid.org/jobmanager-fork /bin/hostname) > > I'll poke the teragrid helpdesk and CC you. > > -- > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From benc at hawaga.org.uk Wed May 21 10:15:03 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 21 May 2008 15:15:03 +0000 (GMT) Subject: [Swift-devel] proper way to run swift on resource using condor? 
In-Reply-To: <80B4361E-B53F-4855-AE21-3ADE5392B4F0@mcs.anl.gov> References: <384912.26453.qm@web52311.mail.re2.yahoo.com> <80B4361E-B53F-4855-AE21-3ADE5392B4F0@mcs.anl.gov> Message-ID: On Wed, 21 May 2008, Stuart Martin wrote: > Looking at the kit registrations for remote compute kits, purdue has WS GRAM > (version 4.0.1 very old) registered. But maybe it is not running. It seems to be running, and almost accepting my jobs - it doesn't seem to pay attention to the project RSL extension (and I deliberately have no default project because I am on several). (the GRAM2 installation wasn't paying attention to that earlier today either but that has now been fixed). -- From mikekubal at yahoo.com Wed May 21 11:20:31 2008 From: mikekubal at yahoo.com (Mike Kubal) Date: Wed, 21 May 2008 09:20:31 -0700 (PDT) Subject: [Swift-devel] proper way to run swift on resource using condor? In-Reply-To: Message-ID: <901375.35626.qm@web52312.mail.re2.yahoo.com> Thanks all. My docking application is running now on Purdue using jobmanager-condor without any need for the perl script calling condor_submit. I had tried jobmanager-condor previously and when it failed I assumed I needed to incorporate the condor_submit. --- Ben Clifford wrote: > > > You should be able to use GRAM2 for this, using > the details on here: > [...] > > However, it appears to not work at the moment > > TG people at Purdue have fixed this now. > > Have a look at tests/sites/tgpurdue-condor-gram2.xml > (in swift svn > >=r1995) > > That is a sites file that I use (as of a few minutes > ago) for running my > semi-automated site tests; it passes all tests apart > from ones with spaces > in filenames (which is a problem with gram+condor in > general). > > You will need to change the project ID (or remove > that entirely if you > have a default teragrid project) and change the > workdirectory to something > of your own. > > Run examples/vdsk/first.swift to check it works. 
You > should see from that > that it works basically the same as submitting to > PBS on other TeraGrid > sites, with no need to do condor specific things at > the far side. > > -- > > From benc at hawaga.org.uk Wed May 21 14:43:41 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 21 May 2008 19:43:41 +0000 (GMT) Subject: [Swift-devel] 0.6 release plan Message-ID: I think it would be good to aim for a 0.6 release in mid-June. That is two months since the 0.5 release, which seems a reasonable period. Both the coasters and job replication can hopefully be experimentally usable by then. Restarts seem to mostly work again. Plus a pile of smaller bugfixes and features. -- From navarro at mcs.anl.gov Wed May 21 15:06:13 2008 From: navarro at mcs.anl.gov (JP Navarro) Date: Wed, 21 May 2008 15:06:13 -0500 Subject: [Swift-devel] proper way to run swift on resource using condor? In-Reply-To: <80B4361E-B53F-4855-AE21-3ADE5392B4F0@mcs.anl.gov> References: <384912.26453.qm@web52311.mail.re2.yahoo.com> <80B4361E-B53F-4855-AE21-3ADE5392B4F0@mcs.anl.gov> Message-ID: <065BC8E3-2FC5-4683-801C-193547D95260@mcs.anl.gov> On May 21, 2008, at 9:48 AM, Stuart Martin wrote: > Looking at the kit registrations for remote compute kits, purdue has > WS GRAM (version 4.0.1 very old) registered. But maybe it is not > running. Seems the kit registration should only say a version. > Then the actual contact information (host:port/service path) should > be only be listed when the container is up running. I think JP has > made the latest remote compute kit available that includes WS GRAM > from 4.0.7 + an rft patch. So we should ask Purdue to install > that. Does that make sense JP? Kit service information is not how we communicate transient up/down status. We have the Inca system for that. Part of the kit registration information is a StatusURL that will take you to an Inca page that does show the current monitored status of each service in the kit. 
I would expect Purdue plans to deploy the new 4.0.7 services, but it wouldn't hurt to let them know that you'd like for them to do so. > http://info.teragrid.org:8080/webmds/webmds?info=tgislocal&xsl=kitsregistration > > Name Type Version Endpoint > prews-gram-condor prews-gram 4.0.1 tg-condor.purdue.teragrid.org: > 2119/jobmanager-condor > Name Type Version Endpoint > globus-mds globus-mds4 4.0.1 https://tg-condor.purdue.teragrid.org:8443/wsrf/services/DefaultIndexService > Name Type Version Endpoint > ws-delegation ws-delegation 4.0.1 https://tg-condor.purdue.teragrid.org:8443/wsrf/services/DelegationService > Name Type Version Endpoint > ws-gram/Condor ws-gram 4.0.1 https://tg-condor.purdue.teragrid.org:8443/wsrf/services/ManagedJobFactoryService > Name Type Version Endpoint > globus-mds-auth globus-mds4 4.0.1 https://tg-condor.purdue.teragrid.org:8448/wsrf/services/DefaultIndexService > Name Type Version Endpoint > ws-gram/Fork ws-gram 4.0.1 https://tg-condor.purdue.teragrid.org:8443/wsrf/services/ManagedJobFactoryService > Name Type Version Endpoint > prews-gram-fork prews-gram 4.0.1 tg-condor.purdue.teragrid.org:2119/ > jobmanager-fork > > > On May 21, 2008, at May 21, 5:54 AM, Ben Clifford wrote: > >> You should be able to use GRAM2 for this, using the details on here: >> >> http://www.teragrid.org/userinfo/hardware/resources.php?type=compute&select=single&id=16 >> >> or here: >> >> http://www.teragrid.org/userinfo/jobs/gram.php >> >> However, it appears to not work at the moment (either with Swift or >> with a >> simple commandline submission globus-job-run >> tg-condor.purdue.teragrid.org/jobmanager-fork /bin/hostname) >> >> I'll poke the teragrid helpdesk and CC you. 
>> >> -- >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From benc at hawaga.org.uk Wed May 21 18:34:37 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 21 May 2008 23:34:37 +0000 (GMT) Subject: [Swift-devel] replication/recall of jobs from slow queues In-Reply-To: References: <1209767914.28036.5.camel@localhost> Message-ID: On Sun, 4 May 2008, Ben Clifford wrote: > One way I was thinking of testing on a real site is to set profile keys so > that jobs go into a condor pool with a requirement to not run for a > specified time after submission (I think that is expressible in the > classad language). That should give reproducible at-least-one-resubmission > behaviour. I tried a couple of these on fletch: MY.QDate + 180 < TARGET.LastHeardFrom MY.QDate + 180 < MY.ServerTime and whilst both match after three minutes, the jobs then don't execute. grr. -- From hategan at mcs.anl.gov Wed May 21 18:40:46 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 21 May 2008 18:40:46 -0500 Subject: [Swift-devel] replication/recall of jobs from slow queues In-Reply-To: References: <1209767914.28036.5.camel@localhost> Message-ID: <1211413246.30190.4.camel@localhost> On Wed, 2008-05-21 at 23:34 +0000, Ben Clifford wrote: > On Sun, 4 May 2008, Ben Clifford wrote: > > > One way I was thinking of testing on a real site is to set profile keys so > > that jobs go into a condor pool with a requirement to not run for a > > specified time after submission (I think that is expressible in the > > classad language). That should give reproducible at-least-one-resubmission > > behaviour. > > I tried a couple of these on fletch: > > MY.QDate + 180 < TARGET.LastHeardFrom > MY.QDate + 180 < MY.ServerTime > > and whilst both match after three minutes, the jobs then don't execute. > grr. Won't happen if you have a single job. 
Perhaps there should be a "guess" average to start with. In the mean time you can run a dummy echo or something first. > From hategan at mcs.anl.gov Wed May 21 18:42:16 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 21 May 2008 18:42:16 -0500 Subject: [Swift-devel] replication/recall of jobs from slow queues In-Reply-To: <1211413246.30190.4.camel@localhost> References: <1209767914.28036.5.camel@localhost> <1211413246.30190.4.camel@localhost> Message-ID: <1211413336.30190.5.camel@localhost> On Wed, 2008-05-21 at 18:40 -0500, Mihael Hategan wrote: > On Wed, 2008-05-21 at 23:34 +0000, Ben Clifford wrote: > > On Sun, 4 May 2008, Ben Clifford wrote: > > > > > One way I was thinking of testing on a real site is to set profile keys so > > > that jobs go into a condor pool with a requirement to not run for a > > > specified time after submission (I think that is expressible in the > > > classad language). That should give reproducible at-least-one-resubmission > > > behaviour. > > > > I tried a couple of these on fletch: > > > > MY.QDate + 180 < TARGET.LastHeardFrom > > MY.QDate + 180 < MY.ServerTime > > > > and whilst both match after three minutes, the jobs then don't execute. > > grr. > > Won't happen if you have a single job. Correction: won't happen if no jobs have ever changed state from queued to active. > Perhaps there should be a "guess" > average to start with. In the mean time you can run a dummy echo or > something first. > > > From benc at hawaga.org.uk Wed May 21 18:46:58 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 21 May 2008 23:46:58 +0000 (GMT) Subject: [Swift-devel] replication/recall of jobs from slow queues In-Reply-To: <1211413246.30190.4.camel@localhost> References: <1209767914.28036.5.camel@localhost> <1211413246.30190.4.camel@localhost> Message-ID: On Wed, 21 May 2008, Mihael Hategan wrote: > Won't happen if you have a single job. Perhaps there should be a "guess" > average to start with. 
In the mean time you can run a dummy echo or > something first. I mean at the condor layer just running a job in the first place, not even turning on replication; it should have all run eventually this way with a three minute delay before starting each job. -- From hategan at mcs.anl.gov Wed May 21 18:53:30 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 21 May 2008 18:53:30 -0500 Subject: [Swift-devel] replication/recall of jobs from slow queues In-Reply-To: References: <1209767914.28036.5.camel@localhost> <1211413246.30190.4.camel@localhost> Message-ID: <1211414010.30355.0.camel@localhost> On Wed, 2008-05-21 at 23:46 +0000, Ben Clifford wrote: > On Wed, 21 May 2008, Mihael Hategan wrote: > > > Won't happen if you have a single job. Perhaps there should be a "guess" > > average to start with. In the mean time you can run a dummy echo or > > something first. > > I mean at the condor layer just running a job in the first place, not even > turning on replication; it should have all run eventually this way with a > three minute delay before starting each job. If you remove the time constraints, do they work? > From benc at hawaga.org.uk Wed May 21 18:54:53 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 21 May 2008 23:54:53 +0000 (GMT) Subject: [Swift-devel] replication/recall of jobs from slow queues In-Reply-To: <1211414010.30355.0.camel@localhost> References: <1209767914.28036.5.camel@localhost> <1211413246.30190.4.camel@localhost> <1211414010.30355.0.camel@localhost> Message-ID: On Wed, 21 May 2008, Mihael Hategan wrote: > If you remove the time constraints, do they work? ja. As does putting in a different constraint such as 'True'. 
-- From hategan at mcs.anl.gov Wed May 21 19:01:01 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 21 May 2008 19:01:01 -0500 Subject: [Swift-devel] replication/recall of jobs from slow queues In-Reply-To: References: <1209767914.28036.5.camel@localhost> <1211413246.30190.4.camel@localhost> <1211414010.30355.0.camel@localhost> Message-ID: <1211414461.30587.0.camel@localhost> On Wed, 2008-05-21 at 23:54 +0000, Ben Clifford wrote: > On Wed, 21 May 2008, Mihael Hategan wrote: > > > If you remove the time constraints, do they work? > > ja. As does putting in a different constraint such as 'True'. How do you know the requirements match? > From benc at hawaga.org.uk Thu May 22 04:30:01 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 22 May 2008 09:30:01 +0000 (GMT) Subject: [Swift-devel] replication/recall of jobs from slow queues In-Reply-To: <1211414461.30587.0.camel@localhost> References: <1209767914.28036.5.camel@localhost> <1211413246.30190.4.camel@localhost> <1211414010.30355.0.camel@localhost> <1211414461.30587.0.camel@localhost> Message-ID: On Wed, 21 May 2008, Mihael Hategan wrote: > > ja. As does putting in a different constraint such as 'True'. > > How do you know the requirements match? condor_q -better-analyze -- From benc at hawaga.org.uk Thu May 22 07:27:28 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 22 May 2008 12:27:28 +0000 (GMT) Subject: [Swift-devel] replication/recall of jobs from slow queues In-Reply-To: References: <1209767914.28036.5.camel@localhost> <1211413246.30190.4.camel@localhost> <1211414010.30355.0.camel@localhost> <1211414461.30587.0.camel@localhost> Message-ID: anyway, screw this approach, I got what I wanted through a different approach (and discovered that the condor requirements approach would not have worked anyway, I think) which I will write in a separate email in a few minutes. 
--

From benc at hawaga.org.uk Thu May 22 07:48:47 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 22 May 2008 12:48:47 +0000 (GMT)
Subject: [Swift-devel] testing replication
Message-ID:

I was motivated to build a test framework for replication, and through it
I have recreated (in exciting repeatable fashion) what superficially
appears to be a problem that Xi Li encountered the other day.

I hacked (in racy, unindented glory) provider-local to be sometimes slow
(where sometimes means a 2 minute delay on every execution except the
first). This is provider-wonky, which is in
https://svn.ci.uchicago.edu/svn/vdl2/provider-wonky.

Add a dependency for provider-wonky and build Swift.

Set provider="wonky" for execution in sites.xml, instead of local provider.

Attempt to run test 062-two-in-a-row. Observe that it runs (with the first
job finishing pretty much instantly, and the second job taking a couple of
minutes).

Activate replication with:

replication.enabled=true
replication.min.queue.time=15

Attempt to run 062-two-in-a-row again. Observe that the workflow does not
finish successfully (and that, at least superficially, it looks like the
same problem Xi had).

This is what I see on my console (it takes about 10 minutes to run in
total):

Swift svn swift-r1990 (Swift modified locally) cog-r2023 (CoG modified locally)
RunID: 20080522-1339-sygzk6qc
Progress:
echo started
Submitting wonky job 0
Wonky job in queue, job number 0
not sleeping - this is the first job, 0
Wonky job running now
Wonky job completed with exitCode 0
Wonky job status COMPLETED 0
echo completed
echo started
Submitting wonky job 1
Wonky job in queue, job number 1
Progress: Selecting site:1 Finished successfully:1
Submitting wonky job 2
Wonky job in queue, job number 2
Progress: Selecting site:1 Finished successfully:1
Progress: Selecting site:1 Finished successfully:1
Progress: Selecting site:1 Finished successfully:1
Wonky job running now
Cancel called on wonky job 2
Wonky job completed with exitCode 0
Wonky job status COMPLETED 1
Submitting wonky job 3
Wonky job in queue, job number 3
Wonky job running now
Progress: Executing:1 Finished successfully:1
Progress: Executing:1 Finished successfully:1
Wonky job running now
Wonky job completed with exitCode 0
Wonky job status COMPLETED 3
Submitting wonky job 4
Wonky job in queue, job number 4
Progress: Selecting site:1 Finished successfully:1
Submitting wonky job 5
Wonky job in queue, job number 5
Progress: Selecting site:1 Finished successfully:1
Progress: Selecting site:1 Finished successfully:1
Progress: Selecting site:1 Finished successfully:1
Wonky job running now
Cancel called on wonky job 5
Wonky job completed with exitCode 0
Wonky job status COMPLETED 4
Submitting wonky job 6
Wonky job in queue, job number 6
Submitting wonky job 7
Wonky job in queue, job number 7
Progress: Selecting site:1 Finished successfully:1
Wonky job running now
Progress: Selecting site:1 Finished successfully:1
Progress: Selecting site:1 Finished successfully:1
Wonky job running now
Wonky job running now
Wonky job completed with exitCode 0
Wonky job status COMPLETED 6
Wonky job completed with exitCode 0
Wonky job status COMPLETED 7
echo failed
Execution failed:
Submitting wonky job 8
Wonky job in queue, job number 8
Multiple mappings pointing to the same file (localhost:062-two-in-a-row-20080522-1339-sygzk6qc/shared/062-two-in-a-row.b.out) detected.
SWIFT RETURN CODE NON-ZERO

--

From benc at hawaga.org.uk Thu May 22 08:09:16 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 22 May 2008 13:09:16 +0000 (GMT)
Subject: [Swift-devel] restart variable scoping.
Message-ID:

wrt to the on-going activity around bug NNN about restarts...

if it is desirable to label restarts based on variable name, rather than
by the mapped filename, then:

the naive approach of storing the variable name as restart identifier:

 * works for global variables
 * does not work for non-global variables, because the same variable name
   can be used in different scopes, and the same code block can be invoked
   multiple times (in procedures and in for loops, for example)

Some kind of scope identifier on the front might help here. In:

(file t) p() {
  file q=foo();
  t=qux(q);
}

a=p(); // l1
b=p(); // l2

the variable 'q' is used twice, once in the call at l1, and once in the
call at l2.

So one might use 'l1' and 'l2', the locations in the source file, as scope
identifiers, giving restart identifiers of l1.q and l2.q.

In a foreach loop, such as:

foreach a,i in array { // l3
  out[i] = p(array); // l4
}

we might then scope the various instances of q as:

l3-iteration1.l4.q
l3-iteration2.l4.q
l3-iteration3.l4.q
[...]

with scope labelling being hierarchical.

The scope identifier is:

the containing scope identifier
+
whatever is needed to identify the new child scope within the containing
scope

So: entering a foreach loop block appends the iteration identifier to the
current scope; making a function call appends the function call source
location (or equivalent) to the current scope.
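
[Editorial sketch, not part of the original message.] The hierarchical labelling proposed above can be illustrated with a few lines of Python. This is only a model of the naming scheme under stated assumptions: the helper names (child_scope, enter_call, enter_iteration, restart_id) are hypothetical and do not exist in Swift or Karajan, and the sketch says nothing about how such labels would be obtained at the Karajan layer.

```python
# Hypothetical model of the scope-labelling scheme described in the email.
# None of these helpers exist in Swift or Karajan; they only show how the
# identifiers l1.q, l2.q and l3-iterationN.l4.q would be composed.

ROOT = ()  # the global scope has an empty scope identifier


def child_scope(parent, part):
    """Containing scope identifier + whatever identifies the child scope."""
    return parent + (part,)


def enter_call(scope, call_site):
    """A function call appends the call's source location to the scope."""
    return child_scope(scope, call_site)


def enter_iteration(scope, loop_site, index):
    """A foreach body appends the loop location and the iteration index."""
    return child_scope(scope, f"{loop_site}-iteration{index}")


def restart_id(scope, variable):
    """Restart identifier: scope components, then the variable name."""
    return ".".join(scope + (variable,))


# a=p(); // l1      b=p(); // l2
ids_for_calls = [restart_id(enter_call(ROOT, site), "q") for site in ("l1", "l2")]

# foreach a,i in array { // l3
#   out[i] = p(array);   // l4
# }
ids_for_loop = [
    restart_id(enter_call(enter_iteration(ROOT, "l3", i), "l4"), "q")
    for i in (1, 2, 3)
]

# The label depends only on the iteration index, not on the order in which
# iterations happen to be visited.
ids_reversed = [
    restart_id(enter_call(enter_iteration(ROOT, "l3", i), "l4"), "q")
    for i in (3, 2, 1)
]

print(ids_for_calls)
print(ids_for_loop)
```

In this model a global variable keeps its bare name, since restart_id(ROOT, "g") has no scope components to prepend, which matches the observation that the naive approach already works for globals.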
This is mostly based around me thinking about a SwiftScript level of thinking, rather than a Karajan level of thinking - it may be that this is not easy to do at the karajan level (indeed, it may be that this doesn't work at all anyway). But it seemed worth sending anyway. -- From lixi at uchicago.edu Thu May 22 10:35:10 2008 From: lixi at uchicago.edu (lixi at uchicago.edu) Date: Thu, 22 May 2008 10:35:10 -0500 (CDT) Subject: [Swift-devel] testing replication Message-ID: <20080522103510.BAL27780@m4500-03.uchicago.edu> >Execution failed: > Submitting wonky job 8 >Wonky job in queue, job number 8 >Multiple mappings pointing to the same file >(localhost:062-two-in-a-row-20080522-1339- sygzk6qc/shared/062-two-in-a-row.b.out) >detected. Yes. I encountered this kind of problem for many times these days. Xi From lixi at uchicago.edu Thu May 22 10:44:47 2008 From: lixi at uchicago.edu (lixi at uchicago.edu) Date: Thu, 22 May 2008 10:44:47 -0500 (CDT) Subject: [Swift-devel] testing replication Message-ID: <20080522104447.BAL29294@m4500-03.uchicago.edu> Hi, I have an idea. Could you provide a way to externally modify various job load factors which affect the score changing extent? Then I could test with them and might found more suitable values. Because during the past experiments, it seemed that some sites with bad performance could be still chosen with big probabilities. Of course, this is only my own opinion. I am not sure how hard to achieve it. Thanks, Xi From hategan at mcs.anl.gov Thu May 22 11:10:26 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 22 May 2008 11:10:26 -0500 Subject: [Swift-devel] testing replication In-Reply-To: <20080522104447.BAL29294@m4500-03.uchicago.edu> References: <20080522104447.BAL29294@m4500-03.uchicago.edu> Message-ID: <1211472626.5871.2.camel@localhost> See this: http://wiki.cogkit.org/index.php/V:Head/Schedulers#Weighted and then edit scheduler.xml in libexec. Let me know if you have problems. 
Mihael On Thu, 2008-05-22 at 10:44 -0500, lixi at uchicago.edu wrote: > Hi, > > I have an idea. Could you provide a way to externally modify > various job load factors which affect the score changing > extent? Then I could test with them and might found more > suitable values. Because during the past experiments, it > seemed that some sites with bad performance could be still > chosen with big probabilities. Of course, this is only my > own opinion. I am not sure how hard to achieve it. > > Thanks, > > Xi From hategan at mcs.anl.gov Thu May 22 11:12:58 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 22 May 2008 11:12:58 -0500 Subject: [Swift-devel] testing replication In-Reply-To: <1211472626.5871.2.camel@localhost> References: <20080522104447.BAL29294@m4500-03.uchicago.edu> <1211472626.5871.2.camel@localhost> Message-ID: <1211472778.5975.1.camel@localhost> On Thu, 2008-05-22 at 11:10 -0500, Mihael Hategan wrote: > See this: http://wiki.cogkit.org/index.php/V:Head/Schedulers#Weighted Hmm. That doesn't seem to have been updated in a while. You can find a more accurate version in the scheduler source, but here is a summary: connectionRefusedFactor = -10; connectionTimeoutFactor = -20; jobSubmissionTaskLoadFactor = -0.2; transferTaskLoadFactor = -0.2; fileOperationTaskLoadFactor = -0.01; successFactor = 0.1; failureFactor = -0.5; scoreHighCap = 100; defaultJobThrottle = 2; > > and then edit scheduler.xml in libexec. Let me know if you have > problems. > > Mihael > > On Thu, 2008-05-22 at 10:44 -0500, lixi at uchicago.edu wrote: > > Hi, > > > > I have an idea. Could you provide a way to externally modify > > various job load factors which affect the score changing > > extent? Then I could test with them and might found more > > suitable values. Because during the past experiments, it > > seemed that some sites with bad performance could be still > > chosen with big probabilities. Of course, this is only my > > own opinion. 
I am not sure how hard to achieve it. > > > > Thanks, > > > > Xi From lixi at uchicago.edu Thu May 22 12:50:53 2008 From: lixi at uchicago.edu (lixi at uchicago.edu) Date: Thu, 22 May 2008 12:50:53 -0500 (CDT) Subject: [Swift-devel] testing replication Message-ID: <20080522125053.BAL49653@m4500-03.uchicago.edu> >On Thu, 2008-05-22 at 11:10 -0500, Mihael Hategan wrote: >> See this: http://wiki.cogkit.org/index.php/V:Head/Schedulers#Weighted > >Hmm. That doesn't seem to have been updated in a while. You can find a >more accurate version in the scheduler source, but here is a summary: > >connectionRefusedFactor = -10; >connectionTimeoutFactor = -20; >jobSubmissionTaskLoadFactor = -0.2; >transferTaskLoadFactor = -0.2; >fileOperationTaskLoadFactor = -0.01; >successFactor = 0.1; >failureFactor = -0.5; >scoreHighCap = 100; >defaultJobThrottle = 2; > I knew these values. My intention is that we might could turn these factors into variables which could be changed for each run. Is that possible or has that been already implemented? Thanks, Xi From hategan at mcs.anl.gov Thu May 22 13:02:20 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 22 May 2008 13:02:20 -0500 Subject: [Swift-devel] testing replication In-Reply-To: <20080522125053.BAL49653@m4500-03.uchicago.edu> References: <20080522125053.BAL49653@m4500-03.uchicago.edu> Message-ID: <1211479340.7366.2.camel@localhost> On Thu, 2008-05-22 at 12:50 -0500, lixi at uchicago.edu wrote: > > > > I knew these values. My intention is that we might could > turn these factors into variables which could be changed for > each run. Is that possible or has that been already > implemented? As I mentioned before, you can edit libexec/scheduler.xml You'll see something like this: ... You can set your own properties there. 
For example: > > Thanks, > > Xi From lixi at uchicago.edu Thu May 22 13:06:25 2008 From: lixi at uchicago.edu (lixi at uchicago.edu) Date: Thu, 22 May 2008 13:06:25 -0500 (CDT) Subject: [Swift-devel] testing replication Message-ID: <20080522130625.BAL51410@m4500-03.uchicago.edu> I see. How silly I am! Thanks so much. :) Xi ---- Original message ---- >Date: Thu, 22 May 2008 13:02:20 -0500 >From: Mihael Hategan >Subject: Re: [Swift-devel] testing replication >To: lixi at uchicago.edu >Cc: Ben Clifford , swift- devel at ci.uchicago.edu > > >On Thu, 2008-05-22 at 12:50 -0500, lixi at uchicago.edu wrote: >> > >> >> I knew these values. My intention is that we might could >> turn these factors into variables which could be changed for >> each run. Is that possible or has that been already >> implemented? > >As I mentioned before, you can edit libexec/scheduler.xml > >You'll see something like this: > > > ... > > >You can set your own properties there. For example: > > >> >> Thanks, >> >> Xi > From hategan at mcs.anl.gov Thu May 22 16:31:42 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 22 May 2008 16:31:42 -0500 Subject: [Swift-devel] Re: restart variable scoping. In-Reply-To: References: Message-ID: <1211491902.12337.16.camel@localhost> Right. Me and Yong had a long discussion about this two years ago. You can "label" things based on the thread id, because it reflects lexical structure. You cannot, however, do this with foreach loops. Data is not to be tied to threads there but to actual iteration values, because they can appear on the iteration channel in any order. Namely, if you have for k in [1:2] { X[k] = echo(k); } for k, v in X { Y[k] = X[k]; } For the first loop it's fairly clear. k=1 - thread 1-1, k=2 - thread 1-2. But on the second loop this is determined by the order in which the echoes complete. It can very well be that echo(2) completes first, so you'll have k=1 - thread 2-2, and k=2 - thread 2-1. 
That's why we abandoned the idea eventually (though I forgot it when I first "fixed" the issue). It could be done, using horribly non-portable knowledge about the implementation, by going through all the karajan stack frames and knowing that for loops in swift are compiled to put the iteration variable in "$$". It will, of course, break if k is not something easily serializable. The current idea is that of using something that alludes to a RC: mark files as valid in a transaction log (simply checking for file existence on the disk may fail to check consistency of such files). On Thu, 2008-05-22 at 13:09 +0000, Ben Clifford wrote: > wrt to the on-going activity around bug NNN about restarts... > > if it is desirable to label restarts based on variable name, rather than > by the mapped filename, then: > > the naive approach of storing the variable name as restart identifier: > > * works for global variables > * does not work for non-global variables, because the same variable name > can be used in different scopes, and the same code block can be invoked > multiple times (in procedures and in for loops, for example) > > Some kind of scope identifier on the front might help here. In: > > (file t) p() { > file q=foo(); > t=qux(q); > } > > a=p(); // l1 > b=p(); // l2 > > the variable 'q' is used twice, once in the call at l1, and once in the > call at l2. > > So one might use 'l1' and 'l2', the locations in the source file, as scope > identifiers, giving restart identifiers of l1.q and l2.q. > > In a foreach loop, such as: > > foreach a,i in array { // l3 > out[i] = p(array); // l4 > } > > we might then scope the various instances of q as: > > l3-iteration1.l4.q > l3-iteration2.l4.q > l3-iteration3.l4.q > [...] > > with scope labelling being hierarchical. 
> > The scope identifier is: > > the containing scope identifier > + > whatever is needed to identify the new child scope within the containing > scope > > So: entering a foreach loop block appends the iteration identifier to the > current scope; making a function call appends the function call source > location (or equivalent) to the current scope. > > This is mostly based around me thinking about a SwiftScript level of > thinking, rather than a Karajan level of thinking - it may be that this > is not easy to do at the karajan level (indeed, it may be that this > doesn't work at all anyway). But it seemed worth sending anyway. > From benc at hawaga.org.uk Thu May 22 16:42:52 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 22 May 2008 21:42:52 +0000 (GMT) Subject: [Swift-devel] Re: restart variable scoping. In-Reply-To: <1211491902.12337.16.camel@localhost> References: <1211491902.12337.16.camel@localhost> Message-ID: On Thu, 22 May 2008, Mihael Hategan wrote: > for k in [1:2] { > X[k] = echo(k); > } > > for k, v in X { > Y[k] = X[k]; > } > For the first loop it's fairly clear. k=1 - thread 1-1, k=2 - thread > 1-2. But on the second loop this is determined by the order in which the > echoes complete. It can very well be that echo(2) completes first, so in the second loop, you're indexed by v. -- From hategan at mcs.anl.gov Thu May 22 16:49:04 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 22 May 2008 16:49:04 -0500 Subject: [Swift-devel] Re: restart variable scoping. In-Reply-To: References: <1211491902.12337.16.camel@localhost> Message-ID: <1211492944.12337.18.camel@localhost> On Thu, 2008-05-22 at 21:42 +0000, Ben Clifford wrote: > On Thu, 22 May 2008, Mihael Hategan wrote: > > > for k in [1:2] { > > X[k] = echo(k); > > } > > > > for k, v in X { > > Y[k] = X[k]; > > } > > > For the first loop it's fairly clear. k=1 - thread 1-1, k=2 - thread > > 1-2. But on the second loop this is determined by the order in which the > > echoes complete. 
It can very well be that echo(2) completes first, so > > in the second loop, you're indexed by v. Brr. Well, I use "k" and "v" to disambiguate. > From benc at hawaga.org.uk Thu May 22 16:52:23 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 22 May 2008 21:52:23 +0000 (GMT) Subject: [Swift-devel] Re: restart variable scoping. In-Reply-To: <1211492944.12337.18.camel@localhost> References: <1211491902.12337.16.camel@localhost> <1211492944.12337.18.camel@localhost> Message-ID: > Brr. Well, I use "k" and "v" to disambiguate. if you use v to disambiguate, I think that you are unambiguous. -- From hategan at mcs.anl.gov Thu May 22 17:20:23 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 22 May 2008 17:20:23 -0500 Subject: [Swift-devel] Re: restart variable scoping. In-Reply-To: References: <1211491902.12337.16.camel@localhost> <1211492944.12337.18.camel@localhost> Message-ID: <1211494823.13366.0.camel@localhost> On Thu, 2008-05-22 at 21:52 +0000, Ben Clifford wrote: > > Brr. Well, I use "k" and "v" to disambiguate. > > if you use v to disambiguate, I think that you are unambiguous. I needed k. From benc at hawaga.org.uk Thu May 22 17:22:05 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 22 May 2008 22:22:05 +0000 (GMT) Subject: [Swift-devel] (no subject) In-Reply-To: <1211494823.13366.0.camel@localhost> References: <1211491902.12337.16.camel@localhost> <1211492944.12337.18.camel@localhost> <1211494823.13366.0.camel@localhost> Message-ID: > > > Brr. Well, I use "k" and "v" to disambiguate. > > > > if you use v to disambiguate, I think that you are unambiguous. > > I needed k. but v is so much better. -- From hategan at mcs.anl.gov Thu May 22 17:43:12 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 22 May 2008 17:43:12 -0500 Subject: [Swift-devel] this subject intentionally left blank?
In-Reply-To: References: <1211491902.12337.16.camel@localhost> <1211492944.12337.18.camel@localhost> <1211494823.13366.0.camel@localhost> Message-ID: <1211496192.13366.22.camel@localhost> On Thu, 2008-05-22 at 22:22 +0000, Ben Clifford wrote: > > > > Brr. Well, I use "k" and "v" to disambiguate. > > > > > > if you use v to disambiguate, I think that you are unabigiouus. > > > > I needed k. > > but v is so much better. Because it holds more water? > From benc at hawaga.org.uk Thu May 22 19:47:21 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 23 May 2008 00:47:21 +0000 (GMT) Subject: [Swift-devel] Re: restart variable scoping. In-Reply-To: <1211491902.12337.16.camel@localhost> References: <1211491902.12337.16.camel@localhost> Message-ID: > Data is not to be tied to threads there but to actual iteration values, > because they can appear on the iteration channel in any order. That pretty much is my comment about how this does/does not map from the Swift to the Karajan layer - I think it does, but not in a way that a karajan purist would be happy with. -- From hategan at mcs.anl.gov Thu May 22 20:09:55 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 22 May 2008 20:09:55 -0500 Subject: [Swift-devel] Re: restart variable scoping. In-Reply-To: References: <1211491902.12337.16.camel@localhost> Message-ID: <1211504995.16046.7.camel@localhost> On Fri, 2008-05-23 at 00:47 +0000, Ben Clifford wrote: > > Data is not to be tied to threads there but to actual iteration values, > > because they can appear on the iteration channel in any order. > > That pretty much is my comment about how this does/does not map from the > Swift to the Karajan layer - I think it does, but not in a way that a > karajan purist would be happy with. :) Not really. Hacks like that are bad. They are trading immediate results for problems later. Bigger problems. Did I also mention that it only works for basic types? 
> From benc at hawaga.org.uk Fri May 23 00:45:13 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 23 May 2008 05:45:13 +0000 (GMT) Subject: [Swift-devel] Re: restart variable scoping. In-Reply-To: <1211504995.16046.7.camel@localhost> References: <1211491902.12337.16.camel@localhost> <1211504995.16046.7.camel@localhost> Message-ID: On Thu, 22 May 2008, Mihael Hategan wrote: > Not really. Hacks like that are bad. They are trading immediate results > for problems later. Bigger problems. ... > Did I also mention that it only works for basic types? So describe the location within the labelled variable too. -- From benc at hawaga.org.uk Fri May 23 08:18:57 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 23 May 2008 13:18:57 +0000 (GMT) Subject: [Swift-devel] exciting new coaster error Message-ID: I get this when trying to run with coasters using $ cd tests/sites $ ./run-all coaster/ All of the site configs in the coaster directory exhibit this. Running test 061-cattwo Swift svn swift-r2001 (Swift modified locally) cog-r2023 RunID: 20080523-1417-kmqyd297 Progress: cat started cat failed Execution failed: Could not submit job Caused by: Could not start coaster service Caused by: java.lang.IllegalArgumentException: None of 'coaster.bootstrap.service.web.dir' and 'COG_INSTALL_PATH' are set From benc at hawaga.org.uk Fri May 23 08:38:04 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 23 May 2008 13:38:04 +0000 (GMT) Subject: [Swift-devel] Re: exciting new coaster error In-Reply-To: References: Message-ID: something wrong with the launcher substitution. that stuff is fragile - annoying that ant doesn't seem to have an option to fail on no-match. fixed in r2002. -- From hategan at mcs.anl.gov Fri May 23 09:05:48 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 23 May 2008 09:05:48 -0500 Subject: [Swift-devel] Re: restart variable scoping.
In-Reply-To: References: <1211491902.12337.16.camel@localhost> <1211504995.16046.7.camel@localhost> Message-ID: <1211551548.19883.0.camel@localhost> On Fri, 2008-05-23 at 05:45 +0000, Ben Clifford wrote: > On Thu, 22 May 2008, Mihael Hategan wrote: > > > Not really. Hacks like that are bad. They are trading immediate results > > for problems later. Bigger problems. > > ... > > > Did I also mention that it only works for basic types? > > So describe the location within the labeleld variable too. No, not that. I'm talking about iteration variables, not the ones being logged. > From benc at hawaga.org.uk Fri May 23 09:13:00 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 23 May 2008 14:13:00 +0000 (GMT) Subject: [Swift-devel] Re: restart variable scoping. In-Reply-To: <1211551548.19883.0.camel@localhost> References: <1211491902.12337.16.camel@localhost> <1211504995.16046.7.camel@localhost> <1211551548.19883.0.camel@localhost> Message-ID: On Fri, 23 May 2008, Mihael Hategan wrote: > > So describe the location within the labeleld variable too. > > No, not that. I'm talking about iteration variables, not the ones being > logged. ok. If you use the iteration index, that is always a basic type. At the SwiftScript layer, that index identifies the scope of variables in the loop body (wrt the containing scope) that makes sense for SwiftScript; that is, local variable t in iteration index 5 of a foreach will always be the "same variable", no matter when the karajan layer goes to execute that code block. I think. I don't see any other immediately obvious way that is significantly different that does that kind of identification; and I think that kind of identification is necessary if doing variable-name based restarts. -- From hategan at mcs.anl.gov Fri May 23 09:15:43 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 23 May 2008 09:15:43 -0500 Subject: [Swift-devel] Re: restart variable scoping. 
In-Reply-To: References: <1211491902.12337.16.camel@localhost> <1211504995.16046.7.camel@localhost> <1211551548.19883.0.camel@localhost> Message-ID: <1211552143.20056.1.camel@localhost> On Fri, 2008-05-23 at 14:13 +0000, Ben Clifford wrote: > On Fri, 23 May 2008, Mihael Hategan wrote: > > > > So describe the location within the labeleld variable too. > > > > No, not that. I'm talking about iteration variables, not the ones being > > logged. > > ok. > > If you use the iteration index, that is always a basic type. Ah, right. We only allow arrays to be indexed by integers. I see. From benc at hawaga.org.uk Fri May 23 09:24:28 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 23 May 2008 14:24:28 +0000 (GMT) Subject: [Swift-devel] Re: restart variable scoping. In-Reply-To: <1211552143.20056.1.camel@localhost> References: <1211491902.12337.16.camel@localhost> <1211504995.16046.7.camel@localhost> <1211551548.19883.0.camel@localhost> <1211552143.20056.1.camel@localhost> Message-ID: On Fri, 23 May 2008, Mihael Hategan wrote: > > If you use the iteration index, that is always a basic type. > > Ah, right. We only allow arrays to be indexed by integers. I see. yes. If we were to index an array by something like a string (or a float, or the tuple types that we talk of sometimes), then I think the same arguments apply though - it can be 'written down' and 'read in' in a sane fashion. Using types such as files that have no in-runtime value would cause problems for arrays in general, not just for restart, in as much as they don't have something like a .equals() to meaningfully use when picking elements of an array using the [] operator (aside from something like comparing the files byte-for-byte, which seems excessive). 
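As a rough illustration of the "written down and read in" point above: a minimal sketch, assuming a scope is identified purely by its enclosing foreach iteration indices plus the variable name. ScopeId, write and readIndices are hypothetical names invented for this sketch; they are not part of Swift or Karajan, and the real restart log format may look nothing like this.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not Swift or Karajan code. */
final class ScopeId {
    private ScopeId() {}

    /**
     * Writes a scope down as "i0-i1-...:name". Only iteration indices are used,
     * so the result does not depend on the order in which iterations execute.
     * Indices are assumed to be non-negative integers.
     */
    static String write(List<Integer> iterationIndices, String variableName) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < iterationIndices.size(); i++) {
            if (i > 0) {
                sb.append('-');
            }
            sb.append(iterationIndices.get(i));
        }
        return sb.append(':').append(variableName).toString();
    }

    /** Reads the iteration indices back in from an identifier produced by write(). */
    static List<Integer> readIndices(String id) {
        String scope = id.substring(0, id.indexOf(':'));
        List<Integer> indices = new ArrayList<>();
        if (!scope.isEmpty()) {
            for (String part : scope.split("-")) {
                indices.add(Integer.parseInt(part));
            }
        }
        return indices;
    }
}
```

Under these assumptions, local variable t in iteration 5 of a foreach is always written as "5:t", whenever that iteration happens to run. An index type with no in-runtime value (such as a file) has no obvious equivalent of this round trip, which is the difficulty described above.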
-- From benc at hawaga.org.uk Fri May 23 11:50:11 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 23 May 2008 16:50:11 +0000 (GMT) Subject: [Swift-devel] Re: coaster->tg-ncsa In-Reply-To: <1209523861.1278.0.camel@localhost> References: <1209522838.702.8.camel@localhost> <1209523861.1278.0.camel@localhost> Message-ID: On Tue, 29 Apr 2008, Mihael Hategan wrote: > Odd. Attributes should be copied from the original task. I'll look into > that. They almost were - a prototype task specification gets passed all the way into WorkerManager.buildSpecification, at which point it is promptly entirely ignored. In CoG r2024, I added a loop there to copy attributes and now I can submit to TG sites with the project profile entry correctly propagated, using this jobManager="gt2:gt2:pbs" Perhaps it should also copy environment variables - I haven't thought about that much. (I also tried jobManager="gt4:pbs" but I get a null pointer exception there still that I am investigating some more) -- From hategan at mcs.anl.gov Fri May 23 12:47:33 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 23 May 2008 12:47:33 -0500 Subject: [Swift-devel] Re: coaster->tg-ncsa In-Reply-To: References: <1209522838.702.8.camel@localhost> <1209523861.1278.0.camel@localhost> Message-ID: <1211564853.23477.0.camel@localhost> On Fri, 2008-05-23 at 16:50 +0000, Ben Clifford wrote: > On Tue, 29 Apr 2008, Mihael Hategan wrote: > > > Odd. Attributes should be copied from the original task. I'll look into > > that. > > They almost were - a prototype task specification gets passed all the way > into WorkerManager.buildSpecification, at which point it is promptly > entirely ignored. 
Well, no: t.setSpecification(buildSpecification(sid, maxWallTime, prototype)); copyAttributes(t, prototype); > > In CoG r2024, I added a loop there to copy attributes and now I can submit > to TG sites with the project profile entry correctly propagated, using > this jobManager="gt2:gt2:pbs" > > Perhaps it should also copy environment variables - I haven't thought > about that much. > > (I also tried jobManager="gt4:pbs" but I get a null pointer exception > there still that I am investigating some more) > From hategan at mcs.anl.gov Fri May 23 12:49:23 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 23 May 2008 12:49:23 -0500 Subject: [Swift-devel] Re: coaster->tg-ncsa In-Reply-To: <1211564853.23477.0.camel@localhost> References: <1209522838.702.8.camel@localhost> <1209523861.1278.0.camel@localhost> <1211564853.23477.0.camel@localhost> Message-ID: <1211564963.23477.3.camel@localhost> On Fri, 2008-05-23 at 12:47 -0500, Mihael Hategan wrote: > On Fri, 2008-05-23 at 16:50 +0000, Ben Clifford wrote: > > On Tue, 29 Apr 2008, Mihael Hategan wrote: > > > > > Odd. Attributes should be copied from the original task. I'll look into > > > that. > > > > They almost were - a prototype task specification gets passed all the way > > into WorkerManager.buildSpecification, at which point it is promptly > > entirely ignored. > > Well, no: > t.setSpecification(buildSpecification(sid, maxWallTime, prototype)); > copyAttributes(t, prototype); #@$! That copies task attributes not specification attributes. I'll fix that, though it's equivalent to what you did. > > > > > In CoG r2024, I added a loop there to copy attributes and now I can submit > > to TG sites with the project profile entry correctly propagated, using > > this jobManager="gt2:gt2:pbs" > > > > Perhaps it should also copy environment variables - I haven't thought > > about that much. 
> > > > (I also tried jobManager="gt4:pbs" but I get a null pointer exception > > there still that I am investigating some more) > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From benc at hawaga.org.uk Fri May 23 12:50:58 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 23 May 2008 17:50:58 +0000 (GMT) Subject: [Swift-devel] Re: coaster->tg-ncsa In-Reply-To: <1211564853.23477.0.camel@localhost> References: <1209522838.702.8.camel@localhost> <1209523861.1278.0.camel@localhost> <1211564853.23477.0.camel@localhost> Message-ID: On Fri, 23 May 2008, Mihael Hategan wrote: > Well, no: > t.setSpecification(buildSpecification(sid, maxWallTime, prototype)); > copyAttributes(t, prototype); I see that now. It doesn't copy the attributes from the embedded JobSpecification to the new JobSpecification - it copies Task attributes, which appear to be different. -- From hategan at mcs.anl.gov Fri May 23 12:57:28 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 23 May 2008 12:57:28 -0500 Subject: [Swift-devel] Re: coaster->tg-ncsa In-Reply-To: References: <1209522838.702.8.camel@localhost> <1209523861.1278.0.camel@localhost> <1211564853.23477.0.camel@localhost> Message-ID: <1211565448.23477.6.camel@localhost> On Fri, 2008-05-23 at 17:50 +0000, Ben Clifford wrote: > On Fri, 23 May 2008, Mihael Hategan wrote: > > > Well, no: > > t.setSpecification(buildSpecification(sid, maxWallTime, prototype)); > > copyAttributes(t, prototype); > > I see that now. > > It doesn't copy the attributes from the embedded JobSpecification to the > new JobSpecification - it copies Task attributes, which appear to be > different. And mostly useless. There are no particular semantics associated with them besides the "stdout" and "stderr" attributes, which could (and probably should) be class properties.
> From lixi at uchicago.edu Fri May 23 12:59:33 2008 From: lixi at uchicago.edu (lixi at uchicago.edu) Date: Fri, 23 May 2008 12:59:33 -0500 (CDT) Subject: [Swift-devel] sites.xml Message-ID: <20080523125933.BAM76361@m4500-03.uchicago.edu> Hi, In the latest version, I notice that there are different site items in sites.xml: For localhost: /var/tmp 0 For OSG sites: /u/ac/tstef There are some differences between them. When configuring OSG sites, which one is preferred? Or are both acceptable? Thanks, Xi From hategan at mcs.anl.gov Fri May 23 13:04:14 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 23 May 2008 13:04:14 -0500 Subject: [Swift-devel] Re: sites.xml In-Reply-To: <20080523125933.BAM76361@m4500-03.uchicago.edu> References: <20080523125933.BAM76361@m4500-03.uchicago.edu> Message-ID: <1211565854.23477.10.camel@localhost> On Fri, 2008-05-23 at 12:59 -0500, lixi at uchicago.edu wrote: > Hi, > > In the latest version, I notice that there are different > site items in sites.xml: > > For localhost: > > > > /var/tmp > key="jobThrottle">0 > > > For OSG sites: > > > > /u/ac/tstef > > > There are some differences between them. When configuring OSG > sites, which one is preferred? Or are both acceptable? If you're referring to execution vs. jobmanager, then there isn't much difference besides that execution allows specifying job managers in a portable fashion across different providers. For example the standard pre-WS-GRAM way is appending /jobmanager- to the URL, while in WS-GRAM it's... different. I forget exactly how.
> > Thanks, > > Xi From hategan at mcs.anl.gov Fri May 23 13:04:51 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 23 May 2008 13:04:51 -0500 Subject: [Swift-devel] Re: coaster->tg-ncsa In-Reply-To: <1211564963.23477.3.camel@localhost> References: <1209522838.702.8.camel@localhost> <1209523861.1278.0.camel@localhost> <1211564853.23477.0.camel@localhost> <1211564963.23477.3.camel@localhost> Message-ID: <1211565891.24065.0.camel@localhost> On Fri, 2008-05-23 at 12:49 -0500, Mihael Hategan wrote: > On Fri, 2008-05-23 at 12:47 -0500, Mihael Hategan wrote: > > On Fri, 2008-05-23 at 16:50 +0000, Ben Clifford wrote: > > > On Tue, 29 Apr 2008, Mihael Hategan wrote: > > > > > > > Odd. Attributes should be copied from the original task. I'll look into > > > > that. > > > > > > They almost were - a prototype task specification gets passed all the way > > > into WorkerManager.buildSpecification, at which point it is promptly > > > entirely ignored. > > > > Well, no: > > t.setSpecification(buildSpecification(sid, maxWallTime, prototype)); > > copyAttributes(t, prototype); > > #@$! > That copies task attributes not specification attributes. I'll fix that, > though it's equivalent to what you did. r2025. > > > > > > > > > In CoG r2024, I added a loop there to copy attributes and now I can submit > > > to TG sites with the project profile entry correctly propagated, using > > > this jobManager="gt2:gt2:pbs" > > > > > > Perhaps it should also copy environment variables - I haven't thought > > > about that much. 
> > > > > > (I also tried jobManager="gt4:pbs" but I get a null pointer exception > > > there still that I am investigating some more) > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From benc at hawaga.org.uk Fri May 23 17:05:37 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 23 May 2008 22:05:37 +0000 (GMT) Subject: [Swift-devel] concurrent mapper and restart Message-ID: For simple tests, files mapped through the concurrent mapper do get handled apparently correctly by the present filename based restart mechanism. This is contradictory to bug 107 comment 5: > This fixes the latest problem, but will not recognize as done variables > mapped by the concurrent mapper. which I interpret to mean that concurrently mapped values will be recomputed unnecessarily after a restart [thus leading to inefficiency (perhaps to the extent that the workflow can never finish in a real failure-prone environment)] However, I'm more worried that different restarts of a workflow will have files mapped differently, such that sometimes a file will be mapped to a filename that was previously used for a different file in an earlier restart (or initial run). That would lead to a situation where workflows might appear to complete, but would actually be jumbling up intermediate datafiles and delivering incorrect output results, which is extremely bad. I haven't tried this to see if I can make it happen; nor am I sure I can (I think its probably very sensitive to the way in which restarts interact with foreach loops to create threads - if a restart causes threads to be created in a different order in a foreach loop, then I think this problem exists). 
If this really is a problem, there are two approaches to avoiding this more serious problem that spring to mind: i) make concurrent filenames different each restart (with a per-restart rather than per-kml-compilation unique identifier); this would change the problem to the efficiency-reducing problem - unpleasant but not producing incorrect results. ii) the same lexical/runtime scope ID stuff that I talked about yesterday for identifying variables might apply here. Instead of using a karajan thread identifier on the end of a concurrent variable which is potentially random, use a SwiftScript level equivalent - the SwiftScript level scope identifiers that I talked about yesterday that I think are recreatable no matter the order in which Karajan evaluates things. That would give between-run repeatable mappings. Which in turn would mean filename based restarts are perhaps not so bad anymore in general. -- From iraicu at cs.uchicago.edu Sun May 25 08:08:08 2008 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Sun, 25 May 2008 08:08:08 -0500 Subject: [Swift-devel] [Fwd: Falkon v1.0 release] Message-ID: <483964B8.2010402@cs.uchicago.edu> Hi Swift community, Just wanted to pass announcement on Falkon along. Cheers, Ioan -------------- next part -------------- An embedded message was scrubbed... From: Ioan Raicu Subject: Falkon v1.0 release Date: Sun, 25 May 2008 08:06:35 -0500 Size: 8771 URL: From benc at hawaga.org.uk Sun May 25 18:31:03 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Sun, 25 May 2008 23:31:03 +0000 (GMT) Subject: [Swift-devel] coasters and CAs Message-ID: In an attempt to get some automated testing of the coaster code, I made my own CA, generated a passwordless credential for it. I set X509_CERT_DIR to point to a directory with my new CA in it. I set X509_USER_CERT and X509_USER_KEY to point to those, but that credential didn't get picked up. (problem 1) So I did a grid-proxy-init (which doesn't need a password) and set X509_USER_PROXY to that. 
Running coaster to the local site (test/sites/coaster/coaster-local.xml) this runs OK if the CA cert is in the default CA directory (~benc/.globus/certificates in my case). However, it looks like if the CA is not in the default CA directory, it is not picked up by the coaster service from the setting of X509_CERT_DIR. Running tests/misc/coaster.sh should demonstrate that it works with the CA files that are in tests/misc/coaster-security/ are put in the default CA directory, but not otherwise. This might be a problem for sites where CAs are stored in non-default locations - the service side should probably pick up the cert dir from the environment on the service side. -- From bugzilla-daemon at mcs.anl.gov Sun May 25 19:02:34 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Sun, 25 May 2008 19:02:34 -0500 (CDT) Subject: [Swift-devel] [Bug 142] New: concurrent mapper does not work when used inside iterate {} block Message-ID: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=142 Summary: concurrent mapper does not work when used inside iterate {} block Product: Swift Version: unspecified Platform: Macintosh OS/Version: Mac OS Status: NEW Severity: normal Priority: P2 Component: SwiftScript language AssignedTo: benc at hawaga.org.uk ReportedBy: benc at hawaga.org.uk CC: swift-devel at ci.uchicago.edu concurrent mapper does not work when used inside an iterate {} block. This happens: Execution failed: java.lang.Integer Caused by: java.lang.ClassCastException: java.lang.Integer at org.griphyn.vdl.karajan.lib.VDLFunction.getThreadPrefix(VDLFunction.java:117) at org.griphyn.vdl.karajan.lib.New.function(New.java:65) at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:65) getThreadPrefix expects the $ variable in a karajan stack frame, if defined, to be a list where the first element is an iteration number; that is not the case with iterate. 
-- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at mcs.anl.gov Mon May 26 03:54:48 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 26 May 2008 03:54:48 -0500 (CDT) Subject: [Swift-devel] [Bug 143] New: scalability limitation in concurrent mapper used for local variables with GPFS-like site-shared filesystem Message-ID: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=143 Summary: scalability limitation in concurrent mapper used for local variables with GPFS-like site-shared filesystem Product: Swift Version: unspecified Platform: Macintosh OS/Version: Mac OS Status: NEW Severity: normal Priority: P2 Component: General AssignedTo: benc at hawaga.org.uk ReportedBy: benc at hawaga.org.uk CC: swift-devel at ci.uchicago.edu The concurrent mapper has a GPFS-friendly scalability feature for keeping large numbers of files out of the same directory, based on splitting up array indices into a multiple tier directory tree. However, it does not do this for the thread identifier. I think this means that an array of intermediate files declared outside of a large foreach will exhibit this scalability property, whilst an intermediate variable declared inside the loop will have all of its files placed in one directory. This is probably a scalability limitation on GPFS when local variables are used in loops which will manifest as poor performance in such circumstances. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. 
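For the directory fan-out issue described in bug 143 above, one possible shape for tiering on the thread identifier is sketched below. This is only an assumption about how such a scheme could work: TieredPath and tiered are hypothetical names, the two-level hex layout is made up for illustration, and none of it reflects how the concurrent mapper actually splits array indices.

```java
/** Hypothetical helper for illustration only; not the concurrent mapper's code. */
final class TieredPath {
    private TieredPath() {}

    /**
     * Spreads files across a two-level directory tree keyed on a hash of the
     * thread identifier, so that no single directory collects every file.
     * Assumes neither argument contains a '/' character.
     */
    static String tiered(String threadId, String fileName) {
        int h = threadId.hashCode();      // String.hashCode() is stable across runs
        int top = (h >>> 8) & 0xff;       // first tier: 256 possible directories
        int second = h & 0xff;            // second tier: 256 more under each
        return String.format("%02x/%02x/%s-%s", top, second, threadId, fileName);
    }
}
```

Because the path is derived only from the identifier, the same thread identifier always maps to the same directory. Whether that is desirable depends on how stable thread identifiers are between runs, which is exactly the open question in the restart discussion.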
From bugzilla-daemon at mcs.anl.gov Mon May 26 04:11:06 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 26 May 2008 04:11:06 -0500 (CDT) Subject: [Swift-devel] [Bug 83] nested loops hung In-Reply-To: Message-ID: <20080526091106.B74ED164BB@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83 benc at hawaga.org.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Severity|blocker |normal -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From hategan at mcs.anl.gov Mon May 26 12:04:32 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 26 May 2008 12:04:32 -0500 Subject: [Swift-devel] Re: coasters and CAs In-Reply-To: References: Message-ID: <1211821472.3593.9.camel@localhost> On Sun, 2008-05-25 at 23:31 +0000, Ben Clifford wrote: > In an attempt to get some automated testing of the coaster code, I made my > own CA, generated a passwordless credential for it. > > I set X509_CERT_DIR to point to a directory with my new CA in it. > > I set X509_USER_CERT and X509_USER_KEY to point to those, but that > credential didn't get picked up. (problem 1) Those are only used by grid-proxy-*. The "client" tools only use the proxy. > > So I did a grid-proxy-init (which doesn't need a password) and set > X509_USER_PROXY to that. > > Running coaster to the local site (test/sites/coaster/coaster-local.xml) > this runs OK if the CA cert is in the default CA directory > (~benc/.globus/certificates in my case). However, it looks like if the CA > is not in the default CA directory, it is not picked up by the coaster > service from the setting of X509_CERT_DIR. It's normal. Your local X509_CERT_DIR should not apply to the "remote" site. If you want that to be set, stick it as remote env variable in sites.xml or so. 
> > Running tests/misc/coaster.sh should demonstrate that it works with the CA > files that are in tests/misc/coaster-security/ are put in the default CA > directory, but not otherwise. > > This might be a problem for sites where CAs are stored in non-default > locations - the service side should probably pick up the cert dir from the > environment on the service side. > From benc at hawaga.org.uk Tue May 27 05:45:12 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 27 May 2008 10:45:12 +0000 (GMT) Subject: [Swift-devel] Re: [Swift-user] what is wrong with restart5.swift? In-Reply-To: <1211820916.3593.2.camel@localhost> References: <1211734604.10072.0.camel@localhost> <1211735361.10072.2.camel@localhost> <1211735928.10072.5.camel@localhost> <1211736985.10072.7.camel@localhost> <1211737553.10072.10.camel@localhost> <1211820916.3593.2.camel@localhost> Message-ID: (moved to swift-devel) On Mon, 26 May 2008, Mihael Hategan wrote: > > I also think that is(was?) the only correctness problem with restarts at > > the moment. > > There was also the ability to persuade a mapper (in particular temp > mappers) to use specific files instead of something of their choice. If by temporary mappers you mean the concurrent mapper, then there is not a correctness problem there at the moment, I think. There is a problem (I think) with input mappers, where if the input file set is changed between restarts then mappings will be different. An example of this (which I have seen a real user do) is when a workflow outputs files that match an input file specification - subsequent runs then had more input files, and in some cases those would be mapped differently. That is a fairly broad problem, though. Forcing mapping of some subset of a mappers files would not address this; forcing all mapping to be enumerated and stored ready for restarts would. But that changes how mappers fit in (for example, not letting them invent mappings dynamically). 
-- From benc at hawaga.org.uk Tue May 27 06:12:59 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 27 May 2008 11:12:59 +0000 (GMT) Subject: [Swift-devel] Re: coasters and CAs In-Reply-To: <1211821472.3593.9.camel@localhost> References: <1211821472.3593.9.camel@localhost> Message-ID: On Mon, 26 May 2008, Mihael Hategan wrote: > > Running coaster to the local site (test/sites/coaster/coaster-local.xml) > > this runs OK if the CA cert is in the default CA directory > > (~benc/.globus/certificates in my case). However, it looks like if the CA > > is not in the default CA directory, it is not picked up by the coaster > > service from the setting of X509_CERT_DIR. > > It's normal. Your local X509_CERT_DIR should not apply to the "remote" > site. If you want that to be set, stick it as remote env variable in > sites.xml or so. In general it shouldn't. In the specific case of provider-local, though, the "remote site" environment is configured by the remote site sysadmin by setting variables in the submit side environment (rather than for example in the case of GRAM2 setting them in /etc/xinetd.d/). That is the use of X509_CERT_DIR that I am making here. And indeed X509_CERT_DIR *is* passed through when using provider-local. Setting it in the sites.xml file also causes it to be set when using provider-local (though in that case it appears to obliterate the containing environment entirely too). But the same missing CA problem happens anyway. 
Pretty much I think this should behave the same as GLOBUS_TCP_PORT_RANGE which was discussed here: http://mail.ci.uchicago.edu/pipermail/swift-devel/2008-April/002982.html -- From hategan at mcs.anl.gov Tue May 27 11:50:29 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 27 May 2008 11:50:29 -0500 Subject: [Swift-devel] Re: coasters and CAs In-Reply-To: References: <1211821472.3593.9.camel@localhost> Message-ID: <1211907029.4946.0.camel@localhost> On Tue, 2008-05-27 at 11:12 +0000, Ben Clifford wrote: > On Mon, 26 May 2008, Mihael Hategan wrote: > > > > Running coaster to the local site (test/sites/coaster/coaster-local.xml) > > > this runs OK if the CA cert is in the default CA directory > > > (~benc/.globus/certificates in my case). However, it looks like if the CA > > > is not in the default CA directory, it is not picked up by the coaster > > > service from the setting of X509_CERT_DIR. > > > > It's normal. Your local X509_CERT_DIR should not apply to the "remote" > > site. If you want that to be set, stick it as remote env variable in > > sites.xml or so. > > In general it shouldn't. > > In the specific case of provider-local, though, the "remote site" > environment is configured by the remote site sysadmin by setting variables > in the submit side environment (rather than for example in the case of > GRAM2 setting them in /etc/xinetd.d/). That is the use of X509_CERT_DIR > that I am making here. Ah, fair point. > > And indeed X509_CERT_DIR *is* passed through when using provider-local. > > Setting it in the sites.xml file also causes it to be set when using > provider-local (though in that case it appears to obliterate the > containing environment entirely too). But the same missing CA problem > happens anyway. 
> > Pretty much I think this should behave the same as GLOBUS_TCP_PORT_RANGE > which was discussed here: > > http://mail.ci.uchicago.edu/pipermail/swift-devel/2008-April/002982.html > From hategan at mcs.anl.gov Tue May 27 14:09:42 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 27 May 2008 14:09:42 -0500 Subject: [Swift-devel] Re: coasters and CAs In-Reply-To: <1211907029.4946.0.camel@localhost> References: <1211821472.3593.9.camel@localhost> <1211907029.4946.0.camel@localhost> Message-ID: <1211915382.25463.0.camel@localhost> On Tue, 2008-05-27 at 11:50 -0500, Mihael Hategan wrote: > > > > In the specific case of provider-local, though, the "remote site" > > environment is configured by the remote site sysadmin by setting variables > > in the submit side environment (rather than for example in the case of > > GRAM2 setting them in /etc/xinetd.d/). That is the use of X509_CERT_DIR > > that I am making here. > > Ah, fair point. Should be fixed in cog r2030. From benc at hawaga.org.uk Tue May 27 14:25:21 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 27 May 2008 19:25:21 +0000 (GMT) Subject: [Swift-devel] Re: coasters and CAs In-Reply-To: <1211915382.25463.0.camel@localhost> References: <1211821472.3593.9.camel@localhost> <1211907029.4946.0.camel@localhost> <1211915382.25463.0.camel@localhost> Message-ID: On Tue, 27 May 2008, Mihael Hategan wrote: > Should be fixed in cog r2030. I just committed some changes I had in my repo; as of Swift r2031, tests/misc/coaster.sh successfully runs a local coaster test (assuming that Swift has been built -Dwith-provider-coaster=true). 
That should now get tested daily by the NMI build and test system (the builds labelled Swift + coasters on http://nmi-s005.cs.wisc.edu/nmi/index.php?page=results%2Foverview&rows=20&opt_keyword=&opt_project=swift&opt_user=OPTION_SHOW_ALL&opt_comp=OPTION_SHOW_ALL&opt_type=OPTION_SHOW_ALL&opt_result=OPTION_SHOW_ALL&opt_platform=OPTION_SHOW_ALL&opt_month=0&opt_day=0&opt_year=0&opt_build_id=&opt_submit=OPTION_SHOW_ALL&searchSubmit=Search&page=results%2Foverview&rows=20) Also, Swift just overtook CoG in SVN revision number space. -- From hategan at mcs.anl.gov Tue May 27 14:30:43 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 27 May 2008 14:30:43 -0500 Subject: [Swift-devel] Re: coasters and CAs In-Reply-To: References: <1211821472.3593.9.camel@localhost> <1211907029.4946.0.camel@localhost> <1211915382.25463.0.camel@localhost> Message-ID: <1211916643.26140.0.camel@localhost> On Tue, 2008-05-27 at 19:25 +0000, Ben Clifford wrote: > On Tue, 27 May 2008, Mihael Hategan wrote: > > > Should be fixed in cog r2030. > > I just committed some changes I had in my repo; as of Swift r2031, > tests/misc/coaster.sh successfully runs a local coaster test (assuming > that Swift has been built -Dwith-provider-coaster=true). > > That should now get tested daily by the NMI build and test system (the > builds labelled Swift + coasters on > http://nmi-s005.cs.wisc.edu/nmi/index.php?page=results%2Foverview&rows=20&opt_keyword=&opt_project=swift&opt_user=OPTION_SHOW_ALL&opt_comp=OPTION_SHOW_ALL&opt_type=OPTION_SHOW_ALL&opt_result=OPTION_SHOW_ALL&opt_platform=OPTION_SHOW_ALL&opt_month=0&opt_day=0&opt_year=0&opt_build_id=&opt_submit=OPTION_SHOW_ALL&searchSubmit=Search&page=results%2Foverview&rows=20) > > Also, Swift just overtook CoG in SVN revision number space. A shallow victory I say. Cog has lived in two CVS repositories before, to both of which numerous commits have been made. 
:) > From benc at hawaga.org.uk Tue May 27 15:47:00 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 27 May 2008 20:47:00 +0000 (GMT) Subject: [Swift-devel] Re: [Swift-user] what is wrong with restart5.swift? In-Reply-To: References: <1211734604.10072.0.camel@localhost> <1211735361.10072.2.camel@localhost> <1211735928.10072.5.camel@localhost> <1211736985.10072.7.camel@localhost> <1211737553.10072.10.camel@localhost> <1211820916.3593.2.camel@localhost> Message-ID: The restart stuff manages to continue to draw my attention. If you assume that VDLFunction.getThreadPrefix returns a string that has the desirable scope identification properties that I talked about if run in the same place, then: Do this in New to give each dataset its scoped identifier mapping.put("swift#restartid", getThreadPrefix(stack) + ":" + dbgname) and replace the (pre-r1985) use of dbgname with swift#restartid. That gives scoped variable-based restarts (rather than file based) that appear to work. -- From hategan at mcs.anl.gov Tue May 27 15:59:27 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 27 May 2008 15:59:27 -0500 Subject: [Swift-devel] Re: [Swift-user] what is wrong with restart5.swift? In-Reply-To: References: <1211734604.10072.0.camel@localhost> <1211735361.10072.2.camel@localhost> <1211735928.10072.5.camel@localhost> <1211736985.10072.7.camel@localhost> <1211737553.10072.10.camel@localhost> <1211820916.3593.2.camel@localhost> Message-ID: <1211921967.27675.2.camel@localhost> On Tue, 2008-05-27 at 20:47 +0000, Ben Clifford wrote: > The restart stuff manages to continue to draw my attention. 
> > If you assume that VDLFunction.getThreadPrefix returns a string that has > the desirable scope identification properties that I talked about if run > in the same place, then: > > Do this in New to give each dataset its scoped identifier > > mapping.put("swift#restartid", getThreadPrefix(stack) + ":" + dbgname) > > and replace the (pre-r1985) use of dbgname with swift#restartid. Right. We should deprecate "dbgname" and should have an actual name there. And it should not be used for anything but the name. > > That gives scoped variable-based restarts (rather than file based) that > appear to work. > The concurrent mapper still uses the run id as part of the path in which it puts files. That has yet to be addressed. From benc at hawaga.org.uk Tue May 27 16:07:56 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 27 May 2008 21:07:56 +0000 (GMT) Subject: [Swift-devel] Re: [Swift-user] what is wrong with restart5.swift? In-Reply-To: <1211921967.27675.2.camel@localhost> References: <1211734604.10072.0.camel@localhost> <1211735361.10072.2.camel@localhost> <1211735928.10072.5.camel@localhost> <1211736985.10072.7.camel@localhost> <1211737553.10072.10.camel@localhost> <1211820916.3593.2.camel@localhost> <1211921967.27675.2.camel@localhost> Message-ID: On Tue, 27 May 2008, Mihael Hategan wrote: > The concurrent mapper still uses the run id as part of the path in which > it puts files. That has yet to be addressed. It uses an ID that is put in at compile time. 
Restarting works with the concurrent mapper unless you've touched the
source file (in which case, it seems reasonable (to me) that restarts are
ignored, and probably should be more enforced, irrespective of whether the
concurrent mapper is used or not)

--

From benc at hawaga.org.uk Tue May 27 17:25:35 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 27 May 2008 22:25:35 +0000 (GMT)
Subject: [Swift-devel] UnresolvedType
Message-ID:

There is a class, UnresolvedType, that appears to be used only as a parent
class to TypeImpl (which all serves to implement the Type interface).

I'd guess that comes from some earlier interest in having the type system
more abstracted than it actually is; but that is from before I worked on
this code.

I'd be interested to know what the intention was.

--

From hategan at mcs.anl.gov Tue May 27 17:38:43 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 27 May 2008 17:38:43 -0500
Subject: [Swift-devel] UnresolvedType
In-Reply-To:
References:
Message-ID: <1211927923.28975.13.camel@localhost>

On Tue, 2008-05-27 at 22:25 +0000, Ben Clifford wrote:
> There is a class, UnresolvedType, that appears to be used only as a parent
> class to TypeImpl (which all serves to implement the Type interface).
>
> I'd guess that comes from some earlier interest in having the type system
> more abstracted than it actually is; but that is from before I worked on
> this code.
>
> I'd be interested to know what the intention was.

I think it's used as a placeholder before full type resolution is made.
The point was that types may not be built in the exact order of the
hierarchy, so an extra step of resolving unresolved types needs to be
performed after all types are built. I.e. a two-pass process.
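
The two-pass idea described above can be illustrated with a minimal Python
sketch. This is not taken from the Swift source: the class bodies, the
resolve_all helper and the "Volume"/"Header" type names are hypothetical,
and the real UnresolvedType is a parent class of TypeImpl rather than a
separate placeholder class as assumed here.

```python
from dataclasses import dataclass, field


@dataclass
class UnresolvedType:
    """Hypothetical placeholder for a type referenced before it is defined."""
    name: str


@dataclass
class TypeImpl:
    """Hypothetical declared type; fields may initially hold placeholders."""
    name: str
    fields: dict = field(default_factory=dict)  # field name -> type or placeholder


def resolve_all(types):
    """Second pass: replace every placeholder with the real declared type."""
    for t in types.values():
        for fname, ftype in t.fields.items():
            if isinstance(ftype, UnresolvedType):
                if ftype.name not in types:
                    raise KeyError(f"unknown type: {ftype.name}")
                # Overwriting an existing key is safe while iterating.
                t.fields[fname] = types[ftype.name]
    return types


# First pass: declare types in source order, even though "Header" is
# referenced by "Volume" before it has been declared.
types = {
    "Volume": TypeImpl("Volume", {"header": UnresolvedType("Header")}),
    "Header": TypeImpl("Header"),
}

# Second pass: every placeholder now points at the declared type.
resolve_all(types)
```

Keeping the placeholder separate from the resolved type means a reference
to a type that is never declared surfaces as an error in the second pass
instead of at first use.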
>

From benc at hawaga.org.uk Tue May 27 17:46:24 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 27 May 2008 22:46:24 +0000 (GMT)
Subject: [Swift-devel] external datasets
Message-ID:

Someone (that I don't remember - probably mike (wilde) or mihael)
requested (in person) a way of expressing dependencies between procedures
that doesn't involve making a dummy data file.

I have a prototype implementation that lets you do this. An example is
below.

There is a new type called "external". Variables of "external" type are
not mapped to any file, but they participate in dependency ordering with
an external value acting as if it has been assigned when it is returned
from a procedure.

Running the below code will cause echo to run, and then ls to run only
when echo has finished.

The name "external" comes from a conceptualisation of this as representing
some data that is stored externally to Swift - instead of being mapped
with a mapping expression, with Swift handling stage-in and stage-out etc,
the data is external and it's up to your apps to deal with access to
whatever the data is.

Be cautious using this in the presence of retries, replication, and
restarts (the 3 Rs?) - Swift won't provide any of the handling for
external data that it provides for output files.

(external db) first() {
  app { echo; }
}

second(external db) {
  app { ls; }
}

external o;
second(o);
o=first();

--

From hategan at mcs.anl.gov Tue May 27 17:49:34 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 27 May 2008 17:49:34 -0500
Subject: [Swift-devel] external datasets
In-Reply-To:
References:
Message-ID: <1211928574.28975.16.camel@localhost>

On Tue, 2008-05-27 at 22:46 +0000, Ben Clifford wrote:
> Be cautious using this in the presence of retries, replication, and
> restarts (the 3 Rs?) - Swift won't provide any of the handling for
> external data that it provides for output files.

It should theoretically be able to handle restarts properly, no?
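
The ordering that the "external" proposal in this thread describes (ls
runs only once echo has finished, even though second(o) is written before
o=first()) can be mimicked outside Swift. The following is a minimal
Python sketch and not Swift's implementation: it assumes a threading.Event
as a stand-in for the external variable, and the log list and function
bodies are hypothetical substitutes for the echo and ls apps.

```python
import threading

log = []


def first(db):
    log.append("echo")   # stands in for running the echo app
    db.set()             # returning from the procedure "assigns" the external


def second(db):
    db.wait()            # block until the external has been assigned
    log.append("ls")     # stands in for running the ls app


o = threading.Event()    # external o;  (nothing is mapped to a file)

# The consumer is started first, mirroring second(o) appearing before
# o=first() in the SwiftScript example.
consumer = threading.Thread(target=second, args=(o,))
producer = threading.Thread(target=first, args=(o,))
consumer.start()
producer.start()
consumer.join()
producer.join()
```

As in the proposal, nothing here is staged or made atomic: the event only
records that the producer returned, which is why retries, replication and
restarts need separate care.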
From benc at hawaga.org.uk Tue May 27 18:08:27 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 27 May 2008 23:08:27 +0000 (GMT) Subject: [Swift-devel] external datasets In-Reply-To: <1211928574.28975.16.camel@localhost> References: <1211928574.28975.16.camel@localhost> Message-ID: On Tue, 27 May 2008, Mihael Hategan wrote: > It should theoretically be able to handle restarts properly, no? With a variable-based restart mechanism, I think it will remember that the variable has been closed. With filename-based restarts, I think no. Perhaps restarting might bring up the same kind of problems as replication and retries - if some data processing got half-way through and then abandoned/broken/whatever, the various things Swift does to make application execution appear atomic wrt application output files won't happen here. -- From hategan at mcs.anl.gov Tue May 27 18:10:54 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 27 May 2008 18:10:54 -0500 Subject: [Swift-devel] external datasets In-Reply-To: References: <1211928574.28975.16.camel@localhost> Message-ID: <1211929854.30255.1.camel@localhost> On Tue, 2008-05-27 at 23:08 +0000, Ben Clifford wrote: > On Tue, 27 May 2008, Mihael Hategan wrote: > > > It should theoretically be able to handle restarts properly, no? > > With a variable-based restart mechanism, I think it will remember that the > variable has been closed. With filename-based restarts, I think no. Right. > > Perhaps restarting might bring up the same kind of problems as replication > and retries - if some data processing got half-way through and then > abandoned/broken/whatever, the various things Swift does to make > application execution appear atomic wrt application output files won't > happen here. > Also right. All we can do there is allow user-defined "compensation handlers". 
From lixi at uchicago.edu Wed May 28 12:18:27 2008
From: lixi at uchicago.edu (lixi at uchicago.edu)
Date: Wed, 28 May 2008 12:18:27 -0500 (CDT)
Subject: [Swift-devel] Swift finished with errors
Message-ID: <20080528121827.BAQ66049@m4500-03.uchicago.edu>

Hi,

I just ran a simple workflow on multiple OSG sites, but it failed several
times, and each time the command line output showed similar errors:

...
node failed
Execution failed:
Failed to link input file _concurrent/intermediatefile-9c469a2f-4d9f-47a0-a660-eb1634b97559-

According to the log file, it seemed that this failed job was submitted to
site "UCSDT2". However, both data transfer and globus job execution could
be done successfully on this site. It seems very strange to me.

The log file is on CI host:
/home/lixi/newswift/latest/score/100/workflowtest-20080528-1136-qovzbq70.log

Could you please help me to find out the reason and a solution?

Thanks a lot!

Xi

From benc at hawaga.org.uk Wed May 28 17:54:11 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 28 May 2008 22:54:11 +0000 (GMT)
Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors
In-Reply-To: <20080528121827.BAQ66049@m4500-03.uchicago.edu>
References: <20080528121827.BAQ66049@m4500-03.uchicago.edu>
Message-ID:

can you run examples/vdsk/first.swift on that site UCSDT2 (using the same
sites.xml entry as you used for this workflow, without the other 7 sites
in it) ?

--

From lixi at uchicago.edu Wed May 28 18:31:39 2008
From: lixi at uchicago.edu (lixi at uchicago.edu)
Date: Wed, 28 May 2008 18:31:39 -0500 (CDT)
Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors
Message-ID: <20080528183139.BAR28364@m4500-03.uchicago.edu>

Yes, it can do that. But I think that first.swift doesn't produce any
intermediate file.
Thanks, Xi ---- Original message ---- >Date: Wed, 28 May 2008 22:54:11 +0000 (GMT) >From: Ben Clifford >Subject: Re: [Swift-user] Swift finished with errors >To: lixi at uchicago.edu >Cc: swift-user at ci.uchicago.edu, swift-devel at ci.uchicago.edu > >can you run examples/vdsk/first.swift on that site UCSDT2 (using the same >sites.xml entry as you used for this workflow, without the other 7 sites >in it) ? > >-- > > From benc at hawaga.org.uk Wed May 28 19:35:11 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 29 May 2008 00:35:11 +0000 (GMT) Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors In-Reply-To: <20080528183139.BAR28364@m4500-03.uchicago.edu> References: <20080528183139.BAR28364@m4500-03.uchicago.edu> Message-ID: On Wed, 28 May 2008, lixi at uchicago.edu wrote: > Yes, it can do that. But I think that first.swift doesn't > produce any intermediate file. There's a test, tests/language-behaviour/062-two-in-a-row.swift that has an intermediate file, if you specifically want to test that (as of swift svn r2000) -- From benc at hawaga.org.uk Wed May 28 19:53:02 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 29 May 2008 00:53:02 +0000 (GMT) Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors In-Reply-To: References: <20080528121827.BAQ66049@m4500-03.uchicago.edu> Message-ID: are you running this with replication enabled? if so, that's broken. don't use it. -- From benc at hawaga.org.uk Wed May 28 20:36:31 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 29 May 2008 01:36:31 +0000 (GMT) Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors In-Reply-To: References: <20080528121827.BAQ66049@m4500-03.uchicago.edu> Message-ID: I get a similar error running the site tests (tests/sites/run-site) using your site definition. 
-- From hategan at mcs.anl.gov Wed May 28 20:45:35 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 28 May 2008 20:45:35 -0500 Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors In-Reply-To: References: <20080528121827.BAQ66049@m4500-03.uchicago.edu> Message-ID: <1212025535.11944.0.camel@localhost> On Thu, 2008-05-29 at 00:53 +0000, Ben Clifford wrote: > are you running this with replication enabled? if so, that's broken. don't > use it. How so? From benc at hawaga.org.uk Wed May 28 20:54:16 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 29 May 2008 01:54:16 +0000 (GMT) Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors In-Reply-To: <1212025535.11944.0.camel@localhost> References: <20080528121827.BAQ66049@m4500-03.uchicago.edu> <1212025535.11944.0.camel@localhost> Message-ID: On Wed, 28 May 2008, Mihael Hategan wrote: > On Thu, 2008-05-29 at 00:53 +0000, Ben Clifford wrote: > > are you running this with replication enabled? if so, that's broken. don't > > use it. > > How so? this: http://mail.ci.uchicago.edu/pipermail/swift-devel/2008-May/003140.html is an artificial reconstruction of a problem that Xi seemed to encounter in real life, where jobs seem to get overreplicated. -- From hategan at mcs.anl.gov Wed May 28 21:14:56 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 28 May 2008 21:14:56 -0500 Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors In-Reply-To: References: <20080528121827.BAQ66049@m4500-03.uchicago.edu> <1212025535.11944.0.camel@localhost> Message-ID: <1212027296.12318.2.camel@localhost> On Thu, 2008-05-29 at 01:54 +0000, Ben Clifford wrote: > On Wed, 28 May 2008, Mihael Hategan wrote: > > > On Thu, 2008-05-29 at 00:53 +0000, Ben Clifford wrote: > > > are you running this with replication enabled? if so, that's broken. don't > > > use it. > > > > How so? 
> > this: > > http://mail.ci.uchicago.edu/pipermail/swift-devel/2008-May/003140.html I had a feeling that r2004 should have solved that, but I see that it might not. > > is an artificial reconstruction of a problem that Xi seemed to encounter > in real life, where jobs seem to get overreplicated. > From benc at hawaga.org.uk Wed May 28 21:23:37 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 29 May 2008 02:23:37 +0000 (GMT) Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors In-Reply-To: <1212027296.12318.2.camel@localhost> References: <20080528121827.BAQ66049@m4500-03.uchicago.edu> <1212025535.11944.0.camel@localhost> <1212027296.12318.2.camel@localhost> Message-ID: On Wed, 28 May 2008, Mihael Hategan wrote: > I had a feeling that r2004 should have solved that, but I see that it > might not. It doesn't work for me in a run I just tried. -- From hategan at mcs.anl.gov Wed May 28 22:06:34 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 28 May 2008 22:06:34 -0500 Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors In-Reply-To: References: <20080528121827.BAQ66049@m4500-03.uchicago.edu> <1212025535.11944.0.camel@localhost> <1212027296.12318.2.camel@localhost> Message-ID: <1212030394.13449.0.camel@localhost> On Thu, 2008-05-29 at 02:23 +0000, Ben Clifford wrote: > On Wed, 28 May 2008, Mihael Hategan wrote: > > > I had a feeling that r2004 should have solved that, but I see that it > > might not. > > It doesn't work for me in a run I just tried. Ok. Good to know. 
>

From hategan at mcs.anl.gov Thu May 29 10:37:20 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 29 May 2008 10:37:20 -0500
Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors
In-Reply-To: <1212030394.13449.0.camel@localhost>
References: <20080528121827.BAQ66049@m4500-03.uchicago.edu> <1212025535.11944.0.camel@localhost> <1212027296.12318.2.camel@localhost> <1212030394.13449.0.camel@localhost>
Message-ID: <1212075440.17456.1.camel@localhost>

On Wed, 2008-05-28 at 22:06 -0500, Mihael Hategan wrote:
> On Thu, 2008-05-29 at 02:23 +0000, Ben Clifford wrote:
> > On Wed, 28 May 2008, Mihael Hategan wrote:
> >
> > > I had a feeling that r2004 should have solved that, but I see that it
> > > might not.
> >
> > It doesn't work for me in a run I just tried.
>
> Ok. Good to know.

... though I can't seem to be able to reproduce it:

mike at blabla language-behaviour$ swift 062-two-in-a-row.swift
Swift svn swift-r2014 (Swift modified locally) cog-r656 (CoG modified locally)
RunID: 20080529-1012-7s20nbka
Progress:
echo started
Submitting wonky job 0
Wonky job in queue, job number 0
not sleeping - this is the first job, 0
Wonky job running now
Wonky job completed with exitCode 0
Wonky job status COMPLETED 0
echo completed
echo started
Submitting wonky job 1
Wonky job in queue, job number 1
Progress: Selecting site:1 Finished successfully:1
Submitting wonky job 2
Wonky job in queue, job number 2
Progress: Selecting site:1 Finished successfully:1
Submitting wonky job 3
Wonky job in queue, job number 3
Progress: Executing:1 Finished successfully:1
Progress: Executing:1 Finished successfully:1
Wonky job running now
Cancel called on wonky job 2
Cancel called on wonky job 3
Wonky job completed with exitCode 0
Wonky job status COMPLETED 1
echo completed
Final status: Finished successfully:2
Submitting wonky job 4
Wonky job in queue, job number 4
mike at blabla language-behaviour$

I'll try a clean checkout.
> > > >
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

From benc at hawaga.org.uk Thu May 29 11:20:03 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 29 May 2008 16:20:03 +0000 (GMT)
Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors
In-Reply-To: <1212075440.17456.1.camel@localhost>
References: <20080528121827.BAQ66049@m4500-03.uchicago.edu> <1212025535.11944.0.camel@localhost> <1212027296.12318.2.camel@localhost> <1212030394.13449.0.camel@localhost> <1212075440.17456.1.camel@localhost>
Message-ID:

On Thu, 29 May 2008, Mihael Hategan wrote:

> ... though I can't seem to be able to reproduce it:

I just did a clean checkout on communicado.ci.uchicago.edu, which is a
very different environment from my os x laptop where I had the problem
before, and I see the below.

The build is in ~benc/t1/cog/modules/vdsk/dist/vdsk-svn/

$ swift 062-two-in-a-row.swift
Swift svn swift-r2036 (Swift modified locally) cog-r2030
RunID: 20080529-1111-42erjh3d
Progress:
echo started
Submitting wonky job 0
Wonky job in queue, job number 0
not sleeping - this is the first job, 0
Wonky job running now
Wonky job completed with exitCode 0
Wonky job status COMPLETED 0
echo completed
echo started
Submitting wonky job 1
Wonky job in queue, job number 1
Progress: Selecting site:1 Finished successfully:1
Submitting wonky job 2
Wonky job in queue, job number 2
Progress: Selecting site:1 Finished successfully:1
Progress: Selecting site:1 Finished successfully:1
Progress: Selecting site:1 Finished successfully:1
Wonky job running now
Cancel called on wonky job 2
Wonky job completed with exitCode 0
Wonky job status COMPLETED 1
Submitting wonky job 3
Wonky job in queue, job number 3
Wonky job running now
Progress: Executing:1 Finished successfully:1
Progress: Executing:1 Finished successfully:1
Wonky job running now
Wonky job completed with exitCode 0
Wonky job status COMPLETED 3
Submitting wonky job 4
Wonky job in queue, job number 4
Progress: Selecting site:1 Finished successfully:1
Submitting wonky job 5
Wonky job in queue, job number 5
Progress: Selecting site:1 Finished successfully:1
Progress: Selecting site:1 Finished successfully:1
Progress: Selecting site:1 Finished successfully:1
Wonky job running now
Cancel called on wonky job 5
Wonky job completed with exitCode 0
Wonky job status COMPLETED 4
Submitting wonky job 6
Wonky job in queue, job number 6
Submitting wonky job 7
Wonky job in queue, job number 7
Wonky job running now
Progress: Selecting site:1 Finished successfully:1
Progress: Selecting site:1 Finished successfully:1
Progress: Selecting site:1 Finished successfully:1
Wonky job running now
Wonky job running now
Wonky job completed with exitCode 0
Wonky job status COMPLETED 6
Wonky job completed with exitCode 0
Wonky job status COMPLETED 7
Submitting wonky job 8
Wonky job in queue, job number 8
echo failed
Execution failed:
Multiple mappings pointing to the same file (localhost:062-two-in-a-row-20080529-1111-42erjh3d/shared/062-two-in-a-row.b.out) detected.

--

From hategan at mcs.anl.gov Thu May 29 11:20:58 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 29 May 2008 11:20:58 -0500
Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors
In-Reply-To: <1212075440.17456.1.camel@localhost>
References: <20080528121827.BAQ66049@m4500-03.uchicago.edu> <1212025535.11944.0.camel@localhost> <1212027296.12318.2.camel@localhost> <1212030394.13449.0.camel@localhost> <1212075440.17456.1.camel@localhost>
Message-ID: <1212078058.18636.0.camel@localhost>

On Thu, 2008-05-29 at 10:37 -0500, Mihael Hategan wrote:
>
> I'll try a clean checkout.

Indeed. Breaks with a clean checkout.
From hategan at mcs.anl.gov Thu May 29 11:28:00 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 29 May 2008 11:28:00 -0500 Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors In-Reply-To: <1212078058.18636.0.camel@localhost> References: <20080528121827.BAQ66049@m4500-03.uchicago.edu> <1212025535.11944.0.camel@localhost> <1212027296.12318.2.camel@localhost> <1212030394.13449.0.camel@localhost> <1212075440.17456.1.camel@localhost> <1212078058.18636.0.camel@localhost> Message-ID: <1212078480.18957.3.camel@localhost> On Thu, 2008-05-29 at 11:20 -0500, Mihael Hategan wrote: > On Thu, 2008-05-29 at 10:37 -0500, Mihael Hategan wrote: > > > > I'll try a clean checkout. > > Indeed. Breaks with a clean checkout. However, doing the following on sites.xml 6d5 < 0 seems to magically make it work.
From hategan at mcs.anl.gov Thu May 29 12:15:13 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 29 May 2008 12:15:13 -0500 Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors In-Reply-To: <1212078480.18957.3.camel@localhost> References: <20080528121827.BAQ66049@m4500-03.uchicago.edu> <1212025535.11944.0.camel@localhost> <1212027296.12318.2.camel@localhost> <1212030394.13449.0.camel@localhost> <1212075440.17456.1.camel@localhost> <1212078058.18636.0.camel@localhost> <1212078480.18957.3.camel@localhost> Message-ID: <1212081313.19911.0.camel@localhost> On Thu, 2008-05-29 at 11:28 -0500, Mihael Hategan wrote: > On Thu, 2008-05-29 at 11:20 -0500, Mihael Hategan wrote: > > On Thu, 2008-05-29 at 10:37 -0500, Mihael Hategan wrote: > > > > > > I'll try a clean checkout. > > > > Indeed. Breaks with a clean checkout. > > However, doing the following on sites.xml > > 6d5 > < 0 Though that also seems to beg the question: why is that a default? > > seems to magically make it work.
From benc at hawaga.org.uk Thu May 29 12:24:59 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 29 May 2008 17:24:59 +0000 (GMT) Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors In-Reply-To: <1212081313.19911.0.camel@localhost> References: <20080528121827.BAQ66049@m4500-03.uchicago.edu> <1212025535.11944.0.camel@localhost> <1212027296.12318.2.camel@localhost> <1212030394.13449.0.camel@localhost> <1212075440.17456.1.camel@localhost> <1212078058.18636.0.camel@localhost> <1212078480.18957.3.camel@localhost> <1212081313.19911.0.camel@localhost> Message-ID: On Thu, 29 May 2008, Mihael Hategan wrote: > < 0 > Though that also seems to beg the question: why is that a default? That line is a default on the local site because local sites tend to run with provider local onto ~2 CPU cores with no LRM management of load. Other site definitions (in the default sites.xml file and in for example tests/sites/) don't use that setting, so that they pick up the usually-appropriate-for-GRAM2 setting of 0.2 from swift.properties.
-- From hategan at mcs.anl.gov Thu May 29 12:30:05 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 29 May 2008 12:30:05 -0500 Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors In-Reply-To: References: <20080528121827.BAQ66049@m4500-03.uchicago.edu> <1212025535.11944.0.camel@localhost> <1212027296.12318.2.camel@localhost> <1212030394.13449.0.camel@localhost> <1212075440.17456.1.camel@localhost> <1212078058.18636.0.camel@localhost> <1212078480.18957.3.camel@localhost> <1212081313.19911.0.camel@localhost> Message-ID: <1212082205.20283.1.camel@localhost> On Thu, 2008-05-29 at 17:24 +0000, Ben Clifford wrote: > On Thu, 29 May 2008, Mihael Hategan wrote: > > > < 0 > > > Though that also seems to beg the question: why is that a default? > > That line is a default on the local site because local sites tend to run > with provider local onto ~2 CPU cores with no LRM management of load. > > Other site definitions (in the default sites.xml file and in for example > tests/sites/) don't use that setting, so that they pick up the > usually-appropriate-for-GRAM2 setting of 0.2 from swift.properties. > That makes sense. Though not for wonky. From benc at hawaga.org.uk Thu May 29 12:34:09 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 29 May 2008 17:34:09 +0000 (GMT) Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors In-Reply-To: <1212082205.20283.1.camel@localhost> References: <20080528121827.BAQ66049@m4500-03.uchicago.edu> <1212025535.11944.0.camel@localhost> <1212027296.12318.2.camel@localhost> <1212030394.13449.0.camel@localhost> <1212075440.17456.1.camel@localhost> <1212078058.18636.0.camel@localhost> <1212078480.18957.3.camel@localhost> <1212081313.19911.0.camel@localhost> <1212082205.20283.1.camel@localhost> Message-ID: On Thu, 29 May 2008, Mihael Hategan wrote: > That makes sense. Though not for wonky. well, that comes from my minimalist modifications to the local site. 
This should still work with jobThrottle=0, though. -- From hategan at mcs.anl.gov Thu May 29 13:50:31 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 29 May 2008 13:50:31 -0500 Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors In-Reply-To: References: <20080528121827.BAQ66049@m4500-03.uchicago.edu> <1212025535.11944.0.camel@localhost> <1212027296.12318.2.camel@localhost> <1212030394.13449.0.camel@localhost> <1212075440.17456.1.camel@localhost> <1212078058.18636.0.camel@localhost> <1212078480.18957.3.camel@localhost> <1212081313.19911.0.camel@localhost> <1212082205.20283.1.camel@localhost> Message-ID: <1212087031.21716.0.camel@localhost> On Thu, 2008-05-29 at 17:34 +0000, Ben Clifford wrote: > On Thu, 29 May 2008, Mihael Hategan wrote: > > > That makes sense. Though not for wonky. > > well, that comes from my minimalist modifications to the local site. > > This should still work with jobThrottle=0, though. Of course it should. I wasn't arguing. > From hategan at mcs.anl.gov Thu May 29 18:36:03 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 29 May 2008 18:36:03 -0500 Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors In-Reply-To: <1212082205.20283.1.camel@localhost> References: <20080528121827.BAQ66049@m4500-03.uchicago.edu> <1212025535.11944.0.camel@localhost> <1212027296.12318.2.camel@localhost> <1212030394.13449.0.camel@localhost> <1212075440.17456.1.camel@localhost> <1212078058.18636.0.camel@localhost> <1212078480.18957.3.camel@localhost> <1212081313.19911.0.camel@localhost> <1212082205.20283.1.camel@localhost> Message-ID: <1212104163.25742.2.camel@localhost> On Thu, 2008-05-29 at 12:30 -0500, Mihael Hategan wrote: > On Thu, 2008-05-29 at 17:24 +0000, Ben Clifford wrote: > > On Thu, 29 May 2008, Mihael Hategan wrote: > > > > > < 0 > > > > > Though that also seems to beg the question: why is that a default? 
> > > > That line is a default on the local site because local sites tend to run > > with provider local onto ~2 CPU cores with no LRM management of load. > > > > Other site definitions (in the default sites.xml file and in for example > > tests/sites/) don't use that setting, so that they pick up the > > usually-appropriate-for-GRAM2 setting of 0.2 from swift.properties. > > > > That makes sense. Though not for wonky. Also as long as we're talking about processes that don't parallelize well on a single CPU (such as CPU-bound ones). I think it doesn't hurt much to have slightly more than 2 processes there, whereas it might hurt to have only 2. From benc at hawaga.org.uk Thu May 29 18:51:26 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 29 May 2008 23:51:26 +0000 (GMT) Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors In-Reply-To: <1212104163.25742.2.camel@localhost> References: <20080528121827.BAQ66049@m4500-03.uchicago.edu> <1212025535.11944.0.camel@localhost> <1212027296.12318.2.camel@localhost> <1212030394.13449.0.camel@localhost> <1212075440.17456.1.camel@localhost> <1212078058.18636.0.camel@localhost> <1212078480.18957.3.camel@localhost> <1212081313.19911.0.camel@localhost> <1212082205.20283.1.camel@localhost> <1212104163.25742.2.camel@localhost> Message-ID: > Also as long as we're talking about processes that don't parallelize > well on a single CPU (such as CPU-bound ones). though that applies to pretty much any managed resource. This is a fairly conservative default that a user is free to change if they really want to; on shared systems, which is where most people seem to be running swift, I'd much prefer that this doesn't flood the system with CPU intensive processes.
-- From hategan at mcs.anl.gov Thu May 29 19:02:27 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 29 May 2008 19:02:27 -0500 Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors In-Reply-To: References: <20080528121827.BAQ66049@m4500-03.uchicago.edu> <1212025535.11944.0.camel@localhost> <1212027296.12318.2.camel@localhost> <1212030394.13449.0.camel@localhost> <1212075440.17456.1.camel@localhost> <1212078058.18636.0.camel@localhost> <1212078480.18957.3.camel@localhost> <1212081313.19911.0.camel@localhost> <1212082205.20283.1.camel@localhost> <1212104163.25742.2.camel@localhost> Message-ID: <1212105747.26549.4.camel@localhost> On Thu, 2008-05-29 at 23:51 +0000, Ben Clifford wrote: > > Also as long as we're talking about processes that don't parallelize > > well on a single CPU (such as CPU-bound ones). > > though that applies to pretty much any managed resource. > > This is a fairly conservative default that a user is free to change if > they really want to; on shared systems, which is where most people seem to > be running swift, I'd much prefer that this doesn't flood the system with > CPU intensive processes. I'm talking about localhost. We're unable to reduce the user's application in any reasonable form. However, my guess is that, statistically, something like 4 or 8 would yield better results, even, perhaps, on shared filesystems, for which we have some "fixes" in place. 
> From benc at hawaga.org.uk Thu May 29 19:06:56 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 30 May 2008 00:06:56 +0000 (GMT) Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors In-Reply-To: <1212105747.26549.4.camel@localhost> References: <20080528121827.BAQ66049@m4500-03.uchicago.edu> <1212025535.11944.0.camel@localhost> <1212027296.12318.2.camel@localhost> <1212030394.13449.0.camel@localhost> <1212075440.17456.1.camel@localhost> <1212078058.18636.0.camel@localhost> <1212078480.18957.3.camel@localhost> <1212081313.19911.0.camel@localhost> <1212082205.20283.1.camel@localhost> <1212104163.25742.2.camel@localhost> <1212105747.26549.4.camel@localhost> Message-ID: On Thu, 29 May 2008, Mihael Hategan wrote: > > This is a fairly conservative default that a user is free to change if > > they really want to; on shared systems, which is where most people > > seem to be running swift, I'd much prefer that this doesn't flood the > > system with CPU intensive processes. > > I'm talking about localhost. yes, so am I. most people's localhost is a shared system. it seems fairly unusual (as in: you do it, and I do it, and that's about it) for people to run swift on a single-user system. 
-- From hategan at mcs.anl.gov Thu May 29 19:15:34 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 29 May 2008 19:15:34 -0500 Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors In-Reply-To: References: <20080528121827.BAQ66049@m4500-03.uchicago.edu> <1212025535.11944.0.camel@localhost> <1212027296.12318.2.camel@localhost> <1212030394.13449.0.camel@localhost> <1212075440.17456.1.camel@localhost> <1212078058.18636.0.camel@localhost> <1212078480.18957.3.camel@localhost> <1212081313.19911.0.camel@localhost> <1212082205.20283.1.camel@localhost> <1212104163.25742.2.camel@localhost> <1212105747.26549.4.camel@localhost> Message-ID: <1212106534.27095.6.camel@localhost> On Fri, 2008-05-30 at 00:06 +0000, Ben Clifford wrote: > On Thu, 29 May 2008, Mihael Hategan wrote: > > > > This is a fairly conservative default that a user is free to change if > > > they really want to; on shared systems, which is where most people > > > seem to be running swift, I'd much prefer that this doesn't flood the > > > system with CPU intensive processes. > > > > I'm talking about localhost. > > yes, so am I. most people's localhost is a shared system. it seems fairly > unusual (as in: you do it, and I do it, and that's about it) for people to > run swift on a single-user system. The ratio of usefulness to effort for this discussion is going towards zero, but... the point is still whether 2 vs. larger single digits means the difference between a responsive system and a non-responsive one. Not whether 2, not whether the system is shared or not, but whether 4 or 8 causes visibly more distress than 2. 
> From benc at hawaga.org.uk Thu May 29 19:26:50 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 30 May 2008 00:26:50 +0000 (GMT) Subject: [Swift-devel] failure reporting Message-ID: The commit at swift svn r1941 appears to break application exception reporting - in revisions after that, log messages like this: > 2008-05-30 01:24:14,573+0100 DEBUG vdl:execute2 APPLICATION_EXCEPTION > jobid=echo-1ues4dti - Application exception: don't get reported any more. That was 3 weeks ago. bleugh. I guess there should be some automated well-formedness tests for errors too... -- From hategan at mcs.anl.gov Thu May 29 19:41:33 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 29 May 2008 19:41:33 -0500 Subject: [Swift-devel] failure reporting In-Reply-To: References: Message-ID: <1212108093.27778.0.camel@localhost> On Fri, 2008-05-30 at 00:26 +0000, Ben Clifford wrote: > The commit at swift svn r1941 appears to break application exception > reporting - in revisions after that, log messages like this: > > > 2008-05-30 01:24:14,573+0100 DEBUG vdl:execute2 APPLICATION_EXCEPTION > > jobid=echo-1ues4dti - Application exception: > > don't get reported any more. Bad things happen to good people sometimes. It's there, so unless that last catch block doesn't execute, it should work. > > That was 3 weeks ago. bleugh. I guess there should be some automated > well-formedness tests for errors too... > From benc at hawaga.org.uk Thu May 29 19:46:11 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 30 May 2008 00:46:11 +0000 (GMT) Subject: [Swift-devel] failure reporting In-Reply-To: <1212108093.27778.0.camel@localhost> References: <1212108093.27778.0.camel@localhost> Message-ID: On Thu, 29 May 2008, Mihael Hategan wrote: > It's there, so unless that last catch block doesn't execute I think that is exactly what is happening. 
If I do this (which is presumably the wrong thing for replication, but otherwise I think ok): - catch("^(?!Abort)$" + catch(".*" vdl:setprogress("Failed but can retry") log(LOG:DEBUG, "APPLICATION_EXCEPTION jobid={jobid} - Application exception: ", exception) then I get the logging back. -- From hategan at mcs.anl.gov Thu May 29 19:51:35 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 29 May 2008 19:51:35 -0500 Subject: [Swift-devel] failure reporting In-Reply-To: References: <1212108093.27778.0.camel@localhost> Message-ID: <1212108695.28255.0.camel@localhost> On Fri, 2008-05-30 at 00:46 +0000, Ben Clifford wrote: > On Thu, 29 May 2008, Mihael Hategan wrote: > > > It's there, so unless that last catch block doesn't execute > > I think that is exactly what is happening. > > If I do this (which is presumably the wrong thing for replication, but > otherwise I think ok): > > - catch("^(?!Abort)$" Now that you mention it, I might only have tested that regexp in perl. Back to the drawing board. > + catch(".*" > vdl:setprogress("Failed > but can retry") > log(LOG:DEBUG, > "APPLICATION_EXCEPTION jobid={jobid} - Application exception: ", > exception) > > then I get the logging back. > From hategan at mcs.anl.gov Thu May 29 20:21:26 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 29 May 2008 20:21:26 -0500 Subject: [Swift-devel] failure reporting In-Reply-To: <1212108695.28255.0.camel@localhost> References: <1212108093.27778.0.camel@localhost> <1212108695.28255.0.camel@localhost> Message-ID: <1212110486.28255.8.camel@localhost> On Thu, 2008-05-29 at 19:51 -0500, Mihael Hategan wrote: > On Fri, 2008-05-30 at 00:46 +0000, Ben Clifford wrote: > > On Thu, 29 May 2008, Mihael Hategan wrote: > > > > > It's there, so unless that last catch block doesn't execute > > > > I think that is exactly what is happening. 
> > > > If I do this (which is presumably the wrong thing for replication, but > > otherwise I think ok): > > > > - catch("^(?!Abort)$" > > Now that you mention it, I might only have tested that regexp in perl. > Back to the drawing board. Odd. Doesn't work in perl either. I should make an appointment with the head doctor. Try r2037. > > > + catch(".*" > > vdl:setprogress("Failed > > but can retry") > > log(LOG:DEBUG, > > "APPLICATION_EXCEPTION jobid={jobid} - Application exception: ", > > exception) > > > > then I get the logging back. > > > From hategan at mcs.anl.gov Thu May 29 22:14:10 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 29 May 2008 22:14:10 -0500 Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors In-Reply-To: References: <20080528121827.BAQ66049@m4500-03.uchicago.edu> <1212025535.11944.0.camel@localhost> <1212027296.12318.2.camel@localhost> <1212030394.13449.0.camel@localhost> <1212075440.17456.1.camel@localhost> <1212078058.18636.0.camel@localhost> <1212078480.18957.3.camel@localhost> <1212081313.19911.0.camel@localhost> <1212082205.20283.1.camel@localhost> Message-ID: <1212117250.10293.1.camel@localhost> On Thu, 2008-05-29 at 17:34 +0000, Ben Clifford wrote: > On Thu, 29 May 2008, Mihael Hategan wrote: > > > That makes sense. Though not for wonky. > > well, that comes from my minimalist modifications to the local site. > > This should still work with jobThrottle=0, though. Replication v2 is in: cog r2031 swift r2038 This one hopefully properly deals with throttling. 
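The catch pattern from the failure-reporting thread, "^(?!Abort)$", can be checked in isolation. The sketch below uses Python's re module as a stand-in; that is an assumption, since the thread does not say which regex engine or matching mode the catch element uses. The second pattern, "^(?!Abort).*$", is a hypothetical repair for illustration only and is not necessarily what went into r2037. The sample messages are invented. The point it illustrates: "^" and "$" with only a zero-width lookahead between them can match nothing but the empty string, so no real exception message would ever reach that catch block.

```python
import re

# Pattern as quoted in the thread: start, "not followed by Abort", end.
# The lookahead consumes no characters, so only "" can satisfy ^...$.
broken = re.compile(r"^(?!Abort)$")

# Hypothetical repair (an assumption, not the r2037 change):
# allow any text after the lookahead.
candidate = re.compile(r"^(?!Abort).*$", re.DOTALL)

# Invented sample exception messages.
messages = [
    "",
    "Application exception: exit code 1",
    "Abort",
    "Aborted by user",
]


def matches(pattern, text):
    """Return True if the pattern matches anywhere in text."""
    return pattern.search(text) is not None


# Map each message to (matched by broken, matched by candidate).
results = {m: (matches(broken, m), matches(candidate, m)) for m in messages}

for message, (b, c) in results.items():
    print(f"{message!r:40} broken={b} candidate={c}")
```

Under these assumptions the quoted pattern accepts only the empty message, which is consistent with the catch block never executing, while the candidate accepts everything that does not start with "Abort".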
> From benc at hawaga.org.uk Fri May 30 06:31:55 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 30 May 2008 11:31:55 +0000 (GMT) Subject: [Swift-devel] failure reporting In-Reply-To: <1212110486.28255.8.camel@localhost> References: <1212108093.27778.0.camel@localhost> <1212108695.28255.0.camel@localhost> <1212110486.28255.8.camel@localhost> Message-ID: On Thu, 29 May 2008, Mihael Hategan wrote: > > Now that you mention it, I might only have tested that regexp in perl. > > Back to the drawing board. [..] > Try r2037. works. -- From benc at hawaga.org.uk Fri May 30 06:56:46 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 30 May 2008 11:56:46 +0000 (GMT) Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors In-Reply-To: <1212117250.10293.1.camel@localhost> References: <20080528121827.BAQ66049@m4500-03.uchicago.edu> <1212025535.11944.0.camel@localhost> <1212027296.12318.2.camel@localhost> <1212030394.13449.0.camel@localhost> <1212075440.17456.1.camel@localhost> <1212078058.18636.0.camel@localhost> <1212078480.18957.3.camel@localhost> <1212081313.19911.0.camel@localhost> <1212082205.20283.1.camel@localhost> <1212117250.10293.1.camel@localhost> Message-ID: > Replication v2 is in: > cog r2031 > swift r2038 > > This one hopefully properly deals with throttling. 062-two-in-a-row.swift itself runs without exiting with failure. However, it does not appear to do the stageout for the second job - running it with: ./run 062-two-in-a-row.swift gives a test failure because of that. Changing: to makes this work, and changing it back makes it not work. 
I put a copy of the log in http://www.ci.uchicago.edu/~benc/tmp/062-two-in-a-row-20080530-1248-v44k1kg2.log -- From bugzilla-daemon at mcs.anl.gov Fri May 30 08:41:44 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Fri, 30 May 2008 08:41:44 -0500 (CDT) Subject: [Swift-devel] [Bug 137] undeclared procedures are not detected until runtime In-Reply-To: Message-ID: <20080530134144.205A8164CF@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=137 benc at hawaga.org.uk changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED
------- Comment #2 from benc at hawaga.org.uk 2008-05-30 08:41 ------- should be fixed (by Milena Nikolic) in r2039 From hategan at mcs.anl.gov Fri May 30 12:10:10 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 30 May 2008 12:10:10 -0500 Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors In-Reply-To: References: <20080528121827.BAQ66049@m4500-03.uchicago.edu> <1212025535.11944.0.camel@localhost> <1212027296.12318.2.camel@localhost> <1212030394.13449.0.camel@localhost> <1212075440.17456.1.camel@localhost> <1212078058.18636.0.camel@localhost> <1212078480.18957.3.camel@localhost> <1212081313.19911.0.camel@localhost> <1212082205.20283.1.camel@localhost> <1212117250.10293.1.camel@localhost> Message-ID: <1212167410.16574.1.camel@localhost> On Fri, 2008-05-30 at 11:56 +0000, Ben Clifford wrote: > > Replication v2 is in: > > cog r2031 > > swift r2038 > > > > This one hopefully properly deals with throttling. > > 062-two-in-a-row.swift itself runs without exiting with failure. 
> > However, it does not appear to do the stageout for the second job - > running it with: > > ./run 062-two-in-a-row.swift > > gives a test failure because of that. Forgot to commit a file. Try cog r2033. From benc at hawaga.org.uk Sat May 31 05:12:57 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Sat, 31 May 2008 10:12:57 +0000 (GMT) Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors In-Reply-To: <1212167410.16574.1.camel@localhost> References: <20080528121827.BAQ66049@m4500-03.uchicago.edu> <1212025535.11944.0.camel@localhost> <1212027296.12318.2.camel@localhost> <1212030394.13449.0.camel@localhost> <1212075440.17456.1.camel@localhost> <1212078058.18636.0.camel@localhost> <1212078480.18957.3.camel@localhost> <1212081313.19911.0.camel@localhost> <1212082205.20283.1.camel@localhost> <1212117250.10293.1.camel@localhost> <1212167410.16574.1.camel@localhost> Message-ID: On Fri, 30 May 2008, Mihael Hategan wrote: > > ./run 062-two-in-a-row.swift > > > > gives a test failure because of that. > > Forgot to commit a file. Try cog r2033. that passes. --