From benc at hawaga.org.uk Mon Oct 1 03:42:09 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 1 Oct 2007 08:42:09 +0000 (GMT) Subject: [Swift-devel] Use case and examples needed to avoid large directories In-Reply-To: <1191203193.25550.4.camel@blabla.mcs.anl.gov> References: <46FD8EF9.8000802@mcs.anl.gov> <1191022702.9962.2.camel@blabla.mcs.anl.gov> <46FE87F0.9070800@mcs.anl.gov> <1191086531.13048.3.camel@blabla.mcs.anl.gov> <46FFFC85.6070004@mcs.anl.gov> <1191203193.25550.4.camel@blabla.mcs.anl.gov> Message-ID: On Sun, 30 Sep 2007, Mihael Hategan wrote: > In this case perhaps the mean, on whatever group, may very well be > defined as (x1 + x2 + ... + xn)/n. If we have access to "+" and "/", > then it should be fine. I looked at the docs for softmean yesterday - superficially it looks like it would be possible to use it for both of those; however, a closer inspection of the documentation suggests that it takes the mean only of pixels which have a non-zero value, which suggests that perhaps n is different for every pixel/voxel. Further investigation is probably necessary on the app side. -- From benc at hawaga.org.uk Mon Oct 1 05:19:29 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 1 Oct 2007 10:19:29 +0000 (GMT) Subject: [Swift-devel] calling rmap Message-ID: In the AbstractFileMapper subclasses, rmap to my intuition should be callable with the same filename multiple times; however, there are implementations like this (from FileSystemArrayMapper):

    public Path rmap(String name) {
        if (name == null || name.equals("")) {
            return null;
        }
        String index = String.valueOf(count);
        filenames.put(index, name);
        Path p = Path.EMPTY_PATH;
        p = p.addFirst(index, true);
        ++count;
        return p;
    }

which look like they can't be called multiple times for the same filename without causing additions of multiple array elements mapped to the same file. This works because rmap is indeed only called once, during invocation of the existing() method.
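For illustration only - hypothetical code, not from the Swift or CoG repositories - an idempotent rmap would remember the index it assigned to each filename, so that a repeated call with the same name maps to the same array element instead of creating a new one. In this sketch a plain Integer index stands in for the karajan Path that the real rmap returns, and the class name is invented:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not Swift source): a mapping function that can
// safely be called many times with the same filename.
class IdempotentRmapSketch {
    private final Map<String, Integer> indexByName = new HashMap<String, Integer>();
    private int count = 0;

    public Integer rmapIndex(String name) {
        if (name == null || name.equals("")) {
            return null;
        }
        Integer index = indexByName.get(name);
        if (index == null) {
            index = count++;
            // Remember the assignment so a later call reuses it instead
            // of allocating a second array element for the same file.
            indexByName.put(name, index);
        }
        return index;
    }
}
```

With the assignments cached this way, nothing would break if callers other than existing() invoked the mapping repeatedly.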
So there's a fairly restrictive set of conditions that seems a bit icky. For example, here's a rough approximation (I haven't checked these for being strictly true, but they're close enough):

i) a mapper must expect existing() to be called exactly once.
ii) a mapper does not have to give correct mappings until after the existing() call has been invoked.
iii) rmap should not be called by anyone except the AbstractFileMapper existing() implementation (despite its 'public' prototype)

I find the above three to be unintuitive as I attempt to document how to use AbstractFileMapper as a superclass of your own mappers. -- From andrewj at uchicago.edu Mon Oct 1 07:48:08 2007 From: andrewj at uchicago.edu (andrewj at uchicago.edu) Date: Mon, 1 Oct 2007 07:48:08 -0500 (CDT) Subject: [Swift-devel] Use case and examples needed to avoid large directories Message-ID: <20071001074808.AUI18823@m4500-00.uchicago.edu> Hey all, Thanks for diving right in on the issues at hand. Sorry I haven't been more vocal; I will be more as time progresses - my parents were in this weekend. I will shortly be getting the first Swift code, etc. put together and will let you know how it goes. Thanks, Andrew ---- Original message ---- >Date: Sun, 30 Sep 2007 20:46:32 -0500 >From: Mihael Hategan >Subject: Re: [Swift-devel] Use case and examples needed to avoid large directories >To: Ben Clifford >Cc: Michael Wilde , Andrew Jamieson , swift-devel > >On Sun, 2007-09-30 at 19:48 +0000, Ben Clifford wrote: >> >> On Sun, 30 Sep 2007, Michael Wilde wrote: >> >> > examples, where softmean needs to average a large set of inputs, and its >> > behavior is, we thing, associatve. >> >> mean as an operator is not usually associative - I'd think softmean isn't >> either if its doing per-pixel arithmetic means, which is what I've always >> assumed it was doing. >> >> (e.g.
(1 mean 2) mean 3) = (( (1+2)/2) + 3)/2 = 2.25 >> 1 mean (2 mean 3) = ( 1 + ( (2+3)/2) ) /2 = 1.75 >> >> and neither are the actual mean of (1,2,3) which is 2. > >... assuming the operation is associative. > >In this case perhaps the mean, on whatever group, may very well be >defined as (x1 + x2 + ... + xn)/n. If we have access to "+" and "/", >then it should be fine. > >> > From benc at hawaga.org.uk Mon Oct 1 07:59:22 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 1 Oct 2007 12:59:22 +0000 (GMT) Subject: [Swift-devel] gt4 provider Message-ID: When I submit through the gt4 provider, I get warnings like this: > Unknown value for socketTimeout attribute (null). Ignoring. which appear to come from the gt4 cog provider provider-gt4_0_0. Not sure if that's a bug in cog where it should be more quietly ignoring lack of that option, or if it's a bug in swift where the socket timeout parameter is mandatory but swift isn't providing it. -- From hategan at mcs.anl.gov Mon Oct 1 09:08:45 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 01 Oct 2007 09:08:45 -0500 Subject: [Swift-devel] gt4 provider In-Reply-To: References: Message-ID: <1191247725.1561.0.camel@blabla.mcs.anl.gov> Fixed in cog 1762. On Mon, 2007-10-01 at 12:59 +0000, Ben Clifford wrote: > When I submit through the gt4 provider, I get warnings like this: > > > Unknown value for socketTimeout attribute (null). Ignoring. > > which appear to come from the gt4 cog provider provider-gt4_0_0. > > Not sure if that's a bug in cog where it should be more quietly ignoring > lack of that option, or if it's a bug in swift where the socket timeout > parameter is mandatory but swift isn't providing it. > From benc at hawaga.org.uk Mon Oct 1 09:59:17 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 1 Oct 2007 14:59:17 +0000 (GMT) Subject: [Swift-devel] restarts Message-ID: I was looking at restarts a bit.
If I run the SwiftApps badmonkey workflow, let it fail I get this restart log: $ cat badmonkey-20071001-1404-9cqjt7of.0.rlog # Log file created Mon Oct 01 14:04:22 BST 2007 outg.0001.txt/soju.hawaga.org.uk/badmonkey-20071001-1404-9cqjt7of/shared/outg.0001.txt outg.0000.txt/soju.hawaga.org.uk/badmonkey-20071001-1404-9cqjt7of/shared/outg.0000.txt If I then restart it with this command: swift -tc.file ./tc.data -sites.file ./sites.xml -resume ./badmonkey-20071001-1404-9cqjt7of.0.rlog badmonkey.swift (which is the command used to start it in the first place with -resume ./badmonkey-20071001-1404-9cqjt7of.0.rlog added) swift appears to run the goodmonkey jobs again, in addition to attempting the broken badmonkey jobs, giving this restart log: $ cat badmonkey-20071001-1404-9cqjt7of.0.rlog # Log file created Mon Oct 01 14:04:22 BST 2007 outg.0001.txt/soju.hawaga.org.uk/badmonkey-20071001-1404-9cqjt7of/shared/outg.0001.txt outg.0000.txt/soju.hawaga.org.uk/badmonkey-20071001-1404-9cqjt7of/shared/outg.0000.txt # Log file updated Mon Oct 01 15:35:58 BST 2007 outg.0000.txt/soju.hawaga.org.uk/badmonkey-20071001-1535-oqxh1t78/shared/outg.0000.txt outg.0001.txt/soju.hawaga.org.uk/badmonkey-20071001-1535-oqxh1t78/shared/outg.0001.txt Is this caused by the presence of the unique run id (different for the two runs) in the restart entry? And also will site selection interfere there? (soju.hawaga.org.uk is the site name in my sites.xml) -- From hategan at mcs.anl.gov Mon Oct 1 11:00:11 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 01 Oct 2007 11:00:11 -0500 Subject: [Swift-devel] restarts In-Reply-To: References: Message-ID: <1191254412.5653.1.camel@blabla.mcs.anl.gov> It's caused by the addition of generalized files. So basically restarts are broken at this point. When I get some time, I'll work on the file management part and this. On Mon, 2007-10-01 at 14:59 +0000, Ben Clifford wrote: > I was looking at restarts a bit. 
If I run the SwiftApps badmonkey > workflow, let it fail I get this restart log: > > $ cat badmonkey-20071001-1404-9cqjt7of.0.rlog > # Log file created Mon Oct 01 14:04:22 BST 2007 > outg.0001.txt/soju.hawaga.org.uk/badmonkey-20071001-1404-9cqjt7of/shared/outg.0001.txt > outg.0000.txt/soju.hawaga.org.uk/badmonkey-20071001-1404-9cqjt7of/shared/outg.0000.txt > > If I then restart it with this command: > > swift -tc.file ./tc.data -sites.file ./sites.xml -resume > ./badmonkey-20071001-1404-9cqjt7of.0.rlog badmonkey.swift > > (which is the command used to start it in the first place with > -resume ./badmonkey-20071001-1404-9cqjt7of.0.rlog added) > > swift appears to run the goodmonkey jobs again, in addition to attempting > the broken badmonkey jobs, giving this restart log: > > $ cat badmonkey-20071001-1404-9cqjt7of.0.rlog > # Log file created Mon Oct 01 14:04:22 BST 2007 > outg.0001.txt/soju.hawaga.org.uk/badmonkey-20071001-1404-9cqjt7of/shared/outg.0001.txt > outg.0000.txt/soju.hawaga.org.uk/badmonkey-20071001-1404-9cqjt7of/shared/outg.0000.txt > # Log file updated Mon Oct 01 15:35:58 BST 2007 > outg.0000.txt/soju.hawaga.org.uk/badmonkey-20071001-1535-oqxh1t78/shared/outg.0000.txt > outg.0001.txt/soju.hawaga.org.uk/badmonkey-20071001-1535-oqxh1t78/shared/outg.0001.txt > > Is this caused by the presence of the unique run id (different for the two > runs) in the restart entry? > > And also will site selection interfere there? 
(soju.hawaga.org.uk is the > site name in my sites.xml) > From bugzilla-daemon at mcs.anl.gov Mon Oct 1 14:37:48 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 1 Oct 2007 14:37:48 -0500 (CDT) Subject: [Swift-devel] [Bug 92] URIs in mappers In-Reply-To: Message-ID: <20071001193748.C39A0164BC@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=92 hategan at mcs.anl.gov changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED ------- Comment #1 from hategan at mcs.anl.gov 2007-10-01 14:37 ------- Should be fixed in swift at r1308/cog at r1769 -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From yongzh at cs.uchicago.edu Tue Oct 2 00:22:24 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Tue, 2 Oct 2007 00:22:24 -0500 (CDT) Subject: [Swift-devel] [Bug 98] New: Allow external scripts to be written inline within swiftscript functions In-Reply-To: References: Message-ID: Yes, it is good to finally see some action on this. Yong. On Sun, 30 Sep 2007 bugzilla-daemon at mcs.anl.gov wrote: > http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=98 > > Summary: Allow external scripts to be written inline within > swiftscript functions > Product: Swift > Version: unspecified > Platform: All > OS/Version: Mac OS > Status: NEW > Severity: enhancement > Priority: P3 > Component: SwiftScript language > AssignedTo: benc at hawaga.org.uk > ReportedBy: wilde at mcs.anl.gov > > > Permit scripting language code to be specified "in line" in a swift function > declaration. > > This would be handy in that external-language wrapper code can be written right > in the swift program, to make it easier to see whats being done, and in fact > to code many simple processing scripts without needing any tc.data entries. 
> One way to specify this would be to allow a declaration of the form: > > > script(bash) { > echo $IN $OUT $THRESH > } > > where: > > - bash is a name built-in to the supplied tc.data file, possibly with a > default, that can be overridden on a per-site basis. A set of such permissible > scripting options would be defined. > > - IN, OUT and THRESH are arguments to the enclosing swift function (or some > similar way of mapping swift arguments - both file and scalar - into the > scripting language's name space). > > Also, this mechanism will need a way for the script code to call an app that is > declared in tc.data, so that it can serve as a wrapper. > > Further note: these conventions need to be done on a > scripting-language-specific basis. One could envision such interfaces for: sh, > perl, python, R, MatLab, and other languages. It should be possible for new > user groups to readily add such adapters. > > This was an idea that Yong and I were discussing over the years. It seems like > at least the sh version would be both feasible and very useful. > > > -- > Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You reported the bug, or are watching the reporter. > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From benc at hawaga.org.uk Tue Oct 2 04:56:11 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 2 Oct 2007 09:56:11 +0000 (GMT) Subject: [Swift-devel] provider-dcache default option -v upsets the dccp I tried Message-ID: The dccp that I used at lqcd.fnal.gov (/usr/local/bin/dccp) doesn't like the -v parameter. I commented out the options line in provider-dcache.properties and there was no problem. It might be useful to change that in the cog repo.
-- From benc at hawaga.org.uk Tue Oct 2 06:02:01 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 2 Oct 2007 11:02:01 +0000 (GMT) Subject: [Swift-devel] exceptions on failure for new url code Message-ID: I get exception dumps in job failure now. These are 'purely informative' in as much as swift successfully deals with the fact that the job failed and restarts - what is new is that the below exception dump gets displayed as well. Swift v0.2-dev r1309 (modified locally) RunID: 20071002-1158-t4v980la badmonkey started Warning: Task handler throws exception but does not set status org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: org.globus.cog.abstraction.impl.file.FileNotFoundException: /var/tmp/badmonkey-20071002-1158-t4v980la/status/badmonkey-novbe2ii-success not found. at org.globus.cog.abstraction.impl.file.CachingDelegatedFileOperationHandler.submit(CachingDelegatedFileOperationHandler.java:54) at org.globus.cog.abstraction.impl.common.task.CachingFileOperationTaskHandler.submit(CachingFileOperationTaskHandler.java:28) at org.globus.cog.karajan.scheduler.submitQueue.NonBlockingSubmit.run(NonBlockingSubmit.java:82) -- From benc at hawaga.org.uk Tue Oct 2 06:04:40 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 2 Oct 2007 11:04:40 +0000 (GMT) Subject: [Swift-devel] Re: exceptions on failure for new url code In-Reply-To: References: Message-ID: full log is here: http://www.ci.uchicago.edu/~benc/tmp/badmonkey-20071002-1158-t4v980la.log see for example 2007-10-02 11:58:57,329+0100 -- From bugzilla-daemon at mcs.anl.gov Tue Oct 2 06:11:06 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Tue, 2 Oct 2007 06:11:06 -0500 (CDT) Subject: [Swift-devel] [Bug 78] Submit side file locations outside of submit directory. 
In-Reply-To: Message-ID: <20071002111106.20A37164BB@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=78 benc at hawaga.org.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from benc at hawaga.org.uk 2007-10-02 06:11 ------- this has been done (by mihael) -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at mcs.anl.gov Tue Oct 2 06:13:38 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Tue, 2 Oct 2007 06:13:38 -0500 (CDT) Subject: [Swift-devel] [Bug 95] document use of PBS direct submission In-Reply-To: Message-ID: <20071002111338.E1B2F164BB@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=95 benc at hawaga.org.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from benc at hawaga.org.uk 2007-10-02 06:13 ------- done http://www.ci.uchicago.edu/swift/guides/userguide.php#id3096047 -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at mcs.anl.gov Tue Oct 2 07:15:58 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Tue, 2 Oct 2007 07:15:58 -0500 (CDT) Subject: [Swift-devel] [Bug 99] New: Out of date guide Message-ID: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=99 Summary: Out of date guide Product: Swift Version: unspecified Platform: PC URL: http://www.ci.uchicago.edu/swift/guides/ OS/Version: Windows Status: NEW Severity: normal Priority: P2 Component: Documentation AssignedTo: benc at hawaga.org.uk ReportedBy: foster at mcs.anl.gov The URL above points to the old VDL guide written by Yong. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at mcs.anl.gov Tue Oct 2 07:18:59 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Tue, 2 Oct 2007 07:18:59 -0500 (CDT) Subject: [Swift-devel] [Bug 98] Allow external scripts to be written inline within swiftscript functions In-Reply-To: Message-ID: <20071002121859.BA435164BC@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=98 benc at hawaga.org.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Target Milestone|v0.3 |--- -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. 
From bugzilla-daemon at mcs.anl.gov Tue Oct 2 07:41:42 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Tue, 2 Oct 2007 07:41:42 -0500 (CDT) Subject: [Swift-devel] [Bug 99] Out of date guide In-Reply-To: Message-ID: <20071002124142.F0A02164EC@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=99 ------- Comment #1 from benc at hawaga.org.uk 2007-10-02 07:41 ------- that was a temporary file left over from the documentation build process. r1312 causes that temporary file to be removed, resulting in the specified URL giving an error rather than incorrect content. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at mcs.anl.gov Tue Oct 2 08:02:29 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Tue, 2 Oct 2007 08:02:29 -0500 (CDT) Subject: [Swift-devel] [Bug 100] New: deprecate workflow word? Message-ID: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=100 Summary: deprecate workflow word? Product: Swift Version: unspecified Platform: PC OS/Version: Windows Status: NEW Severity: normal Priority: P2 Component: Documentation AssignedTo: benc at hawaga.org.uk ReportedBy: foster at mcs.anl.gov I believe that we should be consistent in describing Swift as a parallel scripting language, not a workflow language. And talking about Swift task graphs, not workflows, etc. This requires a light edit of various parts of the Web site ... I realize that some might say that this is not really an important topic to address. And perhaps they would be right :-) -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From benc at hawaga.org.uk Tue Oct 2 08:04:56 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 2 Oct 2007 13:04:56 +0000 (GMT) Subject: [Swift-devel] Re: [Bug 100] New: deprecate workflow word? In-Reply-To: References: Message-ID: On Tue, 2 Oct 2007, bugzilla-daemon at mcs.anl.gov wrote: > I believe that we should be consistent in describing Swift as a parallel > scripting language, not a workflow language. And talking about Swift task > graphs, not workflows, etc. This requires a light edit of various parts of the > Web site ... > I realize that some might say that this is not really an important topic to > address. And perhaps they would be right :-) I dislike the use of the word workflow in any technical sense, though it describes a certain class of systems so it's probably useful to keep around in some places as a fuzzy classification. I think graph is entirely the wrong word to describe something written in SwiftScript - it's a script or a program. -- From benc at hawaga.org.uk Tue Oct 2 08:16:20 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 2 Oct 2007 13:16:20 +0000 (GMT) Subject: [Swift-devel] swift 0.3rc1 Message-ID: I built release candidate 1 for Swift v0.3. It is at: http://www.ci.uchicago.edu/~benc/vdsk-0.3-rc1.tar.gz md5sum: 8532bd56ce65f186ac760de592218e3b Please test it. If there are no serious issues, it gets promoted to a real release in 24h. -- From hategan at mcs.anl.gov Tue Oct 2 09:35:31 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 02 Oct 2007 09:35:31 -0500 Subject: [Swift-devel] Re: exceptions on failure for new url code In-Reply-To: References: Message-ID: <1191335731.28013.5.camel@blabla.mcs.anl.gov> On Tue, 2007-10-02 at 11:02 +0000, Ben Clifford wrote: > I get exception dumps in job failure now. These are 'purely informative' > in as much as swift successfully deals with the fact that the job failed > and restarts - what is new is that the below exception dump gets displayed > as well.
> > Swift v0.2-dev r1309 (modified locally) > > RunID: 20071002-1158-t4v980la > badmonkey started > Warning: Task handler throws exception but does not set status You shouldn't get that one. Can you paste the same from the actual log? I'm interested in the class it's coming from. > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > org.globus.cog.abstraction.impl.file.FileNotFoundException: > /var/tmp/badmonkey-20071002-1158-t4v980la/status/badmonkey-novbe2ii-success > not found. > at > org.globus.cog.abstraction.impl.file.CachingDelegatedFileOperationHandler.submit(CachingDelegatedFileOperationHandler.java:54) > at > org.globus.cog.abstraction.impl.common.task.CachingFileOperationTaskHandler.submit(CachingFileOperationTaskHandler.java:28) > at > org.globus.cog.karajan.scheduler.submitQueue.NonBlockingSubmit.run(NonBlockingSubmit.java:82) > > From hategan at mcs.anl.gov Tue Oct 2 09:38:12 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 02 Oct 2007 09:38:12 -0500 Subject: [Swift-devel] Re: exceptions on failure for new url code In-Reply-To: References: Message-ID: <1191335892.28013.8.camel@blabla.mcs.anl.gov> Hmm. I think it was one of those things where I didn't look at what svn ci actually did and assumed it worked OK. Should be fixed in cog 1770. On Tue, 2007-10-02 at 11:04 +0000, Ben Clifford wrote: > full log is here: > http://www.ci.uchicago.edu/~benc/tmp/badmonkey-20071002-1158-t4v980la.log > > see for example 2007-10-02 11:58:57,329+0100 From hategan at mcs.anl.gov Tue Oct 2 09:39:41 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 02 Oct 2007 09:39:41 -0500 Subject: [Swift-devel] Re: [Bug 100] New: deprecate workflow word? 
In-Reply-To: References: Message-ID: <1191335981.28013.10.camel@blabla.mcs.anl.gov> On Tue, 2007-10-02 at 13:04 +0000, Ben Clifford wrote: > > On Tue, 2 Oct 2007, bugzilla-daemon at mcs.anl.gov wrote: > > > I believe that we should be consistent in describing Swift as a parallel > > scripting language, not a workflow language. And talking about Swift task > > graphs, not workflows, etc. This requires a light edit of various parts of the > > Web site ... > > > I realize that some might say that this is not really an important topic to > > address. And perhaps they would be right :-) > > I dislike the use of the word workflow in any technical sense, though it > describes a certain class of systems so its probably useful to keep round > in some places as a fuzzy classification. > > I think graph is entirely the wrong work to describe something written in > SwiftScript - its a script or a program. Couldn't agree more. > From benc at hawaga.org.uk Tue Oct 2 10:25:37 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 2 Oct 2007 15:25:37 +0000 (GMT) Subject: [Swift-devel] Re: exceptions on failure for new url code In-Reply-To: <1191335892.28013.8.camel@blabla.mcs.anl.gov> References: <1191335892.28013.8.camel@blabla.mcs.anl.gov> Message-ID: I still get problems - see http://www.ci.uchicago.edu/~benc/tmp/badmonkey-20071002-1623-a2zgmxq0.log On Tue, 2 Oct 2007, Mihael Hategan wrote: > Hmm. I think it was one of those things where I didn't look at what svn > ci actually did and assumed it worked OK. > > Should be fixed in cog 1770. 
> > On Tue, 2007-10-02 at 11:04 +0000, Ben Clifford wrote: > > full log is here: > > http://www.ci.uchicago.edu/~benc/tmp/badmonkey-20071002-1158-t4v980la.log > > > > see for example 2007-10-02 11:58:57,329+0100 > > From hategan at mcs.anl.gov Tue Oct 2 10:30:39 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 02 Oct 2007 10:30:39 -0500 Subject: [Swift-devel] Re: exceptions on failure for new url code In-Reply-To: References: <1191335892.28013.8.camel@blabla.mcs.anl.gov> Message-ID: <1191339039.30272.7.camel@blabla.mcs.anl.gov> Something isn't right. That message doesn't exist any more. distclean? On Tue, 2007-10-02 at 15:25 +0000, Ben Clifford wrote: > I still get problems - see > http://www.ci.uchicago.edu/~benc/tmp/badmonkey-20071002-1623-a2zgmxq0.log > > > > On Tue, 2 Oct 2007, Mihael Hategan wrote: > > > Hmm. I think it was one of those things where I didn't look at what svn > > ci actually did and assumed it worked OK. > > > > Should be fixed in cog 1770. > > > > On Tue, 2007-10-02 at 11:04 +0000, Ben Clifford wrote: > > > full log is here: > > > http://www.ci.uchicago.edu/~benc/tmp/badmonkey-20071002-1158-t4v980la.log > > > > > > see for example 2007-10-02 11:58:57,329+0100 > > > > > From benc at hawaga.org.uk Tue Oct 2 10:32:00 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 2 Oct 2007 15:32:00 +0000 (GMT) Subject: [Swift-devel] Re: exceptions on failure for new url code In-Reply-To: <1191339039.30272.7.camel@blabla.mcs.anl.gov> References: <1191335892.28013.8.camel@blabla.mcs.anl.gov> <1191339039.30272.7.camel@blabla.mcs.anl.gov> Message-ID: I did redist at vdsk before that, I think. Will try a distclean at cog/ too. On Tue, 2 Oct 2007, Mihael Hategan wrote: > Something isn't right. That message doesn't exist any more. distclean? 
> > On Tue, 2007-10-02 at 15:25 +0000, Ben Clifford wrote: > > I still get problems - see > > http://www.ci.uchicago.edu/~benc/tmp/badmonkey-20071002-1623-a2zgmxq0.log > > > > > > > > On Tue, 2 Oct 2007, Mihael Hategan wrote: > > > > > Hmm. I think it was one of those things where I didn't look at what svn > > > ci actually did and assumed it worked OK. > > > > > > Should be fixed in cog 1770. > > > > > > On Tue, 2007-10-02 at 11:04 +0000, Ben Clifford wrote: > > > > full log is here: > > > > http://www.ci.uchicago.edu/~benc/tmp/badmonkey-20071002-1158-t4v980la.log > > > > > > > > see for example 2007-10-02 11:58:57,329+0100 > > > > > > > > > > From benc at hawaga.org.uk Tue Oct 2 10:40:40 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 2 Oct 2007 15:40:40 +0000 (GMT) Subject: [Swift-devel] Re: exceptions on failure for new url code In-Reply-To: References: <1191335892.28013.8.camel@blabla.mcs.anl.gov> <1191339039.30272.7.camel@blabla.mcs.anl.gov> Message-ID: i changed the swift version number; but distclean doesn't clean away the old version. this caught people out (me included) last time the version number changed... perhaps distclean should clean even harder. -- From hategan at mcs.anl.gov Tue Oct 2 11:29:01 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 02 Oct 2007 11:29:01 -0500 Subject: [Swift-devel] Re: exceptions on failure for new url code In-Reply-To: References: <1191335892.28013.8.camel@blabla.mcs.anl.gov> <1191339039.30272.7.camel@blabla.mcs.anl.gov> Message-ID: <1191342541.867.4.camel@blabla.mcs.anl.gov> On Tue, 2007-10-02 at 15:40 +0000, Ben Clifford wrote: > i changed the swift version number; but distclean doesn't clean away the > old version. this caught people out (me included) last time the version > number changed... perhaps distclean should clean even harder. I'm thinking the cause here is that you were implicitly starting old-version/bin/swift (probably because it was in the path). 
So I think I'll leave it as it is, since it has worked reliably this way for a while, and I can't tell what the full implications of making it clean harder are. From benc at hawaga.org.uk Tue Oct 2 11:31:54 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 2 Oct 2007 16:31:54 +0000 (GMT) Subject: [Swift-devel] Re: exceptions on failure for new url code In-Reply-To: <1191342541.867.4.camel@blabla.mcs.anl.gov> References: <1191335892.28013.8.camel@blabla.mcs.anl.gov> <1191339039.30272.7.camel@blabla.mcs.anl.gov> <1191342541.867.4.camel@blabla.mcs.anl.gov> Message-ID: On Tue, 2 Oct 2007, Mihael Hategan wrote: > I'm thinking the cause here is that you were implicitly starting > old-version/bin/swift (probably because it was in the path). yes. that's caught people in the past building from source trees and will continue to do so. Removing the entire content of dist/ seems a better thing to do. > So I think I'll leave it as it is, since it has worked reliably this way > for a while it hasn't - it happens every time the version number changes. -- From hategan at mcs.anl.gov Tue Oct 2 11:40:29 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 02 Oct 2007 11:40:29 -0500 Subject: [Swift-devel] Re: exceptions on failure for new url code In-Reply-To: References: <1191335892.28013.8.camel@blabla.mcs.anl.gov> <1191339039.30272.7.camel@blabla.mcs.anl.gov> <1191342541.867.4.camel@blabla.mcs.anl.gov> Message-ID: <1191343229.1363.3.camel@blabla.mcs.anl.gov> On Tue, 2007-10-02 at 16:31 +0000, Ben Clifford wrote: > > On Tue, 2 Oct 2007, Mihael Hategan wrote: > > > I'm thinking the cause here is that you were implicitly starting > > old-version/bin/swift (probably because it was in the path). > > yes. > > that's caught people in the past building from source trees and will > continue to do so. > > Removing the entire content of dist/ seems a better thing to do. 
> > > So I think I'll leave it as it is, since it has worked reliably this way > > for a while > > it hasn't - it happens every time the version number changes. You're generalizing particular cases. The $dist directory is not always "dist/name-version". It can be "/usr" or "/usr/local". In which case removing the parent directory is not a very good idea. On the other hand removing the "dist" directory may work. > From bugzilla-daemon at mcs.anl.gov Tue Oct 2 12:19:29 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Tue, 2 Oct 2007 12:19:29 -0500 (CDT) Subject: [Swift-devel] [Bug 101] New: failure in site initialisation appears to cause job to fail rather than be retried elsewhere. Message-ID: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=101 Summary: failure in site initialisation appears to cause job to fail rather than be retried elsewhere. Product: Swift Version: unspecified Platform: Macintosh OS/Version: Mac OS Status: NEW Severity: normal Priority: P2 Component: General AssignedTo: hategan at mcs.anl.gov ReportedBy: benc at hawaga.org.uk CC: benc at hawaga.org.uk In this specific case, I attempt to run on a site which does not have a CA corresponding to my certificate (as well as several other sites which do work). The workflow consists of 30 invocations of the goodmonkey script. The site that fails has site name: gridlab1.ci.uchicago.edu The log file for this run is here: http://www.ci.uchicago.edu/~benc/badmonkey-20071002-1645-bc36p9q5.log -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. You reported the bug, or are watching the reporter. 
From bugzilla-daemon at mcs.anl.gov Tue Oct 2 12:58:23 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Tue, 2 Oct 2007 12:58:23 -0500 (CDT) Subject: [Swift-devel] [Bug 101] failure in site initialisation appears to cause job to fail rather than be retried elsewhere. In-Reply-To: Message-ID: <20071002175823.54D93164BC@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=101 ------- Comment #1 from hategan at mcs.anl.gov 2007-10-02 12:58 ------- I think it keeps sending jobs there despite it being a bad site because it is the only free site. I'm thinking for all scores lower than 1 there should be a cool-down period in which no jobs should be attempted on that site. This is the same as the problem we've seen with Falkon and bad nodes. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. You reported the bug, or are watching the reporter. From wilde at mcs.anl.gov Tue Oct 2 13:07:14 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 02 Oct 2007 13:07:14 -0500 Subject: [Swift-devel] Use case and examples needed to avoid large directories In-Reply-To: References: <46FD8EF9.8000802@mcs.anl.gov> <1191022702.9962.2.camel@blabla.mcs.anl.gov> <46FE87F0.9070800@mcs.anl.gov> <1191086531.13048.3.camel@blabla.mcs.anl.gov> <46FFFC85.6070004@mcs.anl.gov> Message-ID: <470288D2.2030807@mcs.anl.gov> Good point. To do this with any kind of mean (image or arithmetic) you'd need to add a weight as an argument (I think). I don't know if Andrew's app is associative or can be made so, or can be re-coded to take an arbitrary # of input files. Andrew, can you comment on that?
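The need for a weight can be checked numerically. A short illustrative Python sketch (plain arithmetic, not softmean itself) showing that folding pairwise means is neither associative nor equal to the overall mean, and that carrying a count alongside each partial mean repairs it:

```python
# Illustrative check (plain Python, not softmean): folding pairwise
# means is not associative, and neither fold equals the true mean.
def pairwise_mean(a, b):
    return (a + b) / 2

left = pairwise_mean(pairwise_mean(1, 2), 3)   # ((1+2)/2 + 3)/2
right = pairwise_mean(1, pairwise_mean(2, 3))  # (1 + (2+3)/2)/2
overall = (1 + 2 + 3) / 3

# Carrying a weight (the count) with each partial mean fixes this:
def weighted_merge(m1, n1, m2, n2):
    n = n1 + n2
    return ((m1 * n1 + m2 * n2) / n, n)

m12, n12 = weighted_merge(1, 1, 2, 1)        # mean of {1,2}, count 2
m123, n123 = weighted_merge(m12, n12, 3, 1)  # mean of {1,2,3}, count 3
```

So a tree of pairwise combines only gives the right answer if each partial result carries its count along as a weight, which is exactly the extra argument suggested above.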
(this was sitting in my to-send box since Sun - may have already been answered) - Mike On 9/30/07 2:48 PM, Ben Clifford wrote: > > On Sun, 30 Sep 2007, Michael Wilde wrote: > >> examples, where softmean needs to average a large set of inputs, and its >> behavior is, we think, associative. > > mean as an operator is not usually associative - I'd think softmean isn't > either if its doing per-pixel arithmetic means, which is what I've always > assumed it was doing. > > (e.g. (1 mean 2) mean 3) = (( (1+2)/2) + 3)/2 = 2.25 > 1 mean (2 mean 3) = ( 1 + ( (2+3)/2) ) /2 = 1.75 > > and neither are the actual mean of (1,2,3) which is 2. > From andrewj at uchicago.edu Tue Oct 2 13:47:41 2007 From: andrewj at uchicago.edu (andrewj at uchicago.edu) Date: Tue, 2 Oct 2007 13:47:41 -0500 (CDT) Subject: [Swift-devel] Use case and examples needed to avoid large directories Message-ID: <20071002134741.AUK51908@m4500-00.uchicago.edu> Thanks for the input. First off, before I attempt to pose a further question I have, who is in the CI currently or regularly? I think it would be more efficient for me to meet up and possibly discuss some of these issues face to face as I try to type out these commands. Let me know which days are best to stop in. Ok, the question is: will this system allow me to simply lay out the input files in a certain directory structure (possibly multiple levels deep) and reference this data in one or two lines of code in Swift, without explicitly spelling out locations for the data? For instance, say I have 10,000 input files total, split into two types, malignant and benign, 5,000 each. But these guys are distributed in a multi-layer directory tree of some sort, like:

~/swifthome/malignant/a01/b01/2/*
~/swifthome/benign/c01/h5/9/*

What would I write to define an array of inputs (malignant and benign separately) for all the files included in these subdirectories?
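What is being asked for here — every file under a nested tree, gathered into one array per class — amounts to a recursive directory walk. An illustrative Python sketch of that traversal (not an actual Swift mapper; `collect` is a made-up helper name):

```python
import os

# Illustrative sketch (not a Swift mapper): recursively gather every
# file under a root directory -- the file set a recursive variant of
# the filesystem array mapper would need to map.
def collect(root):
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)
```

Run against ~/swifthome/malignant and ~/swifthome/benign separately, a traversal like this yields the two flat lists of paths that the two arrays would be mapped over.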
Thanks, Andrew ---- Original message ---- >Date: Sat, 29 Sep 2007 08:42:59 +0000 (GMT) >From: Ben Clifford >Subject: Re: [Swift-devel] Use case and examples needed to avoid large directories >To: Michael Wilde >Cc: swift-devel , Andrew Jamieson > >In brief, if you have a file in your submit directory called: > > (swifthome)/a/b.txt > >and map it somehow: > > file f <"a/b.txt">; > >then on the submit side, in the run directory it will map to individual >job execution directory like this: > > (individualjobdir)/a/b.txt > >This should be the case for inputs, outputs and intermediates. > >If you want to map everything in the 'a' directory, you can map something >like: > > file f[] ; > >to get an array containing every file in a. > > >Its meant to be the case that you can also store stuff elsewhere (on the >same machine or on any gridftp server) rather than directly in the submit >directory but I had some troubles when playing with it so I have a bug >open and am waiting on that before further documentation of that bit. > >-- From benc at hawaga.org.uk Tue Oct 2 14:00:10 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 2 Oct 2007 19:00:10 +0000 (GMT) Subject: [Swift-devel] Use case and examples needed to avoid large directories In-Reply-To: <20071002134741.AUK51908@m4500-00.uchicago.edu> References: <20071002134741.AUK51908@m4500-00.uchicago.edu> Message-ID: On Tue, 2 Oct 2007, andrewj at uchicago.edu wrote: > For instance, say I have 10,000 input files total. And it is > split into two types, malignant and benign, 5000 each. But > these guys are distributed in a mulit-layer directory tree of > some sort. > > like ~/swifthome/malignant/a01/b01/2/* > ~/swifthome/benign/c01/h5/9/* > > what would I write to define an array of inputs (malignant and > benign separately) for all the files included in these > subdirectories? You couldn't do it with the mappers that we have now. You would have to write/modify an existing one. 
There is one, the filesystem array mapper, which maps directories out of a directory - you would probably need something like that with a bit of magic to make it recursive. I can help you with that. When you have it, you would declare an array like this: tumourfile benign[] ; -- From hategan at mcs.anl.gov Tue Oct 2 14:07:49 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 02 Oct 2007 14:07:49 -0500 Subject: [Swift-devel] Use case and examples needed to avoid large directories In-Reply-To: <20071002134741.AUK51908@m4500-00.uchicago.edu> References: <20071002134741.AUK51908@m4500-00.uchicago.edu> Message-ID: <1191352069.8607.1.camel@blabla.mcs.anl.gov> On Tue, 2007-10-02 at 13:47 -0500, andrewj at uchicago.edu wrote: > First off, before I attempt to pose a further question I > have, who is in the CI currently or regularly? I usually get here around noon. My office has moved to the far west end of the CI (near the AG node). > > Thanks, > Andrew > > > > ---- Original message ---- > >Date: Sat, 29 Sep 2007 08:42:59 +0000 (GMT) > >From: Ben Clifford > >Subject: Re: [Swift-devel] Use case and examples needed to > avoid large directories > >To: Michael Wilde > >Cc: swift-devel , Andrew > Jamieson > > > >In brief, if you have a file in your submit directory called: > > > > (swifthome)/a/b.txt > > > >and map it somehow: > > > > file f <"a/b.txt">; > > > >then on the submit side, in the run directory it will map to > individual > >job execution directory like this: > > > > (individualjobdir)/a/b.txt > > > >This should be the case for inputs, outputs and intermediates. > > > >If you want to map everything in the 'a' directory, you can > map something > >like: > > > > file f[] ; > > > >to get an array containing every file in a. 
> > > > >Its meant to be the case that you can also store stuff > elsewhere (on the > >same machine or on any gridftp server) rather than directly > in the submit > >directory but I had some troubles when playing with it so I > have a bug > >open and am waiting on that before further documentation of > that bit. > > > >-- > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From benc at hawaga.org.uk Tue Oct 2 16:45:58 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 2 Oct 2007 21:45:58 +0000 (GMT) Subject: [Swift-devel] swift 0.3rc2 In-Reply-To: References: Message-ID: release candidate 2 for Swift v0.3 is at: http://www.ci.uchicago.edu/~benc/vdsk-0.3-rc2.tar.gz md5sum is 00f184cd18ae4a025b15547efe3a5d63 As before, please test it. If there are no serious issues, it gets promoted to real release in 24h. -- From tiejing at gmail.com Tue Oct 2 20:03:37 2007 From: tiejing at gmail.com (Jing Tie) Date: Tue, 2 Oct 2007 20:03:37 -0500 Subject: [Swift-devel] swift 0.3rc2 In-Reply-To: References: Message-ID: Hi, I ran the SID application on Swift v0.3, and it succeeded. log: http://people.cs.uchicago.edu/~jtie/swift-logs/sid-wf1-20071002-1806-rtudhli3.log Jing On 10/2/07, Ben Clifford wrote: > > release candidate 2 for Swift v0.3 is at: > > http://www.ci.uchicago.edu/~benc/vdsk-0.3-rc2.tar.gz > > md5sum is 00f184cd18ae4a025b15547efe3a5d63 > > As before, please test it. If there are no serious issues, it gets > promoted to real release in 24h. > > -- > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From benc at hawaga.org.uk Wed Oct 3 08:59:56 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 3 Oct 2007 13:59:56 +0000 (GMT) Subject: [Swift-devel] working directory cleanup in the case of application exceptions.
Message-ID: vdl-int.k cleans up working directories on the remote site in the case of an application exception. for debugging purposes it might be nice for that to not happen. can I change that in the source? -- From benc at hawaga.org.uk Wed Oct 3 09:02:57 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 3 Oct 2007 14:02:57 +0000 (GMT) Subject: [Swift-devel] Re: working directory cleanup in the case of application exceptions. In-Reply-To: References: Message-ID: i lie, that happens in wrapper.sh. ignore this message. On Wed, 3 Oct 2007, Ben Clifford wrote: > > vdl-int.k cleans up working directories on the remote site in the case of > an application exception. > > for debugging purposes it might be nice for that to not happen. can I > change that in the source? > > From hategan at mcs.anl.gov Wed Oct 3 09:05:47 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 03 Oct 2007 09:05:47 -0500 Subject: [Swift-devel] working directory cleanup in the case of application exceptions. In-Reply-To: References: Message-ID: <1191420347.26140.2.camel@blabla.mcs.anl.gov> On Wed, 2007-10-03 at 13:59 +0000, Ben Clifford wrote: > vdl-int.k cleans up working directories on the remote site in the case of > an application exception. > > for debugging purposes it might be nice for that to not happen. can I > change that in the source? Makes sense. > From hategan at mcs.anl.gov Wed Oct 3 09:09:09 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 03 Oct 2007 09:09:09 -0500 Subject: [Swift-devel] Re: working directory cleanup in the case of application exceptions. In-Reply-To: References: Message-ID: <1191420549.26140.4.camel@blabla.mcs.anl.gov> On Wed, 2007-10-03 at 14:02 +0000, Ben Clifford wrote: > i lie, that happens in wrapper.sh. ignore this message. I think you might still be lying. I don't see it in wrapper.sh either. It's only removed on success. 
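The behaviour being pinned down here — the working directory removed only when the job succeeds, kept for debugging otherwise — can be sketched as follows (a hypothetical Python stand-in, not wrapper.sh itself; `run_and_cleanup` is an invented name for illustration):

```python
import shutil, subprocess, sys, tempfile

# Hypothetical stand-in (Python, not the real wrapper.sh): run a job
# in a fresh working directory and remove that directory only when
# the job exits successfully; on failure it is kept for debugging.
def run_and_cleanup(cmd):
    workdir = tempfile.mkdtemp(prefix="jobdir-")
    status = subprocess.call(cmd, cwd=workdir)
    if status == 0:
        shutil.rmtree(workdir)  # success: clean up
    return status, workdir      # on failure, workdir still exists
```

The design point is that failure output is only useful while the directory survives, so cleanup is gated on the exit status rather than done unconditionally.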
> > On Wed, 3 Oct 2007, Ben Clifford wrote: > > > > > vdl-int.k cleans up working directories on the remote site in the case of > > an application exception. > > > > for debugging purposes it might be nice for that to not happen. can I > > change that in the source? > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From benc at hawaga.org.uk Wed Oct 3 09:15:12 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 3 Oct 2007 14:15:12 +0000 (GMT) Subject: [Swift-devel] Re: working directory cleanup in the case of application exceptions. In-Reply-To: <1191420549.26140.4.camel@blabla.mcs.anl.gov> References: <1191420549.26140.4.camel@blabla.mcs.anl.gov> Message-ID: On Wed, 3 Oct 2007, Mihael Hategan wrote: > On Wed, 2007-10-03 at 14:02 +0000, Ben Clifford wrote: > > i lie, that happens in wrapper.sh. ignore this message. > > I think you might still be lying. I don't see it in wrapper.sh either. > It's only removed on success. yeah - it's actually the case in what I was looking at that the job succeeds from wrapper.sh's perspective. see the note I'm going to send in about 5 minutes' time for this particular case. -- From hategan at mcs.anl.gov Wed Oct 3 11:13:39 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 03 Oct 2007 11:13:39 -0500 Subject: [Swift-devel] cleanup failure Message-ID: <1191428019.31240.3.camel@blabla.mcs.anl.gov> I committed some code to cog to fix the cleanup failure warnings that Sarah was seeing. r1773. From andrewj at uchicago.edu Wed Oct 3 15:36:40 2007 From: andrewj at uchicago.edu (andrewj at uchicago.edu) Date: Wed, 3 Oct 2007 15:36:40 -0500 (CDT) Subject: [Swift-devel] passing types and variables in swift Message-ID: <20071003153640.AUM34049@m4500-00.uchicago.edu> Hello, I am wondering about the nature of passing values into the variable types in Swift.
In particular, I want to conduct multiple parameter sweeps by altering the numerical values of certain parameters I am passing as input into the "apps" I end up running on the Grid. Now, before I write out all the swift code the wrong way assuming things that may not be true, I thought I would get clarification. My question has to do with using the convenient mapper system and putting these numerical and/or string parameters into place in Swift using something like the csv mapper. Say I have a type:

type RGIparams {
    int a;
    int b;
    boolean c;
}

and an atomic function:

(Contour c) Segment (ROI roi, RGIparams rgiP) {
    app {
        SegNEx @roi.image @roi.center rgiP.a rgiP.b rgiP.c;
    }
}

Could I do something like this:

RGIparams RGIinput[] ;

where rgi_runs would contain, say,

1,2,true
3,4,false
18,20,true

and so on, so that the parameters are passed into the workflow? I am trying to configure this workflow to be as modular and general as possible for ease of experimental alteration. Thanks, Andrew From benc at hawaga.org.uk Wed Oct 3 16:58:36 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 3 Oct 2007 21:58:36 +0000 (GMT) Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: <20071003153640.AUM34049@m4500-00.uchicago.edu> References: <20071003153640.AUM34049@m4500-00.uchicago.edu> Message-ID: On Wed, 3 Oct 2007, andrewj at uchicago.edu wrote: > could I do something like this: > > RGIparams RGIinput[] ; > > where rgi_runs would contain say > > 1,2,true > 3,4,false > 18,20,true > > and so on..so that the parameters are passed into the workflow? no. mappers only map filenames (or URIs now); they don't map variable values. -- From benc at hawaga.org.uk Thu Oct 4 05:35:23 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 4 Oct 2007 10:35:23 +0000 (GMT) Subject: [Swift-devel] working directory cleanup in the case of application exceptions.
In-Reply-To: <1191420347.26140.2.camel@blabla.mcs.anl.gov> References: <1191420347.26140.2.camel@blabla.mcs.anl.gov> Message-ID: On Wed, 3 Oct 2007, Mihael Hategan wrote: > On Wed, 2007-10-03 at 13:59 +0000, Ben Clifford wrote: > > vdl-int.k cleans up working directories on the remote site in the case of > > an application exception. > > > > for debugging purposes it might be nice for that to not happen. can I > > change that in the source? > > Makes sense. I realised I don't know what the vdl:cacheUnlockFiles stuff does. Can I remove the whole vdl:cacheUnlockFiles(... cleanupFiles(...)) bit or does the cacheUnlockFiles call need to stay? -- From benc at hawaga.org.uk Thu Oct 4 06:19:49 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 4 Oct 2007 11:19:49 +0000 (GMT) Subject: [Swift-devel] Swift 0.3 released Message-ID: Swift 0.3 is now available for download from the swift downloads page, http://www.ci.uchicago.edu/swift/downloads/ Swift 0.3 is a development release intended to distribute new functionality and fixes that have gone into our codebase since v0.2 was released in July. There are many changes, detailed in the CHANGES.txt file inside the release. Some significant changes: * mappers can now map files in remote locations in addition to the local disk (for example, accessed through gridftp or dcache) * PBS direct job submission, for running Swift directly on a PBS cluster avoiding GRAM. * Changes to logging formats to make mechanical analysis easier. * sequential iteration language construct (for example, for running simulations with each step being a separate job) Swift homepage: http://www.ci.uchicago.edu/swift/ Please download and enjoy, and do not hesitate to post mail to either the swift-devel or swift-user list with questions, comments and complaints. 
-- From andrewj at uchicago.edu Thu Oct 4 09:16:07 2007 From: andrewj at uchicago.edu (andrewj at uchicago.edu) Date: Thu, 4 Oct 2007 09:16:07 -0500 (CDT) Subject: [Swift-devel] Re: passing types and variables in swift Message-ID: <20071004091607.AUN26035@m4500-00.uchicago.edu> All right, thanks. I am just trying to figure out the best way to set up parameter sweeps. Ideally, I would like to have some file or directory of files containing experiments to be run on the Grid environment, and have Swift pick them up and automatically set up the appropriate WF based on the parameter information in the files. > >On Wed, 3 Oct 2007, andrewj at uchicago.edu wrote: > >> could I do something like this: >> >> RGIparams RGIinput[] ; >> >> where rgi_runs would contain say >> >> 1,2,true >> 3,4,false >> 18,20,true >> >> and so on..so that the parameters are passed into the workflow? > >no. > >mappers only map filenames (or URIs now); they don't map variable values. > >-- From hategan at mcs.anl.gov Thu Oct 4 09:22:36 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 04 Oct 2007 09:22:36 -0500 Subject: [Swift-devel] working directory cleanup in the case of application exceptions. In-Reply-To: References: <1191420347.26140.2.camel@blabla.mcs.anl.gov> Message-ID: <1191507756.30940.1.camel@blabla.mcs.anl.gov> On Thu, 2007-10-04 at 10:35 +0000, Ben Clifford wrote: > > On Wed, 3 Oct 2007, Mihael Hategan wrote: > > > On Wed, 2007-10-03 at 13:59 +0000, Ben Clifford wrote: > > > vdl-int.k cleans up working directories on the remote site in the case of > > > an application exception. > > > > > > for debugging purposes it might be nice for that to not happen. can I > > > change that in the source? > > > > Makes sense. > > I realised I don't know what the vdl:cacheUnlockFiles stuff does. It tells the cache that a file is not used any more and can be deleted if so needed. > > Can I remove the whole > > vdl:cacheUnlockFiles(... cleanupFiles(...)) No!
> > bit or does the cacheUnlockFiles call need to stay? > From wilde at mcs.anl.gov Thu Oct 4 09:24:51 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 04 Oct 2007 09:24:51 -0500 Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: <20071004091607.AUN26035@m4500-00.uchicago.edu> References: <20071004091607.AUN26035@m4500-00.uchicago.edu> Message-ID: <4704F7B3.4010004@mcs.anl.gov> I'll try to push forward with this as Ben is focusing on SC tutorials today. Andrew, let's try to make a simple model of your workflow that we can actually run in local mode, using mock apps. I need to go backwards, but basically this is a parameter sweep, where the parameter sets have a nested loop structure. I'm not sure if I'm asking this right (as I'm in a meeting and can't read right now) but do you want to express the parameters in files (because the lists are long) or right in the source code of the workflow? Does that question make sense? - Mike On 10/4/07 9:16 AM, andrewj at uchicago.edu wrote: > All right, thanks. > > I am just trying to figure out the best way to setup parameter > sweeps. Ideally, I would like to have some file of directory > of files containing experiments to be run on the Grid > environment and Swift picks them up and automatically sets up > the appropriate WF based on the parameter information in the > files. > > >> On Wed, 3 Oct 2007, andrewj at uchicago.edu wrote: >> >>> could I do something like this: >>> >>> RGIparams RGIinput[] ; >>> >>> where rgi_runs would contain say >>> >>> 1,2,true >>> 3,4,false >>> 18,20,true >>> >>> and so on..so that the parameters are passed into the workflow? >> no. >> >> mappers only map filenames (or URIs now); they don't map > variable values.
>> -- > > From andrewj at uchicago.edu Thu Oct 4 09:48:28 2007 From: andrewj at uchicago.edu (andrewj at uchicago.edu) Date: Thu, 4 Oct 2007 09:48:28 -0500 (CDT) Subject: [Swift-devel] Re: passing types and variables in swift Message-ID: <20071004094828.AUN31015@m4500-00.uchicago.edu> Mike, Your question does make sense. I am looking for the most general and configurable model, one which would allow us to avoid messing with the swift script as much as possible, but rather simply be able to dump a set of parameter setting files somewhere and have swift pick them up and arrange things based on this. And yes, this is due in part to the concern of very long lists. But maybe I am overthinking things for right now. Maybe it will work out just as efficiently to have multiple swift WF codes with the different param settings. I think however for the time being I will create the workflow to just use the parameter settings in the Swift code itself. Thanks, Andrew ---- Original message ---- >Date: Thu, 04 Oct 2007 09:24:51 -0500 >From: Michael Wilde >Subject: Re: passing types and variables in swift >To: andrewj at uchicago.edu >Cc: Ben Clifford , Mihael Hategan , swift-devel at ci.uchicago.edu > >I'll try to push forward with this as Ben is focusing on SC tutorials today. > >Andrew, let's try to make a simple model of your workflow that we can >actually run in local mode, using mock apps. > >I need to go backwards, but basically this is a parameter sweep, where >the parameter sets have a nested loop structure. I'm not sure if I'm >asking this right (as I'm in a meeting and can't read right now) but do >you want to express the parameters in files (because the lists are long) >or right in the source code of the workflow? Does that question make sense? > >- Mike > > >On 10/4/07 9:16 AM, andrewj at uchicago.edu wrote: >> All right, thanks. >> >> I am just trying to figure out the best way to setup parameter >> sweeps.
Ideally, I would like to have some file of directory >> of files containing experiments to be run on the Grid >> environment and Swift picks them up and automatically sets up >> the appropriate WF based on the parameter information in the >> files. >> >> >>> On Wed, 3 Oct 2007, andrewj at uchicago.edu wrote: >>> >>>> could I do something like this: >>>> >>>> RGIparams RGIinput[] ; >>>> >>>> where rgi_runs would contain say >>>> >>>> 1,2,true >>>> 3,4,false >>>> 18,20,true >>>> >>>> and so on..so that the parameters are passed into the workflow? >>> no. >>> >>> mappers only map filenames (or URIs now); they don't map >> variable values. >>> -- >> >> From benc at hawaga.org.uk Thu Oct 4 10:13:45 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 4 Oct 2007 15:13:45 +0000 (GMT) Subject: [Swift-devel] RE: [Swift-user] Swift 0.3 released In-Reply-To: References: Message-ID: What is most likely happening is that Swift is retrieving that URL and placing it into your working directory in a file called 'bardir/index.rdf'; then passing the path 'bardir/index.rdf' to your app (or rather to echo), essentially saying "I was told to give you a data file and I have done that and placed it at bardir/index.rdf". If you want the URL itself, that's a problem; if you only want to retrieve the content of the URL, that should perhaps be less of a problem. The CSV mapper (in fact any mapper) can only give the location of data files to the swift runtime engine, not pass string literals - though bugs have previously made that work. Reading literals from CSV is a feature a bunch of people want, though, so I think we should look at implementing something. On Thu, 4 Oct 2007, Allen, M. David wrote: > I just dropped 0.3 into place, and tried to re-run a test workflow that > I have. > > For some reason, 0.3 seems to be causing problems with CSV files. > Attached is a simple swiftscript, the CSV file it requires as input, > and the output files generated as part of the run. 
> > The CSV file is simple: > > name,feedURL > The Foo Blog,http://foo.com/somedir/index.rdf > The Bar Blog,http://bar.com/bardir/index.rdf > (...) > > All the swiftscript does is take the feedURL, and then print each one > out into a different file. As far as I can tell though, splitting this > file by a comma delimiter is causing problems. Instead of getting the > full URL as an argument to "echo", it is only passing > "somedir/index.rdf", "bardir/index.rdf", and so on. > > One other thing -- my echo statement looks like this: > > echo @b.feedURL stdout=@filename(headlines); > > >From what I've read, the '@' in front of b.feedURL shouldn't be > required. If this isn't present though, the output for every line of > the csv is the string literal 'true' instead of even a portion of the > URL. > > Any idea what might be going on here? > > Apologies if I'm making some kind of silly mistake, but I can't find > any reference to this issue in the CHANGES file, and this has worked > just fine with previous releases. > > Thanks, > > -- David > > > -----Original Message----- > From: swift-user-bounces at ci.uchicago.edu > [mailto:swift-user-bounces at ci.uchicago.edu] On Behalf Of Ben Clifford > Sent: Thursday, October 04, 2007 7:20 AM > To: swift-user at ci.uchicago.edu; swift-devel at ci.uchicago.edu > Subject: [Swift-user] Swift 0.3 released > > > Swift 0.3 is now available for download from the swift downloads page, > http://www.ci.uchicago.edu/swift/downloads/ > > Swift 0.3 is a development release intended to distribute new > functionality and fixes that have gone into our codebase since v0.2 was > > released in July. > > There are many changes, detailed in the CHANGES.txt file inside the > release. > > Some significant changes: > > * mappers can now map files in remote locations in addition to the > local > disk (for example, accessed through gridftp or dcache) > > * PBS direct job submission, for running Swift directly on a PBS > cluster avoiding GRAM. 
> > * Changes to logging formats to make mechanical analysis easier. > > * sequential iteration language construct (for example, for running > simulations with each step being a separate job) > > Swift homepage: http://www.ci.uchicago.edu/swift/ > > Please download and enjoy, and do not hesitate to post mail to either > the > swift-devel or swift-user list with questions, comments and complaints. > > From benc at hawaga.org.uk Thu Oct 4 10:15:20 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 4 Oct 2007 15:15:20 +0000 (GMT) Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: <20071004094828.AUN31015@m4500-00.uchicago.edu> References: <20071004094828.AUN31015@m4500-00.uchicago.edu> Message-ID: you're the second person today to talk about reading in values from csv files. maybe we should implement this. -- From wilde at mcs.anl.gov Thu Oct 4 10:36:11 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 04 Oct 2007 10:36:11 -0500 Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: References: <20071004094828.AUN31015@m4500-00.uchicago.edu> Message-ID: <4705086B.7030904@mcs.anl.gov> a hacky way to do this is first a loop to read the csv file into an array, then do a foreach over the array. i suspect we have the constructs (with @extractint) to do this; not sure if we can read the CSV into an array-of-structs that we can then foreach() over. But even if not, this will work using parallel arrays, and probably not be too unpleasing. lets try it. On 10/4/07 10:15 AM, Ben Clifford wrote: > you're the second person today to talk about reading in values from csv > files. maybe we should implement this. 
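The pattern under discussion — reading parameter values (not filenames) out of a CSV and then iterating over the rows — looks like this in outline (illustrative Python; `read_params` is a hypothetical stand-in for the proposed @extractcsv idea, not existing Swift functionality):

```python
import csv, io

# Hypothetical stand-in for the @extractcsv idea discussed above:
# parse each CSV row into a dict of named parameter values, yielding
# a list that a foreach-style loop can iterate over. 'read_params'
# is an invented name, not existing Swift functionality.
def read_params(text, fieldnames):
    return [dict(row) for row in
            csv.DictReader(io.StringIO(text), fieldnames=fieldnames)]

runs = read_params("1,2,true\n3,4,false\n18,20,true\n",
                   ["a", "b", "c"])
```

Each row becomes one record of parameter values, which is the array-of-structs shape the foreach() suggestion above needs.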
From bugzilla-daemon at mcs.anl.gov Thu Oct 4 10:45:45 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Thu, 4 Oct 2007 10:45:45 -0500 (CDT) Subject: [Swift-devel] [Bug 102] New: workflow failes due to file cache duplicates Message-ID: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=102 Summary: workflow failes due to file cache duplicates Product: App-MolDyn Version: unspecified Platform: Macintosh OS/Version: Linux Status: NEW Severity: major Priority: P2 Component: FreeEnergyForMolecules AssignedTo: nefedova at mcs.anl.gov ReportedBy: nefedova at mcs.anl.gov CC: swift-devel at ci.uchicago.edu Workflow fails with this error: 3. Application "chrm_long" failed (The cache already contains UC-64:MolDyn-50-loops-20071003-1743-ed7f45xa/shared/solv_repu_0.3_0.4_a1_m003_done.) The application produced all the correct files but something interrupts the stage out process - no further retries would succeed due to a cache having the copies of the files that need to be staged out. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From benc at hawaga.org.uk Thu Oct 4 10:46:02 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 4 Oct 2007 15:46:02 +0000 (GMT) Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: <4705086B.7030904@mcs.anl.gov> References: <20071004094828.AUN31015@m4500-00.uchicago.edu> <4705086B.7030904@mcs.anl.gov> Message-ID: i'd be inclined to grab the csv reading code from the csv mapper and see about making a @extractcsv function. It keeps code using it a little bit more like it would if mappers could support it, which is probably the way things will end up one day. On Thu, 4 Oct 2007, Michael Wilde wrote: > a hacky way to do this is first a loop to read the csv file into an array, > then do a foreach over the array. 
> > i suspect we have the constructs (with @extractint) to do this; not sure if we > can read the CSV into an array-of-structs that we can then foreach() over. But > even if not, this will work using parallel arrays, and probably not be too > unpleasing. > > lets try it. > > On 10/4/07 10:15 AM, Ben Clifford wrote: > > you're the second person today to talk about reading in values from csv > > files. maybe we should implement this. > > From dmallen at mitre.org Thu Oct 4 10:40:59 2007 From: dmallen at mitre.org (Allen, M. David) Date: Thu, 4 Oct 2007 11:40:59 -0400 Subject: [Swift-devel] RE: [Swift-user] Swift 0.3 released In-Reply-To: References: Message-ID: I'm not sure I understand... Why would the CSV mapper actually attempt to fetch a URL that's just a string inside of a CSV file? There's no code that instructs it to do that. Additionally, I can't find any of those directories that would be created if this is what it was doing. If the CSV file contained other string literals with arbitrary data are you saying that swift would attempt to treat those literals as if they were files or URLs and access them too? Why would wanting just the URL as a text literal be a problem? If I used a CSV mapper to refer to only filenames, still it would be passing those filenames as string literals, right? If the CSV mapper can only give the location of files, and not string literals, how does this program even echo a portion of the URL? If only given the location of the file, wouldn't you expect it to echo the path to blogs.csv? In fact, now I have to admit I'm completely confused as to what the point of the CSV mapper is. Was there a bugfix that went in that broke previous behavior? Earlier versions worked just like I expected them to, and 0.3 now shows this problem. It seems intuitive to me that the csv mapper would allow you to access string literals inside of a CSV file. What was the bug that was fixed? 
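The reported symptom is consistent with Ben's account: if the URL in the CSV is treated as a data location, only its path component survives staging, so the application sees a relative path rather than the URL string. An illustrative Python sketch (`urlparse` here merely stands in for Swift's actual handling, which is an assumption on my part):

```python
from urllib.parse import urlparse

# Illustration of the behaviour described: when a URL from the CSV is
# treated as a data *location*, only its path component survives as
# the staged-in relative filename handed to the application. This is
# a sketch of the idea, not Swift's real staging code.
def staged_relative_path(url):
    return urlparse(url).path.lstrip("/")

example = staged_relative_path("http://bar.com/bardir/index.rdf")
```

That is exactly how "http://bar.com/bardir/index.rdf" in the CSV would surface to echo as "bardir/index.rdf".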
-- David -----Original Message----- From: Ben Clifford [mailto:benc at hawaga.org.uk] Sent: Thursday, October 04, 2007 11:14 AM To: Allen, M. David Cc: swift-user at ci.uchicago.edu; swift-devel at ci.uchicago.edu Subject: RE: [Swift-user] Swift 0.3 released What is most likely happening is that Swift is retrieving that URL and placing it into your working directory in a file called 'bardir/index.rdf'; then passing the path 'bardir/index.rdf' to your app (or rather to echo), essentially saying "I was told to give you a data file and I have done that and placed it at bardir/index.rdf". If you want the URL itself, that's a problem; if you only want to retrieve the content of the URL, that should perhaps be less of a problem. The CSV mapper (in fact any mapper) can only give the location of data files to the swift runtime engine, not pass string literals - though bugs have previously made that work. Reading literals from CSV is a feature a bunch of people want, though, so I think we should look at implementing something. On Thu, 4 Oct 2007, Allen, M. David wrote: > I just dropped 0.3 into place, and tried to re-run a test workflow that > I have. > > For some reason, 0.3 seems to be causing problems with CSV files. > Attached is a simple swiftscript, the CSV file it requires as input, > and the output files generated as part of the run. > > The CSV file is simple: > > name,feedURL > The Foo Blog,http://foo.com/somedir/index.rdf > The Bar Blog,http://bar.com/bardir/index.rdf > (...) > > All the swiftscript does is take the feedURL, and then print each one > out into a different file. As far as I can tell though, splitting this > file by a comma delimiter is causing problems. Instead of getting the > full URL as an argument to "echo", it is only passing > "somedir/index.rdf", "bardir/index.rdf", and so on. 
> > One other thing -- my echo statement looks like this: > > echo @b.feedURL stdout=@filename(headlines); > > >From what I've read, the '@' in front of b.feedURL shouldn't be > required. If this isn't present though, the output for every line of > the csv is the string literal 'true' instead of even a portion of the > URL. > > Any idea what might be going on here? > > Apologies if I'm making some kind of silly mistake, but I can't find > any reference to this issue in the CHANGES file, and this has worked > just fine with previous releases. > > Thanks, > > -- David > > > -----Original Message----- > From: swift-user-bounces at ci.uchicago.edu > [mailto:swift-user-bounces at ci.uchicago.edu] On Behalf Of Ben Clifford > Sent: Thursday, October 04, 2007 7:20 AM > To: swift-user at ci.uchicago.edu; swift-devel at ci.uchicago.edu > Subject: [Swift-user] Swift 0.3 released > > > Swift 0.3 is now available for download from the swift downloads page, > http://www.ci.uchicago.edu/swift/downloads/ > > Swift 0.3 is a development release intended to distribute new > functionality and fixes that have gone into our codebase since v0.2 was > > released in July. > > There are many changes, detailed in the CHANGES.txt file inside the > release. > > Some significant changes: > > * mappers can now map files in remote locations in addition to the > local > disk (for example, accessed through gridftp or dcache) > > * PBS direct job submission, for running Swift directly on a PBS > cluster avoiding GRAM. > > * Changes to logging formats to make mechanical analysis easier. > > * sequential iteration language construct (for example, for running > simulations with each step being a separate job) > > Swift homepage: http://www.ci.uchicago.edu/swift/ > > Please download and enjoy, and do not hesitate to post mail to either > the > swift-devel or swift-user list with questions, comments and complaints. 
> > From benc at hawaga.org.uk Thu Oct 4 11:01:59 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 4 Oct 2007 16:01:59 +0000 (GMT) Subject: [Swift-devel] RE: [Swift-user] Swift 0.3 released In-Reply-To: References: Message-ID: On Thu, 4 Oct 2007, Allen, M. David wrote: > I'm not sure I understand... > > Why would the CSV mapper actually attempt to fetch a URL that's just a > string inside of a CSV file? There's no code that instructs it to do > that. That's what mappers do. They tell swift where data files are, so that it can then move those data files around and put them where your job will run. The 'code that instructs it to do that' is the mapper syntax with - that syntax says "this mapper will tell you where to find the filenames for this variable". The CSV mapper is a way to say "I want to have an array of structures of files in SwiftScript, and here are the filenames that you should use for the elements of that array of structures." > Additionally, I can't find any of those directories that would be > created if this is what it was doing. You shouldn't see them after the workflow has run as they should be tidied up. Try running ls -l @url instead of echo @url and see what you get. > If the CSV file contained other string literals with arbitrary data are > you saying that swift would attempt to treat those literals as if they > were files or URLs and access them too? yes, it should be. I'm slightly surprised you don't get problems with the first column. > Why would wanting just the URL as a text literal be a problem? If I > used a CSV mapper to refer to only filenames, still it would be passing > those filenames as string literals, right? > If the CSV mapper can only give the location of files, and not string > literals, how does this program even echo a portion of the URL? If > only given the location of the file, wouldn't you expect it to echo the > path to blogs.csv? 
The filenames that get passed are filenames for the application program to use on the execute side of things. Swift is insistent that it will place the input files in the correct place and then tell you where it has put them. When you pass in http://my.url.com/foo/bar as an input file, Swift will place that file in your job's run directory as foo/bar and then tell your app where it has put it. (conversely, it might decide to put it in a file called XXX; in which case your application would receive XXX as an input string; and your app should find that the local file XXX contained the content retrieved from http://my.url.com) > In fact, now I have to admit I'm completely confused as to what the > point of the CSV mapper is. Not what you want it for. Anything to do with mappers is for instructing swift on which files your application wants moved around for it automatically. > Was there a bugfix that went in that broke previous behavior? Earlier > versions worked just like I expected them to, and 0.3 now shows this > problem. It seems intuitive to me that the csv mapper would allow you > to access string literals inside of a CSV file. What was the bug that > was fixed? It's an incorrect intuition, probably caused by the fact that we don't document mappers well enough / in the right way. The bug was that URLs didn't work properly for what they were intended. -- From dmallen at mitre.org Thu Oct 4 09:56:21 2007 From: dmallen at mitre.org (Allen, M. David) Date: Thu, 4 Oct 2007 10:56:21 -0400 Subject: [Swift-devel] RE: [Swift-user] Swift 0.3 released In-Reply-To: References: Message-ID: I just dropped 0.3 into place, and tried to re-run a test workflow that I have. For some reason, 0.3 seems to be causing problems with CSV files. Attached is a simple swiftscript, the CSV file it requires as input, and the output files generated as part of the run. 
The CSV file is simple: name,feedURL The Foo Blog,http://foo.com/somedir/index.rdf The Bar Blog,http://bar.com/bardir/index.rdf (...) All the swiftscript does is take the feedURL, and then print each one out into a different file. As far as I can tell though, splitting this file by a comma delimiter is causing problems. Instead of getting the full URL as an argument to "echo", it is only passing "somedir/index.rdf", "bardir/index.rdf", and so on. One other thing -- my echo statement looks like this: echo @b.feedURL stdout=@filename(headlines); >From what I've read, the '@' in front of b.feedURL shouldn't be required. If this isn't present though, the output for every line of the csv is the string literal 'true' instead of even a portion of the URL. Any idea what might be going on here? Apologies if I'm making some kind of silly mistake, but I can't find any reference to this issue in the CHANGES file, and this has worked just fine with previous releases. Thanks, -- David -----Original Message----- From: swift-user-bounces at ci.uchicago.edu [mailto:swift-user-bounces at ci.uchicago.edu] On Behalf Of Ben Clifford Sent: Thursday, October 04, 2007 7:20 AM To: swift-user at ci.uchicago.edu; swift-devel at ci.uchicago.edu Subject: [Swift-user] Swift 0.3 released Swift 0.3 is now available for download from the swift downloads page, http://www.ci.uchicago.edu/swift/downloads/ Swift 0.3 is a development release intended to distribute new functionality and fixes that have gone into our codebase since v0.2 was released in July. There are many changes, detailed in the CHANGES.txt file inside the release. Some significant changes: * mappers can now map files in remote locations in addition to the local disk (for example, accessed through gridftp or dcache) * PBS direct job submission, for running Swift directly on a PBS cluster avoiding GRAM. * Changes to logging formats to make mechanical analysis easier. 
* sequential iteration language construct (for example, for running simulations with each step being a separate job) Swift homepage: http://www.ci.uchicago.edu/swift/ Please download and enjoy, and do not hesitate to post mail to either the swift-devel or swift-user list with questions, comments and complaints. -- _______________________________________________ Swift-user mailing list Swift-user at ci.uchicago.edu http://mail.ci.uchicago.edu/mailman/listinfo/swift-user -------------- next part -------------- A non-text attachment was scrubbed... Name: possible-bug.dtm Type: application/octet-stream Size: 476 bytes Desc: possible-bug.dtm URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: blogs.csv Type: application/octet-stream Size: 820 bytes Desc: blogs.csv URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: output.zip Type: application/x-zip-compressed Size: 2294 bytes Desc: output.zip URL: From hategan at mcs.anl.gov Thu Oct 4 11:38:19 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 04 Oct 2007 11:38:19 -0500 Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: References: <20071004094828.AUN31015@m4500-00.uchicago.edu> <4705086B.7030904@mcs.anl.gov> Message-ID: <1191515899.3641.0.camel@blabla.mcs.anl.gov> These hacks will bite us in the future. On Thu, 2007-10-04 at 15:46 +0000, Ben Clifford wrote: > i'd be inclined to grab the csv reading code from the csv mapper and see > about making a @extractcsv function. It keeps code using it a little bit > more like it would if mappers could support it, which is probably the way > things will end up one day. > > On Thu, 4 Oct 2007, Michael Wilde wrote: > > > a hacky way to do this is first a loop to read the csv file into an array, > > then do a foreach over the array. 
> > > > i suspect we have the constructs (with @extractint) to do this; not sure if we > > > can read the CSV into an array-of-structs that we can then foreach() over. But > > > even if not, this will work using parallel arrays, and probably not be too > > > unpleasing. > > > > > > lets try it. > > > > > > On 10/4/07 10:15 AM, Ben Clifford wrote: > > > > you're the second person today to talk about reading in values from csv > > > > files. maybe we should implement this. > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From benc at hawaga.org.uk Thu Oct 4 11:41:37 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 4 Oct 2007 16:41:37 +0000 (GMT) Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: <1191515899.3641.0.camel@blabla.mcs.anl.gov> References: <20071004094828.AUN31015@m4500-00.uchicago.edu> <4705086B.7030904@mcs.anl.gov> <1191515899.3641.0.camel@blabla.mcs.anl.gov> Message-ID: yes. that's why I'd rather they be hacks that look more like what they should end up like. in order to get people doing things with this, there are going to need to be hacks (or prototype implementations, if you prefer) - months of abstract arm waving about how mappers work without concrete use has not resulted in a mapper API that does what some people want it to do; and months more of it won't, either. On Thu, 4 Oct 2007, Mihael Hategan wrote: > These hacks will bite us in the future. > > On Thu, 2007-10-04 at 15:46 +0000, Ben Clifford wrote: > > i'd be inclined to grab the csv reading code from the csv mapper and see > > about making a @extractcsv function. It keeps code using it a little bit > > more like it would if mappers could support it, which is probably the way > > things will end up one day. 
> > > > On Thu, 4 Oct 2007, Michael Wilde wrote: > > > > > a hacky way to do this is first a loop to read the csv file into an array, > > > then do a foreach over the array. > > > > > > i suspect we have the constructs (with @extractint) to do this; not sure if we > > > can read the CSV into an array-of-structs that we can then foreach() over. But > > > even if not, this will work using parallel arrays, and probably not be too > > > unpleasing. > > > > > > lets try it. > > > > > > On 10/4/07 10:15 AM, Ben Clifford wrote: > > > > you're the second person today to talk about reading in values from csv > > > > files. maybe we should implement this. > > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > From bugzilla-daemon at mcs.anl.gov Thu Oct 4 11:43:26 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Thu, 4 Oct 2007 11:43:26 -0500 (CDT) Subject: [Swift-devel] [Bug 103] New: Add swift parameters values to log file for diagnosis Message-ID: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=103 Summary: Add swift parameters values to log file for diagnosis Product: Swift Version: unspecified Platform: Macintosh OS/Version: All Status: NEW Severity: enhancement Priority: P3 Component: SwiftScript language AssignedTo: benc at hawaga.org.uk ReportedBy: wilde at mcs.anl.gov Ben wrote: those are sufficiently small we could perhaps dump them all to the log file always. On Thu, 4 Oct 2007, Michael Wilde wrote: > > it would be nice to save swift command line and the properties file and > > associate them with the log, so that a summary of the swift environment can be > > printed by the log tools. > > > > On 10/4/07 11:19 AM, Ioan Raicu wrote: >> > > OK, the link works! >> > > It looks like it used a few more nodes this time (~30 nodes concurrently), >> > > with a runtime of ~25K sec. 
I don't think I saw the answer to a question of >> > > mine from yesterday, was there any throttling on submission rate for GRAM, >> > > or were jobs being set as fast as possible? >> > > >> > > Ioan >> > > >> > > Ben Clifford wrote: -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. From wilde at mcs.anl.gov Thu Oct 4 11:50:07 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 04 Oct 2007 11:50:07 -0500 Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: <1191515899.3641.0.camel@blabla.mcs.anl.gov> References: <20071004094828.AUN31015@m4500-00.uchicago.edu> <4705086B.7030904@mcs.anl.gov> <1191515899.3641.0.camel@blabla.mcs.anl.gov> Message-ID: <470519BF.7040004@mcs.anl.gov> the extractcsv function sounds a bit like the value-setting mechanism i proposed when you started on extractint: set many values from one file. is one way to do this to 'eval' a file containing swift expressions? Or, to #include them? This would look like a keyword-based value file myparms.rho[0]=0.1 mparms.theta[0]=0.2 myparms.rho[1]=0.4 mparms.theta[0]=0.8 ... myparms.rho[N-1]=0.etc mparms.theta[etc]=0.etc even if not as clean as extractcsv, it might be both fast and useful for many other purposes. - mike On 10/4/07 11:38 AM, Mihael Hategan wrote: > These hacks will bite us in the future. > > On Thu, 2007-10-04 at 15:46 +0000, Ben Clifford wrote: >> i'd be inclined to grab the csv reading code from the csv mapper and see >> about making a @extractcsv function. It keeps code using it a little bit >> more like it would if mappers could support it, which is probably the way >> things will end up one day. >> >> On Thu, 4 Oct 2007, Michael Wilde wrote: >> >>> a hacky way to do this is first a loop to read the csv file into an array, >>> then do a foreach over the array. 
>>> >>> i suspect we have the constructs (with @extractint) to do this; not sure if we >>> can read the CSV into an array-of-structs that we can then foreach() over. But >>> even if not, this will work using parallel arrays, and probably not be too >>> unpleasing. >>> >>> lets try it. >>> >>> On 10/4/07 10:15 AM, Ben Clifford wrote: >>>> you're the second person today to talk about reading in values from csv >>>> files. maybe we should implement this. >>> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > > From hategan at mcs.anl.gov Thu Oct 4 11:50:45 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 04 Oct 2007 11:50:45 -0500 Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: References: <20071004094828.AUN31015@m4500-00.uchicago.edu> <4705086B.7030904@mcs.anl.gov> <1191515899.3641.0.camel@blabla.mcs.anl.gov> Message-ID: <1191516645.4186.0.camel@blabla.mcs.anl.gov> Or we could say some feature is not available yet. On Thu, 2007-10-04 at 16:41 +0000, Ben Clifford wrote: > yes. > > that's why I'd rather they be hacks that look more like what they should > end up like. > > in order to get people doing things with this, there are going to need to > be hacks (or prototype implementations, if you prefer) - months of > abstract arm waving about how mappers work without concrete use has not > resulting in a mapper API that does what some people want it to do; and > months more of it won't, either. > > On Thu, 4 Oct 2007, Mihael Hategan wrote: > > > These hacks will bite us in the future. > > > > On Thu, 2007-10-04 at 15:46 +0000, Ben Clifford wrote: > > > i'd be inclined to grab the csv reading code from the csv mapper and see > > > about making a @extractcsv function. 
It keeps code using it a little bit > > > more like it would if mappers could support it, which is probably the way > > > things will end up one day. > > > > > > On Thu, 4 Oct 2007, Michael Wilde wrote: > > > > > > > a hacky way to do this is first a loop to read the csv file into an array, > > > > then do a foreach over the array. > > > > > > > > i suspect we have the constructs (with @extractint) to do this; not sure if we > > > > can read the CSV into an array-of-structs that we can then foreach() over. But > > > > even if not, this will work using parallel arrays, and probably not be too > > > > unpleasing. > > > > > > > > lets try it. > > > > > > > > On 10/4/07 10:15 AM, Ben Clifford wrote: > > > > > you're the second person today to talk about reading in values from csv > > > > > files. maybe we should implement this. > > > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > From benc at hawaga.org.uk Thu Oct 4 11:52:23 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 4 Oct 2007 16:52:23 +0000 (GMT) Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: <1191516645.4186.0.camel@blabla.mcs.anl.gov> References: <20071004094828.AUN31015@m4500-00.uchicago.edu> <4705086B.7030904@mcs.anl.gov> <1191515899.3641.0.camel@blabla.mcs.anl.gov> <1191516645.4186.0.camel@blabla.mcs.anl.gov> Message-ID: On Thu, 4 Oct 2007, Mihael Hategan wrote: > Or we could say some feature is not available yet. My point was that the 'yet' will not end without some actual real application use of it. 
-- From benc at hawaga.org.uk Thu Oct 4 11:53:27 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 4 Oct 2007 16:53:27 +0000 (GMT) Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: <470519BF.7040004@mcs.anl.gov> References: <20071004094828.AUN31015@m4500-00.uchicago.edu> <4705086B.7030904@mcs.anl.gov> <1191515899.3641.0.camel@blabla.mcs.anl.gov> <470519BF.7040004@mcs.anl.gov> Message-ID: On Thu, 4 Oct 2007, Michael Wilde wrote: > is one way to do this to 'eval' a file containing swift expressions? > Or, to #include them? > This would look like a keyword-based value file > > myparms.rho[0]=0.1 mparms.theta[0]=0.2 > myparms.rho[1]=0.4 mparms.theta[0]=0.8 > ... > myparms.rho[N-1]=0.etc mparms.theta[etc]=0.etc > > even if not as clean as extractcsv, it might be both fast and useful for many > other purposes. you can do that already - generate a swiftscript source file from some other source. nika did something like this (and still does, but to a lesser extent) for moldyn. -- From wilde at mcs.anl.gov Thu Oct 4 11:57:50 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 04 Oct 2007 11:57:50 -0500 Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: <470519BF.7040004@mcs.anl.gov> References: <20071004094828.AUN31015@m4500-00.uchicago.edu> <4705086B.7030904@mcs.anl.gov> <1191515899.3641.0.camel@blabla.mcs.anl.gov> <470519BF.7040004@mcs.anl.gov> Message-ID: <47051B8E.4080008@mcs.anl.gov> I should add: While we discuss a direction here, I think that Andrew could do what he needs to do by just coding a script with statements like below. As I recall Tibi and others have been doing this in many previous cases. To make it manageable, he (or we) could do a simple ad-hoc #include mechanism to separate his core code from such list-setting code. Andrew, can you take this from here using that approach? 
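The keyword-based value file quoted above could be consumed by something as simple as the following. This is a hypothetical Java sketch of the parsing side only; the myparms/mparms names come from Mike's example, and nothing here is actual Swift code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: flatten a keyword-based value file of the form
//   myparms.rho[0]=0.1 mparms.theta[0]=0.2
// into a map from variable path to value. Illustration only, not Swift code.
class ValueFileSketch {
    static Map<String, Double> parse(String text) {
        Map<String, Double> values = new LinkedHashMap<String, Double>();
        // each whitespace-separated token is one path=value assignment
        for (String token : text.trim().split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq < 0) {
                continue; // skip tokens with no '=', such as the "..." filler
            }
            values.put(token.substring(0, eq),
                       Double.valueOf(token.substring(eq + 1)));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parse("myparms.rho[0]=0.1 mparms.theta[0]=0.2"));
    }
}
```

A real version would then need to push each parsed path=value pair into the corresponding swift variable, which is exactly the part the mapper discussion in this thread is circling around.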
Mihael: I think that addresses your "just say no" comment: do experiments that defer language changes until we see how they would be used and how best to couch them in the language. - Mike On 10/4/07 11:50 AM, Michael Wilde wrote: > the extractcsv function sounds a bit like the value-setting mechanism i > proposed when you started on extractint: set many values from one file. > > is one way to do this to 'eval' a file containing swift expressions? > Or, to #include them? > This would look like a keyword-based value file > > myparms.rho[0]=0.1 mparms.theta[0]=0.2 > myparms.rho[1]=0.4 mparms.theta[0]=0.8 > ... > myparms.rho[N-1]=0.etc mparms.theta[etc]=0.etc > > even if not as clean as extractcsv, it might be both fast and useful for > many other purposes. > > - mike > > > > On 10/4/07 11:38 AM, Mihael Hategan wrote: >> These hacks will bite us in the future. >> >> On Thu, 2007-10-04 at 15:46 +0000, Ben Clifford wrote: >>> i'd be inclined to grab the csv reading code from the csv mapper and >>> see about making a @extractcsv function. It keeps code using it a >>> little bit more like it would if mappers could support it, which is >>> probably the way things will end up one day. >>> >>> On Thu, 4 Oct 2007, Michael Wilde wrote: >>> >>>> a hacky way to do this is first a loop to read the csv file into an >>>> array, >>>> then do a foreach over the array. >>>> >>>> i suspect we have the constructs (with @extractint) to do this; not >>>> sure if we >>>> can read the CSV into an array-of-structs that we can then foreach() >>>> over. But >>>> even if not, this will work using parallel arrays, and probably not >>>> be too >>>> unpleasing. >>>> >>>> lets try it. >>>> >>>> On 10/4/07 10:15 AM, Ben Clifford wrote: >>>>> you're the second person today to talk about reading in values from >>>>> csv >>>>> files. maybe we should implement this. 
>>>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> >> >> > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From bugzilla-daemon at mcs.anl.gov Thu Oct 4 14:47:21 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Thu, 4 Oct 2007 14:47:21 -0500 (CDT) Subject: [Swift-devel] [Bug 104] New: Add cert request tools to swift/bin Message-ID: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=104 Summary: Add cert request tools to swift/bin Product: Swift Version: unspecified Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P5 Component: General AssignedTo: benc at hawaga.org.uk ReportedBy: wilde at mcs.anl.gov CC: benc at hawaga.org.uk This would help close the loop for new users for getting DOEGrids certs: we could document how to do it much more readily. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. From yongzh at cs.uchicago.edu Thu Oct 4 22:21:12 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Thu, 4 Oct 2007 22:21:12 -0500 (CDT) Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: <20071004091607.AUN26035@m4500-00.uchicago.edu> References: <20071004091607.AUN26035@m4500-00.uchicago.edu> Message-ID: Mappers should ideally be able to map primitive types such as string and int, as well as file names. The CSVMapper can read all the values in the file, it is just that it needs to interpret the values according to their types. So if we convey the type info 'RGIparams' to the csv mapper, it should have no problem reading the values. 
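The idea of interpreting CSV fields according to declared member types could look roughly like this. This is a hypothetical sketch: the type names and coercion rules are invented for illustration, and this is not the actual CSVMapper code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of type-aware CSV reading: interpret each field of a
// row according to a declared member type. Not the actual CSVMapper code.
class TypedCsvSketch {
    // Interpret one CSV row according to a list of declared member types.
    static List<Object> interpretRow(String row, String[] types) {
        String[] fields = row.split(",");
        List<Object> values = new ArrayList<Object>();
        for (int i = 0; i < fields.length; i++) {
            String f = fields[i].trim();
            if ("int".equals(types[i])) {
                values.add(Integer.valueOf(f));
            } else if ("boolean".equals(types[i])) {
                values.add(Boolean.valueOf(f));
            } else {
                values.add(f); // default: keep the field as a string literal
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // rows in the shape of the rgi_runs example quoted below: 1,2,true
        String[] types = {"int", "int", "boolean"};
        System.out.println(interpretRow("1,2,true", types));
        System.out.println(interpretRow("18,20,true", types));
    }
}
```

Conveying the declared type to the mapper, as suggested above, would let this kind of coercion happen inside the mapper instead of in user code.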
I actually added getType and setType to the mapper interface, that did not get into the release, but I think it is what it should have. Yong. On Thu, 4 Oct 2007 andrewj at uchicago.edu wrote: > All right, thanks. > > I am just trying to figure out the best way to setup parameter > sweeps. Ideally, I would like to have some file of directory > of files containing experiments to be run on the Grid > environment and Swift picks them up and automatically sets up > the appropriate WF based on the parameter information in the > files. > > > > > >On Wed, 3 Oct 2007, andrewj at uchicago.edu wrote: > > > >> could I do something like this: > >> > >> RGIparams RGIinput[] ; > >> > >> where rgi_runs would contain say > >> > >> 1,2,true > >> 3,4,false > >> 18,20,true > >> > >> and so on..so that the parameters are passed into the workflow? > > > >no. > > > >mappers only map filenames (or URIs now); they don't map > variable values. > > > >-- > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From wilde at mcs.anl.gov Thu Oct 4 22:39:16 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 04 Oct 2007 22:39:16 -0500 Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: References: <20071004091607.AUN26035@m4500-00.uchicago.edu> Message-ID: <4705B1E4.8090403@mcs.anl.gov> thanks for the info, Yong. while we're discussing mappers, could you or Ben clarify: - can a mapper be only used on a single variable declaration? ie you can not map a dataset to a variable within a struct? this would be useful eg to create a struct that has both file and scalar valued members, to set up a parameter sweep where each file needs specific parameters. - are mappers processed before the program (or procedure) starts executing? so you cant use a mapper like an assignment statement to set or reset a variable? 
- mike On 10/4/07 10:21 PM, Yong Zhao wrote: > Mappers should ideally be able to map primitive types such as string and > int, as well as file names. The CSVMapper can read all the values in the > file, it is just that it needs to interprete the values according to their > types. So If we convey the type info 'RGIparams' to the csv mapper, it > should have no problem reading the values. > > I actually added getType and setType to the mapper interface, that did > not get into the release, but I think it is what it should have. > > Yong. > > On Thu, 4 Oct 2007 andrewj at uchicago.edu wrote: > >> All right, thanks. >> >> I am just trying to figure out the best way to setup parameter >> sweeps. Ideally, I would like to have some file of directory >> of files containing experiments to be run on the Grid >> environment and Swift picks them up and automatically sets up >> the appropriate WF based on the parameter information in the >> files. >> >> >>> On Wed, 3 Oct 2007, andrewj at uchicago.edu wrote: >>> >>>> could I do something like this: >>>> >>>> RGIparams RGIinput[] ; >>>> >>>> where rgi_runs would contain say >>>> >>>> 1,2,true >>>> 3,4,false >>>> 18,20,true >>>> >>>> and so on..so that the parameters are passed into the workflow? >>> no. >>> >>> mappers only map filenames (or URIs now); they don't map >> variable values. 
>>> -- >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From yongzh at cs.uchicago.edu Thu Oct 4 22:53:45 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Thu, 4 Oct 2007 22:53:45 -0500 (CDT) Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: <4705B1E4.8090403@mcs.anl.gov> References: <20071004091607.AUN26035@m4500-00.uchicago.edu> <4705B1E4.8090403@mcs.anl.gov> Message-ID: Mike > while we're discussing mappers, could you or Ben clarify: > > - can a mapper be only used on a single variable declaration? ie you can > not map a dataset to a variable within a struct? this would be useful > eg to create a struct that has both file and scalar valued members, to > set up a parameter sweep where each file needs specific parameters. This would be the nested mapper scenario; in order to make the scripts cleaner, I was thinking about a mapping descriptor (which could be an XML file to describe each layer of the mapping), but this is kind of far reaching. > > - are mappers processed before the program (or procedure) starts > executing? so you cant use a mapper like an assignment statement to set > or reset a variable? > Mappers are evaluated before procedure execution; the only exception is when a mapper depends on some intermediate data, and it will wait for that data to be available (as in the montage workflow). However, mapping is different from assignment (in assignment, it is more like calling a mapping function such as @extractint). A variable currently can not be reset or reassigned, unless it is an iteration variable (it gets re-assigned implicitly). Yong. 
> - mike > > On 10/4/07 10:21 PM, Yong Zhao wrote: > > Mappers should ideally be able to map primitive types such as string and > > int, as well as file names. The CSVMapper can read all the values in the > > file, it is just that it needs to interprete the values according to their > > types. So If we convey the type info 'RGIparams' to the csv mapper, it > > should have no problem reading the values. > > > > I actually added getType and setType to the mapper interface, that did > > not get into the release, but I think it is what it should have. > > > > Yong. > > > > On Thu, 4 Oct 2007 andrewj at uchicago.edu wrote: > > > >> All right, thanks. > >> > >> I am just trying to figure out the best way to setup parameter > >> sweeps. Ideally, I would like to have some file of directory > >> of files containing experiments to be run on the Grid > >> environment and Swift picks them up and automatically sets up > >> the appropriate WF based on the parameter information in the > >> files. > >> > >> > >>> On Wed, 3 Oct 2007, andrewj at uchicago.edu wrote: > >>> > >>>> could I do something like this: > >>>> > >>>> RGIparams RGIinput[] ; > >>>> > >>>> where rgi_runs would contain say > >>>> > >>>> 1,2,true > >>>> 3,4,false > >>>> 18,20,true > >>>> > >>>> and so on..so that the parameters are passed into the workflow? > >>> no. > >>> > >>> mappers only map filenames (or URIs now); they don't map > >> variable values. 
> >>> -- > >> _______________________________________________ > >> Swift-devel mailing list > >> Swift-devel at ci.uchicago.edu > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >> > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > From hategan at mcs.anl.gov Thu Oct 4 23:04:04 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 04 Oct 2007 23:04:04 -0500 Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: <4705B1E4.8090403@mcs.anl.gov> References: <20071004091607.AUN26035@m4500-00.uchicago.edu> <4705B1E4.8090403@mcs.anl.gov> Message-ID: <1191557044.28169.9.camel@blabla.mcs.anl.gov> On Thu, 2007-10-04 at 22:39 -0500, Michael Wilde wrote: > thanks for the info, Yong. > > while we're discussing mappers, could you or Ben clarify: > > - can a mapper be only used on a single variable declaration? ie you can > not map a dataset to a variable within a struct? this would be useful > eg to create a struct that has both file and scalar valued members, to > set up a parameter sweep where each file needs specific parameters. > > - are mappers processed before the program (or procedure) starts > executing? There is no concept of mappers being processed. I don't think there's much value in the discussion if we don't go over that one. Mappers are objects. Instances of a certain class of objects. One of their methods is map(swift_variable) which returns a file name. The other is existing() which returns a list of available things (e.g. all available indices in an array). We need to add get/setValue to that, and the respective implementations. We also need to add relevant code which, before invoking an application, can get a value from a variable using the above getValue and write it to a file. > so you cant use a mapper like an assignment statement to set > or reset a variable? 
> > - mike > > On 10/4/07 10:21 PM, Yong Zhao wrote: > > Mappers should ideally be able to map primitive types such as string and > > int, as well as file names. The CSVMapper can read all the values in the > > file, it is just that it needs to interprete the values according to their > > types. So If we convey the type info 'RGIparams' to the csv mapper, it > > should have no problem reading the values. > > > > I actually added getType and setType to the mapper interface, that did > > not get into the release, but I think it is what it should have. > > > > Yong. > > > > On Thu, 4 Oct 2007 andrewj at uchicago.edu wrote: > > > >> All right, thanks. > >> > >> I am just trying to figure out the best way to setup parameter > >> sweeps. Ideally, I would like to have some file of directory > >> of files containing experiments to be run on the Grid > >> environment and Swift picks them up and automatically sets up > >> the appropriate WF based on the parameter information in the > >> files. > >> > >> > >>> On Wed, 3 Oct 2007, andrewj at uchicago.edu wrote: > >>> > >>>> could I do something like this: > >>>> > >>>> RGIparams RGIinput[] ; > >>>> > >>>> where rgi_runs would contain say > >>>> > >>>> 1,2,true > >>>> 3,4,false > >>>> 18,20,true > >>>> > >>>> and so on..so that the parameters are passed into the workflow? > >>> no. > >>> > >>> mappers only map filenames (or URIs now); they don't map > >> variable values. 
> >>> -- > >> _______________________________________________ > >> Swift-devel mailing list > >> Swift-devel at ci.uchicago.edu > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >> > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From hategan at mcs.anl.gov Thu Oct 4 23:06:18 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 04 Oct 2007 23:06:18 -0500 Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: References: <20071004091607.AUN26035@m4500-00.uchicago.edu> <4705B1E4.8090403@mcs.anl.gov> Message-ID: <1191557178.28169.12.camel@blabla.mcs.anl.gov> On Thu, 2007-10-04 at 22:53 -0500, Yong Zhao wrote: > Mike > > > while we're discussing mappers, could you or Ben clarify: > > > > - can a mapper be only used on a single variable declaration? ie you can > > not map a dataset to a variable within a struct? this would be useful > > eg to create a struct that has both file and scalar valued members, to > > set up a parameter sweep where each file needs specific parameters. > > This would be the nested mapper scenario, in order to make the scripts > cleaner, I was think about a mapping descriptor (which could be an XML > file to describe each layer of the mapping), but this is kind of far > reaching. I think C pointers provide a very good model for this. We should perhaps use that as a reference. > > > > > - are mappers processed before the program (or procedure) starts > > executing? so you cant use a mapper like an assignment statement to set > > or reset a variable? 
> > > Mappers are evaluated before procedure execution, the only exception is > when a mapper depends on some intermediate data, and it will wait for that > data to be available (as in the montage workflow). However, mapping is > different from assignment (in assignment, it is more like calling a > mapping function such as @extractint). A variable currently can not be > reset or reassigned, unless it is an interation variable (it gets > re-assigned implicitly). > > Yong. > > > - mike > > > > On 10/4/07 10:21 PM, Yong Zhao wrote: > > > Mappers should ideally be able to map primitive types such as string and > > > int, as well as file names. The CSVMapper can read all the values in the > > > file, it is just that it needs to interprete the values according to their > > > types. So If we convey the type info 'RGIparams' to the csv mapper, it > > > should have no problem reading the values. > > > > > > I actually added getType and setType to the mapper interface, that did > > > not get into the release, but I think it is what it should have. > > > > > > Yong. > > > > > > On Thu, 4 Oct 2007 andrewj at uchicago.edu wrote: > > > > > >> All right, thanks. > > >> > > >> I am just trying to figure out the best way to setup parameter > > >> sweeps. Ideally, I would like to have some file of directory > > >> of files containing experiments to be run on the Grid > > >> environment and Swift picks them up and automatically sets up > > >> the appropriate WF based on the parameter information in the > > >> files. > > >> > > >> > > >>> On Wed, 3 Oct 2007, andrewj at uchicago.edu wrote: > > >>> > > >>>> could I do something like this: > > >>>> > > >>>> RGIparams RGIinput[] ; > > >>>> > > >>>> where rgi_runs would contain say > > >>>> > > >>>> 1,2,true > > >>>> 3,4,false > > >>>> 18,20,true > > >>>> > > >>>> and so on..so that the parameters are passed into the workflow? > > >>> no. > > >>> > > >>> mappers only map filenames (or URIs now); they don't map > > >> variable values. 
> > >>> -- > > >> _______________________________________________ > > >> Swift-devel mailing list > > >> Swift-devel at ci.uchicago.edu > > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > >> > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From wilde at mcs.anl.gov Thu Oct 4 23:14:51 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 04 Oct 2007 23:14:51 -0500 Subject: [Swift-devel] simple parameter test hangs Message-ID: <4705BA3B.7030603@mcs.anl.gov> -- the following program hangs (using 0.3): type messagefile; type params { int x; int y; } (messagefile t) pecho (params p[] ) { app { echo p[0].x p[0].y stdout=@filename(t); } } messagefile outfile <"hello2.txt">; params z[]; z[0].x = 111; z[0].y = 222; outfile = pecho(z); -- while this one works: type messagefile; type params { int x; int y; } (messagefile t) pecho (params p) { app { echo p.x p.y stdout=@filename(t); } } messagefile outfile <"hello2.txt">; params z; z.x = 111; z.y = 222; outfile = pecho(z); -- swift says: $ swift t2.swift Swift v0.3 r1319 (modified locally) RunID: 20071004-2312-mlnzad63 (and then it hangs) -- heres the log: $ more t2-20071004-2312-mlnzad63.log 2007-10-04 23:12:43,147-0500 INFO Loader t2.swift: source file is new. Recompiling. 
2007-10-04 23:12:45,431-0500 INFO Karajan Validation of XML intermediate file was successful 2007-10-04 23:12:48,744-0500 INFO unknown Using sites file: /home/wilde/swift/vdsk-0.3/bin/../etc/sites.xml 2007-10-04 23:12:48,745-0500 INFO unknown Using tc.data: /home/wilde/swift/vdsk-0.3/bin/../etc/tc.data 2007-10-04 23:12:50,769-0500 INFO unknown Swift v0.3 r1319 (modified locally) 2007-10-04 23:12:50,772-0500 INFO unknown RunID: 20071004-2312-mlnzad63 $ -- java doesnt seem to be burning cpu - looks more like a hang than a loop; doesnt seem to be waiting on stdin either, as far as i can tell. From hategan at mcs.anl.gov Thu Oct 4 23:27:33 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 04 Oct 2007 23:27:33 -0500 Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: <1191557178.28169.12.camel@blabla.mcs.anl.gov> References: <20071004091607.AUN26035@m4500-00.uchicago.edu> <4705B1E4.8090403@mcs.anl.gov> <1191557178.28169.12.camel@blabla.mcs.anl.gov> Message-ID: <1191558453.28814.15.camel@blabla.mcs.anl.gov> On Thu, 2007-10-04 at 23:06 -0500, Mihael Hategan wrote: > On Thu, 2007-10-04 at 22:53 -0500, Yong Zhao wrote: > > Mike > > > > > while we're discussing mappers, could you or Ben clarify: > > > > > > - can a mapper be only used on a single variable declaration? ie you can > > > not map a dataset to a variable within a struct? this would be useful > > > eg to create a struct that has both file and scalar valued members, to > > > set up a parameter sweep where each file needs specific parameters. > > > > This would be the nested mapper scenario, in order to make the scripts > > cleaner, I was think about a mapping descriptor (which could be an XML > > file to describe each layer of the mapping), but this is kind of far > > reaching. > > I think C pointers provide a very good model for this. We should perhaps > use that as a reference. So let me elaborate on that. All non-primitive types are pointers. 
For simplicity, we should have all types as pointers, but currently we distinguish a bit between primitive and non-primitive. A complex type variable in Swift is like a pointer to a struct of pointers. In the beginning, Swift needs to figure out the exact addresses of those pointers. That's what the existing() mapper method does. It goes through the struct and recursively initializes the pointer addresses based on some scheme which depends on the mapper implementation. This is like doing a recursive field = malloc(sizeof(field)). The map method of a mapper takes a struct path (a.b.c) and returns the address. Currently, nothing ever uses a value of a pointer. Grid applications are passed the address (file name). Getting/setting values amounts to pointer dereferencing. Both read and write. The addresses of these pointers are abstract. They don't represent anything in particular. Currently we assume that they are specific addresses (files), but the desire is to extend this to databases and the like. Let's say that we have a segmented address space and our pointers can be in different segments. The segment issue only applies to how we read/write values. There is one plain memory segment. Currently only primitive types can be here. There may be multiple memory-mapped I/O segments. Say one for files, one for databases, etc. We need to implement what actually happens when you read/write from/to an I/O segment. Basically mappers supporting read/write of values are the I/O hardware drivers. Whenever moving data between segments, we are either efficient and use DMA transfers, or we read to the plain memory segment and then write back to the other I/O segment. This is the analogy of, for example, getting some value from a database to a file so it can be passed to an application. The DMA case is when there is some entity that knows how to convert directly between the two. The non-DMA case is when we convert first to a common format and then back to another format. 
> > > > > > > > > - are mappers processed before the program (or procedure) starts > > > executing? so you cant use a mapper like an assignment statement to set > > > or reset a variable? > > > > > Mappers are evaluated before procedure execution, the only exception is > > when a mapper depends on some intermediate data, and it will wait for that > > data to be available (as in the montage workflow). However, mapping is > > different from assignment (in assignment, it is more like calling a > > mapping function such as @extractint). A variable currently can not be > > reset or reassigned, unless it is an interation variable (it gets > > re-assigned implicitly). > > > > Yong. > > > > > - mike > > > > > > On 10/4/07 10:21 PM, Yong Zhao wrote: > > > > Mappers should ideally be able to map primitive types such as string and > > > > int, as well as file names. The CSVMapper can read all the values in the > > > > file, it is just that it needs to interprete the values according to their > > > > types. So If we convey the type info 'RGIparams' to the csv mapper, it > > > > should have no problem reading the values. > > > > > > > > I actually added getType and setType to the mapper interface, that did > > > > not get into the release, but I think it is what it should have. > > > > > > > > Yong. > > > > > > > > On Thu, 4 Oct 2007 andrewj at uchicago.edu wrote: > > > > > > > >> All right, thanks. > > > >> > > > >> I am just trying to figure out the best way to setup parameter > > > >> sweeps. Ideally, I would like to have some file of directory > > > >> of files containing experiments to be run on the Grid > > > >> environment and Swift picks them up and automatically sets up > > > >> the appropriate WF based on the parameter information in the > > > >> files. 
> > > >> > > > >> > > > >>> On Wed, 3 Oct 2007, andrewj at uchicago.edu wrote: > > > >>> > > > >>>> could I do something like this: > > > >>>> > > > >>>> RGIparams RGIinput[] ; > > > >>>> > > > >>>> where rgi_runs would contain say > > > >>>> > > > >>>> 1,2,true > > > >>>> 3,4,false > > > >>>> 18,20,true > > > >>>> > > > >>>> and so on..so that the parameters are passed into the workflow? > > > >>> no. > > > >>> > > > >>> mappers only map filenames (or URIs now); they don't map > > > >> variable values. > > > >>> -- > > > >> _______________________________________________ > > > >> Swift-devel mailing list > > > >> Swift-devel at ci.uchicago.edu > > > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > >> > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From wilde at mcs.anl.gov Thu Oct 4 23:37:57 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 04 Oct 2007 23:37:57 -0500 Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: <1191558453.28814.15.camel@blabla.mcs.anl.gov> References: <20071004091607.AUN26035@m4500-00.uchicago.edu> <4705B1E4.8090403@mcs.anl.gov> <1191557178.28169.12.camel@blabla.mcs.anl.gov> <1191558453.28814.15.camel@blabla.mcs.anl.gov> Message-ID: <4705BFA5.4070306@mcs.anl.gov> Thanks for all the details - its helping to clarify the mystery of mappers. But I dont get all this on a quick read so I need to study it. 
Seems like we need to fill in a few more details of the end-user-visible parts of this model beyond what is in the current tutorial and UG, but less than what you explain here. Dont know how DMA got into the picture, but I need to stew on this for a while. - Mike On 10/4/07 11:27 PM, Mihael Hategan wrote: > On Thu, 2007-10-04 at 23:06 -0500, Mihael Hategan wrote: >> On Thu, 2007-10-04 at 22:53 -0500, Yong Zhao wrote: >>> Mike >>> >>>> while we're discussing mappers, could you or Ben clarify: >>>> >>>> - can a mapper be only used on a single variable declaration? ie you can >>>> not map a dataset to a variable within a struct? this would be useful >>>> eg to create a struct that has both file and scalar valued members, to >>>> set up a parameter sweep where each file needs specific parameters. >>> This would be the nested mapper scenario, in order to make the scripts >>> cleaner, I was think about a mapping descriptor (which could be an XML >>> file to describe each layer of the mapping), but this is kind of far >>> reaching. >> I think C pointers provide a very good model for this. We should perhaps >> use that as a reference. > > So let me elaborate on that. > > All non-primitive types are pointers. For simplicity, we should have all > types as pointers, but currently we distinguish a bit between primitive > and non-primitive. > > A complex type variable in Swift is like a pointer to a struct of > pointers. > > In the beginning, Swift needs to figure out the exact addresses of those > pointers. That's what the existing() mapper method does. It goes through > the struct and recursively initializes the pointer addresses based on > some scheme which depends on the mapper implementation. This is like > doing a recursive field = malloc(sizeof(field)). > > The map method of a mapper takes a struct path (a.b.c) and returns the > address. > > Currently, nothing ever uses a value of a pointer. Grid applications are > passed the address (file name). 
> > Getting/setting values amounts to pointer dereferencing. Both read and > write. > > The addresses of these pointers are abstract. They don't represent > anything in particular. Currently we assume that they are specific > addresses (files), but desire is to extends this to databases and > things. Let's say that we have a segmented address space and our > pointers can be in different segments. > > The segment issue only applies to how we read/write values. > > There is one plain memory segment. Currently only primitive types can be > here. > > There may be multiple memory-mapped I/O segments. Say one for files, one > for databases, etc. We need to implement what actually happens when you > read/write from/to a I/O segment. Basically mappers supporting > read/write of values are the I/O hardware drivers. > > Whenever moving data between segments, we are either efficient and use > DMA transfers, or we read to the plain memory segment and then write > back to the other I/O segment. This is the analogy of, for example, > getting some value from a database to a file so it can be passed to an > application. The DMA case is when there is some entity that knows how to > convert directly between the two. The non-DMA case is when we convert > first to a common format and then back to another format. > >>>> - are mappers processed before the program (or procedure) starts >>>> executing? so you cant use a mapper like an assignment statement to set >>>> or reset a variable? >>>> >>> Mappers are evaluated before procedure execution, the only exception is >>> when a mapper depends on some intermediate data, and it will wait for that >>> data to be available (as in the montage workflow). However, mapping is >>> different from assignment (in assignment, it is more like calling a >>> mapping function such as @extractint). A variable currently can not be >>> reset or reassigned, unless it is an interation variable (it gets >>> re-assigned implicitly). >>> >>> Yong. 
>>> >>>> - mike >>>> >>>> On 10/4/07 10:21 PM, Yong Zhao wrote: >>>>> Mappers should ideally be able to map primitive types such as string and >>>>> int, as well as file names. The CSVMapper can read all the values in the >>>>> file, it is just that it needs to interprete the values according to their >>>>> types. So If we convey the type info 'RGIparams' to the csv mapper, it >>>>> should have no problem reading the values. >>>>> >>>>> I actually added getType and setType to the mapper interface, that did >>>>> not get into the release, but I think it is what it should have. >>>>> >>>>> Yong. >>>>> >>>>> On Thu, 4 Oct 2007 andrewj at uchicago.edu wrote: >>>>> >>>>>> All right, thanks. >>>>>> >>>>>> I am just trying to figure out the best way to setup parameter >>>>>> sweeps. Ideally, I would like to have some file of directory >>>>>> of files containing experiments to be run on the Grid >>>>>> environment and Swift picks them up and automatically sets up >>>>>> the appropriate WF based on the parameter information in the >>>>>> files. >>>>>> >>>>>> >>>>>>> On Wed, 3 Oct 2007, andrewj at uchicago.edu wrote: >>>>>>> >>>>>>>> could I do something like this: >>>>>>>> >>>>>>>> RGIparams RGIinput[] ; >>>>>>>> >>>>>>>> where rgi_runs would contain say >>>>>>>> >>>>>>>> 1,2,true >>>>>>>> 3,4,false >>>>>>>> 18,20,true >>>>>>>> >>>>>>>> and so on..so that the parameters are passed into the workflow? >>>>>>> no. >>>>>>> >>>>>>> mappers only map filenames (or URIs now); they don't map >>>>>> variable values. 
>>>>>>> -- >>>>>> _______________________________________________ >>>>>> Swift-devel mailing list >>>>>> Swift-devel at ci.uchicago.edu >>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>> >>>>> _______________________________________________ >>>>> Swift-devel mailing list >>>>> Swift-devel at ci.uchicago.edu >>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>> >>>>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From hategan at mcs.anl.gov Thu Oct 4 23:45:07 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 04 Oct 2007 23:45:07 -0500 Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: <4705BFA5.4070306@mcs.anl.gov> References: <20071004091607.AUN26035@m4500-00.uchicago.edu> <4705B1E4.8090403@mcs.anl.gov> <1191557178.28169.12.camel@blabla.mcs.anl.gov> <1191558453.28814.15.camel@blabla.mcs.anl.gov> <4705BFA5.4070306@mcs.anl.gov> Message-ID: <1191559507.28814.18.camel@blabla.mcs.anl.gov> On Thu, 2007-10-04 at 23:37 -0500, Michael Wilde wrote: > Thanks for all the details - its helping to clarify the mystery of > mappers. But I dont get all this on a quick read so I need to study it. > Seems like we need to fill in a few more details of the end-user-visible > parts of this model beyond what is in the current tutorial and UG, but > less than what you explain here. > > Dont know how DMA got into the picture, It's an analogy. You can do funny things with it. Like transfer from HDD to graphics card directly and such. 
It's efficient. The same applies in the swift case. > but I need to stew on this for a > while. It won't get much more familiar than this. We must start with some common assumptions. > > - Mike > > > On 10/4/07 11:27 PM, Mihael Hategan wrote: > > On Thu, 2007-10-04 at 23:06 -0500, Mihael Hategan wrote: > >> On Thu, 2007-10-04 at 22:53 -0500, Yong Zhao wrote: > >>> Mike > >>> > >>>> while we're discussing mappers, could you or Ben clarify: > >>>> > >>>> - can a mapper be only used on a single variable declaration? ie you can > >>>> not map a dataset to a variable within a struct? this would be useful > >>>> eg to create a struct that has both file and scalar valued members, to > >>>> set up a parameter sweep where each file needs specific parameters. > >>> This would be the nested mapper scenario, in order to make the scripts > >>> cleaner, I was think about a mapping descriptor (which could be an XML > >>> file to describe each layer of the mapping), but this is kind of far > >>> reaching. > >> I think C pointers provide a very good model for this. We should perhaps > >> use that as a reference. > > > > So let me elaborate on that. > > > > All non-primitive types are pointers. For simplicity, we should have all > > types as pointers, but currently we distinguish a bit between primitive > > and non-primitive. > > > > A complex type variable in Swift is like a pointer to a struct of > > pointers. > > > > In the beginning, Swift needs to figure out the exact addresses of those > > pointers. That's what the existing() mapper method does. It goes through > > the struct and recursively initializes the pointer addresses based on > > some scheme which depends on the mapper implementation. This is like > > doing a recursive field = malloc(sizeof(field)). > > > > The map method of a mapper takes a struct path (a.b.c) and returns the > > address. > > > > Currently, nothing ever uses a value of a pointer. Grid applications are > > passed the address (file name). 
> > > > Getting/setting values amounts to pointer dereferencing. Both read and > > write. > > > > The addresses of these pointers are abstract. They don't represent > > anything in particular. Currently we assume that they are specific > > addresses (files), but desire is to extends this to databases and > > things. Let's say that we have a segmented address space and our > > pointers can be in different segments. > > > > The segment issue only applies to how we read/write values. > > > > There is one plain memory segment. Currently only primitive types can be > > here. > > > > There may be multiple memory-mapped I/O segments. Say one for files, one > > for databases, etc. We need to implement what actually happens when you > > read/write from/to a I/O segment. Basically mappers supporting > > read/write of values are the I/O hardware drivers. > > > > Whenever moving data between segments, we are either efficient and use > > DMA transfers, or we read to the plain memory segment and then write > > back to the other I/O segment. This is the analogy of, for example, > > getting some value from a database to a file so it can be passed to an > > application. The DMA case is when there is some entity that knows how to > > convert directly between the two. The non-DMA case is when we convert > > first to a common format and then back to another format. > > > >>>> - are mappers processed before the program (or procedure) starts > >>>> executing? so you cant use a mapper like an assignment statement to set > >>>> or reset a variable? > >>>> > >>> Mappers are evaluated before procedure execution, the only exception is > >>> when a mapper depends on some intermediate data, and it will wait for that > >>> data to be available (as in the montage workflow). However, mapping is > >>> different from assignment (in assignment, it is more like calling a > >>> mapping function such as @extractint). 
A variable currently can not be > >>> reset or reassigned, unless it is an interation variable (it gets > >>> re-assigned implicitly). > >>> > >>> Yong. > >>> > >>>> - mike > >>>> > >>>> On 10/4/07 10:21 PM, Yong Zhao wrote: > >>>>> Mappers should ideally be able to map primitive types such as string and > >>>>> int, as well as file names. The CSVMapper can read all the values in the > >>>>> file, it is just that it needs to interprete the values according to their > >>>>> types. So If we convey the type info 'RGIparams' to the csv mapper, it > >>>>> should have no problem reading the values. > >>>>> > >>>>> I actually added getType and setType to the mapper interface, that did > >>>>> not get into the release, but I think it is what it should have. > >>>>> > >>>>> Yong. > >>>>> > >>>>> On Thu, 4 Oct 2007 andrewj at uchicago.edu wrote: > >>>>> > >>>>>> All right, thanks. > >>>>>> > >>>>>> I am just trying to figure out the best way to setup parameter > >>>>>> sweeps. Ideally, I would like to have some file of directory > >>>>>> of files containing experiments to be run on the Grid > >>>>>> environment and Swift picks them up and automatically sets up > >>>>>> the appropriate WF based on the parameter information in the > >>>>>> files. > >>>>>> > >>>>>> > >>>>>>> On Wed, 3 Oct 2007, andrewj at uchicago.edu wrote: > >>>>>>> > >>>>>>>> could I do something like this: > >>>>>>>> > >>>>>>>> RGIparams RGIinput[] ; > >>>>>>>> > >>>>>>>> where rgi_runs would contain say > >>>>>>>> > >>>>>>>> 1,2,true > >>>>>>>> 3,4,false > >>>>>>>> 18,20,true > >>>>>>>> > >>>>>>>> and so on..so that the parameters are passed into the workflow? > >>>>>>> no. > >>>>>>> > >>>>>>> mappers only map filenames (or URIs now); they don't map > >>>>>> variable values. 
> >>>>>>> -- > >>>>>> _______________________________________________ > >>>>>> Swift-devel mailing list > >>>>>> Swift-devel at ci.uchicago.edu > >>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>>>>> > >>>>> _______________________________________________ > >>>>> Swift-devel mailing list > >>>>> Swift-devel at ci.uchicago.edu > >>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>>>> > >>>>> > >>> _______________________________________________ > >>> Swift-devel mailing list > >>> Swift-devel at ci.uchicago.edu > >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>> > >> _______________________________________________ > >> Swift-devel mailing list > >> Swift-devel at ci.uchicago.edu > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >> > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > From bugzilla-daemon at mcs.anl.gov Thu Oct 4 23:46:09 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Thu, 4 Oct 2007 23:46:09 -0500 (CDT) Subject: [Swift-devel] [Bug 105] New: syntax error vague when argument type is missing Message-ID: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=105 Summary: syntax error vague when argument type is missing Product: Swift Version: unspecified Platform: All OS/Version: All Status: NEW Severity: trivial Priority: P5 Component: SwiftScript language AssignedTo: benc at hawaga.org.uk ReportedBy: wilde at mcs.anl.gov In the code: (messagefile t) echo ( file p1, file p2, string s1, string s2 ) { app { echo @p1 @p2 s1 s2 stdout=@filename(t); } } if "file" is missing and arg list reads: (messagefile t) echo ( p1, p2, string s1, string s2 ) then the syntax error message points to the "(" after "echo" instead of saying "parameter type is missing". 
-- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. From wilde at mcs.anl.gov Thu Oct 4 23:57:02 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 04 Oct 2007 23:57:02 -0500 Subject: [Swift-devel] simple parameter test hangs In-Reply-To: <4705BA3B.7030603@mcs.anl.gov> References: <4705BA3B.7030603@mcs.anl.gov> Message-ID: <4705C41E.4020303@mcs.anl.gov> -- This program hangs in the same way: type messagefile; type file; type params { string x; string y; file f; }; (messagefile t) echo ( file p1, file p2, string s1, string s2 ) { app { echo @p1 @p2 s1 s2 stdout=@filename(t); } } (messagefile t) pecho (params p ) { app { echo p.x p.y @p.f stdout=@filename(t); } } string fn[] = ["f001","f002"]; file fv[] ; params p; p.x = "x1111"; p.y = "y2222"; p.f = fv[0]; messagefile outfile <"hello2.txt">; outfile = pecho(p); -- problem in passing a struct? I'll try to narrow it down. - mike On 10/4/07 11:14 PM, Michael Wilde wrote: > -- the following program hangs (using 0.3): > > type messagefile; > > type params { > int x; > int y; > } > > (messagefile t) pecho (params p[] ) { > app { > echo p[0].x p[0].y stdout=@filename(t); > } > } > > messagefile outfile <"hello2.txt">; > > params z[]; > z[0].x = 111; > z[0].y = 222; > > outfile = pecho(z); > > -- while this one works: > > type messagefile; > > type params { > int x; > int y; > } > > (messagefile t) pecho (params p) { > app { > echo p.x p.y stdout=@filename(t); > } > } > > messagefile outfile <"hello2.txt">; > > params z; > z.x = 111; > z.y = 222; > > outfile = pecho(z); > > -- swift says: > > $ swift t2.swift > Swift v0.3 r1319 (modified locally) > > RunID: 20071004-2312-mlnzad63 > > (and then it hangs) > > -- heres the log: > $ more t2-20071004-2312-mlnzad63.log > 2007-10-04 23:12:43,147-0500 INFO Loader t2.swift: source file is new. > Recompiling. 
> 2007-10-04 23:12:45,431-0500 INFO Karajan Validation of XML intermediate file was successful > 2007-10-04 23:12:48,744-0500 INFO unknown Using sites file: /home/wilde/swift/vdsk-0.3/bin/../etc/sites.xml > 2007-10-04 23:12:48,745-0500 INFO unknown Using tc.data: /home/wilde/swift/vdsk-0.3/bin/../etc/tc.data > 2007-10-04 23:12:50,769-0500 INFO unknown Swift v0.3 r1319 (modified locally) > > 2007-10-04 23:12:50,772-0500 INFO unknown RunID: 20071004-2312-mlnzad63 > $ > > -- > java doesn't seem to be burning CPU - looks more like a hang than a loop; doesn't seem to be waiting on stdin either, as far as I can tell. > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From wilde at mcs.anl.gov Fri Oct 5 00:03:13 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 05 Oct 2007 00:03:13 -0500 Subject: [Swift-devel] simple parameter test hangs In-Reply-To: <4705C41E.4020303@mcs.anl.gov> References: <4705BA3B.7030603@mcs.anl.gov> <4705C41E.4020303@mcs.anl.gov> Message-ID: <4705C591.7020709@mcs.anl.gov> -- this one doesn't hang. 
what i removed was the file var in the params struct: type messagefile; type file; type params { string x; string y; }; (messagefile t) pecho (params p ) { app { echo "foo" stdout=@filename(t); } } string fn[] = ["f001","f002"]; file fv[] ; params p; p.x = "x1111"; p.y = "y2222"; messagefile outfile <"hello2.txt">; outfile = pecho(p); On 10/4/07 11:57 PM, Michael Wilde wrote: > -- This program hangs in the same way: > type messagefile; > > type file; > > type params { > string x; > string y; > file f; > }; > > (messagefile t) echo ( file p1, file p2, string s1, string s2 ) { > app { > echo @p1 @p2 s1 s2 stdout=@filename(t); > } > } > > (messagefile t) pecho (params p ) { > app { > echo p.x p.y @p.f stdout=@filename(t); > } > } > > string fn[] = ["f001","f002"]; > file fv[] ; > > params p; > p.x = "x1111"; > p.y = "y2222"; > p.f = fv[0]; > > messagefile outfile <"hello2.txt">; > > outfile = pecho(p); > From wilde at mcs.anl.gov Fri Oct 5 00:22:11 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 05 Oct 2007 00:22:11 -0500 Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: <20071004094828.AUN31015@m4500-00.uchicago.edu> References: <20071004094828.AUN31015@m4500-00.uchicago.edu> Message-ID: <4705CA03.8020108@mcs.anl.gov> I'm certainly not an expert swift coder, but here's what I came up with for what I think is your use case, Andrew. The need as I understand is to associate a unique set of scalar parameters with each file (or set of files) that will be passed to an application. This seems to call for an array of structs, where the struct fields have a mixture of file-values and scalar-values. Then you do a foreach over the array of structs, and pass one element of the array (a struct) to each invocation of the app. A csv mapper that allows scalar fields like Yong suggests would do it nicely, but I dont yet fully grok what that does to the data model. 
While people debate this, you can with a tiny script (sh, awk, perl) turn a CSV file into the set of swift statements between the AUTOGEN comments below, and then include them into your swift script before you pass it to the swift command. - Mike type messagefile; type file; type params { string x; string y; file f; }; (messagefile t) pecho (params p ) { app { echo p.x p.y @p.f stdout=@filename(t); } } /* START AUTOGEN */ /* map the file params this way */ string fn[] = ["f001","f002"]; file fv[] ; /* then stuff file vals into a struct to associate with scalar vals */ params plist[]; plist[0].x = "x1111"; plist[0].y = "y2222"; plist[0].f = fv[0]; plist[1].x = "x1111"; plist[1].y = "y2222"; plist[1].f = fv[1]; /* etc */ /* END AUTOGEN */ messagefile outfile <"hello2.txt">; /* then you can iterate over the param sets and pass each set as a struct */ foreach pval in plist { outfile = pecho(pval); } On 10/4/07 9:48 AM, andrewj at uchicago.edu wrote: > Mike, > > Your question does make sense. I am looking for the most > general model and configurable model which would allow us to > not have to mess with the swift script as much as possible, > but rather simply be able to dump a set of parameter setting > files som,ewhere and swift pick them up and arrange things > based on this. > > And yes, this is due in part to the concern of very long > lists. But maybe I am over thinking things for right now. > Maybe it will work out just as efficiently to have mulitple > swift WF codes with the different param settings. > > I think however for the time being I will create the work flow > to just use the parameter settings in the Swift code itself. 
> > Thanks, > Andrew > > ---- Original message ---- >> Date: Thu, 04 Oct 2007 09:24:51 -0500 >> From: Michael Wilde >> Subject: Re: passing types and variables in swift >> To: andrewj at uchicago.edu >> Cc: Ben Clifford , Mihael Hategan , swift-devel at ci.uchicago.edu >> I'll try to push forward with this as Ben is focusing on SC tutorials today. >> Andrew, let's try to make a simple model of your workflow that we can >> actually run in local mode, using mock apps. >> >> I need to go backwards, but basically this is a parameter sweep, where >> the parameter sets have a nested loop structure. I'm not sure if I'm >> asking this right (as I'm in a meeting and can't read right now), but do >> you want to express the parameters in files (because the lists are long), >> or right in the source code of the workflow? Does that question make sense? >> - Mike >> >> >> On 10/4/07 9:16 AM, andrewj at uchicago.edu wrote: >>> All right, thanks. >>> >>> I am just trying to figure out the best way to set up parameter >>> sweeps. Ideally, I would like to have some file or directory >>> of files containing experiments to be run on the Grid >>> environment, and Swift picks them up and automatically sets up >>> the appropriate WF based on the parameter information in the >>> files. >>> >>> >>>> On Wed, 3 Oct 2007, andrewj at uchicago.edu wrote: >>>> >>>>> could I do something like this: >>>>> >>>>> RGIparams RGIinput[] ; >>>>> >>>>> where rgi_runs would contain say >>>>> >>>>> 1,2,true >>>>> 3,4,false >>>>> 18,20,true >>>>> >>>>> and so on... so that the parameters are passed into the workflow? >>>> no. >>>> >>>> mappers only map filenames (or URIs now); they don't map >>>> variable values. 
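Mike's suggestion above - a tiny script (sh, awk, perl) that turns a CSV file into the AUTOGEN block of swift statements - could equally be sketched in Python. The column layout, the `plist`/`fv` names, and the convention that the third CSV column is an index into the mapped array `fv[]` are assumptions carried over from his example, not anything Swift prescribes:

```python
import csv
import io

def csv_to_swift(csv_text, var="plist"):
    # Each CSV row holds x, y, and an index into the mapped file
    # array fv[]; emit the struct-array assignments that would sit
    # between the AUTOGEN comments in the swift source.
    stmts = []
    for i, (x, y, f) in enumerate(csv.reader(io.StringIO(csv_text))):
        stmts.append('%s[%d].x = "%s";' % (var, i, x))
        stmts.append('%s[%d].y = "%s";' % (var, i, y))
        stmts.append('%s[%d].f = fv[%d];' % (var, i, int(f)))
    return "\n".join(stmts)

print(csv_to_swift("x1111,y2222,0\nx1111,y2222,1\n"))
```

The emitted text would then be spliced between the AUTOGEN comments before handing the script to the swift command.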
>>>> -- >>> > > From wilde at mcs.anl.gov Fri Oct 5 00:27:15 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 05 Oct 2007 00:27:15 -0500 Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: References: <20071004094828.AUN31015@m4500-00.uchicago.edu> <4705086B.7030904@mcs.anl.gov> <1191515899.3641.0.camel@blabla.mcs.anl.gov> <470519BF.7040004@mcs.anl.gov> Message-ID: <4705CB33.6060901@mcs.anl.gov> Nika, can you post or send the moldyn code which you generate to deal with a similar problem to this? I thought you mentioned that it was already online but I can't find it. Thanks, - Mike On 10/4/07 11:53 AM, Ben Clifford wrote: > > On Thu, 4 Oct 2007, Michael Wilde wrote: > >> is one way to do this to 'eval' a file containing swift expressions? >> Or, to #include them? >> This would look like a keyword-based value file >> >> myparms.rho[0]=0.1 mparms.theta[0]=0.2 >> myparms.rho[1]=0.4 mparms.theta[0]=0.8 >> ... >> myparms.rho[N-1]=0.etc mparms.theta[etc]=0.etc >> >> even if not as clean as extractcsv, it might be both fast and useful for many >> other purposes. > > you can do that already - generate a swiftscript source file from some > other source. nika did something like this (and still does, but to a > lesser extent) for moldyn. > From wilde at mcs.anl.gov Fri Oct 5 00:40:50 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 05 Oct 2007 00:40:50 -0500 Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: References: <20071004094828.AUN31015@m4500-00.uchicago.edu> <4705086B.7030904@mcs.anl.gov> <1191515899.3641.0.camel@blabla.mcs.anl.gov> Message-ID: <4705CE62.2090209@mcs.anl.gov> I'm ready to start the second round of months of arm waving on mappers, by the way. I hope we can cut it down to weeks. Let's get the data model cleaned up before the user and code base grow bigger and we piss people off when we finally have to fix it. 
I'm uncomfortable with aspects of: - what the language's set of values is (what Ben calls file-space) - when mappers run - what their role is (input vs output name determination) - what they return - syntax and semantics of initialization - how values are persisted (especially across restarts) - how values are returned from app calls - why "@filename" is special and what else is - why we should *not* be able to map output datasets until they have been created - what creating (and closing) an output dataset means - how single-assignment and the functional model impacts this - issues of value-assignment dependence in blocks of assignment statements - how any of these issues might affect a provenance model Not all of these are of equal importance. I'm open to *how* this deliberation should take place if people have suggestions. Barring other suggestions, I propose to couch it as proposed revisions to the tutorial examples or users guide, before implementing it. Then implementation could start on the things we agree on, if we agree that the things we dont agree on wont change the things we do. ;) - Mike On 10/4/07 11:41 AM, Ben Clifford wrote: > yes. > > that's why I'd rather they be hacks that look more like what they should > end up like. > > in order to get people doing things with this, there are going to need to > be hacks (or prototype implementations, if you prefer) - months of > abstract arm waving about how mappers work without concrete use has not > resulting in a mapper API that does what some people want it to do; and > months more of it won't, either. > > On Thu, 4 Oct 2007, Mihael Hategan wrote: > >> These hacks will bite us in the future. >> >> On Thu, 2007-10-04 at 15:46 +0000, Ben Clifford wrote: >>> i'd be inclined to grab the csv reading code from the csv mapper and see >>> about making a @extractcsv function. 
It keeps code using it a little bit >>> more like it would if mappers could support it, which is probably the way >>> things will end up one day. >>> >>> On Thu, 4 Oct 2007, Michael Wilde wrote: >>> >>>> a hacky way to do this is first a loop to read the csv file into an array, >>>> then do a foreach over the array. >>>> >>>> i suspect we have the constructs (with @extractint) to do this; not sure if we >>>> can read the CSV into an array-of-structs that we can then foreach() over. But >>>> even if not, this will work using parallel arrays, and probably not be too >>>> unpleasing. >>>> >>>> let's try it. >>>> >>>> On 10/4/07 10:15 AM, Ben Clifford wrote: >>>>> you're the second person today to talk about reading in values from csv >>>>> files. maybe we should implement this. >>>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> >> > > From benc at hawaga.org.uk Fri Oct 5 03:01:17 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 5 Oct 2007 08:01:17 +0000 (GMT) Subject: [Swift-devel] simple parameter test hangs In-Reply-To: <4705BA3B.7030603@mcs.anl.gov> References: <4705BA3B.7030603@mcs.anl.gov> Message-ID: what you have there is what has been described in the past as the 'array closing problem'. It's a messy interaction of the below syntax and single-assignment semantics. pecho will get called only when z has had its 'single value assigned'. However, the implementation has a slightly strange mechanism for deciding when an array has had its entire single value assigned - it does this when an array is returned from a procedure. Try moving the z[n] assignments into a procedure. This is something that needs fixing. 
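The 'array closing' behaviour Ben describes can be mimicked with a toy single-assignment array in Python. This is only an illustrative analogy of the dataflow semantics (a consumer of the whole array fires only once the array is marked closed, and top-level element assignments never close it), not Swift's actual implementation; the class and its method names are invented for the sketch:

```python
class FutureArray:
    """Toy single-assignment array: a whole-array consumer runs only
    after the array is explicitly closed (analogous to the rule that
    an array is closed when returned from a procedure)."""
    def __init__(self):
        self.elems = {}
        self.closed = False
        self.waiters = []

    def set(self, i, v):
        # single-assignment: each index may be written at most once
        assert i not in self.elems, "single-assignment violation"
        self.elems[i] = v

    def close(self):
        self.closed = True
        for w in self.waiters:
            w(self.elems)

    def when_closed(self, callback):
        if self.closed:
            callback(self.elems)
        else:
            # if nothing ever calls close(), this waiter never fires:
            # the analogue of the hanging workflow
            self.waiters.append(callback)

ran = []
z = FutureArray()
z.when_closed(lambda elems: ran.append(sum(elems.values())))
z.set(0, 111)
z.set(1, 222)
# Like the hanging script: elements are assigned, but nothing has
# closed z, so the consumer has not run.
hung = (ran == [])
# Wrapping the assignments in a "procedure" whose return closes the
# array is what un-sticks the consumer:
z.close()
```

Here `close()` stands in for the "array returned from a procedure" event; the top-level `set()` calls alone leave the consumer waiting forever.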
On Thu, 4 Oct 2007, Michael Wilde wrote: > -- the following program hangs (using 0.3): > > type messagefile; > > type params { > int x; > int y; > } > > (messagefile t) pecho (params p[] ) { > app { > echo p[0].x p[0].y stdout=@filename(t); > } > } > > messagefile outfile <"hello2.txt">; > > params z[]; > z[0].x = 111; > z[0].y = 222; > > outfile = pecho(z); > > -- while this one works: > > type messagefile; > > type params { > int x; > int y; > } > > (messagefile t) pecho (params p) { > app { > echo p.x p.y stdout=@filename(t); > } > } > > messagefile outfile <"hello2.txt">; > > params z; > z.x = 111; > z.y = 222; > > outfile = pecho(z); > > -- swift says: > > $ swift t2.swift > Swift v0.3 r1319 (modified locally) > > RunID: 20071004-2312-mlnzad63 > > (and then it hangs) > > -- heres the log: > $ more t2-20071004-2312-mlnzad63.log > 2007-10-04 23:12:43,147-0500 INFO Loader t2.swift: source file is new. > Recompiling. > 2007-10-04 23:12:45,431-0500 INFO Karajan Validation of XML intermediate file > was succ > essful > 2007-10-04 23:12:48,744-0500 INFO unknown Using sites file: > /home/wilde/swift/vdsk-0.3 > /bin/../etc/sites.xml > 2007-10-04 23:12:48,745-0500 INFO unknown Using tc.data: > /home/wilde/swift/vdsk-0.3/bi > n/../etc/tc.data > 2007-10-04 23:12:50,769-0500 INFO unknown Swift v0.3 r1319 (modified locally) > > 2007-10-04 23:12:50,772-0500 INFO unknown RunID: 20071004-2312-mlnzad63 > $ > > -- > java doesnt seem to be burning cpu - looks more like a hang than a loop; > doesnt seem to be waiting on stdin either, as far as i can tell. 
> > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From benc at hawaga.org.uk Fri Oct 5 03:10:23 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 5 Oct 2007 08:10:23 +0000 (GMT) Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: References: <20071004091607.AUN26035@m4500-00.uchicago.edu> Message-ID: On Thu, 4 Oct 2007, Yong Zhao wrote: > Mappers should ideally be able to map primitive types such as string and > int, as well as file names. The CSVMapper can read all the values in the > file, it is just that it needs to interprete the values according to their > types. and have some way of conveying those values into the implementation. The mapper API has no way to do that. -- From bugzilla-daemon at mcs.anl.gov Fri Oct 5 07:36:02 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Fri, 5 Oct 2007 07:36:02 -0500 (CDT) Subject: [Swift-devel] [Bug 104] Add cert request tools to swift/bin In-Reply-To: Message-ID: <20071005123602.152AC164BB@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=104 ------- Comment #1 from benc at hawaga.org.uk 2007-10-05 07:36 ------- Perhaps better to align swift with efforts to release third party software such as the OSG stack (eg. a pacman packaging of Swift which should not be hard, and appropriate dependencies to pull in desired pieces of the OSG stack). -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. 
From bugzilla-daemon at mcs.anl.gov Fri Oct 5 07:59:34 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Fri, 5 Oct 2007 07:59:34 -0500 (CDT) Subject: [Swift-devel] [Bug 102] workflow failes due to file cache duplicates In-Reply-To: Message-ID: <20071005125934.76EF8164BB@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=102 benc at hawaga.org.uk changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|nefedova at mcs.anl.gov |hategan at mcs.anl.gov -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From nefedova at mcs.anl.gov Fri Oct 5 08:31:17 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Fri, 5 Oct 2007 08:31:17 -0500 Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: <4705CB33.6060901@mcs.anl.gov> References: <20071004094828.AUN31015@m4500-00.uchicago.edu> <4705086B.7030904@mcs.anl.gov> <1191515899.3641.0.camel@blabla.mcs.anl.gov> <470519BF.7040004@mcs.anl.gov> <4705CB33.6060901@mcs.anl.gov> Message-ID: <9CF4A662-49FE-46C7-9E73-97B167EE461F@mcs.anl.gov> The code is here: http://www.ci.uchicago.edu/~nefedova/gen-swift-loops the relevant piece is toward the end: First you construct in your shell script the string, in my case it was the strings $output_files and $outp_files. Then you define swift arrays and write it into your .swift (or .dtm) file: cat >> ./MolDyn.dtm <; file outfiles[] ; string INPUT[]; EOFF Then you populate the arrays with the values (in my case - the values are the results from applications): cat >> ./MolDyn.dtm < Nika, can you post or send the moldyn code which you generate to > deal with a similar problem to this? > > I thought you mentioned that it was already online but I cant find it. 
> > Thanks, > > - Mike > > > On 10/4/07 11:53 AM, Ben Clifford wrote: >> On Thu, 4 Oct 2007, Michael Wilde wrote: >>> is one way to do this to 'eval' a file containing swift expressions? >>> Or, to #include them? >>> This would look like a keyword-based value file >>> >>> myparms.rho[0]=0.1 mparms.theta[0]=0.2 >>> myparms.rho[1]=0.4 mparms.theta[0]=0.8 >>> ... >>> myparms.rho[N-1]=0.etc mparms.theta[etc]=0.etc >>> >>> even if not as clean as extractcsv, it might be both fast and >>> useful for many >>> other purposes. >> you can do that already - generate a swiftscript source file from >> some other source. nika did something like this (and still does, >> but to a lesser extent) for moldyn. > From hategan at mcs.anl.gov Fri Oct 5 08:52:20 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 05 Oct 2007 08:52:20 -0500 Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: <4705CE62.2090209@mcs.anl.gov> References: <20071004094828.AUN31015@m4500-00.uchicago.edu> <4705086B.7030904@mcs.anl.gov> <1191515899.3641.0.camel@blabla.mcs.anl.gov> <4705CE62.2090209@mcs.anl.gov> Message-ID: <1191592341.31159.0.camel@blabla.mcs.anl.gov> I have a different idea. Let's not do this. On Fri, 2007-10-05 at 00:40 -0500, Michael Wilde wrote: > I'm ready to start the seconds round of months of arm waving on mappers, > by the way. I hope we can cut it down to weeks. > > Lets get the data model cleaned up before the user and code base grows > bigger, and we'll piss people off when we have to finally fix it. 
> > I'm uncomfortable with aspects of: > - what the language's set of values is (what Ben calls file-space) > - when mappers run > - what their role is (input vs output name determination) > - what they return > - syntax and semantics of initialization > - how values are persisted (especially across restarts) > - how values are returned from app calls > - why "@filename" is special and what else is > - why we should *not* be able to map output datasets until they have > been created > - what creating (and closing) an output dataset means > - how single-assignment and the functional model impacts this > - issues of value-assignment dependence in blocks of assignment statements > - how any of these issues might affect a provenance model > > Not all of these are of equal importance. > > I'm open to *how* this deliberation should take place if people have > suggestions. > > Barring other suggestions, I propose to couch it as proposed revisions > to the tutorial examples or users guide, before implementing it. > > Then implementation could start on the things we agree on, if we agree > that the things we dont agree on wont change the things we do. ;) > > - Mike > > > On 10/4/07 11:41 AM, Ben Clifford wrote: > > yes. > > > > that's why I'd rather they be hacks that look more like what they should > > end up like. > > > > in order to get people doing things with this, there are going to need to > > be hacks (or prototype implementations, if you prefer) - months of > > abstract arm waving about how mappers work without concrete use has not > > resulting in a mapper API that does what some people want it to do; and > > months more of it won't, either. > > > > On Thu, 4 Oct 2007, Mihael Hategan wrote: > > > >> These hacks will bite us in the future. > >> > >> On Thu, 2007-10-04 at 15:46 +0000, Ben Clifford wrote: > >>> i'd be inclined to grab the csv reading code from the csv mapper and see > >>> about making a @extractcsv function. 
It keeps code using it a little bit > >>> more like it would if mappers could support it, which is probably the way > >>> things will end up one day. > >>> > >>> On Thu, 4 Oct 2007, Michael Wilde wrote: > >>> > >>>> a hacky way to do this is first a loop to read the csv file into an array, > >>>> then do a foreach over the array. > >>>> > >>>> i suspect we have the constructs (with @extractint) to do this; not sure if we > >>>> can read the CSV into an array-of-structs that we can then foreach() over. But > >>>> even if not, this will work using parallel arrays, and probably not be too > >>>> unpleasing. > >>>> > >>>> lets try it. > >>>> > >>>> On 10/4/07 10:15 AM, Ben Clifford wrote: > >>>>> you're the second person today to talk about reading in values from csv > >>>>> files. maybe we should implement this. > >>>> > >>> _______________________________________________ > >>> Swift-devel mailing list > >>> Swift-devel at ci.uchicago.edu > >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>> > >> > > > > > From wilde at mcs.anl.gov Fri Oct 5 08:58:25 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 05 Oct 2007 08:58:25 -0500 Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: <9CF4A662-49FE-46C7-9E73-97B167EE461F@mcs.anl.gov> References: <20071004094828.AUN31015@m4500-00.uchicago.edu> <4705086B.7030904@mcs.anl.gov> <1191515899.3641.0.camel@blabla.mcs.anl.gov> <470519BF.7040004@mcs.anl.gov> <4705CB33.6060901@mcs.anl.gov> <9CF4A662-49FE-46C7-9E73-97B167EE461F@mcs.anl.gov> Message-ID: <47064301.7040305@mcs.anl.gov> Thanks, Nika. Can you also post a sample generate swift script, say for a dataset of 3 molecules or members? - Mike On 10/5/07 8:31 AM, Veronika Nefedova wrote: > The code is here: http://www.ci.uchicago.edu/~nefedova/gen-swift-loops > > the relevant piece is toward the end: > > First you construct in your shell script the string, in my case it was > the strings $output_files and $outp_files. 
Then you define swift arrays > and write it into your .swift (or .dtm) file: > > cat >> ./MolDyn.dtm < > ($outp_files) = GENERATOR (whamfiles, s); > string tout = @strcat ("\1_",s,".out"); > file outf[] ; > file outfiles[] transform=tout>; > string INPUT[]; > EOFF > > Then you populate the arrays with the values (in my case - the values > are the results from applications): > > cat >> ./MolDyn.dtm < INPUT[$i] = @strcat("input:","$inp2","_",s); > outfiles [$i] = CHARMM4 ($inp1, whaminp, s1,INPUT[$i] ); > EOFF > > (you do the above inside the loop in your shell script). > > Hope this helps, > > NIka > > > On Oct 5, 2007, at 12:27 AM, Michael Wilde wrote: > >> Nika, can you post or send the moldyn code which you generate to deal >> with a similar problem to this? >> >> I thought you mentioned that it was already online but I cant find it. >> >> Thanks, >> >> - Mike >> >> >> On 10/4/07 11:53 AM, Ben Clifford wrote: >>> On Thu, 4 Oct 2007, Michael Wilde wrote: >>>> is one way to do this to 'eval' a file containing swift expressions? >>>> Or, to #include them? >>>> This would look like a keyword-based value file >>>> >>>> myparms.rho[0]=0.1 mparms.theta[0]=0.2 >>>> myparms.rho[1]=0.4 mparms.theta[0]=0.8 >>>> ... >>>> myparms.rho[N-1]=0.etc mparms.theta[etc]=0.etc >>>> >>>> even if not as clean as extractcsv, it might be both fast and useful >>>> for many >>>> other purposes. >>> you can do that already - generate a swiftscript source file from >>> some other source. nika did something like this (and still does, but >>> to a lesser extent) for moldyn. 
>> > > From wilde at mcs.anl.gov Fri Oct 5 09:05:35 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 05 Oct 2007 09:05:35 -0500 Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: <1191592341.31159.0.camel@blabla.mcs.anl.gov> References: <20071004094828.AUN31015@m4500-00.uchicago.edu> <4705086B.7030904@mcs.anl.gov> <1191515899.3641.0.camel@blabla.mcs.anl.gov> <4705CE62.2090209@mcs.anl.gov> <1191592341.31159.0.camel@blabla.mcs.anl.gov> Message-ID: <470644AF.8060505@mcs.anl.gov> please clarify - briefly is fine but your meaning is unclear. are you saying: a) these parts of the language are fine as-is? b) these issues should be addressed but are low priority? c) you have an idea of how to address these issues technically? d) you dont but have an idea of how to get there socially? e) none - please specify - Mike On 10/5/07 8:52 AM, Mihael Hategan wrote: > I have a different idea. Let's not do this. > > On Fri, 2007-10-05 at 00:40 -0500, Michael Wilde wrote: >> I'm ready to start the seconds round of months of arm waving on mappers, >> by the way. I hope we can cut it down to weeks. >> >> Lets get the data model cleaned up before the user and code base grows >> bigger, and we'll piss people off when we have to finally fix it. 
>> >> I'm uncomfortable with aspects of: >> - what the language's set of values is (what Ben calls file-space) >> - when mappers run >> - what their role is (input vs output name determination) >> - what they return >> - syntax and semantics of initialization >> - how values are persisted (especially across restarts) >> - how values are returned from app calls >> - why "@filename" is special and what else is >> - why we should *not* be able to map output datasets until they have >> been created >> - what creating (and closing) an output dataset means >> - how single-assignment and the functional model impacts this >> - issues of value-assignment dependence in blocks of assignment statements >> - how any of these issues might affect a provenance model >> >> Not all of these are of equal importance. >> >> I'm open to *how* this deliberation should take place if people have >> suggestions. >> >> Barring other suggestions, I propose to couch it as proposed revisions >> to the tutorial examples or users guide, before implementing it. >> >> Then implementation could start on the things we agree on, if we agree >> that the things we dont agree on wont change the things we do. ;) >> >> - Mike >> >> >> On 10/4/07 11:41 AM, Ben Clifford wrote: >>> yes. >>> >>> that's why I'd rather they be hacks that look more like what they should >>> end up like. >>> >>> in order to get people doing things with this, there are going to need to >>> be hacks (or prototype implementations, if you prefer) - months of >>> abstract arm waving about how mappers work without concrete use has not >>> resulting in a mapper API that does what some people want it to do; and >>> months more of it won't, either. >>> >>> On Thu, 4 Oct 2007, Mihael Hategan wrote: >>> >>>> These hacks will bite us in the future. >>>> >>>> On Thu, 2007-10-04 at 15:46 +0000, Ben Clifford wrote: >>>>> i'd be inclined to grab the csv reading code from the csv mapper and see >>>>> about making a @extractcsv function. 
It keeps code using it a little bit >>>>> more like it would if mappers could support it, which is probably the way >>>>> things will end up one day. >>>>> >>>>> On Thu, 4 Oct 2007, Michael Wilde wrote: >>>>> >>>>>> a hacky way to do this is first a loop to read the csv file into an array, >>>>>> then do a foreach over the array. >>>>>> >>>>>> i suspect we have the constructs (with @extractint) to do this; not sure if we >>>>>> can read the CSV into an array-of-structs that we can then foreach() over. But >>>>>> even if not, this will work using parallel arrays, and probably not be too >>>>>> unpleasing. >>>>>> >>>>>> lets try it. >>>>>> >>>>>> On 10/4/07 10:15 AM, Ben Clifford wrote: >>>>>>> you're the second person today to talk about reading in values from csv >>>>>>> files. maybe we should implement this. >>>>> _______________________________________________ >>>>> Swift-devel mailing list >>>>> Swift-devel at ci.uchicago.edu >>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>> >>> > > From nefedova at mcs.anl.gov Fri Oct 5 09:07:37 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Fri, 5 Oct 2007 09:07:37 -0500 Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: <47064301.7040305@mcs.anl.gov> References: <20071004094828.AUN31015@m4500-00.uchicago.edu> <4705086B.7030904@mcs.anl.gov> <1191515899.3641.0.camel@blabla.mcs.anl.gov> <470519BF.7040004@mcs.anl.gov> <4705CB33.6060901@mcs.anl.gov> <9CF4A662-49FE-46C7-9E73-97B167EE461F@mcs.anl.gov> <47064301.7040305@mcs.anl.gov> Message-ID: <75012EB7-C392-44F1-A310-961E03A62FEC@mcs.anl.gov> Sure. You can see the generated swift script (for 3 molecules) here: http://www.ci.uchicago.edu/~nefedova/MolDyn-3-loops.swift Nika On Oct 5, 2007, at 8:58 AM, Michael Wilde wrote: > Thanks, Nika. > > Can you also post a sample generate swift script, say for a dataset > of 3 molecules or members? 
> > - Mike > > > On 10/5/07 8:31 AM, Veronika Nefedova wrote: >> The code is here: http://www.ci.uchicago.edu/~nefedova/gen-swift- >> loops >> the relevant piece is toward the end: >> First you construct in your shell script the string, in my case it >> was the strings $output_files and $outp_files. Then you define >> swift arrays and write it into your .swift (or .dtm) file: >> cat >> ./MolDyn.dtm <> ($outp_files) = GENERATOR (whamfiles, s); >> string tout = @strcat ("\1_",s,".out"); >> file outf[] ; >> file outfiles[] >> > transform=tout>; >> string INPUT[]; >> EOFF >> Then you populate the arrays with the values (in my case - the >> values are the results from applications): >> cat >> ./MolDyn.dtm <> INPUT[$i] = @strcat("input:","$inp2","_",s); >> outfiles [$i] = CHARMM4 ($inp1, whaminp, s1,INPUT[$i] ); >> EOFF >> (you do the above inside the loop in your shell script). >> Hope this helps, >> NIka >> On Oct 5, 2007, at 12:27 AM, Michael Wilde wrote: >>> Nika, can you post or send the moldyn code which you generate to >>> deal with a similar problem to this? >>> >>> I thought you mentioned that it was already online but I cant >>> find it. >>> >>> Thanks, >>> >>> - Mike >>> >>> >>> On 10/4/07 11:53 AM, Ben Clifford wrote: >>>> On Thu, 4 Oct 2007, Michael Wilde wrote: >>>>> is one way to do this to 'eval' a file containing swift >>>>> expressions? >>>>> Or, to #include them? >>>>> This would look like a keyword-based value file >>>>> >>>>> myparms.rho[0]=0.1 mparms.theta[0]=0.2 >>>>> myparms.rho[1]=0.4 mparms.theta[0]=0.8 >>>>> ... >>>>> myparms.rho[N-1]=0.etc mparms.theta[etc]=0.etc >>>>> >>>>> even if not as clean as extractcsv, it might be both fast and >>>>> useful for many >>>>> other purposes. >>>> you can do that already - generate a swiftscript source file >>>> from some other source. nika did something like this (and still >>>> does, but to a lesser extent) for moldyn. 
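The loop body of Nika's generator (the statements the shell loop appends to MolDyn.dtm via heredocs) could also be emitted from Python. The statement shapes are copied from her snippet; the function name and parameters are assumptions, and `s`, `whaminp`, and `s1` are left as Swift-level identifiers exactly as in her script:

```python
def gen_charmm_calls(n, inp1, inp2):
    # Emit the per-iteration INPUT[i]/outfiles[i] statements that the
    # shell script writes with 'cat >> ./MolDyn.dtm <<EOFF' inside its
    # loop; n plays the role of the shell loop counter's range.
    stmts = []
    for i in range(n):
        stmts.append('INPUT[%d] = @strcat("input:","%s","_",s);'
                     % (i, inp2))
        stmts.append('outfiles[%d] = CHARMM4 (%s, whaminp, s1, INPUT[%d]);'
                     % (i, inp1, i))
    return "\n".join(stmts)

print(gen_charmm_calls(2, inp1="parmfile", inp2="solv_chg_a3"))
```

The output would be appended to the generated .swift/.dtm file in place of the heredoc bodies, one pair of statements per loop iteration.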
>>> > From hategan at mcs.anl.gov Fri Oct 5 09:11:07 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 05 Oct 2007 09:11:07 -0500 Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: <470644AF.8060505@mcs.anl.gov> References: <20071004094828.AUN31015@m4500-00.uchicago.edu> <4705086B.7030904@mcs.anl.gov> <1191515899.3641.0.camel@blabla.mcs.anl.gov> <4705CE62.2090209@mcs.anl.gov> <1191592341.31159.0.camel@blabla.mcs.anl.gov> <470644AF.8060505@mcs.anl.gov> Message-ID: <1191593467.31592.3.camel@blabla.mcs.anl.gov> On Fri, 2007-10-05 at 09:05 -0500, Michael Wilde wrote: > please clarify - briefly is fine but your meaning is unclear. > are you saying: > a) these parts of the language are fine as-is? > b) these issues should be addressed but are low priority? > c) you have an idea of how to address these issues technically? > d) you dont but have an idea of how to get there socially? > e) none - please specify f) none of the above. These questions were already answered in previous discussions. Some repeatedly. So I'm not sure what you're looking for here. The answers won't change. > > - Mike > > On 10/5/07 8:52 AM, Mihael Hategan wrote: > > I have a different idea. Let's not do this. > > > > On Fri, 2007-10-05 at 00:40 -0500, Michael Wilde wrote: > >> I'm ready to start the seconds round of months of arm waving on mappers, > >> by the way. I hope we can cut it down to weeks. > >> > >> Lets get the data model cleaned up before the user and code base grows > >> bigger, and we'll piss people off when we have to finally fix it. 
> >> > >> I'm uncomfortable with aspects of: > >> - what the language's set of values is (what Ben calls file-space) > >> - when mappers run > >> - what their role is (input vs output name determination) > >> - what they return > >> - syntax and semantics of initialization > >> - how values are persisted (especially across restarts) > >> - how values are returned from app calls > >> - why "@filename" is special and what else is > >> - why we should *not* be able to map output datasets until they have > >> been created > >> - what creating (and closing) an output dataset means > >> - how single-assignment and the functional model impacts this > >> - issues of value-assignment dependence in blocks of assignment statements > >> - how any of these issues might affect a provenance model > >> > >> Not all of these are of equal importance. > >> > >> I'm open to *how* this deliberation should take place if people have > >> suggestions. > >> > >> Barring other suggestions, I propose to couch it as proposed revisions > >> to the tutorial examples or users guide, before implementing it. > >> > >> Then implementation could start on the things we agree on, if we agree > >> that the things we dont agree on wont change the things we do. ;) > >> > >> - Mike > >> > >> > >> On 10/4/07 11:41 AM, Ben Clifford wrote: > >>> yes. > >>> > >>> that's why I'd rather they be hacks that look more like what they should > >>> end up like. > >>> > >>> in order to get people doing things with this, there are going to need to > >>> be hacks (or prototype implementations, if you prefer) - months of > >>> abstract arm waving about how mappers work without concrete use has not > >>> resulting in a mapper API that does what some people want it to do; and > >>> months more of it won't, either. > >>> > >>> On Thu, 4 Oct 2007, Mihael Hategan wrote: > >>> > >>>> These hacks will bite us in the future. 
> >>>> > >>>> On Thu, 2007-10-04 at 15:46 +0000, Ben Clifford wrote: > >>>>> i'd be inclined to grab the csv reading code from the csv mapper and see > >>>>> about making a @extractcsv function. It keeps code using it a little bit > >>>>> more like it would if mappers could support it, which is probably the way > >>>>> things will end up one day. > >>>>> > >>>>> On Thu, 4 Oct 2007, Michael Wilde wrote: > >>>>> > >>>>>> a hacky way to do this is first a loop to read the csv file into an array, > >>>>>> then do a foreach over the array. > >>>>>> > >>>>>> i suspect we have the constructs (with @extractint) to do this; not sure if we > >>>>>> can read the CSV into an array-of-structs that we can then foreach() over. But > >>>>>> even if not, this will work using parallel arrays, and probably not be too > >>>>>> unpleasing. > >>>>>> > >>>>>> lets try it. > >>>>>> > >>>>>> On 10/4/07 10:15 AM, Ben Clifford wrote: > >>>>>>> you're the second person today to talk about reading in values from csv > >>>>>>> files. maybe we should implement this. > >>>>> _______________________________________________ > >>>>> Swift-devel mailing list > >>>>> Swift-devel at ci.uchicago.edu > >>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>>>> > >>> > > > > > From wilde at mcs.anl.gov Fri Oct 5 09:48:16 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 05 Oct 2007 09:48:16 -0500 Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: <1191593467.31592.3.camel@blabla.mcs.anl.gov> References: <20071004094828.AUN31015@m4500-00.uchicago.edu> <4705086B.7030904@mcs.anl.gov> <1191515899.3641.0.camel@blabla.mcs.anl.gov> <4705CE62.2090209@mcs.anl.gov> <1191592341.31159.0.camel@blabla.mcs.anl.gov> <470644AF.8060505@mcs.anl.gov> <1191593467.31592.3.camel@blabla.mcs.anl.gov> Message-ID: <47064EB0.4010503@mcs.anl.gov> Sorry, I'm not going to accept f) as an answer. I'm going to argue hard that some things should change. 
We clearly need a better way to integrate tabular data - both from files and external app returns - with datasets as in the code examples we have. We clearly have issues in variable/collection closing that cause (unpleasantly) surprising behavior. We need ways to make mapping easier. And we certainly need to more clearly explain to users how the mapper model works (although Ben's recent steps are a huge step forward on this) I think out of this list, and by analyzing and cleaning up existing code examples, we can identify the problems that are both highest prio and lowest cost to fix, and see where we stand on the rest. But there is clearly need for language assessment and discussion. I expect some of the current answers to change, sorry. - Mike On 10/5/07 9:11 AM, Mihael Hategan wrote: > On Fri, 2007-10-05 at 09:05 -0500, Michael Wilde wrote: >> please clarify - briefly is fine but your meaning is unclear. >> are you saying: >> a) these parts of the language are fine as-is? >> b) these issues should be addressed but are low priority? >> c) you have an idea of how to address these issues technically? >> d) you dont but have an idea of how to get there socially? >> e) none - please specify > > f) none of the above. > > These questions were already answered in previous discussions. Some > repeatedly. So I'm not sure what you're looking for here. The answers > won't change. > >> - Mike >> >> On 10/5/07 8:52 AM, Mihael Hategan wrote: >>> I have a different idea. Let's not do this. >>> >>> On Fri, 2007-10-05 at 00:40 -0500, Michael Wilde wrote: >>>> I'm ready to start the seconds round of months of arm waving on mappers, >>>> by the way. I hope we can cut it down to weeks. >>>> >>>> Lets get the data model cleaned up before the user and code base grows >>>> bigger, and we'll piss people off when we have to finally fix it. 
>>>> >>>> I'm uncomfortable with aspects of: >>>> - what the language's set of values is (what Ben calls file-space) >>>> - when mappers run >>>> - what their role is (input vs output name determination) >>>> - what they return >>>> - syntax and semantics of initialization >>>> - how values are persisted (especially across restarts) >>>> - how values are returned from app calls >>>> - why "@filename" is special and what else is >>>> - why we should *not* be able to map output datasets until they have >>>> been created >>>> - what creating (and closing) an output dataset means >>>> - how single-assignment and the functional model impacts this >>>> - issues of value-assignment dependence in blocks of assignment statements >>>> - how any of these issues might affect a provenance model >>>> >>>> Not all of these are of equal importance. >>>> >>>> I'm open to *how* this deliberation should take place if people have >>>> suggestions. >>>> >>>> Barring other suggestions, I propose to couch it as proposed revisions >>>> to the tutorial examples or users guide, before implementing it. >>>> >>>> Then implementation could start on the things we agree on, if we agree >>>> that the things we dont agree on wont change the things we do. ;) >>>> >>>> - Mike >>>> >>>> >>>> On 10/4/07 11:41 AM, Ben Clifford wrote: >>>>> yes. >>>>> >>>>> that's why I'd rather they be hacks that look more like what they should >>>>> end up like. >>>>> >>>>> in order to get people doing things with this, there are going to need to >>>>> be hacks (or prototype implementations, if you prefer) - months of >>>>> abstract arm waving about how mappers work without concrete use has not >>>>> resulting in a mapper API that does what some people want it to do; and >>>>> months more of it won't, either. >>>>> >>>>> On Thu, 4 Oct 2007, Mihael Hategan wrote: >>>>> >>>>>> These hacks will bite us in the future. 
>>>>>> >>>>>> On Thu, 2007-10-04 at 15:46 +0000, Ben Clifford wrote: >>>>>>> i'd be inclined to grab the csv reading code from the csv mapper and see >>>>>>> about making a @extractcsv function. It keeps code using it a little bit >>>>>>> more like it would if mappers could support it, which is probably the way >>>>>>> things will end up one day. >>>>>>> >>>>>>> On Thu, 4 Oct 2007, Michael Wilde wrote: >>>>>>> >>>>>>>> a hacky way to do this is first a loop to read the csv file into an array, >>>>>>>> then do a foreach over the array. >>>>>>>> >>>>>>>> i suspect we have the constructs (with @extractint) to do this; not sure if we >>>>>>>> can read the CSV into an array-of-structs that we can then foreach() over. But >>>>>>>> even if not, this will work using parallel arrays, and probably not be too >>>>>>>> unpleasing. >>>>>>>> >>>>>>>> lets try it. >>>>>>>> >>>>>>>> On 10/4/07 10:15 AM, Ben Clifford wrote: >>>>>>>>> you're the second person today to talk about reading in values from csv >>>>>>>>> files. maybe we should implement this. >>>>>>> _______________________________________________ >>>>>>> Swift-devel mailing list >>>>>>> Swift-devel at ci.uchicago.edu >>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>>> >>> > > From hategan at mcs.anl.gov Fri Oct 5 09:58:04 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 05 Oct 2007 09:58:04 -0500 Subject: [Swift-devel] Re: passing types and variables in swift In-Reply-To: <47064EB0.4010503@mcs.anl.gov> References: <20071004094828.AUN31015@m4500-00.uchicago.edu> <4705086B.7030904@mcs.anl.gov> <1191515899.3641.0.camel@blabla.mcs.anl.gov> <4705CE62.2090209@mcs.anl.gov> <1191592341.31159.0.camel@blabla.mcs.anl.gov> <470644AF.8060505@mcs.anl.gov> <1191593467.31592.3.camel@blabla.mcs.anl.gov> <47064EB0.4010503@mcs.anl.gov> Message-ID: <1191596284.1926.4.camel@blabla.mcs.anl.gov> You're arguing against a different matter. I did not say we shouldn't solve these problems. We should. 
However we should not keep having the same discussion over and over. On Fri, 2007-10-05 at 09:48 -0500, Michael Wilde wrote: > Sorry, I'm not going to accept f) as an answer. > I'm going to argue hard that some things should change. > > We clearly need a better way to integrate tabular data - both from files > and external app returns - with datasets as in the code examples we have. > > We clearly have issues in variable/collection closing that cause > (unpleasantly) surprising behavior. > > We need ways to make mapping easier. And we certainly need to more > clearly explain to users how the mapper model works (although Ben's > recent steps are a huge step forward on this) > > I think out of this list, and by analyzing and cleaning up existing code > examples, we can identify the problems that are both highest prio and > lowest cost to fix, and see where we stand on the rest. > > But there is clearly need for language assessment and discussion. > I expect some of the current answers to change, sorry. > > - Mike > > > On 10/5/07 9:11 AM, Mihael Hategan wrote: > > On Fri, 2007-10-05 at 09:05 -0500, Michael Wilde wrote: > >> please clarify - briefly is fine but your meaning is unclear. > >> are you saying: > >> a) these parts of the language are fine as-is? > >> b) these issues should be addressed but are low priority? > >> c) you have an idea of how to address these issues technically? > >> d) you dont but have an idea of how to get there socially? > >> e) none - please specify > > > > f) none of the above. > > > > These questions were already answered in previous discussions. Some > > repeatedly. So I'm not sure what you're looking for here. The answers > > won't change. > > > >> - Mike > >> > >> On 10/5/07 8:52 AM, Mihael Hategan wrote: > >>> I have a different idea. Let's not do this. > >>> > >>> On Fri, 2007-10-05 at 00:40 -0500, Michael Wilde wrote: > >>>> I'm ready to start the seconds round of months of arm waving on mappers, > >>>> by the way. 
I hope we can cut it down to weeks. > >>>> > >>>> Lets get the data model cleaned up before the user and code base grows > >>>> bigger, and we'll piss people off when we have to finally fix it. > >>>> > >>>> I'm uncomfortable with aspects of: > >>>> - what the language's set of values is (what Ben calls file-space) > >>>> - when mappers run > >>>> - what their role is (input vs output name determination) > >>>> - what they return > >>>> - syntax and semantics of initialization > >>>> - how values are persisted (especially across restarts) > >>>> - how values are returned from app calls > >>>> - why "@filename" is special and what else is > >>>> - why we should *not* be able to map output datasets until they have > >>>> been created > >>>> - what creating (and closing) an output dataset means > >>>> - how single-assignment and the functional model impacts this > >>>> - issues of value-assignment dependence in blocks of assignment statements > >>>> - how any of these issues might affect a provenance model > >>>> > >>>> Not all of these are of equal importance. > >>>> > >>>> I'm open to *how* this deliberation should take place if people have > >>>> suggestions. > >>>> > >>>> Barring other suggestions, I propose to couch it as proposed revisions > >>>> to the tutorial examples or users guide, before implementing it. > >>>> > >>>> Then implementation could start on the things we agree on, if we agree > >>>> that the things we dont agree on wont change the things we do. ;) > >>>> > >>>> - Mike > >>>> > >>>> > >>>> On 10/4/07 11:41 AM, Ben Clifford wrote: > >>>>> yes. > >>>>> > >>>>> that's why I'd rather they be hacks that look more like what they should > >>>>> end up like. 
> >>>>> > >>>>> in order to get people doing things with this, there are going to need to > >>>>> be hacks (or prototype implementations, if you prefer) - months of > >>>>> abstract arm waving about how mappers work without concrete use has not > >>>>> resulting in a mapper API that does what some people want it to do; and > >>>>> months more of it won't, either. > >>>>> > >>>>> On Thu, 4 Oct 2007, Mihael Hategan wrote: > >>>>> > >>>>>> These hacks will bite us in the future. > >>>>>> > >>>>>> On Thu, 2007-10-04 at 15:46 +0000, Ben Clifford wrote: > >>>>>>> i'd be inclined to grab the csv reading code from the csv mapper and see > >>>>>>> about making a @extractcsv function. It keeps code using it a little bit > >>>>>>> more like it would if mappers could support it, which is probably the way > >>>>>>> things will end up one day. > >>>>>>> > >>>>>>> On Thu, 4 Oct 2007, Michael Wilde wrote: > >>>>>>> > >>>>>>>> a hacky way to do this is first a loop to read the csv file into an array, > >>>>>>>> then do a foreach over the array. > >>>>>>>> > >>>>>>>> i suspect we have the constructs (with @extractint) to do this; not sure if we > >>>>>>>> can read the CSV into an array-of-structs that we can then foreach() over. But > >>>>>>>> even if not, this will work using parallel arrays, and probably not be too > >>>>>>>> unpleasing. > >>>>>>>> > >>>>>>>> lets try it. > >>>>>>>> > >>>>>>>> On 10/4/07 10:15 AM, Ben Clifford wrote: > >>>>>>>>> you're the second person today to talk about reading in values from csv > >>>>>>>>> files. maybe we should implement this. 
> >>>>>>> _______________________________________________ > >>>>>>> Swift-devel mailing list > >>>>>>> Swift-devel at ci.uchicago.edu > >>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>>>>>> > >>> > > > > > From bugzilla-daemon at mcs.anl.gov Sat Oct 6 11:11:13 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Sat, 6 Oct 2007 11:11:13 -0500 (CDT) Subject: [Swift-devel] [Bug 106] New: Improve error messages for double-set and un-set variables Message-ID: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=106 Summary: Improve error messages for double-set and un-set variables Product: Swift Version: unspecified Platform: All OS/Version: All Status: NEW Severity: normal Priority: P5 Component: SwiftScript language AssignedTo: benc at hawaga.org.uk ReportedBy: wilde at mcs.anl.gov Message for double assignment: $ cat t10.swift int a; a=1; a=2; $ swift t10.swift Swift v0.3 r1319 (modified locally) RunID: 20071005-0827-f9sriuv3 Execution failed: java.lang.IllegalArgumentException: a is already assigned with a value of 1.0 $ ACTION: the presence of Java exceptions should be hidden. The message should just say: Execution failure at line x in proc y: variable a is already assigned. -- $ swift t7.swift Swift v0.3 r1319 (modified locally) RunID: 20071005-0805-6v0mxeua echo started echo completed $ cat hello2.txt org.griphyn.vdl.mapping.RootDataNode with no value at dataset=m $ cat t7.swift type messagefile; type file; (messagefile t) echo ( string s1 ) { app { echo s1 stdout=@filename(t); } } (messagefile t) pecho (string s[] ) { app { echo s[0] stdout=@filename(t); } } messagefile outfile <"hello2.txt">; string m; string ma[]; outfile = echo( m ); --- to be continued... (mike) -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. 
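The single-assignment rule behind bug 106 can be illustrated with a small sketch — this is not Swift's implementation, only a demonstration of the plain, Java-free error message the report asks for ("variable a is already assigned").

```python
# Sketch of write-once (single-assignment) variable semantics.
# Not Swift's implementation; it only illustrates the clean message
# the bug report requests instead of a raw Java exception.
class WriteOnce:
    _UNSET = object()

    def __init__(self, name):
        self.name = name
        self.value = WriteOnce._UNSET

    def assign(self, v):
        # a second assignment must fail with a user-readable message
        if self.value is not WriteOnce._UNSET:
            raise ValueError("variable %s is already assigned" % self.name)
        self.value = v

a = WriteOnce("a")
a.assign(1)
try:
    a.assign(2)
except ValueError as e:
    print(e)  # -> variable a is already assigned
```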
From bugzilla-daemon at mcs.anl.gov Mon Oct 8 05:13:22 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 8 Oct 2007 05:13:22 -0500 (CDT) Subject: [Swift-devel] [Bug 107] New: restarts broken (by generalisation of data file handling) Message-ID: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=107 Summary: restarts broken (by generalisation of data file handling) Product: Swift Version: unspecified Platform: Macintosh OS/Version: Mac OS Status: NEW Severity: normal Priority: P2 Component: General AssignedTo: hategan at mcs.anl.gov ReportedBy: benc at hawaga.org.uk Changes to generalise data file handling broke restarts. see swift-devel thread Date: Mon, 1 Oct 2007 14:59:17 +0000 (GMT) From: Ben Clifford To: swift-devel at ci.uchicago.edu Subject: restarts -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. From benc at hawaga.org.uk Mon Oct 8 05:13:59 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 8 Oct 2007 10:13:59 +0000 (GMT) Subject: [Swift-devel] restarts In-Reply-To: <1191254412.5653.1.camel@blabla.mcs.anl.gov> References: <1191254412.5653.1.camel@blabla.mcs.anl.gov> Message-ID: This is bug 107 so it doesn't get forgotten. On Mon, 1 Oct 2007, Mihael Hategan wrote: > It's caused by the addition of generalized files. So basically restarts > are broken at this point. When I get some time, I'll work on the file > management part and this. [broken restarts] -- From wilde at mcs.anl.gov Mon Oct 8 09:25:43 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 08 Oct 2007 09:25:43 -0500 Subject: [Swift-devel] Problem with @extractint? Message-ID: <470A3DE7.1060408@mcs.anl.gov> @extractint is behaving unpredictably for me. 
In the following script: -- type file; int i; file f; (file r1 ) set1 ( ) { app { set1 @r1 ; } } file f <"varf.value">; f = set1(); print("f=", @extractint(f)); // this print works when the next assign // stmt is commented out // i=@extractint(f); // Program works when you comment out this line, // hangs otherwise //print("i=",i); -- it works if I don't try to assign the value of @extractint to an int but hangs otherwise. set1 is: echo 77777 >$1 I've tried various other patterns, and some seem to work; I don't know if I missed a data flow issue here or if @extractint has a problem. Any advice while I continue to debug this? From wilde at mcs.anl.gov Mon Oct 8 11:47:59 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 08 Oct 2007 11:47:59 -0500 Subject: [Swift-devel] Extensions to @extractint feasible? Message-ID: <470A5F3F.5090402@mcs.anl.gov> Would it be feasible and worthwhile to extend @extractint with the ability to read a file into an array, and to handle all the primitive data types? Or, just to generalize this and do @extractcsv as Ben was suggesting? Or do both sets of primitives make sense (a fuller set of @extract_ primitives)? Regarding the ability to set a field of a struct to be a mapped file - as far as I can tell, this is not currently possible: - you can't specify that a field is to be mapped - you can't assign a mapped value in any way at all So it would be nice if an @extractcsv primitive also allowed you to send any of the fields through a mapper. Am I missing any currently implemented approach to setting mapped fields in a struct? From wilde at mcs.anl.gov Mon Oct 8 12:41:45 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 08 Oct 2007 12:41:45 -0500 Subject: [Swift-devel] Problem with @extractint?
In-Reply-To: <470A3DE7.1060408@mcs.anl.gov> References: <470A3DE7.1060408@mcs.anl.gov> Message-ID: <470A6BD9.8050903@mcs.anl.gov> It seems that if I extractint from a file that is static it works fine, but if I extract from a file that I just derived and assign to an int, it hangs. seems like swift is (or I am) getting confused in the data flow dependencies when the value of extractint is assigned. On 10/8/07 9:25 AM, Michael Wilde wrote: > @extractint is behaving unpredictably for me. > > In the following script: > -- > type file; > > int i; > file f; > > (file r1 ) set1 ( ) { > app { set1 @r1 ; } > } > > file f <"varf.value">; > f = set1(); > print("f=", at extractint(f)); // this print works when the next assign > // stmt is commented out > > // i=@extractint(f); // Program works when you comment out this line, > // hangs otherwise > //print("i=",i); > > -- > it works if i dont try to assign the value of @extractint to an int but > hangs otherwise. > > set1 is: > echo 77777 >$1 > > > Ive tried various other patterns, and some seem to work; I dont know if > I missed a data flow issue here or if @extractint has a problem. > > Any advice while I continue to debug this? > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From hategan at mcs.anl.gov Mon Oct 8 12:45:07 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 08 Oct 2007 12:45:07 -0500 Subject: [Swift-devel] Problem with @extractint? In-Reply-To: <470A6BD9.8050903@mcs.anl.gov> References: <470A3DE7.1060408@mcs.anl.gov> <470A6BD9.8050903@mcs.anl.gov> Message-ID: <1191865507.18080.19.camel@blabla.mcs.anl.gov> Send the .kml file. On Mon, 2007-10-08 at 12:41 -0500, Michael Wilde wrote: > It seems that if I extractint from a file that is static it works fine, > but if I extract from a file that I just derived and assign to an int, > it hangs. 
> > seems like swift is (or I am) getting confused in the data flow > dependencies when the value of extractint is assigned. > > On 10/8/07 9:25 AM, Michael Wilde wrote: > > @extractint is behaving unpredictably for me. > > > > In the following script: > > -- > > type file; > > > > int i; > > file f; > > > > (file r1 ) set1 ( ) { > > app { set1 @r1 ; } > > } > > > > file f <"varf.value">; > > f = set1(); > > print("f=", at extractint(f)); // this print works when the next assign > > // stmt is commented out > > > > // i=@extractint(f); // Program works when you comment out this line, > > // hangs otherwise > > //print("i=",i); > > > > -- > > it works if i dont try to assign the value of @extractint to an int but > > hangs otherwise. > > > > set1 is: > > echo 77777 >$1 > > > > > > Ive tried various other patterns, and some seem to work; I dont know if > > I missed a data flow issue here or if @extractint has a problem. > > > > Any advice while I continue to debug this? > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From wilde at mcs.anl.gov Mon Oct 8 13:40:23 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 08 Oct 2007 13:40:23 -0500 Subject: [Swift-devel] Problem with @extractint? In-Reply-To: <1191865507.18080.19.camel@blabla.mcs.anl.gov> References: <470A3DE7.1060408@mcs.anl.gov> <470A6BD9.8050903@mcs.anl.gov> <1191865507.18080.19.camel@blabla.mcs.anl.gov> Message-ID: <470A7997.4040303@mcs.anl.gov> Attached, with source out kml and logs for both working and hanging cases. On 10/8/07 12:45 PM, Mihael Hategan wrote: > Send the .kml file. 
> > On Mon, 2007-10-08 at 12:41 -0500, Michael Wilde wrote: >> It seems that if I extractint from a file that is static it works fine, >> but if I extract from a file that I just derived and assign to an int, >> it hangs. >> >> seems like swift is (or I am) getting confused in the data flow >> dependencies when the value of extractint is assigned. >> >> On 10/8/07 9:25 AM, Michael Wilde wrote: >>> @extractint is behaving unpredictably for me. >>> >>> In the following script: >>> -- >>> type file; >>> >>> int i; >>> file f; >>> >>> (file r1 ) set1 ( ) { >>> app { set1 @r1 ; } >>> } >>> >>> file f <"varf.value">; >>> f = set1(); >>> print("f=", at extractint(f)); // this print works when the next assign >>> // stmt is commented out >>> >>> // i=@extractint(f); // Program works when you comment out this line, >>> // hangs otherwise >>> //print("i=",i); >>> >>> -- >>> it works if i dont try to assign the value of @extractint to an int but >>> hangs otherwise. >>> >>> set1 is: >>> echo 77777 >$1 >>> >>> >>> Ive tried various other patterns, and some seem to work; I dont know if >>> I missed a data flow issue here or if @extractint has a problem. >>> >>> Any advice while I continue to debug this? >>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> >>> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > > -------------- next part -------------- A non-text attachment was scrubbed... Name: t26.tar Type: application/x-tar Size: 30720 bytes Desc: not available URL: From wilde at mcs.anl.gov Mon Oct 8 14:08:16 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 08 Oct 2007 14:08:16 -0500 Subject: [Swift-devel] Problem with @extractint? 
In-Reply-To: <470A7997.4040303@mcs.anl.gov> References: <470A3DE7.1060408@mcs.anl.gov> <470A6BD9.8050903@mcs.anl.gov> <1191865507.18080.19.camel@blabla.mcs.anl.gov> <470A7997.4040303@mcs.anl.gov> Message-ID: <470A8020.9020903@mcs.anl.gov> Updated t26hangs.swift to eliminate the duplicate declaration of file. Still hangs; source and kml are: $ cat t26hangs.swift type file; int i; (file r1 ) set1 ( ) { app { set1 @r1 ; } } file f <"varf.value">; f = set1(); print("f=", @extractint(f)); i=@extractint(f); // Program works when you comment out this line, // hangs otherwise print("i=",i); $ cat t26hangs.kml set1 r1 i-b13c9714-8b61-4989-8be1-3bcbb154181f false i f f f i $ On 10/8/07 1:40 PM, Michael Wilde wrote: > Attached, with source out kml and logs for both working and hanging cases. > > > > On 10/8/07 12:45 PM, Mihael Hategan wrote: >> Send the .kml file. >> >> On Mon, 2007-10-08 at 12:41 -0500, Michael Wilde wrote: >>> It seems that if I extractint from a file that is static it works >>> fine, but if I extract from a file that I just derived and assign to >>> an int, it hangs. >>> >>> seems like swift is (or I am) getting confused in the data flow >>> dependencies when the value of extractint is assigned. >>> >>> On 10/8/07 9:25 AM, Michael Wilde wrote: >>>> @extractint is behaving unpredictably for me. >>>> >>>> In the following script: >>>> -- >>>> type file; >>>> >>>> int i; >>>> file f; >>>> >>>> (file r1 ) set1 ( ) { >>>> app { set1 @r1 ; } >>>> } >>>> >>>> file f <"varf.value">; >>>> f = set1(); >>>> print("f=", @extractint(f)); // this print works when the next assign >>>> // stmt is commented out >>>> >>>> // i=@extractint(f); // Program works when you comment out this line, >>>> // hangs otherwise >>>> //print("i=",i); >>>> >>>> -- >>>> it works if I don't try to assign the value of @extractint to an int >>>> but hangs otherwise.
>>>> >>>> set1 is: >>>> echo 77777 >$1 >>>> >>>> >>>> Ive tried various other patterns, and some seem to work; I dont know >>>> if I missed a data flow issue here or if @extractint has a problem. >>>> >>>> Any advice while I continue to debug this? >>>> >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>> >>>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> >> >> > > ------------------------------------------------------------------------ > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From benc at hawaga.org.uk Tue Oct 9 08:22:13 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 9 Oct 2007 13:22:13 +0000 (GMT) Subject: [Swift-devel] Problem with @extractint? In-Reply-To: <470A7997.4040303@mcs.anl.gov> References: <470A3DE7.1060408@mcs.anl.gov> <470A6BD9.8050903@mcs.anl.gov> <1191865507.18080.19.camel@blabla.mcs.anl.gov> <470A7997.4040303@mcs.anl.gov> Message-ID: The bit of kml that does the assignment is run in a sequential bit that sets up variables, before any of the parallel stuff happens (that usually consists of procedure calls, and is the part that ends up being evaluated in data dependency order rather than source text order). It makes sense to allow what you want to do, I think. -- From hategan at mcs.anl.gov Tue Oct 9 09:57:39 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 09 Oct 2007 09:57:39 -0500 Subject: [Swift-devel] Problem with @extractint? 
In-Reply-To: References: <470A3DE7.1060408@mcs.anl.gov> <470A6BD9.8050903@mcs.anl.gov> <1191865507.18080.19.camel@blabla.mcs.anl.gov> <470A7997.4040303@mcs.anl.gov> Message-ID: <1191941859.5743.1.camel@blabla.mcs.anl.gov> On Tue, 2007-10-09 at 13:22 +0000, Ben Clifford wrote: > The bit of kml that does the assignment is run in a sequential bit that > sets up variables, before any of the parallel stuff happens (that usually > consists of procedure calls, and is the part that ends up being evaluated > in data dependency order rather than source text order). > > It makes sense to allow what you want to do, I think. There was some discussion about removing the @ sign in front of built-in functions. There is no need for the distinction, and, apparently, it does cause problems. > From wilde at mcs.anl.gov Tue Oct 9 10:38:27 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 09 Oct 2007 10:38:27 -0500 Subject: [Swift-devel] Problem with @extractint? In-Reply-To: <1191941859.5743.1.camel@blabla.mcs.anl.gov> References: <470A3DE7.1060408@mcs.anl.gov> <470A6BD9.8050903@mcs.anl.gov> <1191865507.18080.19.camel@blabla.mcs.anl.gov> <470A7997.4040303@mcs.anl.gov> <1191941859.5743.1.camel@blabla.mcs.anl.gov> Message-ID: <470BA073.2030201@mcs.anl.gov> If I understood this case right, the data dependencies were logically correct but didn't behave so because @extractint() wasn't treated as a first-class value. (Mihael, can you clarify if I got this wrong?) I need to gather all my notes, but one point is that in the userguide and tutorial, early on, we should document how the data flow model works, how it's central to Swift, and some examples of how it can cause program behavior to be "surprising" (e.g. when the statements in a procedure execute in reverse order, or when a statement in a calling function executes while a called function is still active, triggered by events in the callee).
Until one gets this model, Swift often seems to violate the "principle of least astonishment" ;) I'll save discussion on this till I get my notes out. I think the model is fine, and that we need to better understand how it affects programming and how to train users to use it (and debug in it). On 10/9/07 9:57 AM, Mihael Hategan wrote: > On Tue, 2007-10-09 at 13:22 +0000, Ben Clifford wrote: >> The bit of kml that does the assignment is run in a sequential bit that >> sets up variables, before any of the parallel stuff happens (that usually >> consists of procedure calls, and is the part that ends up being evaluated >> in data dependency order rather than source text order). >> >> It makes sense to allow what you want to do, I think. > > There was some discussion about removing the @ sign in front of built-in > functions. There is no need for the distinction, and, apparently, it > does cause problems. > > > From hategan at mcs.anl.gov Tue Oct 9 10:43:38 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 09 Oct 2007 10:43:38 -0500 Subject: [Swift-devel] Problem with @extractint? In-Reply-To: <470BA073.2030201@mcs.anl.gov> References: <470A3DE7.1060408@mcs.anl.gov> <470A6BD9.8050903@mcs.anl.gov> <1191865507.18080.19.camel@blabla.mcs.anl.gov> <470A7997.4040303@mcs.anl.gov> <1191941859.5743.1.camel@blabla.mcs.anl.gov> <470BA073.2030201@mcs.anl.gov> Message-ID: <1191944618.7415.2.camel@blabla.mcs.anl.gov> On Tue, 2007-10-09 at 10:38 -0500, Michael Wilde wrote: > If I understood this case right, the data dependencies were logically > correct but didnt behave so because @extractint() wasnt treated as a > first-class value. (Mihael, can you clarify, if I got this wrong) It's what Ben says below. 
From benc at hawaga.org.uk Tue Oct 9 10:46:16 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 9 Oct 2007 15:46:16 +0000 (GMT)
Subject: [Swift-devel] Problem with @extractint?
In-Reply-To: <470BA073.2030201@mcs.anl.gov> References: <470A3DE7.1060408@mcs.anl.gov> <470A6BD9.8050903@mcs.anl.gov> <1191865507.18080.19.camel@blabla.mcs.anl.gov> <470A7997.4040303@mcs.anl.gov> <1191941859.5743.1.camel@blabla.mcs.anl.gov> <470BA073.2030201@mcs.anl.gov> Message-ID: On Tue, 9 Oct 2007, Michael Wilde wrote: > If I understood this case right, the data dependencies were logically correct > but didnt behave so because @extractint() wasnt treated as a first-class > value. (Mihael, can you clarify, if I got this wrong) It behaves differently because it is used in an assignment statement in the broken case and not in an assignment statement in the working case. Assignments compile to very different looking code than, for example, procedure calls (and different kinds of assignments compile to different kinds of code, too). There's certainly scope for reworking some of that behaviour based on changes that have happened over the past year. > Until one gets this model, Swift often seems to violate the "principle > of least astonishment" ;) There are also many rough edges in both the model and the implementation (such as the behaviour you encountered). -- From yongzh at cs.uchicago.edu Tue Oct 9 20:09:34 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Tue, 9 Oct 2007 20:09:34 -0500 (CDT) Subject: [Swift-devel] Problem with @extractint? In-Reply-To: <1191941859.5743.1.camel@blabla.mcs.anl.gov> References: <470A3DE7.1060408@mcs.anl.gov> <470A6BD9.8050903@mcs.anl.gov> <1191865507.18080.19.camel@blabla.mcs.anl.gov> <470A7997.4040303@mcs.anl.gov> <1191941859.5743.1.camel@blabla.mcs.anl.gov> Message-ID: We had discussions about mapping functions vs mappers too. Essentially readdata is doing what a mapper (an advanced CSV mapper) should do. I am not sure whether this is the right direction though, as 'readdata' might just have been implemented as a specialized karajan function, instead of being a swift level mapper or mapping function. 
So for mapping, there are two choices: either the type info is passed in
to the mapping, which will try to interpret the values according to the
types/member-types passed in, or the mapping is type agnostic, and
extractint, extractstring, extractdata, whatever, is applied to a generic
value to get the actual typed data back. I can see cases where this may
not work well.

Yong.

On Tue, 9 Oct 2007, Mihael Hategan wrote:
> There was some discussion about removing the @ sign in front of built-in
> functions. There is no need for the distinction, and, apparently, it
> does cause problems.
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

From hategan at mcs.anl.gov Tue Oct 9 20:16:07 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 09 Oct 2007 20:16:07 -0500
Subject: [Swift-devel] Problem with @extractint?
In-Reply-To: 
References: <470A3DE7.1060408@mcs.anl.gov> <470A6BD9.8050903@mcs.anl.gov> <1191865507.18080.19.camel@blabla.mcs.anl.gov> <470A7997.4040303@mcs.anl.gov> <1191941859.5743.1.camel@blabla.mcs.anl.gov>
Message-ID: <1191978967.20623.4.camel@blabla.mcs.anl.gov>

On Tue, 2007-10-09 at 20:09 -0500, Yong Zhao wrote:
> We had discussions about mapping functions vs mappers too. Essentially
> readdata is doing what a mapper (an advanced CSV mapper) should do. I am
> not sure whether this is the right direction though, as 'readdata' might
> just have been implemented as a specialized karajan function, instead of
> being a swift level mapper or mapping function.

It is a specialized karajan function, not a mapping function. Thing is,
I don't really like the special treatment that mapping functions get. I
don't think there's a need for such special treatment, because it isn't
needed. Unless I'm missing something.

From hategan at mcs.anl.gov Tue Oct 9 22:45:44 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 09 Oct 2007 22:45:44 -0500
Subject: [Swift-devel] Problem with @extractint?
In-Reply-To: <1191978967.20623.4.camel@blabla.mcs.anl.gov>
References: <470A3DE7.1060408@mcs.anl.gov> <470A6BD9.8050903@mcs.anl.gov> <1191865507.18080.19.camel@blabla.mcs.anl.gov> <470A7997.4040303@mcs.anl.gov> <1191941859.5743.1.camel@blabla.mcs.anl.gov> <1191978967.20623.4.camel@blabla.mcs.anl.gov>
Message-ID: <1191987944.7542.3.camel@blabla.mcs.anl.gov>

On Tue, 2007-10-09 at 20:16 -0500, Mihael Hategan wrote:
> I don't think there's a need for such special treatment, because it isn't needed.

Hmm. Good argument there Mihael. Tell us more of these interesting
inferences!

From benc at hawaga.org.uk Wed Oct 10 04:44:03 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 10 Oct 2007 09:44:03 +0000 (GMT)
Subject: [Swift-devel] Problem with @extractint?
In-Reply-To: <1191978967.20623.4.camel@blabla.mcs.anl.gov>
References: <470A3DE7.1060408@mcs.anl.gov> <470A6BD9.8050903@mcs.anl.gov> <1191865507.18080.19.camel@blabla.mcs.anl.gov> <470A7997.4040303@mcs.anl.gov> <1191941859.5743.1.camel@blabla.mcs.anl.gov> <1191978967.20623.4.camel@blabla.mcs.anl.gov>
Message-ID: 

On Tue, 9 Oct 2007, Mihael Hategan wrote:
> It is a specialized karajan function, not a mapping function. Thing is,
> I don't really like the special treatment that mapping functions get. I
> don't think there's a need for such special treatment, because it isn't
> needed. Unless I'm missing something.

I think making all procedures/functions look the same syntax-wise is the
right thing to do. It will need some adjustments in the way expressions
are evaluated; that's some work but straightforward.

--

From benc at hawaga.org.uk Wed Oct 10 05:23:16 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 10 Oct 2007 10:23:16 +0000 (GMT)
Subject: [Swift-devel] readData
Message-ID: 

Mihael added the below language construct to the language the other day.

This might be useful where the csv_mapper was being used before to read in
non-file data.

It's in the SVN.

Mihael Hategan wrote:

There's a new function: readData. It's not an @function, so don't use it
that way because it won't work (it needs to know what variable it
assigns to, so that it knows how to interpret the contents of the
file).
It can read primitive things, arrays of primitive things, structs and
arrays of structs.

It can either take a file or a string as a parameter, although I
recommend the former since it can deal with data dependencies.

For example usage, see tests/language-behaviour/readData.swift.

Here's a short preview:
type circle {
  int x;
  int y;
  float r;
  string name;
}

circle ca[];

ca = readData("readData.circleArray.in");

readData.circleArray.in:
x y r name
1 1 5 CircleOne
2 2 7 CircleTwo

It doesn't deal with spaces in strings in the CSV format for now, but
it's a start.

Mihael

--

From wilde at mcs.anl.gov Wed Oct 10 09:00:22 2007
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 10 Oct 2007 09:00:22 -0500
Subject: [Swift-devel] readData
In-Reply-To: 
References: 
Message-ID: <470CDAF6.7000206@mcs.anl.gov>

Mihael, all - readdata() works great, and I think gives Andrew exactly
what he asked for.

I updated the example parameter-sweep loop to use readdata to grab the
multi-column input file.

One note: as far as I can tell, you use the convention that data columns
must be exactly 16 characters wide, space-separated. Is that correct? (I
assume we'll generalize this as time permits.)

Here's the new example, Andrew.

- Mike

type file;

// Simulate encapsulating an app's parameters as a struct

type params {
  int x;
  int y;
  float r;
  boolean b;
  string infilename;
  string outfilename;
};

// Simulate an app

myapp(params p, file infile, file outfile )
{
  app {
    db "pecho:" p.x p.y p.r p.b p.infilename p.outfilename @infile
       @outfile ;
  }
}

// Loop over the parameter array, calling app in parallel

doall(params plist[])
{
  foreach pval,j in plist {

    // convert filename string to mapped file reference
    file infile ;
    file outfile ;

    // Call the application
    myapp(pval,infile, outfile);
  }
}

// Main

params plist[];
plist = readdata("parameters");
doall(plist);

// Data File "parameters" follows. Data files listed in it must exist.
// each line is greater than 80 bytes and is only wrapped here by email // (actual files attached) x y r b infilename outfilename 1 2 1.234 1 inf001.data outf001.data 3 4 5.678 0 inf002.data outf002.data On 10/10/07 5:23 AM, Ben Clifford wrote: > Mihael added the below language construct to the language the other day. > > This might be useful where the csv_mapper was being used before to read in > non-file data. > > Its in the SVN. > > > Mihael Hategan wrote: > > There's a new function: readData. It's not an @function, so don't use it > that way because it won't work (it needs to know what variable it > assigns to, so that it knows how to interpret the contents of the > file). > > It can read primitive things, arrays of primitive things, structs and > arrays of structs. > > It can either take a file or a string as a parameter, although I > recommend the former since it can deal with data dependencies. > > For example usage, see tests/language-behaviour/readData.swift. > > Here's a short preview: > type circle { > int x; > int y; > float r; > string name; > } > > circle ca[]; > > ca = readData("readData.circleArray.in"); > > readData.circleArray.in: > x y r name > 1 1 5 CircleOne > 2 2 7 CircleTwo > > It doesn't deal with spaces in strings in the CSV format for now, but > it's a start. > > Mihael > -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: t5g.swift URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: parameters URL: 

From hategan at mcs.anl.gov Wed Oct 10 09:31:08 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 10 Oct 2007 09:31:08 -0500
Subject: [Swift-devel] readData
In-Reply-To: <470CDAF6.7000206@mcs.anl.gov>
References: <470CDAF6.7000206@mcs.anl.gov>
Message-ID: <1192026668.3339.4.camel@blabla.mcs.anl.gov>

On Wed, 2007-10-10 at 09:00 -0500, Michael Wilde wrote:
> Mihael, all - readdata() works great, and I think gives Andrew exactly
> what he asked for.
>
> One note: as far as I can tell, you use the conventions the data columns
> must be exactly 16 characters wide, space separated. Is that correct? (I
> assume we'll generalize this time permits).

No. They must be horizontal-whitespace separated. The 16 characters wide
restriction does not exist. The following is valid:
a b c d
1 2 3 4
5 6 7 8
9 10 11 12

From wilde at mcs.anl.gov Wed Oct 10 10:12:10 2007
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 10 Oct 2007 10:12:10 -0500
Subject: [Swift-devel] readData
In-Reply-To: <1192026668.3339.4.camel@blabla.mcs.anl.gov>
References: <470CDAF6.7000206@mcs.anl.gov> <1192026668.3339.4.camel@blabla.mcs.anl.gov>
Message-ID: <470CEBCA.3010809@mcs.anl.gov>

That's not what I encountered when I tested (which surprised me).
I will retest and see what confused me (or my code).

- Mike

On 10/10/07 9:31 AM, Mihael Hategan wrote:
> No. They must be horizontal-whitespace separated. The 16 characters wide
> restriction does not exist.
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>>
http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

From wilde at mcs.anl.gov Wed Oct 10 10:34:09 2007
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 10 Oct 2007 10:34:09 -0500
Subject: [Swift-devel] readData
In-Reply-To: <470CEBCA.3010809@mcs.anl.gov>
References: <470CDAF6.7000206@mcs.anl.gov> <1192026668.3339.4.camel@blabla.mcs.anl.gov> <470CEBCA.3010809@mcs.anl.gov>
Message-ID: <470CF0F1.6070106@mcs.anl.gov>

OK, what seems to be happening is that a simple script using readdata
fails occasionally. I.e., some kind of race.

Here's the test script and data file:

-- script t30.swift:
type circle {
  int x;
  int y;
}

circle ca;

ca = readdata("d1");
print(ca.x," ",ca.y);

-- and file d1:
x y
1 10
2 20
$

Running this 6 times, I got 4 successes and 2 failures (of which 5 runs
are shown here with a few extraneous intervening commands removed):

...
RunID: 20071010-1024-1h41p37d
1 10
$ swift -sites.file ./sites.xml -tc.file ./tc.data t30.swift
Swift v0.3-dev r1339
RunID: 20071010-1024-kiimrf7f
1 10
$ swift -sites.file ./sites.xml -tc.file ./tc.data t30.swift
Swift v0.3-dev r1339
RunID: 20071010-1024-49v2bnzb
1 10
$ swift -sites.file ./sites.xml -tc.file ./tc.data t30.swift
Swift v0.3-dev r1339
RunID: 20071010-1025-m0qmf0hc
1 10
$ swift -sites.file ./sites.xml -tc.file ./tc.data t30.swift
Swift v0.3-dev r1339
RunID: 20071010-1025-cicpxii8
org.griphyn.vdl.mapping.DataNode with no value at dataset=ca path=.x
org.griphyn.vdl.mapping.DataNode with no value at dataset=ca path=.y
$

On 10/10/07 10:12 AM, Michael Wilde wrote:
> That's not what I encountered when I tested (which surprised me).
> I will retest and see what confused me (or my code).
>
> - Mike

From hategan at mcs.anl.gov Wed Oct 10 10:32:44 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 10 Oct 2007 10:32:44 -0500
Subject: [Swift-devel] readData
In-Reply-To: <470CEBCA.3010809@mcs.anl.gov>
References: <470CDAF6.7000206@mcs.anl.gov> <1192026668.3339.4.camel@blabla.mcs.anl.gov> <470CEBCA.3010809@mcs.anl.gov>
Message-ID: <1192030364.5666.1.camel@blabla.mcs.anl.gov>

String[] cols = line.split("\\s+");

Maybe I'm missing something.

On Wed, 2007-10-10 at 10:12 -0500, Michael Wilde wrote:
> That's not what I encountered when I tested (which surprised me).
> I will retest and see what confused me (or my code).
>
> - Mike

From benc at hawaga.org.uk Wed Oct 10 10:38:58 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed,
10 Oct 2007 15:38:58 +0000 (GMT) Subject: [Swift-devel] readData In-Reply-To: <470CF0F1.6070106@mcs.anl.gov> References: <470CDAF6.7000206@mcs.anl.gov> <1192026668.3339.4.camel@blabla.mcs.anl.gov> <470CEBCA.3010809@mcs.anl.gov> <470CF0F1.6070106@mcs.anl.gov> Message-ID: You may not be experiencing a race in readData. You may instead be experiencing a race in the implementation of print. It will print out what it knows about its parameters at the time that it's encountered. Sometimes, that is the value (if the value has been set); sometimes it's the other output you see, a description of the dataset indicating that it doesn't have a value yet. It's probably desirable for print to wait for its parameter to be closed. On Wed, 10 Oct 2007, Michael Wilde wrote: > OK, what seems to be happening is that a simple script using readdata fails > occasionally. I.e., some kind of race. > > Here's the test script and data file: > > -- script t30.swift: > type circle { > int x; > int y; > } > > circle ca; > > ca = readdata("d1"); > print(ca.x," ",ca.y); > > -- and file d1: > x y > 1 10 > 2 20 > $ > > Running this 6 times, I got 4 successes and 2 failures (of which 5 runs are > shown here with a few extraneous intervening commands removed): > > ... 
> RunID: 20071010-1024-1h41p37d > 1 10 > $ swift -sites.file ./sites.xml -tc.file ./tc.data t30.swift > Swift v0.3-dev r1339 > > RunID: 20071010-1024-kiimrf7f > 1 10 > $ swift -sites.file ./sites.xml -tc.file ./tc.data t30.swift > Swift v0.3-dev r1339 > > RunID: 20071010-1024-49v2bnzb > 1 10 > $ swift -sites.file ./sites.xml -tc.file ./tc.data t30.swift > Swift v0.3-dev r1339 > > RunID: 20071010-1025-m0qmf0hc > 1 10 > $ swift -sites.file ./sites.xml -tc.file ./tc.data t30.swift > Swift v0.3-dev r1339 > > RunID: 20071010-1025-cicpxii8 > org.griphyn.vdl.mapping.DataNode with no value at dataset=ca path=.x > org.griphyn.vdl.mapping.DataNode with no value at dataset=ca path=.y > $ > > > > On 10/10/07 10:12 AM, Michael Wilde wrote: > > Thats not what I encountered when I tested (which surprised me). > > I will restest and see what confused me (or my code). > > > > - Mike > > > > On 10/10/07 9:31 AM, Mihael Hategan wrote: > > > On Wed, 2007-10-10 at 09:00 -0500, Michael Wilde wrote: > > > > Mihael, all - readdata() works great, and I think gives Andrew exactly > > > > what he asked for. > > > > > > > > I updated the example parameter-sweep loop to use readdata to grab the > > > > multi-column input file. > > > > > > > > One note: as far as I can tell, you use the conventions the data columns > > > > must be exactly 16 characters wide, space separated. Is that correct? (I > > > > assume we'll generalize this time permits). > > > > > > No. They must be horizontal-whitespace separated. The 16 characters wide > > > restriction does not exist. The following is valid: > > > a b c d > > > 1 2 3 4 > > > 5 6 7 8 > > > 9 10 11 12 > > > > > > > Here's the new example, Andrew. 
> > > > > > > > - Mike > > > > > > > > type file; > > > > > > > > // Simulate encapsulating an app's parameters as a struct > > > > > > > > type params { > > > > int x; > > > > int y; > > > > float r; > > > > boolean b; > > > > string infilename; > > > > string outfilename; > > > > }; > > > > > > > > // Simulate an app > > > > > > > > myapp(params p, file infile, file outfile ) > > > > { > > > > app { > > > > db "pecho:" p.x p.y p.r p.b p.infilename p.outfilename @infile > > > > @outfile ; > > > > } > > > > } > > > > > > > > // Loop over the parameter array, calling app in parallel > > > > > > > > doall(params plist[]) > > > > { > > > > foreach pval,j in plist { > > > > > > > > // convert filename string to mapped file reference > > > > file infile ; > > > > file outfile ; > > > > > > > > // Call the application > > > > myapp(pval,infile, outfile); > > > > } > > > > } > > > > > > > > // Main > > > > > > > > params plist[]; > > > > plist = readdata("parameters"); > > > > doall(plist); > > > > > > > > // Data File "parameters" follows. Data files listed in it must exist. > > > > // each line is greater than 80 bytes and is only wrapped here by email > > > > // (actual files attached) > > > > > > > > x y r b infilename > > > > outfilename > > > > 1 2 1.234 1 inf001.data > > > > outf001.data > > > > 3 4 5.678 0 inf002.data > > > > outf002.data > > > > > > > > > > > > > > > > > > > > > > > > On 10/10/07 5:23 AM, Ben Clifford wrote: > > > > > Mihael added the below language construct to the language the other > > > > > day. > > > > > > > > > > This might be useful where the csv_mapper was being used before to > > > > > read in non-file data. > > > > > > > > > > Its in the SVN. > > > > > > > > > > > > > > > Mihael Hategan wrote: > > > > > > > > > > There's a new function: readData. 
It's not an @function, so don't use > > > > > it > > > > > that way because it won't work (it needs to know what variable it > > > > > assigns to, so that it knows how to interpret the contents of the > > > > > file). > > > > > It can read primitive things, arrays of primitive things, structs and > > > > > arrays of structs. > > > > > It can either take a file or a string as a parameter, although I > > > > > recommend the former since it can deal with data dependencies. > > > > > > > > > > For example usage, see tests/language-behaviour/readData.swift. > > > > > > > > > > Here's a short preview: > > > > > type circle { > > > > > int x; > > > > > int y; > > > > > float r; > > > > > string name; > > > > > } > > > > > > > > > > circle ca[]; > > > > > > > > > > ca = readData("readData.circleArray.in"); > > > > > > > > > > readData.circleArray.in: > > > > > x y r name > > > > > 1 1 5 CircleOne > > > > > 2 2 7 CircleTwo > > > > > > > > > > It doesn't deal with spaces in strings in the CSV format for now, but > > > > > it's a start. 
> > > > > > > > > > Mihael > > > > > > > > > plain text document attachment (t5g.swift) > > > > type file; > > > > > > > > // Simulate encapsulating an app's parameters as a struct > > > > > > > > type params { > > > > int x; > > > > int y; > > > > float r; > > > > boolean b; > > > > string infilename; > > > > string outfilename; > > > > }; > > > > > > > > // Simulate an app > > > > > > > > myapp(params p, file infile, file outfile ) > > > > { app { db "pecho:" p.x p.y p.r p.b p.infilename > > > > p.outfilename @infile @outfile ; > > > > } } // Loop over the parameter array, calling app in parallel > > > > > > > > doall(params plist[]) > > > > { > > > > foreach pval,j in plist { > > > > > > > > // convert filename string to mapped file reference > > > > file infile ; > > > > file outfile ; > > > > > > > > // Call the application > > > > myapp(pval,infile, outfile); > > > > } > > > > } > > > > > > > > // Main > > > > > > > > params plist[]; > > > > plist = readdata("parameters"); > > > > doall(plist); > > > > plain text document attachment (parameters) > > > > x y r b > > > > infilename outfilename > > > > 1 2 1.234 1 > > > > inf001.data outf001.data > > > > 3 4 5.678 0 > > > > inf002.data outf002.data > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From benc at hawaga.org.uk Wed Oct 10 11:12:47 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 10 Oct 2007 16:12:47 +0000 (GMT) Subject: [Swift-devel] readData In-Reply-To: References: 
<470CDAF6.7000206@mcs.anl.gov> <1192026668.3339.4.camel@blabla.mcs.anl.gov> <470CEBCA.3010809@mcs.anl.gov> <470CF0F1.6070106@mcs.anl.gov> Message-ID: if you want a hack to stop the below, you can say something like: print(@strcat(a,"")) instead of print(a) I think. On Wed, 10 Oct 2007, Ben Clifford wrote: > > You may not be experiencing a race in readData. > > You may instead be experiencing a race in the implementation of print. It > will print out what it knows about its parameters at the time that its > encountered. Sometimes, that is the value (if the value has been set); > sometimes its the other output you see, a description of the dataset > indicating that it doesn't have a value yet. > > Its probably desirable for print to wait for its parameter to be closed. > > On Wed, 10 Oct 2007, Michael Wilde wrote: > > > OK, what seems to be happening is that a simple script using readdata fails > > occasionally. Ie, some kind of race. > > > > Heres the test script and data file: > > > > -- script t30.swift: > > type circle { > > int x; > > int y; > > } > > > > circle ca; > > > > ca = readdata("d1"); > > print(ca.x," ",ca.y); > > > > -- and file d1: > > x y > > 1 10 > > 2 20 > > $ > > > > Running this 6 times, I got 4 successes and 2 failures (of which 5 runs are > > shown here with a few extraneous intevening commands removed): > > > > ... 
> > RunID: 20071010-1024-1h41p37d > > 1 10 > > $ swift -sites.file ./sites.xml -tc.file ./tc.data t30.swift > > Swift v0.3-dev r1339 > > > > RunID: 20071010-1024-kiimrf7f > > 1 10 > > $ swift -sites.file ./sites.xml -tc.file ./tc.data t30.swift > > Swift v0.3-dev r1339 > > > > RunID: 20071010-1024-49v2bnzb > > 1 10 > > $ swift -sites.file ./sites.xml -tc.file ./tc.data t30.swift > > Swift v0.3-dev r1339 > > > > RunID: 20071010-1025-m0qmf0hc > > 1 10 > > $ swift -sites.file ./sites.xml -tc.file ./tc.data t30.swift > > Swift v0.3-dev r1339 > > > > RunID: 20071010-1025-cicpxii8 > > org.griphyn.vdl.mapping.DataNode with no value at dataset=ca path=.x > > org.griphyn.vdl.mapping.DataNode with no value at dataset=ca path=.y > > $ > > > > > > > > On 10/10/07 10:12 AM, Michael Wilde wrote: > > > Thats not what I encountered when I tested (which surprised me). > > > I will restest and see what confused me (or my code). > > > > > > - Mike > > > > > > On 10/10/07 9:31 AM, Mihael Hategan wrote: > > > > On Wed, 2007-10-10 at 09:00 -0500, Michael Wilde wrote: > > > > > Mihael, all - readdata() works great, and I think gives Andrew exactly > > > > > what he asked for. > > > > > > > > > > I updated the example parameter-sweep loop to use readdata to grab the > > > > > multi-column input file. > > > > > > > > > > One note: as far as I can tell, you use the conventions the data columns > > > > > must be exactly 16 characters wide, space separated. Is that correct? (I > > > > > assume we'll generalize this time permits). > > > > > > > > No. They must be horizontal-whitespace separated. The 16 characters wide > > > > restriction does not exist. The following is valid: > > > > a b c d > > > > 1 2 3 4 > > > > 5 6 7 8 > > > > 9 10 11 12 > > > > > > > > > Here's the new example, Andrew. 
> > > > > > > > > > - Mike > > > > > > > > > > type file; > > > > > > > > > > // Simulate encapsulating an app's parameters as a struct > > > > > > > > > > type params { > > > > > int x; > > > > > int y; > > > > > float r; > > > > > boolean b; > > > > > string infilename; > > > > > string outfilename; > > > > > }; > > > > > > > > > > // Simulate an app > > > > > > > > > > myapp(params p, file infile, file outfile ) > > > > > { > > > > > app { > > > > > db "pecho:" p.x p.y p.r p.b p.infilename p.outfilename @infile > > > > > @outfile ; > > > > > } > > > > > } > > > > > > > > > > // Loop over the parameter array, calling app in parallel > > > > > > > > > > doall(params plist[]) > > > > > { > > > > > foreach pval,j in plist { > > > > > > > > > > // convert filename string to mapped file reference > > > > > file infile ; > > > > > file outfile ; > > > > > > > > > > // Call the application > > > > > myapp(pval,infile, outfile); > > > > > } > > > > > } > > > > > > > > > > // Main > > > > > > > > > > params plist[]; > > > > > plist = readdata("parameters"); > > > > > doall(plist); > > > > > > > > > > // Data File "parameters" follows. Data files listed in it must exist. > > > > > // each line is greater than 80 bytes and is only wrapped here by email > > > > > // (actual files attached) > > > > > > > > > > x y r b infilename > > > > > outfilename > > > > > 1 2 1.234 1 inf001.data > > > > > outf001.data > > > > > 3 4 5.678 0 inf002.data > > > > > outf002.data > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On 10/10/07 5:23 AM, Ben Clifford wrote: > > > > > > Mihael added the below language construct to the language the other > > > > > > day. > > > > > > > > > > > > This might be useful where the csv_mapper was being used before to > > > > > > read in non-file data. > > > > > > > > > > > > Its in the SVN. > > > > > > > > > > > > > > > > > > Mihael Hategan wrote: > > > > > > > > > > > > There's a new function: readData. 
It's not an @function, so don't use > > > > > > it > > > > > > that way because it won't work (it needs to know what variable it > > > > > > assigns to, so that it knows how to interpret the contents of the > > > > > > file). > > > > > > It can read primitive things, arrays of primitive things, structs and > > > > > > arrays of structs. > > > > > > It can either take a file or a string as a parameter, although I > > > > > > recommend the former since it can deal with data dependencies. > > > > > > > > > > > > For example usage, see tests/language-behaviour/readData.swift. > > > > > > > > > > > > Here's a short preview: > > > > > > type circle { > > > > > > int x; > > > > > > int y; > > > > > > float r; > > > > > > string name; > > > > > > } > > > > > > > > > > > > circle ca[]; > > > > > > > > > > > > ca = readData("readData.circleArray.in"); > > > > > > > > > > > > readData.circleArray.in: > > > > > > x y r name > > > > > > 1 1 5 CircleOne > > > > > > 2 2 7 CircleTwo > > > > > > > > > > > > It doesn't deal with spaces in strings in the CSV format for now, but > > > > > > it's a start. 
> > > > > > > > > > > > Mihael > > > > > > > > > > > plain text document attachment (t5g.swift) > > > > > type file; > > > > > > > > > > // Simulate encapsulating an app's parameters as a struct > > > > > > > > > > type params { > > > > > int x; > > > > > int y; > > > > > float r; > > > > > boolean b; > > > > > string infilename; > > > > > string outfilename; > > > > > }; > > > > > > > > > > // Simulate an app > > > > > > > > > > myapp(params p, file infile, file outfile ) > > > > > { app { db "pecho:" p.x p.y p.r p.b p.infilename > > > > > p.outfilename @infile @outfile ; > > > > > } } // Loop over the parameter array, calling app in parallel > > > > > > > > > > doall(params plist[]) > > > > > { > > > > > foreach pval,j in plist { > > > > > > > > > > // convert filename string to mapped file reference > > > > > file infile ; > > > > > file outfile ; > > > > > > > > > > // Call the application > > > > > myapp(pval,infile, outfile); > > > > > } > > > > > } > > > > > > > > > > // Main > > > > > > > > > > params plist[]; > > > > > plist = readdata("parameters"); > > > > > doall(plist); > > > > > plain text document attachment (parameters) > > > > > x y r b > > > > > infilename outfilename > > > > > 1 2 1.234 1 > > > > > inf001.data outf001.data > > > > > 3 4 5.678 0 > > > > > inf002.data outf002.data > > > > > _______________________________________________ > > > > > Swift-devel mailing list > > > > > Swift-devel at ci.uchicago.edu > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > From wilde at mcs.anl.gov Wed Oct 10 11:28:17 2007 From: wilde at mcs.anl.gov 
(Michael Wilde) Date: Wed, 10 Oct 2007 11:28:17 -0500 Subject: [Swift-devel] readData In-Reply-To: References: <470CDAF6.7000206@mcs.anl.gov> <1192026668.3339.4.camel@blabla.mcs.anl.gov> <470CEBCA.3010809@mcs.anl.gov> <470CF0F1.6070106@mcs.anl.gov> Message-ID: <470CFDA1.7020907@mcs.anl.gov> On 10/10/07 10:38 AM, Ben Clifford wrote: > You may not be experiencing a race in readData. Indeed. I replaced print() with a function call and it works fine now. > > You may instead be experiencing a race in the implementation of print. It > will print out what it knows about its parameters at the time that its > encountered. Sometimes, that is the value (if the value has been set); > sometimes its the other output you see, a description of the dataset > indicating that it doesn't have a value yet. > > Its probably desirable for print to wait for its parameter to be closed. Right - it should behave like any other function imo. Is that readily fixable? - Mike > > On Wed, 10 Oct 2007, Michael Wilde wrote: > >> OK, what seems to be happening is that a simple script using readdata fails >> occasionally. Ie, some kind of race. >> >> Heres the test script and data file: >> >> -- script t30.swift: >> type circle { >> int x; >> int y; >> } >> >> circle ca; >> >> ca = readdata("d1"); >> print(ca.x," ",ca.y); >> >> -- and file d1: >> x y >> 1 10 >> 2 20 >> $ >> >> Running this 6 times, I got 4 successes and 2 failures (of which 5 runs are >> shown here with a few extraneous intevening commands removed): >> >> ... 
>> RunID: 20071010-1024-1h41p37d >> 1 10 >> $ swift -sites.file ./sites.xml -tc.file ./tc.data t30.swift >> Swift v0.3-dev r1339 >> >> RunID: 20071010-1024-kiimrf7f >> 1 10 >> $ swift -sites.file ./sites.xml -tc.file ./tc.data t30.swift >> Swift v0.3-dev r1339 >> >> RunID: 20071010-1024-49v2bnzb >> 1 10 >> $ swift -sites.file ./sites.xml -tc.file ./tc.data t30.swift >> Swift v0.3-dev r1339 >> >> RunID: 20071010-1025-m0qmf0hc >> 1 10 >> $ swift -sites.file ./sites.xml -tc.file ./tc.data t30.swift >> Swift v0.3-dev r1339 >> >> RunID: 20071010-1025-cicpxii8 >> org.griphyn.vdl.mapping.DataNode with no value at dataset=ca path=.x >> org.griphyn.vdl.mapping.DataNode with no value at dataset=ca path=.y >> $ >> >> >> >> On 10/10/07 10:12 AM, Michael Wilde wrote: >>> Thats not what I encountered when I tested (which surprised me). >>> I will restest and see what confused me (or my code). >>> >>> - Mike >>> >>> On 10/10/07 9:31 AM, Mihael Hategan wrote: >>>> On Wed, 2007-10-10 at 09:00 -0500, Michael Wilde wrote: >>>>> Mihael, all - readdata() works great, and I think gives Andrew exactly >>>>> what he asked for. >>>>> >>>>> I updated the example parameter-sweep loop to use readdata to grab the >>>>> multi-column input file. >>>>> >>>>> One note: as far as I can tell, you use the conventions the data columns >>>>> must be exactly 16 characters wide, space separated. Is that correct? (I >>>>> assume we'll generalize this time permits). >>>> No. They must be horizontal-whitespace separated. The 16 characters wide >>>> restriction does not exist. The following is valid: >>>> a b c d >>>> 1 2 3 4 >>>> 5 6 7 8 >>>> 9 10 11 12 >>>> >>>>> Here's the new example, Andrew. 
>>>>> >>>>> - Mike >>>>> >>>>> type file; >>>>> >>>>> // Simulate encapsulating an app's parameters as a struct >>>>> >>>>> type params { >>>>> int x; >>>>> int y; >>>>> float r; >>>>> boolean b; >>>>> string infilename; >>>>> string outfilename; >>>>> }; >>>>> >>>>> // Simulate an app >>>>> >>>>> myapp(params p, file infile, file outfile ) >>>>> { >>>>> app { >>>>> db "pecho:" p.x p.y p.r p.b p.infilename p.outfilename @infile >>>>> @outfile ; >>>>> } >>>>> } >>>>> >>>>> // Loop over the parameter array, calling app in parallel >>>>> >>>>> doall(params plist[]) >>>>> { >>>>> foreach pval,j in plist { >>>>> >>>>> // convert filename string to mapped file reference >>>>> file infile ; >>>>> file outfile ; >>>>> >>>>> // Call the application >>>>> myapp(pval,infile, outfile); >>>>> } >>>>> } >>>>> >>>>> // Main >>>>> >>>>> params plist[]; >>>>> plist = readdata("parameters"); >>>>> doall(plist); >>>>> >>>>> // Data File "parameters" follows. Data files listed in it must exist. >>>>> // each line is greater than 80 bytes and is only wrapped here by email >>>>> // (actual files attached) >>>>> >>>>> x y r b infilename >>>>> outfilename >>>>> 1 2 1.234 1 inf001.data >>>>> outf001.data >>>>> 3 4 5.678 0 inf002.data >>>>> outf002.data >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On 10/10/07 5:23 AM, Ben Clifford wrote: >>>>>> Mihael added the below language construct to the language the other >>>>>> day. >>>>>> >>>>>> This might be useful where the csv_mapper was being used before to >>>>>> read in non-file data. >>>>>> >>>>>> Its in the SVN. >>>>>> >>>>>> >>>>>> Mihael Hategan wrote: >>>>>> >>>>>> There's a new function: readData. It's not an @function, so don't use >>>>>> it >>>>>> that way because it won't work (it needs to know what variable it >>>>>> assigns to, so that it knows how to interpret the contents of the >>>>>> file). >>>>>> It can read primitive things, arrays of primitive things, structs and >>>>>> arrays of structs. 
>>>>>> It can either take a file or a string as a parameter, although I >>>>>> recommend the former since it can deal with data dependencies. >>>>>> >>>>>> For example usage, see tests/language-behaviour/readData.swift. >>>>>> >>>>>> Here's a short preview: >>>>>> type circle { >>>>>> int x; >>>>>> int y; >>>>>> float r; >>>>>> string name; >>>>>> } >>>>>> >>>>>> circle ca[]; >>>>>> >>>>>> ca = readData("readData.circleArray.in"); >>>>>> >>>>>> readData.circleArray.in: >>>>>> x y r name >>>>>> 1 1 5 CircleOne >>>>>> 2 2 7 CircleTwo >>>>>> >>>>>> It doesn't deal with spaces in strings in the CSV format for now, but >>>>>> it's a start. >>>>>> >>>>>> Mihael >>>>>> >>>>> plain text document attachment (t5g.swift) >>>>> type file; >>>>> >>>>> // Simulate encapsulating an app's parameters as a struct >>>>> >>>>> type params { >>>>> int x; >>>>> int y; >>>>> float r; >>>>> boolean b; >>>>> string infilename; >>>>> string outfilename; >>>>> }; >>>>> >>>>> // Simulate an app >>>>> >>>>> myapp(params p, file infile, file outfile ) >>>>> { app { db "pecho:" p.x p.y p.r p.b p.infilename >>>>> p.outfilename @infile @outfile ; >>>>> } } // Loop over the parameter array, calling app in parallel >>>>> >>>>> doall(params plist[]) >>>>> { >>>>> foreach pval,j in plist { >>>>> >>>>> // convert filename string to mapped file reference >>>>> file infile ; >>>>> file outfile ; >>>>> >>>>> // Call the application >>>>> myapp(pval,infile, outfile); >>>>> } >>>>> } >>>>> >>>>> // Main >>>>> >>>>> params plist[]; >>>>> plist = readdata("parameters"); >>>>> doall(plist); >>>>> plain text document attachment (parameters) >>>>> x y r b >>>>> infilename outfilename >>>>> 1 2 1.234 1 >>>>> inf001.data outf001.data >>>>> 3 4 5.678 0 >>>>> inf002.data outf002.data >>>>> _______________________________________________ >>>>> Swift-devel mailing list >>>>> Swift-devel at ci.uchicago.edu >>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>> >>> 
_______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> >>> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> >> > > From benc at hawaga.org.uk Wed Oct 10 11:43:06 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 10 Oct 2007 16:43:06 +0000 (GMT) Subject: [Swift-devel] readData In-Reply-To: <470CFDA1.7020907@mcs.anl.gov> References: <470CDAF6.7000206@mcs.anl.gov> <1192026668.3339.4.camel@blabla.mcs.anl.gov> <470CEBCA.3010809@mcs.anl.gov> <470CF0F1.6070106@mcs.anl.gov> <470CFDA1.7020907@mcs.anl.gov> Message-ID: On Wed, 10 Oct 2007, Michael Wilde wrote: > Right - it should behave like any other function imo. Is that readily fixable? yes. -- From andrewj at uchicago.edu Wed Oct 10 12:11:39 2007 From: andrewj at uchicago.edu (andrewj at uchicago.edu) Date: Wed, 10 Oct 2007 12:11:39 -0500 (CDT) Subject: [Swift-devel] Swift Error! Message-ID: <20071010121139.AUU75068@m4500-00.uchicago.edu> Hello all, I am getting the following to happen when I try to run swift: [andrewj at terminable Swifty]$ swift -debug 10-10-07-CADWF.swift 10-10-07-CADWF.swift: source file is new. Recompiling. 
Validation of XML intermediate file was successful Using sites file: /home/andrewj/CADGrid/Swifty/vdsk-0.3/bin/../etc/sites.xml Using tc.data: /home/andrewj/CADGrid/Swifty/vdsk-0.3/bin/../etc/tc.data Swift v0.3 r1319 (modified locally) Swift v0.3 r1319 (modified locally) RunID: 20071010-1204-pvgabtv1 RunID: 20071010-1204-pvgabtv1 Waiting for org.griphyn.vdl.mapping.RootDataNode with no value at dataset=malFMatrix Waiting for org.griphyn.vdl.mapping.RootDataNode with no value at dataset=malFMatrix Waiting for org.griphyn.vdl.mapping.RootDataNode with no value at dataset=malFMatrix Waiting for org.griphyn.vdl.mapping.RootDataNode with no value at dataset=malFMatrix Waiting for org.griphyn.vdl.mapping.RootDataNode with no value at dataset=feature Waiting for org.griphyn.vdl.mapping.RootDataNode with no value at dataset=feature Waiting for org.griphyn.vdl.mapping.RootDataNode with no value at dataset=feature You guys are free to look at the swift code located in /home/andrewj/CADGrid/Swifty/ mounted on the ci disk. Suggestions? Thanks, Andrew From benc at hawaga.org.uk Wed Oct 10 12:14:46 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 10 Oct 2007 17:14:46 +0000 (GMT) Subject: [Swift-devel] Re: Swift Error! In-Reply-To: <20071010121139.AUU75068@m4500-00.uchicago.edu> References: <20071010121139.AUU75068@m4500-00.uchicago.edu> Message-ID: None of those messages are errors in themselves - they're debugging messages about what is going on inside. Do you mean to report that it then hangs 'forever'? On Wed, 10 Oct 2007, andrewj at uchicago.edu wrote: > Hello all, > > I am getting the following to happen when I try to run swift: > > [andrewj at terminable Swifty]$ swift -debug 10-10-07-CADWF.swift > 10-10-07-CADWF.swift: source file is new. Recompiling. 
> Validation of XML intermediate file was successful > Using sites file: > /home/andrewj/CADGrid/Swifty/vdsk-0.3/bin/../etc/sites.xml > Using tc.data: > /home/andrewj/CADGrid/Swifty/vdsk-0.3/bin/../etc/tc.data > Swift v0.3 r1319 (modified locally) > > Swift v0.3 r1319 (modified locally) > > RunID: 20071010-1204-pvgabtv1 > RunID: 20071010-1204-pvgabtv1 > Waiting for org.griphyn.vdl.mapping.RootDataNode with no value > at dataset=malFMatrix > Waiting for org.griphyn.vdl.mapping.RootDataNode with no value > at dataset=malFMatrix > Waiting for org.griphyn.vdl.mapping.RootDataNode with no value > at dataset=malFMatrix > Waiting for org.griphyn.vdl.mapping.RootDataNode with no value > at dataset=malFMatrix > Waiting for org.griphyn.vdl.mapping.RootDataNode with no value > at dataset=feature > Waiting for org.griphyn.vdl.mapping.RootDataNode with no value > at dataset=feature > Waiting for org.griphyn.vdl.mapping.RootDataNode with no value > at dataset=feature > > You guys are free to look at the swift code located in > /home/andrewj/CADGrid/Swifty/ mounted on the ci disk. > > Suggestions? > > Thanks, > Andrew > > From andrewj at uchicago.edu Wed Oct 10 12:16:31 2007 From: andrewj at uchicago.edu (andrewj at uchicago.edu) Date: Wed, 10 Oct 2007 12:16:31 -0500 (CDT) Subject: [Swift-devel] Re: Swift Error! Message-ID: <20071010121631.AUU75753@m4500-00.uchicago.edu> Well, I will give it 10 mins, and let you know. >none of those messgaes are errors in themselves - they're debugging >messages about what is going on inside. > >Do you mean to report that it then hangs 'forever'? > >On Wed, 10 Oct 2007, andrewj at uchicago.edu wrote: > >> Hello all, >> >> I am getting the following to happen when I try to run swift: >> >> [andrewj at terminable Swifty]$ swift -debug 10-10-07-CADWF.swift >> 10-10-07-CADWF.swift: source file is new. Recompiling. 
>> Validation of XML intermediate file was successful >> Using sites file: >> /home/andrewj/CADGrid/Swifty/vdsk-0.3/bin/../etc/sites.xml >> Using tc.data: >> /home/andrewj/CADGrid/Swifty/vdsk-0.3/bin/../etc/tc.data >> Swift v0.3 r1319 (modified locally) >> >> Swift v0.3 r1319 (modified locally) >> >> RunID: 20071010-1204-pvgabtv1 >> RunID: 20071010-1204-pvgabtv1 >> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >> at dataset=malFMatrix >> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >> at dataset=malFMatrix >> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >> at dataset=malFMatrix >> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >> at dataset=malFMatrix >> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >> at dataset=feature >> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >> at dataset=feature >> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >> at dataset=feature >> >> You guys are free to look at the swift code located in >> /home/andrewj/CADGrid/Swifty/ mounted on the ci disk. >> >> Suggestions? >> >> Thanks, >> Andrew >> >> From andrewj at uchicago.edu Wed Oct 10 12:17:23 2007 From: andrewj at uchicago.edu (andrewj at uchicago.edu) Date: Wed, 10 Oct 2007 12:17:23 -0500 (CDT) Subject: [Swift-devel] Re: Swift Error! Message-ID: <20071010121723.AUU75899@m4500-00.uchicago.edu> On that note, is there any way to grab more information in real time as to what is going on with swift? >none of those messages are errors in themselves - they're debugging >messages about what is going on inside. > >Do you mean to report that it then hangs 'forever'? > >On Wed, 10 Oct 2007, andrewj at uchicago.edu wrote: > >> Hello all, >> >> I am getting the following to happen when I try to run swift: >> >> [andrewj at terminable Swifty]$ swift -debug 10-10-07-CADWF.swift >> 10-10-07-CADWF.swift: source file is new. Recompiling. 
>> Validation of XML intermediate file was successful >> Using sites file: >> /home/andrewj/CADGrid/Swifty/vdsk-0.3/bin/../etc/sites.xml >> Using tc.data: >> /home/andrewj/CADGrid/Swifty/vdsk-0.3/bin/../etc/tc.data >> Swift v0.3 r1319 (modified locally) >> >> Swift v0.3 r1319 (modified locally) >> >> RunID: 20071010-1204-pvgabtv1 >> RunID: 20071010-1204-pvgabtv1 >> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >> at dataset=malFMatrix >> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >> at dataset=malFMatrix >> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >> at dataset=malFMatrix >> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >> at dataset=malFMatrix >> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >> at dataset=feature >> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >> at dataset=feature >> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >> at dataset=feature >> >> You guys are free to look at the swift code located in >> /home/andrewj/CADGrid/Swifty/ mounted on the ci disk. >> >> Suggestions? >> >> Thanks, >> Andrew >> >> From benc at hawaga.org.uk Wed Oct 10 12:17:38 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 10 Oct 2007 17:17:38 +0000 (GMT) Subject: [Swift-devel] Re: Swift Error! In-Reply-To: <20071010121139.AUU75068@m4500-00.uchicago.edu> References: <20071010121139.AUU75068@m4500-00.uchicago.edu> Message-ID: On Wed, 10 Oct 2007, andrewj at uchicago.edu wrote: > Swift v0.3 r1319 (modified locally) Mike sent you a more recent swift build - this looks like the 0.3 release code. -- From wilde at mcs.anl.gov Wed Oct 10 12:32:57 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 10 Oct 2007 12:32:57 -0500 Subject: [Swift-devel] Re: Swift Error! 
In-Reply-To: <20071010121723.AUU75899@m4500-00.uchicago.edu> References: <20071010121723.AUU75899@m4500-00.uchicago.edu> Message-ID: <470D0CC9.3060806@mcs.anl.gov>

Andrew, due to Swift's highly parallel nature, debugging is still tricky.

Ben and Mihael may have other suggestions, but I used the debug function (coded as "myapp") in the example I sent you, to print various variable values and to confirm that various functions were entered and exited.

I've got a lot of notes to write up (and userguide text to suggest) related to this, but in the meantime we need to give you a clear understanding of how variables and expression evaluation work in Swift.

In a nutshell:

- every statement in a function is evaluated in data-dependency order, not in source-code order
- variable instances can only be set once
- expressions, including function calls, that consume values wait until the value is set
- references to arrays as objects (i.e. by using one in a foreach) block until the array is set ("closed")
- arrays are closed only when they are returned as the result of a procedure (note: not "array members" = the entire array)
- print() does not currently wait for its arguments to close (it will soon: this is a deficiency)

(All - please correct/extend this as needed. I will try to write text and examples of each of these points for the userguide/tutorials, but please don't wait for me if you get the urge.)

Until you get the hang of this "data flow" model, writing even simple code can be tricky.

What's worse is that we don't yet have the ability to a) include line numbers in errors once you are out of the Swift parser, and b) it's hard to tell what you are hanging on when the whole script hangs.

That's why I urged you, and urge you again, Andrew, to start testing the program in small pieces, from the inside working outwards, so that you are building on known working code.
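[Editor's note: a minimal, hypothetical SwiftScript sketch (not from the thread) of the first three rules above — statements run in data-dependency order and every variable instance is set exactly once; the procedure p is invented for illustration:]

```
(string r) p(string s) {
    r = s;
}

string a;
string b;

b = p(a);       // consumes a, so this call waits until a has a value
a = "hello";    // source order is irrelevant; setting a unblocks the call above
// a = "world"; // would be illegal: a variable instance can only be set once
```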
This gets quite easy once you develop a solid mental model of how swift behaves, but getting there takes a few days of experimentation (at least it did for me, and Im about 75% there ;) I'll try to help you tomorrow, but others may be able to spot some problems in your source code. - Mike On 10/10/07 12:17 PM, andrewj at uchicago.edu wrote: > On that note, is there anyway to grab more information in real > time as to what is going on with swift? > > de at mcs.anl.gov>, swiftdevel >> none of those messgaes are errors in themselves - they're > debugging >> messages about what is going on inside. >> >> Do you mean to report that it then hangs 'forever'? >> >> On Wed, 10 Oct 2007, andrewj at uchicago.edu wrote: >> >>> Hello all, >>> >>> I am getting the following to happen when I try to run swift: >>> >>> [andrewj at terminable Swifty]$ swift -debug 10-10-07-CADWF.swift >>> 10-10-07-CADWF.swift: source file is new. Recompiling. >>> Validation of XML intermediate file was successful >>> Using sites file: >>> /home/andrewj/CADGrid/Swifty/vdsk-0.3/bin/../etc/sites.xml >>> Using tc.data: >>> /home/andrewj/CADGrid/Swifty/vdsk-0.3/bin/../etc/tc.data >>> Swift v0.3 r1319 (modified locally) >>> >>> Swift v0.3 r1319 (modified locally) >>> >>> RunID: 20071010-1204-pvgabtv1 >>> RunID: 20071010-1204-pvgabtv1 >>> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >>> at dataset=malFMatrix >>> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >>> at dataset=malFMatrix >>> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >>> at dataset=malFMatrix >>> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >>> at dataset=malFMatrix >>> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >>> at dataset=feature >>> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >>> at dataset=feature >>> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >>> at dataset=feature >>> >>> You guys are free to look at the 
swift code located in >>> /home/andrewj/CADGrid/Swifty/ mounted on the ci disk. >>> >>> Suggestions? >>> >>> Thanks, >>> Andrew >>> >>> > > From wilde at mcs.anl.gov Wed Oct 10 12:44:34 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 10 Oct 2007 12:44:34 -0500 Subject: [Swift-devel] Re: Swift Error! In-Reply-To: <20071010121631.AUU75753@m4500-00.uchicago.edu> References: <20071010121631.AUU75753@m4500-00.uchicago.edu> Message-ID: <470D0F82.4070404@mcs.anl.gov> You should try a test of everything running on localhost, with the smallest possible dataset sizes and runtimes (either set your params to achieve this or replace apps with quick dummies). This will help ensure that your script has basic data-flow sanity. as Ben points out, also, if you are going to try readdata you need to use the 1339 release I sent, or higher. The instructions to fetch and build the source are on the Download page, but for now I can post compiled devel snapshots for you, or you can grab from the nightly builds. - Mike On 10/10/07 12:16 PM, andrewj at uchicago.edu wrote: > Well, I will give it 10 mins, and let you know. > > >> none of those messgaes are errors in themselves - they're > debugging >> messages about what is going on inside. >> >> Do you mean to report that it then hangs 'forever'? >> >> On Wed, 10 Oct 2007, andrewj at uchicago.edu wrote: >> >>> Hello all, >>> >>> I am getting the following to happen when I try to run swift: >>> >>> [andrewj at terminable Swifty]$ swift -debug 10-10-07-CADWF.swift >>> 10-10-07-CADWF.swift: source file is new. Recompiling. 
>>> Validation of XML intermediate file was successful >>> Using sites file: >>> /home/andrewj/CADGrid/Swifty/vdsk-0.3/bin/../etc/sites.xml >>> Using tc.data: >>> /home/andrewj/CADGrid/Swifty/vdsk-0.3/bin/../etc/tc.data >>> Swift v0.3 r1319 (modified locally) >>> >>> Swift v0.3 r1319 (modified locally) >>> >>> RunID: 20071010-1204-pvgabtv1 >>> RunID: 20071010-1204-pvgabtv1 >>> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >>> at dataset=malFMatrix >>> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >>> at dataset=malFMatrix >>> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >>> at dataset=malFMatrix >>> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >>> at dataset=malFMatrix >>> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >>> at dataset=feature >>> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >>> at dataset=feature >>> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >>> at dataset=feature >>> >>> You guys are free to look at the swift code located in >>> /home/andrewj/CADGrid/Swifty/ mounted on the ci disk. >>> >>> Suggestions? >>> >>> Thanks, >>> Andrew >>> >>> > > From hategan at mcs.anl.gov Wed Oct 10 12:45:55 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 10 Oct 2007 12:45:55 -0500 Subject: [Swift-devel] Re: Swift Error! In-Reply-To: <470D0CC9.3060806@mcs.anl.gov> References: <20071010121723.AUU75899@m4500-00.uchicago.edu> <470D0CC9.3060806@mcs.anl.gov> Message-ID: <1192038355.15115.1.camel@blabla.mcs.anl.gov> On Wed, 2007-10-10 at 12:32 -0500, Michael Wilde wrote: > - references to arrays as objects (ie by using one n a foreach) block > until the array is set ("closed") Approximately. The foreach will start iterating on items as soon as they are available. The whole foreach will only complete when it's known that no more items will be added to an array (aka. the array is "closed"). > (All - please correct/extend this as needed. 
I will try to write text > and examples of each of these points for the userguide/tutorials but > please dont wait for me if you get the urge). > > Until you get the hang of this "data flow" model, writing even simple > code can be tricky. > > Whats worse is that we dont yet have the ability to a) include line > numbers in errors once you are out of the swift parser) and b) its hard > to tell what you are hanging on when the whole script hangs. > > Thats why I urged you, and urge you again, Andrew, to start testing the > program in small pieces, from the inside working outwards, so that you > are building on known working code. > > This gets quite easy once you develop a solid mental model of how swift > behaves, but getting there takes a few days of experimentation (at least > it did for me, and Im about 75% there ;) > > I'll try to help you tomorrow, but others may be able to spot some > problems in your source code. > > - Mike > > > On 10/10/07 12:17 PM, andrewj at uchicago.edu wrote: > > On that note, is there anyway to grab more information in real > > time as to what is going on with swift? > > > > de at mcs.anl.gov>, swiftdevel > >> none of those messgaes are errors in themselves - they're > > debugging > >> messages about what is going on inside. > >> > >> Do you mean to report that it then hangs 'forever'? > >> > >> On Wed, 10 Oct 2007, andrewj at uchicago.edu wrote: > >> > >>> Hello all, > >>> > >>> I am getting the following to happen when I try to run swift: > >>> > >>> [andrewj at terminable Swifty]$ swift -debug 10-10-07-CADWF.swift > >>> 10-10-07-CADWF.swift: source file is new. Recompiling. 
> >>> Validation of XML intermediate file was successful > >>> Using sites file: > >>> /home/andrewj/CADGrid/Swifty/vdsk-0.3/bin/../etc/sites.xml > >>> Using tc.data: > >>> /home/andrewj/CADGrid/Swifty/vdsk-0.3/bin/../etc/tc.data > >>> Swift v0.3 r1319 (modified locally) > >>> > >>> Swift v0.3 r1319 (modified locally) > >>> > >>> RunID: 20071010-1204-pvgabtv1 > >>> RunID: 20071010-1204-pvgabtv1 > >>> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value > >>> at dataset=malFMatrix > >>> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value > >>> at dataset=malFMatrix > >>> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value > >>> at dataset=malFMatrix > >>> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value > >>> at dataset=malFMatrix > >>> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value > >>> at dataset=feature > >>> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value > >>> at dataset=feature > >>> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value > >>> at dataset=feature > >>> > >>> You guys are free to look at the swift code located in > >>> /home/andrewj/CADGrid/Swifty/ mounted on the ci disk. > >>> > >>> Suggestions? > >>> > >>> Thanks, > >>> Andrew > >>> > >>> > > > > > From andrewj at uchicago.edu Wed Oct 10 13:36:10 2007 From: andrewj at uchicago.edu (andrewj at uchicago.edu) Date: Wed, 10 Oct 2007 13:36:10 -0500 (CDT) Subject: [Swift-devel] Re: Swift Error! Message-ID: <20071010133610.AUU90733@m4500-00.uchicago.edu> Mike, Thanks for the insight. You are right. I do need to attack this piece-wise. I guess I was hoping to get lucky! I think it is just the time constraint that is scarring me. Here goes it! Thanks, Andrew > >Andrew, due to Swift's highly parallel nature, debugging is stil tricky. 
> >Ben and Mihael may have other suggestions, but I used the debug function > (coded as "myapp") in the example I sent you, to print various >variable values and to confirm that various functions were entered and >exited. > >Ive got a lot of notes to write up (and userguid text to suggest) >related to this, but in the meantime we need to give you a clear >understanding of how variables and expression evaluation works in swift. > >In a nutshell: > >- every statement in a function is evaluated in data-dependency order, >not in source-code order >- variable instances can only be set once >- expressions, including function calls, that consume values wait until >the value is set >- references to arrays as objects (ie by using one n a foreach) block >until the array is set ("closed") >- arrays are closed only when they are returned as the result of a >procedure (note: not "array members" = the entire array) >- print() does not currently wait for its arguments to close (it will >soon: this is a deficiency) > >(All - please correct/extend this as needed. I will try to write text >and examples of each of these points for the userguide/tutorials but >please dont wait for me if you get the urge). > >Until you get the hang of this "data flow" model, writing even simple >code can be tricky. > >Whats worse is that we dont yet have the ability to a) include line >numbers in errors once you are out of the swift parser) and b) its hard >to tell what you are hanging on when the whole script hangs. > >Thats why I urged you, and urge you again, Andrew, to start testing the >program in small pieces, from the inside working outwards, so that you >are building on known working code. > >This gets quite easy once you develop a solid mental model of how swift >behaves, but getting there takes a few days of experimentation (at least >it did for me, and Im about 75% there ;) > >I'll try to help you tomorrow, but others may be able to spot some >problems in your source code. 
> >- Mike > > >On 10/10/07 12:17 PM, andrewj at uchicago.edu wrote: >> On that note, is there anyway to grab more information in real >> time as to what is going on with swift? >> >> de at mcs.anl.gov>, swiftdevel >>> none of those messgaes are errors in themselves - they're >> debugging >>> messages about what is going on inside. >>> >>> Do you mean to report that it then hangs 'forever'? >>> >>> On Wed, 10 Oct 2007, andrewj at uchicago.edu wrote: >>> >>>> Hello all, >>>> >>>> I am getting the following to happen when I try to run swift: >>>> >>>> [andrewj at terminable Swifty]$ swift -debug 10-10-07-CADWF.swift >>>> 10-10-07-CADWF.swift: source file is new. Recompiling. >>>> Validation of XML intermediate file was successful >>>> Using sites file: >>>> /home/andrewj/CADGrid/Swifty/vdsk-0.3/bin/../etc/sites.xml >>>> Using tc.data: >>>> /home/andrewj/CADGrid/Swifty/vdsk-0.3/bin/../etc/tc.data >>>> Swift v0.3 r1319 (modified locally) >>>> >>>> Swift v0.3 r1319 (modified locally) >>>> >>>> RunID: 20071010-1204-pvgabtv1 >>>> RunID: 20071010-1204-pvgabtv1 >>>> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >>>> at dataset=malFMatrix >>>> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >>>> at dataset=malFMatrix >>>> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >>>> at dataset=malFMatrix >>>> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >>>> at dataset=malFMatrix >>>> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >>>> at dataset=feature >>>> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >>>> at dataset=feature >>>> Waiting for org.griphyn.vdl.mapping.RootDataNode with no value >>>> at dataset=feature >>>> >>>> You guys are free to look at the swift code located in >>>> /home/andrewj/CADGrid/Swifty/ mounted on the ci disk. >>>> >>>> Suggestions? 
>>>> >>>> Thanks, >>>> Andrew >>>> >>>> >> >> From benc at hawaga.org.uk Wed Oct 10 17:06:54 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 10 Oct 2007 22:06:54 +0000 (GMT) Subject: [Swift-devel] data dependency guts Message-ID:

At present, procedures and the main program are compiled to karajan code in two sections:

* a declaration section
* a statements section

When a variable is declared, the kml code to make a karajan-level variable of the same name goes into the declaration section.

Non-variable declaration code such as procedure calls or foreach loops goes into the statements section.

Statements which set the value of variables appear in either the declaration section or the statements section, depending on their particular nature (for example, initialisation with expressions goes into the declaration section; initialisation with the return value of a procedure goes into the statements section).

When this code is executed in karajan, first all the declarations are executed in sequence; then, when all of those have finished, all of the statements are executed in parallel.

If there are data dependencies, those will cause an ordering of the parallel statements in the statements block such that they don't actually execute in parallel, but are instead ordered by their data dependencies.

That data dependency management in the statements section happens through the variables which are declared in the declarations section; when a declaration happens for a variable that doesn't have an initial value, that variable instead stores (amongst other things) an indication that it doesn't have a value (yet).

Data dependency ordering of execution will only happen for statements in the statements section, not for anything in the declaration section.

Mapper declarations also go in the declaration section. This means that their parameters do not participate in data dependency ordering.
For example, you can't say this in the present (r1339) implementation:

type file;
string s;
file f ;
s="foo";

This will fail (with an exception) because some initialisation will happen (or rather will attempt to happen) for f strictly before s is given a value (by strictly before, I mean always before, rather than a race to perhaps be before, perhaps after).

There does not seem to be an immediately correct easy solution to this.

One idea I have toyed with a little is doing more compile-time analysis of dataflow to generalise the 'declaration block -> statement block' serialisation, so that there is more serialised/parallel specification in the kml.

Another is to add a new concept of 'not mapped yet' to datasets, and allow the same dataflow-ordered execution model that populates values also be used to populate mapping configuration.

-- From wilde at mcs.anl.gov Wed Oct 10 17:29:35 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 10 Oct 2007 17:29:35 -0500 Subject: [Swift-devel] data dependency guts In-Reply-To: References: Message-ID: <470D524F.5070006@mcs.anl.gov>

This is great info and I'm still digesting it. Let's put it in a design doc/page.

My first question is, regarding:

type file;
string s;
file f ;
s="foo";

Why can't this execute as if it were:

type file;
string s;
string f;
f = s;
s="foo";

I haven't tried this yet, but in this case where f is a string rather than a file, wouldn't Swift sort out the data dependencies correctly?

The point is that mapping a file to a var is (or should/could be) very much like initializing it. Regular vars have a type, a state (set/unset) and a value (if set); file vars have a type, a state, a mapping, and a value.

The problem we have is both that file vars can only get their mapping in a specific and limited way (through the <> declaration) and that this mapping is not processed like any other Swift expression (else it would sort out the data flow correctly).
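[Editor's note: one workaround that follows from Ben's description (hypothetical, not from the thread; the combined declaration/initialisation syntax and the single_file_mapper name and arguments are assumptions): give s its value through expression initialisation, which is evaluated in the declaration section — where declarations run in sequence — so s is set before the mapper for f is constructed:]

```
type file;

// Expression initialisation lands in the declaration section, and
// declarations execute in sequence, so s already has a value when
// the mapper for f is constructed (assumed syntax).
string s = "foo";

file f <single_file_mapper; file=s>;   // mapper name and args assumed
```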
- Mike On 10/10/07 5:06 PM, Ben Clifford wrote: > At present, procedures and the main program are compiled to karajan code > in two sections: > > * a declaration section > * a statements section > > When a variable is declared, the kml code to make a karjan-level variable > of the same name goes into the declaration section. > > Non-variable declaration code such as procedure calls or foreach loops go > into the statements sections. > > Statements which set the value of variables appear in either the > declaration section or the statements section, depending on their > particular nature (for example, initialisation with expressions goes into > the declaration section; initialisation with the return value of a > procedure goes into the statements section). > > When this code is executed in karajan, first all the declarations are > executed in sequence; then when all of those have finished, all of the > statements are executed in parallel. > > If there are data dependencies, those will cause an ordering of the > parallel statements in the statements block such that they don't actually > execute in parallel, but are instead ordered by their data dependencies. > > That data dependency management in the statements section happens through > the variables which are declared in the declarations section; when a > declaration happens for a variable that doesn't have an initial value, > that variable instead stores (amongst other things) an indication that it > doesn't have a value (yet). > > Data dependency ordering of execution will only happen for statements in > the statements section, not for anything in the declaration section. > > Mapper declarations also go in the declaration section. This means that > their parameters do not participate in data dependency ordering. 
> > For example, you can't say this in the present (r1339) implementation: > > type file; > string s; > file f ; > s="foo"; > > This will fail (with an exception) because some initialisation will happen > (or rather will attempt to happen) for f strictly before s is given a > value (by strictly before, I mean always before, rather than a race to > perhaps be before, perhaps after). > > There does not seem to be an immediately correct easy solution to this. > > One idea I have toyed with a little is doing more compile-time analysis of > dataflow to generalise the 'declaration block -> statement block' > serialisation, so that there is more serialised/parallel specification in > the kml. > > Another is to put add a new concept of 'not mapped yet' to datasets; and > allow the same dataflow-ordered execution model that populates values also > be used to populate mapping configuration. > From andrewj at uchicago.edu Wed Oct 10 17:48:34 2007 From: andrewj at uchicago.edu (andrewj at uchicago.edu) Date: Wed, 10 Oct 2007 17:48:34 -0500 (CDT) Subject: [Swift-devel] Swift Error! Round 2 Message-ID: <20071010174834.AUV34036@m4500-00.uchicago.edu> Hi all, I am back after a very long day of debugging! Ok, I think I have isolated a problem with Swift. I have slowly worked part by part up to this point in my WF. It appears to be a problem related to passing arrays of things into the transformation or whatever its called. I have tested SegNClass already, it works fine with the mappers and everything. The problem seems to be involved with SegNClassRun or the array types I am feeding it to use. 
Please look at this portion of the code (it is also in the file /home/andrewj/CADGrid/Swifty/AphexTwin.swift):

*****************************************

(Feature feature) SegNClass (ROI roi, ParamsFile SegInputParams, ReqFile rfile1, ReqFile rfile2, ParamsFile FeatInputParams, int SegNum, int FeatNum, int ClassNum) {
    Contour contour ;
    contour = Segement_RGI_File(roi, SegInputParams, rfile1, rfile2);
    feature = MassClassify(roi, contour, FeatInputParams);
}

(Feature fs[]) SegNClassRun (ROI ROIs[], ParamsFile SegInputParams, ReqFile rfile1, ReqFile rfile2, ParamsFile FeatInputParams, int SegNum, int FeatNum, int ClassNum) {
    ROI r;
    foreach r, i in ROIs {
        Feature feature;
        feature = SegNClass(ROIs[i], SegInputParams, rfile1, rfile2, FeatInputParams, SegNum, FeatNum, ClassNum);
        fs[i] = feature;
    }
}

ReqFile rFileRGI1<"TESTS/etc/SegNExtract.params">;
ReqFile rFileRGI2<"TESTS/etc/feat-names.lst">;
#ParamsFile inputParamsRGI;
ParamsFile SegSweeps[] ;
ParamsFile FeatSweeps[] ;
#ParamsFile LDASettings ;
ROI MalROIs[] ;
ROI BenROIs[] ;

Feature malFeats[], benFeats[];
malFeats = SegNClassRun(MalROIs, SegSweeps[1], rFileRGI1, rFileRGI2, FeatSweeps[1], 1, 1, 1);
benFeats = SegNClassRun(BenROIs, SegSweeps[1], rFileRGI1, rFileRGI2, FeatSweeps[1], 1, 1, 1);

FeatureMatrix malFMatrix ;
FeatureMatrix benFMatrix ;
malFMatrix = paste(malFeats);
benFMatrix = paste(benFeats);

************************************************

This code gives the hanging again and the statement:

RunID: 20071010-1738-hyvdrd04
Waiting for org.griphyn.vdl.mapping.RootDataNode with no value at dataset=feature
Waiting for org.griphyn.vdl.mapping.RootDataNode with no value at dataset=feature
Waiting for org.griphyn.vdl.mapping.RootDataNode with no value at dataset=feature

Thanks, Andrew

From benc at hawaga.org.uk Wed Oct 10 17:57:43 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 10 Oct 2007 22:57:43 +0000 (GMT) Subject: [Swift-devel] Re: Swift Error!
Round 2 In-Reply-To: <20071010174834.AUV34036@m4500-00.uchicago.edu> References: <20071010174834.AUV34036@m4500-00.uchicago.edu> Message-ID: On Wed, 10 Oct 2007, andrewj at uchicago.edu wrote: > Ok, I think I have isolated a problem with Swift. I have > slowly worked part by part up to this point in my WF. It > appears to be a problem related to passing arrays of things > into the transformation or whatever its called. What was the step before that did work? > The problem seems to be involved with SegNClassRun or the > array types I am feeding it to use. Your use of the feature variable there is a bit weird - I'm not sure what it will actually do off the top of my head: > foreach r, i in ROIs{ > Feature > feature; > feature = > SegNClass(ROIs[i],SegInputParams,rfile1,rfile2,FeatInputParams,SegNum,FeatNum,ClassNum); > fs[i] = feature; > } can you say this: foreach r, i in ROIs { fs[i] = SegNClass(...) } without that variable at all? I'm not sure if thats the problem here but thats more like the usage I would make. -- From andrewj at uchicago.edu Wed Oct 10 18:15:59 2007 From: andrewj at uchicago.edu (andrewj at uchicago.edu) Date: Wed, 10 Oct 2007 18:15:59 -0500 (CDT) Subject: [Swift-devel] Re: Swift Error! Round 2 Message-ID: <20071010181559.AUV36793@m4500-00.uchicago.edu> >> Ok, I think I have isolated a problem with Swift. I have >> slowly worked part by part up to this point in my WF. It >> appears to be a problem related to passing arrays of things >> into the transformation or whatever its called. > >What was the step before that did work? SegNClass by itself with a foreach loop. > >> The problem seems to be involved with SegNClassRun or the >> array types I am feeding it to use. > >Your use of the feature variable there is a bit weird - I'm not sure what >it will actually do off the top of my head: > I got this concept from some of Yong's code from the old workflow. 
The main idea here is to avoid the problem of too many files (including the temporary Swift files) landing in one directory.

>> foreach r, i in ROIs{
>> Feature feature;
>> feature = SegNClass(ROIs[i],SegInputParams,rfile1,rfile2,FeatInputParams,SegNum,FeatNum,ClassNum);
>> fs[i] = feature;
>> }
>
>can you say this:
>
> foreach r, i in ROIs {
>   fs[i] = SegNClass(...)
> }
>
>without that variable at all?
>
>I'm not sure if thats the problem here but thats more like the usage I
>would make.

This does work, but like I mentioned above, this could be bad when we seriously scale up to thousands of little output files coming and going in the same place.

>
>--

From benc at hawaga.org.uk Wed Oct 10 18:18:22 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 10 Oct 2007 23:18:22 +0000 (GMT) Subject: [Swift-devel] Re: Swift Error! Round 2 In-Reply-To: <20071010181559.AUV36793@m4500-00.uchicago.edu> References: <20071010181559.AUV36793@m4500-00.uchicago.edu> Message-ID: On Wed, 10 Oct 2007, andrewj at uchicago.edu wrote:

> > foreach r, i in ROIs {
> >   fs[i] = SegNClass(...)
> > }
> This does work, but like I mentioned above, this could be
> bad when we seriously scale up to thousands of little output
> files coming and going in the same place.

Do it the way shown above. For 'serious scaling up', you probably need to put a different mapper on the output array to cause it to put files in different directories.

-- From wilde at mcs.anl.gov Wed Oct 10 18:45:14 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 10 Oct 2007 18:45:14 -0500 Subject: [Swift-devel] Re: Swift Error! Round 2 In-Reply-To: References: <20071010174834.AUV34036@m4500-00.uchicago.edu> Message-ID: <470D640A.4090403@mcs.anl.gov>

Ben, what do you see that's weird about the use of the feature var? It seems to be set from the return value of SegNClass, then assigned to an array member. That's not a data dependency problem, is it?
If this causes the function to hang, I'd like to try to isolate this into a simple test case. Is the reason obvious to you? This also makes me want to better understand something Mihael described to me about how mappers recurse to traverse a data structure. Is it possible to apply a mapper to the entire array fs[] to specify a specific pattern for the names of the output files that are placed in it? - Mike On 10/10/07 5:57 PM, Ben Clifford wrote: > > On Wed, 10 Oct 2007, andrewj at uchicago.edu wrote: > >> Ok, I think I have isolated a problem with Swift. I have >> slowly worked part by part up to this point in my WF. It >> appears to be a problem related to passing arrays of things >> into the transformation or whatever its called. > > What was the step before that did work? > >> The problem seems to be involved with SegNClassRun or the >> array types I am feeding it to use. > > Your use of the feature variable there is a bit weird - I'm not sure what > it will actually do off the top of my head: > >> foreach r, i in ROIs{ >> Feature >> feature; >> feature = >> SegNClass(ROIs[i],SegInputParams,rfile1,rfile2,FeatInputParams,SegNum,FeatNum,ClassNum); >> fs[i] = feature; >> } > > can you say this: > > foreach r, i in ROIs { > fs[i] = SegNClass(...) > } > > without that variable at all? > > I'm not sure if thats the problem here but thats more like the usage I > would make. > From benc at hawaga.org.uk Wed Oct 10 18:50:56 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 10 Oct 2007 23:50:56 +0000 (GMT) Subject: [Swift-devel] Re: Swift Error! Round 2 In-Reply-To: <470D640A.4090403@mcs.anl.gov> References: <20071010174834.AUV34036@m4500-00.uchicago.edu> <470D640A.4090403@mcs.anl.gov> Message-ID: On Wed, 10 Oct 2007, Michael Wilde wrote: > If this causes the function to hang, I'd like to try to isolate this into a > simple test case. > > Is the reason obvious to you? 
You've got stuff put in one place - a file, mapped by the mapper assigned to the feature variable. Then you use = to 'assign' that to another place - a different file, mapped by the mapper assigned to the array. What does that mean? Make a copy of the file? It's not defined in the present implementation.

Two options: one is to properly prohibit this usage of = and give a proper error message; another is to copy the RHS file to the LHS. But that's probably messy in the case of more elaborate structures than single files.

> Is it possible to apply a mapper to the entire array fs[] to specify a
> specific pattern for the names of the output files that are placed in
> it?

There is a mapper assigned to the entire array - it's implicit because it isn't specified in the SwiftScript source, and will give filenames that have a particular base name and also the array index of the relevant array element. You can attach whatever mapper you want, though for the behaviour of spreading files between multiple directories, none of the default mappers will do.

-- From hategan at mcs.anl.gov Wed Oct 10 18:54:03 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 10 Oct 2007 18:54:03 -0500 Subject: [Swift-devel] data dependency guts In-Reply-To: References: Message-ID: <1192060443.19072.29.camel@blabla.mcs.anl.gov> On Wed, 2007-10-10 at 22:06 +0000, Ben Clifford wrote:

> There does not seem to be an immediately correct easy solution to this.
>
> One idea I have toyed with a little is doing more compile-time analysis of
> dataflow to generalise the 'declaration block -> statement block'
> serialisation, so that there is more serialised/parallel specification in
> the kml.
>
> Another is to put add a new concept of 'not mapped yet' to datasets; and
> allow the same dataflow-ordered execution model that populates values also
> be used to populate mapping configuration.

Yep, mappers should, too, be futures.
We can actually rewrite the whole thing in the following way (in a
pseudo-karajan language):

Assume File f = proc("blabla");

f := vdl:new(File)
f.setMapper(future(vdl:newMapper(type, args)))
f.getMapper().setValue(future(proc("blabla")))

What's currently happening is that all these things are created as
futures to begin with and parallel() is used for actually running them
in parallel. This complicates the implementation of all the Swift data
stuff.

I think the above would be more consistent.

From hategan at mcs.anl.gov Wed Oct 10 19:04:08 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 10 Oct 2007 19:04:08 -0500
Subject: [Swift-devel] data dependency guts
In-Reply-To: <1192060443.19072.29.camel@blabla.mcs.anl.gov>
References: <1192060443.19072.29.camel@blabla.mcs.anl.gov>
Message-ID: <1192061048.19072.36.camel@blabla.mcs.anl.gov>

On Wed, 2007-10-10 at 18:54 -0500, Mihael Hategan wrote:
> On Wed, 2007-10-10 at 22:06 +0000, Ben Clifford wrote:
> 
> > There does not seem to be an immediately correct easy solution to this.
> > 
> > One idea I have toyed with a little is doing more compile-time analysis of
> > dataflow to generalise the 'declaration block -> statement block'
> > serialisation, so that there is more serialised/parallel specification in
> > the kml.
> > 
> > Another is to add a new concept of 'not mapped yet' to datasets; and
> > allow the same dataflow-ordered execution model that populates values also
> > be used to populate mapping configuration.
> 
> Yep, mappers should, too, be futures.
> 
> We can actually rewrite the whole thing in the following way (in a
> pseudo-karajan language):
> 
> Assume File f = proc("blabla");
> 
> f := vdl:new(File)
> f.setMapper(future(vdl:newMapper(type, args)))
> f.getMapper().setValue(future(proc("blabla")))
> 
> What's currently happening is that all these things are created as
> futures to begin with and parallel() is used for actually running them
> in parallel.
> This complicates the implementation of all the Swift data
> stuff.
> 
> I think the above would be more consistent.

Or we could do it implicitly, which would reduce the number of changes we
have to make and the amount of trouble. I think it might be a bit too
late for major changes like that.

> 
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> 

From benc at hawaga.org.uk Thu Oct 11 09:40:02 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 11 Oct 2007 14:40:02 +0000 (GMT)
Subject: [Swift-devel] readData
In-Reply-To: 
References: <470CDAF6.7000206@mcs.anl.gov> <1192026668.3339.4.camel@blabla.mcs.anl.gov> <470CEBCA.3010809@mcs.anl.gov> <470CF0F1.6070106@mcs.anl.gov>
Message-ID: 

On Wed, 10 Oct 2007, Ben Clifford wrote:

> It's probably desirable for print to wait for its parameter to be closed.

r1343 introduces a trace procedure. Use it in a similar fashion to print.
It will wait for all of its arguments to be marked closed, and then send
them through the logging mechanism so that they end up both on the stdout
of the executing process and also in the log file.

For example:

(string r) p() {
    r="foo";
}

string s;

trace(s);

s=p();

-- 

From benc at hawaga.org.uk Fri Oct 12 03:12:12 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 12 Oct 2007 08:12:12 +0000 (GMT)
Subject: [Swift-devel] Re: Swift Error! Round 2
In-Reply-To: 
References: <20071010174834.AUV34036@m4500-00.uchicago.edu> <470D640A.4090403@mcs.anl.gov>
Message-ID: 

On Wed, 10 Oct 2007, Ben Clifford wrote:

> > If this causes the function to hang, I'd like to try to isolate this into a
> > simple test case.
> > 
> > Is the reason obvious to you?

Here is a program that does the same:

type file;

file a <"foo">;
file b <"bar">;

a = b;

> You've got stuff put in one place - a file, mapped by the mapper assigned
> to the feature variable.
> 
> Then you use = to 'assign' that to another place - a different file,
> mapped by the mapper assigned to the array.
> 
> What does that mean? Make a copy of the file? It's not defined in the
> present implementation.

-- 

From benc at hawaga.org.uk Fri Oct 12 04:33:40 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 12 Oct 2007 09:33:40 +0000 (GMT)
Subject: [Swift-devel] labelling things as new in the docs
Message-ID: 

In the docs, I have started labelling things introduced after 0.3 as 'new
in 0.4', on the basis that 0.4 is the first main release in which these
features will arrive.

If you are an SVN user, this actually means 'introduced sometime between
0.3 and the not-yet-existing 0.4; the latest SVN should have that
feature'.

This is an attempt to maintain one set of documentation that works for
both the present release and the present SVN head.

-- 

From andrewj at uchicago.edu Fri Oct 12 14:31:01 2007
From: andrewj at uchicago.edu (andrewj at uchicago.edu)
Date: Fri, 12 Oct 2007 14:31:01 -0500 (CDT)
Subject: [Swift-devel] Grid Identity Mapping
Message-ID: <20071012143101.AUY34654@m4500-00.uchicago.edu>

Hello folks,

I think I need to be mapped or whatever it is called to Teraport.

Is this true?

Thanks,
Andrew

See below.
***********************

Execution failed:
Could not initialize shared directory on teraport
Caused by:
org.globus.cog.abstraction.impl.file.FileResourceException:
Error communicating with the GridFTP server
Caused by:
Server refused performing the request. Custom message:
Bad password. (error code 1) [Nested exception message:
Custom message: Unexpected reply: 530-Login incorrect.
: gridmap.c:globus_gss_assist_map_and_authorize:1910: 530-Error invoking callout 530-globus_callout.c:globus_callout_handle_call_type:727: 530-The callout returned an error 530-prima_module.c:Globus Gridmap Callout:430: 530-Gridmap lookup failure: Could not retrieve mapping for /DC=org/DC=doegrids/OU=People/CN=Andrew Jamieson 732984 from identity mapping server 530- 530 End.] RGI_File_sh failed END_FAILURE thread=0-1-2-0 tr=RGI_File_sh Swift finished - workflow had errors From hategan at mcs.anl.gov Fri Oct 12 14:33:28 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 12 Oct 2007 14:33:28 -0500 Subject: [Swift-devel] Re: Grid Identity Mapping In-Reply-To: <20071012143101.AUY34654@m4500-00.uchicago.edu> References: <20071012143101.AUY34654@m4500-00.uchicago.edu> Message-ID: <1192217608.32130.0.camel@blabla.mcs.anl.gov> http://www.ci.uchicago.edu/wiki/bin/view/Teraport/UserSupport#Updating_Grid_or_SSH_Credentials On Fri, 2007-10-12 at 14:31 -0500, andrewj at uchicago.edu wrote: > Hello folks, > > I think I need to be mapped or whatever it is called to > Teraport. > > Is this true? > > Thanks, > Andrew > > See below. > *********************** > > Execution failed: > Could not initialize shared directory on teraport > Caused by: > > org.globus.cog.abstraction.impl.file.FileResourceException: > Error communicating with the GridFTP server > Caused by: > Server refused performing the request. Custom message: > Bad password. (error code 1) [Nested exception message: > Custom message: Unexpected reply: 530-Login incorrect. : > gridmap.c:globus_gss_assist_map_and_authorize:1910: > 530-Error invoking callout > 530-globus_callout.c:globus_callout_handle_call_type:727: > 530-The callout returned an error > 530-prima_module.c:Globus Gridmap Callout:430: > 530-Gridmap lookup failure: Could not retrieve mapping for > /DC=org/DC=doegrids/OU=People/CN=Andrew Jamieson 732984 from > identity mapping server > 530- > 530 End.] 
> RGI_File_sh failed > END_FAILURE thread=0-1-2-0 tr=RGI_File_sh > Swift finished - workflow had errors > From bugzilla-daemon at mcs.anl.gov Sat Oct 13 14:35:16 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Sat, 13 Oct 2007 14:35:16 -0500 (CDT) Subject: [Swift-devel] [Bug 102] workflow failes due to file cache duplicates In-Reply-To: Message-ID: <20071013193516.5CC32164BC@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=102 hategan at mcs.anl.gov changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED ------- Comment #1 from hategan at mcs.anl.gov 2007-10-13 14:35 ------- This was fixed in r1331, but I forgot to update the bug. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at mcs.anl.gov Sat Oct 13 14:37:31 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Sat, 13 Oct 2007 14:37:31 -0500 (CDT) Subject: [Swift-devel] [Bug 92] URIs in mappers In-Reply-To: Message-ID: <20071013193731.D2445164BC@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=92 hategan at mcs.anl.gov changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |FIXED ------- Comment #2 from hategan at mcs.anl.gov 2007-10-13 14:37 ------- No further complaints seen. Reopen if necessary. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. 
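The "mappers should, too, be futures" idea from the data dependency guts thread above can be sketched outside of Karajan. The toy model below is illustrative only (names such as Dataset, set_mapper and close are hypothetical, not Swift internals): both the mapper and the value of a dataset start life as unresolved futures, and a consumer simply blocks until each is set, so mapping configuration gets the same dataflow ordering as value population.

```python
# Toy model of "mappers, too, are futures": illustrative only, not Swift
# internals. A dataset's mapper and value are both unresolved futures;
# consumers block until each is set, giving dataflow ordering for free.
from concurrent.futures import Future, ThreadPoolExecutor
import time

class Dataset:
    def __init__(self):
        self._mapper = Future()  # filename mapping, not yet configured
        self._value = Future()   # value produced by some procedure

    def set_mapper(self, mapper):
        self._mapper.set_result(mapper)

    def close(self, value):
        # Corresponds to the producing procedure finishing.
        self._value.set_result(value)

    def filename(self):
        # Blocks until the mapper future has been resolved.
        return self._mapper.result()("f")

    def value(self):
        # Blocks until the dataset has been closed.
        return self._value.result()

f = Dataset()
with ThreadPoolExecutor() as pool:
    reader = pool.submit(f.value)       # starts before anything is mapped
    time.sleep(0.05)
    f.set_mapper(lambda name: name + ".dat")
    f.close("blabla")
    assert reader.result() == "blabla"  # reader was unblocked by close()
    assert f.filename() == "f.dat"
```

This mirrors the pseudo-karajan rewrite in that thread: creating the dataset corresponds to vdl:new, resolving the mapper future to setMapper(future(...)), and closing the dataset to the future wrapping proc("blabla") completing.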
From bugzilla-daemon at mcs.anl.gov Sat Oct 13 14:37:32 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Sat, 13 Oct 2007 14:37:32 -0500 (CDT) Subject: [Swift-devel] [Bug 93] document URIs in mappers In-Reply-To: Message-ID: <20071013193732.0B02316505@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=93 Bug 93 depends on bug 92, which changed state. Bug 92 Summary: URIs in mappers http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=92 What |Old Value |New Value ---------------------------------------------------------------------------- Status|NEW |ASSIGNED Status|ASSIGNED |RESOLVED Resolution| |FIXED -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at mcs.anl.gov Sat Oct 13 14:40:10 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Sat, 13 Oct 2007 14:40:10 -0500 (CDT) Subject: [Swift-devel] [Bug 79] execute cleanup jobs through different mechanism to 'bulk' jobs In-Reply-To: Message-ID: <20071013194010.E9123164EC@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=79 hategan at mcs.anl.gov changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED ------- Comment #2 from hategan at mcs.anl.gov 2007-10-13 14:40 ------- Implemented the batch thing in r1259. Must check if it does not kill the job with the local provider. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. 
From bugzilla-daemon at mcs.anl.gov Sat Oct 13 14:43:48 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Sat, 13 Oct 2007 14:43:48 -0500 (CDT) Subject: [Swift-devel] [Bug 104] Add cert request tools to swift/bin In-Reply-To: Message-ID: <20071013194348.B984C164BC@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=104 ------- Comment #2 from hategan at mcs.anl.gov 2007-10-13 14:43 ------- I agree with Ben. Our goal should not be to repackage other software unless absolutely necessary. Furthermore, DOE certs require the requester to use a browser and does not, as far as I know, have any kind of "cert-request" tool for user certificates. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. From bugzilla-daemon at mcs.anl.gov Sat Oct 13 14:48:33 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Sat, 13 Oct 2007 14:48:33 -0500 (CDT) Subject: [Swift-devel] [Bug 63] env profile doesn't work In-Reply-To: Message-ID: <20071013194833.2F2D7164BC@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=63 hategan at mcs.anl.gov changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from hategan at mcs.anl.gov 2007-10-13 14:48 ------- Fixed in r1323. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at mcs.anl.gov Sun Oct 14 03:50:11 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Sun, 14 Oct 2007 03:50:11 -0500 (CDT) Subject: [Swift-devel] [Bug 104] Add cert request tools to swift/bin In-Reply-To: Message-ID: <20071014085011.4BDBE164BB@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=104 ------- Comment #3 from benc at hawaga.org.uk 2007-10-14 03:50 ------- There is such a tool! Its a well kept secret, though. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. From bugzilla-daemon at mcs.anl.gov Sun Oct 14 21:20:19 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Sun, 14 Oct 2007 21:20:19 -0500 (CDT) Subject: [Swift-devel] [Bug 104] Add cert request tools to swift/bin In-Reply-To: Message-ID: <20071015022019.2C3C1164BC@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=104 ------- Comment #4 from hategan at mcs.anl.gov 2007-10-14 21:20 ------- Yeah. Gregor also had a student who wrote such a tool. Point is we should instead point to the DOE cert site in the documentation. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. 
From bugzilla-daemon at mcs.anl.gov Mon Oct 15 04:55:58 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 15 Oct 2007 04:55:58 -0500 (CDT) Subject: [Swift-devel] [Bug 104] Add cert request tools to swift/bin In-Reply-To: Message-ID: <20071015095558.01D5E164BC@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=104 benc at hawaga.org.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED ------- Comment #5 from benc at hawaga.org.uk 2007-10-15 04:55 ------- Should also document CA requests for teragrid - they have a bunch of CAs that are (if you have a teragrid account) all much easier to use than OSG, which is the main place I encounter the DOE CA. I'll do add those. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. From bugzilla-daemon at mcs.anl.gov Mon Oct 15 04:57:11 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 15 Oct 2007 04:57:11 -0500 (CDT) Subject: [Swift-devel] [Bug 99] Out of date guide In-Reply-To: Message-ID: <20071015095711.91F16164EC@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=99 benc at hawaga.org.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
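The two options Ben lists above for `a = b` between independently mapped files - properly prohibit it with an error message, or define it as a copy of the RHS file to the LHS - can also be sketched with a small toy model. The Python below is illustrative only (MappedFile and assign are hypothetical names, not Swift's implementation):

```python
# Toy model of '=' between two single-file mapped datasets: either raise
# a proper error (option 1) or copy the RHS file onto the LHS mapping
# (option 2). Illustrative only; not Swift's implementation.
import os
import shutil
import tempfile

class MappedFile:
    def __init__(self, path):
        self.path = path  # where the mapper places this dataset

def assign(lhs, rhs, mode="reject"):
    if mode == "reject":
        # Option 1: prohibit the ambiguous assignment outright.
        raise TypeError("cannot assign mapped file %r to mapped file %r"
                        % (rhs.path, lhs.path))
    # Option 2: define '=' as a file copy. This gets messy for structures
    # more elaborate than single files, as noted in the thread.
    shutil.copyfile(rhs.path, lhs.path)

tmp = tempfile.mkdtemp()
a = MappedFile(os.path.join(tmp, "foo"))
b = MappedFile(os.path.join(tmp, "bar"))
with open(b.path, "w") as fh:
    fh.write("data")

try:
    assign(a, b)               # option 1: fails fast instead of hanging
    raised = False
except TypeError:
    raised = True

assign(a, b, mode="copy")      # option 2: 'foo' now holds bar's contents
with open(a.path) as fh:
    assert fh.read() == "data"
```

Either behaviour would be an improvement over the current one, where the `a = b` program shown earlier in the thread hangs with no diagnostic.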
From wilde at mcs.anl.gov Tue Oct 23 14:07:32 2007
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 23 Oct 2007 14:07:32 -0500
Subject: [Swift-devel] Loging still messed up after Falkon provider installed
Message-ID: <471E4674.40405@mcs.anl.gov>

I installed the deef provider per the instructions in the Falkon SVN
root, and my swift .log file is no longer produced.

There was much discussion on this on the lists; was it never fixed, or
did it break again?

My non-working log4j.properties file - what I had after installing the
deef provider - is below.

Below that is what I added from one of Mihael's suggestions in a thread
with Jing, which didn't solve the problem till I added back in all the
lines that came with the default Swift build (r1339).

Is this my error somewhere in setting up Falkon, or is the Falkon
provider install messing this up?

- Mike

----- Broken config after deef provider installed:

#Root category
log4j.rootCategory=WARN, CONSOLE

# CONSOLE is set to be a ConsoleAppender using a PatternLayout
# The class is not printed
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%-5p %x - %m%n

# CONSOLE-C is the same as above, but the class is printed
log4j.appender.CONSOLE-C=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE-C.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE-C.layout.ConversionPattern=%-5p [%c] %x - %m%n

# %provider-deef%
log4j.logger.org.apache.axis.utils.JavaUtils=ERROR
log4j.logger.org.globus.cog.abstraction.impl.execution.deef=DEBUG

# %abstraction%
log4j.logger.org.globus.cog.abstraction=WARN

# %abstraction-common%
log4j.logger.org.globus.cog.abstraction.impl.common=WARN

# %jglobus%

# %util%

# %provider-gt2%

# %provider-gt2ft%

# %provider-gt4_0_0%
log4j.logger.org.apache.axis.utils.JavaUtils=ERROR

# %provider-condor%

# %provider-ssh%

# %provider-webdav%

# %provider-local%

---- logging works when I add the
following lines:

log4j.logger.org.globus.cog.abstraction.impl.common=WARN
log4j.logger.org.globus.cog.abstraction=DEBUG
# ^^ Added by Mike per Mihael

log4j.rootCategory=WARN, CONSOLE, FILE

log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.Threshold=WARN
log4j.appender.CONSOLE.layout.ConversionPattern=%m%n

log4j.appender.FILE=org.apache.log4j.FileAppender
log4j.appender.FILE.File=swift.log
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSSZZZZZ} %-5p %c{1} %m%n

log4j.logger.swift=DEBUG

log4j.logger.org.griphyn.vdl.karajan.Loader=DEBUG
log4j.logger.org.griphyn.vdl.toolkit.VDLt2VDLx=DEBUG
log4j.logger.org.griphyn.vdl.karajan.VDL2ExecutionContext=DEBUG
log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=DEBUG
log4j.logger.org.griphyn.vdl.karajan.lib.GetFieldValue=DEBUG
log4j.logger.org.griphyn.vdl.engine.Karajan=INFO

From hategan at mcs.anl.gov Tue Oct 23 14:25:45 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 23 Oct 2007 14:25:45 -0500
Subject: [Swift-devel] Loging still messed up after Falkon provider installed
In-Reply-To: <471E4674.40405@mcs.anl.gov>
References: <471E4674.40405@mcs.anl.gov>
Message-ID: <1193167545.24849.0.camel@blabla.mcs.anl.gov>

On Tue, 2007-10-23 at 14:07 -0500, Michael Wilde wrote:
> I installed the deef provider per the instructions in the Falkon SVN
> root, and my swift .log file is no longer produced.
> 
> There was much discussion on this on the lists; was it never fixed, or
> did it break again?

Never fixed, I'm guessing. Is there a bug report?

> 
> My non-working log4j.properties file - what I had after installing the
> deef provider - is below.
> > Below that is what I added from one of Mihael's suggestions in a thread > with Jing, which didnt solve the problem till I added back in all the > lines that came with the default Swift build (r1339) > > Is this my error somewhere in setting up Falkon, or is the Falkon > provider install messing this up? > > - Mike > > ----- Broken config after deef provider installed: > > #Root category > log4j.rootCategory=WARN, CONSOLE > > # CONSOLE is set to be a ConsoleAppender using a PatternLayout > # The class is not printed > log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender > log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout > log4j.appender.CONSOLE.layout.ConversionPattern=%-5p %x - %m%n > > # CONSOLE-C is the same as above, but the class is printed > log4j.appender.CONSOLE-C=org.apache.log4j.ConsoleAppender > log4j.appender.CONSOLE-C.layout=org.apache.log4j.PatternLayout > log4j.appender.CONSOLE-C.layout.ConversionPattern=%-5p [%c] %x - %m%n > > # %provider-deef% > log4j.logger.org.apache.axis.utils.JavaUtils=ERROR > log4j.logger.org.globus.cog.abstraction.impl.execution.deef=DEBUG > > # %abstraction% > log4j.logger.org.globus.cog.abstraction=WARN > > # %abstraction-common% > log4j.logger.org.globus.cog.abstraction.impl.common=WARN > > # %jglobus% > > # %util% > > # %provider-gt2% > > # %provider-gt2ft% > > # %provider-gt4_0_0% > log4j.logger.org.apache.axis.utils.JavaUtils=ERROR > # %provider-condor% > > # %provider-ssh% > > # %provider-webdav% > > # %provider-local% > > ---- logging works when I add the following lines: > > log4j.logger.org.globus.cog.abstraction.impl.common=WARN > log4j.logger.org.globus.cog.abstraction=DEBUG > # ^^ Added by Mike per Mihael > > log4j.rootCategory=WARN, CONSOLE, FILE > > log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender > log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout > log4j.appender.CONSOLE.Threshold=WARN > log4j.appender.CONSOLE.layout.ConversionPattern=%m%n > > 
log4j.appender.FILE=org.apache.log4j.FileAppender > log4j.appender.FILE.File=swift.log > log4j.appender.FILE.layout=org.apache.log4j.PatternLayout > log4j.appender.FILE.layout.ConversionPattern=%d{yyyy-MM-dd > HH:mm:ss,SSSZZZZZ} %-5p %c{1} %m%n > > log4j.logger.swift=DEBUG > > log4j.logger.org.griphyn.vdl.karajan.Loader=DEBUG > log4j.logger.org.griphyn.vdl.toolkit.VDLt2VDLx=DEBUG > log4j.logger.org.griphyn.vdl.karajan.VDL2ExecutionContext=DEBUG > log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=DEBUG > log4j.logger.org.griphyn.vdl.karajan.lib.GetFieldValue=DEBUG > log4j.logger.org.griphyn.vdl.engine.Karajan=INFO > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From bugzilla-daemon at mcs.anl.gov Tue Oct 23 14:47:49 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Tue, 23 Oct 2007 14:47:49 -0500 (CDT) Subject: [Swift-devel] [Bug 108] New: Installing deef provider damages log4j config Message-ID: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=108 Summary: Installing deef provider damages log4j config Product: Swift Version: unspecified Platform: All OS/Version: Mac OS Status: NEW Severity: normal Priority: P2 Component: SwiftScript language AssignedTo: hategan at mcs.anl.gov ReportedBy: wilde at mcs.anl.gov CC: hategan at mcs.anl.gov I installed the deef provider per the instructions in the Falkon SVN root, and my swift .log file is no longer produced. There was much discussion on this on the lists; was it never fixed, or did it break again? My non-working log4j.properties file - what I had after installing the deef provider - is below. 
Below that is what I added from one of Mihael's suggestions in a thread with Jing, which didnt solve the problem till I added back in all the lines that came with the default Swift build (r1339) Is this my error somewhere in setting up Falkon, or is the Falkon provider install messing this up? - Mike ----- Broken config after deef provider installed: #Root category log4j.rootCategory=WARN, CONSOLE # CONSOLE is set to be a ConsoleAppender using a PatternLayout # The class is not printed log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout log4j.appender.CONSOLE.layout.ConversionPattern=%-5p %x - %m%n # CONSOLE-C is the same as above, but the class is printed log4j.appender.CONSOLE-C=org.apache.log4j.ConsoleAppender log4j.appender.CONSOLE-C.layout=org.apache.log4j.PatternLayout log4j.appender.CONSOLE-C.layout.ConversionPattern=%-5p [%c] %x - %m%n # %provider-deef% log4j.logger.org.apache.axis.utils.JavaUtils=ERROR log4j.logger.org.globus.cog.abstraction.impl.execution.deef=DEBUG # %abstraction% log4j.logger.org.globus.cog.abstraction=WARN # %abstraction-common% log4j.logger.org.globus.cog.abstraction.impl.common=WARN # %jglobus% # %util% # %provider-gt2% # %provider-gt2ft% # %provider-gt4_0_0% log4j.logger.org.apache.axis.utils.JavaUtils=ERROR # %provider-condor% # %provider-ssh% # %provider-webdav% # %provider-local% ---- logging works when I add the following lines: log4j.logger.org.globus.cog.abstraction.impl.common=WARN log4j.logger.org.globus.cog.abstraction=DEBUG # ^^ Added by Mike per Mihael log4j.rootCategory=WARN, CONSOLE, FILE log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout log4j.appender.CONSOLE.Threshold=WARN log4j.appender.CONSOLE.layout.ConversionPattern=%m%n log4j.appender.FILE=org.apache.log4j.FileAppender log4j.appender.FILE.File=swift.log log4j.appender.FILE.layout=org.apache.log4j.PatternLayout 
log4j.appender.FILE.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSSZZZZZ} %-5p %c{1} %m%n log4j.logger.swift=DEBUG log4j.logger.org.griphyn.vdl.karajan.Loader=DEBUG log4j.logger.org.griphyn.vdl.toolkit.VDLt2VDLx=DEBUG log4j.logger.org.griphyn.vdl.karajan.VDL2ExecutionContext=DEBUG log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=DEBUG log4j.logger.org.griphyn.vdl.karajan.lib.GetFieldValue=DEBUG log4j.logger.org.griphyn.vdl.engine.Karajan=INFO -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. From wilde at mcs.anl.gov Tue Oct 23 14:48:47 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 23 Oct 2007 14:48:47 -0500 Subject: [Swift-devel] Loging still messed up after Falkon provider installed In-Reply-To: <1193167545.24849.0.camel@blabla.mcs.anl.gov> References: <471E4674.40405@mcs.anl.gov> <1193167545.24849.0.camel@blabla.mcs.anl.gov> Message-ID: <471E501F.2030206@mcs.anl.gov> Yes, there is now :) bug 108. On 10/23/07 2:25 PM, Mihael Hategan wrote: > On Tue, 2007-10-23 at 14:07 -0500, Michael Wilde wrote: >> I installed the deef provider per the instructions in the Falkon SVN >> root, and my swift .log file is no longer produced. >> >> There was much discussion on this on the lists; was it never fixed, or >> did it break again? > > Never fixed, I'm guessing. Is there a bug report? > >> My non-working log4j.properties file - what I had after installing the >> deef provider - is below. >> >> Below that is what I added from one of Mihael's suggestions in a thread >> with Jing, which didnt solve the problem till I added back in all the >> lines that came with the default Swift build (r1339) >> >> Is this my error somewhere in setting up Falkon, or is the Falkon >> provider install messing this up? 
>> >> - Mike >> >> ----- Broken config after deef provider installed: >> >> #Root category >> log4j.rootCategory=WARN, CONSOLE >> >> # CONSOLE is set to be a ConsoleAppender using a PatternLayout >> # The class is not printed >> log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender >> log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout >> log4j.appender.CONSOLE.layout.ConversionPattern=%-5p %x - %m%n >> >> # CONSOLE-C is the same as above, but the class is printed >> log4j.appender.CONSOLE-C=org.apache.log4j.ConsoleAppender >> log4j.appender.CONSOLE-C.layout=org.apache.log4j.PatternLayout >> log4j.appender.CONSOLE-C.layout.ConversionPattern=%-5p [%c] %x - %m%n >> >> # %provider-deef% >> log4j.logger.org.apache.axis.utils.JavaUtils=ERROR >> log4j.logger.org.globus.cog.abstraction.impl.execution.deef=DEBUG >> >> # %abstraction% >> log4j.logger.org.globus.cog.abstraction=WARN >> >> # %abstraction-common% >> log4j.logger.org.globus.cog.abstraction.impl.common=WARN >> >> # %jglobus% >> >> # %util% >> >> # %provider-gt2% >> >> # %provider-gt2ft% >> >> # %provider-gt4_0_0% >> log4j.logger.org.apache.axis.utils.JavaUtils=ERROR >> # %provider-condor% >> >> # %provider-ssh% >> >> # %provider-webdav% >> >> # %provider-local% >> >> ---- logging works when I add the following lines: >> >> log4j.logger.org.globus.cog.abstraction.impl.common=WARN >> log4j.logger.org.globus.cog.abstraction=DEBUG >> # ^^ Added by Mike per Mihael >> >> log4j.rootCategory=WARN, CONSOLE, FILE >> >> log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender >> log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout >> log4j.appender.CONSOLE.Threshold=WARN >> log4j.appender.CONSOLE.layout.ConversionPattern=%m%n >> >> log4j.appender.FILE=org.apache.log4j.FileAppender >> log4j.appender.FILE.File=swift.log >> log4j.appender.FILE.layout=org.apache.log4j.PatternLayout >> log4j.appender.FILE.layout.ConversionPattern=%d{yyyy-MM-dd >> HH:mm:ss,SSSZZZZZ} %-5p %c{1} %m%n >> >> 
log4j.logger.swift=DEBUG >> >> log4j.logger.org.griphyn.vdl.karajan.Loader=DEBUG >> log4j.logger.org.griphyn.vdl.toolkit.VDLt2VDLx=DEBUG >> log4j.logger.org.griphyn.vdl.karajan.VDL2ExecutionContext=DEBUG >> log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=DEBUG >> log4j.logger.org.griphyn.vdl.karajan.lib.GetFieldValue=DEBUG >> log4j.logger.org.griphyn.vdl.engine.Karajan=INFO >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > > From andrewj at uchicago.edu Wed Oct 24 09:19:52 2007 From: andrewj at uchicago.edu (andrewj at uchicago.edu) Date: Wed, 24 Oct 2007 09:19:52 -0500 (CDT) Subject: [Swift-devel] New Terminable Issue Message-ID: <20071024091952.AVM98745@m4500-00.uchicago.edu> Hello all, I know there is already discussion of terminable's problems, however, I encountered a new error in running swift from terminable for jobs on TP previously not present. RGI_Man_sh failed Execution failed: Could not initialize shared directory on teraport Caused by: org.globus.cog.abstraction.impl.file.IrrecoverableResourceException: Error communicating with the GridFTP server Caused by: Authentication failed [Caused by: Failure unspecified at GSS-API level [Caused by: Unknown CA]] Thanks, Andrew From andrewj at uchicago.edu Thu Oct 25 10:50:33 2007 From: andrewj at uchicago.edu (Andrew Robert Jamieson) Date: Thu, 25 Oct 2007 10:50:33 -0500 (CDT) Subject: [Swift-devel] Strange Problem with TG-UCANL Message-ID: Any thoughts on why this would happen on a simple "hello world" (see below) Thanks, Andrew ******************** andrewj at tg-viz-login1:~/CADGrid/Swifty/vdsk-0.3-dev/examples/vdsk> swift -debug -tc.file ~/CADGrid/Swifty/UCANL-tc.data -sites.file ~/.swift/sites.xml first.swift Recompilation suppressed. 
Using sites /home/andrewj/.swift/sites.xml
Using tc.data: /home/andrewj/CADGrid/Swifty/UCANL-tc.data
Swift v0.3-dev r1339

Swift v0.3-dev r1339

RunID: 20071025-1044-zo4kzfjg
RunID: 20071025-1044-zo4kzfjg
echo started
START thread=0 tr=echo
START host=UCANL - Initializing shared directory
Task(type=FILE_OPERATION, identity=urn:0-1193327080146) setting status to Active
Task(type=FILE_OPERATION, identity=urn:0-1193327080146) setting status to Completed
Task(type=FILE_TRANSFER, identity=urn:0-1193327080149) setting status to Submitted
Task(type=FILE_TRANSFER, identity=urn:0-1193327080149) setting status to Active
Task(type=FILE_TRANSFER, identity=urn:0-1193327080149) setting status to Completed
Task(type=FILE_TRANSFER, identity=urn:0-1193327080153) setting status to Submitted
Task(type=FILE_TRANSFER, identity=urn:0-1193327080153) setting status to Active
Task(type=FILE_TRANSFER, identity=urn:0-1193327080153) setting status to Completed
Task(type=FILE_OPERATION, identity=urn:0-1193327080156) setting status to Active
Task(type=FILE_OPERATION, identity=urn:0-1193327080156) setting status to Completed
Task(type=FILE_OPERATION, identity=urn:0-1193327080158) setting status to Active
Task(type=FILE_OPERATION, identity=urn:0-1193327080158) setting status to Completed
Task(type=FILE_OPERATION, identity=urn:0-1193327080160) setting status to Active
Task(type=FILE_OPERATION, identity=urn:0-1193327080160) setting status to Completed
END host=UCANL - Done initializing shared directory
THREAD_ASSOCIATION jobid=echo-0gj1k5ji thread=0 host=UCANL
START jobid=echo-0gj1k5ji host=UCANL - Initializing directory structure
START path= dir=first-20071025-1044-zo4kzfjg/shared - Creating directory structure
Task(type=FILE_OPERATION, identity=urn:0-1193327080162) setting status to Active
Task(type=FILE_OPERATION, identity=urn:0-1193327080162) setting status to Completed
END jobid=echo-0gj1k5ji - Done initializing directory structure
START jobid=echo-0gj1k5ji - Staging in files
END jobid=echo-0gj1k5ji - Staging in finished
JOB_START jobid=echo-0gj1k5ji tr=echo arguments=[Hello, world!] tmpdir=first-20071025-1044-zo4kzfjg/echo-0gj1k5ji host=UCANL
Task(type=JOB_SUBMISSION, identity=urn:0-1193327080164) setting status to Submitted
Task(type=JOB_SUBMISSION, identity=urn:0-1193327080164) setting status to Active
Task(type=JOB_SUBMISSION, identity=urn:0-1193327080164) setting status to Completed
START jobid=echo-0gj1k5ji
Task(type=FILE_OPERATION, identity=urn:0-1193327080166) setting status to Active
Task(type=FILE_OPERATION, identity=urn:0-1193327080166) setting status to Failed
org.globus.cog.abstraction.impl.file.FileResourceException: Cannot delete /disks/scratchgpfs1/andrewj/first-20071025-1044-zo4kzfjg/status/echo-0gj1k5ji-success
Task(type=FILE_OPERATION, identity=urn:0-1193327080168) setting status to Active
Task(type=FILE_OPERATION, identity=urn:0-1193327080168) setting status to Completed
NO_STATUS_FILE jobid=echo-0gj1k5ji - Both status files are missing
APPLICATION_EXCEPTION jobid=echo-0gj1k5ji - Application exception: No status file was found.
Check the shared filesystem on UCANL
	sys:throw @ vdl-int.k, line: 96
	sys:else @ vdl-int.k, line: 94
	sys:if @ vdl-int.k, line: 82
	sys:try @ vdl-int.k, line: 70
	vdl:checkjobstatus @ vdl-int.k, line: 379
	sys:sequential @ vdl-int.k, line: 355
	sys:try @ vdl-int.k, line: 354
	task:allocatehost @ vdl-int.k, line: 336
	vdl:execute2 @ execute-default.k, line: 23
	sys:restartonerror @ execute-default.k, line: 21
	sys:sequential @ execute-default.k, line: 19
	sys:try @ execute-default.k, line: 18
	sys:if @ execute-default.k, line: 17
	sys:then @ execute-default.k, line: 16
	sys:if @ execute-default.k, line: 15
	vdl:execute @ first.kml, line: 16
	greeting @ first.kml, line: 43
	vdl:mainp @ first.kml, line: 42
	mainp @ vdl.k, line: 148
	vdl:mains @ first.kml, line: 41
	vdl:mains @ first.kml, line: 41
	rlog:restartlog @ first.kml, line: 39
	kernel:project @ first.kml, line: 2
	first-20071025-1044-zo4kzfjg

Task(type=FILE_TRANSFER, identity=urn:0-1193327080170) setting status to Submitted
Task(type=FILE_TRANSFER, identity=urn:0-1193327080170) setting status to Active
Task(type=FILE_TRANSFER, identity=urn:0-1193327080170) setting status to Failed Exception in getFile
Task(type=FILE_OPERATION, identity=urn:0-1193327080173) setting status to Active
Task(type=FILE_OPERATION, identity=urn:0-1193327080173) setting status to Completed
Task(type=FILE_TRANSFER, identity=urn:0-1193327080176) setting status to Submitted
Task(type=FILE_TRANSFER, identity=urn:0-1193327080176) setting status to Active
Task(type=FILE_TRANSFER, identity=urn:0-1193327080176) setting status to Failed Exception in getFile
Task(type=FILE_OPERATION, identity=urn:0-1193327080179) setting status to Active
Task(type=FILE_OPERATION, identity=urn:0-1193327080179) setting status to Completed
THREAD_ASSOCIATION jobid=echo-1gj1k5ji thread=0 host=UCANL
START jobid=echo-1gj1k5ji host=UCANL - Initializing directory structure
END jobid=echo-1gj1k5ji - Done initializing directory structure
START jobid=echo-1gj1k5ji - Staging in files
END jobid=echo-1gj1k5ji - Staging in finished
JOB_START jobid=echo-1gj1k5ji tr=echo arguments=[Hello, world!] tmpdir=first-20071025-1044-zo4kzfjg/echo-1gj1k5ji host=UCANL
Task(type=JOB_SUBMISSION, identity=urn:0-1193327080183) setting status to Submitted
Task(type=JOB_SUBMISSION, identity=urn:0-1193327080183) setting status to Active
Task(type=JOB_SUBMISSION, identity=urn:0-1193327080183) setting status to Completed
START jobid=echo-1gj1k5ji
Task(type=FILE_OPERATION, identity=urn:0-1193327080185) setting status to Active
Task(type=FILE_OPERATION, identity=urn:0-1193327080185) setting status to Failed
org.globus.cog.abstraction.impl.file.FileResourceException: Cannot delete /disks/scratchgpfs1/andrewj/first-20071025-1044-zo4kzfjg/status/echo-1gj1k5ji-success
Task(type=FILE_OPERATION, identity=urn:0-1193327080187) setting status to Active
Task(type=FILE_OPERATION, identity=urn:0-1193327080187) setting status to Completed
NO_STATUS_FILE jobid=echo-1gj1k5ji - Both status files are missing
APPLICATION_EXCEPTION jobid=echo-1gj1k5ji - Application exception: No status file was found.
Check the shared filesystem on UCANL
	sys:throw @ vdl-int.k, line: 96
	sys:else @ vdl-int.k, line: 94
	sys:if @ vdl-int.k, line: 82
	sys:try @ vdl-int.k, line: 70
	vdl:checkjobstatus @ vdl-int.k, line: 379
	sys:sequential @ vdl-int.k, line: 355
	sys:try @ vdl-int.k, line: 354
	task:allocatehost @ vdl-int.k, line: 336
	vdl:execute2 @ execute-default.k, line: 23
	sys:restartonerror @ execute-default.k, line: 21
	sys:sequential @ execute-default.k, line: 19
	sys:try @ execute-default.k, line: 18
	sys:if @ execute-default.k, line: 17
	sys:then @ execute-default.k, line: 16
	sys:if @ execute-default.k, line: 15
	vdl:execute @ first.kml, line: 16
	greeting @ first.kml, line: 43
	vdl:mainp @ first.kml, line: 42
	mainp @ vdl.k, line: 148
	vdl:mains @ first.kml, line: 41
	vdl:mains @ first.kml, line: 41
	rlog:restartlog @ first.kml, line: 39
	kernel:project @ first.kml, line: 2
	first-20071025-1044-zo4kzfjg

Task(type=FILE_TRANSFER, identity=urn:0-1193327080189) setting status to Submitted
Task(type=FILE_TRANSFER, identity=urn:0-1193327080189) setting status to Active
Task(type=FILE_TRANSFER, identity=urn:0-1193327080189) setting status to Failed Exception in getFile
Task(type=FILE_OPERATION, identity=urn:0-1193327080192) setting status to Active
Task(type=FILE_OPERATION, identity=urn:0-1193327080192) setting status to Completed
Task(type=FILE_TRANSFER, identity=urn:0-1193327080195) setting status to Submitted
Task(type=FILE_TRANSFER, identity=urn:0-1193327080195) setting status to Active
Task(type=FILE_TRANSFER, identity=urn:0-1193327080195) setting status to Failed Exception in getFile
Task(type=FILE_OPERATION, identity=urn:0-1193327080198) setting status to Active
Task(type=FILE_OPERATION, identity=urn:0-1193327080198) setting status to Completed
THREAD_ASSOCIATION jobid=echo-2gj1k5ji thread=0 host=UCANL
START jobid=echo-2gj1k5ji host=UCANL - Initializing directory structure
END jobid=echo-2gj1k5ji - Done initializing directory structure
START jobid=echo-2gj1k5ji - Staging in files
END jobid=echo-2gj1k5ji - Staging in finished
JOB_START jobid=echo-2gj1k5ji tr=echo arguments=[Hello, world!] tmpdir=first-20071025-1044-zo4kzfjg/echo-2gj1k5ji host=UCANL
Task(type=JOB_SUBMISSION, identity=urn:0-1193327080202) setting status to Submitted
Task(type=JOB_SUBMISSION, identity=urn:0-1193327080202) setting status to Active
Task(type=JOB_SUBMISSION, identity=urn:0-1193327080202) setting status to Completed
START jobid=echo-2gj1k5ji
Task(type=FILE_OPERATION, identity=urn:0-1193327080204) setting status to Active
Task(type=FILE_OPERATION, identity=urn:0-1193327080204) setting status to Completed
SUCCESS jobid=echo-2gj1k5ji - Success file found
JOB_END jobid=echo-2gj1k5ji
START jobid=echo-2gj1k5ji - Staging out files
FILE_STAGE_OUT_START srcname=hello.txt srcdir=first-20071025-1044-zo4kzfjg/shared/ srchost=UCANL destdir= desthost=localhost provider=file
Task(type=FILE_OPERATION, identity=urn:0-1-1193327080206) setting status to Active
Task(type=FILE_OPERATION, identity=urn:0-1-1193327080206) setting status to Completed
Task(type=FILE_TRANSFER, identity=urn:0-1-1193327080209) setting status to Submitted
Task(type=FILE_TRANSFER, identity=urn:0-1-1193327080209) setting status to Active
Task(type=FILE_TRANSFER, identity=urn:0-1-1193327080209) setting status to Completed
FILE_STAGE_OUT_END srcname=hello.txt srcdir=first-20071025-1044-zo4kzfjg/shared/ srchost=UCANL destdir= desthost=localhost provider=file
Task(type=FILE_OPERATION, identity=urn:0-1-1193327080213) setting status to Active
Task(type=FILE_OPERATION, identity=urn:0-1-1193327080213) setting status to Completed
END jobid=echo-2gj1k5ji - Staging out finished
echo completed
END_SUCCESS thread=0 tr=echo
START cleanups=[[first-20071025-1044-zo4kzfjg, UCANL]]
START dir=first-20071025-1044-zo4kzfjg host=UCANL
Task(type=JOB_SUBMISSION, identity=urn:0-1-1193327080216) setting status to Submitted
Task(type=JOB_SUBMISSION, identity=urn:0-1-1193327080216) setting status to Completed
END
dir=first-20071025-1044-zo4kzfjg host=UCANL
Swift finished - workflow had no errors

From nefedova at mcs.anl.gov Thu Oct 25 11:07:47 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Thu, 25 Oct 2007 11:07:47 -0500
Subject: [Swift-devel] Strange Problem with TG-UCANL
In-Reply-To: 
References: 
Message-ID: <9302B48B-F4E3-4D45-A6C0-D762D7C7892B@mcs.anl.gov>

If you are using kickstart, try this setting (on TG-UC) in your sites.xml file, replacing the one you have:

gridlaunch="/home/nefedova/pegasus/src/tools/kickstart/kickstart"

Nika

On Oct 25, 2007, at 10:50 AM, Andrew Robert Jamieson wrote:

> Any thoughts on why this would happen on a simple "hello world"
> (see below)?
> Thanks,
> Andrew
>
> ********************
> andrewj at tg-viz-login1:~/CADGrid/Swifty/vdsk-0.3-dev/examples/vdsk> swift -debug -tc.file ~/CADGrid/Swifty/UCANL-tc.data -sites.file ~/.swift/sites.xml first.swift
> Recompilation suppressed.
> [...]

From andrewj at uchicago.edu Thu Oct 25 12:46:40 2007
From: andrewj at uchicago.edu (Andrew Robert Jamieson)
Date: Thu, 25 Oct 2007 12:46:40 -0500 (CDT)
Subject: [Swift-devel] Strange Problem with TG-UCANL
In-Reply-To: <9302B48B-F4E3-4D45-A6C0-D762D7C7892B@mcs.anl.gov>
References: <9302B48B-F4E3-4D45-A6C0-D762D7C7892B@mcs.anl.gov>
Message-ID: 

Thanks for the suggestion; unfortunately, I am not using kickstart.

On Thu, 25 Oct 2007, Veronika Nefedova wrote:

> If you are using kickstart, try this setting (on TG-UC) in your
> sites.xml file, replacing the one you have:
>
> gridlaunch="/home/nefedova/pegasus/src/tools/kickstart/kickstart"
>
> Nika
>
> On Oct 25, 2007, at 10:50 AM, Andrew Robert Jamieson wrote:
>
>> Any thoughts on why this would happen on a simple "hello world"
>> (see below)?
>> Thanks,
>> Andrew
>>
>> ********************
>> andrewj at tg-viz-login1:~/CADGrid/Swifty/vdsk-0.3-dev/examples/vdsk> swift -debug -tc.file ~/CADGrid/Swifty/UCANL-tc.data -sites.file ~/.swift/sites.xml first.swift
>> Recompilation suppressed.
>> [...]
>> START dir=first-20071025-1044-zo4kzfjg host=UCANL
>> Task(type=JOB_SUBMISSION, identity=urn:0-1-1193327080216) setting status to Submitted
>> Task(type=JOB_SUBMISSION, identity=urn:0-1-1193327080216) setting status to Completed
>> END dir=first-20071025-1044-zo4kzfjg host=UCANL
>> Swift finished - workflow had no errors

From foster at mcs.anl.gov Thu Oct 25 13:03:00 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Thu, 25 Oct 2007 13:03:00 -0500
Subject: [Swift-devel] Strange Problem with TG-UCANL
In-Reply-To: 
References: <9302B48B-F4E3-4D45-A6C0-D762D7C7892B@mcs.anl.gov>
Message-ID: <4720DA54.6000408@mcs.anl.gov>

Can we decide that we always use kickstart?

Andrew Robert Jamieson wrote:
> Thanks for the suggestion; unfortunately, I am not using kickstart.
>
> On Thu, 25 Oct 2007, Veronika Nefedova wrote:
>
>> If you are using kickstart - try to use this setting (on TG-UC):
>> gridlaunch="/home/nefedova/pegasus/src/tools/kickstart/kickstart" in
>> your site.xml. file ( replace the one you have with this one)
>>
>> Nika
>>
>> On Oct 25, 2007, at 10:50 AM, Andrew Robert Jamieson wrote:
>>
>>> Any thoughts on why this would happen on a simple "hello world"
>>> (see below)
>>> Thanks,
>>> Andrew
>>>
>>> ********************
>>> andrewj at tg-viz-login1:~/CADGrid/Swifty/vdsk-0.3-dev/examples/vdsk>
>>> swift -debug -tc.file ~/CADGrid/Swifty/UCANL-tc.data -sites.file
>>> ~/.swift/sites.xml first.swift
>>> Recompilation suppressed.
>>> Using sites /home/andrewj/.swift/sites.xml >>> Using tc.data: /home/andrewj/CADGrid/Swifty/UCANL-tc.data >>> Swift v0.3-dev r1339 >>> >>> Swift v0.3-dev r1339 >>> >>> RunID: 20071025-1044-zo4kzfjg >>> RunID: 20071025-1044-zo4kzfjg >>> echo started >>> START thread=0 tr=echo >>> START host=UCANL - Initializing shared directory >>> Task(type=FILE_OPERATION, identity=urn:0-1193327080146) setting >>> status to Active >>> Task(type=FILE_OPERATION, identity=urn:0-1193327080146) setting >>> status to Completed >>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080149) setting >>> status to Submitted >>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080149) setting >>> status to Active >>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080149) setting >>> status to Completed >>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080153) setting >>> status to Submitted >>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080153) setting >>> status to Active >>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080153) setting >>> status to Completed >>> Task(type=FILE_OPERATION, identity=urn:0-1193327080156) setting >>> status to Active >>> Task(type=FILE_OPERATION, identity=urn:0-1193327080156) setting >>> status to Completed >>> Task(type=FILE_OPERATION, identity=urn:0-1193327080158) setting >>> status to Active >>> Task(type=FILE_OPERATION, identity=urn:0-1193327080158) setting >>> status to Completed >>> Task(type=FILE_OPERATION, identity=urn:0-1193327080160) setting >>> status to Active >>> Task(type=FILE_OPERATION, identity=urn:0-1193327080160) setting >>> status to Completed >>> END host=UCANL - Done initializing shared directory >>> THREAD_ASSOCIATION jobid=echo-0gj1k5ji thread=0 host=UCANL >>> START jobid=echo-0gj1k5ji host=UCANL - Initializing directory structure >>> START path= dir=first-20071025-1044-zo4kzfjg/shared - Creating >>> directory structure >>> Task(type=FILE_OPERATION, identity=urn:0-1193327080162) setting >>> status to Active >>> 
Task(type=FILE_OPERATION, identity=urn:0-1193327080162) setting >>> status to Completed >>> END jobid=echo-0gj1k5ji - Done initializing directory structure >>> START jobid=echo-0gj1k5ji - Staging in files >>> END jobid=echo-0gj1k5ji - Staging in finished >>> JOB_START jobid=echo-0gj1k5ji tr=echo arguments=[Hello, world!] >>> tmpdir=first-20071025-1044-zo4kzfjg/echo-0gj1k5ji host=UCANL >>> Task(type=JOB_SUBMISSION, identity=urn:0-1193327080164) setting >>> status to Submitted >>> Task(type=JOB_SUBMISSION, identity=urn:0-1193327080164) setting >>> status to Active >>> Task(type=JOB_SUBMISSION, identity=urn:0-1193327080164) setting >>> status to Completed >>> START jobid=echo-0gj1k5ji >>> Task(type=FILE_OPERATION, identity=urn:0-1193327080166) setting >>> status to Active >>> Task(type=FILE_OPERATION, identity=urn:0-1193327080166) setting >>> status to Failed >>> org.globus.cog.abstraction.impl.file.FileResourceException: Cannot >>> delete /disks/scratchgpfs1/andrewj/first-20071025-1044-zo4kzfjg/ >>> status/echo-0gj1k5ji-success >>> Task(type=FILE_OPERATION, identity=urn:0-1193327080168) setting >>> status to Active >>> Task(type=FILE_OPERATION, identity=urn:0-1193327080168) setting >>> status to Completed >>> NO_STATUS_FILE jobid=echo-0gj1k5ji - Both status files are missing >>> APPLICATION_EXCEPTION jobid=echo-0gj1k5ji - Application exception: >>> No status file was found. 
Check the shared filesystem on UCANL >>> sys:throw @ vdl-int.k, line: 96 >>> sys:else @ vdl-int.k, line: 94 >>> sys:if @ vdl-int.k, line: 82 >>> sys:try @ vdl-int.k, line: 70 >>> vdl:checkjobstatus @ vdl-int.k, line: 379 >>> sys:sequential @ vdl-int.k, line: 355 >>> sys:try @ vdl-int.k, line: 354 >>> task:allocatehost @ vdl-int.k, line: 336 >>> vdl:execute2 @ execute-default.k, line: 23 >>> sys:restartonerror @ execute-default.k, line: 21 >>> sys:sequential @ execute-default.k, line: 19 >>> sys:try @ execute-default.k, line: 18 >>> sys:if @ execute-default.k, line: 17 >>> sys:then @ execute-default.k, line: 16 >>> sys:if @ execute-default.k, line: 15 >>> vdl:execute @ first.kml, line: 16 >>> greeting @ first.kml, line: 43 >>> vdl:mainp @ first.kml, line: 42 >>> mainp @ vdl.k, line: 148 >>> vdl:mains @ first.kml, line: 41 >>> vdl:mains @ first.kml, line: 41 >>> rlog:restartlog @ first.kml, line: 39 >>> kernel:project @ first.kml, line: 2 >>> first-20071025-1044-zo4kzfjg >>> >>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080170) setting >>> status to Submitted >>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080170) setting >>> status to Active >>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080170) setting >>> status to Failed Exception in getFile >>> Task(type=FILE_OPERATION, identity=urn:0-1193327080173) setting >>> status to Active >>> Task(type=FILE_OPERATION, identity=urn:0-1193327080173) setting >>> status to Completed >>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080176) setting >>> status to Submitted >>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080176) setting >>> status to Active >>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080176) setting >>> status to Failed Exception in getFile >>> Task(type=FILE_OPERATION, identity=urn:0-1193327080179) setting >>> status to Active >>> Task(type=FILE_OPERATION, identity=urn:0-1193327080179) setting >>> status to Completed >>> THREAD_ASSOCIATION jobid=echo-1gj1k5ji thread=0 host=UCANL >>> START 
jobid=echo-1gj1k5ji host=UCANL - Initializing directory structure >>> END jobid=echo-1gj1k5ji - Done initializing directory structure >>> START jobid=echo-1gj1k5ji - Staging in files >>> END jobid=echo-1gj1k5ji - Staging in finished >>> JOB_START jobid=echo-1gj1k5ji tr=echo arguments=[Hello, world!] >>> tmpdir=first-20071025-1044-zo4kzfjg/echo-1gj1k5ji host=UCANL >>> Task(type=JOB_SUBMISSION, identity=urn:0-1193327080183) setting >>> status to Submitted >>> Task(type=JOB_SUBMISSION, identity=urn:0-1193327080183) setting >>> status to Active >>> Task(type=JOB_SUBMISSION, identity=urn:0-1193327080183) setting >>> status to Completed >>> START jobid=echo-1gj1k5ji >>> Task(type=FILE_OPERATION, identity=urn:0-1193327080185) setting >>> status to Active >>> Task(type=FILE_OPERATION, identity=urn:0-1193327080185) setting >>> status to Failed >>> org.globus.cog.abstraction.impl.file.FileResourceException: Cannot >>> delete /disks/scratchgpfs1/andrewj/first-20071025-1044-zo4kzfjg/ >>> status/echo-1gj1k5ji-success >>> Task(type=FILE_OPERATION, identity=urn:0-1193327080187) setting >>> status to Active >>> Task(type=FILE_OPERATION, identity=urn:0-1193327080187) setting >>> status to Completed >>> NO_STATUS_FILE jobid=echo-1gj1k5ji - Both status files are missing >>> APPLICATION_EXCEPTION jobid=echo-1gj1k5ji - Application exception: >>> No status file was found. 
Check the shared filesystem on UCANL >>> sys:throw @ vdl-int.k, line: 96 >>> sys:else @ vdl-int.k, line: 94 >>> sys:if @ vdl-int.k, line: 82 >>> sys:try @ vdl-int.k, line: 70 >>> vdl:checkjobstatus @ vdl-int.k, line: 379 >>> sys:sequential @ vdl-int.k, line: 355 >>> sys:try @ vdl-int.k, line: 354 >>> task:allocatehost @ vdl-int.k, line: 336 >>> vdl:execute2 @ execute-default.k, line: 23 >>> sys:restartonerror @ execute-default.k, line: 21 >>> sys:sequential @ execute-default.k, line: 19 >>> sys:try @ execute-default.k, line: 18 >>> sys:if @ execute-default.k, line: 17 >>> sys:then @ execute-default.k, line: 16 >>> sys:if @ execute-default.k, line: 15 >>> vdl:execute @ first.kml, line: 16 >>> greeting @ first.kml, line: 43 >>> vdl:mainp @ first.kml, line: 42 >>> mainp @ vdl.k, line: 148 >>> vdl:mains @ first.kml, line: 41 >>> vdl:mains @ first.kml, line: 41 >>> rlog:restartlog @ first.kml, line: 39 >>> kernel:project @ first.kml, line: 2 >>> first-20071025-1044-zo4kzfjg >>> >>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080189) setting >>> status to Submitted >>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080189) setting >>> status to Active >>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080189) setting >>> status to Failed Exception in getFile >>> Task(type=FILE_OPERATION, identity=urn:0-1193327080192) setting >>> status to Active >>> Task(type=FILE_OPERATION, identity=urn:0-1193327080192) setting >>> status to Completed >>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080195) setting >>> status to Submitted >>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080195) setting >>> status to Active >>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080195) setting >>> status to Failed Exception in getFile >>> Task(type=FILE_OPERATION, identity=urn:0-1193327080198) setting >>> status to Active >>> Task(type=FILE_OPERATION, identity=urn:0-1193327080198) setting >>> status to Completed >>> THREAD_ASSOCIATION jobid=echo-2gj1k5ji thread=0 host=UCANL >>> START 
jobid=echo-2gj1k5ji host=UCANL - Initializing directory structure >>> END jobid=echo-2gj1k5ji - Done initializing directory structure >>> START jobid=echo-2gj1k5ji - Staging in files >>> END jobid=echo-2gj1k5ji - Staging in finished >>> JOB_START jobid=echo-2gj1k5ji tr=echo arguments=[Hello, world!] >>> tmpdir=first-20071025-1044-zo4kzfjg/echo-2gj1k5ji host=UCANL >>> Task(type=JOB_SUBMISSION, identity=urn:0-1193327080202) setting >>> status to Submitted >>> Task(type=JOB_SUBMISSION, identity=urn:0-1193327080202) setting >>> status to Active >>> Task(type=JOB_SUBMISSION, identity=urn:0-1193327080202) setting >>> status to Completed >>> START jobid=echo-2gj1k5ji >>> Task(type=FILE_OPERATION, identity=urn:0-1193327080204) setting >>> status to Active >>> Task(type=FILE_OPERATION, identity=urn:0-1193327080204) setting >>> status to Completed >>> SUCCESS jobid=echo-2gj1k5ji - Success file found >>> JOB_END jobid=echo-2gj1k5ji >>> START jobid=echo-2gj1k5ji - Staging out files >>> FILE_STAGE_OUT_START srcname=hello.txt srcdir=first-20071025-1044- >>> zo4kzfjg/shared/ srchost=UCANL destdir= desthost=localhost >>> provider=file >>> Task(type=FILE_OPERATION, identity=urn:0-1-1193327080206) setting >>> status to Active >>> Task(type=FILE_OPERATION, identity=urn:0-1-1193327080206) setting >>> status to Completed >>> Task(type=FILE_TRANSFER, identity=urn:0-1-1193327080209) setting >>> status to Submitted >>> Task(type=FILE_TRANSFER, identity=urn:0-1-1193327080209) setting >>> status to Active >>> Task(type=FILE_TRANSFER, identity=urn:0-1-1193327080209) setting >>> status to Completed >>> FILE_STAGE_OUT_END srcname=hello.txt srcdir=first-20071025-1044- >>> zo4kzfjg/shared/ srchost=UCANL destdir= desthost=localhost >>> provider=file >>> Task(type=FILE_OPERATION, identity=urn:0-1-1193327080213) setting >>> status to Active >>> Task(type=FILE_OPERATION, identity=urn:0-1-1193327080213) setting >>> status to Completed >>> END jobid=echo-2gj1k5ji - Staging out finished >>> echo 
completed >>> END_SUCCESS thread=0 tr=echo >>> START cleanups=[[first-20071025-1044-zo4kzfjg, UCANL]] >>> START dir=first-20071025-1044-zo4kzfjg host=UCANL >>> Task(type=JOB_SUBMISSION, identity=urn:0-1-1193327080216) setting >>> status to Submitted >>> Task(type=JOB_SUBMISSION, identity=urn:0-1-1193327080216) setting >>> status to Completed >>> END dir=first-20071025-1044-zo4kzfjg host=UCANL >>> Swift finished - workflow had no errors >>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> >> >> > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > -- Ian Foster, Director, Computation Institute Argonne National Laboratory & University of Chicago Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439 Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637 Tel: +1 630 252 4619. Web: www.ci.uchicago.edu. Globus Alliance: www.globus.org. From andrewj at uchicago.edu Thu Oct 25 13:07:15 2007 From: andrewj at uchicago.edu (Andrew Robert Jamieson) Date: Thu, 25 Oct 2007 13:07:15 -0500 (CDT) Subject: [Swift-devel] Strange Problem with TG-UCANL In-Reply-To: <4720DA54.6000408@mcs.anl.gov> References: <9302B48B-F4E3-4D45-A6C0-D762D7C7892B@mcs.anl.gov> <4720DA54.6000408@mcs.anl.gov> Message-ID: I was just looking at some of the swift.properties configurations. If I understand it correctly I think this can be set as such. Perhaps I will try. Thanks, Andrew On Thu, 25 Oct 2007, Ian Foster wrote: > can we decide that we always use kickstart? > > Andrew Robert Jamieson wrote: >> Thanks for the suggestion, unfortunately I am not using kickstart. 
>> >> On Thu, 25 Oct 2007, Veronika Nefedova wrote: >> >>> If you are using kickstart - try to use this setting (on TG-UC): >>> gridlaunch="/home/nefedova/pegasus/src/tools/kickstart/kickstart" in your >>> site.xml. file ( replace the one you have with this one) >>> >>> Nika >>> >>> On Oct 25, 2007, at 10:50 AM, Andrew Robert Jamieson wrote: >>> >>>> Any thoughts on why this would happen on a simple "hello world" >>>> (see below) >>>> Thanks, >>>> Andrew >>>> >>>> >>>> ******************** >>>> andrewj at tg-viz-login1:~/CADGrid/Swifty/vdsk-0.3-dev/examples/vdsk> swift >>>> -debug -tc.file ~/CADGrid/Swifty/UCANL-tc.data -sites.file >>>> ~/.swift/sites.xml first.swift >>>> Recompilation suppressed. >>>> Using sites /home/andrewj/.swift/sites.xml >>>> Using tc.data: /home/andrewj/CADGrid/Swifty/UCANL-tc.data >>>> Swift v0.3-dev r1339 >>>> >>>> Swift v0.3-dev r1339 >>>> >>>> RunID: 20071025-1044-zo4kzfjg >>>> RunID: 20071025-1044-zo4kzfjg >>>> echo started >>>> START thread=0 tr=echo >>>> START host=UCANL - Initializing shared directory >>>> Task(type=FILE_OPERATION, identity=urn:0-1193327080146) setting status to >>>> Active >>>> Task(type=FILE_OPERATION, identity=urn:0-1193327080146) setting status to >>>> Completed >>>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080149) setting status to >>>> Submitted >>>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080149) setting status to >>>> Active >>>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080149) setting status to >>>> Completed >>>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080153) setting status to >>>> Submitted >>>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080153) setting status to >>>> Active >>>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080153) setting status to >>>> Completed >>>> Task(type=FILE_OPERATION, identity=urn:0-1193327080156) setting status to >>>> Active >>>> Task(type=FILE_OPERATION, identity=urn:0-1193327080156) setting status to >>>> Completed >>>> 
Task(type=FILE_OPERATION, identity=urn:0-1193327080158) setting status to >>>> Active >>>> Task(type=FILE_OPERATION, identity=urn:0-1193327080158) setting status to >>>> Completed >>>> Task(type=FILE_OPERATION, identity=urn:0-1193327080160) setting status to >>>> Active >>>> Task(type=FILE_OPERATION, identity=urn:0-1193327080160) setting status to >>>> Completed >>>> END host=UCANL - Done initializing shared directory >>>> THREAD_ASSOCIATION jobid=echo-0gj1k5ji thread=0 host=UCANL >>>> START jobid=echo-0gj1k5ji host=UCANL - Initializing directory structure >>>> START path= dir=first-20071025-1044-zo4kzfjg/shared - Creating directory >>>> structure >>>> Task(type=FILE_OPERATION, identity=urn:0-1193327080162) setting status to >>>> Active >>>> Task(type=FILE_OPERATION, identity=urn:0-1193327080162) setting status to >>>> Completed >>>> END jobid=echo-0gj1k5ji - Done initializing directory structure >>>> START jobid=echo-0gj1k5ji - Staging in files >>>> END jobid=echo-0gj1k5ji - Staging in finished >>>> JOB_START jobid=echo-0gj1k5ji tr=echo arguments=[Hello, world!] 
>>>> tmpdir=first-20071025-1044-zo4kzfjg/echo-0gj1k5ji host=UCANL >>>> Task(type=JOB_SUBMISSION, identity=urn:0-1193327080164) setting status to >>>> Submitted >>>> Task(type=JOB_SUBMISSION, identity=urn:0-1193327080164) setting status to >>>> Active >>>> Task(type=JOB_SUBMISSION, identity=urn:0-1193327080164) setting status to >>>> Completed >>>> START jobid=echo-0gj1k5ji >>>> Task(type=FILE_OPERATION, identity=urn:0-1193327080166) setting status to >>>> Active >>>> Task(type=FILE_OPERATION, identity=urn:0-1193327080166) setting status to >>>> Failed org.globus.cog.abstraction.impl.file.FileResourceException: Cannot >>>> delete /disks/scratchgpfs1/andrewj/first-20071025-1044-zo4kzfjg/ >>>> status/echo-0gj1k5ji-success >>>> Task(type=FILE_OPERATION, identity=urn:0-1193327080168) setting status to >>>> Active >>>> Task(type=FILE_OPERATION, identity=urn:0-1193327080168) setting status to >>>> Completed >>>> NO_STATUS_FILE jobid=echo-0gj1k5ji - Both status files are missing >>>> APPLICATION_EXCEPTION jobid=echo-0gj1k5ji - Application exception: No >>>> status file was found. 
Check the shared filesystem on UCANL >>>> sys:throw @ vdl-int.k, line: 96 >>>> sys:else @ vdl-int.k, line: 94 >>>> sys:if @ vdl-int.k, line: 82 >>>> sys:try @ vdl-int.k, line: 70 >>>> vdl:checkjobstatus @ vdl-int.k, line: 379 >>>> sys:sequential @ vdl-int.k, line: 355 >>>> sys:try @ vdl-int.k, line: 354 >>>> task:allocatehost @ vdl-int.k, line: 336 >>>> vdl:execute2 @ execute-default.k, line: 23 >>>> sys:restartonerror @ execute-default.k, line: 21 >>>> sys:sequential @ execute-default.k, line: 19 >>>> sys:try @ execute-default.k, line: 18 >>>> sys:if @ execute-default.k, line: 17 >>>> sys:then @ execute-default.k, line: 16 >>>> sys:if @ execute-default.k, line: 15 >>>> vdl:execute @ first.kml, line: 16 >>>> greeting @ first.kml, line: 43 >>>> vdl:mainp @ first.kml, line: 42 >>>> mainp @ vdl.k, line: 148 >>>> vdl:mains @ first.kml, line: 41 >>>> vdl:mains @ first.kml, line: 41 >>>> rlog:restartlog @ first.kml, line: 39 >>>> kernel:project @ first.kml, line: 2 >>>> first-20071025-1044-zo4kzfjg >>>> >>>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080170) setting status to >>>> Submitted >>>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080170) setting status to >>>> Active >>>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080170) setting status to >>>> Failed Exception in getFile >>>> Task(type=FILE_OPERATION, identity=urn:0-1193327080173) setting status to >>>> Active >>>> Task(type=FILE_OPERATION, identity=urn:0-1193327080173) setting status to >>>> Completed >>>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080176) setting status to >>>> Submitted >>>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080176) setting status to >>>> Active >>>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080176) setting status to >>>> Failed Exception in getFile >>>> Task(type=FILE_OPERATION, identity=urn:0-1193327080179) setting status to >>>> Active >>>> Task(type=FILE_OPERATION, identity=urn:0-1193327080179) setting status to >>>> Completed >>>> THREAD_ASSOCIATION 
jobid=echo-1gj1k5ji thread=0 host=UCANL >>>> START jobid=echo-1gj1k5ji host=UCANL - Initializing directory structure >>>> END jobid=echo-1gj1k5ji - Done initializing directory structure >>>> START jobid=echo-1gj1k5ji - Staging in files >>>> END jobid=echo-1gj1k5ji - Staging in finished >>>> JOB_START jobid=echo-1gj1k5ji tr=echo arguments=[Hello, world!] >>>> tmpdir=first-20071025-1044-zo4kzfjg/echo-1gj1k5ji host=UCANL >>>> Task(type=JOB_SUBMISSION, identity=urn:0-1193327080183) setting status to >>>> Submitted >>>> Task(type=JOB_SUBMISSION, identity=urn:0-1193327080183) setting status to >>>> Active >>>> Task(type=JOB_SUBMISSION, identity=urn:0-1193327080183) setting status to >>>> Completed >>>> START jobid=echo-1gj1k5ji >>>> Task(type=FILE_OPERATION, identity=urn:0-1193327080185) setting status to >>>> Active >>>> Task(type=FILE_OPERATION, identity=urn:0-1193327080185) setting status to >>>> Failed org.globus.cog.abstraction.impl.file.FileResourceException: Cannot >>>> delete /disks/scratchgpfs1/andrewj/first-20071025-1044-zo4kzfjg/ >>>> status/echo-1gj1k5ji-success >>>> Task(type=FILE_OPERATION, identity=urn:0-1193327080187) setting status to >>>> Active >>>> Task(type=FILE_OPERATION, identity=urn:0-1193327080187) setting status to >>>> Completed >>>> NO_STATUS_FILE jobid=echo-1gj1k5ji - Both status files are missing >>>> APPLICATION_EXCEPTION jobid=echo-1gj1k5ji - Application exception: No >>>> status file was found. 
Check the shared filesystem on UCANL >>>> sys:throw @ vdl-int.k, line: 96 >>>> sys:else @ vdl-int.k, line: 94 >>>> sys:if @ vdl-int.k, line: 82 >>>> sys:try @ vdl-int.k, line: 70 >>>> vdl:checkjobstatus @ vdl-int.k, line: 379 >>>> sys:sequential @ vdl-int.k, line: 355 >>>> sys:try @ vdl-int.k, line: 354 >>>> task:allocatehost @ vdl-int.k, line: 336 >>>> vdl:execute2 @ execute-default.k, line: 23 >>>> sys:restartonerror @ execute-default.k, line: 21 >>>> sys:sequential @ execute-default.k, line: 19 >>>> sys:try @ execute-default.k, line: 18 >>>> sys:if @ execute-default.k, line: 17 >>>> sys:then @ execute-default.k, line: 16 >>>> sys:if @ execute-default.k, line: 15 >>>> vdl:execute @ first.kml, line: 16 >>>> greeting @ first.kml, line: 43 >>>> vdl:mainp @ first.kml, line: 42 >>>> mainp @ vdl.k, line: 148 >>>> vdl:mains @ first.kml, line: 41 >>>> vdl:mains @ first.kml, line: 41 >>>> rlog:restartlog @ first.kml, line: 39 >>>> kernel:project @ first.kml, line: 2 >>>> first-20071025-1044-zo4kzfjg >>>> >>>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080189) setting status to >>>> Submitted >>>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080189) setting status to >>>> Active >>>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080189) setting status to >>>> Failed Exception in getFile >>>> Task(type=FILE_OPERATION, identity=urn:0-1193327080192) setting status to >>>> Active >>>> Task(type=FILE_OPERATION, identity=urn:0-1193327080192) setting status to >>>> Completed >>>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080195) setting status to >>>> Submitted >>>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080195) setting status to >>>> Active >>>> Task(type=FILE_TRANSFER, identity=urn:0-1193327080195) setting status to >>>> Failed Exception in getFile >>>> Task(type=FILE_OPERATION, identity=urn:0-1193327080198) setting status to >>>> Active >>>> Task(type=FILE_OPERATION, identity=urn:0-1193327080198) setting status to >>>> Completed >>>> THREAD_ASSOCIATION 
jobid=echo-2gj1k5ji thread=0 host=UCANL >>>> START jobid=echo-2gj1k5ji host=UCANL - Initializing directory structure >>>> END jobid=echo-2gj1k5ji - Done initializing directory structure >>>> START jobid=echo-2gj1k5ji - Staging in files >>>> END jobid=echo-2gj1k5ji - Staging in finished >>>> JOB_START jobid=echo-2gj1k5ji tr=echo arguments=[Hello, world!] >>>> tmpdir=first-20071025-1044-zo4kzfjg/echo-2gj1k5ji host=UCANL >>>> Task(type=JOB_SUBMISSION, identity=urn:0-1193327080202) setting status to >>>> Submitted >>>> Task(type=JOB_SUBMISSION, identity=urn:0-1193327080202) setting status to >>>> Active >>>> Task(type=JOB_SUBMISSION, identity=urn:0-1193327080202) setting status to >>>> Completed >>>> START jobid=echo-2gj1k5ji >>>> Task(type=FILE_OPERATION, identity=urn:0-1193327080204) setting status to >>>> Active >>>> Task(type=FILE_OPERATION, identity=urn:0-1193327080204) setting status to >>>> Completed >>>> SUCCESS jobid=echo-2gj1k5ji - Success file found >>>> JOB_END jobid=echo-2gj1k5ji >>>> START jobid=echo-2gj1k5ji - Staging out files >>>> FILE_STAGE_OUT_START srcname=hello.txt srcdir=first-20071025-1044- >>>> zo4kzfjg/shared/ srchost=UCANL destdir= desthost=localhost provider=file >>>> Task(type=FILE_OPERATION, identity=urn:0-1-1193327080206) setting status >>>> to Active >>>> Task(type=FILE_OPERATION, identity=urn:0-1-1193327080206) setting status >>>> to Completed >>>> Task(type=FILE_TRANSFER, identity=urn:0-1-1193327080209) setting status >>>> to Submitted >>>> Task(type=FILE_TRANSFER, identity=urn:0-1-1193327080209) setting status >>>> to Active >>>> Task(type=FILE_TRANSFER, identity=urn:0-1-1193327080209) setting status >>>> to Completed >>>> FILE_STAGE_OUT_END srcname=hello.txt srcdir=first-20071025-1044- >>>> zo4kzfjg/shared/ srchost=UCANL destdir= desthost=localhost provider=file >>>> Task(type=FILE_OPERATION, identity=urn:0-1-1193327080213) setting status >>>> to Active >>>> Task(type=FILE_OPERATION, identity=urn:0-1-1193327080213) setting status >>>> 
to Completed >>>> END jobid=echo-2gj1k5ji - Staging out finished >>>> echo completed >>>> END_SUCCESS thread=0 tr=echo >>>> START cleanups=[[first-20071025-1044-zo4kzfjg, UCANL]] >>>> START dir=first-20071025-1044-zo4kzfjg host=UCANL >>>> Task(type=JOB_SUBMISSION, identity=urn:0-1-1193327080216) setting status >>>> to Submitted >>>> Task(type=JOB_SUBMISSION, identity=urn:0-1-1193327080216) setting status >>>> to Completed >>>> END dir=first-20071025-1044-zo4kzfjg host=UCANL >>>> Swift finished - workflow had no errors >>>> >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>> >>> >>> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > > -- > > Ian Foster, Director, Computation Institute > Argonne National Laboratory & University of Chicago > Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439 > Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637 > Tel: +1 630 252 4619. Web: www.ci.uchicago.edu. > Globus Alliance: www.globus.org. > > From benc at hawaga.org.uk Thu Oct 25 13:22:42 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 25 Oct 2007 18:22:42 +0000 (GMT) Subject: [Swift-devel] Strange Problem with TG-UCANL In-Reply-To: <4720DA54.6000408@mcs.anl.gov> References: <9302B48B-F4E3-4D45-A6C0-D762D7C7892B@mcs.anl.gov> <4720DA54.6000408@mcs.anl.gov> Message-ID: On Thu, 25 Oct 2007, Ian Foster wrote: > can we decide that we always use kickstart? we can decide whatever our group-mind decides. that doesn't make reality though. However, it was an explicit decision at least on my part to permit the use of swift without using kickstart, because that is much easier to deploy. I can very easily modify the code to make swift require the use of kickstart and refuse to operate if it is not configured. 
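(For reference, the gridlaunch setting Nika suggests earlier in this thread would sit in sites.xml roughly as below. This is only a sketch: the kickstart path and the UCANL/scratchgpfs1 names come from messages in this thread, but the surrounding pool element layout is an illustrative assumption, not a verified copy of the Swift sites.xml schema.)

```xml
<!-- Hypothetical sites.xml fragment. The gridlaunch attribute points at a
     kickstart binary on the remote site, as in Nika's suggestion; the
     workdirectory element is a placeholder based on the paths in the logs. -->
<pool handle="UCANL"
      gridlaunch="/home/nefedova/pegasus/src/tools/kickstart/kickstart">
  <workdirectory>/disks/scratchgpfs1/andrewj</workdirectory>
</pool>
```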
-- From bugzilla-daemon at mcs.anl.gov Thu Oct 25 20:54:04 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Thu, 25 Oct 2007 20:54:04 -0500 (CDT) Subject: [Swift-devel] [Bug 109] New: Change default max heap size to 256M Message-ID: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=109 Summary: Change default max heap size to 256M Product: Swift Version: unspecified Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: General AssignedTo: hategan at mcs.anl.gov ReportedBy: hategan at mcs.anl.gov CC: swift-devel at ci.uchicago.edu The default (of 32 or 64M) is obviously too low for our typical applications. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From andrewj at uchicago.edu Fri Oct 26 13:59:53 2007 From: andrewj at uchicago.edu (Andrew Robert Jamieson) Date: Fri, 26 Oct 2007 13:59:53 -0500 (CDT) Subject: [Swift-devel] Clustering and Temp Dirs with Swift Message-ID: Hello all, I am encountering the following problem on Teraport. I submit a clustered swift WF which should amount to something on the order of 850x3 individual jobs total. I have clustered the jobs because they are very fast (somewhere around 20 sec to 1 min long). When I submit the WF on TP things start out fantastic, I get 10s of output files in a matter of seconds and nodes would start and finish clustered batches in a matter of minutes or less. However, after waiting about 3-5 mins, when clustered jobs begin to line up in the queue and more start running at the same time, things start to slow down to a trickle in terms of output. One thing I noticed is when I try a simple ls on TP in the swift temp running directory where the temp job dirs are created and destroyed, it takes a very long time. And when it is done only five or so things are in the dir. 
(this is the dir with "info kickstart shared status wrapper.log" in it). What I think is happening is that TP's filesystem can't handle this extremely rapid creation/destruction of directories in that shared location. From what I have been told, these temp dirs come and go as long as the job runs successfully. What I am wondering is if there is any way to move that dir to the local node tmp directory, not the shared file system, while it is running, and if something fails then have it sent to the appropriate place. Or, if another layer of temp dir wrapping could be applied, labeled perhaps with respect to the clustered job grouping and not simply the individual jobs (since there are thousands being computed at once). That way these things would only be generated/deleted every 5 mins or 10 mins (if clustered properly on my part) instead of one event every millisecond or what have you. I don't know which solution is feasible or if any are at all, but this seems to be a major problem for my WFs. In general it is never good to have a million things coming and going on a shared file system in one place, from my experience at least. Thanks, Andrew From andrewj at uchicago.edu Fri Oct 26 14:58:46 2007 From: andrewj at uchicago.edu (Andrew Robert Jamieson) Date: Fri, 26 Oct 2007 14:58:46 -0500 (CDT) Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: References: Message-ID: I am kind of at a standstill for getting anything done on TP right now with this problem. Are there any suggestions to overcome this for the time being? On Fri, 26 Oct 2007, Andrew Robert Jamieson wrote: > Hello all, > > I am encountering the following problem on Teraport. I submit a clustered > swift WF which should amount to something on the order of 850x3 individual > jobs total. I have clustered the jobs because they are very fast (somewhere > around 20 sec to 1 min long). 
When I submit the WF on TP things start out > fantastic, I get 10s of output files in a matter of seconds and nodes would > start and finish clustered batches in a matter of minutes or less. However, > after waiting about 3-5 mins, when clustered jobs are begin to line up in the > queue and more start running at the same time, things start to slow down to a > trickle in terms of output. > > One thing I noticed is when I try a simply ls on TP in the swift temp running > directory where the temp job dirs are created and destroyed, it take a very > long time. And when it is done only five or so things are in the dir. (this > is the dir with "info kickstart shared status wrapper.log" in it). What I > think is happening is that TP's filesystem cant handle this extremely rapid > creation/destruction of directories in that shared location. From what I have > been told these temp dirs come and go as long as the job runs successfully. > > What I am wondering is if there is anyway to move that dir to the local node > tmp diretory not the shared file system, while it is running and if something > fails then have it sent to the appropriate place. > > Or, if another layer of temp dir wrapping could be applied with labeld > perhaps with respect to the clustered job grouping and not simply the > individual jobs (since there are thousands being computed at once). > That these things would only be generated/deleted every 5 mins or 10 mins (if > clustered properly on my part) instead of one event every milli second or > what have you. > > I don't know which solution is feasible or if any are at all, but this seems > to be a major problem for my WFs. In general it is never good to have a > million things coming and going on a shared file system in one place, from my > experience at least. 
> > > Thanks, > Andrew > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From hategan at mcs.anl.gov Fri Oct 26 15:04:34 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 26 Oct 2007 15:04:34 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: References: Message-ID: <1193429074.32607.2.camel@blabla.mcs.anl.gov> From my live discussion with Andrew, I think we concluded that the reasonable way of proceeding is to reduce things happening on the shared filesystem. That may mean: - making sure the temporary job directory is created on a local filesystem - making seq.sh log to individual files (perhaps in info), like the wrapper. This may reduce contention. Mihael On Fri, 2007-10-26 at 14:58 -0500, Andrew Robert Jamieson wrote: > I am kind of at a stand still for getting anything done on TP right now > with this problem. Are there any suggestions to overcome this for the time > being? > > On Fri, 26 Oct 2007, Andrew Robert Jamieson wrote: > > > Hello all, > > > > I am encountering the following problem on Teraport. I submit a clustered > > swift WF which should amount to something on the order of 850x3 individual > > jobs total. I have clustered the jobs because they are very fast (somewhere > > around 20 sec to 1 min long). When I submit the WF on TP things start out > > fantastic, I get 10s of output files in a matter of seconds and nodes would > > start and finish clustered batches in a matter of minutes or less. However, > > after waiting about 3-5 mins, when clustered jobs are begin to line up in the > > queue and more start running at the same time, things start to slow down to a > > trickle in terms of output. > > > > One thing I noticed is when I try a simply ls on TP in the swift temp running > > directory where the temp job dirs are created and destroyed, it take a very > > long time. 
And when it is done only five or so things are in the dir. (this > > is the dir with "info kickstart shared status wrapper.log" in it). What I > > think is happening is that TP's filesystem cant handle this extremely rapid > > creation/destruction of directories in that shared location. From what I have > > been told these temp dirs come and go as long as the job runs successfully. > > > > What I am wondering is if there is anyway to move that dir to the local node > > tmp diretory not the shared file system, while it is running and if something > > fails then have it sent to the appropriate place. > > > > Or, if another layer of temp dir wrapping could be applied with labeld > > perhaps with respect to the clustered job grouping and not simply the > > individual jobs (since there are thousands being computed at once). > > That these things would only be generated/deleted every 5 mins or 10 mins (if > > clustered properly on my part) instead of one event every milli second or > > what have you. > > > > I don't know which solution is feasible or if any are at all, but this seems > > to be a major problem for my WFs. In general it is never good to have a > > million things coming and going on a shared file system in one place, from my > > experience at least. 
> > > > > > Thanks, > > Andrew > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From iraicu at cs.uchicago.edu Fri Oct 26 15:11:35 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Fri, 26 Oct 2007 15:11:35 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: References: Message-ID: <472249F7.1010002@cs.uchicago.edu> I am not sure what configuration exists on TP, but on the TeraGrid ANL/UC cluster, with 8 servers behind GPFS, the wrapper script performance (create dir, create symbolic links, remove directory... all on GPFS) is anywhere between 20~40 / sec, depending on how many nodes you have doing this concurrently. The throughput increases at first as you add nodes, but then decreases down to about 20/sec with 20~30+ nodes. What this means is that even if you bundle jobs up, you will not get anything better than this, throughput-wise, regardless of how short the jobs are. Now, if TP has fewer than 8 servers, it's likely that the throughput it can sustain is even lower, and if you push it over the edge, you may even reach the point of thrashing, where the throughput can be extremely small. I don't have any suggestions for how you can get around this, with the exception of making your job sizes larger on average, and hence having fewer jobs over the same period of time. Ioan Andrew Robert Jamieson wrote: > I am kind of at a standstill for getting anything done on TP right > now with this problem. Are there any suggestions to overcome this for > the time being? > > On Fri, 26 Oct 2007, Andrew Robert Jamieson wrote: > >> Hello all, >> >> I am encountering the following problem on Teraport. 
I submit a >> clustered swift WF which should amount to something on the order of >> 850x3 individual jobs total. I have clustered the jobs because they >> are very fast (somewhere around 20 sec to 1 min long). When I submit >> the WF on TP things start out fantastic, I get 10s of output files in >> a matter of seconds and nodes would start and finish clustered >> batches in a matter of minutes or less. However, after waiting about >> 3-5 mins, when clustered jobs are begin to line up in the queue and >> more start running at the same time, things start to slow down to a >> trickle in terms of output. >> >> One thing I noticed is when I try a simply ls on TP in the swift temp >> running directory where the temp job dirs are created and destroyed, >> it take a very long time. And when it is done only five or so things >> are in the dir. (this is the dir with "info kickstart shared >> status wrapper.log" in it). What I think is happening is that TP's >> filesystem cant handle this extremely rapid creation/destruction of >> directories in that shared location. From what I have been told these >> temp dirs come and go as long as the job runs successfully. >> >> What I am wondering is if there is anyway to move that dir to the >> local node tmp diretory not the shared file system, while it is >> running and if something fails then have it sent to the appropriate >> place. >> >> Or, if another layer of temp dir wrapping could be applied with >> labeld perhaps with respect to the clustered job grouping and not >> simply the individual jobs (since there are thousands being computed >> at once). >> That these things would only be generated/deleted every 5 mins or 10 >> mins (if clustered properly on my part) instead of one event every >> milli second or what have you. >> >> I don't know which solution is feasible or if any are at all, but >> this seems to be a major problem for my WFs. 
In general it is never >> good to have a million things coming and going on a shared file >> system in one place, from my experience at least. >> >> >> Thanks, >> Andrew >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > -- ============================================ Ioan Raicu Ph.D. Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ From andrewj at uchicago.edu Fri Oct 26 16:05:32 2007 From: andrewj at uchicago.edu (Andrew Robert Jamieson) Date: Fri, 26 Oct 2007 16:05:32 -0500 (CDT) Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: <472249F7.1010002@cs.uchicago.edu> References: <472249F7.1010002@cs.uchicago.edu> Message-ID: Ioan, Thanks for the explanation. It seems like you characterized what is going on pretty well. One question I have is: does this occur only when everything is in the same directory, or anywhere in the shared GPFS at any given time? Furthermore, why can't the short-lived directory live on the local node's /tmp/* somewhere? I have wrapped all my programs to ensure that things are ONLY executed in local node directories, specifically to avoid this type of problem. Now Swift seems to be making that effort irrelevant. Does this seem reasonable? 
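Ioan's 20~40 wrapper-invocations/sec figure puts a hard floor on the total runtime no matter how the jobs are clustered. A rough back-of-envelope check, using only numbers quoted in this thread (the script itself is purely illustrative):

```shell
# Lower bound implied by Ioan's figures: ~2550 jobs (850x3), each needing
# one create/symlink/remove cycle on GPFS, at 20-40 cycles/sec total.
jobs=$((850 * 3))
for rate in 20 40; do
    floor=$((jobs / rate))   # seconds of shared-FS metadata work, serialized
    echo "at ${rate}/sec the metadata work alone takes >= ${floor}s"
done
```

So even with infinitely fast application code, the shared filesystem caps this workflow at roughly one to two minutes of pure metadata traffic, and contention or thrashing pushes it well beyond that.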
Thanks, Andrew On Fri, 26 Oct 2007, Ioan Raicu wrote: > I am not sure what configuration exists on TP, but on the TeraGrid ANL/UC > cluster, with 8 servers behind GPFS, the wrapper script performance (create > dir, create symbolic links, remove directory... all on GPFS) is anywhere > between 20~40 / sec, depending on how many nodes you have doing this > concurrently. The throughput increases first as you add nodes, but then > decreases down to about 20/sec with 20~30+ nodes. What this means is that > even if you bundle jobs up, you will not get anything better than this, > throughput wise, regardless of how short the jobs are. Now, if TP has less > than 8 servers, its likely that the throughput it can sustain is even lower, > and if you push it over the edge, even to the point of thrashing where the > throughput can be extremely small. I don't have any suggestions of how you > can get around this, with the exception of making your job sizes larger on > average, and hence have fewer jobs over the same period of time. > > Ioan > > Andrew Robert Jamieson wrote: >> I am kind of at a stand still for getting anything done on TP right now >> with this problem. Are there any suggestions to overcome this for the time >> being? >> >> On Fri, 26 Oct 2007, Andrew Robert Jamieson wrote: >> >>> Hello all, >>> >>> I am encountering the following problem on Teraport. I submit a >>> clustered swift WF which should amount to something on the order of 850x3 >>> individual jobs total. I have clustered the jobs because they are very >>> fast (somewhere around 20 sec to 1 min long). When I submit the WF on TP >>> things start out fantastic, I get 10s of output files in a matter of >>> seconds and nodes would start and finish clustered batches in a matter of >>> minutes or less. However, after waiting about 3-5 mins, when clustered >>> jobs are begin to line up in the queue and more start running at the same >>> time, things start to slow down to a trickle in terms of output. 
>>> >>> One thing I noticed is when I try a simply ls on TP in the swift temp >>> running directory where the temp job dirs are created and destroyed, it >>> take a very long time. And when it is done only five or so things are in >>> the dir. (this is the dir with "info kickstart shared status >>> wrapper.log" in it). What I think is happening is that TP's filesystem >>> cant handle this extremely rapid creation/destruction of directories in >>> that shared location. From what I have been told these temp dirs come and >>> go as long as the job runs successfully. >>> >>> What I am wondering is if there is anyway to move that dir to the local >>> node tmp diretory not the shared file system, while it is running and if >>> something fails then have it sent to the appropriate place. >>> >>> Or, if another layer of temp dir wrapping could be applied with labeld >>> perhaps with respect to the clustered job grouping and not simply the >>> individual jobs (since there are thousands being computed at once). >>> That these things would only be generated/deleted every 5 mins or 10 mins >>> (if clustered properly on my part) instead of one event every milli second >>> or what have you. >>> >>> I don't know which solution is feasible or if any are at all, but this >>> seems to be a major problem for my WFs. In general it is never good to >>> have a million things coming and going on a shared file system in one >>> place, from my experience at least. >>> >>> >>> Thanks, >>> Andrew >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > > -- > ============================================ > Ioan Raicu > Ph.D. 
Student > ============================================ > Distributed Systems Laboratory > Computer Science Department > University of Chicago > 1100 E. 58th Street, Ryerson Hall > Chicago, IL 60637 > ============================================ > Email: iraicu at cs.uchicago.edu > Web: http://www.cs.uchicago.edu/~iraicu > http://dsl.cs.uchicago.edu/ > ============================================ > ============================================ > > From nefedova at mcs.anl.gov Fri Oct 26 16:17:01 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Fri, 26 Oct 2007 16:17:01 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: References: <472249F7.1010002@cs.uchicago.edu> Message-ID: <5ED73E03-28EF-4F14-8376-5EE8AC29AAA3@mcs.anl.gov> Andrew, I am not sure if I understand you correctly. If you want all your working directories to be on a local disk, why don't you specify that local directory in your sites.xml file as the 'workdirectory'? All temp dirs will be relative to that workdirectory from the sites.xml file. Nika On Oct 26, 2007, at 4:05 PM, Andrew Robert Jamieson wrote: > Ioan, > > Thanks for the explanation. It seems like you characterized > what is going on pretty well. > > One question I have is, does this case occur only for situations in > which it is in the same directory or is it anywhere at any given > time in the shared GPFS? > > Furthermore, why can't the short-lived directory live on the local > node's /tmp/* somewhere? I have wrapped all my programs to ensure > that things are ONLY executed on the local node directories to > specifically avoid this type of problem. Now Swift is making that > effort irrelevant it seems. > > Does this seem reasonable? 
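Nika's suggestion would amount to something like the following sites.xml entry. Treat this strictly as a sketch: the handle, provider, and path are placeholders, and the exact element names vary with the Swift version in use:

```xml
<pool handle="teraport">
  <!-- hypothetical site entry; not Andrew's actual configuration -->
  <gridftp url="local://localhost"/>
  <execution provider="pbs" url="none"/>
  <!-- node-local scratch instead of a shared GPFS path -->
  <workdirectory>/tmp/swiftwork</workdirectory>
</pool>
```

As Ioan's reply points out, this alone may not be enough: moving the whole workdirectory off GPFS also moves the shared staging area, so input files would then need to be copied to each node rather than symlinked.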
> > Thanks, > Andrew > > On Fri, 26 Oct 2007, Ioan Raicu wrote: > >> I am not sure what configuration exists on TP, but on the TeraGrid >> ANL/UC cluster, with 8 servers behind GPFS, the wrapper script >> performance (create dir, create symbolic links, remove >> directory... all on GPFS) is anywhere between 20~40 / sec, >> depending on how many nodes you have doing this concurrently. The >> throughput increases first as you add nodes, but then decreases >> down to about 20/sec with 20~30+ nodes. What this means is that >> even if you bundle jobs up, you will not get anything better than >> this, throughput wise, regardless of how short the jobs are. Now, >> if TP has less than 8 servers, its likely that the throughput it >> can sustain is even lower, and if you push it over the edge, even >> to the point of thrashing where the throughput can be extremely >> small. I don't have any suggestions of how you can get around >> this, with the exception of making your job sizes larger on >> average, and hence have fewer jobs over the same period of time. >> >> Ioan >> >> Andrew Robert Jamieson wrote: >>> I am kind of at a stand still for getting anything done on TP >>> right now with this problem. Are there any suggestions to >>> overcome this for the time being? >>> On Fri, 26 Oct 2007, Andrew Robert Jamieson wrote: >>>> Hello all, >>>> I am encountering the following problem on Teraport. I submit >>>> a clustered swift WF which should amount to something on the >>>> order of 850x3 individual jobs total. I have clustered the jobs >>>> because they are very fast (somewhere around 20 sec to 1 min >>>> long). When I submit the WF on TP things start out fantastic, I >>>> get 10s of output files in a matter of seconds and nodes would >>>> start and finish clustered batches in a matter of minutes or >>>> less. 
However, after waiting about 3-5 mins, when clustered jobs >>>> are begin to line up in the queue and more start running at the >>>> same time, things start to slow down to a trickle in terms of >>>> output. >>>> One thing I noticed is when I try a simply ls on TP in the swift >>>> temp running directory where the temp job dirs are created and >>>> destroyed, it take a very long time. And when it is done only >>>> five or so things are in the dir. (this is the dir with "info >>>> kickstart shared status wrapper.log" in it). What I think is >>>> happening is that TP's filesystem cant handle this extremely >>>> rapid creation/destruction of directories in that shared >>>> location. From what I have been told these temp dirs come and go >>>> as long as the job runs successfully. >>>> What I am wondering is if there is anyway to move that dir to >>>> the local node tmp diretory not the shared file system, while it >>>> is running and if something fails then have it sent to the >>>> appropriate place. >>>> Or, if another layer of temp dir wrapping could be applied with >>>> labeld perhaps with respect to the clustered job grouping and >>>> not simply the individual jobs (since there are thousands being >>>> computed at once). >>>> That these things would only be generated/deleted every 5 mins >>>> or 10 mins (if clustered properly on my part) instead of one >>>> event every milli second or what have you. >>>> I don't know which solution is feasible or if any are at all, >>>> but this seems to be a major problem for my WFs. In general it >>>> is never good to have a million things coming and going on a >>>> shared file system in one place, from my experience at least. 
>>>> Thanks, >>>> Andrew >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> >> -- >> ============================================ >> Ioan Raicu >> Ph.D. Student >> ============================================ >> Distributed Systems Laboratory >> Computer Science Department >> University of Chicago >> 1100 E. 58th Street, Ryerson Hall >> Chicago, IL 60637 >> ============================================ >> Email: iraicu at cs.uchicago.edu >> Web: http://www.cs.uchicago.edu/~iraicu >> http://dsl.cs.uchicago.edu/ >> ============================================ >> ============================================ >> >> > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From iraicu at cs.uchicago.edu Fri Oct 26 16:20:29 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Fri, 26 Oct 2007 16:20:29 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: <5ED73E03-28EF-4F14-8376-5EE8AC29AAA3@mcs.anl.gov> References: <472249F7.1010002@cs.uchicago.edu> <5ED73E03-28EF-4F14-8376-5EE8AC29AAA3@mcs.anl.gov> Message-ID: <47225A1D.6080808@cs.uchicago.edu> Nika, Can it really be that simple? How does the data then move from the local disk scratch directory to the shared directory on GPFS? At the very least, you'd have to modify the wrapper script to not do symbolic linking, but actually to copy the input data to the local disk temporary scratch directory. Ioan Veronika Nefedova wrote: > Andrew, > > I am not sure if I understand you correctly. 
If you want to have all > your working directories to be on a local disk, why don't you specify > that local directory in you sites.xml file as a 'workddirectory'? All > temp dirs will be relative to that workdirectory from the sites.xml file. > > Nika > > On Oct 26, 2007, at 4:05 PM, Andrew Robert Jamieson wrote: > >> Ioan, >> >> Thanks for the explaination. It seems like you characterized what >> is going on pretty well. >> >> One question I have is, does this case occur only for situations in >> which it is in the same directory or is it anywhere at any given time >> in the shared GPFS? >> >> Furthermore, why can't the short lived directory live on the local >> node's /tmp/* somewhere? I have wrapped all my programs to ensure >> that things are ONLY executed on the local node directories to >> specifically aviod this type of problem. Now Swift is making that >> effort irrelevant it seems. >> >> Does this seem reasonable? >> >> Thanks, >> Andrew >> >> On Fri, 26 Oct 2007, Ioan Raicu wrote: >> >>> I am not sure what configuration exists on TP, but on the TeraGrid >>> ANL/UC cluster, with 8 servers behind GPFS, the wrapper script >>> performance (create dir, create symbolic links, remove directory... >>> all on GPFS) is anywhere between 20~40 / sec, depending on how many >>> nodes you have doing this concurrently. The throughput increases >>> first as you add nodes, but then decreases down to about 20/sec with >>> 20~30+ nodes. What this means is that even if you bundle jobs up, >>> you will not get anything better than this, throughput wise, >>> regardless of how short the jobs are. Now, if TP has less than 8 >>> servers, its likely that the throughput it can sustain is even >>> lower, and if you push it over the edge, even to the point of >>> thrashing where the throughput can be extremely small. 
I don't >>> have any suggestions of how you can get around this, with the >>> exception of making your job sizes larger on average, and hence have >>> fewer jobs over the same period of time. >>> >>> Ioan >>> >>> Andrew Robert Jamieson wrote: >>>> I am kind of at a stand still for getting anything done on TP right >>>> now with this problem. Are there any suggestions to overcome this >>>> for the time being? >>>> On Fri, 26 Oct 2007, Andrew Robert Jamieson wrote: >>>>> Hello all, >>>>> I am encountering the following problem on Teraport. I submit a >>>>> clustered swift WF which should amount to something on the order >>>>> of 850x3 individual jobs total. I have clustered the jobs because >>>>> they are very fast (somewhere around 20 sec to 1 min long). When >>>>> I submit the WF on TP things start out fantastic, I get 10s of >>>>> output files in a matter of seconds and nodes would start and >>>>> finish clustered batches in a matter of minutes or less. However, >>>>> after waiting about 3-5 mins, when clustered jobs are begin to >>>>> line up in the queue and more start running at the same time, >>>>> things start to slow down to a trickle in terms of output. >>>>> One thing I noticed is when I try a simply ls on TP in the swift >>>>> temp running directory where the temp job dirs are created and >>>>> destroyed, it take a very long time. And when it is done only >>>>> five or so things are in the dir. (this is the dir with "info >>>>> kickstart shared status wrapper.log" in it). What I think is >>>>> happening is that TP's filesystem cant handle this extremely rapid >>>>> creation/destruction of directories in that shared location. From >>>>> what I have been told these temp dirs come and go as long as the >>>>> job runs successfully. >>>>> What I am wondering is if there is anyway to move that dir to the >>>>> local node tmp diretory not the shared file system, while it is >>>>> running and if something fails then have it sent to the >>>>> appropriate place. 
>>>>> Or, if another layer of temp dir wrapping could be applied with >>>>> labeld perhaps with respect to the clustered job grouping and not >>>>> simply the individual jobs (since there are thousands being >>>>> computed at once). >>>>> That these things would only be generated/deleted every 5 mins or >>>>> 10 mins (if clustered properly on my part) instead of one event >>>>> every milli second or what have you. >>>>> I don't know which solution is feasible or if any are at all, but >>>>> this seems to be a major problem for my WFs. In general it is >>>>> never good to have a million things coming and going on a shared >>>>> file system in one place, from my experience at least. >>>>> Thanks, >>>>> Andrew >>>>> _______________________________________________ >>>>> Swift-devel mailing list >>>>> Swift-devel at ci.uchicago.edu >>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> >>> -- >>> ============================================ >>> Ioan Raicu >>> Ph.D. Student >>> ============================================ >>> Distributed Systems Laboratory >>> Computer Science Department >>> University of Chicago >>> 1100 E. 58th Street, Ryerson Hall >>> Chicago, IL 60637 >>> ============================================ >>> Email: iraicu at cs.uchicago.edu >>> Web: http://www.cs.uchicago.edu/~iraicu >>> http://dsl.cs.uchicago.edu/ >>> ============================================ >>> ============================================ >>> >>> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > > -- ============================================ Ioan Raicu Ph.D. 
Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ From iraicu at cs.uchicago.edu Fri Oct 26 16:23:12 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Fri, 26 Oct 2007 16:23:12 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: References: <472249F7.1010002@cs.uchicago.edu> Message-ID: <47225AC0.2000304@cs.uchicago.edu> Hi, Andrew Robert Jamieson wrote: > Ioan, > > Thanks for the explanation. It seems like you characterized what > is going on pretty well. > > One question I have is, does this case occur only for situations in > which it is in the same directory or is it anywhere at any given time > in the shared GPFS? > I don't know, but as far as I can tell, Swift will create these temp scratch directories per job in the same subdirectory (Mihael or Ben, please correct me if I am wrong on this). I have seen this behavior for certain in this case, but am not sure whether things get better if you were to work in completely separate parts of the filesystem. > Furthermore, why can't the short-lived directory live on the local > node's /tmp/* somewhere? I have wrapped all my programs to ensure > that things are ONLY executed on the local node directories to > specifically avoid this type of problem. Now Swift is making that > effort irrelevant it seems. They could, with some modifications to the wrapper script. Or with some higher-level logic that manages the data on the local disk and moves it in and out of the shared file system. Your short-term solution would probably be the first option, changing the wrapper script to support local disk usage. 
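That first option, staging each job to node-local disk instead of creating and symlinking a per-job directory on GPFS, might be sketched roughly as follows. The function and variable names are hypothetical, not Swift's actual wrapper code:

```shell
#!/bin/bash
# Hypothetical sketch of the local-disk approach discussed above: copy
# inputs to node-local scratch, run there, then move results back to the
# shared filesystem in one bulk copy. Names/paths are illustrative only.

SHARED=${SHARED:-/gpfs/swiftwork/shared}   # shared site workdirectory
SCRATCH=${SCRATCH:-/tmp/$USER/swift}       # node-local scratch

stage_in() {                # stage_in <jobid> <input files...>
    local jobdir="$SCRATCH/$1"; shift
    mkdir -p "$jobdir"                     # one mkdir on local disk, not GPFS
    for f in "$@"; do cp "$SHARED/$f" "$jobdir/"; done
    echo "$jobdir"                         # caller cd's here and runs the app
}

stage_out() {               # stage_out <jobdir> <shared dest dir>
    cp "$1"/* "$2"/ && rm -rf "$1"         # one bulk copy back, then cleanup
}
```

Compared with per-job symlink/unlink churn on GPFS, this trades a little extra data movement for far fewer shared-filesystem metadata operations per job.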
Maybe there are other solutions as well. Ioan > > Does this seem reasonable? > > Thanks, > Andrew > > On Fri, 26 Oct 2007, Ioan Raicu wrote: > >> I am not sure what configuration exists on TP, but on the TeraGrid >> ANL/UC cluster, with 8 servers behind GPFS, the wrapper script >> performance (create dir, create symbolic links, remove directory... >> all on GPFS) is anywhere between 20~40 / sec, depending on how many >> nodes you have doing this concurrently. The throughput increases >> first as you add nodes, but then decreases down to about 20/sec with >> 20~30+ nodes. What this means is that even if you bundle jobs up, >> you will not get anything better than this, throughput wise, >> regardless of how short the jobs are. Now, if TP has less than 8 >> servers, its likely that the throughput it can sustain is even lower, >> and if you push it over the edge, even to the point of thrashing >> where the throughput can be extremely small. I don't have any >> suggestions of how you can get around this, with the exception of >> making your job sizes larger on average, and hence have fewer jobs >> over the same period of time. >> >> Ioan >> >> Andrew Robert Jamieson wrote: >>> I am kind of at a stand still for getting anything done on TP right >>> now with this problem. Are there any suggestions to overcome this >>> for the time being? >>> >>> On Fri, 26 Oct 2007, Andrew Robert Jamieson wrote: >>> >>>> Hello all, >>>> >>>> I am encountering the following problem on Teraport. I submit a >>>> clustered swift WF which should amount to something on the order of >>>> 850x3 individual jobs total. I have clustered the jobs because they >>>> are very fast (somewhere around 20 sec to 1 min long). When I >>>> submit the WF on TP things start out fantastic, I get 10s of output >>>> files in a matter of seconds and nodes would start and finish >>>> clustered batches in a matter of minutes or less. 
However, after >>>> waiting about 3-5 mins, when clustered jobs are begin to line up in >>>> the queue and more start running at the same time, things start to >>>> slow down to a trickle in terms of output. >>>> >>>> One thing I noticed is when I try a simply ls on TP in the swift >>>> temp running directory where the temp job dirs are created and >>>> destroyed, it take a very long time. And when it is done only five >>>> or so things are in the dir. (this is the dir with "info >>>> kickstart shared status wrapper.log" in it). What I think is >>>> happening is that TP's filesystem cant handle this extremely rapid >>>> creation/destruction of directories in that shared location. From >>>> what I have been told these temp dirs come and go as long as the >>>> job runs successfully. >>>> >>>> What I am wondering is if there is anyway to move that dir to the >>>> local node tmp diretory not the shared file system, while it is >>>> running and if something fails then have it sent to the appropriate >>>> place. >>>> >>>> Or, if another layer of temp dir wrapping could be applied with >>>> labeld perhaps with respect to the clustered job grouping and not >>>> simply the individual jobs (since there are thousands being >>>> computed at once). >>>> That these things would only be generated/deleted every 5 mins or >>>> 10 mins (if clustered properly on my part) instead of one event >>>> every milli second or what have you. >>>> >>>> I don't know which solution is feasible or if any are at all, but >>>> this seems to be a major problem for my WFs. In general it is >>>> never good to have a million things coming and going on a shared >>>> file system in one place, from my experience at least. 
>>>> >>>> >>>> Thanks, >>>> Andrew >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> >> >> -- >> ============================================ >> Ioan Raicu >> Ph.D. Student >> ============================================ >> Distributed Systems Laboratory >> Computer Science Department >> University of Chicago >> 1100 E. 58th Street, Ryerson Hall >> Chicago, IL 60637 >> ============================================ >> Email: iraicu at cs.uchicago.edu >> Web: http://www.cs.uchicago.edu/~iraicu >> http://dsl.cs.uchicago.edu/ >> ============================================ >> ============================================ >> >> > -- ============================================ Ioan Raicu Ph.D. Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ From hategan at mcs.anl.gov Fri Oct 26 16:29:45 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 26 Oct 2007 16:29:45 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: <47225A1D.6080808@cs.uchicago.edu> References: <472249F7.1010002@cs.uchicago.edu> <5ED73E03-28EF-4F14-8376-5EE8AC29AAA3@mcs.anl.gov> <47225A1D.6080808@cs.uchicago.edu> Message-ID: <1193434185.32607.17.camel@blabla.mcs.anl.gov> On Fri, 2007-10-26 at 16:20 -0500, Ioan Raicu wrote: > Nika, > Can it really be that simple? 
How does the data then move from the > local disk scratch directory to the shared directory on GPFS? At the > very least, you'd have to modify the wrapper script to not do symbolic > linking, but actually to copy the input data to the local disk temporary > scratch directory. Yes, you would. > > Ioan > > Veronika Nefedova wrote: > > Andrew, > > > > I am not sure if I understand you correctly. If you want to have all > > your working directories to be on a local disk, why don't you specify > > that local directory in you sites.xml file as a 'workddirectory'? All > > temp dirs will be relative to that workdirectory from the sites.xml file. > > > > Nika > > > > On Oct 26, 2007, at 4:05 PM, Andrew Robert Jamieson wrote: > > > >> Ioan, > >> > >> Thanks for the explaination. It seems like you characterized what > >> is going on pretty well. > >> > >> One question I have is, does this case occur only for situations in > >> which it is in the same directory or is it anywhere at any given time > >> in the shared GPFS? > >> > >> Furthermore, why can't the short lived directory live on the local > >> node's /tmp/* somewhere? I have wrapped all my programs to ensure > >> that things are ONLY executed on the local node directories to > >> specifically aviod this type of problem. Now Swift is making that > >> effort irrelevant it seems. > >> > >> Does this seem reasonable? > >> > >> Thanks, > >> Andrew > >> > >> On Fri, 26 Oct 2007, Ioan Raicu wrote: > >> > >>> I am not sure what configuration exists on TP, but on the TeraGrid > >>> ANL/UC cluster, with 8 servers behind GPFS, the wrapper script > >>> performance (create dir, create symbolic links, remove directory... > >>> all on GPFS) is anywhere between 20~40 / sec, depending on how many > >>> nodes you have doing this concurrently. The throughput increases > >>> first as you add nodes, but then decreases down to about 20/sec with > >>> 20~30+ nodes. 
What this means is that even if you bundle jobs up, > >>> you will not get anything better than this, throughput wise, > >>> regardless of how short the jobs are. Now, if TP has less than 8 > >>> servers, its likely that the throughput it can sustain is even > >>> lower, and if you push it over the edge, even to the point of > >>> thrashing where the throughput can be extremely small. I don't > >>> have any suggestions of how you can get around this, with the > >>> exception of making your job sizes larger on average, and hence have > >>> fewer jobs over the same period of time. > >>> > >>> Ioan > >>> > >>> Andrew Robert Jamieson wrote: > >>>> I am kind of at a stand still for getting anything done on TP right > >>>> now with this problem. Are there any suggestions to overcome this > >>>> for the time being? > >>>> On Fri, 26 Oct 2007, Andrew Robert Jamieson wrote: > >>>>> Hello all, > >>>>> I am encountering the following problem on Teraport. I submit a > >>>>> clustered swift WF which should amount to something on the order > >>>>> of 850x3 individual jobs total. I have clustered the jobs because > >>>>> they are very fast (somewhere around 20 sec to 1 min long). When > >>>>> I submit the WF on TP things start out fantastic, I get 10s of > >>>>> output files in a matter of seconds and nodes would start and > >>>>> finish clustered batches in a matter of minutes or less. However, > >>>>> after waiting about 3-5 mins, when clustered jobs are begin to > >>>>> line up in the queue and more start running at the same time, > >>>>> things start to slow down to a trickle in terms of output. > >>>>> One thing I noticed is when I try a simply ls on TP in the swift > >>>>> temp running directory where the temp job dirs are created and > >>>>> destroyed, it take a very long time. And when it is done only > >>>>> five or so things are in the dir. (this is the dir with "info > >>>>> kickstart shared status wrapper.log" in it). 
What I think is > >>>>> happening is that TP's filesystem cant handle this extremely rapid > >>>>> creation/destruction of directories in that shared location. From > >>>>> what I have been told these temp dirs come and go as long as the > >>>>> job runs successfully. > >>>>> What I am wondering is if there is anyway to move that dir to the > >>>>> local node tmp diretory not the shared file system, while it is > >>>>> running and if something fails then have it sent to the > >>>>> appropriate place. > >>>>> Or, if another layer of temp dir wrapping could be applied with > >>>>> labeld perhaps with respect to the clustered job grouping and not > >>>>> simply the individual jobs (since there are thousands being > >>>>> computed at once). > >>>>> That these things would only be generated/deleted every 5 mins or > >>>>> 10 mins (if clustered properly on my part) instead of one event > >>>>> every milli second or what have you. > >>>>> I don't know which solution is feasible or if any are at all, but > >>>>> this seems to be a major problem for my WFs. In general it is > >>>>> never good to have a million things coming and going on a shared > >>>>> file system in one place, from my experience at least. > >>>>> Thanks, > >>>>> Andrew > >>>>> _______________________________________________ > >>>>> Swift-devel mailing list > >>>>> Swift-devel at ci.uchicago.edu > >>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>>> _______________________________________________ > >>>> Swift-devel mailing list > >>>> Swift-devel at ci.uchicago.edu > >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>> > >>> -- > >>> ============================================ > >>> Ioan Raicu > >>> Ph.D. Student > >>> ============================================ > >>> Distributed Systems Laboratory > >>> Computer Science Department > >>> University of Chicago > >>> 1100 E. 
58th Street, Ryerson Hall > >>> Chicago, IL 60637 > >>> ============================================ > >>> Email: iraicu at cs.uchicago.edu > >>> Web: http://www.cs.uchicago.edu/~iraicu > >>> http://dsl.cs.uchicago.edu/ > >>> ============================================ > >>> ============================================ > >>> > >>> > >> _______________________________________________ > >> Swift-devel mailing list > >> Swift-devel at ci.uchicago.edu > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >> > > > > > From hategan at mcs.anl.gov Fri Oct 26 16:31:30 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 26 Oct 2007 16:31:30 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: <47225AC0.2000304@cs.uchicago.edu> References: <472249F7.1010002@cs.uchicago.edu> <47225AC0.2000304@cs.uchicago.edu> Message-ID: <1193434291.32607.20.camel@blabla.mcs.anl.gov> On Fri, 2007-10-26 at 16:23 -0500, Ioan Raicu wrote: > Hi, > > Andrew Robert Jamieson wrote: > > Ioan, > > > > Thanks for the explaination. It seems like you characterized what > > is going on pretty well. > > > > One question I have is, does this case occur only for situations in > > which it is in the same directory or is it anywhere at any given time > > in the shared GPFS? > > > I don't know, but as far as I can tell, Swift will create these temp > scratch directories per job in the same subdirectory (Mihael or Ben, > please correct me if I am wrong on this). I have seen this behavior for > certain in this case, but am not sure if things get better if you were > to work in completely separate parts of the filesystem. > > Furthermore, why can't the short lived directory live on the local > > node's /tmp/* somewhere? I have wrapped all my programs to ensure > > that things are ONLY executed on the local node directories to > > specifically aviod this type of problem. Now Swift is making that > > effort irrelevant it seems. Right. 
And Swift has an inefficient implementation there which needs to be fixed. > They could, with some modifications to the wrapper script. Or with some > higher level logic that manages the data on the local disk and moves it > in and out from and to the shared file system. Your short term > solution would probably be the first option, changing the wrapper script > to support local disk usage. Maybe there are other solutions as well. > > Ioan > > > > Does this seem reasonable? > > > > Thanks, > > Andrew > > > > On Fri, 26 Oct 2007, Ioan Raicu wrote: > > > >> I am not sure what configuration exists on TP, but on the TeraGrid > >> ANL/UC cluster, with 8 servers behind GPFS, the wrapper script > >> performance (create dir, create symbolic links, remove directory... > >> all on GPFS) is anywhere between 20~40 / sec, depending on how many > >> nodes you have doing this concurrently. The throughput increases > >> first as you add nodes, but then decreases down to about 20/sec with > >> 20~30+ nodes. What this means is that even if you bundle jobs up, > >> you will not get anything better than this, throughput wise, > >> regardless of how short the jobs are. Now, if TP has less than 8 > >> servers, its likely that the throughput it can sustain is even lower, > >> and if you push it over the edge, even to the point of thrashing > >> where the throughput can be extremely small. I don't have any > >> suggestions of how you can get around this, with the exception of > >> making your job sizes larger on average, and hence have fewer jobs > >> over the same period of time. > >> > >> Ioan > >> > >> Andrew Robert Jamieson wrote: > >>> I am kind of at a stand still for getting anything done on TP right > >>> now with this problem. Are there any suggestions to overcome this > >>> for the time being? > >>> > >>> On Fri, 26 Oct 2007, Andrew Robert Jamieson wrote: > >>> > >>>> Hello all, > >>>> > >>>> I am encountering the following problem on Teraport. 
I submit a > >>>> clustered swift WF which should amount to something on the order of > >>>> 850x3 individual jobs total. I have clustered the jobs because they > >>>> are very fast (somewhere around 20 sec to 1 min long). When I > >>>> submit the WF on TP things start out fantastic, I get 10s of output > >>>> files in a matter of seconds and nodes would start and finish > >>>> clustered batches in a matter of minutes or less. However, after > >>>> waiting about 3-5 mins, when clustered jobs are begin to line up in > >>>> the queue and more start running at the same time, things start to > >>>> slow down to a trickle in terms of output. > >>>> > >>>> One thing I noticed is when I try a simply ls on TP in the swift > >>>> temp running directory where the temp job dirs are created and > >>>> destroyed, it take a very long time. And when it is done only five > >>>> or so things are in the dir. (this is the dir with "info > >>>> kickstart shared status wrapper.log" in it). What I think is > >>>> happening is that TP's filesystem cant handle this extremely rapid > >>>> creation/destruction of directories in that shared location. From > >>>> what I have been told these temp dirs come and go as long as the > >>>> job runs successfully. > >>>> > >>>> What I am wondering is if there is anyway to move that dir to the > >>>> local node tmp diretory not the shared file system, while it is > >>>> running and if something fails then have it sent to the appropriate > >>>> place. > >>>> > >>>> Or, if another layer of temp dir wrapping could be applied with > >>>> labeld perhaps with respect to the clustered job grouping and not > >>>> simply the individual jobs (since there are thousands being > >>>> computed at once). > >>>> That these things would only be generated/deleted every 5 mins or > >>>> 10 mins (if clustered properly on my part) instead of one event > >>>> every milli second or what have you. 
> >>>> > >>>> I don't know which solution is feasible or if any are at all, but > >>>> this seems to be a major problem for my WFs. In general it is > >>>> never good to have a million things coming and going on a shared > >>>> file system in one place, from my experience at least. > >>>> > >>>> > >>>> Thanks, > >>>> Andrew > >>>> _______________________________________________ > >>>> Swift-devel mailing list > >>>> Swift-devel at ci.uchicago.edu > >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>>> > >>> _______________________________________________ > >>> Swift-devel mailing list > >>> Swift-devel at ci.uchicago.edu > >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>> > >> > >> -- > >> ============================================ > >> Ioan Raicu > >> Ph.D. Student > >> ============================================ > >> Distributed Systems Laboratory > >> Computer Science Department > >> University of Chicago > >> 1100 E. 58th Street, Ryerson Hall > >> Chicago, IL 60637 > >> ============================================ > >> Email: iraicu at cs.uchicago.edu > >> Web: http://www.cs.uchicago.edu/~iraicu > >> http://dsl.cs.uchicago.edu/ > >> ============================================ > >> ============================================ > >> > >> > > > From hategan at mcs.anl.gov Fri Oct 26 18:39:45 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 26 Oct 2007 18:39:45 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: <472249F7.1010002@cs.uchicago.edu> References: <472249F7.1010002@cs.uchicago.edu> Message-ID: <1193441985.9302.1.camel@blabla.mcs.anl.gov> On Fri, 2007-10-26 at 15:11 -0500, Ioan Raicu wrote: > I am not sure what configuration exists on TP, but on the TeraGrid > ANL/UC cluster, with 8 servers behind GPFS, the wrapper script > performance (create dir, create symbolic links, remove directory... 
all > on GPFS) is anywhere between 20~40 / sec, depending on how many nodes > you have doing this concurrently. The throughput increases first as you > add nodes, but then decreases down to about 20/sec with 20~30+ nodes. > What this means is that even if you bundle jobs up, you will not get > anything better than this, throughput wise, regardless of how short the > jobs are. Now, if TP has less than 8 servers, its likely that the > throughput it can sustain is even lower, Perhaps in terms of bytes/s. But I wouldn't be so sure that this applies to other file stuff. > and if you push it over the > edge, even to the point of thrashing where the throughput can be > extremely small. I don't have any suggestions of how you can get > around this, with the exception of making your job sizes larger on > average, and hence have fewer jobs over the same period of time. > > Ioan > > Andrew Robert Jamieson wrote: > > I am kind of at a stand still for getting anything done on TP right > > now with this problem. Are there any suggestions to overcome this for > > the time being? > > > > On Fri, 26 Oct 2007, Andrew Robert Jamieson wrote: > > > >> Hello all, > >> > >> I am encountering the following problem on Teraport. I submit a > >> clustered swift WF which should amount to something on the order of > >> 850x3 individual jobs total. I have clustered the jobs because they > >> are very fast (somewhere around 20 sec to 1 min long). When I submit > >> the WF on TP things start out fantastic, I get 10s of output files in > >> a matter of seconds and nodes would start and finish clustered > >> batches in a matter of minutes or less. However, after waiting about > >> 3-5 mins, when clustered jobs are begin to line up in the queue and > >> more start running at the same time, things start to slow down to a > >> trickle in terms of output. 
> >> > >> One thing I noticed is when I try a simply ls on TP in the swift temp > >> running directory where the temp job dirs are created and destroyed, > >> it take a very long time. And when it is done only five or so things > >> are in the dir. (this is the dir with "info kickstart shared > >> status wrapper.log" in it). What I think is happening is that TP's > >> filesystem cant handle this extremely rapid creation/destruction of > >> directories in that shared location. From what I have been told these > >> temp dirs come and go as long as the job runs successfully. > >> > >> What I am wondering is if there is anyway to move that dir to the > >> local node tmp diretory not the shared file system, while it is > >> running and if something fails then have it sent to the appropriate > >> place. > >> > >> Or, if another layer of temp dir wrapping could be applied with > >> labeld perhaps with respect to the clustered job grouping and not > >> simply the individual jobs (since there are thousands being computed > >> at once). > >> That these things would only be generated/deleted every 5 mins or 10 > >> mins (if clustered properly on my part) instead of one event every > >> milli second or what have you. > >> > >> I don't know which solution is feasible or if any are at all, but > >> this seems to be a major problem for my WFs. In general it is never > >> good to have a million things coming and going on a shared file > >> system in one place, from my experience at least. > >> > >> > >> Thanks, > >> Andrew > >> _______________________________________________ > >> Swift-devel mailing list > >> Swift-devel at ci.uchicago.edu > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >> > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > -- > ============================================ > Ioan Raicu > Ph.D. 
Student > ============================================ > Distributed Systems Laboratory > Computer Science Department > University of Chicago > 1100 E. 58th Street, Ryerson Hall > Chicago, IL 60637 > ============================================ > Email: iraicu at cs.uchicago.edu > Web: http://www.cs.uchicago.edu/~iraicu > http://dsl.cs.uchicago.edu/ > ============================================ > ============================================ > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From benc at hawaga.org.uk Fri Oct 26 19:41:37 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Sat, 27 Oct 2007 00:41:37 +0000 (GMT) Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: References: Message-ID: On Fri, 26 Oct 2007, Andrew Robert Jamieson wrote: > I am kind of at a stand still for getting anything done on TP right now with > this problem. Are there any suggestions to overcome this for the time being? Run with lazy.errors=true in swift.properties. Come up with more compelling evidence (at least compelling to me) that there is actually a problem. -- From benc at hawaga.org.uk Fri Oct 26 19:37:51 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Sat, 27 Oct 2007 00:37:51 +0000 (GMT) Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: References: Message-ID: The most recent run logs I've seen of this show that things were progressing with a small number of job failures; however, one job failed three times (as happens sometimes, perhaps indicative of a problem with that job, perhaps statistically/stochastically because you have a lot of jobs and the execute hosts aren't perfect), and because of those three failures the workflow was aborted. I discussed with you on IM the possibility of running with lazy.errors=true, which will cause the workflow to run for longer in the case of such a problem.
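For concreteness, the setting mentioned above is an entry in the swift.properties configuration file. A minimal sketch follows; the property name comes from the discussion itself, while the semantics described in the comment are an assumption based on how the behavior is characterized in this thread:

```properties
# swift.properties (fragment; other settings omitted)
# Assumed semantics, per this thread: rather than aborting the whole
# workflow when one job exhausts its retries, keep executing the
# remaining jobs and report the accumulated errors at the end.
lazy.errors=true
```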
The output rate stuff is interesting. I'll try to get some better statistics on that. It is the case that jobs finishing don't immediately put their output in your run directory. This interacts with jobs that have not yet been run in a slightly surprising way. Hopefully I can graph this better soon. The charts at http://www.ci.uchicago.edu/~benc/report-Windowlicker-20071025-2116-ue28hhtc/ suggest that there are plenty of jobs finishing. Here are some questions (that I think can be answered by logs, but not with the graphs I have now): i) how fast are jobs finishing executing? ii) how fast are jobs *completely* finishing (which I think is what you are expecting) which includes staging out files from the compute site to the submit site? I'll have some more plots of this in 12h or so. On Fri, 26 Oct 2007, Andrew Robert Jamieson wrote: > I am kind of at a stand still for getting anything done on TP right now with > this problem. Are there any suggestions to overcome this for the time being? > > On Fri, 26 Oct 2007, Andrew Robert Jamieson wrote: > > > Hello all, > > > > I am encountering the following problem on Teraport. I submit a clustered > > swift WF which should amount to something on the order of 850x3 individual > > jobs total. I have clustered the jobs because they are very fast (somewhere > > around 20 sec to 1 min long). When I submit the WF on TP things start out > > fantastic, I get 10s of output files in a matter of seconds and nodes would > > start and finish clustered batches in a matter of minutes or less. However, > > after waiting about 3-5 mins, when clustered jobs are begin to line up in > > the queue and more start running at the same time, things start to slow down > > to a trickle in terms of output. > > > > One thing I noticed is when I try a simply ls on TP in the swift temp > > running directory where the temp job dirs are created and destroyed, it take > > a very long time. And when it is done only five or so things are in the > > dir. 
(this is the dir with "info kickstart shared status wrapper.log" in > > it). What I think is happening is that TP's filesystem cant handle this > > extremely rapid creation/destruction of directories in that shared location. > > From what I have been told these temp dirs come and go as long as the job > > runs successfully. > > > > What I am wondering is if there is anyway to move that dir to the local node > > tmp diretory not the shared file system, while it is running and if > > something fails then have it sent to the appropriate place. > > > > Or, if another layer of temp dir wrapping could be applied with labeld > > perhaps with respect to the clustered job grouping and not simply the > > individual jobs (since there are thousands being computed at once). > > That these things would only be generated/deleted every 5 mins or 10 mins > > (if clustered properly on my part) instead of one event every milli second > > or what have you. > > > > I don't know which solution is feasible or if any are at all, but this seems > > to be a major problem for my WFs. In general it is never good to have a > > million things coming and going on a shared file system in one place, from > > my experience at least. > > > > > > Thanks, > > Andrew > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > From andrewj at uchicago.edu Fri Oct 26 21:19:03 2007 From: andrewj at uchicago.edu (andrewj at uchicago.edu) Date: Fri, 26 Oct 2007 21:19:03 -0500 (CDT) Subject: [Swift-devel] Clustering and Temp Dirs with Swift Message-ID: <20071026211903.AVQ85022@m4500-00.uchicago.edu> The problem is when we get more than 5 nodes running the clustered jobs we go from 50 output files being spit out in a minute to about 1 a minute. If you look at your graphs you notice that the jobs become increasingly longer and longer as time progresses. 
In fact this entire workflow should have been able to be completed by a single node in something like 2 hours max. In my case I was running on around 20+ nodes and the WF was stretching to nearly 3 hours or something like that. That is when the error finally occurred, and by then it was largely irrelevant. This should not be happening. I think Ioan correctly described the problem. This seems like a well-understood phenomenon for GPFS, regardless of what my WFs are doing. And the fact is I will absolutely have more than 20+ jobs/sec completing, or whatever the limit is, for my massive WFs. I will try with kickstart on and lazy errors on, but I think the same will happen. > >the most recent run logs I've seen of this are that things were >progressing with a small number of job failures, however, one job failed >three times (as happens sometimes, perhaps indicative of a problem with >that job, perhaps statistically/stochastically because you have a lot of >jobs and the execute hosts arent' perfect) and because of that three times >failure, the workflow was aborted. > >I discussed with you on IM the possibility of running with >lazy.errors=true which will cause the workflow to run for longer in the >case of such a problem. > >The output rate stuff is interesting. I'll try to get some better >statistics on that. It is the case that jobs finishing don't immediately >put their output in your run directory. This interacts with jobs that have >not yet been run in a slightly surprising way. Hopefully I can graph this >better soon. > >The charts at >http://www.ci.uchicago.edu/~benc/report-Windowlicker-20071025-2116-ue28hhtc/ >suggest that there are plenty of jobs finishing. > >Here are some questions (that I think can be answered by logs, but not >with the graphs I have now): > > i) how fast are jobs finishing executing?
> > ii) how fast are jobs *completely* finishing (which I think is what you >are expecting) which includes staging out files from the compute site to >the submit site? > >I'll have some more plots of this in 12h or so. > >On Fri, 26 Oct 2007, Andrew Robert Jamieson wrote: > >> I am kind of at a stand still for getting anything done on TP right now with >> this problem. Are there any suggestions to overcome this for the time being? >> >> On Fri, 26 Oct 2007, Andrew Robert Jamieson wrote: >> >> > Hello all, >> > >> > I am encountering the following problem on Teraport. I submit a clustered >> > swift WF which should amount to something on the order of 850x3 individual >> > jobs total. I have clustered the jobs because they are very fast (somewhere >> > around 20 sec to 1 min long). When I submit the WF on TP things start out >> > fantastic, I get 10s of output files in a matter of seconds and nodes would >> > start and finish clustered batches in a matter of minutes or less. However, >> > after waiting about 3-5 mins, when clustered jobs are begin to line up in >> > the queue and more start running at the same time, things start to slow down >> > to a trickle in terms of output. >> > >> > One thing I noticed is when I try a simply ls on TP in the swift temp >> > running directory where the temp job dirs are created and destroyed, it take >> > a very long time. And when it is done only five or so things are in the >> > dir. (this is the dir with "info kickstart shared status wrapper.log" in >> > it). What I think is happening is that TP's filesystem cant handle this >> > extremely rapid creation/destruction of directories in that shared location. >> > From what I have been told these temp dirs come and go as long as the job >> > runs successfully. >> > >> > What I am wondering is if there is anyway to move that dir to the local node >> > tmp diretory not the shared file system, while it is running and if >> > something fails then have it sent to the appropriate place. 
>> > >> > Or, if another layer of temp dir wrapping could be applied with labeld >> > perhaps with respect to the clustered job grouping and not simply the >> > individual jobs (since there are thousands being computed at once). >> > That these things would only be generated/deleted every 5 mins or 10 mins >> > (if clustered properly on my part) instead of one event every milli second >> > or what have you. >> > >> > I don't know which solution is feasible or if any are at all, but this seems >> > to be a major problem for my WFs. In general it is never good to have a >> > million things coming and going on a shared file system in one place, from >> > my experience at least. >> > >> > >> > Thanks, >> > Andrew >> > _______________________________________________ >> > Swift-devel mailing list >> > Swift-devel at ci.uchicago.edu >> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > >> >> From iraicu at cs.uchicago.edu Fri Oct 26 23:02:57 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Fri, 26 Oct 2007 23:02:57 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: <1193441985.9302.1.camel@blabla.mcs.anl.gov> References: <472249F7.1010002@cs.uchicago.edu> <1193441985.9302.1.camel@blabla.mcs.anl.gov> Message-ID: <4722B871.1000503@cs.uchicago.edu> If it doesn't apply to meta-data operations, such as directories, then it means that meta-data changes in the file system are rather centralized (maybe this explains the relatively poor performance for creating and removing directories). I would be curious to see how well the solution works to move data to the local disk first prior to processing, to avoid working from the shared file system (including the creation and removal of the scratch temp directory on GPFS).
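The local-disk staging idea discussed above can be sketched as a wrapper that does exactly one bulk copy onto node-local scratch, runs the application there, and does one bulk copy back. This is a hypothetical illustration, not Swift's actual wrapper script; the function name and the $shared/input, $shared/output layout are invented for the example:

```shell
#!/bin/sh
# Hypothetical sketch, NOT Swift's real wrapper: stage inputs to
# node-local scratch, run the application there, then copy results
# back to the shared filesystem in a single pass. All names invented.
set -e

run_on_local_scratch() {
    shared=$1; shift                         # job dir on the shared (GPFS) FS
    scratch=$(mktemp -d "${TMPDIR:-/tmp}/job.XXXXXX")
    cp -r "$shared/input/." "$scratch/"      # one bulk copy in
    ( cd "$scratch" && "$@" )                # app touches only local disk
    mkdir -p "$shared/output"
    cp -r "$scratch/." "$shared/output/"     # one bulk copy out
    rm -rf "$scratch"                        # cleanup hits local disk, not GPFS
}
```

The point of the sketch is that the shared filesystem then sees two recursive copies per job instead of a per-job mkdir/symlink/rmdir sequence; a real implementation would still need to stage logs back on failure and integrate with Swift's file mapping, which is what makes changing the wrapper nontrivial.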
Ioan Mihael Hategan wrote: > On Fri, 2007-10-26 at 15:11 -0500, Ioan Raicu wrote: > >> I am not sure what configuration exists on TP, but on the TeraGrid >> ANL/UC cluster, with 8 servers behind GPFS, the wrapper script >> performance (create dir, create symbolic links, remove directory... all >> on GPFS) is anywhere between 20~40 / sec, depending on how many nodes >> you have doing this concurrently. The throughput increases first as you >> add nodes, but then decreases down to about 20/sec with 20~30+ nodes. >> What this means is that even if you bundle jobs up, you will not get >> anything better than this, throughput wise, regardless of how short the >> jobs are. Now, if TP has less than 8 servers, its likely that the >> throughput it can sustain is even lower, >> > > Perhaps in terms of bytes/s. But I wouldn't be so sure that this applies > to other file stuff. > > >> and if you push it over the >> edge, even to the point of thrashing where the throughput can be >> extremely small. I don't have any suggestions of how you can get >> around this, with the exception of making your job sizes larger on >> average, and hence have fewer jobs over the same period of time. >> >> Ioan >> >> Andrew Robert Jamieson wrote: >> >>> I am kind of at a stand still for getting anything done on TP right >>> now with this problem. Are there any suggestions to overcome this for >>> the time being? >>> >>> On Fri, 26 Oct 2007, Andrew Robert Jamieson wrote: >>> >>> >>>> Hello all, >>>> >>>> I am encountering the following problem on Teraport. I submit a >>>> clustered swift WF which should amount to something on the order of >>>> 850x3 individual jobs total. I have clustered the jobs because they >>>> are very fast (somewhere around 20 sec to 1 min long). When I submit >>>> the WF on TP things start out fantastic, I get 10s of output files in >>>> a matter of seconds and nodes would start and finish clustered >>>> batches in a matter of minutes or less. 
However, after waiting about >>>> 3-5 mins, when clustered jobs are begin to line up in the queue and >>>> more start running at the same time, things start to slow down to a >>>> trickle in terms of output. >>>> >>>> One thing I noticed is when I try a simply ls on TP in the swift temp >>>> running directory where the temp job dirs are created and destroyed, >>>> it take a very long time. And when it is done only five or so things >>>> are in the dir. (this is the dir with "info kickstart shared >>>> status wrapper.log" in it). What I think is happening is that TP's >>>> filesystem cant handle this extremely rapid creation/destruction of >>>> directories in that shared location. From what I have been told these >>>> temp dirs come and go as long as the job runs successfully. >>>> >>>> What I am wondering is if there is anyway to move that dir to the >>>> local node tmp diretory not the shared file system, while it is >>>> running and if something fails then have it sent to the appropriate >>>> place. >>>> >>>> Or, if another layer of temp dir wrapping could be applied with >>>> labeld perhaps with respect to the clustered job grouping and not >>>> simply the individual jobs (since there are thousands being computed >>>> at once). >>>> That these things would only be generated/deleted every 5 mins or 10 >>>> mins (if clustered properly on my part) instead of one event every >>>> milli second or what have you. >>>> >>>> I don't know which solution is feasible or if any are at all, but >>>> this seems to be a major problem for my WFs. In general it is never >>>> good to have a million things coming and going on a shared file >>>> system in one place, from my experience at least. 
>>>> >>>> >>>> Thanks, >>>> Andrew >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>> >>>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> >>> >> -- >> ============================================ >> Ioan Raicu >> Ph.D. Student >> ============================================ >> Distributed Systems Laboratory >> Computer Science Department >> University of Chicago >> 1100 E. 58th Street, Ryerson Hall >> Chicago, IL 60637 >> ============================================ >> Email: iraicu at cs.uchicago.edu >> Web: http://www.cs.uchicago.edu/~iraicu >> http://dsl.cs.uchicago.edu/ >> ============================================ >> ============================================ >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> >> > > > -- ============================================ Ioan Raicu Ph.D. Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hategan at mcs.anl.gov Fri Oct 26 23:27:26 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 26 Oct 2007 23:27:26 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: <4722B871.1000503@cs.uchicago.edu> References: <472249F7.1010002@cs.uchicago.edu> <1193441985.9302.1.camel@blabla.mcs.anl.gov> <4722B871.1000503@cs.uchicago.edu> Message-ID: <1193459247.21362.6.camel@blabla.mcs.anl.gov> On Fri, 2007-10-26 at 23:02 -0500, Ioan Raicu wrote: > If it doesn't apply to meta-data operations, such as directories, then > it means that meta-data changes in the file system is rather > centralized (maybe this explains the relatively poor performance for > creating and removing directories). On GPFS, according to my understanding of their documentation, exactly one node controls access to one file at any given time. If, for all observable aspects of the implementation, a directory is a file with a bunch of metadata for the files it contains, then doing things in a directory from multiple places is similar to accessing the same file from multiple places. Unless I'm blatantly wrong. Probably some complications of that model exist even if I'm not. > I would be curious to see how well the solution works to move data > to the local disk first prior to processing, to avoid working from the > shared file system (including the creation and removal of the scratch > temp directory on GPFS). > > Ioan > > Mihael Hategan wrote: > > On Fri, 2007-10-26 at 15:11 -0500, Ioan Raicu wrote: > > > > > I am not sure what configuration exists on TP, but on the TeraGrid > > > ANL/UC cluster, with 8 servers behind GPFS, the wrapper script > > > performance (create dir, create symbolic links, remove directory... all > > > on GPFS) is anywhere between 20~40 / sec, depending on how many nodes > > > you have doing this concurrently. The throughput increases first as you > > > add nodes, but then decreases down to about 20/sec with 20~30+ nodes. 
> > > What this means is that even if you bundle jobs up, you will not get > > > anything better than this, throughput wise, regardless of how short the > > > jobs are. Now, if TP has less than 8 servers, its likely that the > > > throughput it can sustain is even lower, > > Perhaps in terms of bytes/s. But I wouldn't be so sure that this applies > > to other file stuff. > > > and if you push it over the > > > edge, even to the point of thrashing where the throughput can be > > > extremely small. I don't have any suggestions of how you can get > > > around this, with the exception of making your job sizes larger on > > > average, and hence have fewer jobs over the same period of time. > > > Ioan > > > Andrew Robert Jamieson wrote: > > > > I am kind of at a stand still for getting anything done on TP right > > > > now with this problem. Are there any suggestions to overcome this for > > > > the time being? > > > > On Fri, 26 Oct 2007, Andrew Robert Jamieson wrote: [...] > -- > ============================================ > Ioan Raicu > Ph.D. Student > ============================================ > Distributed Systems Laboratory > Computer Science Department > University of Chicago > 1100 E. 
58th Street, Ryerson Hall > Chicago, IL 60637 > ============================================ > Email: iraicu at cs.uchicago.edu > Web: http://www.cs.uchicago.edu/~iraicu > http://dsl.cs.uchicago.edu/ > ============================================ > ============================================ From iraicu at cs.uchicago.edu Fri Oct 26 23:35:05 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Fri, 26 Oct 2007 23:35:05 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: <1193459247.21362.6.camel@blabla.mcs.anl.gov> References: <472249F7.1010002@cs.uchicago.edu> <1193441985.9302.1.camel@blabla.mcs.anl.gov> <4722B871.1000503@cs.uchicago.edu> <1193459247.21362.6.camel@blabla.mcs.anl.gov> Message-ID: <4722BFF9.7060105@cs.uchicago.edu> But the scenario is probably more parallelizable, in theory. There is some common path, say /shared/common/path, and then you have x directories that you want to create in path, say dir1, dir2, ... , dirx. If the meta-data information is distributed over the 8 I/O servers, then creating these x directories should be load-balanced across the 8 I/O servers. If the meta-data is centralized, they will all hit the same server. In the end, it doesn't really matter. What matters is that it limits the job granularity you can really have, as the cost of the mkdir and rmdir can quickly outpace the cost of computation and data staging in and out. It would be great to have some alternatives, for workflows that need more throughput than GPFS can handle. Ioan Mihael Hategan wrote: > On Fri, 2007-10-26 at 23:02 -0500, Ioan Raicu wrote: > >> If it doesn't apply to meta-data operations, such as directories, then >> it means that meta-data changes in the file system is rather >> centralized (maybe this explains the relatively poor performance for >> creating and removing directories). >> > > On GPFS, according to my understanding of their documentation, exactly > one node controls access to one file at any given time. 
If, for all > observable aspects of the implementation, a directory is a file with a > bunch of metadata for the files it contains, then doing things in a > directory from multiple places is similar to accessing the same file > from multiple places. > > Unless I'm blatantly wrong. Probably some complications of that model > exist even if I'm not. > [...] -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hategan at mcs.anl.gov Fri Oct 26 23:48:38 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 26 Oct 2007 23:48:38 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: <4722BFF9.7060105@cs.uchicago.edu> References: <472249F7.1010002@cs.uchicago.edu> <1193441985.9302.1.camel@blabla.mcs.anl.gov> <4722B871.1000503@cs.uchicago.edu> <1193459247.21362.6.camel@blabla.mcs.anl.gov> <4722BFF9.7060105@cs.uchicago.edu> Message-ID: <1193460518.21362.24.camel@blabla.mcs.anl.gov> On Fri, 2007-10-26 at 23:35 -0500, Ioan Raicu wrote: > But the scenario is probably more paralelizable, in theory. There is > some common path, say /shared/common/path, and then you have x > directories that you want to create in path, say dir1, dir2, ... , > dirx. If the meta-data information is distributed over the 8 I/O > servers, than the creating these x directories should be load balanced > across the 8 I/O servers. If the meta-data is centralized, they will > all hit the same server. In the end, it doesn't really matter. What > matters is that it limits the job granularity you can really have, as > the cost of the mkdir and rm dir can quickly outpace the cost of > computation and data staging in and out. It would be great to have > some alternatives, for workflows that need more throughput than GPFS > can handle. It's hard to ensure correctness in distributed systems. And for some specific problems, it is impossible. Leslie Lamport's page with papers is a rich source of seemingly trivial issues that are actually hard. What should raise a flag is that a bunch of relatively competitive teams haven't been able to make it very good, given that they had quite a bit of time. Unless you gain more knowledge of the topic, it's somewhat likely that you missed some aspect of the problem. You may have something, but I for one am incapable of assessing whether you actually do (or not). 
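The directory create/remove throughput quoted earlier in the thread (20~40 ops/sec on GPFS) is easy to probe directly. A minimal sketch, assuming a POSIX shell; the target path and iteration count are placeholders for local experimentation, not values from the thread:

```shell
#!/bin/sh
# Rough probe of directory create/remove throughput on a filesystem.
# Point TARGET at the shared filesystem under test; the defaults are
# placeholders, not values from this discussion.
TARGET=${1:-/tmp/dirbench.$$}
COUNT=${2:-100}

mkdir -p "$TARGET"
start=$(date +%s)
i=0
while [ "$i" -lt "$COUNT" ]; do
    mkdir "$TARGET/d$i"     # one create...
    rmdir "$TARGET/d$i"     # ...and one remove per iteration
    i=$((i + 1))
done
end=$(date +%s)
elapsed=$((end - start))
[ "$elapsed" -eq 0 ] && elapsed=1   # avoid division by zero on fast disks
echo "$((COUNT * 2 / elapsed)) dir ops/sec"
rmdir "$TARGET"
```

Run it once against local disk and once against the GPFS scratch area to see the gap; running several copies concurrently from different nodes would reproduce the contention effect described above.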
You may also want to consider trade-offs between performance of small operations and performance of big operations. Perhaps, if such a trade-off is necessary, they biased things toward big operations. Mihael [...] From benc at hawaga.org.uk Sat Oct 27 03:46:01 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Sat, 27 Oct 2007 08:46:01 +0000 (GMT) Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: <20071026211903.AVQ85022@m4500-00.uchicago.edu> References: <20071026211903.AVQ85022@m4500-00.uchicago.edu> Message-ID: On Fri, 26 Oct 2007, andrewj at uchicago.edu wrote: > I will try with kickstart on and lazy errors on, but I think > the same will happen. I suspect so. 
(the point of those was to collect more data rather than fix brokenness) -- From benc at hawaga.org.uk Sat Oct 27 04:45:32 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Sat, 27 Oct 2007 09:45:32 +0000 (GMT) Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: <1193459247.21362.6.camel@blabla.mcs.anl.gov> References: <472249F7.1010002@cs.uchicago.edu> <1193441985.9302.1.camel@blabla.mcs.anl.gov> <4722B871.1000503@cs.uchicago.edu> <1193459247.21362.6.camel@blabla.mcs.anl.gov> Message-ID: On Fri, 26 Oct 2007, Mihael Hategan wrote: > On GPFS, according to my understanding of their documentation, exactly > one node controls access to one file at any given time. If, for all > observable aspects of the implementation, a directory is a file with a > bunch of metadata for the files it contains, then doing things in a > directory from multiple places is similar to accessing the same file > from multiple places. Googling around, the IBM document 'Sizing and Tuning GPFS' discusses this briefly: http://www.redbooks.ibm.com/redbooks/pdfs/sg245610.pdf It's for RS/6000 (but I think that's irrelevant) and I don't know how it ties in version-wise (which is more relevant). I think it agrees with what you say above. They briefly discuss write contention for a directory on page 62, section 2.4.2. I think basically they're saying in that section that what we do with our shared directories is not going to work very well, because of contention for the write lock on the big (i.e. lots of people accessing) shared directories. It might even be said that the site directory layout at the moment is perfectly designed to work badly with GPFS' directory model, in that there is a lot of shared directory use (for status reports, data file caching, etc) even when jobs are entirely independent. However, this GPFS behaviour, if it really is what's causing the problem, is possibly relatively straightforward to accommodate. 
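One generic way to accommodate that directory-lock behaviour (sketched here as a general technique, not something Swift currently does) is to hash per-job directories into a fixed set of bucket subdirectories, so concurrent writers contend over many small directories instead of one large shared one. The base path and bucket count below are illustrative:

```shell
#!/bin/sh
# Spread per-job directories across N bucket subdirectories so that
# concurrent mkdir/rmdir traffic is split over many directory objects
# rather than serialized on one directory's write lock.
# BASE and BUCKETS are illustrative values.
BASE=${BASE:-/tmp/shared-scratch}
BUCKETS=16

jobdir() {
    # cheap, stable hash of a job id -> bucket index
    sum=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
    echo "$BASE/b$((sum % BUCKETS))/$1"
}

# two example jobs land in (possibly different) buckets
mkdir -p "$(jobdir job-0001)"
mkdir -p "$(jobdir job-0002)"
```

Because the hash is stable, the same job id always maps to the same bucket, so status files and cleanup keep working; only the number of entries per directory (and hence the lock contention) changes.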
I think it's worth spending effort accommodating GPFS, given its scalability on other axes. -- From andrewj at uchicago.edu Sat Oct 27 11:36:54 2007 From: andrewj at uchicago.edu (Andrew Robert Jamieson) Date: Sat, 27 Oct 2007 11:36:54 -0500 (CDT) Subject: [Swift-devel] Re: [ci.uchicago.edu #349] Clustering and Temp Dirs with Swift (fwd) Message-ID: ---------- Forwarded message ---------- Date: Fri, 26 Oct 2007 17:00:33 -0500 (CDT) From: "Greg Cross (CI) via RT" To: andrewj at uchicago.edu Subject: Re: [ci.uchicago.edu #349] Clustering and Temp Dirs with Swift You're correct that a lot of localized traffic on GPFS can cause noticeable degradations in performance. The question is, though, whether you are asking to have scratch space available on all nodes at any arbitrary time, or whether you want to instrument your workflow to copy the data to a scratch space as part of your job. It's difficult to group filesystem operations on a per-job basis. We do not allow access to compute nodes that aren't assigned to an active job for a given user. In theory, you could specify your job to request a group of nodes at once. You could then disperse the data accordingly and instrument a wrapper script to run your single-node tasks in a parallel fashion. You could write the wrapper script such that it uses the same job to run many tasks serially. (On top of this, the parallel and serial techniques could be combined.) There are environment variables that are coupled with job submissions that can be used to identify hostnames for a multi-node job. This may not be ideal or easy to instrument in a script, but if it's feasible, I'd suggest submitting an interactive job for multiple nodes, and you can test and debug any wrapper scripts accordingly. -- Greg On Fri 26 Oct 2007, at 13:59, Andrew Jamieson via RT wrote: > > Fri Oct 26 13:59:56 2007: Request 349 was acted upon. 
> Transaction: Ticket created by andrewj at uchicago.edu > Queue: General > Subject: Clustering and Temp Dirs with Swift > Owner: Nobody > Requestors: andrewj at uchicago.edu > Status: new > Ticket Display.html?id=349 > > > > Hello all, > > I am encountering the following problem on Teraport. I submit a > clustered swift WF which should amount to something on the order of > 850x3 > individual jobs total. I have clustered the jobs because they are very > fast (somewhere around 20 sec to 1 min long). When I submit the WF > on TP > things start out fantastic, I get 10s of output files in a matter of > seconds and nodes would start and finish clustered batches in a > matter of > minutes or less. However, after waiting about 3-5 mins, when clustered > jobs are begin to line up in the queue and more start running at > the same > time, things start to slow down to a trickle in terms of output. > > One thing I noticed is when I try a simply ls on TP in the swift temp > running directory where the temp job dirs are created and > destroyed, it > take a very long time. And when it is done only five or so things > are in > the dir. (this is the dir with "info kickstart shared status > wrapper.log" in it). What I think is happening is that TP's > filesystem > cant handle this extremely rapid creation/destruction of > directories in > that shared location. From what I have been told these temp dirs > come and > go as long as the job runs successfully. > > What I am wondering is if there is anyway to move that dir to the > local > node tmp diretory not the shared file system, while it is running > and if > something fails then have it sent to the appropriate place. > > Or, if another layer of temp dir wrapping could be applied with > labeld perhaps with respect to the clustered job grouping and not > simply > the individual jobs (since there are thousands being computed at > once). 
> That these things would only be generated/deleted every 5 mins or > 10 mins > (if clustered properly on my part) instead of one event every milli > second > or what have you. > > I don't know which solution is feasible or if any are at all, but this > seems to be a major problem for my WFs. In general it is never > good to > have a million things coming and going on a shared file system in one > place, from my experience at least. > > > Thanks, > Andrew > From wilde at mcs.anl.gov Sat Oct 27 13:43:34 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sat, 27 Oct 2007 13:43:34 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: References: Message-ID: <472386D6.2020707@mcs.anl.gov> Many good points have been raised on this thread. I've read through it once, and probably need to do another pass. I also spoke to Mihael in person on this yesterday afternoon, and want now to try to organize our efforts on this (as I think we all realize it's a clear barrier to performance). Plus Andrew is still pushing for results by Nov 1 for an NIH grant resubmission. I suspect that my angle workflow on UC teragrid was having similar problems: lots of jobs finishing but data coming back very slowly. (Btw I really appreciate everyone's efforts on this and I *do* realize that it's a weekend) What I understand to be happening now is:
- Ben is doing more measurements
- Mihael was going to try to rework remote jobs to read and write to/from local disk
- Mihael suggested we instrument wrapper.sh to record times. Possibly insert date and time commands at every step. (I'd also like to have an option to retrieve these logs from the remote side, but we can do that as a separate utility script for now.)
I have long wanted to document the current data management logic; I think seeing this in writing would help us pinpoint likely points of contention. I didn't take notes when Mihael answered my questions on this yesterday, but would like to go back and recapture this.
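The wrapper.sh timing instrumentation suggested above could start as something this simple. This is only a sketch: `log_step` and the log file name are invented here, not part of the real wrapper.sh.

```shell
# Hypothetical step logger for wrapper.sh (sketch, not the actual script).
STEPLOG="${STEPLOG:-wrapper-steps.log}"
: > "$STEPLOG"   # start a fresh log for this job

log_step() {
    # Record "<epoch-seconds> <step-name>"; per-step durations can be
    # recovered afterwards by subtracting adjacent timestamps.
    echo "$(date +%s) $1" >> "$STEPLOG"
}

log_step CREATE_JOBDIR
# ... create the job directory here ...
log_step LINK_INPUTS
# ... stage or link input files here ...
log_step EXECUTE
# ... run the application here ...
log_step STAGE_OUT
```

A separate utility script could then fetch and aggregate these per-job logs from the remote side, as Mike suggests.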
It would also help us design new options for data handling conventions. Mihael: are you doing anything on the rework to local I/O at the moment? Knowing your plan would help guide what others should do next. Ben: is the log_processing code changing as we speak, and is it sensible for me and others to try to run your latest versions? Or just send you logfiles? I think in general moving as much I/O (including metadata I/O) as possible from shared disk to local disk is a good thing. If it's easy to move almost *all* metadata access there, this is low-hanging fruit, and we can just compare before-and-after times. However, if this is hard, then it's better to get more measurements and find *which* I/O operations are causing problems, and go after the worst offenders first. Question: do people feel that a move to local disk could be done *entirely* in wrapper.sh, or is it known that other parts of swift would have to change as well? For the moment, until I hear comments on the questions above, I will work on Angle, see if I get the same problems (I should see the same), and try to start a simple text doc on the data management mechanism that will at least help *me* better understand what's going on. - Mike On 10/26/07 7:37 PM, Ben Clifford wrote: > the most recent run logs I've seen of this are that things were > progressing with a small number of job failures, however, one job failed > three times (as happens sometimes, perhaps indicative of a problem with > that job, perhaps statistically/stochastically because you have a lot of > jobs and the execute hosts arent' perfect) and because of that three times > failure, the workflow was aborted. > > I discussed with you on IM the possibility of running with > lazy.errors=true which will cause the workflow to run for longer in the > case of such a problem. > > The output rate stuff is interesting. I'll try to get some better > statistics on that. It is the case that jobs finishing don't immediately > put their output in your run directory.
This interacts with jobs that have > not yet been run in a slightly surprising way. Hopefully I can graph this > better soon. > > The charts at > http://www.ci.uchicago.edu/~benc/report-Windowlicker-20071025-2116-ue28hhtc/ > suggest that there are plenty of jobs finishing. > > Here are some questions (that I think can be answered by logs, but not > with the graphs I have now): > > i) how fast are jobs finishing executing? > > ii) how fast are jobs *completely* finishing (which I think is what you > are expecting) which includes staging out files from the compute site to > the submit site? > > I'll have some more plots of this in 12h or so. > > On Fri, 26 Oct 2007, Andrew Robert Jamieson wrote: > >> I am kind of at a stand still for getting anything done on TP right now with >> this problem. Are there any suggestions to overcome this for the time being? >> >> On Fri, 26 Oct 2007, Andrew Robert Jamieson wrote: >> >>> Hello all, >>> >>> I am encountering the following problem on Teraport. I submit a clustered >>> swift WF which should amount to something on the order of 850x3 individual >>> jobs total. I have clustered the jobs because they are very fast (somewhere >>> around 20 sec to 1 min long). When I submit the WF on TP things start out >>> fantastic, I get 10s of output files in a matter of seconds and nodes would >>> start and finish clustered batches in a matter of minutes or less. However, >>> after waiting about 3-5 mins, when clustered jobs are begin to line up in >>> the queue and more start running at the same time, things start to slow down >>> to a trickle in terms of output. >>> >>> One thing I noticed is when I try a simply ls on TP in the swift temp >>> running directory where the temp job dirs are created and destroyed, it take >>> a very long time. And when it is done only five or so things are in the >>> dir. (this is the dir with "info kickstart shared status wrapper.log" in >>> it). 
What I think is happening is that TP's filesystem cant handle this >>> extremely rapid creation/destruction of directories in that shared location. >>> From what I have been told these temp dirs come and go as long as the job >>> runs successfully. >>> >>> What I am wondering is if there is anyway to move that dir to the local node >>> tmp diretory not the shared file system, while it is running and if >>> something fails then have it sent to the appropriate place. >>> >>> Or, if another layer of temp dir wrapping could be applied with labeld >>> perhaps with respect to the clustered job grouping and not simply the >>> individual jobs (since there are thousands being computed at once). >>> That these things would only be generated/deleted every 5 mins or 10 mins >>> (if clustered properly on my part) instead of one event every milli second >>> or what have you. >>> >>> I don't know which solution is feasible or if any are at all, but this seems >>> to be a major problem for my WFs. In general it is never good to have a >>> million things coming and going on a shared file system in one place, from >>> my experience at least. >>> >>> >>> Thanks, >>> Andrew >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> >> > > From benc at hawaga.org.uk Sat Oct 27 13:50:39 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Sat, 27 Oct 2007 18:50:39 +0000 (GMT) Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: <472386D6.2020707@mcs.anl.gov> References: <472386D6.2020707@mcs.anl.gov> Message-ID: On Sat, 27 Oct 2007, Michael Wilde wrote: > I suspect that my angle workflow on UC teragrid was having similar problems: > lots of jobs finishing but data coming back very slowly. 
> (Btw I really appreciate everyones efforts on this and I *do* realize that its > a weekend) Is this the one that looks like you were hitting the maximum-of-4-at-once limit on file transfers? > Ben: is the log_processing code changing as we speak, and is it sensible for > me and others to try to run your latest versions? Or just send you logfiles? It always changes. But you can svn update whenever you want. If you put a log file (and associated kickstart records) in the usual repository then its easy enough for me to run the code on it. > Question: do people feel that a move to local disk could be done > *entirely* in wrapper.sh, or is it known that other parts of swift would > have to change as well? I think that there won't be a trivial solution to this problem. At present, the model is quite strongly tied to a site-shared filesystem (as VDS was before). In the past, we've discussed informally different ways of moving data round between submit-side storage locations, site-wide storage locations and worker-local storage. I think this is another use case for that; but I think the general conclusion that that's a non-trivial thing to do is still valid. > For the moment, until I hear comments on the questions above, I will > work on Angle, see if I get the same problems (I should see the same) > and try to start a simple text doc on the data management mechanism that > will at least help *me* better understand whats going on. For angle, a first thing to try is increasing the transfer throttle. If there's lock contention there, it may be that will decrease, rather than increase the performance. 
-- From hategan at mcs.anl.gov Sat Oct 27 13:58:05 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 27 Oct 2007 13:58:05 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: References: <472386D6.2020707@mcs.anl.gov> Message-ID: <1193511485.27417.52.camel@blabla.mcs.anl.gov> > > Question: do people feel that a move to local disk could be done > > *entirely* in wrapper.sh, or is it known that other parts of swift would > > have to change as well? > > I think that there won't be a trivial solution to this problem. At > present, the model is quite strongly tied to a site-shared filesystem (as > VDS was before). Quickly before I leave the house: Perhaps we could try copying to local FS instead of linking from shared dir and hence running the jobs on the local FS. Also minimize concurrent access to files (i.e. right now all seq.sh instances write to one global log file on the sfs - silly). > > In the past, we've discussed informally different ways of moving data > round between submit-side storage locations, site-wide storage locations > and worker-local storage. I think this is another use case for that; but I > think the general conclusion that that's a non-trivial thing to do is > still valid. > From benc at hawaga.org.uk Sat Oct 27 14:08:14 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Sat, 27 Oct 2007 19:08:14 +0000 (GMT) Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: <1193511485.27417.52.camel@blabla.mcs.anl.gov> References: <472386D6.2020707@mcs.anl.gov> <1193511485.27417.52.camel@blabla.mcs.anl.gov> Message-ID: On Sat, 27 Oct 2007, Mihael Hategan wrote: > Quickly before I leave the house: > Perhaps we could try copying to local FS instead of linking from shared > dir and hence running the jobs on the local FS. Maybe. I'd be suspicious that doesn't reduce access to the directory too much. 
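Mihael's copy-instead-of-link suggestion above might look roughly like the following. This is a sketch under assumed names (`stage_in` and both directories are hypothetical), not Swift's actual staging code.

```shell
# Sketch: stage inputs by copying onto node-local disk instead of
# symlinking them from the shared filesystem, so the job's reads hit
# local disk rather than GPFS/NFS.
stage_in() {  # stage_in <shared-cache-dir> <local-job-dir> <file>...
    local shared="$1" jobdir="$2"
    shift 2
    mkdir -p "$jobdir"
    for f in "$@"; do
        # copy, rather than: ln -s "$shared/$f" "$jobdir/$f"
        cp "$shared/$f" "$jobdir/$f"
    done
}
```

The trade-off is extra copy traffic at stage-in time in exchange for keeping the job's runtime I/O off the shared filesystem.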
I think the directories where there are lots of files being read/written by lots of hosts are:
- the top directory (one job directory per job)
- the info directory
- the kickstart directory
- the file cache
In the case where directories get too many files in them because of directory size constraints, it's common to split that directory into many smaller directories (e.g. how squid caching, or git object storage, works): given a file fubar.txt, store it in fu/fubar.txt, with 'fu' being some short hash of the filename (the hash here being 'extract the first two characters'). Pretty much I think Andrew wanted to do that for his data files anyway, which would then reflect in the layout of the data cache directory structure. For job directories, it may not be too hard to split the big directories into smaller ones. There will still be write-lock conflicts, but this might mean the contention for each directory's write-lock is lower. -- From benc at hawaga.org.uk Sat Oct 27 14:14:28 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Sat, 27 Oct 2007 19:14:28 +0000 (GMT) Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: <472386D6.2020707@mcs.anl.gov> References: <472386D6.2020707@mcs.anl.gov> Message-ID: On Sat, 27 Oct 2007, Michael Wilde wrote: > - Mihael suggested we instrument wrapper.sh to record times. I'll add that. -- From wilde at mcs.anl.gov Sat Oct 27 14:14:42 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sat, 27 Oct 2007 14:14:42 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: References: <472386D6.2020707@mcs.anl.gov> Message-ID: <47238E22.3030704@mcs.anl.gov> On 10/27/07 1:50 PM, Ben Clifford wrote: > > On Sat, 27 Oct 2007, Michael Wilde wrote: > >> I suspect that my angle workflow on UC teragrid was having similar problems: >> lots of jobs finishing but data coming back very slowly.
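The two-character hash-prefix layout Ben describes (fubar.txt stored as fu/fubar.txt) can be sketched as follows; `place_file` and the cache-root argument are hypothetical names, and the "hash" is simply the first two characters of the filename, as in his example.

```shell
# Sketch: spread cache files across two-character prefix subdirectories
# so no single directory accumulates thousands of entries.
place_file() {  # place_file <cache-root> <path-to-file>
    local root="$1" name prefix
    name=$(basename "$2")
    prefix=$(printf '%s' "$name" | cut -c1-2)   # 'fu' for fubar.txt
    mkdir -p "$root/$prefix"
    mv "$2" "$root/$prefix/$name"
}
```

Lookups use the same rule, so a file's location is always computable from its name alone.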
>> (Btw I really appreciate everyones efforts on this and I *do* realize that its >> a weekend) > > Is this the one that looks like you were hitting the maximum-of-4-at-once > limit on file transfers? Yes. I don't have the data at hand, but I thought that I had achieved better performance in early runs (about 4 weeks prior). One reason I'm suspicious that the throttle itself may not be the problem is that in older tests I had the throttles opened much wider, and this was causing transfer and data management failures. So I narrowed them back to the default values, and the workflow went very fast (seeming to have no data transfer bottleneck). I need to gather more data to know what's really happening. One additional unexplained item is that in the run you analyzed with a 4-wide transfer throttle, I was still getting a lot of I/O errors in the log, which I don't think have been explained yet. > >> Ben: is the log_processing code changing as we speak, and is it sensible for >> me and others to try to run your latest versions? Or just send you logfiles? > > It always changes. But you can svn update whenever you want. > > If you put a log file (and associated kickstart records) in the usual > repository then its easy enough for me to run the code on it. > >> Question: do people feel that a move to local disk could be done >> *entirely* in wrapper.sh, or is it known that other parts of swift would >> have to change as well? > > I think that there won't be a trivial solution to this problem. At > present, the model is quite strongly tied to a site-shared filesystem (as > VDS was before). > > In the past, we've discussed informally different ways of moving data > round between submit-side storage locations, site-wide storage locations > and worker-local storage. I think this is another use case for that; but I > think the general conclusion that that's a non-trivial thing to do is > still valid. I agree that it sounds non-trivial.
But it sounded from Mihael on Friday that he was about to start work on it. Thats what I'd like to discuss on the list. Also, a point to consider that has not been discussed much in this thread: it seems from anecdotal evidence that having too many entries in any single dir, *especially* on GPFS, causes very bad performance. Addressing this may be much easier to do than moving shared files and dirs to local disk. For GPFS it seems like >100 entries per dir performs badly. > >> For the moment, until I hear comments on the questions above, I will >> work on Angle, see if I get the same problems (I should see the same) >> and try to start a simple text doc on the data management mechanism that >> will at least help *me* better understand whats going on. > > For angle, a first thing to try is increasing the transfer throttle. > > If there's lock contention there, it may be that will decrease, rather > than increase the performance. Agreed. Will test a few cases and report back; will probably take me till tomorrow to get some results but I'll send reports as I progress. > From hategan at mcs.anl.gov Sat Oct 27 14:17:18 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 27 Oct 2007 14:17:18 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: References: <472386D6.2020707@mcs.anl.gov> <1193511485.27417.52.camel@blabla.mcs.anl.gov> Message-ID: <1193512638.30169.8.camel@blabla.mcs.anl.gov> On Sat, 2007-10-27 at 19:08 +0000, Ben Clifford wrote: > > On Sat, 27 Oct 2007, Mihael Hategan wrote: > > > Quickly before I leave the house: Hmm. How naive. > > Perhaps we could try copying to local FS instead of linking from shared > > dir and hence running the jobs on the local FS. > > Maybe. I'd be suspicious that doesn't reduce access to the directory too > much. 
> > I think the directories where there are lots of files being read/written > by lots of hosts are: > > the top directory (one job directory per job) > the info directory > the kickstart directory > the file cache > > In the case where directories get too many files in them because of > directory size constraints, its common to split that directory into many > smaller directories (eg. how squid caching, or git object storage works). > eg, given a file fubar.txt store it in fu/fubar.txt, with 'fu' being some > short hash of the filename (with the hash here being 'extract the first > two characters). > > Pretty much I think Andrew wanted to do that for his data files anyway, > which would then reflect in the layout of the data cache directory > structure. > > For job directories, it may not be too hard to split the big directories > into smaller ones. There will still be write-lock conflicts, but this > might mean the contention for each directories write-lock is lower. Right. Some of these are easy to avoid and some are harder. The hash idea is brilliant. I think. > From benc at hawaga.org.uk Sat Oct 27 16:57:52 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Sat, 27 Oct 2007 21:57:52 +0000 (GMT) Subject: [Swift-devel] awf2 errors In-Reply-To: <47238E22.3030704@mcs.anl.gov> References: <472386D6.2020707@mcs.anl.gov> <47238E22.3030704@mcs.anl.gov> Message-ID: On Sat, 27 Oct 2007, Michael Wilde wrote: > One additional unexplained item is that in the run you analyzed with a 4-wide > transfer throttle, I was still getting a lot of I/O errors in the log, which I > dont thing have been explained yet. Can you past one? I don't immediately see them. I see lots of APPLICATION_EXCEPTIONS but with not much detail about the cause. 
-- From hategan at mcs.anl.gov Sat Oct 27 17:05:49 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 27 Oct 2007 17:05:49 -0500 Subject: [Swift-devel] awf2 errors In-Reply-To: References: <472386D6.2020707@mcs.anl.gov> <47238E22.3030704@mcs.anl.gov> Message-ID: <1193522749.31612.3.camel@blabla.mcs.anl.gov> On Sat, 2007-10-27 at 21:57 +0000, Ben Clifford wrote: > > On Sat, 27 Oct 2007, Michael Wilde wrote: > > > One additional unexplained item is that in the run you analyzed with a 4-wide > > transfer throttle, I was still getting a lot of I/O errors in the log, which I > > dont thing have been explained yet. > > Can you past one? I don't immediately see them. I see lots of > APPLICATION_EXCEPTIONS but with not much detail about the cause. Whenever a job fails, Swift will attempt to transfer the stdout and stderr of that job. There is no guarantee that those files are created by the job (i.e. they only get created when at least one character is written to them). Hence the transfer of these may fail. It is not an error at the Swift level. Again, it's a pattern of the following kind:

try {
    optionalOperationWhichDoesNotHaveToSucceed();
}
catch (Exception e) {
    log(e);
}

> From hategan at mcs.anl.gov Sat Oct 27 17:11:21 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 27 Oct 2007 17:11:21 -0500 Subject: [Swift-devel] while/for Message-ID: <1193523081.31881.2.camel@blabla.mcs.anl.gov> A somewhat interesting paper on while and sequential for in functional languages. If there's efficient recursion, these should be fairly straightforward to implement in dataflow languages.
http://www4.in.tum.de/~obua/looping/ Mihael From wilde at mcs.anl.gov Sat Oct 27 17:52:08 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sat, 27 Oct 2007 17:52:08 -0500 Subject: [Swift-devel] awf2 errors In-Reply-To: <1193522749.31612.3.camel@blabla.mcs.anl.gov> References: <472386D6.2020707@mcs.anl.gov> <47238E22.3030704@mcs.anl.gov> <1193522749.31612.3.camel@blabla.mcs.anl.gov> Message-ID: <4723C118.5020104@mcs.anl.gov> On 10/27/07 5:05 PM, Mihael Hategan wrote: > On Sat, 2007-10-27 at 21:57 +0000, Ben Clifford wrote: >> On Sat, 27 Oct 2007, Michael Wilde wrote: >> >>> One additional unexplained item is that in the run you analyzed with a 4-wide >>> transfer throttle, I was still getting a lot of I/O errors in the log, which I >>> dont thing have been explained yet. >> Can you past one? I don't immediately see them. I see lots of >> APPLICATION_EXCEPTIONS but with not much detail about the cause. > > Whenever a job fails, Swift will attempt to transfer the stdout and > stderr of that job. There is no guarantee that those files are created > by the job (i.e. they only get created when at least one character is > written to them). Hence the transfer of these may fail. It is not an > error at the Swift level. Again, it's a pattern of the following kind: > That's what it was. There were 12 APPLICATION_EXCEPTION errors out of 1000 jobs, and 48 failures to get stderr. I didn't correlate these because I had found the I/O errors via grep, and there were 400+ lines with error/failure strings. And I didn't catch that they all pertained to stderr. I'm guessing (but need to check) that those 48 represent retries of some sort on the 12 failed jobs. So you're right, Ben, the slow data return rate is more likely due to throttling or contention. I think we should try to indicate top-level Swift-detected errors with a distinct code to separate them from all the lower-level error details that each incident produces.
I realize in practice that this may not be easy, as the details may get logged before the error propagates up to the "top" level. I wonder what the Globus developers have concluded about error logging strategy. (Can discuss this later on relevant bugzilla bugs - don't want to sidetrack discussion now). > try { > optionalOperationWhichDoesNotHaveToSucceed(); > } > catch (Exception e) { > log(e); > } > > > From benc at hawaga.org.uk Sat Oct 27 17:58:25 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Sat, 27 Oct 2007 22:58:25 +0000 (GMT) Subject: [Swift-devel] awf2 errors In-Reply-To: <4723C118.5020104@mcs.anl.gov> References: <472386D6.2020707@mcs.anl.gov> <47238E22.3030704@mcs.anl.gov> <1193522749.31612.3.camel@blabla.mcs.anl.gov> <4723C118.5020104@mcs.anl.gov> Message-ID: On Sat, 27 Oct 2007, Michael Wilde wrote: > I think we should try to indicate top-level Swift-detected errors with a > distinct code to separate them from all the lower-level error details that There are very few defined states through which 'execute' entities can go; likewise for the various other logged event streams. APPLICATION_EXCEPTION is one such for execute2. That doesn't propagate to its containing 'execute' - instead it retries 3 times and then goes into its own failure state (although that didn't happen in awf2.log - all executes completed successfully). > each incident produces. I realize in practice that this may not be easy, as > the details may get logged before the error propagates up tp the "top" level. > I wonder what the Globus developers have concluded about error logging > strategy. (Can discuss this later on relevant bugzilla bugs - dont want to > sidetrack discussion now). There's no particularly coherent Globus-wide error logging strategy. Attempts to discuss it usually turn into the usual religiousfest.
-- From hategan at mcs.anl.gov Sat Oct 27 23:05:05 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 27 Oct 2007 23:05:05 -0500 Subject: [Swift-devel] awf2 errors In-Reply-To: <4723C118.5020104@mcs.anl.gov> References: <472386D6.2020707@mcs.anl.gov> <47238E22.3030704@mcs.anl.gov> <1193522749.31612.3.camel@blabla.mcs.anl.gov> <4723C118.5020104@mcs.anl.gov> Message-ID: <1193544305.10664.25.camel@blabla.mcs.anl.gov> On Sat, 2007-10-27 at 17:52 -0500, Michael Wilde wrote: > > On 10/27/07 5:05 PM, Mihael Hategan wrote: > > On Sat, 2007-10-27 at 21:57 +0000, Ben Clifford wrote: > >> On Sat, 27 Oct 2007, Michael Wilde wrote: > >> > >>> One additional unexplained item is that in the run you analyzed with a 4-wide > >>> transfer throttle, I was still getting a lot of I/O errors in the log, which I > >>> dont thing have been explained yet. > >> Can you past one? I don't immediately see them. I see lots of > >> APPLICATION_EXCEPTIONS but with not much detail about the cause. > > > > Whenever a job fails, Swift will attempt to transfer the stdout and > > stderr of that job. There is no guarantee that those files are created > > by the job (i.e. they only get created when at least one character is > > written to them). Hence the transfer of these may fail. It is not an > > error at the Swift level. Again, it's a pattern of the following kind: > > > > Thats what it was. There were 12 APPLICATION_EXCEPTION errors out of > 1000 jobs, and 48 failures to get stderr. I didnt correlate these > because I had found the I/O errors via grep, and there were 400+ lines > with error/failure strings. And I didnt catch that they all pertained to > stderr. > > I'm guessing (but need to check) that those 48 represent retries of some > sort on the 12 failed jobs. > > So you're right, Ben, the slow data return rate is more likely due to > throttling or contention. 
> > I think we should try to indicate top-level Swift-detected errors with a > distinct code to separate them from all the lower-level error details > that each incident produces. It's an interesting point, but I don't quite see how that would be done in theory :). Superficially, there would need to be a parameter which goes all the way down from maybe() to whatever piece of software implements what's under it for a specific call and say "This call is special, so log it with some stuff before it". Proponents of information hiding would shout "No! It gives you access to implementation details." > I realize in practice that this may not be > easy, as the details may get logged before the error propagates up tp > the "top" level. I wonder what the Globus developers have concluded > about error logging strategy. (Can discuss this later on relevant > bugzilla bugs - dont want to sidetrack discussion now). > > > > try { > > optionalOperationWhichDoesNotHaveToSucceed(); > > } > > catch (Exception e) { > > log(e); > > } > > > > > > > From benc at hawaga.org.uk Sun Oct 28 05:20:05 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Sun, 28 Oct 2007 10:20:05 +0000 (GMT) Subject: [Swift-devel] excessive rate throttling for apparently temporally-restricted failures Message-ID: I've been running the same workflow a few times with a high level of clustering. I've noticed that when there are no errors, the code will have up to perhaps 40 jobs running on a site; but if there is a spike of errors restricted in time to a minute or so, but damaging quite a large number of jobs, then the scheduler score for that site gets hit so hard that it never builds up to a reasonable value again and a very low rate is used for the rest of the workflow. Alternatively, aborting the workflow when this happens resets the scheduler score back to 0 for a fresh start and is likely to get a bunch of work done. 
It seems undesirable that 'kill workflow and restart to clear out the scheduler scores' is the correct action to take. I'm not particularly in a position to do rate limit / scheduler hacking at the moment, but I did turn on scheduler score logging in the default log config. If you're looking at job submission rates in future, this may be useful information to have. -- From benc at hawaga.org.uk Sun Oct 28 07:45:28 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Sun, 28 Oct 2007 12:45:28 +0000 (GMT) Subject: [Swift-devel] what's new web events Message-ID: I updated the front web page so that the what's new section lists events at SC07, rather than being a link to the quickstart guide. The events listed are the analytics tutorial, the analytics challenge that Mike has been working on with Bob Grossman, and the SC07 booth presentation. If there's anything else going on, I can add it or you can too, in www/inc/home_sidebar.php in the SVN. -- From iraicu at cs.uchicago.edu Sun Oct 28 10:02:29 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Sun, 28 Oct 2007 10:02:29 -0500 Subject: [Swift-devel] excessive rate throttling for apparently temporally-restricted failures In-Reply-To: References: Message-ID: <4724A485.60906@cs.uchicago.edu> This was the same thing that was happening to the MolDyn workflow when we were hitting the "stale NFS handle" error, when possibly 1000s of jobs would fail within a minute (due to a single bad node), but then when jobs would get through again (10K+ more), the score remained low. I fixed this in Falkon by hiding some of the known errors from Swift, and re-trying the failed tasks, if they were due to the stale NFS handle error. I think Mihael outlined in an email a while back how to disable the task submission throttling due to a bad score, assuming that you have a single site to submit to anyways.
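The score collapse Ben describes can be seen in a toy model. The 1.01/0.8 multipliers below are invented for illustration and are not Swift's actual scheduler formula; they only show the shape of the feedback: a short burst of failures crushes a multiplicative score that later successes rebuild very slowly.

```shell
# Toy model of a multiplicative site score (assumed constants, not Swift's).
score=1
update() {  # update ok|fail
    score=$(awk -v s="$score" -v r="$1" \
        'BEGIN { printf "%.10g", (r == "ok") ? s * 1.01 : s * 0.8 }')
}
for i in $(seq 60); do update fail; done    # one bad minute: 60 failures
after_burst=$score
for i in $(seq 200); do update ok; done     # 200 later successes
echo "score after burst: $after_burst, after 200 successes: $score"
```

Even after 200 consecutive successes the score remains far below its starting value, which matches the observation that only restarting the workflow restores a reasonable submission rate.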
A while back I had argued that it might be worthwhile to augment the site score with a weighted score, where a job completion or failure is also multiplied by the time it took for the job to complete or fail. Also, we could change the ratio from -5:1 (failed : successful) to something more balanced (-1 : 1) where we are not favoring successful or failed jobs. Ioan Ben Clifford wrote: > I've been running the same workflow a few times with a high level of > clustering. I've noticed that when there are no errors, the code will have > up to perhaps 40 jobs running on a site; but if there is a spike of errors > restricted in time to a minute or so, but damaging quite a large number of > jobs, then the scheduler score for that site gets hit so hard that it > never builds up to a reasonable value again and a very low rate is used > for the rest of the workflow. > > Alternatively, aborting the workflow when this happens resets the > scheduler score back to 0 for a fresh start and is likely to get a bunch > of work done. It seems undesirable that 'kill workflow and restart to > clear out the scheduler scores' is the correct action to take. > > I'm not particularly in a position to do rate limit / scheduler hacking at > the moment, but I did turn on scheduler score logging in the default log > config. > > If you're look at job submission rates in future, this may be useful > information to have. > > -- ============================================ Ioan Raicu Ph.D. Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E.
58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ From benc at hawaga.org.uk Sun Oct 28 10:08:30 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Sun, 28 Oct 2007 15:08:30 +0000 (GMT) Subject: [Swift-devel] excessive rate throttling for apparently temporally-restricted failures In-Reply-To: <4724A485.60906@cs.uchicago.edu> References: <4724A485.60906@cs.uchicago.edu> Message-ID: On Sun, 28 Oct 2007, Ioan Raicu wrote: > they were due to the stale NFS handle error. I think Mihael outlined in an > email a while back how to disable the task submission throttling due to a bad > score, assuming that you have a single site to submit to anyways. I know how to disable it. I don't particularly want it running rate free. Whats happening here is that the feedback loop feeding back too much / too fast for the situation I experience. There's plenty of fun to be had experimenting there; and I suspect there will be no One True Rate Controller. -- From iraicu at cs.uchicago.edu Sun Oct 28 10:25:11 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Sun, 28 Oct 2007 10:25:11 -0500 Subject: [Swift-devel] excessive rate throttling for apparently temporally-restricted failures In-Reply-To: References: <4724A485.60906@cs.uchicago.edu> Message-ID: <4724A9D7.2010207@cs.uchicago.edu> Assuming you have a single site to submit to, then I don't see why you don't want to disable the site scoring altogether? Of course you still want throttling, but that is more on the level of X outstanding jobs at any given time (and possibly Y jobs/sec submit rate), so you don't overrun the LRM, but you would not want to lower X to some low value just because some jobs are failing. 
Again, once you go to multi-site runs, you need the site scoring to decide among the different sites, but with a single site, I see no drawbacks to disabling the site scoring mechanism. Ioan Ben Clifford wrote: > On Sun, 28 Oct 2007, Ioan Raicu wrote: > > >> they were due to the stale NFS handle error. I think Mihael outlined in an >> email a while back how to disable the task submission throttling due to a bad >> score, assuming that you have a single site to submit to anyways. >> > > I know how to disable it. I don't particularly want it running rate free. > > Whats happening here is that the feedback loop feeding back too much / too > fast for the situation I experience. > > There's plenty of fun to be had experimenting there; and I suspect there > will be no One True Rate Controller. > > -- ============================================ Ioan Raicu Ph.D. Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Sun Oct 28 11:15:46 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 28 Oct 2007 11:15:46 -0500 Subject: [Swift-devel] excessive rate throttling for apparently temporally-restricted failures In-Reply-To: <4724A9D7.2010207@cs.uchicago.edu> References: <4724A485.60906@cs.uchicago.edu> <4724A9D7.2010207@cs.uchicago.edu> Message-ID: <1193588147.15017.2.camel@blabla.mcs.anl.gov> On Sun, 2007-10-28 at 10:25 -0500, Ioan Raicu wrote: > Assuming you have a single site to submit to, then I don't see why you > don't want to disable the site scoring altogether? 
Because having too many jobs on that one site may still cause problems. That said, the algorithm currently there needs some work. > Of course you still want throttling, but that is more on the level > of X outstanding jobs at any given time (and possibly Y jobs/sec > submit rate), so you don't overrun the LRM, but you would not want to > lower X to some low value just because some jobs are failing. Again, > once you go to multi-site runs, you need the site scoring to decide > among the different sites, but with a single site, I see no drawbacks > to disabling the site scoring mechanism. > > Ioan > > Ben Clifford wrote: > > On Sun, 28 Oct 2007, Ioan Raicu wrote: > > > > > > > they were due to the stale NFS handle error. I think Mihael outlined in an > > > email a while back how to disable the task submission throttling due to a bad > > > score, assuming that you have a single site to submit to anyways. > > > > > > > I know how to disable it. I don't particularly want it running rate free. > > > > Whats happening here is that the feedback loop feeding back too much / too > > fast for the situation I experience. > > > > There's plenty of fun to be had experimenting there; and I suspect there > > will be no One True Rate Controller. > > > > > > -- > ============================================ > Ioan Raicu > Ph.D. Student > ============================================ > Distributed Systems Laboratory > Computer Science Department > University of Chicago > 1100 E. 
58th Street, Ryerson Hall > Chicago, IL 60637 > ============================================ > Email: iraicu at cs.uchicago.edu > Web: http://www.cs.uchicago.edu/~iraicu > http://dsl.cs.uchicago.edu/ > ============================================ > ============================================ > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From hategan at mcs.anl.gov Sun Oct 28 11:17:15 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 28 Oct 2007 11:17:15 -0500 Subject: [Swift-devel] what's new web events In-Reply-To: References: Message-ID: <1193588235.15017.4.camel@blabla.mcs.anl.gov> "You can find Swift will in several places at SC07 in Reno, Nevada" Hmm? On Sun, 2007-10-28 at 12:45 +0000, Ben Clifford wrote: > I updated the front web page so that the what's new section lists > events at SC07, rather than being a link to the quickstart guide. > > The events listed are the analytics tutorial, the analytics challenge that > Mike has been working on with Bob Grossman, and the SC07 booth > presentation. > > If there's anything else going on, I can add it or you can too, in > www/inc/home_sidebar.php in the SVN. > From benc at hawaga.org.uk Sun Oct 28 11:17:59 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Sun, 28 Oct 2007 16:17:59 +0000 (GMT) Subject: [Swift-devel] what's new web events In-Reply-To: <1193588235.15017.4.camel@blabla.mcs.anl.gov> References: <1193588235.15017.4.camel@blabla.mcs.anl.gov> Message-ID: On Sun, 28 Oct 2007, Mihael Hategan wrote: > "You can find Swift will in several places at SC07 in Reno, Nevada" > > Hmm? heh, my usual missing verb problem. > > On Sun, 2007-10-28 at 12:45 +0000, Ben Clifford wrote: > > I updated the front web page so that the what's new section lists > > events at SC07, rather than being a link to the quickstart guide. 
> > > > The events listed are the analytics tutorial, the analytics challenge that > > Mike has been working on with Bob Grossman, and the SC07 booth > > presentation. > > > > If there's anything else going on, I can add it or you can too, in > > www/inc/home_sidebar.php in the SVN. > > > > From iraicu at cs.uchicago.edu Sun Oct 28 11:23:41 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Sun, 28 Oct 2007 11:23:41 -0500 Subject: [Swift-devel] excessive rate throttling for apparently temporally-restricted failures In-Reply-To: <1193588147.15017.2.camel@blabla.mcs.anl.gov> References: <4724A485.60906@cs.uchicago.edu> <4724A9D7.2010207@cs.uchicago.edu> <1193588147.15017.2.camel@blabla.mcs.anl.gov> Message-ID: <4724B78D.4010907@cs.uchicago.edu> I mentioned 2 throttling mechanisms, one is to have X outstanding jobs at any given time (limits jobs in the queue), and Y jobs/sec submit rate (limits the rate of submission). I believe both of these throttling mechanisms could exist without computing site scores, assuming the user knows what to set X and Y to. Ioan Mihael Hategan wrote: > On Sun, 2007-10-28 at 10:25 -0500, Ioan Raicu wrote: > >> Assuming you have a single site to submit to, then I don't see why you >> don't want to disable the site scoring altogether? >> > > Because having too many jobs on that one site may still cause problems. > > That said, the algorithm currently there needs some work. > > >> Of course you still want throttling, but that is more on the level >> of X outstanding jobs at any given time (and possibly Y jobs/sec >> submit rate), so you don't overrun the LRM, but you would not want to >> lower X to some low value just because some jobs are failing. Again, >> once you go to multi-site runs, you need the site scoring to decide >> among the different sites, but with a single site, I see no drawbacks >> to disabling the site scoring mechanism. 
>> >> Ioan >> >> Ben Clifford wrote: >> >>> On Sun, 28 Oct 2007, Ioan Raicu wrote: >>> >>> >>> >>>> they were due to the stale NFS handle error. I think Mihael outlined in an >>>> email a while back how to disable the task submission throttling due to a bad >>>> score, assuming that you have a single site to submit to anyways. >>>> >>>> >>> I know how to disable it. I don't particularly want it running rate free. >>> >>> Whats happening here is that the feedback loop feeding back too much / too >>> fast for the situation I experience. >>> >>> There's plenty of fun to be had experimenting there; and I suspect there >>> will be no One True Rate Controller. >>> >>> >>> >> -- >> ============================================ >> Ioan Raicu >> Ph.D. Student >> ============================================ >> Distributed Systems Laboratory >> Computer Science Department >> University of Chicago >> 1100 E. 58th Street, Ryerson Hall >> Chicago, IL 60637 >> ============================================ >> Email: iraicu at cs.uchicago.edu >> Web: http://www.cs.uchicago.edu/~iraicu >> http://dsl.cs.uchicago.edu/ >> ============================================ >> ============================================ >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > > > -- ============================================ Ioan Raicu Ph.D. Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From iraicu at cs.uchicago.edu Sun Oct 28 11:25:28 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Sun, 28 Oct 2007 11:25:28 -0500 Subject: [Swift-devel] what's new web events In-Reply-To: References: <1193588235.15017.4.camel@blabla.mcs.anl.gov> Message-ID: <4724B7F8.5060108@cs.uchicago.edu> The Falkon talk will have some Swift related slides as well, maybe you want to link to that as well (http://sc07.supercomputing.org/schedule/event_detail.php?evid=11098). Ioan Ben Clifford wrote: > On Sun, 28 Oct 2007, Mihael Hategan wrote: > > >> "You can find Swift will in several places at SC07 in Reno, Nevada" >> >> Hmm? >> > > heh, my usual missing verb problem. > > >> On Sun, 2007-10-28 at 12:45 +0000, Ben Clifford wrote: >> >>> I updated the front web page so that the what's new section lists >>> events at SC07, rather than being a link to the quickstart guide. >>> >>> The events listed are the analytics tutorial, the analytics challenge that >>> Mike has been working on with Bob Grossman, and the SC07 booth >>> presentation. >>> >>> If there's anything else going on, I can add it or you can too, in >>> www/inc/home_sidebar.php in the SVN. >>> >>> >> > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -- ============================================ Ioan Raicu Ph.D. Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From wilde at mcs.anl.gov Sun Oct 28 12:11:17 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sun, 28 Oct 2007 12:11:17 -0500 Subject: [Swift-devel] what's new web events In-Reply-To: <4724B7F8.5060108@cs.uchicago.edu> References: <1193588235.15017.4.camel@blabla.mcs.anl.gov> <4724B7F8.5060108@cs.uchicago.edu> Message-ID: <4724C2B5.6050303@mcs.anl.gov> yes, would be good to add. On 10/28/07 11:25 AM, Ioan Raicu wrote: > The Falkon talk will have some Swift related slides as well, maybe you > want to link to that as well > (http://sc07.supercomputing.org/schedule/event_detail.php?evid=11098). > > Ioan > > Ben Clifford wrote: >> On Sun, 28 Oct 2007, Mihael Hategan wrote: >> >> >>> "You can find Swift will in several places at SC07 in Reno, Nevada" >>> >>> Hmm? >>> >> >> heh, my usual missing verb problem. >> >> >>> On Sun, 2007-10-28 at 12:45 +0000, Ben Clifford wrote: >>> >>>> I updated the front web page so that the what's new section lists >>>> events at SC07, rather than being a link to the quickstart guide. >>>> >>>> The events listed are the analytics tutorial, the analytics >>>> challenge that Mike has been working on with Bob Grossman, and the >>>> SC07 booth presentation. >>>> >>>> If there's anything else going on, I can add it or you can too, in >>>> www/inc/home_sidebar.php in the SVN. 
>>>> >>>> >>> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> >> > > > ------------------------------------------------------------------------ > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From hategan at mcs.anl.gov Sun Oct 28 14:51:31 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 28 Oct 2007 14:51:31 -0500 Subject: [Swift-devel] excessive rate throttling for apparently temporally-restricted failures In-Reply-To: <4724B78D.4010907@cs.uchicago.edu> References: <4724A485.60906@cs.uchicago.edu> <4724A9D7.2010207@cs.uchicago.edu> <1193588147.15017.2.camel@blabla.mcs.anl.gov> <4724B78D.4010907@cs.uchicago.edu> Message-ID: <1193601091.22794.3.camel@blabla.mcs.anl.gov> On Sun, 2007-10-28 at 11:23 -0500, Ioan Raicu wrote: > I mentioned 2 throttling mechanisms, one is to have X outstanding jobs > at any given time (limits jobs in the queue), and Y jobs/sec > submit rate (limits the rate of submission). I believe both of these > throttling mechanisms could exist without computing site scores, > assuming the user knows what to set X and Y to. They do exist, but they don't deal with asymmetries between sites. Nor do they deal with changing situations. > > Ioan > > Mihael Hategan wrote: > > On Sun, 2007-10-28 at 10:25 -0500, Ioan Raicu wrote: > > > > > Assuming you have a single site to submit to, then I don't see why you > > > don't want to disable the site scoring altogether? > > > > > > > Because having too many jobs on that one site may still cause problems. > > > > That said, the algorithm currently there needs some work. 
> > > > > > > Of course you still want throttling, but that is more on the level > > > of X outstanding jobs at any given time (and possibly Y jobs/sec > > > submit rate), so you don't overrun the LRM, but you would not want to > > > lower X to some low value just because some jobs are failing. Again, > > > once you go to multi-site runs, you need the site scoring to decide > > > among the different sites, but with a single site, I see no drawbacks > > > to disabling the site scoring mechanism. > > > > > > Ioan > > > > > > Ben Clifford wrote: > > > > > > > On Sun, 28 Oct 2007, Ioan Raicu wrote: > > > > > > > > > > > > > > > > > they were due to the stale NFS handle error. I think Mihael outlined in an > > > > > email a while back how to disable the task submission throttling due to a bad > > > > > score, assuming that you have a single site to submit to anyways. > > > > > > > > > > > > > > I know how to disable it. I don't particularly want it running rate free. > > > > > > > > Whats happening here is that the feedback loop feeding back too much / too > > > > fast for the situation I experience. > > > > > > > > There's plenty of fun to be had experimenting there; and I suspect there > > > > will be no One True Rate Controller. > > > > > > > > > > > > > > > -- > > > ============================================ > > > Ioan Raicu > > > Ph.D. Student > > > ============================================ > > > Distributed Systems Laboratory > > > Computer Science Department > > > University of Chicago > > > 1100 E. 
58th Street, Ryerson Hall > > > Chicago, IL 60637 > > > ============================================ > > > Email: iraicu at cs.uchicago.edu > > > Web: http://www.cs.uchicago.edu/~iraicu > > > http://dsl.cs.uchicago.edu/ > > > ============================================ > > > ============================================ > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > -- > ============================================ > Ioan Raicu > Ph.D. Student > ============================================ > Distributed Systems Laboratory > Computer Science Department > University of Chicago > 1100 E. 58th Street, Ryerson Hall > Chicago, IL 60637 > ============================================ > Email: iraicu at cs.uchicago.edu > Web: http://www.cs.uchicago.edu/~iraicu > http://dsl.cs.uchicago.edu/ > ============================================ > ============================================ From iraicu at cs.uchicago.edu Sun Oct 28 15:05:36 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Sun, 28 Oct 2007 15:05:36 -0500 Subject: [Swift-devel] excessive rate throttling for apparently temporally-restricted failures In-Reply-To: <1193601091.22794.3.camel@blabla.mcs.anl.gov> References: <4724A485.60906@cs.uchicago.edu> <4724A9D7.2010207@cs.uchicago.edu> <1193588147.15017.2.camel@blabla.mcs.anl.gov> <4724B78D.4010907@cs.uchicago.edu> <1193601091.22794.3.camel@blabla.mcs.anl.gov> Message-ID: <4724EB90.1060507@cs.uchicago.edu> But my argument was, and still is, if there is only one site to submit to, changing situations are almost irrelevant, as there are no options anyhow. Give me one example, where you have only 1 site, set X and Y properly, yet you need site scores as an additional throttling mechanism! 
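The two static throttles Ioan keeps referring to (X, a cap on outstanding jobs, and Y, a cap on submissions per second) could be sketched roughly as below. This is only an illustration of the idea; the class and parameter names are invented here and are not actual Swift settings.

```python
import collections
import time

class SubmitThrottle:
    """Sketch of two static throttles: at most max_outstanding jobs in
    flight (the 'X' above) and at most max_rate submissions in any
    one-second window (the 'Y' above). Names are hypothetical."""

    def __init__(self, max_outstanding, max_rate):
        self.max_outstanding = max_outstanding  # X: cap on jobs queued in the LRM
        self.max_rate = max_rate                # Y: submissions allowed per second
        self.outstanding = 0
        self.recent = collections.deque()       # timestamps of recent submissions

    def may_submit(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop submission timestamps older than the one-second window.
        while self.recent and now - self.recent[0] >= 1.0:
            self.recent.popleft()
        return (self.outstanding < self.max_outstanding
                and len(self.recent) < self.max_rate)

    def submitted(self, now=None):
        now = time.monotonic() if now is None else now
        self.outstanding += 1
        self.recent.append(now)

    def completed(self):
        self.outstanding -= 1
```

Note that, as Mihael points out, neither limit reacts to a changing site: good values for X and Y have to be picked by the user up front.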
Mihael Hategan wrote: > On Sun, 2007-10-28 at 11:23 -0500, Ioan Raicu wrote: > >> I mentioned 2 throttling mechanisms, one is to have X outstanding jobs >> at any given time (limits jobs in the queue), and Y jobs/sec >> submit rate (limits the rate of submission). I believe both of these >> throttling mechanisms could exist without computing site scores, >> assuming the user knows what to set X and Y to. >> > > They do exist, but they don't deal with asymmetries between sites. Nor > do they deal with changing situations. > > >> Ioan >> >> Mihael Hategan wrote: >> >>> On Sun, 2007-10-28 at 10:25 -0500, Ioan Raicu wrote: >>> >>> >>>> Assuming you have a single site to submit to, then I don't see why you >>>> don't want to disable the site scoring altogether? >>>> >>>> >>> Because having too many jobs on that one site may still cause problems. >>> >>> That said, the algorithm currently there needs some work. >>> >>> >>> >>>> Of course you still want throttling, but that is more on the level >>>> of X outstanding jobs at any given time (and possibly Y jobs/sec >>>> submit rate), so you don't overrun the LRM, but you would not want to >>>> lower X to some low value just because some jobs are failing. Again, >>>> once you go to multi-site runs, you need the site scoring to decide >>>> among the different sites, but with a single site, I see no drawbacks >>>> to disabling the site scoring mechanism. >>>> >>>> Ioan >>>> >>>> Ben Clifford wrote: >>>> >>>> >>>>> On Sun, 28 Oct 2007, Ioan Raicu wrote: >>>>> >>>>> >>>>> >>>>> >>>>>> they were due to the stale NFS handle error. I think Mihael outlined in an >>>>>> email a while back how to disable the task submission throttling due to a bad >>>>>> score, assuming that you have a single site to submit to anyways. >>>>>> >>>>>> >>>>>> >>>>> I know how to disable it. I don't particularly want it running rate free. 
>>>>> >>>>> Whats happening here is that the feedback loop feeding back too much / too >>>>> fast for the situation I experience. >>>>> >>>>> There's plenty of fun to be had experimenting there; and I suspect there >>>>> will be no One True Rate Controller. >>>>> >>>>> >>>>> >>>>> >>>> -- >>>> ============================================ >>>> Ioan Raicu >>>> Ph.D. Student >>>> ============================================ >>>> Distributed Systems Laboratory >>>> Computer Science Department >>>> University of Chicago >>>> 1100 E. 58th Street, Ryerson Hall >>>> Chicago, IL 60637 >>>> ============================================ >>>> Email: iraicu at cs.uchicago.edu >>>> Web: http://www.cs.uchicago.edu/~iraicu >>>> http://dsl.cs.uchicago.edu/ >>>> ============================================ >>>> ============================================ >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>> >>>> >>> >>> >> -- >> ============================================ >> Ioan Raicu >> Ph.D. Student >> ============================================ >> Distributed Systems Laboratory >> Computer Science Department >> University of Chicago >> 1100 E. 58th Street, Ryerson Hall >> Chicago, IL 60637 >> ============================================ >> Email: iraicu at cs.uchicago.edu >> Web: http://www.cs.uchicago.edu/~iraicu >> http://dsl.cs.uchicago.edu/ >> ============================================ >> ============================================ >> > > > -- ============================================ Ioan Raicu Ph.D. Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 
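Ioan's earlier suggestion in this thread (weight each completion or failure by the job's duration, and use a balanced +1 : -1 ratio instead of +1 : -5) would make a short burst of quick failures, like the one Ben observed, dent the score far less than sustained failures of long-running jobs. A rough sketch of that idea, not the actual Swift scheduler code:

```python
class SiteScore:
    """Sketch of a time-weighted site score: successes and failures
    contribute symmetrically (+1 : -1 rather than the +1 : -5 ratio),
    each contribution scaled by job duration in seconds. Illustrative
    only; this is not how the Swift scheduler computes its score."""

    def __init__(self):
        self.score = 0.0

    def job_finished(self, succeeded, duration):
        # A ten-minute success outweighs many one-second failures.
        delta = (1.0 if succeeded else -1.0) * duration
        self.score += delta
        return self.score
```

Under this weighting, a minute of transient errors moves the score roughly in proportion to the compute time actually lost, rather than by a fixed penalty per job.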
58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Sun Oct 28 15:58:06 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 28 Oct 2007 15:58:06 -0500 Subject: [Swift-devel] excessive rate throttling for apparently temporally-restricted failures In-Reply-To: <4724EB90.1060507@cs.uchicago.edu> References: <4724A485.60906@cs.uchicago.edu> <4724A9D7.2010207@cs.uchicago.edu> <1193588147.15017.2.camel@blabla.mcs.anl.gov> <4724B78D.4010907@cs.uchicago.edu> <1193601091.22794.3.camel@blabla.mcs.anl.gov> <4724EB90.1060507@cs.uchicago.edu> Message-ID: <1193605087.25186.3.camel@blabla.mcs.anl.gov> On Sun, 2007-10-28 at 15:05 -0500, Ioan Raicu wrote: > But my argument was, and still is, if there is only one site to submit > to, changing situations are almost irrelevant, as there are no options > anyhow. Give me one example, where you have only 1 site, set X and Y > properly, yet you need site scores as an additional throttling > mechanism! Of course it doesn't strictly apply in the one site case. However, the idea of a single simple algorithm that can both deal with multiple sites and can adjust things in the one site case sounds appealing. And since self adjusting processes tend to have a feedback loop in most cases, I'm led to believe that the possibility exists. > > Mihael Hategan wrote: > > On Sun, 2007-10-28 at 11:23 -0500, Ioan Raicu wrote: > > > > > I mentioned 2 throttling mechanisms, one is to have X outstanding jobs > > > at any given time (limits jobs in the queue), and Y jobs/sec > > > submit rate (limits the rate of submission). 
I believe both of these > > > throttling mechanisms could exist without computing site scores, > > > assuming the user knows what to set X and Y to. > > > > > > > They do exist, but they don't deal with asymmetries between sites. Nor > > do they deal with changing situations. > > > > > > > Ioan > > > > > > Mihael Hategan wrote: > > > > > > > On Sun, 2007-10-28 at 10:25 -0500, Ioan Raicu wrote: > > > > > > > > > > > > > Assuming you have a single site to submit to, then I don't see why you > > > > > don't want to disable the site scoring altogether? > > > > > > > > > > > > > > Because having too many jobs on that one site may still cause problems. > > > > > > > > That said, the algorithm currently there needs some work. > > > > > > > > > > > > > > > > > Of course you still want throttling, but that is more on the level > > > > > of X outstanding jobs at any given time (and possibly Y jobs/sec > > > > > submit rate), so you don't overrun the LRM, but you would not want to > > > > > lower X to some low value just because some jobs are failing. Again, > > > > > once you go to multi-site runs, you need the site scoring to decide > > > > > among the different sites, but with a single site, I see no drawbacks > > > > > to disabling the site scoring mechanism. > > > > > > > > > > Ioan > > > > > > > > > > Ben Clifford wrote: > > > > > > > > > > > > > > > > On Sun, 28 Oct 2007, Ioan Raicu wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > they were due to the stale NFS handle error. I think Mihael outlined in an > > > > > > > email a while back how to disable the task submission throttling due to a bad > > > > > > > score, assuming that you have a single site to submit to anyways. > > > > > > > > > > > > > > > > > > > > > > > > > > > I know how to disable it. I don't particularly want it running rate free. > > > > > > > > > > > > Whats happening here is that the feedback loop feeding back too much / too > > > > > > fast for the situation I experience. 
> > > > > > > > > > > > There's plenty of fun to be had experimenting there; and I suspect there > > > > > > will be no One True Rate Controller. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > ============================================ > > > > > Ioan Raicu > > > > > Ph.D. Student > > > > > ============================================ > > > > > Distributed Systems Laboratory > > > > > Computer Science Department > > > > > University of Chicago > > > > > 1100 E. 58th Street, Ryerson Hall > > > > > Chicago, IL 60637 > > > > > ============================================ > > > > > Email: iraicu at cs.uchicago.edu > > > > > Web: http://www.cs.uchicago.edu/~iraicu > > > > > http://dsl.cs.uchicago.edu/ > > > > > ============================================ > > > > > ============================================ > > > > > _______________________________________________ > > > > > Swift-devel mailing list > > > > > Swift-devel at ci.uchicago.edu > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > -- > > > ============================================ > > > Ioan Raicu > > > Ph.D. Student > > > ============================================ > > > Distributed Systems Laboratory > > > Computer Science Department > > > University of Chicago > > > 1100 E. 58th Street, Ryerson Hall > > > Chicago, IL 60637 > > > ============================================ > > > Email: iraicu at cs.uchicago.edu > > > Web: http://www.cs.uchicago.edu/~iraicu > > > http://dsl.cs.uchicago.edu/ > > > ============================================ > > > ============================================ > > > > > > > > > > > -- > ============================================ > Ioan Raicu > Ph.D. Student > ============================================ > Distributed Systems Laboratory > Computer Science Department > University of Chicago > 1100 E. 
58th Street, Ryerson Hall > Chicago, IL 60637 > ============================================ > Email: iraicu at cs.uchicago.edu > Web: http://www.cs.uchicago.edu/~iraicu > http://dsl.cs.uchicago.edu/ > ============================================ > ============================================ From hategan at mcs.anl.gov Sun Oct 28 16:42:02 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 28 Oct 2007 16:42:02 -0500 Subject: [Swift-devel] excessive rate throttling for apparently temporally-restricted failures In-Reply-To: <4724EB90.1060507@cs.uchicago.edu> References: <4724A485.60906@cs.uchicago.edu> <4724A9D7.2010207@cs.uchicago.edu> <1193588147.15017.2.camel@blabla.mcs.anl.gov> <4724B78D.4010907@cs.uchicago.edu> <1193601091.22794.3.camel@blabla.mcs.anl.gov> <4724EB90.1060507@cs.uchicago.edu> Message-ID: <1193607722.25186.17.camel@blabla.mcs.anl.gov> On Sun, 2007-10-28 at 15:05 -0500, Ioan Raicu wrote: > But my argument was, and still is, if there is only one site to submit > to, changing situations are almost irrelevant, Missed that. It is not irrelevant. The speed/capacity of a service is determined by: the jobs you submit, the jobs others submit, the specific type of hardware, and the load on the service node (and other things like network latency). The jobs others submit and the load on the service node vary with time. The bad thing about them is that it's hard to predict how they affect things. Furthermore, user-specified rates suffer fundamentally from the problem of the user having to understand how the whole thing works and picking good values. What I've observed is that this doesn't work very well. > as there are no options anyhow. Give me one example, where you have > only 1 site, set X and Y properly, yet you need site scores as an > additional throttling mechanism!
> > Mihael Hategan wrote: > > On Sun, 2007-10-28 at 11:23 -0500, Ioan Raicu wrote: > > > > > I mentioned 2 throttling mechanisms, one is to have X outstanding jobs > > > at any given time (limits jobs in the queue), and Y jobs/sec > > > submit rate (limits the rate of submission). I believe both of these > > > throttling mechanisms could exist without computing site scores, > > > assuming the user knows what to set X and Y to. > > > > > > > They do exist, but they don't deal with asymmetries between sites. Nor > > do they deal with changing situations. > > > > > > > Ioan > > > > > > Mihael Hategan wrote: > > > > > > > On Sun, 2007-10-28 at 10:25 -0500, Ioan Raicu wrote: > > > > > > > > > > > > > Assuming you have a single site to submit to, then I don't see why you > > > > > don't want to disable the site scoring altogether? > > > > > > > > > > > > > > Because having too many jobs on that one site may still cause problems. > > > > > > > > That said, the algorithm currently there needs some work. > > > > > > > > > > > > > > > > > Of course you still want throttling, but that is more on the level > > > > > of X outstanding jobs at any given time (and possibly Y jobs/sec > > > > > submit rate), so you don't overrun the LRM, but you would not want to > > > > > lower X to some low value just because some jobs are failing. Again, > > > > > once you go to multi-site runs, you need the site scoring to decide > > > > > among the different sites, but with a single site, I see no drawbacks > > > > > to disabling the site scoring mechanism. > > > > > > > > > > Ioan > > > > > > > > > > Ben Clifford wrote: > > > > > > > > > > > > > > > > On Sun, 28 Oct 2007, Ioan Raicu wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > they were due to the stale NFS handle error. I think Mihael outlined in an > > > > > > > email a while back how to disable the task submission throttling due to a bad > > > > > > > score, assuming that you have a single site to submit to anyways. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > I know how to disable it. I don't particularly want it running rate free. > > > > > > > > > > > > Whats happening here is that the feedback loop feeding back too much / too > > > > > > fast for the situation I experience. > > > > > > > > > > > > There's plenty of fun to be had experimenting there; and I suspect there > > > > > > will be no One True Rate Controller. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > ============================================ > > > > > Ioan Raicu > > > > > Ph.D. Student > > > > > ============================================ > > > > > Distributed Systems Laboratory > > > > > Computer Science Department > > > > > University of Chicago > > > > > 1100 E. 58th Street, Ryerson Hall > > > > > Chicago, IL 60637 > > > > > ============================================ > > > > > Email: iraicu at cs.uchicago.edu > > > > > Web: http://www.cs.uchicago.edu/~iraicu > > > > > http://dsl.cs.uchicago.edu/ > > > > > ============================================ > > > > > ============================================ > > > > > _______________________________________________ > > > > > Swift-devel mailing list > > > > > Swift-devel at ci.uchicago.edu > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > -- > > > ============================================ > > > Ioan Raicu > > > Ph.D. Student > > > ============================================ > > > Distributed Systems Laboratory > > > Computer Science Department > > > University of Chicago > > > 1100 E. 
58th Street, Ryerson Hall > > > Chicago, IL 60637 > > > ============================================ > > > Email: iraicu at cs.uchicago.edu > > > Web: http://www.cs.uchicago.edu/~iraicu > > > http://dsl.cs.uchicago.edu/ > > > ============================================ > > > ============================================ > > > > > > > > > > > -- > ============================================ > Ioan Raicu > Ph.D. Student > ============================================ > Distributed Systems Laboratory > Computer Science Department > University of Chicago > 1100 E. 58th Street, Ryerson Hall > Chicago, IL 60637 > ============================================ > Email: iraicu at cs.uchicago.edu > Web: http://www.cs.uchicago.edu/~iraicu > http://dsl.cs.uchicago.edu/ > ============================================ > ============================================ From benc at hawaga.org.uk Sun Oct 28 17:12:53 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Sun, 28 Oct 2007 22:12:53 +0000 (GMT) Subject: [Swift-devel] excessive rate throttling for apparently temporally-restricted failures In-Reply-To: <4724EB90.1060507@cs.uchicago.edu> References: <4724A485.60906@cs.uchicago.edu> <4724A9D7.2010207@cs.uchicago.edu> <1193588147.15017.2.camel@blabla.mcs.anl.gov> <4724B78D.4010907@cs.uchicago.edu> <1193601091.22794.3.camel@blabla.mcs.anl.gov> <4724EB90.1060507@cs.uchicago.edu> Message-ID: On Sun, 28 Oct 2007, Ioan Raicu wrote: > But my argument was, and still is, if there is only one site to submit to, > changing situations are almost irrelevant, as there are no options anyhow. > Give me one example, where you have only 1 site, set X and Y properly, yet you > need site scores as an additional throttling mechanism! Sites change in their behaviour over time. The TG-UC of right now is not the same as the TG-UC of yesterday nor the TG-UC of tomorrow, at least as far as how much load can be put on the ftp server or the job submission engine or whatever. 
That's why things like NWS are an interesting research problem still. -- From wilde at mcs.anl.gov Sun Oct 28 17:15:27 2007
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sun, 28 Oct 2007 17:15:27 -0500
Subject: [Swift-devel] Clustering and Temp Dirs with Swift
In-Reply-To: References: <472386D6.2020707@mcs.anl.gov> <1193511485.27417.52.camel@blabla.mcs.anl.gov>
Message-ID: <472509FF.5050608@mcs.anl.gov>

Regarding this problem of avoiding large directories:

One part of this is taking all the Swift by-product files that are generated on a per-job basis within a workflow (which Ben has started to list below) and naming them in a way that spreads them across a directory tree.

One step seems to be naming files in a way that makes this split easier.

What I'd like to suggest is that we set all the UUID patterns that we use in Swift from "no-touch-em" properties that we can experiment with. These could set both the pattern (e.g. nnnnnn or aaaaaa) and whether it's sequential vs. random, etc.

This makes me ask what naming strategy we use for jobs and kickstart records:

angle4-mtlivaji-kickstart.xml
angle4-ntlivaji-kickstart.xml
angle4-otlivaji-kickstart.xml
angle4-ptlivaji-kickstart.xml
angle4-qtlivaji-kickstart.xml

Why do these job names differ in the leftmost character of the UUID instead of the rightmost? I never paid attention to this until I started thinking about the dir hashing Ben suggests. I think that most hashes, unless the file names are random, need to be aware of which end of the name is varying fastest.
If these were numeric patterns, it would be easy to eg put 100 files per dir by taking say the leftmost 6 characters and making that a dirname within which the rightmost 2 chars would vary: tlivaj/angle4-tlivajim-kickstart.xml tlivaj/angle4-tlivajin-kickstart.xml tlivaj/angle4-tlivajio-kickstart.xml tlivaj/angle4-tlivajip-kickstart.xml tlivaj/angle4-tlivajiq-kickstart.xml but easier on my eyes would be: 000000/angle4-00000001-kickstart.xml 000000/angle4-00000002-kickstart.xml ... 000000/angle4-00000099-kickstart.xml ... 000020/angle4-00002076-kickstart.xml etc. This makes splitting based on powers of 10 (or 26 or 36) trivial. Other splits can be done with mod() functions. Can we start heading in this or some similar direction? We need to coordinate a plan for this, I suspect, to make Andrew's workflows perform acceptably. - Mike On 10/27/07 2:08 PM, Ben Clifford wrote: > > On Sat, 27 Oct 2007, Mihael Hategan wrote: > >> Quickly before I leave the house: >> Perhaps we could try copying to local FS instead of linking from shared >> dir and hence running the jobs on the local FS. > > Maybe. I'd be suspicious that doesn't reduce access to the directory too > much. > > I think the directories where there are lots of files being read/written > by lots of hosts are: > > the top directory (one job directory per job) > the info directory > the kickstart directory > the file cache > > In the case where directories get too many files in them because of > directory size constraints, its common to split that directory into many > smaller directories (eg. how squid caching, or git object storage works). > eg, given a file fubar.txt store it in fu/fubar.txt, with 'fu' being some > short hash of the filename (with the hash here being 'extract the first > two characters). > > Pretty much I think Andrew wanted to do that for his data files anyway, > which would then reflect in the layout of the data cache directory > structure. 
> > For job directories, it may not be too hard to split the big directories > into smaller ones. There will still be write-lock conflicts, but this > might mean the contention for each directories write-lock is lower. > From hategan at mcs.anl.gov Sun Oct 28 17:17:47 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 28 Oct 2007 17:17:47 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: <472509FF.5050608@mcs.anl.gov> References: <472386D6.2020707@mcs.anl.gov> <1193511485.27417.52.camel@blabla.mcs.anl.gov> <472509FF.5050608@mcs.anl.gov> Message-ID: <1193609867.28024.0.camel@blabla.mcs.anl.gov> On Sun, 2007-10-28 at 17:15 -0500, Michael Wilde wrote: > Regarding this problem of avoiding large directories: > > One part of this taking all the swift by-product files that are > generated on a per-job basis within a workflow (which Ben has started to > list below) and naming them in a way that spreads them across a > directory tree. > > One step seems to be naming files in a way that makes this split easier. > > What I'd like to suggest is that we set all the UUID patterns that we > use in swift from "no-touch-em" properties that we can experiment with. > This can set both the pattern eg nnnnnn or aaaaaa as well as whether its > sequential vs random, etc. > > This makes me ask what naming strategy we use for jobs and kickstart > records: > angle4-mtlivaji-kickstart.xml > angle4-ntlivaji-kickstart.xml > angle4-otlivaji-kickstart.xml > angle4-ptlivaji-kickstart.xml > angle4-qtlivaji-kickstart.xml > > Why are these jobnames differing in the leftmost character of the uuid > instead of the rightmost? I never paid attention to this till I started > thinking about the dir hashing Ben suggests. I think that most hashes, > unless the file names are random, need to be aware of which end of the > name is varying fastest. Hmm. Yes. Never occurred to me. It can be changed. 
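[The two-character prefix split quoted above — storing fubar.txt as fu/fubar.txt — can be sketched as follows. This is an illustrative sketch only; the class and method names are hypothetical and not part of Swift's actual mapper code:

```java
// Illustrative sketch: shard files into subdirectories named after a
// short hash of the filename (here, simply the first two characters).
public class PrefixHash {
    /** Returns the subdirectory-qualified path for a filename,
        e.g. "fubar.txt" becomes "fu/fubar.txt". */
    public static String shard(String filename) {
        // Very short names fall back to using the whole name as the bucket.
        String bucket = filename.length() >= 2 ? filename.substring(0, 2) : filename;
        return bucket + "/" + filename;
    }

    public static void main(String[] args) {
        System.out.println(shard("fubar.txt")); // fu/fubar.txt
        System.out.println(shard("angle4-mtlivaji-kickstart.xml"));
    }
}
```

Note that with a fixed two-character prefix the number of buckets is bounded by the alphabet squared, and the spread is only as good as the variation in the first two characters — which is exactly why the leftmost-vs-rightmost question below matters.]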
> > If these were numeric patterns, it would be easy to eg put 100 files per > dir by taking say the leftmost 6 characters and making that a dirname > within which the rightmost 2 chars would vary: > > tlivaj/angle4-tlivajim-kickstart.xml > tlivaj/angle4-tlivajin-kickstart.xml > tlivaj/angle4-tlivajio-kickstart.xml > tlivaj/angle4-tlivajip-kickstart.xml > tlivaj/angle4-tlivajiq-kickstart.xml > > but easier on my eyes would be: > 000000/angle4-00000001-kickstart.xml > 000000/angle4-00000002-kickstart.xml > ... > 000000/angle4-00000099-kickstart.xml > ... > 000020/angle4-00002076-kickstart.xml > etc. > > This makes splitting based on powers of 10 (or 26 or 36) trivial. Other > splits can be done with mod() functions. > > Can we start heading in this or some similar direction? > > We need to coordinate a plan for this, I suspect, to make Andrew's > workflows perform acceptably. > > - Mike > > > > On 10/27/07 2:08 PM, Ben Clifford wrote: > > > > On Sat, 27 Oct 2007, Mihael Hategan wrote: > > > >> Quickly before I leave the house: > >> Perhaps we could try copying to local FS instead of linking from shared > >> dir and hence running the jobs on the local FS. > > > > Maybe. I'd be suspicious that doesn't reduce access to the directory too > > much. > > > > I think the directories where there are lots of files being read/written > > by lots of hosts are: > > > > the top directory (one job directory per job) > > the info directory > > the kickstart directory > > the file cache > > > > In the case where directories get too many files in them because of > > directory size constraints, its common to split that directory into many > > smaller directories (eg. how squid caching, or git object storage works). > > eg, given a file fubar.txt store it in fu/fubar.txt, with 'fu' being some > > short hash of the filename (with the hash here being 'extract the first > > two characters). 
> > > > Pretty much I think Andrew wanted to do that for his data files anyway, > > which would then reflect in the layout of the data cache directory > > structure. > > > > For job directories, it may not be too hard to split the big directories > > into smaller ones. There will still be write-lock conflicts, but this > > might mean the contention for each directories write-lock is lower. > > > From benc at hawaga.org.uk Sun Oct 28 17:21:55 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Sun, 28 Oct 2007 22:21:55 +0000 (GMT) Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: <1193609867.28024.0.camel@blabla.mcs.anl.gov> References: <472386D6.2020707@mcs.anl.gov> <1193511485.27417.52.camel@blabla.mcs.anl.gov> <472509FF.5050608@mcs.anl.gov> <1193609867.28024.0.camel@blabla.mcs.anl.gov> Message-ID: On Sun, 28 Oct 2007, Mihael Hategan wrote: > > Why are these jobnames differing in the leftmost character of the uuid > > instead of the rightmost? I never paid attention to this till I started > > thinking about the dir hashing Ben suggests. I think that most hashes, > > unless the file names are random, need to be aware of which end of the > > name is varying fastest. > > Hmm. Yes. Never occurred to me. It can be changed. Though leftmost scoring is (very slightly) easier for prefix-hashing. -- From hategan at mcs.anl.gov Sun Oct 28 17:27:53 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 28 Oct 2007 17:27:53 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: <472509FF.5050608@mcs.anl.gov> References: <472386D6.2020707@mcs.anl.gov> <1193511485.27417.52.camel@blabla.mcs.anl.gov> <472509FF.5050608@mcs.anl.gov> Message-ID: <1193610473.28024.7.camel@blabla.mcs.anl.gov> On Sun, 2007-10-28 at 17:15 -0500, Michael Wilde wrote: > If these were numeric patterns, the string would be longer. 
> it would be easy to eg put 100 files per > dir by taking say the leftmost 6 characters and making that a dirname > within which the rightmost 2 chars would vary: With alpha-numeric ones, it's fairly easy to put 37 files per dir. Anyway. It doesn't matter. Either way. The problem isn't what exact numbering base we're using, but how exactly we put them in subdirectories. > > tlivaj/angle4-tlivajim-kickstart.xml > tlivaj/angle4-tlivajin-kickstart.xml > tlivaj/angle4-tlivajio-kickstart.xml > tlivaj/angle4-tlivajip-kickstart.xml > tlivaj/angle4-tlivajiq-kickstart.xml > > but easier on my eyes would be: > 000000/angle4-00000001-kickstart.xml Well lg(37^9) =~ 14, so you need about 14 digits to cover the same range of values: 00000000000000/angle4-00000000000001-kickstart.xml > 000000/angle4-00000002-kickstart.xml > ... > 000000/angle4-00000099-kickstart.xml > ... > 000020/angle4-00002076-kickstart.xml > etc. > > This makes splitting based on powers of 10 (or 26 or 36) trivial. Other > splits can be done with mod() functions. > > Can we start heading in this or some similar direction? > > We need to coordinate a plan for this, I suspect, to make Andrew's > workflows perform acceptably. > > - Mike > > > > On 10/27/07 2:08 PM, Ben Clifford wrote: > > > > On Sat, 27 Oct 2007, Mihael Hategan wrote: > > > >> Quickly before I leave the house: > >> Perhaps we could try copying to local FS instead of linking from shared > >> dir and hence running the jobs on the local FS. > > > > Maybe. I'd be suspicious that doesn't reduce access to the directory too > > much. > > > > I think the directories where there are lots of files being read/written > > by lots of hosts are: > > > > the top directory (one job directory per job) > > the info directory > > the kickstart directory > > the file cache > > > > In the case where directories get too many files in them because of > > directory size constraints, its common to split that directory into many > > smaller directories (eg. 
how squid caching, or git object storage works). > > eg, given a file fubar.txt store it in fu/fubar.txt, with 'fu' being some > > short hash of the filename (with the hash here being 'extract the first > > two characters). > > > > Pretty much I think Andrew wanted to do that for his data files anyway, > > which would then reflect in the layout of the data cache directory > > structure. > > > > For job directories, it may not be too hard to split the big directories > > into smaller ones. There will still be write-lock conflicts, but this > > might mean the contention for each directories write-lock is lower. > > > From hategan at mcs.anl.gov Sun Oct 28 17:28:43 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 28 Oct 2007 17:28:43 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: References: <472386D6.2020707@mcs.anl.gov> <1193511485.27417.52.camel@blabla.mcs.anl.gov> <472509FF.5050608@mcs.anl.gov> <1193609867.28024.0.camel@blabla.mcs.anl.gov> Message-ID: <1193610524.28024.9.camel@blabla.mcs.anl.gov> On Sun, 2007-10-28 at 22:21 +0000, Ben Clifford wrote: > On Sun, 28 Oct 2007, Mihael Hategan wrote: > > > > Why are these jobnames differing in the leftmost character of the uuid > > > instead of the rightmost? I never paid attention to this till I started > > > thinking about the dir hashing Ben suggests. I think that most hashes, > > > unless the file names are random, need to be aware of which end of the > > > name is varying fastest. > > > > Hmm. Yes. Never occurred to me. It can be changed. > > Though leftmost scoring is (very slightly) easier for prefix-hashing. Hmm. Yes. Make up your minds! 
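[To make the leftmost-vs-rightmost question concrete, here is a hypothetical sketch of a counter-based job-name scheme whose rightmost digit varies fastest, with the leading characters reused as the directory name. The thread mentions 37 symbols; this sketch assumes a 36-character alphabet (0-9, a-z) for simplicity, and none of the names are Swift's actual scheme:

```java
// Hypothetical sketch: sequential job IDs in base 36, rightmost digit
// varying fastest, sharded by their leading characters.
public class JobNames {
    private static final String DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz";

    /** Encode a counter as a fixed-width base-36 string; the rightmost
        character changes on every increment. */
    static String encode(int n, int width) {
        char[] out = new char[width];
        for (int i = width - 1; i >= 0; i--) {
            out[i] = DIGITS.charAt(n % 36);
            n /= 36;
        }
        return new String(out);
    }

    /** Use all but the last two characters as the directory, so each
        directory holds at most 36 * 36 = 1296 entries. */
    static String shard(String id) {
        return id.substring(0, id.length() - 2) + "/" + id;
    }

    public static void main(String[] args) {
        String id = encode(2076, 8);   // "000001lo" in base 36
        System.out.println(shard(id)); // 000001/000001lo
    }
}
```

With this layout, prefix-hashing becomes a pure substring operation, and consecutive IDs naturally fill one directory before moving to the next.]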
> From hategan at mcs.anl.gov Sun Oct 28 17:34:06 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 28 Oct 2007 17:34:06 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: <1193610473.28024.7.camel@blabla.mcs.anl.gov> References: <472386D6.2020707@mcs.anl.gov> <1193511485.27417.52.camel@blabla.mcs.anl.gov> <472509FF.5050608@mcs.anl.gov> <1193610473.28024.7.camel@blabla.mcs.anl.gov> Message-ID: <1193610846.28024.12.camel@blabla.mcs.anl.gov> > Well lg(37^9) =~ 14, so you need about 14 digits to cover the same range > of values: > > 00000000000000/angle4-00000000000001-kickstart.xml Although that's silly. We'll never have more than 10 million jobs of a kind (pretty much like 640K should be enough for everybody). > > > > 000000/angle4-00000002-kickstart.xml > > ... > > 000000/angle4-00000099-kickstart.xml > > ... > > 000020/angle4-00002076-kickstart.xml > > etc. > > > > This makes splitting based on powers of 10 (or 26 or 36) trivial. Other > > splits can be done with mod() functions. > > > > Can we start heading in this or some similar direction? > > > > We need to coordinate a plan for this, I suspect, to make Andrew's > > workflows perform acceptably. > > > > - Mike > > > > > > > > On 10/27/07 2:08 PM, Ben Clifford wrote: > > > > > > On Sat, 27 Oct 2007, Mihael Hategan wrote: > > > > > >> Quickly before I leave the house: > > >> Perhaps we could try copying to local FS instead of linking from shared > > >> dir and hence running the jobs on the local FS. > > > > > > Maybe. I'd be suspicious that doesn't reduce access to the directory too > > > much. 
> > > > > > I think the directories where there are lots of files being read/written > > > by lots of hosts are: > > > > > > the top directory (one job directory per job) > > > the info directory > > > the kickstart directory > > > the file cache > > > > > > In the case where directories get too many files in them because of > > > directory size constraints, its common to split that directory into many > > > smaller directories (eg. how squid caching, or git object storage works). > > > eg, given a file fubar.txt store it in fu/fubar.txt, with 'fu' being some > > > short hash of the filename (with the hash here being 'extract the first > > > two characters). > > > > > > Pretty much I think Andrew wanted to do that for his data files anyway, > > > which would then reflect in the layout of the data cache directory > > > structure. > > > > > > For job directories, it may not be too hard to split the big directories > > > into smaller ones. There will still be write-lock conflicts, but this > > > might mean the contention for each directories write-lock is lower. 
> > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From hategan at mcs.anl.gov Sun Oct 28 17:36:54 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 28 Oct 2007 17:36:54 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: <1193610524.28024.9.camel@blabla.mcs.anl.gov> References: <472386D6.2020707@mcs.anl.gov> <1193511485.27417.52.camel@blabla.mcs.anl.gov> <472509FF.5050608@mcs.anl.gov> <1193609867.28024.0.camel@blabla.mcs.anl.gov> <1193610524.28024.9.camel@blabla.mcs.anl.gov> Message-ID: <1193611014.28024.16.camel@blabla.mcs.anl.gov> On Sun, 2007-10-28 at 17:28 -0500, Mihael Hategan wrote: > On Sun, 2007-10-28 at 22:21 +0000, Ben Clifford wrote: > > On Sun, 28 Oct 2007, Mihael Hategan wrote: > > > > > > Why are these jobnames differing in the leftmost character of the uuid > > > > instead of the rightmost? I never paid attention to this till I started > > > > thinking about the dir hashing Ben suggests. I think that most hashes, > > > > unless the file names are random, need to be aware of which end of the > > > > name is varying fastest. > > > > > > Hmm. Yes. Never occurred to me. It can be changed. > > > > Though leftmost scoring is (very slightly) easier for prefix-hashing. > > Hmm. Yes. Make up your minds! Actually this is better. We want to avoid locality. In other words consecutive jobs should be in different directories. This reduces contention because the first job is more likely to be done by the time the hundredth job is started than by the time the second one is. 
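[The locality-avoiding placement described above — consecutive jobs landing in different directories — can be approximated by bucketing on the low-order part of the sequence number instead of its prefix. A rough sketch, with hypothetical names:

```java
// Hypothetical sketch: spread consecutive job numbers across directories,
// so jobs n and n+1 never share a directory (reducing write-lock contention
// between jobs that are likely to be active at the same time).
public class SpreadShard {
    /** Directory index = seq mod ndirs; with ndirs = 100, jobs 0..99 each
        land in a different directory before the cycle repeats. */
    static String shard(int seq, int ndirs) {
        return String.format("%03d/job-%06d", seq % ndirs, seq);
    }

    public static void main(String[] args) {
        System.out.println(shard(0, 100));   // 000/job-000000
        System.out.println(shard(1, 100));   // 001/job-000001
        System.out.println(shard(100, 100)); // 000/job-000100
    }
}
```

This is the opposite trade-off from prefix sharding: it deliberately scatters temporally adjacent jobs, at the cost of making it harder to eyeball a range of jobs in one directory.]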
From wilde at mcs.anl.gov Sun Oct 28 17:46:19 2007
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sun, 28 Oct 2007 17:46:19 -0500
Subject: [Swift-devel] Clustering and Temp Dirs with Swift
In-Reply-To: <1193610473.28024.7.camel@blabla.mcs.anl.gov>
References: <472386D6.2020707@mcs.anl.gov> <1193511485.27417.52.camel@blabla.mcs.anl.gov> <472509FF.5050608@mcs.anl.gov> <1193610473.28024.7.camel@blabla.mcs.anl.gov>
Message-ID: <4725113B.5070909@mcs.anl.gov>

On 10/28/07 5:27 PM, Mihael Hategan wrote:
> On Sun, 2007-10-28 at 17:15 -0500, Michael Wilde wrote:
>
>> If these were numeric patterns,
>
> the string would be longer.

Right. But for now, if we engineered for 999,999 jobs, that would be a simple 6-digit number, and it will be a while before we exceed that.

Depends also on our strategy for uniqueness. So far I don't see a need to make objects (jobs and files) unique across workflows, just within a workflow. If I run a workflow twice in one dir, I'd like the system to either put my data in a new dataNNN dir that's unique to one run of my workflow, or to start numbering auto-named files for one workflow where it left off in a previous workflow. (I guess this gets into mktemp-like issues, or .nextID files, etc.)

In my work, by the way, my log-saving script also moves all my output data to a unique per-run directory: run01, run02, etc.

Workflow IDs don't need to be unique outside of a user or group. I'm happy to call my runs angle001, angle002, cnari001, etc.

Having said all that, I don't have strong feelings on it at this point, except to note that small, easy numbers make things easier on *most* users for a long time, until their needs outgrow smaller local ID spaces.
I'd rather revisit UUID strategies again down the road when we hit that as a scalability problem, and keep simple things simpler for now. This will be much nicer for examples, tutorials, etc in addition to most normal usage. - Mike > >> it would be easy to eg put 100 files per >> dir by taking say the leftmost 6 characters and making that a dirname >> within which the rightmost 2 chars would vary: > > With alpha-numeric ones, it's fairly easy to put 37 files per dir. > > Anyway. It doesn't matter. Either way. The problem isn't what exact > numbering base we're using, but how exactly we put them in > subdirectories. > >> tlivaj/angle4-tlivajim-kickstart.xml >> tlivaj/angle4-tlivajin-kickstart.xml >> tlivaj/angle4-tlivajio-kickstart.xml >> tlivaj/angle4-tlivajip-kickstart.xml >> tlivaj/angle4-tlivajiq-kickstart.xml >> >> but easier on my eyes would be: >> 000000/angle4-00000001-kickstart.xml > > Well lg(37^9) =~ 14, so you need about 14 digits to cover the same range > of values: > > 00000000000000/angle4-00000000000001-kickstart.xml > > >> 000000/angle4-00000002-kickstart.xml >> ... >> 000000/angle4-00000099-kickstart.xml >> ... >> 000020/angle4-00002076-kickstart.xml >> etc. >> >> This makes splitting based on powers of 10 (or 26 or 36) trivial. Other >> splits can be done with mod() functions. >> >> Can we start heading in this or some similar direction? >> >> We need to coordinate a plan for this, I suspect, to make Andrew's >> workflows perform acceptably. >> >> - Mike >> >> >> >> On 10/27/07 2:08 PM, Ben Clifford wrote: >>> On Sat, 27 Oct 2007, Mihael Hategan wrote: >>> >>>> Quickly before I leave the house: >>>> Perhaps we could try copying to local FS instead of linking from shared >>>> dir and hence running the jobs on the local FS. >>> Maybe. I'd be suspicious that doesn't reduce access to the directory too >>> much. 
>>> >>> I think the directories where there are lots of files being read/written >>> by lots of hosts are: >>> >>> the top directory (one job directory per job) >>> the info directory >>> the kickstart directory >>> the file cache >>> >>> In the case where directories get too many files in them because of >>> directory size constraints, its common to split that directory into many >>> smaller directories (eg. how squid caching, or git object storage works). >>> eg, given a file fubar.txt store it in fu/fubar.txt, with 'fu' being some >>> short hash of the filename (with the hash here being 'extract the first >>> two characters). >>> >>> Pretty much I think Andrew wanted to do that for his data files anyway, >>> which would then reflect in the layout of the data cache directory >>> structure. >>> >>> For job directories, it may not be too hard to split the big directories >>> into smaller ones. There will still be write-lock conflicts, but this >>> might mean the contention for each directories write-lock is lower. >>> > > From wilde at mcs.anl.gov Sun Oct 28 17:47:53 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sun, 28 Oct 2007 17:47:53 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: <1193610846.28024.12.camel@blabla.mcs.anl.gov> References: <472386D6.2020707@mcs.anl.gov> <1193511485.27417.52.camel@blabla.mcs.anl.gov> <472509FF.5050608@mcs.anl.gov> <1193610473.28024.7.camel@blabla.mcs.anl.gov> <1193610846.28024.12.camel@blabla.mcs.anl.gov> Message-ID: <47251199.5080802@mcs.anl.gov> We're gonna have a pretty serious party when we complete our first 10M-job workflow. I look forward to this problem !!! :) Mike On 10/28/07 5:34 PM, Mihael Hategan wrote: > >> Well lg(37^9) =~ 14, so you need about 14 digits to cover the same range >> of values: >> >> 00000000000000/angle4-00000000000001-kickstart.xml > > Although that's silly. 
We'll never have more than 10 million jobs of a > kind (pretty much like 640K should be enough for everybody). > >> >>> 000000/angle4-00000002-kickstart.xml >>> ... >>> 000000/angle4-00000099-kickstart.xml >>> ... >>> 000020/angle4-00002076-kickstart.xml >>> etc. >>> >>> This makes splitting based on powers of 10 (or 26 or 36) trivial. Other >>> splits can be done with mod() functions. >>> >>> Can we start heading in this or some similar direction? >>> >>> We need to coordinate a plan for this, I suspect, to make Andrew's >>> workflows perform acceptably. >>> >>> - Mike >>> >>> >>> >>> On 10/27/07 2:08 PM, Ben Clifford wrote: >>>> On Sat, 27 Oct 2007, Mihael Hategan wrote: >>>> >>>>> Quickly before I leave the house: >>>>> Perhaps we could try copying to local FS instead of linking from shared >>>>> dir and hence running the jobs on the local FS. >>>> Maybe. I'd be suspicious that doesn't reduce access to the directory too >>>> much. >>>> >>>> I think the directories where there are lots of files being read/written >>>> by lots of hosts are: >>>> >>>> the top directory (one job directory per job) >>>> the info directory >>>> the kickstart directory >>>> the file cache >>>> >>>> In the case where directories get too many files in them because of >>>> directory size constraints, its common to split that directory into many >>>> smaller directories (eg. how squid caching, or git object storage works). >>>> eg, given a file fubar.txt store it in fu/fubar.txt, with 'fu' being some >>>> short hash of the filename (with the hash here being 'extract the first >>>> two characters). >>>> >>>> Pretty much I think Andrew wanted to do that for his data files anyway, >>>> which would then reflect in the layout of the data cache directory >>>> structure. >>>> >>>> For job directories, it may not be too hard to split the big directories >>>> into smaller ones. 
There will still be write-lock conflicts, but this >>>> might mean the contention for each directories write-lock is lower. >>>> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > > From hategan at mcs.anl.gov Sun Oct 28 17:53:04 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 28 Oct 2007 17:53:04 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: <4725113B.5070909@mcs.anl.gov> References: <472386D6.2020707@mcs.anl.gov> <1193511485.27417.52.camel@blabla.mcs.anl.gov> <472509FF.5050608@mcs.anl.gov> <1193610473.28024.7.camel@blabla.mcs.anl.gov> <4725113B.5070909@mcs.anl.gov> Message-ID: <1193611984.29418.2.camel@blabla.mcs.anl.gov> On Sun, 2007-10-28 at 17:46 -0500, Michael Wilde wrote: > Workflow IDs dont need to be unique outside of a user or group. They should. > > Im happy to call my runs angle001, angle002, cnari001, etc. You're forgetting I2U2, which will likely have lots of worklfows running. I think the workflow IDs should stay as they are. > > Having said all that, I dont have strong feelings on it at this point, > except to note that the small easy numbers make it easier on *most* > user, for a long time, till their needs outgrow smaller local ID spaces. > > I'd rather revisit UUID strategies again down the road when we hit that > as a scalability problem, and keep simple things simpler for now. > > This will be much nicer for examples, tutorials, etc in addition to most > normal usage. > > - Mike > > > > >> it would be easy to eg put 100 files per > >> dir by taking say the leftmost 6 characters and making that a dirname > >> within which the rightmost 2 chars would vary: > > > > With alpha-numeric ones, it's fairly easy to put 37 files per dir. > > > > Anyway. It doesn't matter. Either way. 
The problem isn't what exact > > numbering base we're using, but how exactly we put them in > > subdirectories. > > > >> tlivaj/angle4-tlivajim-kickstart.xml > >> tlivaj/angle4-tlivajin-kickstart.xml > >> tlivaj/angle4-tlivajio-kickstart.xml > >> tlivaj/angle4-tlivajip-kickstart.xml > >> tlivaj/angle4-tlivajiq-kickstart.xml > >> > >> but easier on my eyes would be: > >> 000000/angle4-00000001-kickstart.xml > > > > Well lg(37^9) =~ 14, so you need about 14 digits to cover the same range > > of values: > > > > 00000000000000/angle4-00000000000001-kickstart.xml > > > > > >> 000000/angle4-00000002-kickstart.xml > >> ... > >> 000000/angle4-00000099-kickstart.xml > >> ... > >> 000020/angle4-00002076-kickstart.xml > >> etc. > >> > >> This makes splitting based on powers of 10 (or 26 or 36) trivial. Other > >> splits can be done with mod() functions. > >> > >> Can we start heading in this or some similar direction? > >> > >> We need to coordinate a plan for this, I suspect, to make Andrew's > >> workflows perform acceptably. > >> > >> - Mike > >> > >> > >> > >> On 10/27/07 2:08 PM, Ben Clifford wrote: > >>> On Sat, 27 Oct 2007, Mihael Hategan wrote: > >>> > >>>> Quickly before I leave the house: > >>>> Perhaps we could try copying to local FS instead of linking from shared > >>>> dir and hence running the jobs on the local FS. > >>> Maybe. I'd be suspicious that doesn't reduce access to the directory too > >>> much. > >>> > >>> I think the directories where there are lots of files being read/written > >>> by lots of hosts are: > >>> > >>> the top directory (one job directory per job) > >>> the info directory > >>> the kickstart directory > >>> the file cache > >>> > >>> In the case where directories get too many files in them because of > >>> directory size constraints, its common to split that directory into many > >>> smaller directories (eg. how squid caching, or git object storage works). 
> >>> eg, given a file fubar.txt store it in fu/fubar.txt, with 'fu' being some > >>> short hash of the filename (with the hash here being 'extract the first > >>> two characters). > >>> > >>> Pretty much I think Andrew wanted to do that for his data files anyway, > >>> which would then reflect in the layout of the data cache directory > >>> structure. > >>> > >>> For job directories, it may not be too hard to split the big directories > >>> into smaller ones. There will still be write-lock conflicts, but this > >>> might mean the contention for each directories write-lock is lower. > >>> > > > > > From hategan at mcs.anl.gov Sun Oct 28 17:54:10 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 28 Oct 2007 17:54:10 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: <47251199.5080802@mcs.anl.gov> References: <472386D6.2020707@mcs.anl.gov> <1193511485.27417.52.camel@blabla.mcs.anl.gov> <472509FF.5050608@mcs.anl.gov> <1193610473.28024.7.camel@blabla.mcs.anl.gov> <1193610846.28024.12.camel@blabla.mcs.anl.gov> <47251199.5080802@mcs.anl.gov> Message-ID: <1193612051.29418.4.camel@blabla.mcs.anl.gov> The odds of that seem low, indeed :) On Sun, 2007-10-28 at 17:47 -0500, Michael Wilde wrote: > We're gonna have a pretty serious party when we complete our first > 10M-job workflow. I look forward to this problem !!! > > :) Mike > > > On 10/28/07 5:34 PM, Mihael Hategan wrote: > > > >> Well lg(37^9) =~ 14, so you need about 14 digits to cover the same range > >> of values: > >> > >> 00000000000000/angle4-00000000000001-kickstart.xml > > > > Although that's silly. We'll never have more than 10 million jobs of a > > kind (pretty much like 640K should be enough for everybody). > > > >> > >>> 000000/angle4-00000002-kickstart.xml > >>> ... > >>> 000000/angle4-00000099-kickstart.xml > >>> ... > >>> 000020/angle4-00002076-kickstart.xml > >>> etc. > >>> > >>> This makes splitting based on powers of 10 (or 26 or 36) trivial. 
Other > >>> splits can be done with mod() functions. > >>> > >>> Can we start heading in this or some similar direction? > >>> > >>> We need to coordinate a plan for this, I suspect, to make Andrew's > >>> workflows perform acceptably. > >>> > >>> - Mike > >>> > >>> > >>> > >>> On 10/27/07 2:08 PM, Ben Clifford wrote: > >>>> On Sat, 27 Oct 2007, Mihael Hategan wrote: > >>>> > >>>>> Quickly before I leave the house: > >>>>> Perhaps we could try copying to local FS instead of linking from shared > >>>>> dir and hence running the jobs on the local FS. > >>>> Maybe. I'd be suspicious that doesn't reduce access to the directory too > >>>> much. > >>>> > >>>> I think the directories where there are lots of files being read/written > >>>> by lots of hosts are: > >>>> > >>>> the top directory (one job directory per job) > >>>> the info directory > >>>> the kickstart directory > >>>> the file cache > >>>> > >>>> In the case where directories get too many files in them because of > >>>> directory size constraints, its common to split that directory into many > >>>> smaller directories (eg. how squid caching, or git object storage works). > >>>> eg, given a file fubar.txt store it in fu/fubar.txt, with 'fu' being some > >>>> short hash of the filename (with the hash here being 'extract the first > >>>> two characters). > >>>> > >>>> Pretty much I think Andrew wanted to do that for his data files anyway, > >>>> which would then reflect in the layout of the data cache directory > >>>> structure. > >>>> > >>>> For job directories, it may not be too hard to split the big directories > >>>> into smaller ones. There will still be write-lock conflicts, but this > >>>> might mean the contention for each directories write-lock is lower. 
> >>>> > >> _______________________________________________ > >> Swift-devel mailing list > >> Swift-devel at ci.uchicago.edu > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >> > > > > > From benc at hawaga.org.uk Sun Oct 28 17:57:49 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Sun, 28 Oct 2007 22:57:49 +0000 (GMT) Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: <1193611984.29418.2.camel@blabla.mcs.anl.gov> References: <472386D6.2020707@mcs.anl.gov> <1193511485.27417.52.camel@blabla.mcs.anl.gov> <472509FF.5050608@mcs.anl.gov> <1193610473.28024.7.camel@blabla.mcs.anl.gov> <4725113B.5070909@mcs.anl.gov> <1193611984.29418.2.camel@blabla.mcs.anl.gov> Message-ID: any sequential numbering relies on having a global lock on a sequence counter. if we're expecting workflow log names to be unique amongst all workflow runs ever (which I believe to be desirable) then there needs to be a global shared counter variable between all instances of swift running anywhere. that's lame. having a large value space and picking numbers in there very sparsely (eg timestamp + some random ID) is very scalable in this situation. -- From wilde at mcs.anl.gov Sun Oct 28 18:12:53 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sun, 28 Oct 2007 18:12:53 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: References: <472386D6.2020707@mcs.anl.gov> <1193511485.27417.52.camel@blabla.mcs.anl.gov> <472509FF.5050608@mcs.anl.gov> <1193610473.28024.7.camel@blabla.mcs.anl.gov> <4725113B.5070909@mcs.anl.gov> <1193611984.29418.2.camel@blabla.mcs.anl.gov> Message-ID: <47251775.4020700@mcs.anl.gov> OK. You two should decide. I observe that if we use the current strategy to pick a unique wfid, then we don't need a global lock, just a strategy that works within a workflow and its top-level client and server-side directories.
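The sparse-naming scheme Ben suggests (timestamp plus some random ID, no shared counter) might look roughly like this; the exact format here is invented for illustration — the point is only that two independent Swift instances almost never pick the same run ID without coordinating through any shared state:

```python
import random
import string
import time

ALPHABET = string.ascii_lowercase + string.digits  # 36-symbol name space

def run_id(tag: str = "run") -> str:
    """Pick a name sparsely from a large value space: a seconds-resolution
    timestamp plus a few random characters. No global lock or sequence
    counter is needed; collisions are possible but vanishingly unlikely."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    salt = "".join(random.choice(ALPHABET) for _ in range(4))
    return f"{tag}-{stamp}-{salt}"
```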
Much more important is that we solve the scalability problems, starting as far as I can tell with the big-dir problems. Any way you solve it that gets things running fastest soonest, is fine by me. Solve away. - Mike On 10/28/07 5:57 PM, Ben Clifford wrote: > any sequential numbering relies on having a global lock on a sequence > counter. > > if we're expecting workflow log names to be unique amongst all workflow > runs ever (which I believe to be desirable) then there needs to be a > global shared counter variable between all instances of swift running > anywhere. that's lame. > > having a large value space and picking numbers in there very sparsely (eg > timestamp + some random ID) is very scalable in this situation. > From iraicu at cs.uchicago.edu Sun Oct 28 19:51:00 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Sun, 28 Oct 2007 19:51:00 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: <1193610846.28024.12.camel@blabla.mcs.anl.gov> References: <472386D6.2020707@mcs.anl.gov> <1193511485.27417.52.camel@blabla.mcs.anl.gov> <472509FF.5050608@mcs.anl.gov> <1193610473.28024.7.camel@blabla.mcs.anl.gov> <1193610846.28024.12.camel@blabla.mcs.anl.gov> Message-ID: <47252E74.9060903@cs.uchicago.edu> At the Microsoft workshop I just attended, someone had a 25 million task application that dealt with AIDS research :) Mihael Hategan wrote: > >> Well lg(37^9) =~ 14, so you need about 14 digits to cover the same range >> of values: >> >> 00000000000000/angle4-00000000000001-kickstart.xml >> > > Although that's silly. We'll never have more than 10 million jobs of a > kind (pretty much like 640K should be enough for everybody). > > >> >>> 000000/angle4-00000002-kickstart.xml >>> ... >>> 000000/angle4-00000099-kickstart.xml >>> ... >>> 000020/angle4-00002076-kickstart.xml >>> etc. >>> >>> This makes splitting based on powers of 10 (or 26 or 36) trivial. Other >>> splits can be done with mod() functions. 
>>> >>> Can we start heading in this or some similar direction? >>> >>> We need to coordinate a plan for this, I suspect, to make Andrew's >>> workflows perform acceptably. >>> >>> - Mike >>> >>> >>> >>> On 10/27/07 2:08 PM, Ben Clifford wrote: >>> >>>> On Sat, 27 Oct 2007, Mihael Hategan wrote: >>>> >>>> >>>>> Quickly before I leave the house: >>>>> Perhaps we could try copying to local FS instead of linking from shared >>>>> dir and hence running the jobs on the local FS. >>>>> >>>> Maybe. I'd be suspicious that doesn't reduce access to the directory too >>>> much. >>>> >>>> I think the directories where there are lots of files being read/written >>>> by lots of hosts are: >>>> >>>> the top directory (one job directory per job) >>>> the info directory >>>> the kickstart directory >>>> the file cache >>>> >>>> In the case where directories get too many files in them because of >>>> directory size constraints, its common to split that directory into many >>>> smaller directories (eg. how squid caching, or git object storage works). >>>> eg, given a file fubar.txt store it in fu/fubar.txt, with 'fu' being some >>>> short hash of the filename (with the hash here being 'extract the first >>>> two characters). >>>> >>>> Pretty much I think Andrew wanted to do that for his data files anyway, >>>> which would then reflect in the layout of the data cache directory >>>> structure. >>>> >>>> For job directories, it may not be too hard to split the big directories >>>> into smaller ones. There will still be write-lock conflicts, but this >>>> might mean the contention for each directories write-lock is lower. 
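The decimal layout in the quoted example — sequential job numbers, with the subdirectory obtained by integer division and other splits available via mod() — can be sketched as follows. The fan-out and digit widths here are chosen to match the example filenames, and are assumptions rather than Swift settings:

```python
def decimal_shard(prefix: str, n: int, fanout: int = 100, width: int = 8) -> str:
    """Map sequential job number n to 'DDDDDD/prefix-NNNNNNNN-kickstart.xml',
    grouping `fanout` consecutive jobs per subdirectory via integer
    division. Using n % fanout instead would stripe jobs across
    subdirectories rather than grouping them."""
    subdir = str(n // fanout).zfill(width - 2)
    return f"{subdir}/{prefix}-{str(n).zfill(width)}-kickstart.xml"
```

With the defaults, job 2076 lands in subdirectory 000020, matching the example above.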
>>>> >>>> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> >> > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -- ============================================ Ioan Raicu Ph.D. Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ -------------- next part -------------- An HTML attachment was scrubbed... URL: From iraicu at cs.uchicago.edu Sun Oct 28 20:05:06 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Sun, 28 Oct 2007 20:05:06 -0500 Subject: [Swift-devel] excessive rate throttling for apparently temporally-restricted failures In-Reply-To: <1193607722.25186.17.camel@blabla.mcs.anl.gov> References: <4724A485.60906@cs.uchicago.edu> <4724A9D7.2010207@cs.uchicago.edu> <1193588147.15017.2.camel@blabla.mcs.anl.gov> <4724B78D.4010907@cs.uchicago.edu> <1193601091.22794.3.camel@blabla.mcs.anl.gov> <4724EB90.1060507@cs.uchicago.edu> <1193607722.25186.17.camel@blabla.mcs.anl.gov> Message-ID: <472531C2.6020206@cs.uchicago.edu> This might be so, but when a user comes across behavior that is significantly sub-optimal (such as sending few jobs that don't utilize all the nodes at a site), they will want knobs to manually tune things to be closer to optimal (in their opinion). That said, you are probably right that the default setting should be completely automated, but there should be knobs that can be turned on, off, up, down, etc... 
to allow the user to avoid the bad behavior. For example, this means allowing the user to turn off site scoring. This is not the first time we are having this discussion, and I only brought up these points again since Ben started up the discussion. I think we all have our opinions, and in the end, I am not the one who will be implementing these knobs, so feel free to do what you think is best! Ioan Mihael Hategan wrote: > On Sun, 2007-10-28 at 15:05 -0500, Ioan Raicu wrote: > >> But my argument was, and still is, if there is only one site to submit >> to, changing situations are almost irrelevant, >> > > Missed that. It is not irrelevant. The speed/capacity of a service is > determined by: the jobs you submit, the jobs others submit, the specific > type of hardware, and the load on the service node (and other things > like network latency). The jobs other submit and the load on the service > node vary with time. The bad thing about them is that it's hard to > predict how they affect things. > > Furthermore, user specified rates suffer fundamentally from the problem > of the user having to understand how the whole thing works and picking > good values. What I've observed is that this doesn't work very well. > > >> as there are no options anyhow. Give me one example, where you have >> only 1 site, set X and Y properly, yet you need site scores as an >> additional throttling mechanism! >> >> Mihael Hategan wrote: >> >>> On Sun, 2007-10-28 at 11:23 -0500, Ioan Raicu wrote: >>> >>> >>>> I mentioned 2 throttling mechanisms, one is to have X outstanding jobs >>>> at any given time (limits jobs in the queue), and Y jobs/sec >>>> submit rate (limits the rate of submission). I believe both of these >>>> throttling mechanisms could exist without computing site scores, >>>> assuming the user knows what to set X and Y to. >>>> >>>> >>> They do exist, but they don't deal with asymmetries between sites. Nor >>> do they deal with changing situations. 
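The two user-facing throttles under discussion — a cap of X outstanding jobs and a Y jobs/sec submit rate — can operate with no site scoring at all, e.g. as below. This is an illustrative sketch, not Swift's actual scheduler, and it deliberately omits the adaptive scoring that reacts to changing site conditions:

```python
class SimpleThrottle:
    """Cap outstanding jobs at max_outstanding (X) and submissions at
    max_rate per second (Y), with no site-score feedback involved."""

    def __init__(self, max_outstanding: int, max_rate: float):
        self.max_outstanding = max_outstanding
        self.min_interval = 1.0 / max_rate
        self.outstanding = 0
        self.last_submit = float("-inf")  # no submission yet

    def try_submit(self, now: float) -> bool:
        """Return True if a job may be submitted at time `now` (seconds)."""
        if self.outstanding >= self.max_outstanding:
            return False  # X limit: too many jobs already in the queue
        if now - self.last_submit < self.min_interval:
            return False  # Y limit: submitting faster than max_rate
        self.outstanding += 1
        self.last_submit = now
        return True

    def job_done(self) -> None:
        self.outstanding -= 1
```

As the thread notes, fixed X and Y say nothing about how a site's capacity varies over time — which is what the score-based mechanism tries to track.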
>>> >>> >>> >>>> Ioan >>>> >>>> Mihael Hategan wrote: >>>> >>>> >>>>> On Sun, 2007-10-28 at 10:25 -0500, Ioan Raicu wrote: >>>>> >>>>> >>>>> >>>>>> Assuming you have a single site to submit to, then I don't see why you >>>>>> don't want to disable the site scoring altogether? >>>>>> >>>>>> >>>>>> >>>>> Because having too many jobs on that one site may still cause problems. >>>>> >>>>> That said, the algorithm currently there needs some work. >>>>> >>>>> >>>>> >>>>> >>>>>> Of course you still want throttling, but that is more on the level >>>>>> of X outstanding jobs at any given time (and possibly Y jobs/sec >>>>>> submit rate), so you don't overrun the LRM, but you would not want to >>>>>> lower X to some low value just because some jobs are failing. Again, >>>>>> once you go to multi-site runs, you need the site scoring to decide >>>>>> among the different sites, but with a single site, I see no drawbacks >>>>>> to disabling the site scoring mechanism. >>>>>> >>>>>> Ioan >>>>>> >>>>>> Ben Clifford wrote: >>>>>> >>>>>> >>>>>> >>>>>>> On Sun, 28 Oct 2007, Ioan Raicu wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> they were due to the stale NFS handle error. I think Mihael outlined in an >>>>>>>> email a while back how to disable the task submission throttling due to a bad >>>>>>>> score, assuming that you have a single site to submit to anyways. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> I know how to disable it. I don't particularly want it running rate free. >>>>>>> >>>>>>> Whats happening here is that the feedback loop feeding back too much / too >>>>>>> fast for the situation I experience. >>>>>>> >>>>>>> There's plenty of fun to be had experimenting there; and I suspect there >>>>>>> will be no One True Rate Controller. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> -- >>>>>> ============================================ >>>>>> Ioan Raicu >>>>>> Ph.D. 
Student >>>>>> ============================================ >>>>>> Distributed Systems Laboratory >>>>>> Computer Science Department >>>>>> University of Chicago >>>>>> 1100 E. 58th Street, Ryerson Hall >>>>>> Chicago, IL 60637 >>>>>> ============================================ >>>>>> Email: iraicu at cs.uchicago.edu >>>>>> Web: http://www.cs.uchicago.edu/~iraicu >>>>>> http://dsl.cs.uchicago.edu/ >>>>>> ============================================ >>>>>> ============================================ >>>>>> _______________________________________________ >>>>>> Swift-devel mailing list >>>>>> Swift-devel at ci.uchicago.edu >>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> -- >>>> ============================================ >>>> Ioan Raicu >>>> Ph.D. Student >>>> ============================================ >>>> Distributed Systems Laboratory >>>> Computer Science Department >>>> University of Chicago >>>> 1100 E. 58th Street, Ryerson Hall >>>> Chicago, IL 60637 >>>> ============================================ >>>> Email: iraicu at cs.uchicago.edu >>>> Web: http://www.cs.uchicago.edu/~iraicu >>>> http://dsl.cs.uchicago.edu/ >>>> ============================================ >>>> ============================================ >>>> >>>> >>> >>> >> -- >> ============================================ >> Ioan Raicu >> Ph.D. Student >> ============================================ >> Distributed Systems Laboratory >> Computer Science Department >> University of Chicago >> 1100 E. 58th Street, Ryerson Hall >> Chicago, IL 60637 >> ============================================ >> Email: iraicu at cs.uchicago.edu >> Web: http://www.cs.uchicago.edu/~iraicu >> http://dsl.cs.uchicago.edu/ >> ============================================ >> ============================================ >> > > > -- ============================================ Ioan Raicu Ph.D. 
Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Sun Oct 28 21:33:49 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 28 Oct 2007 21:33:49 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: <47252E74.9060903@cs.uchicago.edu> References: <472386D6.2020707@mcs.anl.gov> <1193511485.27417.52.camel@blabla.mcs.anl.gov> <472509FF.5050608@mcs.anl.gov> <1193610473.28024.7.camel@blabla.mcs.anl.gov> <1193610846.28024.12.camel@blabla.mcs.anl.gov> <47252E74.9060903@cs.uchicago.edu> Message-ID: <1193625230.31045.3.camel@blabla.mcs.anl.gov> On Sun, 2007-10-28 at 19:51 -0500, Ioan Raicu wrote: > At the Microsoft workshop I just attended, someone had a 25 million > task application that dealt with AIDS research :) :) We might also get there at some undetermined point in the future. Luckily we can easily change the scheme at that time without causing too much trouble. Do you know the name of the system? It may be very useful to learn how they do it, and what problems they have hit. > > Mihael Hategan wrote: > > > Well lg(37^9) =~ 14, so you need about 14 digits to cover the same range > > > of values: > > > > > > 00000000000000/angle4-00000000000001-kickstart.xml > > > > > > > Although that's silly. We'll never have more than 10 million jobs of a > > kind (pretty much like 640K should be enough for everybody). > > > > > > > > 000000/angle4-00000002-kickstart.xml > > > > ... > > > > 000000/angle4-00000099-kickstart.xml > > > > ... 
> > > > 000020/angle4-00002076-kickstart.xml > > > > etc. > > > > > > > > This makes splitting based on powers of 10 (or 26 or 36) trivial. Other > > > > splits can be done with mod() functions. > > > > > > > > Can we start heading in this or some similar direction? > > > > > > > > We need to coordinate a plan for this, I suspect, to make Andrew's > > > > workflows perform acceptably. > > > > > > > > - Mike > > > > > > > > > > > > > > > > On 10/27/07 2:08 PM, Ben Clifford wrote: > > > > > > > > > On Sat, 27 Oct 2007, Mihael Hategan wrote: > > > > > > > > > > > > > > > > Quickly before I leave the house: > > > > > > Perhaps we could try copying to local FS instead of linking from shared > > > > > > dir and hence running the jobs on the local FS. > > > > > > > > > > > Maybe. I'd be suspicious that doesn't reduce access to the directory too > > > > > much. > > > > > > > > > > I think the directories where there are lots of files being read/written > > > > > by lots of hosts are: > > > > > > > > > > the top directory (one job directory per job) > > > > > the info directory > > > > > the kickstart directory > > > > > the file cache > > > > > > > > > > In the case where directories get too many files in them because of > > > > > directory size constraints, its common to split that directory into many > > > > > smaller directories (eg. how squid caching, or git object storage works). > > > > > eg, given a file fubar.txt store it in fu/fubar.txt, with 'fu' being some > > > > > short hash of the filename (with the hash here being 'extract the first > > > > > two characters). > > > > > > > > > > Pretty much I think Andrew wanted to do that for his data files anyway, > > > > > which would then reflect in the layout of the data cache directory > > > > > structure. > > > > > > > > > > For job directories, it may not be too hard to split the big directories > > > > > into smaller ones. 
There will still be write-lock conflicts, but this > > > > > might mean the contention for each directories write-lock is lower. > > > > > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > -- > ============================================ > Ioan Raicu > Ph.D. Student > ============================================ > Distributed Systems Laboratory > Computer Science Department > University of Chicago > 1100 E. 58th Street, Ryerson Hall > Chicago, IL 60637 > ============================================ > Email: iraicu at cs.uchicago.edu > Web: http://www.cs.uchicago.edu/~iraicu > http://dsl.cs.uchicago.edu/ > ============================================ > ============================================ From hategan at mcs.anl.gov Sun Oct 28 22:00:22 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 28 Oct 2007 22:00:22 -0500 Subject: [Swift-devel] excessive rate throttling for apparently temporally-restricted failures In-Reply-To: <472531C2.6020206@cs.uchicago.edu> References: <4724A485.60906@cs.uchicago.edu> <4724A9D7.2010207@cs.uchicago.edu> <1193588147.15017.2.camel@blabla.mcs.anl.gov> <4724B78D.4010907@cs.uchicago.edu> <1193601091.22794.3.camel@blabla.mcs.anl.gov> <4724EB90.1060507@cs.uchicago.edu> <1193607722.25186.17.camel@blabla.mcs.anl.gov> <472531C2.6020206@cs.uchicago.edu> Message-ID: <1193626822.31045.31.camel@blabla.mcs.anl.gov> On Sun, 2007-10-28 at 20:05 -0500, Ioan Raicu wrote: > This might be so, but when a user comes across behavior that is > significantly sub-optimal (such as sending few jobs that don't utilize > all the nodes at a site), they will want knobs to manually tune things > to be closer 
to optimal (in their opinion). This is a hindsight kind of "sub-optimal". > That said, you are probably right that the default setting should be > completely automated, but there should be knobs that can be turned on, > off, up, down, etc... to allow the user to avoid the bad behavior. Yes, although it may at times do more harm than good, if done without a reasonable understanding of the issue. The more complex the problem, the more likely the users will fill in inappropriate values. The problem right now is that the current algorithms will not reasonably make users happy if they insist on optimality. We should try to change that. > For example, this means allowing the user to turn off site scoring. Hmm. I think this went in the wrong direction. I assumed you knew that these things can be turned off, since we discussed this before and can be seen in swift.properties. So I thought this was meant as an attempt to try to understand the larger issue (as much on your side as on mine). In particular: throttle.score.job.factor=off Mihael > > This is not the first time we are having this discussion, and I only > brought up these points again since Ben started up the discussion. I > think we all have our opinions, and in the end, I am not the one who > will be implementing these knobs, so feel free to do what you think is > best! > > Ioan > > Mihael Hategan wrote: > > On Sun, 2007-10-28 at 15:05 -0500, Ioan Raicu wrote: > > > > > But my argument was, and still is, if there is only one site to submit > > > to, changing situations are almost irrelevant, > > > > > > > Missed that. It is not irrelevant. The speed/capacity of a service is > > determined by: the jobs you submit, the jobs others submit, the specific > > type of hardware, and the load on the service node (and other things > > like network latency). The jobs other submit and the load on the service > > node vary with time. The bad thing about them is that it's hard to > > predict how they affect things. 
> > > > Furthermore, user specified rates suffer fundamentally from the problem > > of the user having to understand how the whole thing works and picking > > good values. What I've observed is that this doesn't work very well. > > > > > > > as there are no options anyhow. Give me one example, where you have > > > only 1 site, set X and Y properly, yet you need site scores as an > > > additional throttling mechanism! > > > > > > Mihael Hategan wrote: > > > > > > > On Sun, 2007-10-28 at 11:23 -0500, Ioan Raicu wrote: > > > > > > > > > > > > > I mentioned 2 throttling mechanisms, one is to have X outstanding jobs > > > > > at any given time (limits jobs in the queue), and Y jobs/sec > > > > > submit rate (limits the rate of submission). I believe both of these > > > > > throttling mechanisms could exist without computing site scores, > > > > > assuming the user knows what to set X and Y to. > > > > > > > > > > > > > > They do exist, but they don't deal with asymmetries between sites. Nor > > > > do they deal with changing situations. > > > > > > > > > > > > > > > > > Ioan > > > > > > > > > > Mihael Hategan wrote: > > > > > > > > > > > > > > > > On Sun, 2007-10-28 at 10:25 -0500, Ioan Raicu wrote: > > > > > > > > > > > > > > > > > > > > > > > > > Assuming you have a single site to submit to, then I don't see why you > > > > > > > don't want to disable the site scoring altogether? > > > > > > > > > > > > > > > > > > > > > > > > > > > Because having too many jobs on that one site may still cause problems. > > > > > > > > > > > > That said, the algorithm currently there needs some work. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Of course you still want throttling, but that is more on the level > > > > > > > of X outstanding jobs at any given time (and possibly Y jobs/sec > > > > > > > submit rate), so you don't overrun the LRM, but you would not want to > > > > > > > lower X to some low value just because some jobs are failing. 
Again, > > > > > > > once you go to multi-site runs, you need the site scoring to decide > > > > > > > among the different sites, but with a single site, I see no drawbacks > > > > > > > to disabling the site scoring mechanism. > > > > > > > > > > > > > > Ioan > > > > > > > > > > > > > > Ben Clifford wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Sun, 28 Oct 2007, Ioan Raicu wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > they were due to the stale NFS handle error. I think Mihael outlined in an > > > > > > > > > email a while back how to disable the task submission throttling due to a bad > > > > > > > > > score, assuming that you have a single site to submit to anyways. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I know how to disable it. I don't particularly want it running rate free. > > > > > > > > > > > > > > > > Whats happening here is that the feedback loop feeding back too much / too > > > > > > > > fast for the situation I experience. > > > > > > > > > > > > > > > > There's plenty of fun to be had experimenting there; and I suspect there > > > > > > > > will be no One True Rate Controller. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > ============================================ > > > > > > > Ioan Raicu > > > > > > > Ph.D. Student > > > > > > > ============================================ > > > > > > > Distributed Systems Laboratory > > > > > > > Computer Science Department > > > > > > > University of Chicago > > > > > > > 1100 E. 
58th Street, Ryerson Hall > > > > > > > Chicago, IL 60637 > > > > > > > ============================================ > > > > > > > Email: iraicu at cs.uchicago.edu > > > > > > > Web: http://www.cs.uchicago.edu/~iraicu > > > > > > > http://dsl.cs.uchicago.edu/ > > > > > > > ============================================ > > > > > > > ============================================ > > > > > > > _______________________________________________ > > > > > > > Swift-devel mailing list > > > > > > > Swift-devel at ci.uchicago.edu > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > ============================================ > > > > > Ioan Raicu > > > > > Ph.D. Student > > > > > ============================================ > > > > > Distributed Systems Laboratory > > > > > Computer Science Department > > > > > University of Chicago > > > > > 1100 E. 58th Street, Ryerson Hall > > > > > Chicago, IL 60637 > > > > > ============================================ > > > > > Email: iraicu at cs.uchicago.edu > > > > > Web: http://www.cs.uchicago.edu/~iraicu > > > > > http://dsl.cs.uchicago.edu/ > > > > > ============================================ > > > > > ============================================ > > > > > > > > > > > > > > > > > > > > > -- > > > ============================================ > > > Ioan Raicu > > > Ph.D. Student > > > ============================================ > > > Distributed Systems Laboratory > > > Computer Science Department > > > University of Chicago > > > 1100 E. 58th Street, Ryerson Hall > > > Chicago, IL 60637 > > > ============================================ > > > Email: iraicu at cs.uchicago.edu > > > Web: http://www.cs.uchicago.edu/~iraicu > > > http://dsl.cs.uchicago.edu/ > > > ============================================ > > > ============================================ > > > > > > > > > > > -- > ============================================ > Ioan Raicu > Ph.D. 
Student > ============================================ > Distributed Systems Laboratory > Computer Science Department > University of Chicago > 1100 E. 58th Street, Ryerson Hall > Chicago, IL 60637 > ============================================ > Email: iraicu at cs.uchicago.edu > Web: http://www.cs.uchicago.edu/~iraicu > http://dsl.cs.uchicago.edu/ > ============================================ > ============================================ From hategan at mcs.anl.gov Sun Oct 28 22:07:20 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 28 Oct 2007 22:07:20 -0500 Subject: [Swift-devel] excessive rate throttling for apparently temporally-restricted failures In-Reply-To: <1193626822.31045.31.camel@blabla.mcs.anl.gov> References: <4724A485.60906@cs.uchicago.edu> <4724A9D7.2010207@cs.uchicago.edu> <1193588147.15017.2.camel@blabla.mcs.anl.gov> <4724B78D.4010907@cs.uchicago.edu> <1193601091.22794.3.camel@blabla.mcs.anl.gov> <4724EB90.1060507@cs.uchicago.edu> <1193607722.25186.17.camel@blabla.mcs.anl.gov> <472531C2.6020206@cs.uchicago.edu> <1193626822.31045.31.camel@blabla.mcs.anl.gov> Message-ID: <1193627240.31045.34.camel@blabla.mcs.anl.gov> > > For example, this means allowing the user to turn off site scoring. > > Hmm. I think this went in the wrong direction. I assumed you knew that > these things can be turned off And you say that in an earlier email (see below). I'm not following any more. What's the issue? > > > > > > > > > On Sun, 28 Oct 2007, Ioan Raicu wrote: > > > > > > > > > > > > > > > > > > > they were due to the stale NFS handle error. I think Mihael outlined in an > > > > > > > > > > email a while back how to disable the task submission throttling due to a bad > > > > > > > > > > score, assuming that you have a single site to submit to anyways. 
From iraicu at cs.uchicago.edu Sun Oct 28 22:37:40 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Sun, 28 Oct 2007 22:37:40 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: <1193625230.31045.3.camel@blabla.mcs.anl.gov> References: <472386D6.2020707@mcs.anl.gov> <1193511485.27417.52.camel@blabla.mcs.anl.gov> <472509FF.5050608@mcs.anl.gov> <1193610473.28024.7.camel@blabla.mcs.anl.gov> <1193610846.28024.12.camel@blabla.mcs.anl.gov> <47252E74.9060903@cs.uchicago.edu> <1193625230.31045.3.camel@blabla.mcs.anl.gov> Message-ID: <47255584.7010201@cs.uchicago.edu> I remember the guy who gave the talk, so when they send out the slides, I can point you to the exact source. In the meantime, from what I remember, it was an app that ran over a Microsoft Windows Cluster Edition with 300 processors, and the application completed in some 24 hours (~1 sec / job). That is an average throughput of 300+ jobs/sec, pretty impressive. Now, I don't know if the app was using any workflow system, or if it was simply an app that could talk to a cluster to submit jobs. I'll try to find out more details on this, as I think it would be great to be able to compare even with Falkon at some level. Ioan Mihael Hategan wrote: > On Sun, 2007-10-28 at 19:51 -0500, Ioan Raicu wrote: > >> At the Microsoft workshop I just attended, someone had a 25 million >> task application that dealt with AIDS research :) >> > > :) > > We might also get there at some undetermined point in the future. > Luckily we can easily change the scheme at that time without causing too > much trouble. > > Do you know the name of the system? It may be very useful to learn how > they do it, and what problems they have hit. > > >> Mihael Hategan wrote: >> >>>> Well lg(37^9) =~ 14, so you need about 14 digits to cover the same range >>>> of values: >>>> >>>> 00000000000000/angle4-00000000000001-kickstart.xml >>>> >>>> >>> Although that's silly. 
We'll never have more than 10 million jobs of a >>> kind (pretty much like 640K should be enough for everybody). >>> >>> >>> >>>>> 000000/angle4-00000002-kickstart.xml >>>>> ... >>>>> 000000/angle4-00000099-kickstart.xml >>>>> ... >>>>> 000020/angle4-00002076-kickstart.xml >>>>> etc. >>>>> >>>>> This makes splitting based on powers of 10 (or 26 or 36) trivial. Other >>>>> splits can be done with mod() functions. >>>>> >>>>> Can we start heading in this or some similar direction? >>>>> >>>>> We need to coordinate a plan for this, I suspect, to make Andrew's >>>>> workflows perform acceptably. >>>>> >>>>> - Mike >>>>> >>>>> >>>>> >>>>> On 10/27/07 2:08 PM, Ben Clifford wrote: >>>>> >>>>> >>>>>> On Sat, 27 Oct 2007, Mihael Hategan wrote: >>>>>> >>>>>> >>>>>> >>>>>>> Quickly before I leave the house: >>>>>>> Perhaps we could try copying to local FS instead of linking from shared >>>>>>> dir and hence running the jobs on the local FS. >>>>>>> >>>>>>> >>>>>> Maybe. I'd be suspicious that doesn't reduce access to the directory too >>>>>> much. >>>>>> >>>>>> I think the directories where there are lots of files being read/written >>>>>> by lots of hosts are: >>>>>> >>>>>> the top directory (one job directory per job) >>>>>> the info directory >>>>>> the kickstart directory >>>>>> the file cache >>>>>> >>>>>> In the case where directories get too many files in them because of >>>>>> directory size constraints, its common to split that directory into many >>>>>> smaller directories (eg. how squid caching, or git object storage works). >>>>>> eg, given a file fubar.txt store it in fu/fubar.txt, with 'fu' being some >>>>>> short hash of the filename (with the hash here being 'extract the first >>>>>> two characters). >>>>>> >>>>>> Pretty much I think Andrew wanted to do that for his data files anyway, >>>>>> which would then reflect in the layout of the data cache directory >>>>>> structure. 
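[Editorial note: the prefix-split scheme Ben describes above can be sketched in a few lines of Python. The helper names and the 100-jobs-per-bucket choice are illustrative assumptions, not Swift's actual layout code.]

```python
def shard_by_name(filename: str, width: int = 2) -> str:
    """Bucket a file under a short 'hash' of its name -- here, as in the
    email, simply the first `width` characters (squid and git use a few
    hex digits of a real digest in the same way)."""
    return f"{filename[:width]}/{filename}"

def shard_by_number(jobnum: int, prefix: str = "angle4") -> str:
    """Numeric variant: one bucket per 100 jobs, matching the
    000020/angle4-00002076-kickstart.xml layout quoted above."""
    return f"{jobnum // 100:06d}/{prefix}-{jobnum:08d}-kickstart.xml"

print(shard_by_name("fubar.txt"))   # fu/fubar.txt
print(shard_by_number(2076))        # 000020/angle4-00002076-kickstart.xml
```

Either scheme keeps any single directory from accumulating an unbounded number of entries; mod()-based splits are just the numeric variant with a different bucketing function.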
>>>>>> >>>>>> For job directories, it may not be too hard to split the big directories >>>>>> into smaller ones. There will still be write-lock conflicts, but this >>>>>> might mean the contention for each directories write-lock is lower. >>>>>> >>>>>> >>>>>> >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>> >>>> >>>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> >>> >>> >> -- >> ============================================ >> Ioan Raicu >> Ph.D. Student >> ============================================ >> Distributed Systems Laboratory >> Computer Science Department >> University of Chicago >> 1100 E. 58th Street, Ryerson Hall >> Chicago, IL 60637 >> ============================================ >> Email: iraicu at cs.uchicago.edu >> Web: http://www.cs.uchicago.edu/~iraicu >> http://dsl.cs.uchicago.edu/ >> ============================================ >> ============================================ >> > > > -- ============================================ Ioan Raicu Ph.D. Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From iraicu at cs.uchicago.edu Sun Oct 28 22:41:19 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Sun, 28 Oct 2007 22:41:19 -0500 Subject: [Swift-devel] excessive rate throttling for apparently temporally-restricted failures In-Reply-To: <1193627240.31045.34.camel@blabla.mcs.anl.gov> References: <4724A485.60906@cs.uchicago.edu> <4724A9D7.2010207@cs.uchicago.edu> <1193588147.15017.2.camel@blabla.mcs.anl.gov> <4724B78D.4010907@cs.uchicago.edu> <1193601091.22794.3.camel@blabla.mcs.anl.gov> <4724EB90.1060507@cs.uchicago.edu> <1193607722.25186.17.camel@blabla.mcs.anl.gov> <472531C2.6020206@cs.uchicago.edu> <1193626822.31045.31.camel@blabla.mcs.anl.gov> <1193627240.31045.34.camel@blabla.mcs.anl.gov> Message-ID: <4725565F.4030404@cs.uchicago.edu> If the knobs are all there, then I don't think there is an issue at the moment. I think this all started by Ben saying that there was excessive throttling due to the site scoring. Understanding how to fix the site scoring is one thing. Being able to disable site scoring is another, which seems to be there already. Ben, can you turn site scoring off, and see if that solves your problem for now? Ioan Mihael Hategan wrote: >>> For example, this means allowing the user to turn off site scoring. >>> >> Hmm. I think this went in the wrong direction. I assumed you knew that >> these things can be turned off >> > > And you say that in an earlier email (see below). > > I'm not following any more. What's the issue? > > >>>>>>>>>> On Sun, 28 Oct 2007, Ioan Raicu wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> they were due to the stale NFS handle error. I think Mihael outlined in an >>>>>>>>>>> email a while back how to disable the task submission throttling due to a bad >>>>>>>>>>> score, assuming that you have a single site to submit to anyways. >>>>>>>>>>> > > > > -- ============================================ Ioan Raicu Ph.D. 
Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Sun Oct 28 22:44:15 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 28 Oct 2007 22:44:15 -0500 Subject: [Swift-devel] excessive rate throttling for apparently temporally-restricted failures In-Reply-To: <472531C2.6020206@cs.uchicago.edu> References: <4724A485.60906@cs.uchicago.edu> <4724A9D7.2010207@cs.uchicago.edu> <1193588147.15017.2.camel@blabla.mcs.anl.gov> <4724B78D.4010907@cs.uchicago.edu> <1193601091.22794.3.camel@blabla.mcs.anl.gov> <4724EB90.1060507@cs.uchicago.edu> <1193607722.25186.17.camel@blabla.mcs.anl.gov> <472531C2.6020206@cs.uchicago.edu> Message-ID: <1193629455.888.6.camel@blabla.mcs.anl.gov> Btw, a gentle introduction to systems that have to deal with things changing in unpredictable ways: http://en.wikipedia.org/wiki/Control_theory There's all this crap Romanians as CE/CS majors have to take in college, including basic control theory. Though everybody can swear at the time by their beers that they'll never need it, sometimes it resurfaces in the strangest of places. On Sun, 2007-10-28 at 20:05 -0500, Ioan Raicu wrote: > This might be so, but when a user comes across behavior that is > significantly sub-optimal (such as sending few jobs that don't utilize > all the nodes at a site), they will want knobs to manually tune things > to be closer to optimal (in their opinion).
That said, you are > probably right that the default setting should be completely > automated, but there should be knobs that can be turned on, off, up, > down, etc... to allow the user to avoid the bad behavior. For > example, this means allowing the user to turn off site scoring. > > This is not the first time we are having this discussion, and I only > brought up these points again since Ben started up the discussion. I > think we all have our opinions, and in the end, I am not the one who > will be implementing these knobs, so feel free to do what you think is > best! > > Ioan > > Mihael Hategan wrote: > > On Sun, 2007-10-28 at 15:05 -0500, Ioan Raicu wrote: > > > > > But my argument was, and still is, if there is only one site to submit > > > to, changing situations are almost irrelevant, > > > > > > > Missed that. It is not irrelevant. The speed/capacity of a service is > > determined by: the jobs you submit, the jobs others submit, the specific > > type of hardware, and the load on the service node (and other things > > like network latency). The jobs other submit and the load on the service > > node vary with time. The bad thing about them is that it's hard to > > predict how they affect things. > > > > Furthermore, user specified rates suffer fundamentally from the problem > > of the user having to understand how the whole thing works and picking > > good values. What I've observed is that this doesn't work very well. > > > > > > > as there are no options anyhow. Give me one example, where you have > > > only 1 site, set X and Y properly, yet you need site scores as an > > > additional throttling mechanism! > > > > > > Mihael Hategan wrote: > > > > > > > On Sun, 2007-10-28 at 11:23 -0500, Ioan Raicu wrote: > > > > > > > > > > > > > I mentioned 2 throttling mechanisms, one is to have X outstanding jobs > > > > > at any given time (limits jobs in the queue), and Y jobs/sec > > > > > submit rate (limits the rate of submission). 
I believe both of these > > > > > throttling mechanisms could exist without computing site scores, > > > > > assuming the user knows what to set X and Y to. > > > > > > > > > > > > > > They do exist, but they don't deal with asymmetries between sites. Nor > > > > do they deal with changing situations. > > > > > > > > > > > > > > > > > Ioan > > > > > > > > > > Mihael Hategan wrote: > > > > > > > > > > > > > > > > On Sun, 2007-10-28 at 10:25 -0500, Ioan Raicu wrote: > > > > > > > > > > > > > > > > > > > > > > > > > Assuming you have a single site to submit to, then I don't see why you > > > > > > > don't want to disable the site scoring altogether? > > > > > > > > > > > > > > > > > > > > > > > > > > > Because having too many jobs on that one site may still cause problems. > > > > > > > > > > > > That said, the algorithm currently there needs some work. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Of course you still want throttling, but that is more on the level > > > > > > > of X outstanding jobs at any given time (and possibly Y jobs/sec > > > > > > > submit rate), so you don't overrun the LRM, but you would not want to > > > > > > > lower X to some low value just because some jobs are failing. Again, > > > > > > > once you go to multi-site runs, you need the site scoring to decide > > > > > > > among the different sites, but with a single site, I see no drawbacks > > > > > > > to disabling the site scoring mechanism. > > > > > > > > > > > > > > Ioan > > > > > > > > > > > > > > Ben Clifford wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Sun, 28 Oct 2007, Ioan Raicu wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > they were due to the stale NFS handle error. I think Mihael outlined in an > > > > > > > > > email a while back how to disable the task submission throttling due to a bad > > > > > > > > > score, assuming that you have a single site to submit to anyways. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I know how to disable it. I don't particularly want it running rate free. > > > > > > > > > > > > > > > > Whats happening here is that the feedback loop feeding back too much / too > > > > > > > > fast for the situation I experience. > > > > > > > > > > > > > > > > There's plenty of fun to be had experimenting there; and I suspect there > > > > > > > > will be no One True Rate Controller. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > ============================================ > > > > > > > Ioan Raicu > > > > > > > Ph.D. Student > > > > > > > ============================================ > > > > > > > Distributed Systems Laboratory > > > > > > > Computer Science Department > > > > > > > University of Chicago > > > > > > > 1100 E. 58th Street, Ryerson Hall > > > > > > > Chicago, IL 60637 > > > > > > > ============================================ > > > > > > > Email: iraicu at cs.uchicago.edu > > > > > > > Web: http://www.cs.uchicago.edu/~iraicu > > > > > > > http://dsl.cs.uchicago.edu/ > > > > > > > ============================================ > > > > > > > ============================================ > > > > > > > _______________________________________________ > > > > > > > Swift-devel mailing list > > > > > > > Swift-devel at ci.uchicago.edu > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > ============================================ > > > > > Ioan Raicu > > > > > Ph.D. Student > > > > > ============================================ > > > > > Distributed Systems Laboratory > > > > > Computer Science Department > > > > > University of Chicago > > > > > 1100 E. 
58th Street, Ryerson Hall > > > > > Chicago, IL 60637 > > > > > ============================================ > > > > > Email: iraicu at cs.uchicago.edu > > > > > Web: http://www.cs.uchicago.edu/~iraicu > > > > > http://dsl.cs.uchicago.edu/ > > > > > ============================================ > > > > > ============================================ > > > > > > > > > > > > > > > > > > > > > -- > > > ============================================ > > > Ioan Raicu > > > Ph.D. Student > > > ============================================ > > > Distributed Systems Laboratory > > > Computer Science Department > > > University of Chicago > > > 1100 E. 58th Street, Ryerson Hall > > > Chicago, IL 60637 > > > ============================================ > > > Email: iraicu at cs.uchicago.edu > > > Web: http://www.cs.uchicago.edu/~iraicu > > > http://dsl.cs.uchicago.edu/ > > > ============================================ > > > ============================================ > > > > > > > > > > > -- > ============================================ > Ioan Raicu > Ph.D. Student > ============================================ > Distributed Systems Laboratory > Computer Science Department > University of Chicago > 1100 E. 
58th Street, Ryerson Hall > Chicago, IL 60637 > ============================================ > Email: iraicu at cs.uchicago.edu > Web: http://www.cs.uchicago.edu/~iraicu > http://dsl.cs.uchicago.edu/ > ============================================ > ============================================ From hategan at mcs.anl.gov Sun Oct 28 22:46:47 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 28 Oct 2007 22:46:47 -0500 Subject: [Swift-devel] excessive rate throttling for apparently temporally-restricted failures In-Reply-To: <4725565F.4030404@cs.uchicago.edu> References: <4724A485.60906@cs.uchicago.edu> <4724A9D7.2010207@cs.uchicago.edu> <1193588147.15017.2.camel@blabla.mcs.anl.gov> <4724B78D.4010907@cs.uchicago.edu> <1193601091.22794.3.camel@blabla.mcs.anl.gov> <4724EB90.1060507@cs.uchicago.edu> <1193607722.25186.17.camel@blabla.mcs.anl.gov> <472531C2.6020206@cs.uchicago.edu> <1193626822.31045.31.camel@blabla.mcs.anl.gov> <1193627240.31045.34.camel@blabla.mcs.anl.gov> <4725565F.4030404@cs.uchicago.edu> Message-ID: <1193629607.888.9.camel@blabla.mcs.anl.gov> On Sun, 2007-10-28 at 22:41 -0500, Ioan Raicu wrote: > If the knobs are all there, then I don't think there is an issue at > the moment. I think this all started by Ben saying that there was > excessive throttling due to the site scoring. Understanding how to > fix the site scoring is one thing. Being able to disable site scoring > is another, which seems to be there already. Ben, can you turn site > scoring off, and see if that solves your problem for now? You can re-read Ben's earlier answer to your same question. I'll post it here: > > I know how to disable it. I don't particularly want it running rate free. > > Whats happening here is that the feedback loop feeding back too much / too > fast for the situation I experience. > > There's plenty of fun to be had experimenting there; and I suspect there > will be no One True Rate Controller. 
> > From iraicu at cs.uchicago.edu Sun Oct 28 22:48:35 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Sun, 28 Oct 2007 22:48:35 -0500 Subject: [Swift-devel] excessive rate throttling for apparently temporally-restricted failures In-Reply-To: <1193629607.888.9.camel@blabla.mcs.anl.gov> References: <4724A485.60906@cs.uchicago.edu> <4724A9D7.2010207@cs.uchicago.edu> <1193588147.15017.2.camel@blabla.mcs.anl.gov> <4724B78D.4010907@cs.uchicago.edu> <1193601091.22794.3.camel@blabla.mcs.anl.gov> <4724EB90.1060507@cs.uchicago.edu> <1193607722.25186.17.camel@blabla.mcs.anl.gov> <472531C2.6020206@cs.uchicago.edu> <1193626822.31045.31.camel@blabla.mcs.anl.gov> <1193627240.31045.34.camel@blabla.mcs.anl.gov> <4725565F.4030404@cs.uchicago.edu> <1193629607.888.9.camel@blabla.mcs.anl.gov> Message-ID: <47255813.2090603@cs.uchicago.edu> Right, I now remember reading that... too many emails, and our discussion got side-tracked :) Thanks for the control theory link, it looks like a good read! Ioan Mihael Hategan wrote: > On Sun, 2007-10-28 at 22:41 -0500, Ioan Raicu wrote: > >> If the knobs are all there, then I don't think there is an issue at >> the moment. I think this all started by Ben saying that there was >> excessive throttling due to the site scoring. Understanding how to >> fix the site scoring is one thing. Being able to disable site scoring >> is another, which seems to be there already. Ben, can you turn site >> scoring off, and see if that solves your problem for now? >> > > You can re-read Ben's earlier answer to your same question. I'll post it > here: > > >> I know how to disable it. I don't particularly want it running rate free. >> >> Whats happening here is that the feedback loop feeding back too much / too >> fast for the situation I experience. >> >> There's plenty of fun to be had experimenting there; and I suspect there >> will be no One True Rate Controller. >> >> >> > > > > -- ============================================ Ioan Raicu Ph.D. 
Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Sun Oct 28 22:53:08 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 28 Oct 2007 22:53:08 -0500 Subject: [Swift-devel] excessive rate throttling for apparently temporally-restricted failures In-Reply-To: <47255813.2090603@cs.uchicago.edu> References: <4724A485.60906@cs.uchicago.edu> <4724A9D7.2010207@cs.uchicago.edu> <1193588147.15017.2.camel@blabla.mcs.anl.gov> <4724B78D.4010907@cs.uchicago.edu> <1193601091.22794.3.camel@blabla.mcs.anl.gov> <4724EB90.1060507@cs.uchicago.edu> <1193607722.25186.17.camel@blabla.mcs.anl.gov> <472531C2.6020206@cs.uchicago.edu> <1193626822.31045.31.camel@blabla.mcs.anl.gov> <1193627240.31045.34.camel@blabla.mcs.anl.gov> <4725565F.4030404@cs.uchicago.edu> <1193629607.888.9.camel@blabla.mcs.anl.gov> <47255813.2090603@cs.uchicago.edu> Message-ID: <1193629988.888.12.camel@blabla.mcs.anl.gov> For some reason this quote from that article seems particularly relevant (in that it shows how similar the problem is): A simple way to implement cruise control is to lock the throttle position when the driver engages cruise control. However, on hilly terrain, the vehicle will slow down going uphill and accelerate going downhill. This type of controller is called an open-loop controller because there is no direct connection between the output of the system (the engine torque) and its input (the throttle position).
In a closed-loop control system, a feedback controller monitors the output (the vehicle's speed) and adjusts the control input (the throttle) as necessary to keep the control error to a minimum (to maintain the desired speed). This feedback dynamically compensates for disturbances to the system, such as changes in slope of the ground or wind speed. On Sun, 2007-10-28 at 22:48 -0500, Ioan Raicu wrote: > Right, I now remember reading that... too many emails, and our > discussion got side-tracked :) > Thanks for the control theory link, it looks like a good read! > > Ioan > > Mihael Hategan wrote: > > On Sun, 2007-10-28 at 22:41 -0500, Ioan Raicu wrote: > > > > > If the knobs are all there, then I don't think there is an issue at > > > the moment. I think this all started by Ben saying that there was > > > excessive throttling due to the site scoring. Understanding how to > > > fix the site scoring is one thing. Being able to disable site scoring > > > is another, which seems to be there already. Ben, can you turn site > > > scoring off, and see if that solves your problem for now? > > > > > > > You can re-read Ben's earlier answer to your same question. I'll post it > > here: > > > > > > > I know how to disable it. I don't particularly want it running rate free. > > > > > > Whats happening here is that the feedback loop feeding back too much / too > > > fast for the situation I experience. > > > > > > There's plenty of fun to be had experimenting there; and I suspect there > > > will be no One True Rate Controller. > > > > > > > > > > > > > > > > > > > -- > ============================================ > Ioan Raicu > Ph.D. Student > ============================================ > Distributed Systems Laboratory > Computer Science Department > University of Chicago > 1100 E. 
58th Street, Ryerson Hall > Chicago, IL 60637 > ============================================ > Email: iraicu at cs.uchicago.edu > Web: http://www.cs.uchicago.edu/~iraicu > http://dsl.cs.uchicago.edu/ > ============================================ > ============================================ From iraicu at cs.uchicago.edu Mon Oct 29 00:02:23 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Mon, 29 Oct 2007 00:02:23 -0500 Subject: [Swift-devel] excessive rate throttling for apparently temporally-restricted failures In-Reply-To: <1193629988.888.12.camel@blabla.mcs.anl.gov> References: <4724A485.60906@cs.uchicago.edu> <4724A9D7.2010207@cs.uchicago.edu> <1193588147.15017.2.camel@blabla.mcs.anl.gov> <4724B78D.4010907@cs.uchicago.edu> <1193601091.22794.3.camel@blabla.mcs.anl.gov> <4724EB90.1060507@cs.uchicago.edu> <1193607722.25186.17.camel@blabla.mcs.anl.gov> <472531C2.6020206@cs.uchicago.edu> <1193626822.31045.31.camel@blabla.mcs.anl.gov> <1193627240.31045.34.camel@blabla.mcs.anl.gov> <4725565F.4030404@cs.uchicago.edu> <1193629607.888.9.camel@blabla.mcs.anl.gov> <47255813.2090603@cs.uchicago.edu> <1193629988.888.12.camel@blabla.mcs.anl.gov> Message-ID: <4725695F.9000404@cs.uchicago.edu> One thing that I noticed is that the site score is quick to react to failed jobs, but slow to react to successful jobs, to the point that things take a long time to recover once some rough waters were encountered. Using the car analogy from below, it would be like coming to a downward slope where the cruise control adjusts the throttle position from say 25% to 5% while on the downward slope, but then when it gets back on flat ground, not going back to 25% for a long time due to the previous downward slope. 
Basically, I think the algorithm memory might need some tuning, maybe using a window-based memory (as opposed to the entire history of memory), or perhaps give higher weight to more recent events, weight events according to their execution times, reward more successive good jobs to allow the system to get back a high score faster if jobs keep completing successfully, etc... certainly lots of things to try out! Mihael Hategan wrote: > For some reason this quote from that article seems particularly > relevant (in that it shows how similar the problem is): > > A simple way to implement cruise control is to lock the throttle > position when the driver engages cruise control. However, on hilly > terrain, the vehicle will slow down going uphill and accelerate going > downhill. This type of controller is called an open-loop controller > because there is no direct connection between the output of the system > (the engine torque) and its input (the throttle position). > > In a closed-loop control system, a feedback controller monitors the > output (the vehicle's speed) and adjusts the control input (the > throttle) as necessary to keep the control error to a minimum (to > maintain the desired speed). This feedback dynamically compensates for > disturbances to the system, such as changes in slope of the ground or > wind speed. > > > On Sun, 2007-10-28 at 22:48 -0500, Ioan Raicu wrote: > >> Right, I now remember reading that... too many emails, and our >> discussion got side-tracked :) >> Thanks for the control theory link, it looks like a good read! >> >> Ioan >> >> Mihael Hategan wrote: >> >>> On Sun, 2007-10-28 at 22:41 -0500, Ioan Raicu wrote: >>> >>> >>>> If the knobs are all there, then I don't think there is an issue at >>>> the moment. I think this all started by Ben saying that there was >>>> excessive throttling due to the site scoring. Understanding how to >>>> fix the site scoring is one thing.
Being able to disable site scoring >>>> is another, which seems to be there already. Ben, can you turn site >>>> scoring off, and see if that solves your problem for now? >>>> >>>> >>> You can re-read Ben's earlier answer to your same question. I'll post it >>> here: >>> >>> >>> >>>> I know how to disable it. I don't particularly want it running rate free. >>>> >>>> Whats happening here is that the feedback loop feeding back too much / too >>>> fast for the situation I experience. >>>> >>>> There's plenty of fun to be had experimenting there; and I suspect there >>>> will be no One True Rate Controller. >>>> >>>> >>>> >>>> >>> >>> >>> >> -- >> ============================================ >> Ioan Raicu >> Ph.D. Student >> ============================================ >> Distributed Systems Laboratory >> Computer Science Department >> University of Chicago >> 1100 E. 58th Street, Ryerson Hall >> Chicago, IL 60637 >> ============================================ >> Email: iraicu at cs.uchicago.edu >> Web: http://www.cs.uchicago.edu/~iraicu >> http://dsl.cs.uchicago.edu/ >> ============================================ >> ============================================ >> > > > -- ============================================ Ioan Raicu Ph.D. Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ -------------- next part -------------- An HTML attachment was scrubbed... 
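[Editorial note: Ioan's suggestion above of a windowed or recency-weighted score memory could be sketched as an exponential moving average of job outcomes. This is a hypothetical illustration, not the scheduler's actual scoring code; `alpha` and the initial score are assumed values.]

```python
class SiteScore:
    """Recency-weighted site score: an exponential moving average of job
    outcomes (1.0 = success, 0.0 = failure). Unlike a whole-history
    average, old failures decay away at rate `alpha`, so the score
    recovers quickly once jobs start succeeding again."""

    def __init__(self, alpha: float = 0.2, initial: float = 0.5):
        self.alpha = alpha
        self.score = initial

    def record(self, success: bool) -> float:
        outcome = 1.0 if success else 0.0
        # move the score a fraction alpha of the way toward the new outcome
        self.score += self.alpha * (outcome - self.score)
        return self.score

s = SiteScore()
for _ in range(10):
    s.record(False)       # rough waters: score decays toward 0
print(round(s.score, 3))  # 0.054
for _ in range(10):
    s.record(True)        # recovery: recent successes dominate quickly
print(round(s.score, 3))  # 0.898
```

A sliding-window average of the last N outcomes would behave similarly; the EMA is just the cheapest-to-store variant of "give higher weight to more recent events".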
URL: From hategan at mcs.anl.gov Mon Oct 29 00:22:39 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 29 Oct 2007 00:22:39 -0500 Subject: [Swift-devel] excessive rate throttling for apparently temporally-restricted failures In-Reply-To: <4725695F.9000404@cs.uchicago.edu> References: <4724A485.60906@cs.uchicago.edu> <4724A9D7.2010207@cs.uchicago.edu> <1193588147.15017.2.camel@blabla.mcs.anl.gov> <4724B78D.4010907@cs.uchicago.edu> <1193601091.22794.3.camel@blabla.mcs.anl.gov> <4724EB90.1060507@cs.uchicago.edu> <1193607722.25186.17.camel@blabla.mcs.anl.gov> <472531C2.6020206@cs.uchicago.edu> <1193626822.31045.31.camel@blabla.mcs.anl.gov> <1193627240.31045.34.camel@blabla.mcs.anl.gov> <4725565F.4030404@cs.uchicago.edu> <1193629607.888.9.camel@blabla.mcs.anl.gov> <47255813.2090603@cs.uchicago.edu> <1193629988.888.12.camel@blabla.mcs.anl.gov> <4725695F.9000404@cs.uchicago.edu> Message-ID: <1193635359.4267.16.camel@blabla.mcs.anl.gov> On Mon, 2007-10-29 at 00:02 -0500, Ioan Raicu wrote: > One thing that I noticed is that the site score is quick to react to > failed jobs, but slow to react to successful jobs, to the point that > things take a long time to recover once some rough waters were > encountered. Using the car analogy from below, it would be like coming > to a downward slope where the cruise control adjusts the throttle > position from say 25% to 5% while on the downward slope, but then when > it gets back on flat ground, not going back to 25% for a long time due > to the previous downward slope. Basically, I think the algorithm > memory might need some tunning, maybe using a window based memory (as > opposed to the entire history of memory), or perhaps give higher > weight to more recent events, weight events according to their > execution times, reward more successive good jobs to allow the system > to get back a high score faster if jobs keep completing successfully, > etc... certainly lots of things to try out! Yep. It is a very interesting thing. 
It is intentional that bad jobs affect the score differently from good jobs. Basically I don't want a site with 50% reliability to keep a constant score. That screws the retries. That ratio basically defines the reliability goal. 1/4 yields an 80% target reliability. With 4 restarts that's about 99.9% reliability, while the 50% case only gives 93% after 4 restarts. But this assumes the number of concurrent jobs on a site determines the reliability, which I think is only a rough approximation. But yes, I think it should somehow integrate time dependence better. In this particular case it should actually account for the fact that a cluster failing should only register as a single job failing for scoring purposes. Anyway. Lots of refinements can be done here. Know any PhD student interested? > > Mihael Hategan wrote: > > For some reason this quote from that article seems particularly > > relevant (in that it shows how similar the problem is): > > > > A simple way to implement cruise control is to lock the throttle > > position when the driver engages cruise control. However, on hilly > > terrain, the vehicle will slow down going uphill and accelerate going > > downhill. This type of controller is called an open-loop controller > > because there is no direct connection between the output of the system > > (the engine torque) and its input (the throttle position). > > > > In a closed-loop control system, a feedback controller monitors the > > output (the vehicle's speed) and adjusts the control input (the > > throttle) as necessary to keep the control error to a minimum (to > > maintain the desired speed). This feedback dynamically compensates for > > disturbances to the system, such as changes in slope of the ground or > > wind speed. > > > > > > On Sun, 2007-10-28 at 22:48 -0500, Ioan Raicu wrote: > > > > > Right, I now remember reading that... too many emails, and our > > > discussion got side-tracked :) > > > Thanks for the control theory link, it looks like a good read!
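[Editorial note: the retry arithmetic in Mihael's paragraph checks out if "4 restarts" is read as four independent attempts. A quick sanity check, with an illustrative helper that is not part of Swift:]

```python
def overall_reliability(per_attempt: float, attempts: int = 4) -> float:
    """Chance that at least one of `attempts` independent tries succeeds,
    given per-attempt success probability `per_attempt`."""
    return 1.0 - (1.0 - per_attempt) ** attempts

# the 1/4 failure ratio targets 80% per-attempt reliability
print(round(overall_reliability(0.80), 4))  # 0.9984 -> "about 99.9%"
# a 50%-reliable site gains far less from the same retries
print(round(overall_reliability(0.50), 4))  # 0.9375 -> "93%"
```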
> > > > > > Ioan > > > > > > Mihael Hategan wrote: > > > > > > > On Sun, 2007-10-28 at 22:41 -0500, Ioan Raicu wrote: > > > > > > > > > > > > > If the knobs are all there, then I don't think there is an issue at > > > > > the moment. I think this all started by Ben saying that there was > > > > > excessive throttling due to the site scoring. Understanding how to > > > > > fix the site scoring is one thing. Being able to disable site scoring > > > > > is another, which seems to be there already. Ben, can you turn site > > > > > scoring off, and see if that solves your problem for now? > > > > > > > > > > > > > > You can re-read Ben's earlier answer to your same question. I'll post it > > > > here: > > > > > > > > > > > > > > > > > I know how to disable it. I don't particularly want it running rate free. > > > > > > > > > > Whats happening here is that the feedback loop feeding back too much / too > > > > > fast for the situation I experience. > > > > > > > > > > There's plenty of fun to be had experimenting there; and I suspect there > > > > > will be no One True Rate Controller. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > ============================================ > > > Ioan Raicu > > > Ph.D. Student > > > ============================================ > > > Distributed Systems Laboratory > > > Computer Science Department > > > University of Chicago > > > 1100 E. 58th Street, Ryerson Hall > > > Chicago, IL 60637 > > > ============================================ > > > Email: iraicu at cs.uchicago.edu > > > Web: http://www.cs.uchicago.edu/~iraicu > > > http://dsl.cs.uchicago.edu/ > > > ============================================ > > > ============================================ > > > > > > > > > > > -- > ============================================ > Ioan Raicu > Ph.D. Student > ============================================ > Distributed Systems Laboratory > Computer Science Department > University of Chicago > 1100 E. 
58th Street, Ryerson Hall > Chicago, IL 60637 > ============================================ > Email: iraicu at cs.uchicago.edu > Web: http://www.cs.uchicago.edu/~iraicu > http://dsl.cs.uchicago.edu/ > ============================================ > ============================================ From hategan at mcs.anl.gov Mon Oct 29 01:07:44 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 29 Oct 2007 01:07:44 -0500 Subject: [Swift-devel] excessive rate throttling for apparently temporally-restricted failures In-Reply-To: <1193635359.4267.16.camel@blabla.mcs.anl.gov> References: <4724A485.60906@cs.uchicago.edu> <4724A9D7.2010207@cs.uchicago.edu> <1193588147.15017.2.camel@blabla.mcs.anl.gov> <4724B78D.4010907@cs.uchicago.edu> <1193601091.22794.3.camel@blabla.mcs.anl.gov> <4724EB90.1060507@cs.uchicago.edu> <1193607722.25186.17.camel@blabla.mcs.anl.gov> <472531C2.6020206@cs.uchicago.edu> <1193626822.31045.31.camel@blabla.mcs.anl.gov> <1193627240.31045.34.camel@blabla.mcs.anl.gov> <4725565F.4030404@cs.uchicago.edu> <1193629607.888.9.camel@blabla.mcs.anl.gov> <47255813.2090603@cs.uchicago.edu> <1193629988.888.12.camel@blabla.mcs.anl.gov> <4725695F.9000404@cs.uchicago.edu> <1193635359.4267.16.camel@blabla.mcs.anl.gov> Message-ID: <1193638064.5148.32.camel@blabla.mcs.anl.gov> On Mon, 2007-10-29 at 00:22 -0500, Mihael Hategan wrote: > On Mon, 2007-10-29 at 00:02 -0500, Ioan Raicu wrote: > > One thing that I noticed is that the site score is quick to react to > > failed jobs, but slow to react to successful jobs, to the point that > > things take a long time to recover once some rough waters were > > encountered. Using the car analogy from below, it would be like coming > > to a downward slope where the cruise control adjusts the throttle > > position from say 25% to 5% while on the downward slope, but then when > > it gets back on flat ground, not going back to 25% for a long time due > > to the previous downward slope. 
Basically, I think the algorithm > > memory might need some tuning, maybe using a window-based memory (as > > opposed to the entire history of memory), or perhaps give higher > > weight to more recent events, weight events according to their > > execution times, reward more successive good jobs to allow the system > > to get back a high score faster if jobs keep completing successfully, > > etc... certainly lots of things to try out! > > Yep. It is a very interesting thing. > > It is intentional that bad jobs affect the score differently from good jobs. > Basically I don't want a site with 50% reliability to keep a constant > score. That screws the retries. That ratio basically defines the > reliability goal. 1/4 yields an 80% target reliability. With 4 restarts > that's about 99.9% reliability, while the 50% case only gives 93% after > 4 restarts. But this assumes the number of concurrent jobs on a site > determines the reliability, which I think is only a rough approximation.

Also the value used for throttling isn't linear. It's something like e^(B*arctan(C*score)), where B and C are constants empirically determined. Try plotting it with gnuplot to see what it looks like*. This is there to satisfy some things:

- Stability (output being bound - this function leads to stronger than BIBO stability in process control because it's still bound for infinite input; therefore it may be too strict and one source of problems)
- Tweakability of the first derivative around 0. This basically dictates how the output grows around the origin (i.e. when the workflow starts, how do we allow the throttle to grow)
- Continuity (I think this is there to make sure there are no crazy oscillations, although it's a bit silly because this is rather a discrete time system so it may happen anyway)
- Continuity of the first derivative (just felt elegant, so it probably has some meaning).
- Lower bound strictly positive (in this case it's actually 1/upper_bound) - i.e. we always leave some small odds that a job will eventually be sent to a site to allow it to increase its score.

Anyway, this is rather rudimentary, but somewhat effective. I think one problem is that this is only indirectly part of the feedback loop (in that it only affects the throttling, not the state/score). The real way to do this is to specify the whole system as accurately as possible and actually model the transfer function, but that's done with nasty mathematics for which I didn't have the mood at the time, nor the appropriate recollection (if I ever had that knowledge). That or Matlab/Simulink.

(*) C = 0.2, B = 2 * ln(T)/PI, where T = 100 (the upper bound and the inverse of the lower bound).

> > But yes, I think it should somehow integrate time dependence better. In > this particular case it should actually account for the fact that a > cluster failing should only register as a single job failing for scoring > purposes. > > Anyway. Lots of refinements can be done here. Know any PhD student > interested? > > > > > Mihael Hategan wrote: > > > For some reason this quote from that article seems particularly > > > relevant (in that it shows how similar the problem is): > > > > > > A simple way to implement cruise control is to lock the throttle > > > position when the driver engages cruise control. However, on hilly > > > terrain, the vehicle will slow down going uphill and accelerate going > > > downhill. This type of controller is called an open-loop controller > > > because there is no direct connection between the output of the system > > > (the engine torque) and its input (the throttle position). > > > > > > In a closed-loop control system, a feedback controller monitors the > > > output (the vehicle's speed) and adjusts the control input (the > > > throttle) as necessary to keep the control error to a minimum (to > > > maintain the desired speed).
This feedback dynamically compensates for > > > disturbances to the system, such as changes in slope of the ground or > > > wind speed. > > > > > > > > > On Sun, 2007-10-28 at 22:48 -0500, Ioan Raicu wrote: > > > > > > > Right, I now remember reading that... too many emails, and our > > > > discussion got side-tracked :) > > > > Thanks for the control theory link, it looks like a good read! > > > > > > > > Ioan > > > > > > > > Mihael Hategan wrote: > > > > > > > > > On Sun, 2007-10-28 at 22:41 -0500, Ioan Raicu wrote: > > > > > > > > > > > > > > > > If the knobs are all there, then I don't think there is an issue at > > > > > > the moment. I think this all started by Ben saying that there was > > > > > > excessive throttling due to the site scoring. Understanding how to > > > > > > fix the site scoring is one thing. Being able to disable site scoring > > > > > > is another, which seems to be there already. Ben, can you turn site > > > > > > scoring off, and see if that solves your problem for now? > > > > > > > > > > > > > > > > > You can re-read Ben's earlier answer to your same question. I'll post it > > > > > here: > > > > > > > > > > > > > > > > > > > > > I know how to disable it. I don't particularly want it running rate free. > > > > > > > > > > > > Whats happening here is that the feedback loop feeding back too much / too > > > > > > fast for the situation I experience. > > > > > > > > > > > > There's plenty of fun to be had experimenting there; and I suspect there > > > > > > will be no One True Rate Controller. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > ============================================ > > > > Ioan Raicu > > > > Ph.D. Student > > > > ============================================ > > > > Distributed Systems Laboratory > > > > Computer Science Department > > > > University of Chicago > > > > 1100 E. 
58th Street, Ryerson Hall > > > > Chicago, IL 60637 > > > > ============================================ > > > > Email: iraicu at cs.uchicago.edu > > > > Web: http://www.cs.uchicago.edu/~iraicu > > > > http://dsl.cs.uchicago.edu/ > > > > ============================================ > > > > ============================================ > > > > > > > > > > > > > > > > > -- > > ============================================ > > Ioan Raicu > > Ph.D. Student > > ============================================ > > Distributed Systems Laboratory > > Computer Science Department > > University of Chicago > > 1100 E. 58th Street, Ryerson Hall > > Chicago, IL 60637 > > ============================================ > > Email: iraicu at cs.uchicago.edu > > Web: http://www.cs.uchicago.edu/~iraicu > > http://dsl.cs.uchicago.edu/ > > ============================================ > > ============================================ > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From benc at hawaga.org.uk Mon Oct 29 03:47:04 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 29 Oct 2007 08:47:04 +0000 (GMT) Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: <4725113B.5070909@mcs.anl.gov> References: <472386D6.2020707@mcs.anl.gov> <1193511485.27417.52.camel@blabla.mcs.anl.gov> <472509FF.5050608@mcs.anl.gov> <1193610473.28024.7.camel@blabla.mcs.anl.gov> <4725113B.5070909@mcs.anl.gov> Message-ID: On Sun, 28 Oct 2007, Michael Wilde wrote: > Workflow IDs dont need to be unique outside of a user or group. The way I've been thinking things would work with log file names (which to an extent overlaps with workflow IDs) is this: * Swift generates a log file name by default that is very unique (i.e. 
its present format is workflow name + timestamp + random) * The log file name can be overridden with the -log command line option (which was broken but I fixed it in r1357) * To get domain-specific log file naming with your own uniqueness rules (eg. a sequence number), use -log to specify that. I think the present log naming is a good way to name things in the absence of any domain-specific naming strategy; and I think -log is a good way for a domain specific naming strategy to be plugged in. -- From benc at hawaga.org.uk Mon Oct 29 05:24:10 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 29 Oct 2007 10:24:10 +0000 (GMT) Subject: [Swift-devel] what's new web events In-Reply-To: <4724B7F8.5060108@cs.uchicago.edu> References: <1193588235.15017.4.camel@blabla.mcs.anl.gov> <4724B7F8.5060108@cs.uchicago.edu> Message-ID: added. On Sun, 28 Oct 2007, Ioan Raicu wrote: > The Falkon talk will have some Swift related slides as well, maybe you want to > link to that as well > (http://sc07.supercomputing.org/schedule/event_detail.php?evid=11098). > > Ioan > > Ben Clifford wrote: > > On Sun, 28 Oct 2007, Mihael Hategan wrote: > > > > > > > "You can find Swift will in several places at SC07 in Reno, Nevada" > > > > > > Hmm? > > > > > > > heh, my usual missing verb problem. > > > > > > > On Sun, 2007-10-28 at 12:45 +0000, Ben Clifford wrote: > > > > > > > I updated the front web page so that the what's new section lists events > > > > at SC07, rather than being a link to the quickstart guide. > > > > > > > > The events listed are the analytics tutorial, the analytics challenge > > > > that Mike has been working on with Bob Grossman, and the SC07 booth > > > > presentation. > > > > > > > > If there's anything else going on, I can add it or you can too, in > > > > www/inc/home_sidebar.php in the SVN. 
> > > > > > > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > From itf at mcs.anl.gov Mon Oct 29 07:35:42 2007 From: itf at mcs.anl.gov (=?utf-8?B?SWFuIEZvc3Rlcg==?=) Date: Mon, 29 Oct 2007 12:35:42 +0000 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: References: <472386D6.2020707@mcs.anl.gov> <1193511485.27417.52.camel@blabla.mcs.anl.gov> <472509FF.5050608@mcs.anl.gov><1193610473.28024.7.camel@blabla.mcs.anl.gov><4725113B.5070909@mcs.anl.gov> Message-ID: <550138407-1193661416-cardhu_decombobulator_blackberry.rim.net-400906280-@bxe030.bisx.prod.on.blackberry> If they are not globally unique, don't we have problems when we combine logs from multiple sources? Sent via BlackBerry from T-Mobile -----Original Message----- From: Ben Clifford Date: Mon, 29 Oct 2007 08:47:04 To:Michael Wilde Cc:swiftdevel Subject: Re: [Swift-devel] Clustering and Temp Dirs with Swift On Sun, 28 Oct 2007, Michael Wilde wrote: > Workflow IDs dont need to be unique outside of a user or group. The way I've been thinking things would work with log file names (which to an extent overlaps with workflow IDs) is this: * Swift generates a log file name by default that is very unique (i.e. its present format is workflow name + timestamp + random) * The log file name can be overridden with the -log command line option (which was broken but I fixed it in r1357) * To get domain-specific log file naming with your own uniqueness rules (eg. a sequence number), use -log to specify that. I think the present log naming is a good way to name things in the absence of any domain-specific naming strategy; and I think -log is a good way for a domain specific naming strategy to be plugged in. 
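[The default naming scheme described above (workflow name + timestamp + random) can be sketched roughly as follows; the helper name, timestamp format, and suffix alphabet here are illustrative guesses, not Swift's actual implementation:]

```python
import random
import time

def default_log_name(workflow):
    # workflow name + timestamp + random suffix, in the spirit of the
    # default naming described above; the details are made up.
    stamp = time.strftime("%Y%m%d-%H%M")
    suffix = "".join(random.choice("0123456789abcdefghijklmnopqrstuvwxyz")
                     for _ in range(8))
    return "%s-%s-%s" % (workflow, stamp, suffix)

print(default_log_name("Windowlicker"))
# same general shape as real run names like Windowlicker-20071029-1043-1zhh40ob
```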
-- _______________________________________________ Swift-devel mailing list Swift-devel at ci.uchicago.edu http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From benc at hawaga.org.uk Mon Oct 29 07:43:59 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 29 Oct 2007 12:43:59 +0000 (GMT) Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: <550138407-1193661416-cardhu_decombobulator_blackberry.rim.net-400906280-@bxe030.bisx.prod.on.blackberry> References: <472386D6.2020707@mcs.anl.gov> <1193511485.27417.52.camel@blabla.mcs.anl.gov> <472509FF.5050608@mcs.anl.gov><1193610473.28024.7.camel@blabla.mcs.anl.gov><4725113B.5070909@mcs.anl.gov> <550138407-1193661416-cardhu_decombobulator_blackberry.rim.net-400906280-@bxe030.bisx.prod.on.blackberry> Message-ID: On Mon, 29 Oct 2007, Ian Foster wrote: > If they are not globally unique, don't we have problems when we combine > logs from multiple sources? yes. like i said in the mail that you quoted, swift will come up with an identifier that is almost definitely globally unique. you can make your own unique identifier space, using -log. if it happens to not be unique, then you didn't design your unique identifier space very well. we can't protect against that. -- From benc at hawaga.org.uk Mon Oct 29 07:56:37 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 29 Oct 2007 12:56:37 +0000 (GMT) Subject: [Swift-devel] wrapper.sh logging Message-ID: As of r1406, there's a new config option 'sitedir.keep' which defaults to false (preserving existing behaviour). If this option is set to true, then swift will not clean up the site working directory at the end of a workflow. This will leave info/ records in place rather than deleting them. The info records now contain timestamps and state information, and the log-processing code contains some more stuff to graph that information. 
-- From wilde at mcs.anl.gov Mon Oct 29 08:37:15 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 29 Oct 2007 08:37:15 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: References: <472386D6.2020707@mcs.anl.gov> <1193511485.27417.52.camel@blabla.mcs.anl.gov> <472509FF.5050608@mcs.anl.gov><1193610473.28024.7.camel@blabla.mcs.anl.gov><4725113B.5070909@mcs.anl.gov> <550138407-1193661416-cardhu_decombobulator_blackberry.rim.net-400906280-@bxe030.bisx.prod.on.blackberry> Message-ID: <4725E20B.5070706@mcs.anl.gov> I think this approach as outlined in Bens earlier email is fine. After lots of experience we'll learn what a good default is. The current default is good for now. On 10/29/07 7:43 AM, Ben Clifford wrote: > On Mon, 29 Oct 2007, Ian Foster wrote: > >> If they are not globally unique, don't we have problems when we combine >> logs from multiple sources? > > yes. > > like i said in the mail that you quoted, swift will come up with an > identifier that is almost definitely globally unique. > > you can make your own unique identifier space, using -log. > > if it happens to not be unique, then you didn't design your unique > identifier space very well. we can't protect against that. > From wilde at mcs.anl.gov Mon Oct 29 08:38:44 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 29 Oct 2007 08:38:44 -0500 Subject: [Swift-devel] wrapper.sh logging In-Reply-To: References: Message-ID: <4725E264.8000509@mcs.anl.gov> Sounds great. Should Andrew re-run a few good-sized workflows to generate info/ dirs with timestamps, so we can zero in on the operation(s) that are taking the most time? On 10/29/07 7:56 AM, Ben Clifford wrote: > As of r1406, there's a new config option 'sitedir.keep' which defaults > to false (preserving existing behaviour). > > If this option is set to true, then swift will not clean up the site > working directory at the end of a workflow. This will leave info/ records > in place rather than deleting them. 
> > The info records now contain timestamps and state information, and the > log-processing code contains some more stuff to graph that information. > From benc at hawaga.org.uk Mon Oct 29 08:39:34 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 29 Oct 2007 13:39:34 +0000 (GMT) Subject: [Swift-devel] wrapper.sh logging In-Reply-To: <4725E264.8000509@mcs.anl.gov> References: <4725E264.8000509@mcs.anl.gov> Message-ID: On Mon, 29 Oct 2007, Michael Wilde wrote: > Should Andrew re-run a few good-sized workflows to generate info/ dirs with > timestamps, so we can zero in on the operation(s) that are taking the most > time? A workflow of the size that was showing the suspect symptoms before would be good. -- From wilde at mcs.anl.gov Mon Oct 29 08:47:19 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 29 Oct 2007 08:47:19 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: <550138407-1193661416-cardhu_decombobulator_blackberry.rim.net-400906280-@bxe030.bisx.prod.on.blackberry> References: <472386D6.2020707@mcs.anl.gov> <1193511485.27417.52.camel@blabla.mcs.anl.gov> <472509FF.5050608@mcs.anl.gov><1193610473.28024.7.camel@blabla.mcs.anl.gov><4725113B.5070909@mcs.anl.gov> <550138407-1193661416-cardhu_decombobulator_blackberry.rim.net-400906280-@bxe030.bisx.prod.on.blackberry> Message-ID: <4725E467.3020403@mcs.anl.gov> I was suggesting that workflow IDs get their global uniqueness via a composite name, not a single globally unique GUID. As we collect data in a central place, I envision a hierarchy of $SWIFT_LOGS/project/user/submithost/workflow-run/intermediate-dirs/objects (or something similar) This hierarchy doesn't have to be consistent or constant, as long as there is a well-defined notion of a workflow's "run directory" and the path to each run dir is unique. Then the log processor will find everything. As a user, having to constantly work in a space of "dense" unique names is hard - it's a source of cognitive dissonance.
If the system would give me a choice of using simpler name, nicely balance my files over directories for performance, and accept my log data for analysis, that would be great. But most important is that it work well and fast. Given a choice, I'd much rather work using the current "dissonant" names than not work. So my comments on naming are a minor issue and we can put them aside for now. (I will try harder to stop talking about this ;) We're currently focusing on solving the performance problems and continually enhancing the log processing for analysis (related). We should keep doing that, and can review our file-naming issues in a few months from now, unless naming changes are needed for directory balancing. - Mike On 10/29/07 7:35 AM, Ian Foster wrote: > If they are not globally unique, don't we have problems when we combine logs from multiple sources? > > Sent via BlackBerry from T-Mobile > > -----Original Message----- > From: Ben Clifford > > Date: Mon, 29 Oct 2007 08:47:04 > To:Michael Wilde > Cc:swiftdevel > Subject: Re: [Swift-devel] Clustering and Temp Dirs with Swift > > > > On Sun, 28 Oct 2007, Michael Wilde wrote: > >> Workflow IDs dont need to be unique outside of a user or group. > > The way I've been thinking things would work with log file names (which to > an extent overlaps with workflow IDs) is this: > > * Swift generates a log file name by default that is very unique > (i.e. its present format is workflow name + timestamp + random) > > * The log file name can be overridden with the -log command line option > (which was broken but I fixed it in r1357) > > * To get domain-specific log file naming with your own > uniqueness rules (eg. a sequence number), use -log > to specify that. > > I think the present log naming is a good way to name things in the absence > of any domain-specific naming strategy; and I think -log is a good way for > a domain specific naming strategy to be plugged in. 
> From foster at mcs.anl.gov Mon Oct 29 09:17:50 2007 From: foster at mcs.anl.gov (Ian Foster) Date: Mon, 29 Oct 2007 09:17:50 -0500 Subject: [Swift-devel] Clustering and Temp Dirs with Swift In-Reply-To: <4725E467.3020403@mcs.anl.gov> References: <472386D6.2020707@mcs.anl.gov> <1193511485.27417.52.camel@blabla.mcs.anl.gov> <472509FF.5050608@mcs.anl.gov><1193610473.28024.7.camel@blabla.mcs.anl.gov><4725113B.5070909@mcs.anl.gov> <550138407-1193661416-cardhu_decombobulator_blackberry.rim.net-400906280-@bxe030.bisx.prod.on.blackberry> <4725E467.3020403@mcs.anl.gov> Message-ID: <4725EB8E.806@mcs.anl.gov> unique + user-controlled (two parts) sounds like a good idea, I understand the motivation Michael Wilde wrote: > I was suggesting that workflow IDs get their global uniqueness via a > composite name, not a single globally unique GUID. > > As we collect data in a central place, I envision a hierarchy of > $SWIFT_LOGS/project/user/submithost/workflow-run/intermediate-dirs/objects > > > (or something similar) > > This hierarchy doesn't have to be consistent or constant, as long as > there is a well-defined notion of a workflow's "run directory" and > the path to each run dir is unique. Then the log processor will find > everything. > > As a user, having to constantly work in a space of "dense" unique > names is hard - it's a source of cognitive dissonance. > > If the system would give me a choice of using a simpler name, nicely > balance my files over directories for performance, and accept my log > data for analysis, that would be great. But most important is that it > work well and fast. > > Given a choice, I'd much rather work using the current "dissonant" > names than not work. So my comments on naming are a minor issue and we > can put them aside for now. (I will try harder to stop talking about > this ;) > > We're currently focusing on solving the performance problems and > continually enhancing the log processing for analysis (related).
We > should keep doing that, and can review our file-naming issues in a few > months from now, unless naming changes are needed for directory > balancing. > > - Mike > > > On 10/29/07 7:35 AM, Ian Foster wrote: >> If they are not globally unique, don't we have problems when we >> combine logs from multiple sources? >> >> Sent via BlackBerry from T-Mobile >> >> -----Original Message----- >> From: Ben Clifford >> >> Date: Mon, 29 Oct 2007 08:47:04 To:Michael Wilde >> Cc:swiftdevel >> Subject: Re: [Swift-devel] Clustering and Temp Dirs with Swift >> >> >> >> On Sun, 28 Oct 2007, Michael Wilde wrote: >> >>> Workflow IDs dont need to be unique outside of a user or group. >> >> The way I've been thinking things would work with log file names >> (which to an extent overlaps with workflow IDs) is this: >> >> * Swift generates a log file name by default that is very unique >> (i.e. its present format is workflow name + timestamp + random) >> >> * The log file name can be overridden with the -log command line >> option >> (which was broken but I fixed it in r1357) >> >> * To get domain-specific log file naming with your own >> uniqueness rules (eg. a sequence number), use -log >> to specify that. >> >> I think the present log naming is a good way to name things in the >> absence of any domain-specific naming strategy; and I think -log is a >> good way for a domain specific naming strategy to be plugged in. >> > -- Ian Foster, Director, Computation Institute Argonne National Laboratory & University of Chicago Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439 Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637 Tel: +1 630 252 4619. Web: www.ci.uchicago.edu. Globus Alliance: www.globus.org. 
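[Mike's composite scheme above - uniqueness coming from a path of simple, human-readable components rather than from one dense name - might look something like this (a rough sketch; the hierarchy is from his message, the component values and function name are examples only):]

```python
import posixpath

def run_dir(swift_logs, project, user, submithost, run):
    # $SWIFT_LOGS/project/user/submithost/workflow-run, per the
    # proposed hierarchy; each component stays readable, and only
    # the full path needs to be unique.
    return posixpath.join(swift_logs, project, user, submithost, run)

print(run_dir("/swift-logs", "angle", "wilde", "tg-uc", "run117"))
# /swift-logs/angle/wilde/tg-uc/run117
```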
From andrewj at uchicago.edu Mon Oct 29 11:38:44 2007 From: andrewj at uchicago.edu (Andrew Robert Jamieson) Date: Mon, 29 Oct 2007 11:38:44 -0500 (CDT) Subject: [Swift-devel] Clustering / wrapper.sh logging In-Reply-To: References: <4725E264.8000509@mcs.anl.gov> Message-ID: Mike suggested I point folks who are interested to the dir on Teraport where this run is taking place now collecting data on the problem. This is an example of the 6 hour long WF in reality, which should only take 20 mins or so. On Teraport: /home/andrewj/scratch/SWIFT/Windowlicker-20071029-1043-1zhh40ob Notice, if one does an "ls" it lags quite a bit. Also notice the number of jobs I have in the queue at the moment (both running and waiting). Thanks, Andrew On Mon, 29 Oct 2007, Ben Clifford wrote: > > > On Mon, 29 Oct 2007, Michael Wilde wrote: > >> Should Andrew re-run a few good-sized workflows to generate info/ dirs with >> timestamps, so we can zero in on the operation(s) that are taking the most >> time? > > A workflow of the size that was showing the suspect symptoms before would > be good. > > -- > From benc at hawaga.org.uk Mon Oct 29 11:41:32 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 29 Oct 2007 16:41:32 +0000 (GMT) Subject: [Swift-devel] Clustering / wrapper.sh logging In-Reply-To: References: <4725E264.8000509@mcs.anl.gov> Message-ID: Don't mess around in that directory too much, though - every time you do an ls, you're perturbing pretty much what it is that I'm trying to get measurements for. On Mon, 29 Oct 2007, Andrew Robert Jamieson wrote: > Mike suggested I point folks who are interested to the dir on Teraport where > this run is taking place now collecting data on the problem. This is an > example of the 6 hour long WF in reality, which should only take 20 mins or > so. > > On Teraport: > /home/andrewj/scratch/SWIFT/Windowlicker-20071029-1043-1zhh40ob > > Notice, if one does an "ls" it lags quite a bit.
Also notice the number of > jobs I have in the queue at the moment (both running and waiting). > > Thanks, > Andrew > > On Mon, 29 Oct 2007, Ben Clifford wrote: > > > > > > > On Mon, 29 Oct 2007, Michael Wilde wrote: > > > > > Should Andrew re-run a few good-sized workflows to generate info/ dirs > > > with > > > timestamps, so we can zero in on the operation(s) that are taking the most > > > time? > > > > A workflow of the size that was showing the suspect symptoms before would > > be good. > > > > -- > > > > From benc at hawaga.org.uk Mon Oct 29 12:14:11 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 29 Oct 2007 17:14:11 +0000 (GMT) Subject: [Swift-devel] Clustering / wrapper.sh logging In-Reply-To: References: <4725E264.8000509@mcs.anl.gov> Message-ID: I just grabbed all the info files that are there at the moment and munged them: (note these are not final results - the workflow continues to run)

$ cat info.lastsummary
372 END
1 EXECUTE
1 RM_JOBDIR
1017 TOUCH_SUCCESS

372 jobs hit the very end of the wrapper script. 1017 of them are a line before that, at TOUCH_SUCCESS state:

From wrapper.sh:

> logstate "TOUCH_SUCCESS"
> touch status/${ID}-success
> logstate "END"

So it looks like I should pay attention first to dealing with status directory scalability. -- From wilde at mcs.anl.gov Mon Oct 29 12:20:29 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 29 Oct 2007 12:20:29 -0500 Subject: [Swift-devel] Swift run to Falkon hanging Message-ID: <4726165D.6090303@mcs.anl.gov> The attached logfile with my comments and questions (full logs in ~benc/swift-logs/wilde/run117) is for a small 5-job Angle test with about 1-week-old code to try to get a stable falkon config back on uc-teragrid. Ioan has confirmed that there are issues causing the provisioner on the ia32 login host to fail to connect to the service correctly. I'm still re-working around those issues.
I'm trying to just get this config, which worked moderately well last week, back to a working state before I jump to the latest code base. I may give up after this last attempt, and upgrade both swift and falkon. But my last run puzzles me. Can people take a look at the attached log with questions and comments and help me understand what's happening? Basically I get one app exception, things hang, and I don't see what in the log, if anything, is pointing to the cause. Thanks, Mike -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: 14-BadMessagesOnAppException URL: From benc at hawaga.org.uk Mon Oct 29 12:22:28 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 29 Oct 2007 17:22:28 +0000 (GMT) Subject: [Swift-devel] Clustering / wrapper.sh logging In-Reply-To: References: <4725E264.8000509@mcs.anl.gov> Message-ID: actually, there's a bug in my analysis code. please wait. it looks like pretty much everything is hitting 'END' state. On Mon, 29 Oct 2007, Ben Clifford wrote:
>
> I just grabbed all the info files that are there at the moment and munged
> them: (note these are not final results - the workflow continues to run)
>
> $ cat info.lastsummary
> 372 END
> 1 EXECUTE
> 1 RM_JOBDIR
> 1017 TOUCH_SUCCESS
>
> 372 jobs hit the very end of the wrapper script. 1017 of them are a line
> before that, at TOUCH_SUCCESS state:
>
> From wrapper.sh:
>
> > logstate "TOUCH_SUCCESS"
> > touch status/${ID}-success
> > logstate "END"
>
> So it looks like I should pay attention first to dealing with status
> directory scalability.
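[The last-state tally quoted above can be reproduced along these lines (a sketch only: it assumes one *-info file per job and that each wrapper log line ends with the state name; this is not the actual log-processing code):]

```python
import glob
import os
from collections import Counter

def last_states(info_dir):
    # Count, over all *-info files, the last state each job logged.
    counts = Counter()
    for path in glob.glob(os.path.join(info_dir, "*-info")):
        last = None
        with open(path) as f:
            for line in f:
                fields = line.split()
                if fields:
                    last = fields[-1]
        if last is not None:
            counts[last] += 1
    return counts
```

Under these assumptions, a job whose info file stops between TOUCH_SUCCESS and END would be counted under TOUCH_SUCCESS, which is the signature discussed above.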
> > From hategan at mcs.anl.gov Mon Oct 29 12:26:54 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 29 Oct 2007 12:26:54 -0500 Subject: [Swift-devel] Clustering / wrapper.sh logging In-Reply-To: References: <4725E264.8000509@mcs.anl.gov> Message-ID: <1193678814.10781.9.camel@blabla.mcs.anl.gov> On Mon, 2007-10-29 at 17:14 +0000, Ben Clifford wrote: > I just grabbed all the info files that are there at the moment and munged > them: (note these are not final results - the workflow continues to run) > > $ cat info.lastsummary > 372 END > 1 EXECUTE > 1 RM_JOBDIR > 1017 TOUCH_SUCCESS > > 372 jobs hit the very end of the wrapper script. 1017 of them are a line > before that, at TOUCH_SUCCESS state: > > From wrapper.sh: > > > logstate "TOUCH_SUCCESS" > > touch status/${ID}-success > > logstate "END" > > So it looks like I should pay attention first to dealing with status > directory scalability.

Here's a bunch of things:

vdl-int.k/execute2:

uid := uid()
dprefix := substr(uid, 0, 1)
jobid := concat(tr, "-", uid)
...
task:execute("/bin/sh", list("shared/wrapper.sh", jobid, dprefix, "-e", vdl:executable(tr, rhost), ...
//update checkStatus
//update transferKickstartRec

wrapper.sh:

...
ID=$1
DPREFIX=$2
checkEmpty "$ID" "Missing job ID"
checkEmpty "$DPREFIX" "Missing directory prefix"
INFO=$WFDIR/info/${DPREFIX}/${ID}-info
...
mkdir -p ../kickstart/${DPREFIX}
...
mv -f kickstart.xml "../kickstart/${DPREFIX}/$ID-kickstart.xml" 2>&1 >>"$INFO"
...
mkdir -p status/${DPREFIX}
touch status/${DPREFIX}/${ID}-success

> From hategan at mcs.anl.gov Mon Oct 29 12:34:31 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 29 Oct 2007 12:34:31 -0500 Subject: [Swift-devel] Clustering / wrapper.sh logging In-Reply-To: References: <4725E264.8000509@mcs.anl.gov> Message-ID: <1193679271.10781.11.camel@blabla.mcs.anl.gov> I'd also combine log and logstate, perhaps using a logging format similar to the swift logs.
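[The effect of the dprefix change sketched above is to shard per-job files into subdirectories keyed by the first character of the uid, so no single directory has to hold every file. A rough illustration in Python (not the Swift implementation):]

```python
import os
from collections import Counter

def sharded_path(base, jobid):
    # The first character of the job id selects the subdirectory,
    # mirroring INFO=$WFDIR/info/${DPREFIX}/${ID}-info above.
    return os.path.join(base, jobid[0], jobid + "-info")

# With ids drawn from [0-9a-z], 10008 files spread evenly into 36
# shards of 278 files each, instead of one 10008-entry directory.
ids = ["%s%04d" % (c, i)
       for i in range(278)
       for c in "0123456789abcdefghijklmnopqrstuvwxyz"]
shards = Counter(sharded_path("info", j).split(os.sep)[1] for j in ids)
print(len(shards), max(shards.values()))  # 36 278
```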
On Mon, 2007-10-29 at 17:22 +0000, Ben Clifford wrote: > actually, there's a bug in my analysis code. please wait. it looks like > pretty much everything is hitting 'END' state. > > On Mon, 29 Oct 2007, Ben Clifford wrote: > > > > > I just grabbed all the info files that are there at the moment and munged > > them: (note these are not final results - the workflow continues to run) > > > > $ cat info.lastsummary > > 372 END > > 1 EXECUTE > > 1 RM_JOBDIR > > 1017 TOUCH_SUCCESS > > > > 372 jobs hit the very end of the wrapper script. 1017 of them are a line > > before that, at TOUCH_SUCCESS state: > > > > From wrapper.sh: > > > > > logstate "TOUCH_SUCCESS" > > > touch status/${ID}-success > > > logstate "END" > > > > So it looks like I should pay attention first to dealing with status > > directory scalability. > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From hategan at mcs.anl.gov Mon Oct 29 12:37:28 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 29 Oct 2007 12:37:28 -0500 Subject: [Swift-devel] Clustering / wrapper.sh logging In-Reply-To: References: <4725E264.8000509@mcs.anl.gov> Message-ID: <1193679448.11565.0.camel@blabla.mcs.anl.gov> Also, don't ignore seq.sh. It does its own share of nasty things. On Mon, 2007-10-29 at 17:22 +0000, Ben Clifford wrote: > actually, there's a bug in my analysis code. please wait. it looks like > pretty much everything is hitting 'END' state. > > On Mon, 29 Oct 2007, Ben Clifford wrote: > > > > > I just grabbed all the info files that are there at the moment and munged > > them: (note these are not final results - the workflow continues to run) > > > > $ cat info.lastsummary > > 372 END > > 1 EXECUTE > > 1 RM_JOBDIR > > 1017 TOUCH_SUCCESS > > > > 372 jobs hit the very end of the wrapper script. 
1017 of them are a line > > before that, at TOUCH_SUCCESS state: > > > > From wrapper.sh: > > > > > logstate "TOUCH_SUCCESS" > > > touch status/${ID}-success > > > logstate "END" > > > > So it looks like I should pay attention first to dealing with status > > directory scalability. > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From benc at hawaga.org.uk Mon Oct 29 12:39:37 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 29 Oct 2007 17:39:37 +0000 (GMT) Subject: [Swift-devel] Swift run to Falkon hanging In-Reply-To: <4726165D.6090303@mcs.anl.gov> References: <4726165D.6090303@mcs.anl.gov> Message-ID: Hmm. You have a bunch of jobs in progress at the end of the log file. I presume you have lazy errors turned off (that is the default). I'm not sure how eager the eager error handling is - Mihael might know off the top of his head, or I can have a poke. If an execute2 fails with APPLICATION_EXCEPTION, does that kill the whole workflow? I would have thought not but I realise I am not sure. http://www.ci.uchicago.edu/~benc/report-awf2-20071029-0831-nouanbe8/ Specifically: > Breakdown of last known status for execute2s: > > 1 APPLICATION_EXCEPTION > 5 JOB_START 1 execute2 had failed, but 5 were in progress (I think 4 of the original submissions and 1 retry). On Mon, 29 Oct 2007, Michael Wilde wrote: > The attached logfile with my comments and questions (full logs in > ~benc/swift-logs/wilde/run117) is for a small 5-job Angle test with about > 1-week-old code to try to get a stable falkon config back on uc-teragrid. > > Ioan has confirmed that there are issues causing the provisioner on the ia32 > login host to fail to connect to the service correctly. I'm still re-working > around those issues. 
> > Im trying to just get this config, which worked moderately well last week, > back to a working state before I jump to the latest code base. > > I may give up after this last attempt, and upgrade both swift and falkon. > > But my last run puzzles me. Can people take a look at the attached log with > questions and comments and help me understand whats happening? > > Basically I get one app exception, things hang, and I dont see what in the log > if anything is pointing to the cause. > > Thanks, > > Mike > From bugzilla-daemon at mcs.anl.gov Mon Oct 29 14:54:40 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 29 Oct 2007 14:54:40 -0500 (CDT) Subject: [Swift-devel] [Bug 110] New: move OPTIONS out of swift executable Message-ID: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=110 Summary: move OPTIONS out of swift executable Product: Swift Version: unspecified Platform: Macintosh OS/Version: Mac OS Status: NEW Severity: normal Priority: P2 Component: General AssignedTo: hategan at mcs.anl.gov ReportedBy: nefedova at mcs.anl.gov CC: stace at mcs.anl.gov It will be more convenient to have the OPTIONS specified not in swift executable itself, but rather in some config file. I am talking about these options that have to be setup presently inside the swift executable: OPTIONS="-Xms1536m -Xmx1536m" (others might be using some other option). So it would be good to be able to specify it in the config file instead. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. 
From benc at hawaga.org.uk Mon Oct 29 15:26:18 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 29 Oct 2007 20:26:18 +0000 (GMT) Subject: [Swift-devel] Clustering / wrapper.sh logging In-Reply-To: <1193679271.10781.11.camel@blabla.mcs.anl.gov> References: <4725E264.8000509@mcs.anl.gov> <1193679271.10781.11.camel@blabla.mcs.anl.gov> Message-ID: so the wrapper.sh logging I put in suggests that everything encapsulated within wrapper.sh is not taking very long at all. some scripted disgustingness gets me all the wrapper.sh start-end times for a particular cluster, and gives:

1193676034 -> 1193676037 RGI_Man_sh-o3ka5cji
1193676415 -> 1193676424 MassClass_sh_man-p3ka5cji
1193676645 -> 1193676649 MassClass_sh_man-q3ka5cji
1193676791 -> 1193676802 MassClass_sh_man-r3ka5cji
1193677076 -> 1193677088 RGI_Man_sh-t3ka5cji
1193677343 -> 1193677345 RGI_Man_sh-u3ka5cji
1193677616 -> 1193677620 RGI_Man_sh-s3ka5cji

There are large gaps (of hundreds of seconds) in the executions, when ideally the various wrapper.sh executions would butt up against each other closely. I'm instrumenting seq.sh as Mihael suggested. I'm also going to change its log handling so that it does not use a shared log file. -- From hategan at mcs.anl.gov Tue Oct 30 17:11:51 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 30 Oct 2007 17:11:51 -0500 Subject: [Swift-devel] gridftp pipelining Message-ID: <1193782311.17558.4.camel@blabla.mcs.anl.gov> Apparently there is a way to pipeline gridftp transfers to deal with many-small-files latency problems. I think this could be used in Swift in a manner similar to the job clustering. 
Mihael From benc at hawaga.org.uk Tue Oct 30 17:15:33 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 30 Oct 2007 22:15:33 +0000 (GMT) Subject: [Swift-devel] gridftp pipelining In-Reply-To: <1193782311.17558.4.camel@blabla.mcs.anl.gov> References: <1193782311.17558.4.camel@blabla.mcs.anl.gov> Message-ID: On Tue, 30 Oct 2007, Mihael Hategan wrote: > Apparently there is a way to pipeline gridftp transfers to deal with > many-small-files latency problems. I think this could be used in Swift > in a manner similar to the job clustering. yes, I just talked to Buzz about that last night wrt to doing this. Did you bring this up because you saw him commit code to do it? ;-) -- From itf at mcs.anl.gov Tue Oct 30 17:16:25 2007 From: itf at mcs.anl.gov (=?utf-8?B?SWFuIEZvc3Rlcg==?=) Date: Tue, 30 Oct 2007 22:16:25 +0000 Subject: [Swift-devel] gridftp pipelining Message-ID: <189236514-1193782660-cardhu_decombobulator_blackberry.rim.net-145202979-@bxe030.bisx.prod.on.blackberry> Interesting idea! ------Original Message------ From: Mihael Hategan Sender: swift-devel-bounces at ci.uchicago.edu To: swift-devel Sent: Oct 30, 2007 5:11 PM Subject: [Swift-devel] gridftp pipelining Apparently there is a way to pipeline gridftp transfers to deal with many-small-files latency problems. I think this could be used in Swift in a manner similar to the job clustering. Mihael _______________________________________________ Swift-devel mailing list Swift-devel at ci.uchicago.edu http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel Sent via BlackBerry from T-Mobile From wilde at mcs.anl.gov Tue Oct 30 17:19:58 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 30 Oct 2007 17:19:58 -0500 Subject: [Swift-devel] gridftp pipelining In-Reply-To: <1193782311.17558.4.camel@blabla.mcs.anl.gov> References: <1193782311.17558.4.camel@blabla.mcs.anl.gov> Message-ID: <4727AE0E.5020004@mcs.anl.gov> Cool. Might want to look at the VDS transfer and t2 commands. 
(I recall Ben suggested this a while back). I think those invoked g-u-c, so the mechanism is likely in there. On 10/30/07 5:11 PM, Mihael Hategan wrote: > Apparently there is a way to pipeline gridftp transfers to deal with > many-small-files latency problems. I think this could be used in Swift > in a manner similar to the job clustering. > > Mihael > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From hategan at mcs.anl.gov Tue Oct 30 17:20:32 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 30 Oct 2007 17:20:32 -0500 Subject: [Swift-devel] gridftp pipelining In-Reply-To: References: <1193782311.17558.4.camel@blabla.mcs.anl.gov> Message-ID: <1193782832.17558.11.camel@blabla.mcs.anl.gov> On Tue, 2007-10-30 at 22:15 +0000, Ben Clifford wrote: > > On Tue, 30 Oct 2007, Mihael Hategan wrote: > > > Apparently there is a way to pipeline gridftp transfers to deal with > > many-small-files latency problems. I think this could be used in Swift > > in a manner similar to the job clustering. > > yes, I just talked to Buzz about that last night wrt to doing this. Did > you bring this up because you saw him commit code to do it? ;-) No. I was updating jglobus to do some stuff because of some reason, and in the discussion of the consequences of that update mlink introduced the issue of pipelining. > From hategan at mcs.anl.gov Tue Oct 30 17:21:28 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 30 Oct 2007 17:21:28 -0500 Subject: [Swift-devel] gridftp pipelining In-Reply-To: <1193782311.17558.4.camel@blabla.mcs.anl.gov> References: <1193782311.17558.4.camel@blabla.mcs.anl.gov> Message-ID: <1193782888.17558.13.camel@blabla.mcs.anl.gov> On Tue, 2007-10-30 at 17:11 -0500, Mihael Hategan wrote: > Apparently there is a way to pipeline gridftp transfers to deal with > many-small-files latency problems. 
I think this could be used in Swift > in a manner similar to the job clustering. Forget that. This could go directly in CoG at the GridFTP connection caching level. > > Mihael > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From benc at hawaga.org.uk Tue Oct 30 17:29:07 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 30 Oct 2007 22:29:07 +0000 (GMT) Subject: [Swift-devel] gridftp pipelining In-Reply-To: <1193782311.17558.4.camel@blabla.mcs.anl.gov> References: <1193782311.17558.4.camel@blabla.mcs.anl.gov> Message-ID: On Tue, 30 Oct 2007, Mihael Hategan wrote: > Apparently there is a way to pipeline gridftp transfers to deal with > many-small-files latency problems. I think this could be used in Swift > in a manner similar to the job clustering. I also think this is a useful thing to do, btw. -- From hategan at mcs.anl.gov Wed Oct 31 17:21:49 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 31 Oct 2007 17:21:49 -0500 Subject: [Swift-devel] script mapper Message-ID: <1193869309.10145.9.camel@blabla.mcs.anl.gov> I'm working on a script mapper for swift. Basically it allows executing some form of local executable which writes (swift_path, file_path) pairs on stdout. This was motivated by my unwillingness to deal with a potentially deadlocking swift because the CSV mapper was used wrong in I2U2/LIGO (didn't actually get there, but wanted to avoid it). I believe Mike suggested something like this anyway. If you think I'm going in the wrong direction, it would be nice if you said so before I get deeper into it. 
Mihael From hategan at mcs.anl.gov Wed Oct 31 19:19:47 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 31 Oct 2007 19:19:47 -0500 Subject: [Swift-devel] script mapper In-Reply-To: <1193869309.10145.9.camel@blabla.mcs.anl.gov> References: <1193869309.10145.9.camel@blabla.mcs.anl.gov> Message-ID: <1193876387.18296.5.camel@blabla.mcs.anl.gov> Well, there were no objections so here it is: file fs[] ; foreach f in fs { print(@filename(f)); } The script produces space separated pairs of a path and a file name. The path is similar to what you would do in Swift. In the above case it produces something like: [0] file1 [1] file2 etc. This can be used with arbitrary things, such as: [0].somefield.array[3] file The mapper arguments, except "exec", are mapped to command line arguments prefixed with a hyphen. So the above example causes the following to happen: swift Plot1Chan.swift -GPS_start_time=877890090 -GPS_end_time=877543210 ... list-frames.php -s 877890090 -e 877543210 Mihael On Wed, 2007-10-31 at 17:21 -0500, Mihael Hategan wrote: > I'm working on a script mapper for swift. Basically it allows executing > some form of local executable which spells on stdout (swift_path, > file_path) pairs. This was motivated by my unwilingness to deal with a > potentially deadlocking swift because the CSV mapper was used wrong in > I2U2/LIGO (didn't actually get there, but wanted to avoid it). > > I believe Mike suggested something like this anyway. > > If you think I'm going in the wrong direction, it would be nice if you > said so before I get deeper into it. 
> > Mihael > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From wilde at mcs.anl.gov Wed Oct 31 19:25:15 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 31 Oct 2007 19:25:15 -0500 Subject: [Swift-devel] script mapper In-Reply-To: <1193869309.10145.9.camel@blabla.mcs.anl.gov> References: <1193869309.10145.9.camel@blabla.mcs.anl.gov> Message-ID: <47291CEB.3000903@mcs.anl.gov> I'm in favor of such a mapper - I have indeed asked for it often. I define it as a mapper that users can write as a shell script, any time an existing mapper doesn't serve their needs. For most needs, users should be able to code their own mappers as simple scripts in a language of their choice, and seldom if ever need to add Java classes to the system to do mapping. I can't tell if the step you are taking here will get us there, or rather be one step in that direction. Either way, I favor it. If this is what you need to do next for I2U2, I feel you should proceed. In the overall swift to-do list, it's high but not highest. (highest is performance, reliability, and usability, mainly error message and logging improvement) - Mike On 10/31/07 5:21 PM, Mihael Hategan wrote: > I'm working on a script mapper for swift. Basically it allows executing > some form of local executable which spells on stdout (swift_path, > file_path) pairs. This was motivated by my unwilingness to deal with a > potentially deadlocking swift because the CSV mapper was used wrong in > I2U2/LIGO (didn't actually get there, but wanted to avoid it). > > I believe Mike suggested something like this anyway. > > If you think I'm going in the wrong direction, it would be nice if you > said so before I get deeper into it. 
> > Mihael > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > From hategan at mcs.anl.gov Wed Oct 31 19:35:58 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 31 Oct 2007 19:35:58 -0500 Subject: [Swift-devel] script mapper In-Reply-To: <47291CEB.3000903@mcs.anl.gov> References: <1193869309.10145.9.camel@blabla.mcs.anl.gov> <47291CEB.3000903@mcs.anl.gov> Message-ID: <1193877358.18796.7.camel@blabla.mcs.anl.gov> On Wed, 2007-10-31 at 19:25 -0500, Michael Wilde wrote: > Im in favor of such a mapper - I have indeed asked for it often. > I define it as a mapper that users can write as a shell script, any time > an existing mapper doesnt serve their needs. For most needs, users > should be able to code their own mappers as simple scripts in a language > of their choice, and seldom if ever need to add java classes to the > system to do mapping. I'd still favor such mappers being some form of first-class apps, but I need to get a bit more realistic. > > I cant tell if the step you are taking here will get us there, I think it did. As far as I can tell, it can be used for anything, by virtue of the fact that it imposes almost no restrictions. > or rather > be one step in that direction. Either way, I favor it. > > If this is what you need to do next for I2U2, I feel you should proceed. > > In the overall swift to-do list, its high but not highest. > (highest is performance, reliability, and usability, mainly error > message and logging improvement) Given that it took a couple of hours, and I don't expect maintenance to be a big issue, I'd say it can't hurt much. Docs and tests are still needed. > > - Mike > > > > > On 10/31/07 5:21 PM, Mihael Hategan wrote: > > I'm working on a script mapper for swift. Basically it allows executing > > some form of local executable which spells on stdout (swift_path, > > file_path) pairs. 
This was motivated by my unwilingness to deal with a > > potentially deadlocking swift because the CSV mapper was used wrong in > > I2U2/LIGO (didn't actually get there, but wanted to avoid it). > > > > I believe Mike suggested something like this anyway. > > > > If you think I'm going in the wrong direction, it would be nice if you > > said so before I get deeper into it. > > > > Mihael > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > From wilde at mcs.anl.gov Wed Oct 31 19:44:26 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 31 Oct 2007 19:44:26 -0500 Subject: [Swift-devel] script mapper In-Reply-To: <1193876387.18296.5.camel@blabla.mcs.anl.gov> References: <1193869309.10145.9.camel@blabla.mcs.anl.gov> <1193876387.18296.5.camel@blabla.mcs.anl.gov> Message-ID: <4729216A.5060805@mcs.anl.gov> Very nice. You're a step ahead as usual. I'm sitting here with Andrew cranking data for the grant proposal and my head hurts. Would be nice to do a simple example with short names suitable for the mapper section of the user guide. If you get to that first, great, if not one of us will. Is this committed/committable? So the mapper is called "ext", takes a script via exec=, and then arbitrary mapper-specific args? And is it always used to map an array? I'm not sure this question makes sense, but can the same technique be used to shell out to a mapper that returns a struct or array of structs like the tabular mapper you did for Andrew? The main difference being that the tabular file doesn't need to exist before the program starts? - Mike On 10/31/07 7:19 PM, Mihael Hategan wrote: > Well, there were no objections so here it is: > > file fs[] > ; > > foreach f in fs { > print(@filename(f)); > } > > The script produces space separated pairs of a path and a file name. The > path is similar to what you would do in Swift. 
In the above case it > produces something like: > > [0] file1 > [1] file2 > etc. > > This can be used with arbitrary things, such as: > [0].somefield.array[3] file > > The mapper arguments, except "exec", are mapped to command line > arguments prefixed with a hyphen. So the above example causes the > following to happen: > > swift Plot1Chan.swift -GPS_start_time=877890090 -GPS_end_time=877543210 > ... > list-frames.php -s 877890090 -e 877543210 > > Mihael > > On Wed, 2007-10-31 at 17:21 -0500, Mihael Hategan wrote: >> I'm working on a script mapper for swift. Basically it allows executing >> some form of local executable which spells on stdout (swift_path, >> file_path) pairs. This was motivated by my unwilingness to deal with a >> potentially deadlocking swift because the CSV mapper was used wrong in >> I2U2/LIGO (didn't actually get there, but wanted to avoid it). >> >> I believe Mike suggested something like this anyway. >> >> If you think I'm going in the wrong direction, it would be nice if you >> said so before I get deeper into it. >> >> Mihael >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From hategan at mcs.anl.gov Wed Oct 31 19:59:03 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 31 Oct 2007 19:59:03 -0500 Subject: [Swift-devel] script mapper In-Reply-To: <4729216A.5060805@mcs.anl.gov> References: <1193869309.10145.9.camel@blabla.mcs.anl.gov> <1193876387.18296.5.camel@blabla.mcs.anl.gov> <4729216A.5060805@mcs.anl.gov> Message-ID: <1193878744.18796.29.camel@blabla.mcs.anl.gov> On Wed, 2007-10-31 at 19:44 -0500, Michael Wilde wrote: > Very nice. Youre a step ahead as usual. 
Im sitting here with Andrew > cranking data for the grant proposal and my head hurts. > > Would be nice to do a simple example with short names suitable for the > mapper section of the user guide. if you get to that first, great, if > not one of us will. > > Is this commited/commitable? Committed. > > > So the mapper is called "ext", takes a script via exec=, and then > arbitrary mapper-specific args? Yep. Except "exec", "input", "dbgname", and "descriptor" which are unfortunately reserved. The alternative was to pass it a string but I wanted to avoid quoting issues. > > And always is used to map an array? You can map any data structure that swift supports. If you can express it in swift, it should work. > > Im not sure this question makes sense, but can the same technique be > used to shell out to a mapper that returns a struct or array of structs > like the tabular mapper you did for Andrew? Yes. Or a struct of arrays of structs and however much you want to complicate this. For example, you have a variable v of arbitrary type complexity. You refer to one of its leaves using a path expression, say v.a[10].left.next.a[78]. Then you make your script print the following line: a[10].left.next.a[78] path/to/the/file/this/maps/to ... > The main difference being > that the tabular file doesnt need to exist before the program starts? The tabular file is limited to only certain structures (I think arrays of structs). And, if I remember correctly what Ben mentioned, suffers from deadlocks in certain cases. The downside (?) to the external mapper is that the script must be on the local machine, and is not something that goes into tc.data. Mihael > > - Mike > > > On 10/31/07 7:19 PM, Mihael Hategan wrote: > > Well, there were no objections so here it is: > > > > file fs[] > > ; > > > > foreach f in fs { > > print(@filename(f)); > > } > > > > The script produces space separated pairs of a path and a file name. The > > path is similar to what you would do in Swift. 
In the above case it > > produces something like: > > > > [0] file1 > > [1] file2 > > etc. > > > > This can be used with arbitrary things, such as: > > [0].somefield.array[3] file > > > > The mapper arguments, except "exec", are mapped to command line > > arguments prefixed with a hyphen. So the above example causes the > > following to happen: > > > > swift Plot1Chan.swift -GPS_start_time=877890090 -GPS_end_time=877543210 > > ... > > list-frames.php -s 877890090 -e 877543210 > > > > Mihael > > > > On Wed, 2007-10-31 at 17:21 -0500, Mihael Hategan wrote: > >> I'm working on a script mapper for swift. Basically it allows executing > >> some form of local executable which spells on stdout (swift_path, > >> file_path) pairs. This was motivated by my unwilingness to deal with a > >> potentially deadlocking swift because the CSV mapper was used wrong in > >> I2U2/LIGO (didn't actually get there, but wanted to avoid it). > >> > >> I believe Mike suggested something like this anyway. > >> > >> If you think I'm going in the wrong direction, it would be nice if you > >> said so before I get deeper into it. 
> >> > >> Mihael > >> > >> _______________________________________________ > >> Swift-devel mailing list > >> Swift-devel at ci.uchicago.edu > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >> > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > From wilde at mcs.anl.gov Wed Oct 31 20:25:10 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 31 Oct 2007 20:25:10 -0500 Subject: [Swift-devel] script mapper In-Reply-To: <1193878744.18796.29.camel@blabla.mcs.anl.gov> References: <1193869309.10145.9.camel@blabla.mcs.anl.gov> <1193876387.18296.5.camel@blabla.mcs.anl.gov> <4729216A.5060805@mcs.anl.gov> <1193878744.18796.29.camel@blabla.mcs.anl.gov> Message-ID: <47292AF6.9010103@mcs.anl.gov> > For example, you have a variable v of arbitrary type complexity. You > refer to one of its leaves using a path expression, say > v.a[10].left.next.a[78]. Then you make your script print the following > line: > a[10].left.next.a[78] path/to/the/file/this/maps/to > ... nice! > The downside (?) to the external mapper is that the script must be on > the local machine, and is not something that goes into tc.data. I think we can generalize this when we generalize our data management models.
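A minimal mapper script of the kind this thread describes can be sketched as follows. The contract, per Mihael's description, is simply: print space-separated (swift-path, file-path) pairs on stdout, with mapper parameters arriving as hyphen-prefixed command-line arguments. The script name, the -location parameter, and the *.dat files here are hypothetical, invented for illustration:

```shell
# Scratch area with two hypothetical input files.
mkdir -p /tmp/extdemo && cd /tmp/extdemo
touch a.dat b.dat

# A mapper script: map each *.dat under -location to array element [i].
cat > mapper.sh <<'EOF'
#!/bin/sh
location=.
[ "$1" = "-location" ] && location=$2
i=0
for f in "$location"/*.dat; do
    echo "[$i] $f"       # swift path, then file path
    i=$((i+1))
done
EOF
chmod +x mapper.sh

./mapper.sh -location .
```

Wired into a Swift declaration this would look something like `file fs[] <ext; exec="mapper.sh"; location=".">;` (hedged: the exact parameter spelling depends on the committed mapper, and reserved names such as exec, input, dbgname and descriptor cannot be used as script arguments, per Mihael's note above).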