From davidkelly at uchicago.edu Sat Mar 1 13:35:46 2014
From: davidkelly at uchicago.edu (David Kelly)
Date: Sat, 1 Mar 2014 13:35:46 -0600
Subject: [Swift-devel] Strange errors while staging in data
Message-ID:

Hello,

At the beginning of my run (apparently before any jobs actually start on the scheduler) I am running into this error:

Swift 0.95 RC5 swift-r7605 cog-r3874

RunID: 20140301-1911-fe86tqf8
Progress: Sat, 01 Mar 2014 19:11:49+0000
Progress: Sat, 01 Mar 2014 19:11:50+0000 Selecting site:193 Stage in:307

Execution failed:
Exception in RunpSIMS:
Arguments: [047, 438, params.psims, output/047/438output.tar.gz]
Host: midway
Directory: RunpSIMS-20140301-1911-fe86tqf8/jobs/g/RunpSIMS-g4b3m5nl
exception @ swift-int.k, line: 530
Caused by: java.lang.NullPointerException
cache @ swift-int.k, line: 134
Caused by: java.lang.NullPointerException
    at org.globus.cog.karajan.compiled.nodes.CacheNode.setValue(CacheNode.java:131)
    at org.globus.cog.karajan.compiled.nodes.CacheNode.runBody(CacheNode.java:77)
    at org.globus.cog.karajan.compiled.nodes.InternalFunction.run(InternalFunction.java:153)

This only seems to happen when I have 500 or more tasks. If I set foreach.max.threads to a low number like 50, it seems to work fine.

The log is at http://web.ci.uchicago.edu/~davidk/logs/RunpSIMS-20140301-1921-r4dps0i4.log.

Any ideas?

Thanks,
David

From davidkelly at uchicago.edu Sun Mar 2 00:03:12 2014
From: davidkelly at uchicago.edu (David Kelly)
Date: Sun, 2 Mar 2014 00:03:12 -0600
Subject: [Swift-devel] Strange errors while staging in data
In-Reply-To:
References:
Message-ID:

I made a small change to CacheNode.java in my local copy that just avoids calling any methods on FutureObject f if it is null. Maybe there is a better fix, but it seems to be working well for now anyway.
Index: modules/karajan/src/org/globus/cog/karajan/compiled/nodes/CacheNode.java
===================================================================
--- modules/karajan/src/org/globus/cog/karajan/compiled/nodes/CacheNode.java (revision 3874)
+++ modules/karajan/src/org/globus/cog/karajan/compiled/nodes/CacheNode.java (working copy)
@@ -128,7 +128,9 @@
         Cache cache = getCache(stack, key == this);
         synchronized (cache) {
             FutureObject f = (FutureObject) cache.getCachedValue(key);
-            f.setValue(ret);
+            if(f != null) {
+                f.setValue(ret);
+            }

On Sat, Mar 1, 2014 at 1:35 PM, David Kelly wrote:

> Hello,
>
> At the beginning of my run (apparently before any jobs actually start on
> the scheduler) I am running into this error:
>
> Swift 0.95 RC5 swift-r7605 cog-r3874
>
> RunID: 20140301-1911-fe86tqf8
> Progress: Sat, 01 Mar 2014 19:11:49+0000
> Progress: Sat, 01 Mar 2014 19:11:50+0000 Selecting site:193 Stage in:307
>
> Execution failed:
> Exception in RunpSIMS:
> Arguments: [047, 438, params.psims, output/047/438output.tar.gz]
> Host: midway
> Directory: RunpSIMS-20140301-1911-fe86tqf8/jobs/g/RunpSIMS-g4b3m5nl
> exception @ swift-int.k, line: 530
> Caused by: java.lang.NullPointerException
> cache @ swift-int.k, line: 134
> Caused by: java.lang.NullPointerException
> at org.globus.cog.karajan.compiled.nodes.CacheNode.setValue(CacheNode.java:131)
> at org.globus.cog.karajan.compiled.nodes.CacheNode.runBody(CacheNode.java:77)
> at org.globus.cog.karajan.compiled.nodes.InternalFunction.run(InternalFunction.java:153)
>
>
> This only seems to happen when I have 500 or more tasks. If I set
> foreach.max.threads to a low number like 50, it seems to work fine.
>
> The log is at
> http://web.ci.uchicago.edu/~davidk/logs/RunpSIMS-20140301-1921-r4dps0i4.log.
>
> Any ideas?
> Thanks,
> David
>
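The null check in the patch above can be illustrated with a small, self-contained Java sketch. It is not the CoG/Karajan code: SimpleCache, PendingValue and deliver are hypothetical stand-ins for Cache, FutureObject and the body of CacheNode.setValue, and the sketch simply assumes that, under heavy concurrency, the cache entry for a key may already be missing when a result is delivered. It shows how the guard turns the NullPointerException into a no-op; whether dropping the value in that case is correct is the open question behind "Maybe there is a better fix".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for Karajan's Cache / FutureObject; this is NOT the CoG API.
public class CacheGuardSketch {

    // A minimal placeholder for a value that is delivered later.
    static final class PendingValue {
        private Object value;

        synchronized void setValue(Object v) {
            value = v;
        }

        synchronized Object getValue() {
            return value;
        }
    }

    // A minimal cache mapping keys to pending values. Synchronized methods lock
    // the cache instance itself, the same monitor used by deliver() below.
    static final class SimpleCache {
        private final Map<Object, PendingValue> entries = new HashMap<>();

        synchronized PendingValue getCachedValue(Object key) {
            return entries.get(key); // null when no entry exists for this key
        }

        synchronized void register(Object key) {
            entries.put(key, new PendingValue());
        }

        synchronized void remove(Object key) {
            entries.remove(key);
        }
    }

    // Mirrors the patched logic: store the result only if an entry is still present.
    // Returns false (instead of throwing NullPointerException) when the entry is missing.
    static boolean deliver(SimpleCache cache, Object key, Object ret) {
        synchronized (cache) {
            PendingValue f = cache.getCachedValue(key);
            if (f != null) {
                f.setValue(ret);
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        SimpleCache cache = new SimpleCache();
        cache.register("job-1");

        System.out.println(deliver(cache, "job-1", "ok"));             // true
        System.out.println(cache.getCachedValue("job-1").getValue());  // ok
        cache.remove("job-1");
        System.out.println(deliver(cache, "job-1", "late"));           // false, no NPE
        System.out.println(deliver(cache, "never-registered", "x"));   // false
    }
}
```

Returning a boolean, rather than silently ignoring the miss as the patch does, is a choice made only for this sketch so that the missing-entry case can be observed or logged.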
From wilde at anl.gov Wed Mar 19 19:03:39 2014
From: wilde at anl.gov (Michael Wilde)
Date: Wed, 19 Mar 2014 19:03:39 -0500
Subject: [Swift-devel] Multiple output files
In-Reply-To:
References:
Message-ID: <532A305B.4000600@anl.gov>

Hi Jonathan,

You are thinking about this exactly right, and have (quickly) hit one of our programmability weaknesses.

At the moment, there is no good way to do this. We have had much discussion on it, though, and plan to create such "collect files of this pattern into an array" semantics.

What we do for now is one of these two work-arounds:

- write a single tarfile as output (in a wrapper script for the app, which tars files of the appropriate pattern, like "*.simout")

- write the files directly to a specific (shared) directory instead of the Swift "app task directory" (called "job directory" in the current User Guide). Then return a single file with a list of these file names, and map that using an array_mapper if the results need to be passed to a next stage.

We should see if we can get a prototype of such a feature to you in short order. But hopefully just to get things working, one of the above methods will suffice for you, for now.

I'm cc'ing swift-devel to see what we can do.

Thanks for reminding us of this fairly common need!

- Mike

On 3/19/14, 5:57 PM, Jonathan Ozik wrote:
> Mike,
>
> Perhaps I'm not thinking about this correctly, but I'm trying to figure out how to collect output files that are generated by a simulation run. The scenario is that the Repast executable is run and, after it is run, there are files that are output into some location. There can be an arbitrary number of such output files per simulation run but, if necessary, it would be possible to pre-specify which files to look for. Is there a simple way to indicate to swift that it should collect all the files matching a particular pattern within a directory?
> It looks like the mappers might work here but as far as I understand the mappers are defined prior to calls to executables. Again, I might just not be thinking about this in a "swift" enough way.
>
> Jonathan
>
--
Michael Wilde
Mathematics and Computer Science Computation Institute
Argonne National Laboratory The University of Chicago

From wilde at anl.gov Thu Mar 20 20:53:16 2014
From: wilde at anl.gov (Michael Wilde)
Date: Thu, 20 Mar 2014 20:53:16 -0500
Subject: [Swift-devel] Multiple output files
In-Reply-To:
References: <532A305B.4000600@anl.gov>
Message-ID: <532B9B8C.7000007@anl.gov>

Hi Jonathan,

On 3/20/14, 6:04 PM, Jonathan Ozik wrote:
> Mike,
>
> Thank you for the detailed information.
> Regarding the "collect files of this pattern into an array" semantics, is the "file system mapper" not intended for this?

No, it's not. Mihael and others may need to correct me here, but basically the issue is this:

An app *can* return multiple files - even an array of files - but not an array of files whose names and count are not known before the app is launched.

The User Guide does not yet cover this adequately, but output mappings may be static or dynamic. Static means the names and quantity of files are determined when the mapping is made. Dynamic means the mapping is made on demand, as files are created. Such mappings can, for example, be used to map the elements of an array when they are filled within a foreach loop by an app that returns one file per invocation.

What we do not have - but have long known that we need - is the ability to declare that all the files created by a *single* app invocation which match a specified pattern be returned as an array.

Mihael: is this something you could implement in the near future - after we agree on the semantics?

Justin, Tim, do you want to comment on this from a Swift/T perspective?

Thanks,

- Mike

> I think I'm stumbling on the notion of what output, in terms of files, can be produced using an "app" element in a swift script.
> Most of the examples have stdout (or stderr) pointing to a file, but I'm not sure I've found an example where side effects (e.g., files produced by a process) can be retrieved. Is this possible?
>
> Jonathan
>
> On Mar 19, 2014, at 7:03 PM, Michael Wilde wrote:
>
>> Hi Jonathan,
>>
>> You are thinking about this exactly right, and have (quickly) hit one of our programmability weaknesses.
>>
>> At the moment, there is no good way to do this. We have had much discussion on it, though, and plan to create such "collect files of this pattern into an array" semantics.
>>
>> What we do for now is one of these two work-arounds:
>>
>> - write a single tarfile as output (in a wrapper script for the app, which tars files of the appropriate pattern, like "*.simout")
>>
>> - write the files directly to a specific (shared) directory instead of the Swift "app task directory" (called "job directory" in the current User Guide). Then return a single file with a list of these file names, and map that using an array_mapper if the results need to be passed to a next stage.
>>
>> We should see if we can get a prototype of such a feature to you in short order. But hopefully just to get things working, one of the above methods will suffice for you, for now.
>>
>> I'm cc'ing swift-devel to see what we can do.
>>
>> Thanks for reminding us of this fairly common need!
>>
>> - Mike
>>
>> On 3/19/14, 5:57 PM, Jonathan Ozik wrote:
>>> Mike,
>>>
>>> Perhaps I'm not thinking about this correctly, but I'm trying to figure out how to collect output files that are generated by a simulation run. The scenario is that the Repast executable is run and, after it is run, there are files that are output into some location. There can be an arbitrary number of such output files per simulation run but, if necessary, it would be possible to pre-specify which files to look for. Is there a simple way to indicate to swift that it should collect all the files matching a particular pattern within a directory?
>>> It looks like the mappers might work here but as far as I understand the mappers are defined prior to calls to executables. Again, I might just not be thinking about this in a "swift" enough way.
>>>
>>> Jonathan
>>>
>> --
>> Michael Wilde
>> Mathematics and Computer Science Computation Institute
>> Argonne National Laboratory The University of Chicago
>>
--
Michael Wilde
Mathematics and Computer Science Computation Institute
Argonne National Laboratory The University of Chicago

From wilde at anl.gov Fri Mar 21 08:15:40 2014
From: wilde at anl.gov (Michael Wilde)
Date: Fri, 21 Mar 2014 08:15:40 -0500
Subject: [Swift-devel] Multiple output files
In-Reply-To: <3FD5A438-74FE-418C-822D-229125EA6CDD@uchicago.edu>
References: <532A305B.4000600@anl.gov> <532B9B8C.7000007@anl.gov> <3FD5A438-74FE-418C-822D-229125EA6CDD@uchicago.edu>
Message-ID: <532C3B7C.30609@anl.gov>

Hi Jonathan,

Thanks for bearing with us on this. I can see where our documentation falls short of explaining this clearly.

I've got to work on some deadlines today, but I'll see if someone else on the team can post a clarification with some examples.

A brief response, below.

On 3/20/14, 9:51 PM, Jonathan Ozik wrote:
> Hi Mike,
>
> I've included my comments below. Also, please let me know if I should sign up for any mailing list in addition to swift-user, which I'm already signed up on.

No, just swift-user. Ideally we should have started this discussion there. I steered you to swift-devel because I thought the issue was one of a new feature requirement, but I see it's also one of documentation and training.

...
> An app *can* return multiple files - even an array of files - but not
> an array of files whose names and count are not known before the app is
> launched.
> This functionality would be exactly what I'd be looking for. If an app can return multiple files I'd just need to know where and how I'd have to specify the patterns for those files (or do I need to know the exact names?)
Yes, that's the problem: you would need to know the exact names, in the Swift script, before the app is called, so that you can *map* all output file variables to the names that the app will be *expected* to produce. I.e., currently one needs a priori knowledge of all output file names, and you need to map variables (which can include array and structure members) to those names.

> and how to make it so that the app returns those files. I've looked through the user guide and a few of the tutorials but I don't believe I've seen any example that fits this general functionality.
>
> For example, if I have an executable "repast" that outputs files with patterns specified by globs, how do I make use of that knowledge to pick up those files and return them from an app?

That's the current language deficiency: you cannot. We will explain later today in more detail.

The app is expected to produce all output files that any of its output variables (or arrays or structures) are mapped to. For example, you can map an *output* array to the file names f1.out, f2.out, and f3.out. Then the app will be expected to produce those files. If it doesn't, Swift will raise a runtime error. So if you know a priori (before the app is called), from context or from input argument values, that these 3 files will be produced, you can use one of the array mappers or the "ext" mapper to declare this expectation.

The best way to get past this obstacle (while we develop the desired capability) is as follows. If you are running on a single machine, you can write a wrapper shell script around the repast app that runs repast and then returns a single file that contains a *list* of its output files. But you need to place these output files in a known shared directory, not in the current working directory in which Swift will run the repast app (called the "job directory" at the moment -- soon to be renamed the "app task directory").
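The wrapper-script workaround described above can be sketched in Java for illustration (the thread suggests a shell script; the file-collection logic is the same). Everything here is an assumption made for the example: the shared directory, the "*.simout" pattern, the list-file name, and the OutputListSketch helper names are all hypothetical. The wrapper side collects the absolute paths of matching files into one list file, which would be the app's single mapped output; readList only approximates what a readData() call over that file would give the Swift script, namely one string per line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of the "return a list of output files" workaround.
// Hypothetical assumptions: outputs land in a shared directory and match a glob like "*.simout".
public class OutputListSketch {

    // Collect absolute paths of regular files in sharedDir whose names match glob.
    // Sorted so the resulting list (and any array built from it) has a stable order.
    static List<String> collectOutputs(Path sharedDir, String glob) throws IOException {
        List<String> paths = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(sharedDir, glob)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    paths.add(p.toAbsolutePath().toString());
                }
            }
        }
        Collections.sort(paths);
        return paths;
    }

    // Write one path per line: the single file the wrapped app would return.
    static void writeList(Path listFile, List<String> paths) throws IOException {
        Files.write(listFile, paths, StandardCharsets.UTF_8);
    }

    // Read the list back, skipping blank lines (roughly an array of strings, one per file).
    static List<String> readList(Path listFile) throws IOException {
        List<String> out = new ArrayList<>();
        for (String line : Files.readAllLines(listFile, StandardCharsets.UTF_8)) {
            if (!line.isEmpty()) {
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path shared = Files.createTempDirectory("shared-out");
        // Pretend the simulation wrote a variable number of outputs.
        Files.createFile(shared.resolve("run.0.simout"));
        Files.createFile(shared.resolve("run.1.simout"));
        Files.createFile(shared.resolve("notes.txt")); // does not match the pattern

        Path listFile = shared.resolve("outputs.list");
        writeList(listFile, collectOutputs(shared, "*.simout"));

        for (String p : readList(listFile)) {
            System.out.println(p);
        }
    }
}
```

Writing absolute paths keeps the list usable regardless of the directory it is later read from, since in this workaround the outputs live outside the job directory.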
Then you do a readData() on this returned file to create an array of strings, and use that array with the "array" mapper (explained in the User Guide).

We'll post a working example to you as soon as possible (today, if time permits), as well as an example of the proposed new feature.

- Mike

>
> I see, the one file per app invocation is probably not going to work for our use case, but I still think that I'm missing some crucial understandings of what conditions are required to point to and use generated or existing files within swift. That is, other than "stdout=@filename", how do I indicate that I'd like to return a file, say "myoutput.dat", from an app invocation? Apologies if this is too simple of a question but, like I said, I feel like I'm missing some crucial information...
>
>> What we do not have - but have long known that we need - is the ability to declare that all the files created by a *single* app invocation which match a specified pattern be returned as an array.
>>
>> Mihael: is this something you could implement in the near future - after we agree on the semantics?
>>
>> Justin, Tim, do you want to comment on this from a Swift/T perspective?
>>
>> Thanks,
>>
>> - Mike
>>> I think I'm stumbling on the notion of what output, in terms of files, can be produced using an "app" element in a swift script. Most of the examples have stdout (or stderr) pointing to a file, but I'm not sure I've found an example where side effects (e.g., files produced by a process) can be retrieved. Is this possible?
>>>
>>> Jonathan
>>>
>>> On Mar 19, 2014, at 7:03 PM, Michael Wilde wrote:
>>>
>>>> Hi Jonathan,
>>>>
>>>> You are thinking about this exactly right, and have (quickly) hit one of our programmability weaknesses.
>>>>
>>>> At the moment, there is no good way to do this. We have had much discussion on it, though, and plan to create such "collect files of this pattern into an array" semantics.
>>>>
>>>> What we do for now is one of these two work-arounds:
>>>>
>>>> - write a single tarfile as output (in a wrapper script for the app, which tars files of the appropriate pattern, like "*.simout")
>>>>
>>>> - write the files directly to a specific (shared) directory instead of the Swift "app task directory" (called "job directory" in the current User Guide). Then return a single file with a list of these file names, and map that using an array_mapper if the results need to be passed to a next stage.
>>>>
>>>> We should see if we can get a prototype of such a feature to you in short order. But hopefully just to get things working, one of the above methods will suffice for you, for now.
>>>>
>>>> I'm cc'ing swift-devel to see what we can do.
>>>>
>>>> Thanks for reminding us of this fairly common need!
>>>>
>>>> - Mike
>>>>
>>>> On 3/19/14, 5:57 PM, Jonathan Ozik wrote:
>>>>> Mike,
>>>>>
>>>>> Perhaps I'm not thinking about this correctly, but I'm trying to figure out how to collect output files that are generated by a simulation run. The scenario is that the Repast executable is run and, after it is run, there are files that are output into some location. There can be an arbitrary number of such output files per simulation run but, if necessary, it would be possible to pre-specify which files to look for. Is there a simple way to indicate to swift that it should collect all the files matching a particular pattern within a directory? It looks like the mappers might work here but as far as I understand the mappers are defined prior to calls to executables. Again, I might just not be thinking about this in a "swift" enough way.
>>>>>
>>>>> Jonathan
>>>>>
>>>> --
>>>> Michael Wilde
>>>> Mathematics and Computer Science Computation Institute
>>>> Argonne National Laboratory The University of Chicago
>>>>
>> --
>> Michael Wilde
>> Mathematics and Computer Science Computation Institute
>> Argonne National Laboratory The University of Chicago
>>
--
Michael Wilde
Mathematics and Computer Science Computation Institute
Argonne National Laboratory The University of Chicago

From wilde at anl.gov Fri Mar 21 08:39:24 2014
From: wilde at anl.gov (Michael Wilde)
Date: Fri, 21 Mar 2014 08:39:24 -0500
Subject: [Swift-devel] Multiple output files
In-Reply-To: <532C3B7C.30609@anl.gov>
References: <532A305B.4000600@anl.gov> <532B9B8C.7000007@anl.gov> <3FD5A438-74FE-418C-822D-229125EA6CDD@uchicago.edu> <532C3B7C.30609@anl.gov>
Message-ID: <532C410C.5080309@anl.gov>

Mihael, All,

I'd like to propose a Swift/K feature to provide a reasonable solution to this very common need for an app to return a dynamically determined set of files.

  file dynarry[ ] ;

  dynarry = myApp(myArgs);

The "runtime" mapper should initially have the same arguments and semantics (roughly) as simple_mapper, except for two new arguments:

"indexes" determines how the matched file names will be indexed in the returned array:
  "int" | "string" | "sequential"
  sequential: return the matched files at consecutive integer indices, starting with 0.
  int: expect the filename component between prefix and suffix to be convertible to an integer, and use that as the index; e.g., myfile.012.out and myfile.204.out will return an array with the mapped files at indices 12 and 204.
  string: similar to int, but return a string-indexed associative array.
  "sequential" is simplest and should be the default.

"paths" determines whether the matched names will be absolute or relative to the job dir:
  paths="relative" | "absolute"
  (may not be needed if this can be determined uniquely based on the location argument).

swiftwrap will allow array variables mapped in this manner to have any number of files, including zero. I.e., "runtime-mapped" files should not be listed in the expected output list for an app invocation. It's up to the user's app to ensure that some files match the pattern. An additional arg could set e.g. minfiles and/or maxfiles, in which case the wrapper code needs to validate the count of files matched and returned, but not their exact names.

We can call this mapper "experimental" until we validate its usability and suitability as a permanent feature. But as we hope to revise the entire mapper family and semantics, in a sense all mappers are subject to change.

Mihael, is the definition sound, and how long would it take you to develop it?

Thanks,

- Mike

From wilde at anl.gov Fri Mar 21 08:47:37 2014
From: wilde at anl.gov (Michael Wilde)
Date: Fri, 21 Mar 2014 08:47:37 -0500
Subject: [Swift-devel] Multiple output files
In-Reply-To: <532C410C.5080309@anl.gov>
References: <532A305B.4000600@anl.gov> <532B9B8C.7000007@anl.gov> <3FD5A438-74FE-418C-822D-229125EA6CDD@uchicago.edu> <532C3B7C.30609@anl.gov> <532C410C.5080309@anl.gov>
Message-ID: <532C42F9.4060404@anl.gov>

Resending to cc Jonathan.

Let's try to quickly converge on a spec to implement, and send it to swift-user for comments.

It also occurs to me that this feature will interact with the semantics of the "implicitly-direct" data management conventions that we have long discussed. So we should keep an eye out for how to solve both of these problems together.

- Mike

On 3/21/14, 8:39 AM, Michael Wilde wrote:
> Mihael, All,
>
> I'd like to propose a Swift/K feature to provide a reasonable solution
> to this very common need for an app to return a dynamically determined
> set of files.
>
> file dynarry[ ] ;
>
> dynarry = myApp(myArgs);
>
> The "runtime" mapper should initially have the same arguments and semantics (roughly) as simple_mapper, except for two new arguments:
>
> "indexes" determines how the matched file names will be indexed in the returned array:
> "int" | "string" | "sequential"
> sequential: return the matched files at consecutive integer indices, starting with 0.
> int: expect the filename component between prefix and suffix to be convertible to an integer, and use that as the index; e.g., myfile.012.out and myfile.204.out will return an array with the mapped files at indices 12 and 204.
> string: similar to int, but return a string-indexed associative array.
> "sequential" is simplest and should be the default.
>
> "paths" determines whether the matched names will be absolute or relative to the job dir:
> paths="relative" | "absolute"
> (may not be needed if this can be determined uniquely based on the location argument).
>
> swiftwrap will allow array variables mapped in this manner to have any number of files, including zero. I.e., "runtime-mapped" files should not be listed in the expected output list for an app invocation. It's up to the user's app to ensure that some files match the pattern. An additional arg could set e.g. minfiles and/or maxfiles, in which case the wrapper code needs to validate the count of files matched and returned, but not their exact names.
>
> We can call this mapper "experimental" until we validate its usability and suitability as a permanent feature. But as we hope to revise the entire mapper family and semantics, in a sense all mappers are subject to change.
>
> Mihael, is the definition sound, and how long would it take you to develop it?
>
> Thanks,
>
> - Mike
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel

--
Michael Wilde
Mathematics and Computer Science Computation Institute
Argonne National Laboratory The University of Chicago

From jozik at uchicago.edu Thu Mar 20 18:04:59 2014
From: jozik at uchicago.edu (Jonathan Ozik)
Date: Thu, 20 Mar 2014 18:04:59 -0500
Subject: [Swift-devel] Multiple output files
In-Reply-To: <532A305B.4000600@anl.gov>
References: <532A305B.4000600@anl.gov>
Message-ID:

Mike,

Thank you for the detailed information.
Regarding the "collect files of this pattern into an array" semantics, is the "file system mapper" not intended for this?
I think I'm stumbling on the notion of what output, in terms of files, can be produced using an "app" element in a swift script. Most of the examples have stdout (or stderr) pointing to a file, but I'm not sure I've found an example where side effects (e.g., files produced by a process) can be retrieved. Is this possible?

Jonathan

On Mar 19, 2014, at 7:03 PM, Michael Wilde wrote:

> Hi Jonathan,
>
> You are thinking about this exactly right, and have (quickly) hit one of our programmability weaknesses.
>
> At the moment, there is no good way to do this. We have had much discussion on it, though, and plan to create such "collect files of this pattern into an array" semantics.
>
> What we do for now is one of these two work-arounds:
>
> - write a single tarfile as output (in a wrapper script for the app, which tars files of the appropriate pattern, like "*.simout")
>
> - write the files directly to a specific (shared) directory instead of the Swift "app task directory" (called "job directory" in the current User Guide).
> Then return a single file with a list of these file names, and map that using an array_mapper if the results need to be passed to a next stage.
>
> We should see if we can get a prototype of such a feature to you in short order. But hopefully just to get things working, one of the above methods will suffice for you, for now.
>
> I'm cc'ing swift-devel to see what we can do.
>
> Thanks for reminding us of this fairly common need!
>
> - Mike
>
> On 3/19/14, 5:57 PM, Jonathan Ozik wrote:
>> Mike,
>>
>> Perhaps I'm not thinking about this correctly, but I'm trying to figure out how to collect output files that are generated by a simulation run. The scenario is that the Repast executable is run and, after it is run, there are files that are output into some location. There can be an arbitrary number of such output files per simulation run but, if necessary, it would be possible to pre-specify which files to look for. Is there a simple way to indicate to swift that it should collect all the files matching a particular pattern within a directory? It looks like the mappers might work here but as far as I understand the mappers are defined prior to calls to executables. Again, I might just not be thinking about this in a "swift" enough way.
>>
>> Jonathan
>>
>
> --
> Michael Wilde
> Mathematics and Computer Science Computation Institute
> Argonne National Laboratory The University of Chicago
>

From jozik at uchicago.edu Thu Mar 20 21:51:41 2014
From: jozik at uchicago.edu (Jonathan Ozik)
Date: Thu, 20 Mar 2014 21:51:41 -0500
Subject: [Swift-devel] Multiple output files
In-Reply-To: <532B9B8C.7000007@anl.gov>
References: <532A305B.4000600@anl.gov> <532B9B8C.7000007@anl.gov>
Message-ID: <3FD5A438-74FE-418C-822D-229125EA6CDD@uchicago.edu>

Hi Mike,

I've included my comments below. Also, please let me know if I should sign up for any mailing list in addition to swift-user, which I'm already signed up on.
Thank you,
Jonathan

On Mar 20, 2014, at 8:53 PM, Michael Wilde wrote:

> Hi Jonathan,
>
> On 3/20/14, 6:04 PM, Jonathan Ozik wrote:
>> Mike,
>>
>> Thank you for the detailed information.
>> Regarding the "collect files of this pattern into an array" semantics, is the "file system mapper" not intended for this?
> No, it's not. Mihael and others may need to correct me here, but basically the issue is this:
>
> An app *can* return multiple files - even an array of files - but not an array of files whose names and count are not known before the app is launched.

This functionality would be exactly what I'd be looking for. If an app can return multiple files I'd just need to know where and how I'd have to specify the patterns for those files (or do I need to know the exact names?) and how to make it so that the app returns those files. I've looked through the user guide and a few of the tutorials but I don't believe I've seen any example that fits this general functionality.

For example, if I have an executable "repast" that outputs files with patterns specified by globs, how do I make use of that knowledge to pick up those files and return them from an app?

>
> The User Guide does not yet cover this adequately, but output mappings may be static or dynamic. Static means the names and quantity of files are determined when the mapping is made. Dynamic means the mapping is made on demand, as files are created. Such mappings can, for example, be used to map the elements of an array when they are filled within a foreach loop by an app that returns one file per invocation.

I see, the one file per app invocation is probably not going to work for our use case, but I still think that I'm missing some crucial understandings of what conditions are required to point to and use generated or existing files within swift. That is, other than "stdout=@filename", how do I indicate that I'd like to return a file, say "myoutput.dat", from an app invocation?
Apologies if this is too simple of a question but, like I said, I feel like I'm missing some crucial information...

>
> What we do not have - but have long known that we need - is the ability to declare that all the files created by a *single* app invocation which match a specified pattern be returned as an array.
>
> Mihael: is this something you could implement in the near future - after we agree on the semantics?
>
> Justin, Tim, do you want to comment on this from a Swift/T perspective?
>
> Thanks,
>
> - Mike
>> I think I'm stumbling on the notion of what output, in terms of files, can be produced using an "app" element in a swift script. Most of the examples have stdout (or stderr) pointing to a file, but I'm not sure I've found an example where side effects (e.g., files produced by a process) can be retrieved. Is this possible?
>>
>> Jonathan
>>
>> On Mar 19, 2014, at 7:03 PM, Michael Wilde wrote:
>>
>>> Hi Jonathan,
>>>
>>> You are thinking about this exactly right, and have (quickly) hit one of our programmability weaknesses.
>>>
>>> At the moment, there is no good way to do this. We have had much discussion on it, though, and plan to create such "collect files of this pattern into an array" semantics.
>>>
>>> What we do for now is one of these two work-arounds:
>>>
>>> - write a single tarfile as output (in a wrapper script for the app, which tars files of the appropriate pattern, like "*.simout")
>>>
>>> - write the files directly to a specific (shared) directory instead of the Swift "app task directory" (called "job directory" in the current User Guide). Then return a single file with a list of these file names, and map that using an array_mapper if the results need to be passed to a next stage.
>>>
>>> We should see if we can get a prototype of such a feature to you in short order. But hopefully just to get things working, one of the above methods will suffice for you, for now.
>>>
>>> I'm cc'ing swift-devel to see what we can do.
>>>
>>> Thanks for reminding us of this fairly common need!
>>>
>>> - Mike
>>>
>>> On 3/19/14, 5:57 PM, Jonathan Ozik wrote:
>>>> Mike,
>>>>
>>>> Perhaps I'm not thinking about this correctly, but I'm trying to figure out how to collect output files that are generated by a simulation run. The scenario is that the Repast executable is run and, after it is run, there are files that are output into some location. There can be an arbitrary number of such output files per simulation run but, if necessary, it would be possible to pre-specify which files to look for. Is there a simple way to indicate to swift that it should collect all the files matching a particular pattern within a directory? It looks like the mappers might work here but as far as I understand the mappers are defined prior to calls to executables. Again, I might just not be thinking about this in a "swift" enough way.
>>>>
>>>> Jonathan
>>>>
>>> --
>>> Michael Wilde
>>> Mathematics and Computer Science Computation Institute
>>> Argonne National Laboratory The University of Chicago
>>>
>
> --
> Michael Wilde
> Mathematics and Computer Science Computation Institute
> Argonne National Laboratory The University of Chicago
>

From jozik at uchicago.edu Fri Mar 21 11:51:34 2014
From: jozik at uchicago.edu (Jonathan Ozik)
Date: Fri, 21 Mar 2014 11:51:34 -0500
Subject: [Swift-devel] Multiple output files
In-Reply-To: <532C3B7C.30609@anl.gov>
References: <532A305B.4000600@anl.gov> <532B9B8C.7000007@anl.gov> <3FD5A438-74FE-418C-822D-229125EA6CDD@uchicago.edu> <532C3B7C.30609@anl.gov>
Message-ID: <7A6631C0-A44D-4C47-BBB1-4B057D9CC14F@uchicago.edu>

Mike,

Thank you again for the detailed responses. I'm getting a better handle on what can be done and am trying to implement the workaround you suggested. Speaking of which, is the reason that a shared directory location needs to be utilized because readData() does not know to look in the "app task directory" and defaults to the swift script launch directory?

Thanks again for the guidance,

Jonathan

On Mar 21, 2014, at 8:15 AM, Michael Wilde wrote:

> Hi Jonathan,
>
> Thanks for bearing with us on this. I can see clearly where our documentation falls short of explaining this.
>
> I've got to work on some deadlines today, but I'll see if someone else on the team can post a clarification with some examples.
>
> A brief response, below.
>
> On 3/20/14, 9:51 PM, Jonathan Ozik wrote:
>> Hi Mike,
>>
>> I've included my comments below. Also, please let me know if I should sign up for any mailing list in addition to swift-user, which I'm already signed up on.
> No, just swift-user. Ideally we should have started this discussion there. I steered you to swift-devel because I thought the issue was one of a new feature requirement, but I see it's also one of documentation and training.
>
> ...
>> An app *can* return multiple files - even an array of files - but not an array of files whose names and count are not known before the app is launched.
>> This functionality would be exactly what I'd be looking for. If an app can return multiple files I'd just need to know where and how I'd have to specify the patterns for those files (or do I need to know the exact names?)
> Yes, that's the problem: you would need to know the exact names, in the Swift script, before the app is called, so that you can *map* all output file variables to the names that the app will be *expected* to produce. I.e., currently one needs a priori knowledge of all output file names, and you need to map variables (which can include array and structure members) to those names.
>> and how to make it so that the app returns those files. I've looked through the user guide and a few of the tutorials but I don't believe I've seen any example that fits this general functionality.
>>
>> For example, if I have an executable "repast" that outputs files with patterns specified by globs, how do I make use of that knowledge to pick up those files and return them from an app?
> That's the current language deficiency: you cannot. We will explain later today in more detail.
>
> The app is expected to produce all output files that any of its output variables (or arrays or structures) are mapped to.
>
> For example, you can map an *output* array to the file names f1.out, f2.out, and f3.out. Then the app will be expected to produce those files. If it doesn't, Swift will raise a runtime error. So if you know a priori (before the app is called), from context or from input argument values, that these 3 files will be produced, you can use one of the array mappers or the "ext" mapper to declare this expectation.
>
> The best way to get past this obstacle (while we develop the desired capability) is as follows. If you are running on a single machine, you can write a wrapper shell script around the repast app that runs repast and then returns a single file that contains a *list* of its output files. But you need to place these output files in a known shared directory, not in the current working directory in which Swift will run the repast app (called the "job directory" at the moment -- soon to be renamed the "app task directory"). Then you do a readData() on this returned file to create an array of strings, and use that array with the "array" mapper (explained in the User Guide).
>
> We'll post a working example to you as soon as possible - today, if time permits - as well as an example of the proposed new feature.
>
> - Mike
>
> --
> Michael Wilde
> Mathematics and Computer Science Computation Institute
> Argonne National Laboratory The University of Chicago

From jozik at uchicago.edu Fri Mar 21 13:59:50 2014
From: jozik at uchicago.edu (Jonathan Ozik)
Date: Fri, 21 Mar 2014 13:59:50 -0500
Subject: [Swift-devel] Multiple output files
In-Reply-To: <7A6631C0-A44D-4C47-BBB1-4B057D9CC14F@uchicago.edu>
References: <532A305B.4000600@anl.gov> <532B9B8C.7000007@anl.gov> <3FD5A438-74FE-418C-822D-229125EA6CDD@uchicago.edu> <532C3B7C.30609@anl.gov> <7A6631C0-A44D-4C47-BBB1-4B057D9CC14F@uchicago.edu>
Message-ID: <563F817A-8D87-4AE6-902A-595EF17BAEB0@uchicago.edu>

Mike,

It looks like I misunderstood your workaround initially.
Now I'm having an issue with specifying absolute paths.
For example:

file text <...; file="/Users/jozik/temp/instance_4/customOut7.dat">;
tracef("The file name is: %s\n", @text);

yields:

The file name is: Users/jozik/temp/instance_4/customOut7.dat

(the leading forward slash is missing)

The idea here is that the output data is being placed in a well-known location and retrieved via the output file location aggregator. This is a pared-down example where I'm looking to see what each line from the output file location aggregator would be interpreted as in Swift.

Jonathan

From davidkelly at uchicago.edu Fri Mar 21 16:43:03 2014
From: davidkelly at uchicago.edu (David Kelly)
Date: Fri, 21 Mar 2014 16:43:03 -0500
Subject: [Swift-devel] Removing _concurrent directory and files?
Message-ID:

Hello,

I would like to have concurrent files and the _concurrent directory removed when I'm done with them. Right now they stick around after the script runs. I'm using 0.94.1.

Is this possible? I thought there was an option that controls this, but can't seem to find it in the docs.

Thanks,
David

From wilde at anl.gov Fri Mar 21 16:46:53 2014
From: wilde at anl.gov (Michael Wilde)
Date: Fri, 21 Mar 2014 16:46:53 -0500
Subject: [Swift-devel] Multiple output files
In-Reply-To: <563F817A-8D87-4AE6-902A-595EF17BAEB0@uchicago.edu>
References: <532A305B.4000600@anl.gov> <532B9B8C.7000007@anl.gov> <3FD5A438-74FE-418C-822D-229125EA6CDD@uchicago.edu> <532C3B7C.30609@anl.gov> <7A6631C0-A44D-4C47-BBB1-4B057D9CC14F@uchicago.edu> <563F817A-8D87-4AE6-902A-595EF17BAEB0@uchicago.edu>
Message-ID: <532CB34D.3040701@anl.gov>

Jonathan,

@text is behaving as expected here. The rationale is as follows.
This line:

file text <...; file="/Users/jozik/temp/instance_4/customOut7.dat">;

...associates the variable "text" with the file /Users/jozik/temp/instance_4/customOut7.dat.

If that file were passed to (or returned from) an app() call, then @filename(text) (for which @text is a shorthand notation) would be the name by which the app should refer to the file, relative to the app's job directory. The leading "/" is removed because the file, if it's an input to the app, would get linked into the job dir with all of its directory components as specified in the mapping, i.e., it would be linked to ./Users/jozik/temp/instance_4/customOut7.dat. If that file were an output from the app, Swift would expect the app to create a file by this name below the job dir.

Again, these semantics date back to the origins of Swift, when every job was essentially expected to be executed on a remote grid node under Globus.

Yadu is working on a complete example of the multiple-file return case right now.

- Mike

--
Michael Wilde
Mathematics and Computer Science Computation Institute
Argonne National Laboratory The University of Chicago

From yadunand at uchicago.edu Fri Mar 21 17:51:29 2014
From: yadunand at uchicago.edu (Yadu Nand B)
Date: Fri, 21 Mar 2014 17:51:29 -0500
Subject: [Swift-devel] Multiple output files
In-Reply-To: <532CB34D.3040701@anl.gov>
References: <532A305B.4000600@anl.gov> <532B9B8C.7000007@anl.gov> <3FD5A438-74FE-418C-822D-229125EA6CDD@uchicago.edu> <532C3B7C.30609@anl.gov> <7A6631C0-A44D-4C47-BBB1-4B057D9CC14F@uchicago.edu> <563F817A-8D87-4AE6-902A-595EF17BAEB0@uchicago.edu> <532CB34D.3040701@anl.gov>
Message-ID: <532CC271.4040805@uchicago.edu>

Hi Jonathan,

I have a tarball of an example which you can download from here -> http://swift.rcc.uchicago.edu:8042/var_arrays.tar.gz

You can run the example on Midway using the following command:

swift -tc.file apps -config swift.properties -sites.file sites.xml var_arrays.swift -dir=/scratch/midway/$USER/temp

Please ensure that the directory passed to -dir is on a shared filesystem (/scratch on Midway).

The first foreach loop runs the script gen_n_files, which creates a random number of files within the specified directory and echoes the names of the files to stdout. The stdout file is read by Swift using readData and passed to the array_mapper to be mapped to an array, which is placed in an array of arrays.
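The shell side of this list-file workaround might look roughly like the sketch below. It is a minimal POSIX sh illustration under stated assumptions, not the gen_n_files script from the tarball (whose contents are not shown in this thread): fake_app, wrapper, and the *.simout pattern are hypothetical stand-ins, and in practice fake_app would be replaced by the real executable (e.g. repast).

```shell
#!/bin/sh
# Hypothetical wrapper illustrating the "list file" workaround: run an app that
# writes an unknown number of files into a shared directory, then print the
# absolute path of each matching file, one per line, on stdout.
# fake_app, wrapper and the *.simout pattern are illustrative names only.

# Stand-in for the real application: writes N files whose names the caller
# does not know in advance.
fake_app() {
    dir=$1
    n=$2
    i=1
    while [ "$i" -le "$n" ]; do
        echo "$i" > "$dir/run_$i.simout"
        i=$((i + 1))
    done
}

# wrapper SHARED_DIR N
wrapper() {
    shared=$1
    mkdir -p "$shared" || return 1
    shared=$(cd "$shared" && pwd) || return 1   # normalize to an absolute path
    fake_app "$shared" "$2" || return 1
    for f in "$shared"/*.simout; do
        [ -e "$f" ] || continue   # pattern matched nothing: print no lines
        printf '%s\n' "$f"
    done
}

# Run only when invoked with arguments, e.g. ./wrapper.sh /scratch/.../run1 5
if [ "$#" -ge 2 ]; then
    wrapper "$1" "$2"
fi
```

On the Swift side, the wrapper's stdout would become the app's single output file (captured with stdout=@...), which is then read with readData() into a string array and passed to array_mapper, as described in this thread. The sketch prints absolute paths so the list does not depend on the app's job directory; giving each invocation its own fresh subdirectory of the shared area also keeps the glob from picking up files left behind by earlier runs.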
The second foreach loop goes over each array in the array of arrays and simply sums up all integers present in each array of files. Slightly contrived example, but I hope you get the method used. Let me know if you want this example to be expanded in any way. Thanks, Yadu On 03/21/2014 04:46 PM, Michael Wilde wrote: > Jonathan, > > @text is behaving as expected here. The rationale is as follows. This > line: > > file text file="/Users/jozik/temp/instance_4/customOut7.dat">; > > ...asociates the variable "text" with the file > /Users/jozik/temp/instance_4/customOut7.dat. > > If that file was passed to (or returned from) an app() call, then > @filename(text) (for which @text is a shorthand notation) would be the > name by which the app should refer to the file, relative to the app's > job directory. The leading "/" is removed because the file, if its an > input to the app, would get linked to the job dir with all of its > directory components as specified in the mapping. Ie it be linked to > ./Users/jozik/temp/instance_4/customOut7.dat > If that file was an output from the app, Swift would expect the app to > create a file by this name below the job dir. > > Again, these semantics date back to the origins of Swift, when every > job was essentially expected to be executed on a remote grid node > under Globus. > > Yadu is working on a complete example of the multiple-file return case > right now. > > - Mike > > On 3/21/14, 1:59 PM, Jonathan Ozik wrote: >> Mike, >> >> It looks like I misunderstood your workaround initially. Now I'm >> having an issue with specifying absolute paths. >> For example: >> file text > file="/Users/jozik/temp/instance_4/customOut7.dat">; >> tracef("The file name is: %s\n", at text); >> >> yields: >> The file name is: Users/jozik/temp/instance_4/customOut7.dat >> (the leading forward slash is missing) >> >> The idea here is that the output data is being placed in a well known >> location and retrieved via the output file location aggregator. 
This >> is a pared down example where I'm looking to see what each line from >> the output file location aggregator would be interpreted as in swift. >> >> Jonathan >> >> On Mar 21, 2014, at 11:51 AM, Jonathan Ozik > > wrote: >> >>> Mike, >>> >>> Thank you again for the detailed responses. I'm getting a better >>> handle on what can be done and am trying to implement the workaround >>> you suggested. >>> Speaking of which, is the reason that a shared directory location >>> needs to be utilized because readData() does not know to look in the >>> "app task directory" and defaults to the swift script launch directory? >>> >>> Thanks again for the guidance, >>> >>> Jonathan >>> >>> On Mar 21, 2014, at 8:15 AM, Michael Wilde >> > wrote: >>> >>>> Hi Jonathan, >>>> >>>> Thanks for bearing with us on this. I can see clearly where our >>>> documentation is falling short of explaining this clearly. >>>> >>>> Ive got to work on some deadlines today, but I'll see if someone >>>> else on the team can post a clarification with some examples. >>>> >>>> A brief response, below. >>>> >>>> On 3/20/14, 9:51 PM, Jonathan Ozik wrote: >>>>> Hi Mike, >>>>> >>>>> I've included my comments below. Also, please let me know if I should sign up for any mailing list in addition to swift-user, which I'm already signed up on. >>>> No, just swift-user. Ideally we should he started this discussion >>>> there. I steered you to swift-devel because I thought the issue was >>>> one of a new feature requirement, but I see its also one of >>>> documentation and training. >>>> >>>> ... >>>>> An app *can* return multiple files - even an array of files - but >>>>> not an array of files whose names and count is not known before >>>>> the app is launched. >>>>> This functionality would be exactly what I'd be looking for. If an app can return multiple files I'd just need to know where and how I'd have to specify the patterns for those files (or do I need to know the exact names?) 
>>>> Yes, that's the problem: you would need to know the exact names, in >>>> the Swift script, before the app is called, so that you can *map* >>>> all output file variables to the names that the app will be >>>> *expected* to produce. I.e., currently one needs a priori knowledge >>>> of all output file names, and you need to map variables (which can >>>> include array and structure members) to those names. >>>>> and how to make it so that the app returns those files. I've looked through the user guide and a few of the tutorials but I don't believe I've seen any example that fits this general functionality. >>>>> >>>>> For example, if I have an executable "repast" that outputs files with patterns specified by globs, how do I make use of that knowledge to pick up those files and return them from an app? >>>> That's the current language deficiency: you cannot. We will >>>> explain later today in more detail. >>>> >>>> The app is expected to produce all output files that any of its >>>> output variables (or arrays or structures) are mapped to. >>>> >>>> For example, you can map an *output* array to the file names f1.out, >>>> f2.out, and f3.out. Then the app will be expected to produce those >>>> files. If it doesn't, Swift will raise a runtime error. So if you >>>> know a priori (before the app is called) from context or from input >>>> argument values that these 3 files will be produced, you can use >>>> one of the array mappers or the "ext" mapper to declare this >>>> expectation. >>>> >>>> The best way to get past this obstacle (while we develop the >>>> desired capability) is as follows. If you are running on a single >>>> machine, you can write a wrapper shell script around the repast app >>>> that runs repast and then returns a single file that contains a >>>> *list* of its output files.
But you need to place these output >>>> files in a known shared directory, not in the current working >>>> directory in which Swift will run the repast app (called the "job >>>> directory" at the moment -- soon to be renamed the "app task >>>> directory"). Then you do a readData() on this returned file to >>>> create an array of strings, and use that array with the "array" >>>> mapper (explained in the User Guide). >>>> >>>> We'll post a working example to you as soon as possible - today, >>>> if time permits - along with an example of the proposed new feature. >>>> >>>> - Mike >>>>> I see, the one file per app invocation is probably not going to work for our use case, but I still think that I'm missing some crucial understanding of what conditions are required to point to and use generated or existing files within swift. That is, other than "stdout=@filename", how do I indicate that I'd like to return a file, say "myoutput.dat", from an app invocation? Apologies if this is too simple a question but, like I said, I feel like I'm missing some crucial information... >>>>> >>>>>> What we do not have - but have long known that we need - is the ability to declare that all the files created by a *single* app invocation which match a specified pattern be returned as an array. >>>>>> >>>>>> Mihael: is this something you could implement in the near future - after we agree on the semantics? >>>>>> >>>>>> Justin, Tim, do you want to comment on this from a Swift/T perspective? >>>>>> >>>>>> Thanks, >>>>>> >>>>>> - Mike >>>>>>> I think I'm stumbling on the notion of what output, in terms of files, can be produced using an "app" element in a swift script. Most of the examples have stdout (or stderr) pointing to a file, but I'm not sure I've found an example where side effects (e.g., files produced by a process) can be retrieved. Is this possible?
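The shared-directory workaround Mike describes (run the app through a wrapper, move its outputs to a known shared directory, and return one file listing them, which the script then reads with readData() and maps with the array mapper) could look roughly like this. It is a sketch in Python rather than the wrapper shell script Mike mentions; the function name `run_and_list`, the `*.simout` pattern, and all directory arguments are hypothetical, and none of this is part of Swift itself.

```python
import glob
import os
import shutil
import subprocess


def run_and_list(cmd, work_dir, pattern, shared_dir, list_file):
    """Hypothetical wrapper: run the app, move its outputs, write a list file.

    cmd        -- the app command line (e.g. the repast executable plus args)
    work_dir   -- directory the app runs in (stands in for the job directory)
    pattern    -- glob for the app's outputs, e.g. "*.simout"
    shared_dir -- known shared directory where the outputs are kept
    list_file  -- the single file returned to Swift, one path per line
    """
    # Run the app in its working directory; raise on a non-zero exit status.
    subprocess.run(cmd, cwd=work_dir, check=True)

    os.makedirs(shared_dir, exist_ok=True)
    moved = []
    # The number of matching outputs is not known in advance.
    for path in sorted(glob.glob(os.path.join(work_dir, pattern))):
        dest = os.path.join(shared_dir, os.path.basename(path))
        shutil.move(path, dest)
        # Absolute paths, so the list does not depend on the job directory.
        moved.append(os.path.abspath(dest))

    # One path per line (assumed to be the layout readData() expects
    # for an array of strings).
    with open(list_file, "w") as f:
        for p in moved:
            f.write(p + "\n")
    return moved
```

Only files matching the pattern are moved. Anything else the app writes (logs, for instance) stays in the working directory.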
>>>>>>> >>>>>>> Jonathan >>>>>>> >>>>>>> On Mar 19, 2014, at 7:03 PM, Michael Wilde wrote: >>>>>>> >>>>>>>> Hi Jonathan, >>>>>>>> >>>>>>>> You are thinking about this exactly right, and have (quickly) hit one of our programmability weaknesses. >>>>>>>> >>>>>>>> At the moment, there is no good way to do this. We have had much discussion on it, though, and plan to create such "collect files of this pattern into an array" semantics. >>>>>>>> >>>>>>>> What we do for now is one of these two work-arounds: >>>>>>>> >>>>>>>> - write a single tarfile as output (in a wrapper script for the app, which tars files of the appropriate pattern, like "*.simout") >>>>>>>> >>>>>>>> - write the files directly to a specific (shared) directory instead of the Swift "app task directory" (called "job directory" in the current User Guide). Then return a single file with a list of these file names, and map that using an array_mapper if the results need to be passed to a next stage. >>>>>>>> >>>>>>>> We should see if we can get a prototype of such a feature to you in short order. But hopefully just to get things working, one of the above methods will suffice for you, for now. >>>>>>>> >>>>>>>> I'm cc'ing swift-devel to see what we can do. >>>>>>>> >>>>>>>> Thanks for reminding us of this fairly common need! >>>>>>>> >>>>>>>> - Mike >>>>>>>> >>>>>>>> On 3/19/14, 5:57 PM, Jonathan Ozik wrote: >>>>>>>>> Mike, >>>>>>>>> >>>>>>>>> Perhaps I'm not thinking about this correctly, but I'm trying to figure out how to collect output files that are generated by a simulation run. The scenario is that the Repast executable is run and, after it is run, there are files that are output into some location. There can be an arbitrary number of such output files per simulation run but, if necessary, it would be possible to pre-specify which files to look for. Is there a simple way to indicate to swift that it should collect all the files matching a particular pattern within a directory?
It looks like the mappers might work here but as far as I understand the mappers are defined prior to calls to executables. Again, I might just not be thinking about this in a "swift" enough way. >>>>>>>>> >>>>>>>>> Jonathan >>>>>>>>> >>>>>>>> -- >>>>>>>> Michael Wilde >>>>>>>> Mathematics and Computer Science Computation Institute >>>>>>>> Argonne National Laboratory The University of Chicago >>>>>>>> >>>>>> -- >>>>>> Michael Wilde >>>>>> Mathematics and Computer Science Computation Institute >>>>>> Argonne National Laboratory The University of Chicago >>>>>> >>>> >>>> -- >>>> Michael Wilde >>>> Mathematics and Computer Science Computation Institute >>>> Argonne National Laboratory The University of Chicago >>> >> > > -- > Michael Wilde > Mathematics and Computer Science Computation Institute > Argonne National Laboratory The University of Chicago > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From wilde at anl.gov Sat Mar 22 11:31:52 2014 From: wilde at anl.gov (Michael Wilde) Date: Sat, 22 Mar 2014 11:31:52 -0500 Subject: [Swift-devel] Multiple output files In-Reply-To: <532C410C.5080309@anl.gov> References: <532A305B.4000600@anl.gov> <532B9B8C.7000007@anl.gov> <3FD5A438-74FE-418C-822D-229125EA6CDD@uchicago.edu> <532C3B7C.30609@anl.gov> <532C410C.5080309@anl.gov> Message-ID: <532DBAF8.9030300@anl.gov> This is now filed as enhancement bug 1225 and assigned to you, Mihael. The description has been revised as you suggested, to propose that the feature be first done using simple_mapper rather than a new mapper. - Mike On 3/21/14, 8:39 AM, Michael Wilde wrote: > Mihael, All, > > I'd like to propose a Swift/K feature to provide a reasonable solution > to this very common need for an app to return a dynamically determined > set of files.
> > file dynarry[ ] indexes="int">; > > dynarry = myApp(myArgs); > > The "runtime" mapper should initially have the same arguments and > semantics (roughly) as simple_mapper, except for two new arguments: > > "indexes" which determines how the matched file names will be indexed > in the returned array > "int" | "string" | "sequential" > sequential: return the matched files as consecutive integer indices > starting with 0 > int: expect the filename component between prefix and suffix to be > convertible to an integer, and use that as the index > e.g., myfile.012.out and myfile.204.out will return an array with the > mapped files at indices 12 and 204. > string: similar to int but return a string-indexed associative array. > "sequential" is simplest and should be the default. > > "paths" which determines whether the matched names will be absolute or > relative to the job dir > paths="relative" | "absolute" > (may not be needed if this can be determined uniquely based on the > location argument.) > > swiftwrap will allow array variables mapped in this manner to have any > number of files, including zero. I.e., "runtime-mapped" files should > not be listed in the expected output list for an app invocation. It's > up to the user's app to ensure that some files match the pattern. An > additional arg could set e.g. minfiles and/or maxfiles, in which case > the wrapper code needs to validate the count of files matched and > returned, but not their exact names. > > We can call this mapper "experimental" until we validate its usability > and suitability as a permanent feature. But as we hope to revise the > entire mapper family and semantics, in a sense all mappers are subject > to change. > > Mihael, is the definition sound, and how long would it take you to > develop it?
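To make the three proposed `indexes` modes concrete, here is a small Python model of how matched file names might be turned into array keys, including the optional minfiles/maxfiles count check. It only illustrates the proposal as written, not any Swift implementation. The function name `index_matches`, ordering "sequential" matches by file name, and raising `ValueError` on errors are all assumptions.

```python
import re


def index_matches(names, prefix, suffix, indexes="sequential",
                  minfiles=None, maxfiles=None):
    """Model of the proposed runtime mapper's "indexes" argument (hypothetical).

    names  -- file names found after the app ran
    prefix -- simple_mapper-style prefix, e.g. "myfile."
    suffix -- simple_mapper-style suffix, e.g. ".out"
    """
    pat = re.compile(re.escape(prefix) + r"(.*)" + re.escape(suffix) + r"\Z")

    # Keep only names of the form <prefix><middle><suffix>, in name order.
    matched = []
    for n in sorted(names):
        m = pat.match(n)
        if m:
            matched.append((n, m.group(1)))

    # Optional validation: only the count is checked, never the exact names.
    if minfiles is not None and len(matched) < minfiles:
        raise ValueError(f"expected at least {minfiles} files, got {len(matched)}")
    if maxfiles is not None and len(matched) > maxfiles:
        raise ValueError(f"expected at most {maxfiles} files, got {len(matched)}")

    if indexes == "sequential":
        # Consecutive integer indices starting with 0.
        return {i: n for i, (n, _) in enumerate(matched)}
    if indexes == "int":
        # The middle component must convert to an integer (int() raises otherwise).
        return {int(mid): n for n, mid in matched}
    if indexes == "string":
        # String-indexed associative array keyed by the middle component.
        return {mid: n for n, mid in matched}
    raise ValueError(f"unknown indexes mode: {indexes!r}")
```

With the names from the proposal, `index_matches(["myfile.012.out", "myfile.204.out"], "myfile.", ".out", "int")` places the files at keys 12 and 204. An empty match list yields an empty array, which is consistent with allowing zero files.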
> > Thanks, > > - Mike > -- Michael Wilde Mathematics and Computer Science Computation Institute Argonne National Laboratory The University of Chicago From hategan at mcs.anl.gov Sat Mar 22 13:38:10 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 22 Mar 2014 11:38:10 -0700 Subject: [Swift-devel] Multiple output files In-Reply-To: <532DBAF8.9030300@anl.gov> References: <532A305B.4000600@anl.gov> <532B9B8C.7000007@anl.gov> <3FD5A438-74FE-418C-822D-229125EA6CDD@uchicago.edu> <532C3B7C.30609@anl.gov> <532C410C.5080309@anl.gov> <532DBAF8.9030300@anl.gov> Message-ID: <1395513490.1382.10.camel@echo> Sorry, I'm not sure I'm 100% with you on this. My opinion is that this problem is NOT a language/mapper issue. It is an issue of implementation: how do you get the information about files to a place where it can be used. So I believe that whether we add a new mapper or make it work with existing mappers, we still need to fix that other more complex problem. This is the reason why I believe we shouldn't add anything to the language. Anyway, give me a couple of days. I cleaned up the stagein/stageout code a bit yesterday. It was messy for no good reason that I could see. Closing the remote loop on the dynamic mappers is next. Mihael On Sat, 2014-03-22 at 11:31 -0500, Michael Wilde wrote: > This is now filed as enhancement bug 1225 and assigned to you, Mihael. > The description has been revised as you suggested, to propose that the > feature be first done using simple_mapper rather than a new mapper. > > - Mike > > On 3/21/14, 8:39 AM, Michael Wilde wrote: > > Mihael, All, > > > > I'd like to propose a Swift/K feature to provide a reasonable solution > > to this very common need for an app to return a dynamically determined > > set of files.
> > > > file dynarry[ ] > indexes="int">; > > > > dynarry = myApp(myArgs); > > > > The "runtime" mapper should initially have the same arguments and > > semantics (roughly) as simple_mapper, except for two new arguments: > > > > "indexes" which determines how the matched file names will be indexed > > in the returned array > > "int" | "string" | "sequential" > > sequential: return the matched files as consecutive integer indices > > starting with 0 > > int: expect the filename component between prefix and suffix to be > > convertible to an integer, and use that as the index > > e.g., myfile.012.out and myfile.204.out will return an array with the > > mapped files at indices 12 and 204. > > string: similar to int but return a string-indexed associative array. > > "sequential" is simplest and should be the default. > > > > "paths" which determines whether the matched names will be absolute or > > relative to the job dir > > paths="relative" | "absolute" > > (may not be needed if this can be determined uniquely based on the > > location argument.) > > > > swiftwrap will allow array variables mapped in this manner to have any > > number of files, including zero. I.e., "runtime-mapped" files should > > not be listed in the expected output list for an app invocation. It's > > up to the user's app to ensure that some files match the pattern. An > > additional arg could set e.g. minfiles and/or maxfiles, in which case > > the wrapper code needs to validate the count of files matched and > > returned, but not their exact names. > > > > We can call this mapper "experimental" until we validate its usability > > and suitability as a permanent feature. But as we hope to revise the > > entire mapper family and semantics, in a sense all mappers are subject > > to change. > > > > Mihael, is the definition sound, and how long would it take you to > > develop it?
> > > > Thanks, > > > > - Mike > > From wilde at anl.gov Sat Mar 22 14:08:24 2014 From: wilde at anl.gov (Michael Wilde) Date: Sat, 22 Mar 2014 14:08:24 -0500 Subject: [Swift-devel] Multiple output files In-Reply-To: <1395513490.1382.10.camel@echo> References: <532A305B.4000600@anl.gov> <532B9B8C.7000007@anl.gov> <3FD5A438-74FE-418C-822D-229125EA6CDD@uchicago.edu> <532C3B7C.30609@anl.gov> <532C410C.5080309@anl.gov> <532DBAF8.9030300@anl.gov> <1395513490.1382.10.camel@echo> Message-ID: <532DDFA8.3040909@anl.gov> On 3/22/14, 1:38 PM, Mihael Hategan wrote: > My opinion is that this problem is NOT a language/mapper issue. It is an > issue of implementation: how do you get the information about files to a > place where it can be used. > > So I believe that whether we add a new mapper or make it work with > existing mappers, we still need to fix that other more complex problem. > This is the reason why I believe we shouldn't add anything to the > language. Mihael and I discussed this in a chat just now, and I think we are in fact *in* sync. So he's going to push forward on this. - Mike From hategan at mcs.anl.gov Mon Mar 24 10:46:58 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 24 Mar 2014 08:46:58 -0700 Subject: [Swift-devel] Multiple output files In-Reply-To: <532DDFA8.3040909@anl.gov> References: <532A305B.4000600@anl.gov> <532B9B8C.7000007@anl.gov> <3FD5A438-74FE-418C-822D-229125EA6CDD@uchicago.edu> <532C3B7C.30609@anl.gov> <532C410C.5080309@anl.gov> <532DBAF8.9030300@anl.gov> <1395513490.1382.10.camel@echo> <532DDFA8.3040909@anl.gov> Message-ID: <1395676018.18549.13.camel@echo> An update on this.
I'm still working on it, but here is the basic idea: - non-static mappers now support a method that, given a data type, returns a list of glob patterns that can be used to search for files that could be mapped by that mapper. The list (as opposed to one glob pattern) is necessary because there might be cases when you have: type s { file a; file[] b}; s[] x; Then s[] could match either s_????.a or s_????.???? - this (possibly empty) list gets sent to _swiftwrap - after the job is done, _swiftwrap creates a list of files matching those patterns - swift-int copies that list back and the files in it and uses the list to populate data in a fashion similar to what is done for input variables This is without provider staging. For provider staging, providers that support staging need to be modified to support staging out of files using glob patterns. There might be some complications there due to the local vs. remote path naming conventions. Mihael On Sat, 2014-03-22 at 14:08 -0500, Michael Wilde wrote: > On 3/22/14, 1:38 PM, Mihael Hategan wrote: > > My opinion is that this problem is NOT a language/mapper issue. It is an > > issue of implementation: how do you get the information about files to a > > place where it can be used. > > > > So I believe that whether we add a new mapper or make it work with > > existing mappers, we still need to fix that other more complex problem. > > This is the reason why I believe we shouldn't add anything to the > > language. > Mihael and I discussed this in a chat just now, and I think we are in > fact *in* sync. > So he's going to push forward on this. 
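The post-job step in Mihael's outline, where _swiftwrap lists the job-directory files that match the mapper-supplied glob patterns, could be modelled in Python along these lines. The real _swiftwrap is a shell script. `collect_matches` is a hypothetical name, and the two patterns in the usage example are the ones from the `type s { file a; file[] b}` case above.

```python
import fnmatch
import os


def collect_matches(job_dir, patterns):
    """List files under job_dir (as relative paths) matching any glob pattern.

    An empty pattern list yields an empty result, mirroring the
    "(possibly empty) list" of patterns sent to the wrapper.
    """
    found = set()
    for root, _dirs, files in os.walk(job_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), job_dir)
            # Normalise separators so patterns are written with "/".
            rel = rel.replace(os.sep, "/")
            # fnmatch wildcards also match "/", unlike shell globbing, so each
            # pattern is compared against the whole relative path.
            if any(fnmatch.fnmatchcase(rel, p) for p in patterns):
                found.add(rel)
    # A set avoids listing a file twice if it matches several patterns.
    return sorted(found)
```

For example, `collect_matches(job_dir, ["s_????.a", "s_????.????"])` would report `s_0000.a` and `s_0000.0001` but not `stdout.txt`. Swift would then need that list copied back to populate the array, as the outline describes.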
> > - Mike From iraicu at cs.iit.edu Mon Mar 24 12:48:13 2014 From: iraicu at cs.iit.edu (Ioan Raicu) Date: Mon, 24 Mar 2014 12:48:13 -0500 Subject: [Swift-devel] Call for Posters at ACM HPDC 2014 Message-ID: <53306FDD.10906@cs.iit.edu> Call for Posters Submission Deadline: May 16 2014 http://www.hpdc.org/2014/posters/call-for-posters/ HPDC'14 will feature a poster session that will provide the right environment for lively and informal discussions on various high performance parallel and distributed computing topics. The poster session will be held on Wednesday, June 25, in the late afternoon. Participating posters will be selected based on the following criteria: - Submissions must describe new, interesting ideas on any HPDC topics of interest - Submissions can present work in progress, and we strongly encourage the authors to include preliminary experimental results, if available - Student submissions meeting the above criteria will be given preference We invite all potential authors to submit their contribution to this poster session in the form of a two-page PDF abstract (we recommend using the ACM Proceedings style, and fonts not smaller than 10 point). Please provide the following information in your PDF file: - Poster title - Author names, affiliations, and email addresses - Note which authors, if any, are students Abstracts must be submitted through email to chandra AT cs DOT umn DOT edu before May 16 2014, 5:00pm EDT. Authors will be notified of acceptance or rejection via e-mail by May 23, 2014. No reviews will be provided. Accepted posters will be published online on the conference website. Details about the poster presentation (e.g., poster size) will be available closer to the conference. For any questions about the submission, selection, and presentation of the accepted posters, please contact the Posters Chair, Abhishek Chandra, University of Minnesota (email: chandra AT cs DOT umn DOT edu). 
-- ================================================================= Ioan Raicu, Ph.D. Assistant Professor, Illinois Institute of Technology (IIT) Guest Research Faculty, Argonne National Laboratory (ANL) ================================================================= Data-Intensive Distributed Systems Laboratory, CS/IIT Distributed Systems Laboratory, MCS/ANL ================================================================= Editor: IEEE TCC, Springer Cluster, Springer JoCCASA Chair: IEEE/ACM MTAGS, ACM ScienceCloud ================================================================= Cel: 1-847-722-0876 Office: 1-312-567-5704 Email: iraicu at cs.iit.edu Web: http://www.cs.iit.edu/~iraicu/ Web: http://datasys.cs.iit.edu/ LinkedIn: http://www.linkedin.com/in/ioanraicu Google: http://scholar.google.com/citations?user=jE73HYAAAAAJ ================================================================= =================================================================
From iraicu at cs.iit.edu Wed Mar 26 09:07:27 2014 From: iraicu at cs.iit.edu (Ioan Raicu) Date: Wed, 26 Mar 2014 09:07:27 -0500 Subject: [Swift-devel] Call for Participation: IEEE/ACM CCGrid 2014 in Chicago May 26-29 -- Early Bird Registration due April 15th Message-ID: <5332DF1F.1050404@cs.iit.edu> ------------------------- Call for Participation ------------------------- -------------------------- IEEE/ACM CCGrid 2014 -------------------------- 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing May 26-29, 2014 Chicago, IL, USA http://datasys.cs.iit.edu/events/CCGrid2014/ Upcoming Important Dates: Early Bird Registration: April 15 Registration: May 5 Room Reservation: May 5 Workshops: May 26 Conference: May 27-29 Tutorial: May 29 A Message from the CCGrid 2014 General Chairs: It is our great pleasure to welcome you to the 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2014) and to the great global city of Chicago. CCGrid is a forum for all distributed computing technologies and for all technology stakeholders. The inaugural CCGrid conference was held in Brisbane, Australia, in 2001. Three years later, in 2004, it was held in Chicago. Ten years have passed since then. In 2004, Chicago was a leader in Grid computing. Today, it is a center of Cloud technology innovation. CCGrid 2014 is just like the city of Chicago, having naturally blended and mixed with the historical glories and modern stallers into its unique identity. Come and join us to meet preeminent computer scientists, leading creators of Grid and Cluster technologies, and new shining stars, the significant contributors of today's Cloud and big data management advances. We promise you a memorable experience unequaled at other conferences. Late May is the perfect time to visit Chicago. Flowers are blooming, fountains are sprinkling, and sailboats are in the bay, but most of the summer tours are not in yet.
The conference hotel, the Hyatt Regency Chicago, is in an ideal location. It is one block off The Magnificent Mile and near Millennium Park, Lake Michigan, and the Chicago River. Even if you do not go shopping, simply walking along Michigan Ave or the Chicago River, in the daytime or in the evening, is a pleasure. Plus, the special hotel rate for CCGrid attendees is very attractive. The conference banquet will be held on a lake cruise on Wednesday evening. That night Chicago will have a half-hour fireworks show on the lake. Of course, the most exciting part is the outstanding program offered by CCGrid 2014. From Keynotes to Technical Papers, CCGrid14 has first-class programs in all categories. We heartily appreciate the committee members and volunteers who have put the wonderful programs together. CCGrid 2014 features keynote talks, tutorials, workshops, poster sessions and demos, a competition, student travel awards, a panel, as well as technical papers. We are pleased to have Dr. Farnam Jahanian, the National Science Foundation Assistant Director for Computer and Information Science and Engineering (CISE), deliver the opening keynote. The keynote speaker of the second day, Prof. Ion Stoica, is a professor of Computer Science at the University of California at Berkeley, and is known for his current research projects, Mesos and Spark. This year's winner of the prestigious IEEE Medal for Excellence in Scalable Computing, Professor Yves Robert of Ecole Normale Superieure de Lyon, is also a keynote speaker at CCGrid 2014. Prof. Robert is an authority on algorithm design and analysis.
CCGrid 2014 received 302 paper submissions from 40 countries. After administrative filtering, 283 papers received full reviews. In total, 1089 reviews were conducted and 54 papers were accepted, an acceptance rate of 19% (54/283). There are eight workshops on Monday, May 26, 2014, and five concurrent tutorials on the afternoon of Thursday, May 29. Eight posters and two demos have been selected for the poster session on the evening of Tuesday, May 27, and ten papers have been accepted for the Doctoral Symposium program on Wednesday, May 28. The IEEE SCALE Challenge competition will be held live, with finalists judged by their demonstrations on Tuesday and the winner announced on Thursday. CCGrid 2014 has received generous sponsorship from the U.S. National Science Foundation and the IEEE Technical Committee on Scalable Computing to help 17 students attend this conference. A greeting and round-table session have also been arranged for the student awardees. The success of CCGrid 2014 is due to the dedicated efforts and high standards of numerous international volunteers. Our long thank-you list starts with the two excellent Program Chairs: Kirk W. Cameron and Dimitrios S. Nikolopoulos, and the Program Committee. Special thanks to the Program Committee Area Chairs who braved the coldest Chicago day in decades to run the program committee meeting on January 27, 2014. We thank Workshops Co-Chairs Zhiling Lan and Matei Ripeanu, the chairs and PC committees of the various workshops, and our Publicity Chairs for getting the word out about the conference. We thank Tutorials Co-Chairs Kate Keahey and Radu Prodan, Poster and Research Demo Co-Chairs Borja Sotomayor and Hui Jin, Doctoral Symposium Chair Judy Qiu, the Student Awards Chair Yong Chen, and the SCALE Challenge Coordinator Douglas Thain. The Cyber Co-chairs, Ge Rong and Wei Tang, did a wonderful job with the conference website. Dr. Pavan Balaji, the Proceedings Chair, ensured the publication of the conference proceedings. 
We are especially grateful to the Local Organizing Chairs, Ioan Raicu and Kyle Chard, who did a tremendous job on innumerable tasks, from identifying the hotel to negotiating the price of the banquet.
Thanks are also due to our sponsors, namely, IEEE, ACM, TCSC, and the organizational supporters at Illinois Institute of Technology and the University of Chicago. Ultimately, however, the success of the conference will be judged by the attendees' experience. We hope that the conference will provide you with a valuable opportunity to share ideas, communicate, learn, and network. We wish everyone a successful, stimulating, and rewarding meeting and look forward to seeing you again at future CCGrid conferences. Ian Foster (University of Chicago and Argonne National Laboratory) Xian-He Sun (Illinois Institute of Technology) -- ================================================================= Ioan Raicu, Ph.D. Assistant Professor, Illinois Institute of Technology (IIT) Guest Research Faculty, Argonne National Laboratory (ANL) ================================================================= Data-Intensive Distributed Systems Laboratory, CS/IIT Distributed Systems Laboratory, MCS/ANL ================================================================= Editor: IEEE TCC, Springer Cluster, Springer JoCCASA Chair: IEEE/ACM MTAGS, ACM ScienceCloud ================================================================= Cel: 1-847-722-0876 Office: 1-312-567-5704 Email: iraicu at cs.iit.edu Web: http://www.cs.iit.edu/~iraicu/ Web: http://datasys.cs.iit.edu/ LinkedIn: http://www.linkedin.com/in/ioanraicu Google: http://scholar.google.com/citations?user=jE73HYAAAAAJ ================================================================= =================================================================
From hategan at mcs.anl.gov Wed Mar 26 13:24:31 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 26 Mar 2014 11:24:31 -0700 Subject: [Swift-devel] Multiple output files In-Reply-To: <1395676018.18549.13.camel@echo> References: <532A305B.4000600@anl.gov> <532B9B8C.7000007@anl.gov> <3FD5A438-74FE-418C-822D-229125EA6CDD@uchicago.edu> <532C3B7C.30609@anl.gov> <532C410C.5080309@anl.gov> <532DBAF8.9030300@anl.gov> <1395513490.1382.10.camel@echo> <532DDFA8.3040909@anl.gov> <1395676018.18549.13.camel@echo> Message-ID: <1395858271.26097.5.camel@echo> Almost there. It works for non-provider staging. I'm working on provider staging now. The difficulty is that providers must now support glob-pattern staging with a twist. The twist is that local and remote path names are different. For example, without glob patterns, a stageout could look like this: __root__/dir/a.txt -> /dir/a.txt With globs, something like this is possible: __root__/dir/a_????_b_????.txt -> /dir/a_????_b_????.txt In this case the code needs to recursively glob things and substitute each glob group in the destination for the respective matching glob in the source. Luckily there are only two providers that support staging at this point: local and coasters. Unfortunately this has to be implemented twice: once in Java for local, and once in Perl for coasters. Mihael On Mon, 2014-03-24 at 08:46 -0700, Mihael Hategan wrote: > An update on this. > > I'm still working on it, but here is the basic idea: > > - non-static mappers now support a method that, given a data type, > returns a list of glob patterns that can be used to search for files > that could be mapped by that mapper. The list (as opposed to one glob > pattern) is necessary because there might be cases when you have: > type s { file a; file[] b}; > s[] x; > Then s[] could match either s_????.a or s_????.????
> - this (possibly empty) list gets sent to _swiftwrap > - after the job is done, _swiftwrap creates a list of files matching > those patterns > - swift-int copies that list back and the files in it and uses the list > to populate data in a fashion similar to what is done for input > variables > > This is without provider staging. > > For provider staging, providers that support staging need to be modified > to support staging out of files using glob patterns. There might be some > complications there due to the local vs. remote path naming conventions. > > Mihael > > On Sat, 2014-03-22 at 14:08 -0500, Michael Wilde wrote: > > On 3/22/14, 1:38 PM, Mihael Hategan wrote: > > > My opinion is that this problem is NOT a language/mapper issue. It is an > > > issue of implementation: how do you get the information about files to a > > > place where it can be used. > > > > > > So I believe that whether we add a new mapper or make it work with > > > existing mappers, we still need to fix that other more complex problem. > > > This is the reason why I believe we shouldn't add anything to the > > > language. > > Mihael and I discussed this in a chat just now, and I think we are in > > fact *in* sync. > > So he's going to push forward on this. 
> > > > - Mike > > From wilde at anl.gov Wed Mar 26 13:42:39 2014 From: wilde at anl.gov (Michael Wilde) Date: Wed, 26 Mar 2014 13:42:39 -0500 Subject: [Swift-devel] Multiple output files In-Reply-To: <1395858271.26097.5.camel@echo> References: <532A305B.4000600@anl.gov> <532B9B8C.7000007@anl.gov> <3FD5A438-74FE-418C-822D-229125EA6CDD@uchicago.edu> <532C3B7C.30609@anl.gov> <532C410C.5080309@anl.gov> <532DBAF8.9030300@anl.gov> <1395513490.1382.10.camel@echo> <532DDFA8.3040909@anl.gov> <1395676018.18549.13.camel@echo> <1395858271.26097.5.camel@echo> Message-ID: <53331F9F.9040404@anl.gov> Excellent! The path-difference challenge sounds like the same thing we faced in CDM, revisited. I suspect we can find ways to smooth this out in a way that works naturally for users. - Mike On 3/26/14, 1:24 PM, Mihael Hategan wrote: > Almost there. > > It works for non-provider staging. > I'm working on provider staging now. The difficulty is that providers > must now support glob-pattern staging with a twist. The twist is that > local and remote path names are different. For example, without glob > patterns, a stageout could look like this: > > __root__/dir/a.txt -> /dir/a.txt > > With globs, something like this is possible: > > __root__/dir/a_????_b_????.txt -> /dir/a_????_b_????.txt > > In this case the code needs to recursively glob things and substitute > each glob group in the destination for the respective matching glob in > the source. > > Luckily there are only two providers that support staging at this point: > local and coasters. Unfortunately this has to be implemented twice: once > in Java for local, and once in Perl for coasters. > > Mihael > > On Mon, 2014-03-24 at 08:46 -0700, Mihael Hategan wrote: >> An update on this.
>> >> I'm still working on it, but here is the basic idea: >> >> - non-static mappers now support a method that, given a data type, >> returns a list of glob patterns that can be used to search for files >> that could be mapped by that mapper. The list (as opposed to one glob >> pattern) is necessary because there might be cases when you have: >> type s { file a; file[] b}; >> s[] x; >> Then s[] could match either s_????.a or s_????.???? >> - this (possibly empty) list gets sent to _swiftwrap >> - after the job is done, _swiftwrap creates a list of files matching >> those patterns >> - swift-int copies that list back and the files in it and uses the list >> to populate data in a fashion similar to what is done for input >> variables >> >> This is without provider staging. >> >> For provider staging, providers that support staging need to be modified >> to support staging out of files using glob patterns. There might be some >> complications there due to the local vs. remote path naming conventions. >> >> Mihael >> >> On Sat, 2014-03-22 at 14:08 -0500, Michael Wilde wrote: >>> On 3/22/14, 1:38 PM, Mihael Hategan wrote: >>>> My opinion is that this problem is NOT a language/mapper issue. It is an >>>> issue of implementation: how do you get the information about files to a >>>> place where it can be used. >>>> >>>> So I believe that whether we add a new mapper or make it work with >>>> existing mappers, we still need to fix that other more complex problem. >>>> This is the reason why I believe we shouldn't add anything to the >>>> language. >>> Mihael and I discussed this in a chat just now, and I think we are in >>> fact *in* sync. >>> So he's going to push forward on this. 
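In the basic idea above, _swiftwrap lists the files that match the (possibly empty) set of glob patterns the mapper produced, once the job has finished. The real _swiftwrap is a shell script, and its code is not shown in this thread. What follows is only a minimal Java sketch of that matching step. It assumes the patterns apply to file names directly inside the job directory. The names OutputCollector and collect are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public final class OutputCollector {
    /**
     * Returns the names of regular files directly inside jobDir that match at
     * least one glob pattern (e.g. "s_????.a", "s_????.????"). Each file is
     * reported once, in sorted order; an empty pattern list yields no files.
     */
    public static List<String> collect(Path jobDir, List<String> globs) throws IOException {
        List<PathMatcher> matchers = new ArrayList<>();
        for (String g : globs) {
            matchers.add(FileSystems.getDefault().getPathMatcher("glob:" + g));
        }
        TreeSet<String> found = new TreeSet<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(jobDir)) {
            for (Path p : entries) {
                if (!Files.isRegularFile(p)) {
                    continue; // skip subdirectories and other non-regular entries
                }
                Path name = p.getFileName();
                for (PathMatcher m : matchers) {
                    if (m.matches(name)) {
                        found.add(name.toString());
                        break; // one match is enough
                    }
                }
            }
        }
        return new ArrayList<>(found);
    }
}
```

The sketch collects the names into a TreeSet. This keeps the result order deterministic, and a file is listed only once even if two patterns ever overlap.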
>>> >>> - Mike >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > -- Michael Wilde Mathematics and Computer Science Computation Institute Argonne National Laboratory The University of Chicago From hategan at mcs.anl.gov Wed Mar 26 14:04:29 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 26 Mar 2014 12:04:29 -0700 Subject: [Swift-devel] Multiple output files In-Reply-To: <53331F9F.9040404@anl.gov> References: <532A305B.4000600@anl.gov> <532B9B8C.7000007@anl.gov> <3FD5A438-74FE-418C-822D-229125EA6CDD@uchicago.edu> <532C3B7C.30609@anl.gov> <532C410C.5080309@anl.gov> <532DBAF8.9030300@anl.gov> <1395513490.1382.10.camel@echo> <532DDFA8.3040909@anl.gov> <1395676018.18549.13.camel@echo> <1395858271.26097.5.camel@echo> <53331F9F.9040404@anl.gov> Message-ID: <1395860669.26428.2.camel@echo> Fortunately this would not be something that the users would see. But if they did, a*b -> c*d seems pretty intuitive to me. Maybe even more so than a(.*)b -> c\1d. It's the implementation of this that is more difficult. Mihael On Wed, 2014-03-26 at 13:42 -0500, Michael Wilde wrote: > Excellent! > > The path-difference challenge sounds like the same things we faced in > CDM, revisited. > > I suspect we can find ways to smooth this out in a way that works > naturally for users. > > - Mike > > > On 3/26/14, 1:24 PM, Mihael Hategan wrote: > > Almost there. > > > > It works for non-provider staging. > > I'm working on provider staging now. The difficulty is in that providers > > must now support glob-pattern staging with a twist. The twist is that > > local and remote path names are different. 
For example, without glob > > patterns, a stageout could look like this: > > > > __root__/dir/a.txt -> /dir/a.txt > > > > With globs, something like this is possible: > > > > __root__/dir/a_????_b_????.txt -> /dir/a_????_b_????.txt > > > > In this case the code needs to recursively glob things and substitute > > each glob group in the destination for the respective matching glob in > > the source. > > > > Luckily there are only two providers that support staging at this point: > > local and coasters. Unfortunately this has to be implemented twice: once > > in Java for local, and once in Perl for coasters. > > > > Mihael > > > > On Mon, 2014-03-24 at 08:46 -0700, Mihael Hategan wrote: > >> An update on this. > >> > >> I'm still working on it, but here is the basic idea: > >> > >> - non-static mappers now support a method that, given a data type, > >> returns a list of glob patterns that can be used to search for files > >> that could be mapped by that mapper. The list (as opposed to one glob > >> pattern) is necessary because there might be cases when you have: > >> type s { file a; file[] b}; > >> s[] x; > >> Then s[] could match either s_????.a or s_????.???? > >> - this (possibly empty) list gets sent to _swiftwrap > >> - after the job is done, _swiftwrap creates a list of files matching > >> those patterns > >> - swift-int copies that list back and the files in it and uses the list > >> to populate data in a fashion similar to what is done for input > >> variables > >> > >> This is without provider staging. > >> > >> For provider staging, providers that support staging need to be modified > >> to support staging out of files using glob patterns. There might be some > >> complications there due to the local vs. remote path naming conventions. > >> > >> Mihael > >> > >> On Sat, 2014-03-22 at 14:08 -0500, Michael Wilde wrote: > >>> On 3/22/14, 1:38 PM, Mihael Hategan wrote: > >>>> My opinion is that this problem is NOT a language/mapper issue. 
It is an > >>>> issue of implementation: how do you get the information about files to a > >>>> place where it can be used. > >>>> > >>>> So I believe that whether we add a new mapper or make it work with > >>>> existing mappers, we still need to fix that other more complex problem. > >>>> This is the reason why I believe we shouldn't add anything to the > >>>> language. > >>> Mihael and I discussed this in a chat just now, and I think we are in > >>> fact *in* sync. > >>> So he's going to push forward on this. > >>> > >>> - Mike > >> > >> _______________________________________________ > >> Swift-devel mailing list > >> Swift-devel at ci.uchicago.edu > >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > From hategan at mcs.anl.gov Thu Mar 27 21:11:13 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 27 Mar 2014 19:11:13 -0700 Subject: [Swift-devel] Multiple output files In-Reply-To: <1395860669.26428.2.camel@echo> References: <532A305B.4000600@anl.gov> <532B9B8C.7000007@anl.gov> <3FD5A438-74FE-418C-822D-229125EA6CDD@uchicago.edu> <532C3B7C.30609@anl.gov> <532C410C.5080309@anl.gov> <532DBAF8.9030300@anl.gov> <1395513490.1382.10.camel@echo> <532DDFA8.3040909@anl.gov> <1395676018.18549.13.camel@echo> <1395858271.26097.5.camel@echo> <53331F9F.9040404@anl.gov> <1395860669.26428.2.camel@echo> Message-ID: <1395972673.28720.13.camel@echo> I committed the changes. This only works for normal swift staging and provider staging. It can be made to work with wrapper staging, but I have not done so. As far as I can tell, there are only two mappers that can map arrays based on what's on the filesystem rather than what the mapper parameters are saying: FilesysMapper and SimpleMapper (well, there's also ConcurrentMapper and FMRIMapper, and, while they should be useable for collecting output data, it's unlikely that anyone will try to guess how to name files so that they match what those mappers are expecting). 
To use the feature, do the obvious:

file[] a ;
// OR
// file[] a ;
// OR
// file[] a ;

app (file[] oa) gen(int i) {
    gen i; // writes a bunch of files of the form foo????.out to outs/
}

a = gen(3);

I've also committed tests for this. Yadu, if you have some time, can you check why the check script doesn't find the output from the run? I don't remember how that worked.

There are probably going to be a few bugs, since a lot of how the swift staging data was handled has changed. So please test this as swift in general.

One other thing to note is that the globbing is only supported for the file names. For example: /a/b/??/c/*.txt won't work, but /a/b/x/c/*.txt will. However, that's if you use a pattern with the FilesysMapper. Complex structures mapped with SimpleMapper should work. For example, you should be able to do this:

type struct {
    file a;
    file[] b;
}

app (struct[] o) f(...) {}

It should figure out the ranges for both o[] and all of o[x].b[].

Mihael

On Wed, 2014-03-26 at 12:04 -0700, Mihael Hategan wrote: > Fortunately this would not be something that the users would see. > > But if they did, a*b -> c*d seems pretty intuitive to me. Maybe even > more so than > a(.*)b -> c\1d. It's the implementation of this that is more difficult. > > Mihael > > On Wed, 2014-03-26 at 13:42 -0500, Michael Wilde wrote: > > Excellent! > > > > The path-difference challenge sounds like the same things we faced in > > CDM, revisited. > > > > I suspect we can find ways to smooth this out in a way that works > > naturally for users. > > > > - Mike > > > > > > On 3/26/14, 1:24 PM, Mihael Hategan wrote: > > > Almost there. > > > > > > It works for non-provider staging. > > > I'm working on provider staging now. The difficulty is in that providers > > > must now support glob-pattern staging with a twist. The twist is that > > > local and remote path names are different.
For example, without glob > > > patterns, a stageout could look like this: > > > > > > __root__/dir/a.txt -> /dir/a.txt > > > > > > With globs, something like this is possible: > > > > > > __root__/dir/a_????_b_????.txt -> /dir/a_????_b_????.txt > > > > > > In this case the code needs to recursively glob things and substitute > > > each glob group in the destination for the respective matching glob in > > > the source. > > > > > > Luckily there are only two providers that support staging at this point: > > > local and coasters. Unfortunately this has to be implemented twice: once > > > in Java for local, and once in Perl for coasters. > > > > > > Mihael > > > > > > On Mon, 2014-03-24 at 08:46 -0700, Mihael Hategan wrote: > > >> An update on this. > > >> > > >> I'm still working on it, but here is the basic idea: > > >> > > >> - non-static mappers now support a method that, given a data type, > > >> returns a list of glob patterns that can be used to search for files > > >> that could be mapped by that mapper. The list (as opposed to one glob > > >> pattern) is necessary because there might be cases when you have: > > >> type s { file a; file[] b}; > > >> s[] x; > > >> Then s[] could match either s_????.a or s_????.???? > > >> - this (possibly empty) list gets sent to _swiftwrap > > >> - after the job is done, _swiftwrap creates a list of files matching > > >> those patterns > > >> - swift-int copies that list back and the files in it and uses the list > > >> to populate data in a fashion similar to what is done for input > > >> variables > > >> > > >> This is without provider staging. > > >> > > >> For provider staging, providers that support staging need to be modified > > >> to support staging out of files using glob patterns. There might be some > > >> complications there due to the local vs. remote path naming conventions. 
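Mihael describes glob stage-out this way: each glob group in the destination pattern is replaced by whatever the corresponding glob matched in the source, the same idea as a*b -> c*d. The Java sketch below shows that substitution for a single path. It is not the code in the local provider or in the coaster Perl worker. It rests on three assumptions: each maximal run of * and ? characters forms one group, wildcards never cross a '/', and the destination has no more groups than the source. GlobSubstitution and substitute are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class GlobSubstitution {
    /** Splits a glob into tokens that are either all-literal or all-wildcard (* and ?). */
    private static List<String> tokenize(String glob) {
        List<String> tokens = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inWild = false;
        for (char c : glob.toCharArray()) {
            boolean wild = (c == '*' || c == '?');
            if (cur.length() > 0 && wild != inWild) {
                tokens.add(cur.toString());
                cur.setLength(0);
            }
            inWild = wild;
            cur.append(c);
        }
        if (cur.length() > 0) {
            tokens.add(cur.toString());
        }
        return tokens;
    }

    private static boolean isWild(String token) {
        char c = token.charAt(0);
        return c == '*' || c == '?';
    }

    /** Converts a glob to a regex with one capturing group per wildcard run. */
    private static Pattern toRegex(String glob) {
        StringBuilder re = new StringBuilder();
        for (String t : tokenize(glob)) {
            if (isWild(t)) {
                re.append('(');
                for (char c : t.toCharArray()) {
                    re.append(c == '*' ? "[^/]*" : "[^/]"); // never match across '/'
                }
                re.append(')');
            } else {
                re.append(Pattern.quote(t));
            }
        }
        return Pattern.compile(re.toString());
    }

    /**
     * If srcPath matches srcGlob, returns dstGlob with its n-th wildcard run
     * replaced by the text the n-th wildcard run of srcGlob matched.
     */
    public static Optional<String> substitute(String srcGlob, String dstGlob, String srcPath) {
        Matcher m = toRegex(srcGlob).matcher(srcPath);
        if (!m.matches()) {
            return Optional.empty();
        }
        StringBuilder out = new StringBuilder();
        int group = 0;
        for (String t : tokenize(dstGlob)) {
            if (isWild(t)) {
                group++;
                if (group > m.groupCount()) {
                    throw new IllegalArgumentException("destination has more wildcard groups than source");
                }
                out.append(m.group(group));
            } else {
                out.append(t);
            }
        }
        return Optional.of(out.toString());
    }
}
```

For example, the source __root__/dir/a_0001_b_0002.txt with the patterns from the message above yields /dir/a_0001_b_0002.txt. Treating a run such as ???? as a single group is a design choice made for this sketch. An implementation could just as well pair individual wildcard characters.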
> > >> > > >> Mihael > > >> > > >> On Sat, 2014-03-22 at 14:08 -0500, Michael Wilde wrote: > > >>> On 3/22/14, 1:38 PM, Mihael Hategan wrote: > > >>>> My opinion is that this problem is NOT a language/mapper issue. It is an > > >>>> issue of implementation: how do you get the information about files to a > > >>>> place where it can be used. > > >>>> > > >>>> So I believe that whether we add a new mapper or make it work with > > >>>> existing mappers, we still need to fix that other more complex problem. > > >>>> This is the reason why I believe we shouldn't add anything to the > > >>>> language. > > >>> Mihael and I discussed this in a chat just now, and I think we are in > > >>> fact *in* sync. > > >>> So he's going to push forward on this. > > >>> > > >>> - Mike > > >> > > >> _______________________________________________ > > >> Swift-devel mailing list > > >> Swift-devel at ci.uchicago.edu > > >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From wilde at anl.gov Thu Mar 27 21:53:54 2014 From: wilde at anl.gov (Michael Wilde) Date: Thu, 27 Mar 2014 21:53:54 -0500 Subject: [Swift-devel] Multiple output files In-Reply-To: <1395972673.28720.13.camel@echo> References: <532A305B.4000600@anl.gov> <532B9B8C.7000007@anl.gov> <3FD5A438-74FE-418C-822D-229125EA6CDD@uchicago.edu> <532C3B7C.30609@anl.gov> <532C410C.5080309@anl.gov> <532DBAF8.9030300@anl.gov> <1395513490.1382.10.camel@echo> <532DDFA8.3040909@anl.gov> <1395676018.18549.13.camel@echo> <1395858271.26097.5.camel@echo> <53331F9F.9040404@anl.gov> <1395860669.26428.2.camel@echo> <1395972673.28720.13.camel@echo> Message-ID: <5334E442.2000101@anl.gov> Awesome work, Mihael! I'm eager to try it. I assume it's only committed to trunk? - Mike On 3/27/14, 9:11 PM, Mihael Hategan wrote: > I committed the changes.
> > This only works for normal swift staging and provider staging. It can be > made to work with wrapper staging, but I have not done so. > > As far as I can tell, there are only two mappers that can map arrays > based on what's on the filesystem rather than what the mapper parameters > are saying: FilesysMapper and SimpleMapper (well, there's also > ConcurrentMapper and FMRIMapper, and, while they should be useable for > collecting output data, it's unlikely that anyone will try to guess how > to name files so that they match what those mappers are expecting). > > To use the feature, do the obvious: > > file[] a ; > // OR > // file[] a ; > // OR > // file[] a ; > > app (file[] oa) gen(int i) { > gen i; // writes a bunch of files of the form foo????.out to > outs/ > } > > a = gen(3); > > I've also committed tests for this. Yadu, if you have some time, can you > check why the check script doesn't find the output from the run? I don't > remember how that worked. > > There are probably going to be a few bugs, since a lot of how the swift > staging data was handled has changed. So please test this as swift in > general. > > One other thing to note is that the globbing is only supported for the > file names. For example: /a/b/??/c/*.txt won't work, but /a/b/x/c/*.txt > will. However, that's if you use a pattern with the FilesysMapper. > Complex structures mapped with SimpleMapper should work. For example, > you should be able to do this: > > type struct { > file a; > file[] b; > } > > app (struct[] o) f(...) {} > > It should figure out the ranges for both o[] and all of o[x].b[]. > > Mihael > > On Wed, 2014-03-26 at 12:04 -0700, Mihael Hategan wrote: >> Fortunately this would not be something that the users would see. >> >> But if they did, a*b -> c*d seems pretty intuitive to me. Maybe even >> more so than >> a(.*)b -> c\1d. It's the implementation of this that is more difficult. >> >> Mihael >> >> On Wed, 2014-03-26 at 13:42 -0500, Michael Wilde wrote: >>> Excellent! 
>>> >>> The path-difference challenge sounds like the same things we faced in >>> CDM, revisited. >>> >>> I suspect we can find ways to smooth this out in a way that works >>> naturally for users. >>> >>> - Mike >>> >>> >>> On 3/26/14, 1:24 PM, Mihael Hategan wrote: >>>> Almost there. >>>> >>>> It works for non-provider staging. >>>> I'm working on provider staging now. The difficulty is in that providers >>>> must now support glob-pattern staging with a twist. The twist is that >>>> local and remote path names are different. For example, without glob >>>> patterns, a stageout could look like this: >>>> >>>> __root__/dir/a.txt -> /dir/a.txt >>>> >>>> With globs, something like this is possible: >>>> >>>> __root__/dir/a_????_b_????.txt -> /dir/a_????_b_????.txt >>>> >>>> In this case the code needs to recursively glob things and substitute >>>> each glob group in the destination for the respective matching glob in >>>> the source. >>>> >>>> Luckily there are only two providers that support staging at this point: >>>> local and coasters. Unfortunately this has to be implemented twice: once >>>> in Java for local, and once in Perl for coasters. >>>> >>>> Mihael >>>> >>>> On Mon, 2014-03-24 at 08:46 -0700, Mihael Hategan wrote: >>>>> An update on this. >>>>> >>>>> I'm still working on it, but here is the basic idea: >>>>> >>>>> - non-static mappers now support a method that, given a data type, >>>>> returns a list of glob patterns that can be used to search for files >>>>> that could be mapped by that mapper. The list (as opposed to one glob >>>>> pattern) is necessary because there might be cases when you have: >>>>> type s { file a; file[] b}; >>>>> s[] x; >>>>> Then s[] could match either s_????.a or s_????.???? 
>>>>> - this (possibly empty) list gets sent to _swiftwrap >>>>> - after the job is done, _swiftwrap creates a list of files matching >>>>> those patterns >>>>> - swift-int copies that list back and the files in it and uses the list >>>>> to populate data in a fashion similar to what is done for input >>>>> variables >>>>> >>>>> This is without provider staging. >>>>> >>>>> For provider staging, providers that support staging need to be modified >>>>> to support staging out of files using glob patterns. There might be some >>>>> complications there due to the local vs. remote path naming conventions. >>>>> >>>>> Mihael >>>>> >>>>> On Sat, 2014-03-22 at 14:08 -0500, Michael Wilde wrote: >>>>>> On 3/22/14, 1:38 PM, Mihael Hategan wrote: >>>>>>> My opinion is that this problem is NOT a language/mapper issue. It is an >>>>>>> issue of implementation: how do you get the information about files to a >>>>>>> place where it can be used. >>>>>>> >>>>>>> So I believe that whether we add a new mapper or make it work with >>>>>>> existing mappers, we still need to fix that other more complex problem. >>>>>>> This is the reason why I believe we shouldn't add anything to the >>>>>>> language. >>>>>> Mihael and I discussed this in a chat just now, and I think we are in >>>>>> fact *in* sync. >>>>>> So he's going to push forward on this. 
>>>>>> >>>>>> - Mike >>>>> _______________________________________________ >>>>> Swift-devel mailing list >>>>> Swift-devel at ci.uchicago.edu >>>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > -- Michael Wilde Mathematics and Computer Science Computation Institute Argonne National Laboratory The University of Chicago From hategan at mcs.anl.gov Thu Mar 27 22:05:38 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 27 Mar 2014 20:05:38 -0700 Subject: [Swift-devel] Multiple output files In-Reply-To: <5334E442.2000101@anl.gov> References: <532A305B.4000600@anl.gov> <532B9B8C.7000007@anl.gov> <3FD5A438-74FE-418C-822D-229125EA6CDD@uchicago.edu> <532C3B7C.30609@anl.gov> <532C410C.5080309@anl.gov> <532DBAF8.9030300@anl.gov> <1395513490.1382.10.camel@echo> <532DDFA8.3040909@anl.gov> <1395676018.18549.13.camel@echo> <1395858271.26097.5.camel@echo> <53331F9F.9040404@anl.gov> <1395860669.26428.2.camel@echo> <1395972673.28720.13.camel@echo> <5334E442.2000101@anl.gov> Message-ID: <1395975938.29826.1.camel@echo> On Thu, 2014-03-27 at 21:53 -0500, Michael Wilde wrote: > Awesome work, Mihael! Thanks. > > Im eager to try it. I assume its only committed to trunk? Yes. Due to the amount of changes I would suggest not rushing it into a release. Mihael From hategan at mcs.anl.gov Thu Mar 27 22:25:03 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 27 Mar 2014 20:25:03 -0700 Subject: [Swift-devel] .swiftx and .kml files Message-ID: <1395977103.29826.4.camel@echo> Hi, The .swiftx and .kml files get deleted by bin/swift after a run. They are maybe not very user friendly, but they are useful for debugging and swift skips re-compiling things if they are already there. Python does something similar. So I would vote against having them deleted. 
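Mihael notes that swift skips recompiling when the .swiftx and .kml files are already present. This thread does not say whether Swift also compares timestamps. The sketch below assumes a simple modification-time check, similar in spirit to the way Python decides whether a .pyc file is stale. CompileCache and isUpToDate are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public final class CompileCache {
    /**
     * Returns true if the compiled artifact (e.g. foo.swiftx) exists and is at
     * least as new as its source (e.g. foo.swift), so recompiling could be skipped.
     */
    public static boolean isUpToDate(Path source, Path compiled) throws IOException {
        if (!Files.exists(compiled)) {
            return false; // nothing cached yet
        }
        return Files.getLastModifiedTime(compiled)
                .compareTo(Files.getLastModifiedTime(source)) >= 0;
    }
}
```

A check based only on existence, as the message literally describes, would reuse a stale artifact after the script is edited. That is why the sketch compares times.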
Mihael From davidkelly at uchicago.edu Thu Mar 27 23:11:47 2014 From: davidkelly at uchicago.edu (David Kelly) Date: Thu, 27 Mar 2014 23:11:47 -0500 Subject: [Swift-devel] .swiftx and .kml files In-Reply-To: <1395977103.29826.4.camel@echo> References: <1395977103.29826.4.camel@echo> Message-ID: I think the problem with keeping *.xml and *.swiftx is that it clutters up your working directory with files that the majority of Swift users will never (directly) use. Is it possible to make the xml and swiftx files hidden? You'd get any possible compilation benefits without them getting in the way. Otherwise I'd say it makes sense to have a property to control this: If script.debug.keep=false (default) and a run directory is created, move them to the run directory at the end of the run so they are available for debugging If script.debug.keep=false and no run directory is created, remove them If script.debug.keep=true, keep them untouched in the current working directory On Thu, Mar 27, 2014 at 10:25 PM, Mihael Hategan wrote: > Hi, > > The .swiftx and .kml files get deleted by bin/swift after a run. They > are maybe not very user friendly, but they are useful for debugging and > swift skips re-compiling things if they are already there. Python does > something similar. > > So I would vote against having them deleted. > > Mihael > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yadudoc1729 at gmail.com Thu Mar 27 23:38:06 2014 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Thu, 27 Mar 2014 23:38:06 -0500 Subject: [Swift-devel] .swiftx and .kml files In-Reply-To: References: <1395977103.29826.4.camel@echo> Message-ID: We could make the kml and swiftx files dot files, or we could move them to the run directory instead of deleting them. 
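David's proposal comes down to a small decision table over two inputs: the proposed script.debug.keep property, and whether a run directory was created. A minimal Java sketch of that table follows. The property name comes from David's message. The enum and method names are hypothetical, and nothing here reflects code in bin/swift.

```java
public final class CompiledFilePolicy {
    /** What to do with the .swiftx/.kml files when a run finishes. */
    public enum Action { KEEP_IN_CWD, MOVE_TO_RUN_DIR, DELETE }

    /**
     * keepProperty mirrors the proposed script.debug.keep setting (default false);
     * runDirCreated says whether this run produced a run directory.
     */
    public static Action decide(boolean keepProperty, boolean runDirCreated) {
        if (keepProperty) {
            return Action.KEEP_IN_CWD; // leave them untouched in the working directory
        }
        // Default: keep them for debugging if there is a run directory, else remove them.
        return runDirCreated ? Action.MOVE_TO_RUN_DIR : Action.DELETE;
    }
}
```

Yadu's dot-file suggestion would be a separate change to the file names themselves rather than another row in this table.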
On Thu, Mar 27, 2014 at 11:11 PM, David Kelly wrote: > I think the problem with keeping *.xml and *.swiftx is that it clutters up > your working directory with files that the majority of Swift users will > never (directly) use. > > Is it possible to make the xml and swiftx files hidden? You'd get any > possible compilation benefits without them getting in the way. > > Otherwise I'd say it makes sense to have a property to control this: > > If script.debug.keep=false (default) and a run directory is created, move > them to the run directory at the end of the run so they are available for > debugging > If script.debug.keep=false and no run directory is created, remove them > If script.debug.keep=true, keep them untouched in the current working > directory > > > On Thu, Mar 27, 2014 at 10:25 PM, Mihael Hategan wrote: > >> Hi, >> >> The .swiftx and .kml files get deleted by bin/swift after a run. They >> are maybe not very user friendly, but they are useful for debugging and >> swift skips re-compiling things if they are already there. Python does >> something similar. >> >> So I would vote against having them deleted. >> >> Mihael >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel >> > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > -- Yadu Nand B -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hategan at mcs.anl.gov Fri Mar 28 01:15:43 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 27 Mar 2014 23:15:43 -0700 Subject: [Swift-devel] .swiftx and .kml files In-Reply-To: References: <1395977103.29826.4.camel@echo> Message-ID: <1395987343.6866.3.camel@echo> On Thu, 2014-03-27 at 23:11 -0500, David Kelly wrote: > I think the problem with keeping *.xml and *.swiftx is that it clutters up > your working directory with files that the majority of Swift users will > never (directly) use. The even larger majority of python users don't use the .pyc or .pyo files. I would agree that log files that accumulate in the same directory clutter things. But .swiftx and .kml (should probably rename that to .swiftk) don't multiply, so it isn't such a big problem. So I agree with you in principle. But I don't think the benefits outweigh the cost. Mihael From yadunand at uchicago.edu Mon Mar 31 12:08:04 2014 From: yadunand at uchicago.edu (Yadu Nand Babuji) Date: Mon, 31 Mar 2014 12:08:04 -0500 Subject: [Swift-devel] Multiple output files In-Reply-To: <64FDB7E3-003E-4E7A-9D7C-221072BA65B2@uchicago.edu> References: <532A305B.4000600@anl.gov> <532B9B8C.7000007@anl.gov> <3FD5A438-74FE-418C-822D-229125EA6CDD@uchicago.edu> <532C3B7C.30609@anl.gov> <7A6631C0-A44D-4C47-BBB1-4B057D9CC14F@uchicago.edu> <563F817A-8D87-4AE6-902A-595EF17BAEB0@uchicago.edu> <532CB34D.3040701@anl.gov> <532CC271.4040805@uchicago.edu> <5843DF77-D52D-454E-9EA0-B653DFE7DA6A@uchicago.edu> <64FDB7E3-003E-4E7A-9D7C-221072BA65B2@uchicago.edu> Message-ID: <5339A0F4.40206@uchicago.edu> Hi Jonathan, Your understanding is spot on. Swift 0.94.1 used a config mechanism which required 3 config files, a tc.file (apps), a config file (swift.properties) and a sites.file (sites.xml). This is supported on the newer versions of swift for backward compatibility. Swift 0.95, uses a simplified config mechanism that merges all three config files into one swift.properties file. 
The midway tutorial is based on this version. So, if you are using the default swift, you should be using this:

swift -tc.file apps -config swift.properties -sites.file sites.xml var_arrays.swift -dir=/scratch/midway/$USER/temp

If you find the new config mechanism preferable, please use Swift 0.95. Here's a sample swift.properties file:

site=midway-westmere
app.midway-westmere.sh=/bin/bash

site.midway-westmere {
    jobManager=slurm
    jobQueue=westmere
    tasksPerWorker=12
    initialScore=10000
    filesystem=local
    workdir=/scratch/midway/$USER/work
}

I ran this on midway with the following command. "swift.properties" will be picked up if it is named so; otherwise use the -properties key:

swift var_arrays.swift

Thanks,
Yadu

On 03/31/2014 10:54 AM, Jonathan Ozik wrote: > Yadu, > > In the example that you provided you have a sites.xml file that > includes a lot of specific options to run on Midway. In the Midway > tutorial, on the other hand > (http://swift-lang.org/tutorials/midway/tutorial.html) it looks like > the swift.properties file is given a minimal set of overriding > submission parameters and the rest of the information is filled in by > sensible default values. Based on this I have two questions: > 1. Is my understanding correct regarding the swift.properties > functionality or do I have to provide a sites.xml file as well? > 2. Is it okay to use the default swift version provided by Midway? > I've been using that for my local testing but would like to know if > anything that we've discussed is very different > between swift/0.94.1(default) and the swift/0.95-RC1 version that the > tutorial wants to load. > > Jonathan > > On Mar 21, 2014, at 6:25 PM, Jonathan Ozik > wrote: > >> Yadu, Mike, >> >> Thank you for this. Much appreciated.
>> >> Have a good weekend, >> >> Jonathan >> >> On Mar 21, 2014, at 5:51 PM, Yadu Nand B > > wrote: >> >>> Hi Jonathan, >>> >>> I have a tarball of an example which you can download from here -> >>> http://swift.rcc.uchicago.edu:8042/var_arrays.tar.gz >>> >>> You can run the example on Midway using the following command: >>> swift -tc.file apps -config swift.properties -sites.file sites.xml >>> var_arrays.swift -dir=/scratch/midway/$USER/temp >>> >>> Please ensure that the directory passed to -dir is on a shared >>> filesystem (/scratch on midway). >>> >>> The first foreach loop runs the script gen_n_files which creates a >>> random number of files within the specified directory >>> and echoes the names of the files to stdout. The stdout file is read >>> by swift using readData and passed to the array_mapper >>> to be mapped to an array, which is placed in an array of arrays. The >>> second foreach loop goes over each array in the array of >>> arrays and simply sums up all integers present in each array of files. >>> >>> Slightly contrived example, but I hope you get the method used. Let >>> me know if you want this example to be expanded >>> in any way. >>> >>> Thanks, >>> Yadu >>> >>> >>> On 03/21/2014 04:46 PM, Michael Wilde wrote: >>>> Jonathan, >>>> >>>> @text is behaving as expected here. The rationale is as follows. >>>> This line: >>>> >>>> file text >>> file="/Users/jozik/temp/instance_4/customOut7.dat">; >>>> >>>> ...associates the variable "text" with the file >>>> /Users/jozik/temp/instance_4/customOut7.dat. >>>> >>>> If that file was passed to (or returned from) an app() call, then >>>> @filename(text) (for which @text is a shorthand notation) would be >>>> the name by which the app should refer to the file, relative to the >>>> app's job directory. The leading "/" is removed because the file, >>>> if it's an input to the app, would get linked to the job dir with >>>> all of its directory components as specified in the mapping.
I.e., it would >>>> be linked to ./Users/jozik/temp/instance_4/customOut7.dat >>>> If that file was an output from the app, Swift would expect the app >>>> to create a file by this name below the job dir. >>>> >>>> Again, these semantics date back to the origins of Swift, when >>>> every job was essentially expected to be executed on a remote grid >>>> node under Globus. >>>> >>>> Yadu is working on a complete example of the multiple-file return >>>> case right now. >>>> >>>> - Mike >>>> >>>> On 3/21/14, 1:59 PM, Jonathan Ozik wrote: >>>>> Mike, >>>>> >>>>> It looks like I misunderstood your workaround initially. Now I'm >>>>> having an issue with specifying absolute paths. >>>>> For example: >>>>> file text >>>> file="/Users/jozik/temp/instance_4/customOut7.dat">; >>>>> tracef("The file name is: %s\n", @text); >>>>> >>>>> yields: >>>>> The file name is: Users/jozik/temp/instance_4/customOut7.dat >>>>> (the leading forward slash is missing) >>>>> >>>>> The idea here is that the output data is being placed in a well-known >>>>> location and retrieved via the output file location >>>>> aggregator. This is a pared-down example where I'm looking to see >>>>> what each line from the output file location aggregator would be >>>>> interpreted as in swift. >>>>> >>>>> Jonathan >>>>> >>>>> On Mar 21, 2014, at 11:51 AM, Jonathan Ozik >>>> > wrote: >>>>>> >>>>>> Mike, >>>>>> >>>>>> Thank you again for the detailed responses. I'm getting a better >>>>>> handle on what can be done and am trying to implement the >>>>>> workaround you suggested. >>>>>> Speaking of which, is the reason that a shared directory location >>>>>> needs to be utilized because readData() does not know to look in >>>>>> the "app task directory" and defaults to the swift script launch >>>>>> directory?
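Mike's explanation of the missing leading slash can be restated as a path rule. The name an app sees for a mapped file is the mapped path with its leading '/' removed, and that relative name is resolved under the app's job directory. The sketch below encodes the rule as it is described in this thread. It is an illustration, not Swift's @filename implementation, and JobPaths, jobRelativeName and locateInJobDir are hypothetical names.

```java
import java.nio.file.Path;

public final class JobPaths {
    /** Drops leading '/' characters so an absolute mapped path becomes job-dir relative. */
    public static String jobRelativeName(String mappedPath) {
        int i = 0;
        while (i < mappedPath.length() && mappedPath.charAt(i) == '/') {
            i++;
        }
        return mappedPath.substring(i);
    }

    /** Where the file would live inside a given job directory (e.g. when an input is linked there). */
    public static Path locateInJobDir(Path jobDir, String mappedPath) {
        return jobDir.resolve(jobRelativeName(mappedPath));
    }
}
```

With the mapping from Jonathan's example, jobRelativeName("/Users/jozik/temp/instance_4/customOut7.dat") returns Users/jozik/temp/instance_4/customOut7.dat, which is what his tracef call printed.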
>>>>>> >>>>>> Thanks again for the guidance, >>>>>> >>>>>> Jonathan >>>>>> >>>>>> On Mar 21, 2014, at 8:15 AM, Michael Wilde >>>>> > wrote: >>>>>> >>>>>>> Hi Jonathan, >>>>>>> >>>>>>> Thanks for bearing with us on this. I can see clearly where our >>>>>>> documentation falls short of explaining this. >>>>>>> >>>>>>> I've got to work on some deadlines today, but I'll see if someone >>>>>>> else on the team can post a clarification with some examples. >>>>>>> >>>>>>> A brief response, below. >>>>>>> >>>>>>> On 3/20/14, 9:51 PM, Jonathan Ozik wrote: >>>>>>>> Hi Mike, >>>>>>>> >>>>>>>> I've included my comments below. Also, please let me know if I should sign up for any mailing list in addition to swift-user, which I'm already signed up for. >>>>>>> No, just swift-user. Ideally we should have started this >>>>>>> discussion there. I steered you to swift-devel because I thought >>>>>>> the issue was one of a new feature requirement, but I see it's >>>>>>> also one of documentation and training. >>>>>>> >>>>>>> ... >>>>>>>> An app *can* return multiple files - even an array of files - >>>>>>>> but not an array of files whose names and count are not known >>>>>>>> before the app is launched. >>>>>>>> This functionality would be exactly what I'd be looking for. If an app can return multiple files, I'd just need to know where and how I'd have to specify the patterns for those files (or do I need to know the exact names?) >>>>>>> Yes, that's the problem: you would need to know the exact names, >>>>>>> in the Swift script, before the app is called, so that you can >>>>>>> *map* all output file variables to the names that the app will >>>>>>> be *expected* to produce. I.e., currently one needs a priori >>>>>>> knowledge of all output file names, and you need to map variables >>>>>>> (which can include array and structure members) to those names. >>>>>>>> and how to make it so that the app returns those files.
I've looked through the user guide and a few of the tutorials but I don't believe I've seen any example that fits this general functionality. >>>>>>>> >>>>>>>> For example, if I have an executable "repast" that outputs files with patterns specified by globs, how do I make use of that knowledge to pick up those files and return them from an app? >>>>>>> That's the current language deficiency: you cannot. We will >>>>>>> explain later today in more detail. >>>>>>> >>>>>>> The app is expected to produce all output files that any of its >>>>>>> output variables (or arrays or structures) are mapped to. >>>>>>> >>>>>>> For example, you can map an *output* array to the file names >>>>>>> f1.out, f2.out, and f3.out. Then the app will be expected to >>>>>>> produce those files. If it doesn't, Swift will raise a runtime >>>>>>> error. So if you know a priori (before the app is called) from >>>>>>> context or from input argument values that these 3 files will >>>>>>> be produced, you can use one of the array mappers or the "ext" >>>>>>> mapper to declare this expectation. >>>>>>> >>>>>>> The best way to get past this obstacle (while we develop the >>>>>>> desired capability) is as follows. If you are running on a >>>>>>> single machine, you can write a wrapper shell script around the >>>>>>> repast app that runs repast and then returns a single file that >>>>>>> contains a *list* of its output files. But you need to place >>>>>>> these output files in a known shared directory, not in the >>>>>>> current working directory in which Swift will run the repast app >>>>>>> (called the "job directory" at the moment -- soon to be renamed >>>>>>> the "app task directory"). Then you do a readData() on this >>>>>>> returned file to create an array of strings, and use that array >>>>>>> with the "array" mapper (explained in the User Guide). >>>>>>> >>>>>>> We'll post a working example of this to you as soon as possible - >>>>>>> today, if time permits.
As well as an example of the proposed >>>>>>> new feature. >>>>>>> >>>>>>> - Mike >>>>>>>> I see, the one file per app invocation is probably not going to work for our use case, but I still think that I'm missing some crucial understandings of what conditions are required to point to and use generated or existing files within swift. That is, other than "stdout=@filename", how do I indicate that I'd like to return a file, say "myoutput.dat", from an app invocation? Apologies if this is too simple of a question but, like I said, I feel like I'm missing some crucial information... >>>>>>>> >>>>>>>>> What we do not have - but have long known that we need - is the ability to declare that all the files created by a *single* app invocation which match a specified pattern be returned as an array. >>>>>>>>> >>>>>>>>> Mihael: is this something you could implement in the near future - after we agree on the semantics? >>>>>>>>> >>>>>>>>> Justin, Tim, do you want to comment on this from a Swift/T perspective? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> - Mike >>>>>>>>>> I think I'm stumbling on the notion of what output, in terms of files, can be produced using an "app" element in a swift script. Most of the examples have stdout (or stderr) pointing to a file, but I'm not sure I've found an example where side effects (e.g., files produced by a process) can be retrieved. Is this possible? >>>>>>>>>> >>>>>>>>>> Jonathan >>>>>>>>>> >>>>>>>>>> On Mar 19, 2014, at 7:03 PM, Michael Wilde wrote: >>>>>>>>>> >>>>>>>>>>> Hi Jonathan, >>>>>>>>>>> >>>>>>>>>>> You are thinking about this exactly right, and have (quickly) hit one of our programmability weaknesses. >>>>>>>>>>> >>>>>>>>>>> At the moment, there is no good way to do this. We have had much discussion on it, though, and plan to create such "collect files of this pattern into an array" semantics. 
>>>>>>>>>>> >>>>>>>>>>> What we do for now is one of these two work-arounds: >>>>>>>>>>> >>>>>>>>>>> - write a single tarfile as output (in a wrapper script for the app, which tars files of the appropriate pattern, like "*.simout") >>>>>>>>>>> >>>>>>>>>>> - write the files directly to a specific (shared) directory instead of the Swift "app task directory" (called "job directory" in the current User Guide). Then return a single file with a list of these file names, and map that using an array_mapper if the results need to be passed to a next stage. >>>>>>>>>>> >>>>>>>>>>> We should see if we can get a prototype of such a feature to you in short order. But hopefully just to get things working, one of the above methods will suffice for you, for now. >>>>>>>>>>> >>>>>>>>>>> I'm cc'ing swift-devel to see what we can do. >>>>>>>>>>> >>>>>>>>>>> Thanks for reminding us of this fairly common need! >>>>>>>>>>> >>>>>>>>>>> - Mike >>>>>>>>>>> >>>>>>>>>>> On 3/19/14, 5:57 PM, Jonathan Ozik wrote: >>>>>>>>>>>> Mike, >>>>>>>>>>>> >>>>>>>>>>>> Perhaps I'm not thinking about this correctly, but I'm trying to figure out how to collect output files that are generated by a simulation run. The scenario is that the Repast executable is run and, after it is run, there are files that are output into some location. There can be an arbitrary number of such output files per simulation run but, if necessary, it would be possible to pre-specify which files to look for. Is there a simple way to indicate to swift that it should collect all the files matching a particular pattern within a directory? It looks like the mappers might work here but as far as I understand the mappers are defined prior to calls to executables. Again, I might just not be thinking about this in a "swift" enough way.
>>>>>>>>>>>> >>>>>>>>>>>> Jonathan >>>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Michael Wilde >>>>>>>>>>> Mathematics and Computer Science Computation Institute >>>>>>>>>>> Argonne National Laboratory The University of Chicago >>>>>>>>>>> >>>>>>>>> -- >>>>>>>>> Michael Wilde >>>>>>>>> Mathematics and Computer Science Computation Institute >>>>>>>>> Argonne National Laboratory The University of Chicago >>>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Michael Wilde >>>>>>> Mathematics and Computer Science Computation Institute >>>>>>> Argonne National Laboratory The University of Chicago >>>>>> >>>>> >>>> >>>> -- >>>> Michael Wilde >>>> Mathematics and Computer Science Computation Institute >>>> Argonne National Laboratory The University of Chicago >>>> >>>> >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel >>> >> >
From jozik at uchicago.edu Fri Mar 21 18:25:29 2014 From: jozik at uchicago.edu (Jonathan Ozik) Date: Fri, 21 Mar 2014 23:25:29 -0000 Subject: [Swift-devel] Multiple output files In-Reply-To: <532CC271.4040805@uchicago.edu> References: <532A305B.4000600@anl.gov> <532B9B8C.7000007@anl.gov> <3FD5A438-74FE-418C-822D-229125EA6CDD@uchicago.edu> <532C3B7C.30609@anl.gov> <7A6631C0-A44D-4C47-BBB1-4B057D9CC14F@uchicago.edu> <563F817A-8D87-4AE6-902A-595EF17BAEB0@uchicago.edu> <532CB34D.3040701@anl.gov> <532CC271.4040805@uchicago.edu> Message-ID: <5843DF77-D52D-454E-9EA0-B653DFE7DA6A@uchicago.edu> Yadu, Mike, Thank you for this. Much appreciated.
Have a good weekend, Jonathan On Mar 21, 2014, at 5:51 PM, Yadu Nand B wrote: > Hi Jonathan, > > I have a tarball of an example which you can download from here -> > http://swift.rcc.uchicago.edu:8042/var_arrays.tar.gz > > You can run the example on Midway using the following command: > swift -tc.file apps -config swift.properties -sites.file sites.xml var_arrays.swift -dir=/scratch/midway/$USER/temp > > Please ensure that the directory passed to -dir is on a shared filesystem (/scratch on Midway). > > The first foreach loop runs the script gen_n_files which creates a random number of files within the specified directory > and echoes the names of the files to stdout. The stdout file is read by swift using readData and passed to the array_mapper > to be mapped to an array, which is placed in an array of arrays. The second foreach loop goes over each array in the array of > arrays and simply sums up all integers present in each array of files. > > Slightly contrived example, but I hope you get the method used. Let me know if you want this example to be expanded > in any way. > > Thanks, > Yadu > > > On 03/21/2014 04:46 PM, Michael Wilde wrote: >> Jonathan, >> >> @text is behaving as expected here. The rationale is as follows. This line: >> >> file text ; >> >> ...associates the variable "text" with the file /Users/jozik/temp/instance_4/customOut7.dat. >> >> If that file was passed to (or returned from) an app() call, then @filename(text) (for which @text is a shorthand notation) would be the name by which the app should refer to the file, relative to the app's job directory. The leading "/" is removed because the file, if it's an input to the app, would get linked to the job dir with all of its directory components as specified in the mapping. I.e., it would be linked to ./Users/jozik/temp/instance_4/customOut7.dat
>> >> Again, these semantics date back to the origins of Swift, when every job was essentially expected to be executed on a remote grid node under Globus. >> >> Yadu is working on a complete example of the multiple-file return case right now. >> >> - Mike >> >> On 3/21/14, 1:59 PM, Jonathan Ozik wrote: >>> Mike, >>> >>> It looks like I misunderstood your workaround initially. Now I'm having an issue with specifying absolute paths. >>> For example: >>> file text ; >>> tracef("The file name is: %s\n", at text); >>> >>> yields: >>> The file name is: Users/jozik/temp/instance_4/customOut7.dat >>> (the leading forward slash is missing) >>> >>> The idea here is that the output data is being placed in a well known location and retrieved via the output file location aggregator. This is a pared down example where I'm looking to see what each line from the output file location aggregator would be interpreted as in swift. >>> >>> Jonathan >>> >>> On Mar 21, 2014, at 11:51 AM, Jonathan Ozik wrote: >>> >>>> Mike, >>>> >>>> Thank you again for the detailed responses. I'm getting a better handle on what can be done and am trying to implement the workaround you suggested. >>>> Speaking of which, is the reason that a shared directory location needs to be utilized because readData() does not know to look in the "app task directory" and defaults to the swift script launch directory? >>>> >>>> Thanks again for the guidance, >>>> >>>> Jonathan >>>> >>>> On Mar 21, 2014, at 8:15 AM, Michael Wilde wrote: >>>> >>>>> Hi Jonathan, >>>>> >>>>> Thanks for bearing with us on this. I can see clearly where our documentation is falling short of explaining this clearly. >>>>> >>>>> Ive got to work on some deadlines today, but I'll see if someone else on the team can post a clarification with some examples. >>>>> >>>>> A brief response, below. >>>>> >>>>> On 3/20/14, 9:51 PM, Jonathan Ozik wrote: >>>>>> Hi Mike, >>>>>> >>>>>> I've included my comments below. 
Also, please let me know if I should sign up for any mailing list in addition to swift-user, which I'm already signed up on. >>>>> No, just swift-user. Ideally we should have started this discussion there. I steered you to swift-devel because I thought the issue was one of a new feature requirement, but I see it's also one of documentation and training. >>>>> >>>>> ... >>>>>> An app *can* return multiple files - even an array of files - but not an array of files whose names and count are not known before the app is launched. >>>>>> This functionality would be exactly what I'd be looking for. If an app can return multiple files I'd just need to know where and how I'd have to specify the patterns for those files (or do I need to know the exact names?) >>>>> Yes, that's the problem: you would need to know the exact names, in the Swift script, before the app is called, so that you can *map* all output file variables to the names that the app will be *expected* to produce. I.e., currently one needs a priori knowledge of all output file names, and you need to map variables (which can include array and structure members) to those names. >>>>>> and how to make it so that the app returns those files. I've looked through the user guide and a few of the tutorials but I don't believe I've seen any example that fits this general functionality. >>>>>> >>>>>> For example, if I have an executable "repast" that outputs files with patterns specified by globs, how do I make use of that knowledge to pick up those files and return them from an app? >>>>> That's the current language deficiency: you cannot. We will explain later today in more detail. >>>>> >>>>> The app is expected to produce all output files that any of its output variables (or arrays or structures) are mapped to. >>>>> >>>>> For example, you can map an *output* array to the file names f1.out, f2.out, and f3.out. Then the app will be expected to produce those files. If it doesn't, Swift will raise a runtime error.
So if you know a priori (before the app is called) from context or from input argument values that these 3 files will be produced, you can use one of the array mappers or the "ext" mapper to declare this expectation. >>>>> >>>>> The best way to get past this obstacle (while we develop the desired capability) is as follows. If you are running on a single machine, you can write a wrapper shell script around the repast app that runs repast and then returns a single file that contains a *list* of its output files. But you need to place these output files in a known shared directory, not in the current working directory in which Swift will run the repast app (called the "job directory" at the moment -- soon to be renamed the "app task directory"). Then you do a readData() on this returned file to create an array of strings, and use that array with the "array" mapper (explained in the User Guide). >>>>> >>>>> We'll post a working example of this to you as soon as possible - today, if time permits. As well as an example of the proposed new feature. >>>>> >>>>> - Mike >>>>>> I see, the one file per app invocation is probably not going to work for our use case, but I still think that I'm missing some crucial understandings of what conditions are required to point to and use generated or existing files within swift. That is, other than "stdout=@filename", how do I indicate that I'd like to return a file, say "myoutput.dat", from an app invocation? Apologies if this is too simple of a question but, like I said, I feel like I'm missing some crucial information... >>>>>> >>>>>>> What we do not have - but have long known that we need - is the ability to declare that all the files created by a *single* app invocation which match a specified pattern be returned as an array. >>>>>>> >>>>>>> Mihael: is this something you could implement in the near future - after we agree on the semantics? >>>>>>> >>>>>>> Justin, Tim, do you want to comment on this from a Swift/T perspective?
>>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> - Mike >>>>>>>> I think I'm stumbling on the notion of what output, in terms of files, can be produced using an "app" element in a swift script. Most of the examples have stdout (or stderr) pointing to a file, but I'm not sure I've found an example where side effects (e.g., files produced by a process) can be retrieved. Is this possible? >>>>>>>> >>>>>>>> Jonathan >>>>>>>> >>>>>>>> On Mar 19, 2014, at 7:03 PM, Michael Wilde wrote: >>>>>>>> >>>>>>>>> Hi Jonathan, >>>>>>>>> >>>>>>>>> You are thinking about this exactly right, and have (quickly) hit one of our programmability weaknesses. >>>>>>>>> >>>>>>>>> At the moment, there is no good way to do this. We have had much discussion on it, though, and plan to create such "collect files of this pattern into an array" semantics. >>>>>>>>> >>>>>>>>> What we do for now is one of these two work-arounds: >>>>>>>>> >>>>>>>>> - write a single tarfile as output (in a wrapper script for the app, which tars files of the appropriate pattern, like "*.simout") >>>>>>>>> >>>>>>>>> - write the files directly to a specific (shared) directory instead of the Swift "app task directory" (called "job directory" in the current User Guide). Then return a single file with a list of these file names, and map that using an array_mapper if the results need to be passed to a next stage. >>>>>>>>> >>>>>>>>> We should see if we can get a prototype of such a feature to you in short order. But hopefully just to get things working, one of the above methods will suffice for you, for now. >>>>>>>>> >>>>>>>>> I'm cc'ing swift-devel to see what we can do. >>>>>>>>> >>>>>>>>> Thanks for reminding us of this fairly common need! >>>>>>>>> >>>>>>>>> - Mike >>>>>>>>> >>>>>>>>> On 3/19/14, 5:57 PM, Jonathan Ozik wrote: >>>>>>>>>> Mike, >>>>>>>>>> >>>>>>>>>> Perhaps I'm not thinking about this correctly, but I'm trying to figure out how to collect output files that are generated by a simulation run.
The scenario is that the Repast executable is run and, after it is run, there are files that are output into some location. There can be an arbitrary number of such output files per simulation run but, if necessary, it would be possible to pre-specify which files to look for. Is there a simple way to indicate to swift that it should collect all the files matching a particular pattern within a directory? It looks like the mappers might work here but as far as I understand the mappers are defined prior to calls to executables. Again, I might just not be thinking about this in a "swift" enough way. >>>>>>>>>> >>>>>>>>>> Jonathan >>>>>>>>>> >>>>>>>>> -- >>>>>>>>> Michael Wilde >>>>>>>>> Mathematics and Computer Science Computation Institute >>>>>>>>> Argonne National Laboratory The University of Chicago >>>>>>>>> >>>>>>> -- >>>>>>> Michael Wilde >>>>>>> Mathematics and Computer Science Computation Institute >>>>>>> Argonne National Laboratory The University of Chicago >>>>>>> >>>>> >>>>> -- >>>>> Michael Wilde >>>>> Mathematics and Computer Science Computation Institute >>>>> Argonne National Laboratory The University of Chicago >>>> >>> >> >> -- >> Michael Wilde >> Mathematics and Computer Science Computation Institute >> Argonne National Laboratory The University of Chicago >> >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel >
From jozik at uchicago.edu Mon Mar 31 10:54:42 2014 From: jozik at uchicago.edu (Jonathan Ozik) Date: Mon, 31 Mar 2014 10:54:42 -0500 Subject: [Swift-devel] Multiple output files In-Reply-To: <5843DF77-D52D-454E-9EA0-B653DFE7DA6A@uchicago.edu> References: <532A305B.4000600@anl.gov> <532B9B8C.7000007@anl.gov> <3FD5A438-74FE-418C-822D-229125EA6CDD@uchicago.edu> <532C3B7C.30609@anl.gov> <7A6631C0-A44D-4C47-BBB1-4B057D9CC14F@uchicago.edu> <563F817A-8D87-4AE6-902A-595EF17BAEB0@uchicago.edu> <532CB34D.3040701@anl.gov> <532CC271.4040805@uchicago.edu> <5843DF77-D52D-454E-9EA0-B653DFE7DA6A@uchicago.edu> Message-ID: <64FDB7E3-003E-4E7A-9D7C-221072BA65B2@uchicago.edu> Yadu, In the example that you provided you have a sites.xml file that includes a lot of specific options to run on Midway. In the Midway tutorial, on the other hand (http://swift-lang.org/tutorials/midway/tutorial.html) it looks like the swift.properties file is given a minimal set of overriding submission parameters and the rest of the information is filled in by sensible default values. Based on this I have two questions: 1. Is my understanding correct regarding the swift.properties functionality or do I have to provide a sites.xml file as well? 2. Is it okay to use the default swift version provided by Midway? I've been using that for my local testing but would like to know if anything that we've discussed is very different between swift/0.94.1(default) and the swift/0.95-RC1 version that the tutorial wants to load. Jonathan On Mar 21, 2014, at 6:25 PM, Jonathan Ozik wrote: > Yadu, Mike, > > Thank you for this. Much appreciated.
> > Have a good weekend, > > Jonathan > > On Mar 21, 2014, at 5:51 PM, Yadu Nand B wrote: > >> Hi Jonathan, >> >> I have a tarball of an example which you can download from here -> >> http://swift.rcc.uchicago.edu:8042/var_arrays.tar.gz >> >> You can run the example on Midway using the following command: >> swift -tc.file apps -config swift.properties -sites.file sites.xml var_arrays.swift -dir=/scratch/midway/$USER/temp >> >> Please ensure that the directory passed to -dir is on a shared filesystem (/scratch on Midway). >> >> The first foreach loop runs the script gen_n_files which creates a random number of files within the specified directory >> and echoes the names of the files to stdout. The stdout file is read by swift using readData and passed to the array_mapper >> to be mapped to an array, which is placed in an array of arrays. The second foreach loop goes over each array in the array of >> arrays and simply sums up all integers present in each array of files. >> >> Slightly contrived example, but I hope you get the method used. Let me know if you want this example to be expanded >> in any way. >> >> Thanks, >> Yadu >> >> >> On 03/21/2014 04:46 PM, Michael Wilde wrote: >>> Jonathan, >>> >>> @text is behaving as expected here. The rationale is as follows. This line: >>> >>> file text ; >>> >>> ...associates the variable "text" with the file /Users/jozik/temp/instance_4/customOut7.dat. >>> >>> If that file was passed to (or returned from) an app() call, then @filename(text) (for which @text is a shorthand notation) would be the name by which the app should refer to the file, relative to the app's job directory. The leading "/" is removed because the file, if it's an input to the app, would get linked to the job dir with all of its directory components as specified in the mapping.
I.e., it would be linked to ./Users/jozik/temp/instance_4/customOut7.dat >>> If that file was an output from the app, Swift would expect the app to create a file by this name below the job dir. >>> >>> Again, these semantics date back to the origins of Swift, when every job was essentially expected to be executed on a remote grid node under Globus. >>> >>> Yadu is working on a complete example of the multiple-file return case right now. >>> >>> - Mike >>> >>> On 3/21/14, 1:59 PM, Jonathan Ozik wrote: >>>> Mike, >>>> >>>> It looks like I misunderstood your workaround initially. Now I'm having an issue with specifying absolute paths. >>>> For example: >>>> file text ; >>>> tracef("The file name is: %s\n", @text); >>>> >>>> yields: >>>> The file name is: Users/jozik/temp/instance_4/customOut7.dat >>>> (the leading forward slash is missing) >>>> >>>> The idea here is that the output data is being placed in a well known location and retrieved via the output file location aggregator. This is a pared-down example where I'm looking to see what each line from the output file location aggregator would be interpreted as in swift. >>>> >>>> Jonathan >>>> >>>> On Mar 21, 2014, at 11:51 AM, Jonathan Ozik wrote: >>>> >>>>> Mike, >>>>> >>>>> Thank you again for the detailed responses. I'm getting a better handle on what can be done and am trying to implement the workaround you suggested. >>>>> Speaking of which, is the reason that a shared directory location needs to be utilized because readData() does not know to look in the "app task directory" and defaults to the swift script launch directory? >>>>> >>>>> Thanks again for the guidance, >>>>> >>>>> Jonathan >>>>> >>>>> On Mar 21, 2014, at 8:15 AM, Michael Wilde wrote: >>>>> >>>>>> Hi Jonathan, >>>>>> >>>>>> Thanks for bearing with us on this. I can see clearly where our documentation is falling short of explaining this clearly.
>>>>>> >>>>>> I've got to work on some deadlines today, but I'll see if someone else on the team can post a clarification with some examples. >>>>>> >>>>>> A brief response, below. >>>>>> >>>>>> On 3/20/14, 9:51 PM, Jonathan Ozik wrote: >>>>>>> Hi Mike, >>>>>>> >>>>>>> I've included my comments below. Also, please let me know if I should sign up for any mailing list in addition to swift-user, which I'm already signed up on. >>>>>> No, just swift-user. Ideally we should have started this discussion there. I steered you to swift-devel because I thought the issue was one of a new feature requirement, but I see it's also one of documentation and training. >>>>>> >>>>>> ... >>>>>>> An app *can* return multiple files - even an array of files - but not an array of files whose names and count are not known before the app is launched. >>>>>>> This functionality would be exactly what I'd be looking for. If an app can return multiple files I'd just need to know where and how I'd have to specify the patterns for those files (or do I need to know the exact names?) >>>>>> Yes, that's the problem: you would need to know the exact names, in the Swift script, before the app is called, so that you can *map* all output file variables to the names that the app will be *expected* to produce. I.e., currently one needs a priori knowledge of all output file names, and you need to map variables (which can include array and structure members) to those names. >>>>>>> and how to make it so that the app returns those files. I've looked through the user guide and a few of the tutorials but I don't believe I've seen any example that fits this general functionality. >>>>>>> >>>>>>> For example, if I have an executable "repast" that outputs files with patterns specified by globs, how do I make use of that knowledge to pick up those files and return them from an app? >>>>>> That's the current language deficiency: you cannot. We will explain later today in more detail.
>>>>>> >>>>>> The app is expected to produce all output files that any of its output variables (or arrays or structures) are mapped to. >>>>>> >>>>>> For example, you can map an *output* array to the file names f1.out, f2.out, and f3.out. Then the app will be expected to produce those files. If it doesn't, Swift will raise a runtime error. So if you know a priori (before the app is called) from context or from input argument values that these 3 files will be produced, you can use one of the array mappers or the "ext" mapper to declare this expectation. >>>>>> >>>>>> The best way to get past this obstacle (while we develop the desired capability) is as follows. If you are running on a single machine, you can write a wrapper shell script around the repast app that runs repast and then returns a single file that contains a *list* of its output files. But you need to place these output files in a known shared directory, not in the current working directory in which Swift will run the repast app (called the "job directory" at the moment -- soon to be renamed the "app task directory"). Then you do a readData() on this returned file to create an array of strings, and use that array with the "array" mapper (explained in the User Guide). >>>>>> >>>>>> We'll post a working example of this to you as soon as possible - today, if time permits. As well as an example of the proposed new feature. >>>>>> >>>>>> - Mike >>>>>>> I see, the one file per app invocation is probably not going to work for our use case, but I still think that I'm missing some crucial understandings of what conditions are required to point to and use generated or existing files within swift. That is, other than "stdout=@filename", how do I indicate that I'd like to return a file, say "myoutput.dat", from an app invocation? Apologies if this is too simple of a question but, like I said, I feel like I'm missing some crucial information...
>>>>>>> >>>>>>>> What we do not have - but have long known that we need - is the ability to declare that all the files created by a *single* app invocation which match a specified pattern be returned as an array. >>>>>>>> >>>>>>>> Mihael: is this something you could implement in the near future - after we agree on the semantics? >>>>>>>> >>>>>>>> Justin, Tim, do you want to comment on this from a Swift/T perspective? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> - Mike >>>>>>>>> I think I'm stumbling on the notion of what output, in terms of files, can be produced using an "app" element in a swift script. Most of the examples have stdout (or stderr) pointing to a file, but I'm not sure I've found an example where side effects (e.g., files produced by a process) can be retrieved. Is this possible? >>>>>>>>> >>>>>>>>> Jonathan >>>>>>>>> >>>>>>>>> On Mar 19, 2014, at 7:03 PM, Michael Wilde wrote: >>>>>>>>> >>>>>>>>>> Hi Jonathan, >>>>>>>>>> >>>>>>>>>> You are thinking about this exactly right, and have (quickly) hit one of our programmability weaknesses. >>>>>>>>>> >>>>>>>>>> At the moment, there is no good way to do this. We have had much discussion on it, though, and plan to create such "collect files of this pattern into an array" semantics. >>>>>>>>>> >>>>>>>>>> What we do for now is one of these two work-arounds: >>>>>>>>>> >>>>>>>>>> - write a single tarfile as output (in a wrapper script for the app, which tars files of the appropriate pattern, like "*.simout") >>>>>>>>>> >>>>>>>>>> - write the files directly to a specific (shared) directory instead of the Swift "app task directory" (called "job directory" in the current User Guide). Then return a single file with a list of these file names, and map that using an array_mapper if the results need to be passed to a next stage. >>>>>>>>>> >>>>>>>>>> We should see if we can get a prototype of such a feature to you in short order.
But hopefully just to get things working, one of the above methods will suffice for you, for now. >>>>>>>>>> >>>>>>>>>> I'm cc'ing swift-devel to see what we can do. >>>>>>>>>> >>>>>>>>>> Thanks for reminding us of this fairly common need! >>>>>>>>>> >>>>>>>>>> - Mike >>>>>>>>>> >>>>>>>>>> On 3/19/14, 5:57 PM, Jonathan Ozik wrote: >>>>>>>>>>> Mike, >>>>>>>>>>> >>>>>>>>>>> Perhaps I'm not thinking about this correctly, but I'm trying to figure out how to collect output files that are generated by a simulation run. The scenario is that the Repast executable is run and, after it is run, there are files that are output into some location. There can be an arbitrary number of such output files per simulation run but, if necessary, it would be possible to pre-specify which files to look for. Is there a simple way to indicate to swift that it should collect all the files matching a particular pattern within a directory? It looks like the mappers might work here but as far as I understand the mappers are defined prior to calls to executables. Again, I might just not be thinking about this in a "swift" enough way.
>>>>>>>>>>> >>>>>>>>>>> Jonathan >>>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Michael Wilde >>>>>>>>>> Mathematics and Computer Science Computation Institute >>>>>>>>>> Argonne National Laboratory The University of Chicago >>>>>>>>>> >>>>>>>> -- >>>>>>>> Michael Wilde >>>>>>>> Mathematics and Computer Science Computation Institute >>>>>>>> Argonne National Laboratory The University of Chicago >>>>>>>> >>>>>> >>>>>> -- >>>>>> Michael Wilde >>>>>> Mathematics and Computer Science Computation Institute >>>>>> Argonne National Laboratory The University of Chicago >>>>> >>>> >>> >>> -- >>> Michael Wilde >>> Mathematics and Computer Science Computation Institute >>> Argonne National Laboratory The University of Chicago >>> >>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel >> >
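Yadu's var_arrays example in this thread follows the second workaround. A helper (gen_n_files) writes a variable number of files into a directory on a shared filesystem and echoes their names to stdout. The Swift script then calls readData() on that stdout file and passes the resulting strings to array_mapper. The real gen_n_files ships only in the linked tarball and is not reproduced in the thread, so what follows is a hypothetical Python stand-in for the generator side only. The file naming scheme (`<tag>_<i>.dat`), the `tag` and `max_files` parameters, and the one-integer-per-file content are all assumptions, chosen to match Yadu's description of a second loop that sums integers.

```python
#!/usr/bin/env python3
# Hypothetical stand-in for gen_n_files from the var_arrays example; the real
# script ships only in var_arrays.tar.gz. Names and file format are assumptions.
import os
import random
import sys


def gen_n_files(directory, tag, max_files=10, rng=None):
    """Write 1..max_files files named <tag>_<i>.dat into `directory`.

    Each file holds one integer (assumed format, so a later stage has
    something to sum). Returns the absolute paths, in creation order.
    """
    if rng is None:
        rng = random.Random()
    os.makedirs(directory, exist_ok=True)
    paths = []
    for i in range(rng.randint(1, max_files)):
        # The tag keeps names distinct when several app calls share one directory.
        path = os.path.abspath(os.path.join(directory, f"{tag}_{i}.dat"))
        with open(path, "w") as f:
            f.write(f"{rng.randint(0, 100)}\n")
        paths.append(path)
    return paths


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # One absolute path per line on stdout: the "list of output files"
        # that the Swift script later reads back with readData().
        for p in gen_n_files(sys.argv[1], sys.argv[2]):
            print(p)
    else:
        print("usage: gen_n_files.py <shared_dir> <tag>", file=sys.stderr)
```

A run such as `python3 gen_n_files.py /scratch/midway/$USER/temp run7` prints one absolute path per line. The files are written to the shared directory passed on the command line rather than to Swift's job directory, which is the point the thread stresses. As Mike explains above, @filename() on a file mapped to such an absolute path reports it without the leading "/"; that is expected behavior.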