[Swift-devel] Multiple output files
Yadu Nand Babuji
yadunand at uchicago.edu
Mon Mar 31 12:08:04 CDT 2014
Hi Jonathan,
Your understanding is spot on.
Swift 0.94.1 used a config mechanism which required 3 config files, a
tc.file (apps), a config file (swift.properties) and a sites.file
(sites.xml). This is supported on the newer versions of swift for
backward compatibility.
Swift 0.95 uses a simplified config mechanism that merges all three
config files into one swift.properties file. The midway tutorial is
based on this version.
So, if you are using the default swift, you should be using this:
swift -tc.file apps -config swift.properties -sites.file sites.xml
var_arrays.swift -dir=/scratch/midway/$USER/temp
If you find the new config mechanism preferable, please use Swift 0.95.
Here's a sample swift.properties file:
site=midway-westmere
app.midway-westmere.sh=/bin/bash
site.midway-westmere {
jobManager=slurm
jobQueue=westmere
tasksPerWorker=12
initialScore=10000
filesystem=local
workdir=/scratch/midway/$USER/work
}
I ran this on midway with the following command. A file named
"swift.properties" is picked up automatically; for any other name,
pass it with the option -properties <properties file>:
swift var_arrays.swift
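
The two invocation styles above can be summarized in a small shell helper. This is only a sketch: the function name swift_cmd, its "legacy"/"unified" labels and the file name midway.properties are hypothetical, and placing -properties before the script name is an assumption. The flags themselves are the ones quoted in this message.

```shell
#!/bin/sh
# Hypothetical helper: print the swift command line for a given setup.
#   swift_cmd legacy SCRIPT            three config files (0.94.1 style)
#   swift_cmd unified SCRIPT [PROPS]   one properties file (0.95 style)
swift_cmd() {
    style=$1
    script=$2
    props=${3:-swift.properties}
    case "$style" in
        legacy)
            echo "swift -tc.file apps -config swift.properties -sites.file sites.xml $script"
            ;;
        unified)
            if [ "$props" = "swift.properties" ]; then
                # Default name: picked up without any option.
                echo "swift $script"
            else
                # Assumption: -properties goes before the script name.
                echo "swift -properties $props $script"
            fi
            ;;
        *)
            echo "unknown style: $style" >&2
            return 1
            ;;
    esac
}
```

For example, "swift_cmd unified var_arrays.swift midway.properties" prints the command with an explicit -properties option, while omitting the third argument prints the short form shown above.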
Thanks,
Yadu
On 03/31/2014 10:54 AM, Jonathan Ozik wrote:
> Yadu,
>
> In the example that you provided you have a sites.xml file that
> includes a lot of specific options to run on Midway. In the Midway
> tutorial, on the other hand
> (http://swift-lang.org/tutorials/midway/tutorial.html) it looks like
> the swift.properties file is given a minimal set of overriding
> submission parameters and the rest of the information is filled in by
> sensible default values. Based on this I have two questions:
> 1. Is my understanding correct regarding the swift.properties
> functionality or do I have to provide a sites.xml file as well?
> 2. Is it okay to use the default swift version provided by Midway?
> I've been using that for my local testing but would like to know if
> anything that we've discussed is very different
> between swift/0.94.1(default) and the swift/0.95-RC1 version that the
> tutorial wants to load.
>
> Jonathan
>
> On Mar 21, 2014, at 6:25 PM, Jonathan Ozik <jozik at uchicago.edu
> <mailto:jozik at uchicago.edu>> wrote:
>
>> Yadu, Mike,
>>
>> Thank you for this. Much appreciated.
>>
>> Have a good weekend,
>>
>> Jonathan
>>
>> On Mar 21, 2014, at 5:51 PM, Yadu Nand B <yadunand at uchicago.edu
>> <mailto:yadunand at uchicago.edu>> wrote:
>>
>>> Hi Jonathan,
>>>
>>> I have a tar ball of an example which you can download from here ->
>>> http://swift.rcc.uchicago.edu:8042/var_arrays.tar.gz
>>>
>>> You can run the example on Midway using the following command:
>>> swift -tc.file apps -config swift.properties -sites.file sites.xml
>>> var_arrays.swift -dir=/scratch/midway/$USER/temp
>>>
>>> Please ensure that the directory passed to dir is on a shared
>>> filesystem (/scratch on midway).
>>>
>>> The first foreach loop runs the script gen_n_files which creates a
>>> random number of files within the specified directory
>>> and echoes the names of the files to stdout. The stdout file is read
>>> by swift using readData and passed to the array_mapper
>>> to be mapped to an array, which is placed in an array of arrays. The
>>> second foreach loop goes over each array in the array of
>>> arrays and simply sums up all integers present in each array of files.
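
The first stage described above might look roughly like the following. This is a hypothetical stand-in for gen_n_files, not the script shipped in the tar ball: the file naming scheme, the upper bound of 5 files and the one-integer-per-file contents are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for gen_n_files: create a random number of files
# (between 1 and 5) in the given directory and print each file name on
# stdout, one per line, so that Swift can read the list with readData.
gen_n_files() {
    dir=$1
    mkdir -p "$dir" || return 1
    # awk's rand() is used as a portable replacement for bash's $RANDOM.
    n=$(awk 'BEGIN { srand(); print int(rand() * 5) + 1 }')
    i=1
    while [ "$i" -le "$n" ]; do
        f="$dir/file_$i.txt"
        echo "$i" > "$f"   # assumed content: one integer, summed by the next stage
        echo "$f"          # the file name goes to stdout
        i=$((i + 1))
    done
}
```

Because the names are only known once the script has run, they travel through stdout rather than through a mapper declared up front.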
>>>
>>> Slightly contrived example, but I hope you get the method used. Let
>>> me know if you want this example to be expanded
>>> in any way.
>>>
>>> Thanks,
>>> Yadu
>>>
>>>
>>> On 03/21/2014 04:46 PM, Michael Wilde wrote:
>>>> Jonathan,
>>>>
>>>> @text is behaving as expected here. The rationale is as follows.
>>>> This line:
>>>>
>>>> file text <single_file_mapper;
>>>> file="/Users/jozik/temp/instance_4/customOut7.dat">;
>>>>
>>>> ...associates the variable "text" with the file
>>>> /Users/jozik/temp/instance_4/customOut7.dat.
>>>>
>>>> If that file was passed to (or returned from) an app() call, then
>>>> @filename(text) (for which @text is a shorthand notation) would be
>>>> the name by which the app should refer to the file, relative to the
>>>> app's job directory. The leading "/" is removed because the file,
>>>> if it's an input to the app, would get linked to the job dir with
>>>> all of its directory components as specified in the mapping. I.e., it
>>>> would be linked to ./Users/jozik/temp/instance_4/customOut7.dat
>>>> If that file was an output from the app, Swift would expect the app
>>>> to create a file by this name below the job dir.
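
The path handling described above can be illustrated in shell. This is a sketch of the rule only, not Swift's actual implementation; the function name job_relative_path is hypothetical.

```shell
#!/bin/sh
# Sketch of the rule described above: the mapped absolute path loses its
# leading "/" and is then interpreted relative to the app's job directory.
job_relative_path() {
    # ${1#/} removes one leading slash, if there is one.
    printf '%s\n' "${1#/}"
}

job_relative_path "/Users/jozik/temp/instance_4/customOut7.dat"
# prints: Users/jozik/temp/instance_4/customOut7.dat
```

The printed value matches what tracef reported in the quoted message below.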
>>>>
>>>> Again, these semantics date back to the origins of Swift, when
>>>> every job was essentially expected to be executed on a remote grid
>>>> node under Globus.
>>>>
>>>> Yadu is working on a complete example of the multiple-file return
>>>> case right now.
>>>>
>>>> - Mike
>>>>
>>>> On 3/21/14, 1:59 PM, Jonathan Ozik wrote:
>>>>> Mike,
>>>>>
>>>>> It looks like I misunderstood your workaround initially. Now I'm
>>>>> having an issue with specifying absolute paths.
>>>>> For example:
>>>>> file text <single_file_mapper;
>>>>> file="/Users/jozik/temp/instance_4/customOut7.dat">;
>>>>> tracef("The file name is: %s\n", @text);
>>>>>
>>>>> yields:
>>>>> The file name is: Users/jozik/temp/instance_4/customOut7.dat
>>>>> (the leading forward slash is missing)
>>>>>
>>>>> The idea here is that the output data is being placed in a well
>>>>> known location and retrieved via the output file location
>>>>> aggregator. This is a pared down example where I'm looking to see
>>>>> what each line from the output file location aggregator would be
>>>>> interpreted as in swift.
>>>>>
>>>>> Jonathan
>>>>>
>>>>> On Mar 21, 2014, at 11:51 AM, Jonathan Ozik <jozik at uchicago.edu
>>>>> <mailto:jozik at uchicago.edu>> wrote:
>>>>>
>>>>>> Mike,
>>>>>>
>>>>>> Thank you again for the detailed responses. I'm getting a better
>>>>>> handle on what can be done and am trying to implement the
>>>>>> workaround you suggested.
>>>>>> Speaking of which, is the reason that a shared directory location
>>>>>> needs to be utilized because readData() does not know to look in
>>>>>> the "app task directory" and defaults to the swift script launch
>>>>>> directory?
>>>>>>
>>>>>> Thanks again for the guidance,
>>>>>>
>>>>>> Jonathan
>>>>>>
>>>>>> On Mar 21, 2014, at 8:15 AM, Michael Wilde <wilde at anl.gov
>>>>>> <mailto:wilde at anl.gov>> wrote:
>>>>>>
>>>>>>> Hi Jonathan,
>>>>>>>
>>>>>>> Thanks for bearing with us on this. I can see clearly where our
>>>>>>> documentation is falling short of explaining this clearly.
>>>>>>>
>>>>>>> I've got to work on some deadlines today, but I'll see if someone
>>>>>>> else on the team can post a clarification with some examples.
>>>>>>>
>>>>>>> A brief response, below.
>>>>>>>
>>>>>>> On 3/20/14, 9:51 PM, Jonathan Ozik wrote:
>>>>>>>> Hi Mike,
>>>>>>>>
>>>>>>>> I've included my comments below. Also, please let me know if I should sign up for any mailing list in addition to swift-user, which I'm already signed up on.
>>>>>>> No, just swift-user. Ideally we should have started this
>>>>>>> discussion there. I steered you to swift-devel because I thought
>>>>>>> the issue was one of a new feature requirement, but I see it's
>>>>>>> also one of documentation and training.
>>>>>>>
>>>>>>> ...
>>>>>>>> An app *can* return multiple files - even an array of files -
>>>>>>>> but not an array of files whose names and count is not known
>>>>>>>> before the app is launched.
>>>>>>>> This functionality would be exactly what I'd be looking for. If an app can return multiple files I'd just need to know where and how I'd have to specify the patterns for those files (or do I need to know the exact names?)
>>>>>>> Yes, that's the problem: you would need to know the exact names,
>>>>>>> in the Swift script, before the app is called, so that you can
>>>>>>> *map* all output file variables to the names that the app will
>>>>>>> be *expected* to produce. I.e., currently one needs a priori
>>>>>>> knowledge of all output file names, and you need to map variables
>>>>>>> (which can include array and structure members) to those names.
>>>>>>>> and how to make it so that the app returns those files. I've looked through the user guide and a few of the tutorials but I don't believe I've seen any example that fits this general functionality.
>>>>>>>>
>>>>>>>> For example, if I have an executable "repast" that outputs files with patterns specified by globs, how do I make use of that knowledge to pick up those files and return them from an app?
>>>>>>> That's the current language deficiency: you cannot. We will
>>>>>>> explain later today in more detail.
>>>>>>>
>>>>>>> The app is expected to produce all output files that any of its
>>>>>>> output variables (or arrays or structures) are mapped to.
>>>>>>>
>>>>>>> For example, you can map an *output* array to the file names
>>>>>>> f1.out, f2.out, and f3.out. Then the app will be expected to
>>>>>>> produce those files. If it doesn't, Swift will raise a runtime
>>>>>>> error. So if you know a priori (before the app is called) from
>>>>>>> context or from input argument values that these 3 files will
>>>>>>> be produced, you can use one of the array mappers or the "ext"
>>>>>>> mapper to declare this expectation.
>>>>>>>
>>>>>>> The best way to get past this obstacle (while we develop the
>>>>>>> desired capability) is as follows. If you are running on a
>>>>>>> single machine, you can write a wrapper shell script around the
>>>>>>> repast app that runs repast and then returns a single file that
>>>>>>> contains a *list* of its output files. But you need to place
>>>>>>> these output files in a known shared directory, not in the
>>>>>>> current working directory in which Swift will run the repast app
>>>>>>> (called the "job directory" at the moment -- soon to be renamed
>>>>>>> the "app task directory"). Then you do a readData() on this
>>>>>>> returned file to create an array of strings, and use that array
>>>>>>> with the "array" mapper (explained in the User Guide).
>>>>>>>
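
The wrapper described above could be sketched as follows. This is a minimal illustration under stated assumptions, not the example later posted to the list: the function name run_and_list and its argument order are hypothetical, and the actual repast command line is not shown in this thread, so the app is passed in as a generic command.

```shell
#!/bin/sh
# Hypothetical wrapper for an app that writes an unknown number of files.
# Usage: run_and_list SHARED_DIR PATTERN COMMAND [ARGS...]
# Runs COMMAND inside SHARED_DIR (not in the Swift job directory), then
# prints the files matching PATTERN, one per line. Swift would capture this
# list as the app's single output file, e.g. via stdout=@filename(...).
run_and_list() {
    shared_dir=$1
    pattern=$2
    shift 2
    mkdir -p "$shared_dir" || return 1
    # Run the app in the shared directory so its outputs land there; its own
    # stdout is discarded so that it cannot pollute the file list.
    ( cd "$shared_dir" && "$@" >/dev/null ) || return 1
    # $pattern is deliberately unquoted so the shell expands the glob.
    for f in "$shared_dir"/$pattern; do
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}
```

Passing an absolute SHARED_DIR yields absolute paths in the list, which is what a later readData() plus array mapper step would need in order to find the files from the script's side.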
>>>>>>> We'll post to you a working example as soon as possible -
>>>>>>> today, if time permits. As well as an example of the proposed
>>>>>>> new feature.
>>>>>>>
>>>>>>> - Mike
>>>>>>>> I see, the one file per app invocation is probably not going to work for our use case, but I still think that I'm missing some crucial understandings of what conditions are required to point to and use generated or existing files within swift. That is, other than "stdout=@filename", how do I indicate that I'd like to return a file, say "myoutput.dat", from an app invocation? Apologies if this is too simple of a question but, like I said, I feel like I'm missing some crucial information...
>>>>>>>>
>>>>>>>>> What we do not have - but have long known that we need - is the ability to declare that all the files created by a *single* app invocation which match a specified pattern be returned as an array.
>>>>>>>>>
>>>>>>>>> Mihael: is this something you could implement in the near future - after we agree on the semantics?
>>>>>>>>>
>>>>>>>>> Justin, Tim, do you want to comment on this from a Swift/T perspective?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> - Mike
>>>>>>>>>> I think I'm stumbling on the notion of what output, in terms of files, can be produced using an "app" element in a swift script. Most of the examples have stdout (or stderr) pointing to a file, but I'm not sure I've found an example where side effects (e.g., files produced by a process) can be retrieved. Is this possible?
>>>>>>>>>>
>>>>>>>>>> Jonathan
>>>>>>>>>>
>>>>>>>>>> On Mar 19, 2014, at 7:03 PM, Michael Wilde<wilde at anl.gov> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Jonathan,
>>>>>>>>>>>
>>>>>>>>>>> You are thinking about this exactly right, and have (quickly) hit one of our programmability weaknesses.
>>>>>>>>>>>
>>>>>>>>>>> At the moment, there is no good way to do this. We have had much discussion on it, though, and plan to create such "collect files of this pattern into an array" semantics.
>>>>>>>>>>>
>>>>>>>>>>> What we do for now is one of these two work-arounds:
>>>>>>>>>>>
>>>>>>>>>>> - write a single tarfile as output (in a wrapper script for the app, which tars files of the appropriate pattern, like "*.simout")
>>>>>>>>>>>
>>>>>>>>>>> - write the files directly to a specific (shared) directory instead of the Swift "app task directory" (called "job directory" in the current User Guide). Then return a single file with a list of these file names, and map that using an array_mapper if the results need to be passed to a next stage
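
The first of the two work-arounds above, the tarfile wrapper, might be sketched like this. The function name run_and_tar and its argument order are hypothetical, and the app is again passed in as a generic command because its real invocation is not given in the thread.

```shell
#!/bin/sh
# Hypothetical wrapper for the tarfile work-around.
# Usage: run_and_tar OUT_TAR PATTERN COMMAND [ARGS...]
# Runs COMMAND in the current (job) directory, then bundles every file
# matching PATTERN into OUT_TAR, the one output file Swift is told to expect.
run_and_tar() {
    out_tar=$1
    pattern=$2
    shift 2
    "$@" || return 1
    # Unquoted on purpose: the shell expands the glob into matching names,
    # which replace the function's positional parameters.
    set -- $pattern
    # If nothing matched, the pattern is left unexpanded; report that.
    [ -e "$1" ] || { echo "no files match $pattern" >&2; return 1; }
    tar -cf "$out_tar" "$@"
}
```

Since the tar file has a single, known name, it can be mapped in the Swift script before the app runs; a later stage then has to unpack it to reach the individual files.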
>>>>>>>>>>>
>>>>>>>>>>> We should see if we can get a prototype of such a feature to you in short order. But hopefully just to get things working, one of the above methods will suffice for you, for now.
>>>>>>>>>>>
>>>>>>>>>>> I'm cc'ing swift-devel to see what we can do.
>>>>>>>>>>>
>>>>>>>>>>> Thanks for reminding us of this fairly common need!
>>>>>>>>>>>
>>>>>>>>>>> - Mike
>>>>>>>>>>>
>>>>>>>>>>> On 3/19/14, 5:57 PM, Jonathan Ozik wrote:
>>>>>>>>>>>> Mike,
>>>>>>>>>>>>
>>>>>>>>>>>> Perhaps I'm not thinking about this correctly, but I'm trying to figure out how to collect output files that are generated by a simulation run. The scenario is that the Repast executable is run and, after it is run, there are files that are output into some location. There can be an arbitrary number of such output files per simulation run but, if necessary, it would be possible to pre-specify which files to look for. Is there a simple way to indicate to swift that it should collect all the files matching a particular pattern within a directory? It looks like the mappers might work here but as far as I understand the mappers are defined prior to calls to executables. Again, I might just not be thinking about this in a "swift" enough way.
>>>>>>>>>>>>
>>>>>>>>>>>> Jonathan
>>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Michael Wilde
>>>>>>>>>>> Mathematics and Computer Science Computation Institute
>>>>>>>>>>> Argonne National Laboratory The University of Chicago
>>>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Michael Wilde
>>>>>>>>> Mathematics and Computer Science Computation Institute
>>>>>>>>> Argonne National Laboratory The University of Chicago
>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Michael Wilde
>>>>>>> Mathematics and Computer Science Computation Institute
>>>>>>> Argonne National Laboratory The University of Chicago
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Michael Wilde
>>>> Mathematics and Computer Science Computation Institute
>>>> Argonne National Laboratory The University of Chicago
>>>>
>>>>
>>>> _______________________________________________
>>>> Swift-devel mailing list
>>>> Swift-devel at ci.uchicago.edu
>>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>>>
>>
>