[Swift-devel] Multiple output files
Yadu Nand B
yadunand at uchicago.edu
Fri Mar 21 17:51:29 CDT 2014
Hi Jonathan,
I have a tar ball of an example which you can download from here ->
http://swift.rcc.uchicago.edu:8042/var_arrays.tar.gz
You can run the example on Midway using the following command:
swift -tc.file apps -config swift.properties -sites.file sites.xml \
    var_arrays.swift -dir=/scratch/midway/$USER/temp
Please ensure that the directory passed to dir is on a shared filesystem
(/scratch on midway).
The first foreach loop runs the script gen_n_files which creates a
random number of files within the specified directory
and echoes the names of the files to stdout. The stdout file is read by
swift using readData and passed to the array_mapper
to be mapped to an array, which is placed in an array of arrays. The
second foreach loop goes over each array in the array of
arrays and simply sums up all integers present in each array of files.
Slightly contrived example, but I hope you get the method used. Let me
know if you want this example to be expanded
in any way.
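
For anyone reading without the tarball, here is a rough sketch of the two
shell-side pieces described above. It is not the script from the tarball:
the function bodies, the file naming (file_N.txt), the value range and the
"max" argument are all assumptions made for illustration. The only parts
taken from the description are that one step writes a random number of files
into a given directory and echoes their names to stdout, and that a later
step sums the integers found in a set of files.

```shell
#!/bin/sh
# Hypothetical sketch only; the real gen_n_files ships in the tarball.

# gen_n_files DIR [MAX]
# Create between 1 and MAX files (default 5) in DIR, each holding one
# random integer, and print the path of every file on stdout, one per
# line. That stdout listing is what readData would pick up.
gen_n_files() (
    dir=$1
    max=${2:-5}
    mkdir -p "$dir" || exit 1
    # A single awk call picks the file count and one value per file.
    awk -v max="$max" 'BEGIN {
        srand()
        n = int(rand() * max) + 1
        for (i = 0; i < n; i++) print int(rand() * 100)
    }' | {
        i=0
        while read -r value; do
            f="$dir/file_$i.txt"      # assumed naming scheme
            echo "$value" > "$f"
            echo "$f"
            i=$((i + 1))
        done
    }
)

# sum_files FILE...
# Print the sum of all lines that consist solely of digits.
sum_files() {
    [ "$#" -gt 0 ] || { echo 0; return 0; }
    awk '/^[0-9]+$/ { total += $0 } END { printf "%d\n", total }' "$@"
}
```

In a real app wrapper you would call something like gen_n_files "$1" with the
shared directory passed in from Swift, so that the listed paths remain valid
after the job directory is cleaned up.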
Thanks,
Yadu
On 03/21/2014 04:46 PM, Michael Wilde wrote:
> Jonathan,
>
> @text is behaving as expected here. The rationale is as follows. This
> line:
>
> file text <single_file_mapper;
> file="/Users/jozik/temp/instance_4/customOut7.dat">;
>
> ...associates the variable "text" with the file
> /Users/jozik/temp/instance_4/customOut7.dat.
>
> If that file was passed to (or returned from) an app() call, then
> @filename(text) (for which @text is a shorthand notation) would be the
> name by which the app should refer to the file, relative to the app's
> job directory. The leading "/" is removed because the file, if it's an
> input to the app, would get linked to the job dir with all of its
> directory components as specified in the mapping. I.e., it would be linked to
> ./Users/jozik/temp/instance_4/customOut7.dat
> If that file was an output from the app, Swift would expect the app to
> create a file by this name below the job dir.
>
> Again, these semantics date back to the origins of Swift, when every
> job was essentially expected to be executed on a remote grid node
> under Globus.
>
> Yadu is working on a complete example of the multiple-file return case
> right now.
>
> - Mike
>
> On 3/21/14, 1:59 PM, Jonathan Ozik wrote:
>> Mike,
>>
>> It looks like I misunderstood your workaround initially. Now I'm
>> having an issue with specifying absolute paths.
>> For example:
>> file text <single_file_mapper;
>> file="/Users/jozik/temp/instance_4/customOut7.dat">;
>> tracef("The file name is: %s\n", @text);
>>
>> yields:
>> The file name is: Users/jozik/temp/instance_4/customOut7.dat
>> (the leading forward slash is missing)
>>
>> The idea here is that the output data is being placed in a well known
>> location and retrieved via the output file location aggregator. This
>> is a pared down example where I'm looking to see what each line from
>> the output file location aggregator would be interpreted as in swift.
>>
>> Jonathan
>>
>> On Mar 21, 2014, at 11:51 AM, Jonathan Ozik <jozik at uchicago.edu> wrote:
>>
>>> Mike,
>>>
>>> Thank you again for the detailed responses. I'm getting a better
>>> handle on what can be done and am trying to implement the workaround
>>> you suggested.
>>> Speaking of which, is the reason that a shared directory location
>>> needs to be utilized because readData() does not know to look in the
>>> "app task directory" and defaults to the swift script launch directory?
>>>
>>> Thanks again for the guidance,
>>>
>>> Jonathan
>>>
>>> On Mar 21, 2014, at 8:15 AM, Michael Wilde <wilde at anl.gov> wrote:
>>>
>>>> Hi Jonathan,
>>>>
>>>> Thanks for bearing with us on this. I can see clearly where our
>>>> documentation is falling short of explaining this clearly.
>>>>
>>>> I've got to work on some deadlines today, but I'll see if someone
>>>> else on the team can post a clarification with some examples.
>>>>
>>>> A brief response, below.
>>>>
>>>> On 3/20/14, 9:51 PM, Jonathan Ozik wrote:
>>>>> Hi Mike,
>>>>>
>>>>> I've included my comments below. Also, please let me know if I should sign up for any mailing list in addition to swift-user, which I'm already signed up on.
>>>> No, just swift-user. Ideally we should have started this discussion
>>>> there. I steered you to swift-devel because I thought the issue was
>>>> one of a new feature requirement, but I see it's also one of
>>>> documentation and training.
>>>>
>>>> ...
>>>>> An app *can* return multiple files - even an array of files - but
>>>>> not an array of files whose names and count are not known before
>>>>> the app is launched.
>>>>> This functionality would be exactly what I'd be looking for. If an app can return multiple files I'd just need to know where and how I'd have to specify the patterns for those files (or do I need to know the exact names?)
>>>> Yes, that's the problem: you would need to know the exact names, in
>>>> the Swift script, before the app is called, so that you can *map*
>>>> all output file variables to the names that the app will be
>>>> *expected* to produce. I.e., currently one needs a priori knowledge
>>>> of all output file names, and you need to map variables (which can
>>>> include array and structure members) to those names.
>>>>> and how to make it so that the app returns those files. I've looked through the user guide and a few of the tutorials but I don't believe I've seen any example that fits this general functionality.
>>>>>
>>>>> For example, if I have an executable "repast" that outputs files with patterns specified by globs, how do I make use of that knowledge to pick up those files and return them from an app?
>>>> That's the current language deficiency: you cannot. We will
>>>> explain later today in more detail.
>>>>
>>>> The app is expected to produce all output files that any of its
>>>> output variables (or arrays or structures) are mapped to.
>>>>
>>>> For example, you can map an *output* array to the file names f1.out,
>>>> f2.out, and f3.out. Then the app will be expected to produce those
>>>> files. If it doesn't, Swift will raise a runtime error. So if you
>>>> know a priori (before the app is called) from context or from input
>>>> argument values that these 3 files will be produced, you can use
>>>> one of the array mappers or the "ext" mapper to declare this
>>>> expectation.
>>>>
>>>> The best way to get past this obstacle (while we develop the
>>>> desired capability) is as follows. If you are running on a single
>>>> machine, you can write a wrapper shell script around the repast app
>>>> that runs repast and then returns a single file that contains a
>>>> *list* of its output files. But you need to place these output
>>>> files in a known shared directory, not in the current working
>>>> directory in which Swift will run the repast app (called the "job
>>>> directory" at the moment -- soon to be renamed the "app task
>>>> directory"). Then you do a readData() on this returned file to
>>>> create an array of strings, and use that array with the "array"
>>>> mapper (explained in the User Guide).
>>>>
>>>> We'll post to you a working example as soon as possible - today,
>>>> if time permits. As well as an example of the proposed new feature.
>>>>
>>>> - Mike
>>>>> I see, the one file per app invocation is probably not going to work for our use case, but I still think that I'm missing some crucial understandings of what conditions are required to point to and use generated or existing files within swift. That is, other than "stdout=@filename", how do I indicate that I'd like to return a file, say "myoutput.dat", from an app invocation? Apologies if this is too simple of a question but, like I said, I feel like I'm missing some crucial information...
>>>>>
>>>>>> What we do not have - but have long known that we need - is the ability to declare that all the files created by a *single* app invocation which match a specified pattern be returned as an array.
>>>>>>
>>>>>> Mihael: is this something you could implement in the near future - after we agree on the semantics?
>>>>>>
>>>>>> Justin, Tim, do you want to comment on this from a Swift/T perspective?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> - Mike
>>>>>>> I think I'm stumbling on the notion of what output, in terms of files, can be produced using an "app" element in a swift script. Most of the examples have stdout (or stderr) pointing to a file, but I'm not sure I've found an example where side effects (e.g., files produced by a process) can be retrieved. Is this possible?
>>>>>>>
>>>>>>> Jonathan
>>>>>>>
>>>>>>> On Mar 19, 2014, at 7:03 PM, Michael Wilde <wilde at anl.gov> wrote:
>>>>>>>
>>>>>>>> Hi Jonathan,
>>>>>>>>
>>>>>>>> You are thinking about this exactly right, and have (quickly) hit one of our programmability weaknesses.
>>>>>>>>
>>>>>>>> At the moment, there is no good way to do this. We have had much discussion on it, though, and plan to create such "collect files of this pattern into an array" semantics.
>>>>>>>>
>>>>>>>> What we do for now is one of these two work-arounds:
>>>>>>>>
>>>>>>>> - write a single tarfile as output (in a wrapper script for the app, which tars files of the appropriate pattern, like "*.simout")
>>>>>>>>
>>>>>>>> - write the files directly to a specific (shared) directory instead of the Swift "app task directory" (called "job directory" in the current User Guide). Then return a single file with a list of these file names, and map that using an array_mapper if the results need to be passed to a next stage.
>>>>>>>>
>>>>>>>> We should see if we can get a prototype of such a feature to you in short order. But hopefully just to get things working, one of the above methods will suffice for you, for now.
>>>>>>>>
>>>>>>>> I'm cc'ing swift-devel to see what we can do.
>>>>>>>>
>>>>>>>> Thanks for reminding us of this fairly common need!
>>>>>>>>
>>>>>>>> - Mike
>>>>>>>>
>>>>>>>> On 3/19/14, 5:57 PM, Jonathan Ozik wrote:
>>>>>>>>> Mike,
>>>>>>>>>
>>>>>>>>> Perhaps I'm not thinking about this correctly, but I'm trying to figure out how to collect output files that are generated by a simulation run. The scenario is that the Repast executable is run and, after it is run, there are files that are output into some location. There can be an arbitrary number of such output files per simulation run but, if necessary, it would be possible to pre-specify which files to look for. Is there a simple way to indicate to swift that it should collect all the files matching a particular pattern within a directory? It looks like the mappers might work here but as far as I understand the mappers are defined prior to calls to executables. Again, I might just not be thinking about this in a "swift" enough way.
>>>>>>>>>
>>>>>>>>> Jonathan
>>>>>>>>>
>>>>>>>> --
>>>>>>>> Michael Wilde
>>>>>>>> Mathematics and Computer Science Computation Institute
>>>>>>>> Argonne National Laboratory The University of Chicago
>>>>>>>>