[Swift-user] Issue with map reduce step one app to many

Lorenzo Pesce lpesce at uchicago.edu
Thu May 9 14:02:28 CDT 2013


What if I use a different, fast method to determine how many files there will be (easy here, since the header of the BAM file tells me how many files it will produce), read its output with readData, link the files into an array, and then feed the array back?
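
A rough sketch of that header-based step in shell, under assumptions that are mine rather than anything confirmed in this thread: the splitting app writes one output per @RG (read group) line of the header and names it <ID>.bam. The function name list_rg_outputs and the output directory argument are hypothetical. The header text is expected on stdin.

```shell
#!/bin/sh
# Hypothetical helper: read SAM/BAM header text on stdin and print one
# expected output path per @RG (read group) line.
# Assumption: the splitting app names each output <read-group-ID>.bam.
list_rg_outputs() {
    outdir=$1
    # Header fields are tab-separated; the ID: field holds the read group name.
    awk -F '\t' -v dir="$outdir" '
        /^@RG/ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^ID:/) {
                    print dir "/" substr($i, 4) ".bam"
                }
            }
        }'
}

# Example use (samtools view -H prints only the header):
#   samtools view -H in.bam | list_rg_outputs /path/to/outdir > rg_files.txt
```

The resulting text file holds one path per line, which is the kind of list that could then be read back into the script and used to build the array.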

On May 9, 2013, at 1:28 PM, Michael Wilde wrote:

> Hi Lorenzo,
> 
> Swift is not yet able to map an array of files returned from an app whose size is not known before the app runs. We've discussed how to do this and hope to add such semantics in the future.
> 
> In the meantime, the two techniques for doing this are:
> 
> - return a tar file or similar archive from the app() that creates an unknown number of files
> 
> - return a list of files from the app()
> 
> The second technique works very nicely, especially if the entire script is being run on a single shared filesystem cluster like Beagle. In your example, app1() would return the list of files it produces as a single text file, and you then use that text file to map the array RGinfile[] using for example array_mapper.
> 
> One way to get app1() to return the desired list of files is by wrapping it in a shell script that does a selective "ls" or "find" on its output directory. Another way, if you really don't want to create a wrapper, is to have app1() return an "external" variable, and then call an app() that uses an sh -c script to find the data.
> 
> You'll need to make sure that app1() produces its output files in a persistent, known directory rather than in its temporary Swift-created "job dir" (which is the app's default current working directory when Swift runs it). That's another aspect that's easiest to deal with using a wrapper script around the actual application.
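
A minimal sketch of such a wrapper, not taken from the thread: it runs the real command inside a persistent output directory and then writes the list of produced files with a selective "find". The function name run_and_list, its argument order, and the '*.bam' pattern are all assumptions. Because the command runs with the output directory as its working directory, any input paths passed to it would need to be absolute.

```shell
#!/bin/sh
# Hypothetical wrapper: run the real command so that its outputs land in a
# persistent directory, then write the list of files it produced.
# Usage: run_and_list OUTDIR LISTFILE COMMAND [ARGS...]
run_and_list() {
    outdir=$1
    listfile=$2
    shift 2
    mkdir -p "$outdir" || return 1
    # Run the application in a subshell with OUTDIR as its working directory,
    # so the caller's directory (the Swift job dir) is left unchanged.
    # Assumption: any input paths in ARGS are absolute.
    ( cd "$outdir" && "$@" ) || return 1
    # Selective "find": regular *.bam files only, absolute paths, sorted.
    # -maxdepth is not POSIX, but GNU and BSD find both support it.
    absdir=$(cd "$outdir" && pwd) || return 1
    find "$absdir" -maxdepth 1 -type f -name '*.bam' | sort > "$listfile"
}

# Example use (splitter is a placeholder for the real application):
#   run_and_list /persistent/rg_out rg_files.txt splitter /abs/path/in.bam
```

LISTFILE is written relative to the directory the wrapper was started in, so it can be the single text file that the app returns to Swift.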

This makes sense, but how do I tell Swift to keep the correct execution order? How will the runs after the map step know that their files are ready? They aren't in the return of any app, so wouldn't Swift assume that they should be there already?




> 
> I'll try to post an example of this when time permits; another illustration is in the MODIS example program in the 2011 Swift paper from Parallel Computing.
> 
> - Mike
> 
> 
> ----- Original Message -----
>> From: "Lorenzo Pesce" <lpesce at uchicago.edu>
>> To: "Swift User Discussion List" <swift-user at ci.uchicago.edu>
>> Sent: Wednesday, May 8, 2013 2:43:13 PM
>> Subject: [Swift-user] Issue with map reduce step one app to many
>> 
>> This is more or less the step I would do. My problem is that I am not
>> sure how to arrange the return of a set of files without
>> connecting them first, and I can't connect them since they are not
>> made yet.
>> I could conceivably create a list first and use that, but I was
>> curious to know whether there is a shortcut. The files in the
>> intermediate step at least at this point are not important to us and
>> don't need to be tracked.
>> 
>> file inbam;
>> file [] RGinfile;
>> file [] RGoutfile;
>> 
>> 
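>> // one input, unknown number of outputs
>> (RGinfile) = app1(inbam);
>> 
>> // process each intermediate file independently
>> foreach f, idx in RGinfile {
>> 
>>  RGoutfile[idx] = app2(f);
>> 
>> }
>> 
>> // merge the per-file results
>> (BAM) = app3(RGoutfile);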
>> 
>> _______________________________________________
>> Swift-user mailing list
>> Swift-user at ci.uchicago.edu
>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
>> 



