[Swift-user] Issue with map reduce step one app to many

Michael Wilde wilde at mcs.anl.gov
Thu May 9 14:59:48 CDT 2013


I think something like that might work well.

- Mike

----- Original Message -----
> From: "Lorenzo Pesce" <lpesce at uchicago.edu>
> To: "Michael Wilde" <wilde at mcs.anl.gov>
> Cc: "Swift User Discussion List" <swift-user at ci.uchicago.edu>
> Sent: Thursday, May 9, 2013 2:02:28 PM
> Subject: Re: [Swift-user] Issue with map reduce step one app to many
> 
> What if I use a different, fast method to determine how many files
> there will be (easy here, since I have to read the header of the BAM
> file and it will tell me how many it will spit out), read that file
> with readData, link the files into an array, and then feed the
> array back?
> 
> On May 9, 2013, at 1:28 PM, Michael Wilde wrote:
> 
> > Hi Lorenzo,
> > 
> > Swift is not yet able to map an array of files returned from an app
> > whose size is not known before the app runs. We've discussed how
> > to do this and hope to add such semantics in the future.
> > 
> > In the meantime, the two techniques for doing this are:
> > 
> > - return a tar file or similar archive from the app() that creates
> > an unknown number of files
> > 
> > - return a list of files from the app()
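
A minimal sketch of the first technique (returning one archive), offered as an illustration rather than as the method used in the thread. The function names pack_outputs and unpack_outputs are hypothetical; the producing app would call the first on its output directory, and the consuming step would call the second.

```shell
#!/bin/sh
# Hypothetical helper: bundle every file under OUTDIR into a single archive,
# so the app returns one file no matter how many outputs it created.
# Keep ARCHIVE outside OUTDIR so the archive does not include itself.
# Usage: pack_outputs OUTDIR ARCHIVE
pack_outputs() {
    tar -cf "$2" -C "$1" .
}

# Hypothetical counterpart for the consuming step: unpack into DESTDIR.
# Usage: unpack_outputs ARCHIVE DESTDIR
unpack_outputs() {
    mkdir -p "$2" && tar -xf "$1" -C "$2"
}
```

Because the archive is a single, ordinary output file, it can be declared as the app's return value even though the number of files inside it is not known in advance.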
> > 
> > The second technique works very nicely, especially if the entire
> > script is being run on a single shared filesystem cluster like
> > Beagle. In your example, app1() would return the list of files it
> > produces as a single text file, and you then use that text file to
> > map the array RGinfile[] using for example array_mapper.
> > 
> > One way to get app1() to return the desired list of files is by
> > wrapping it in a shell script that does a selective "ls" or "find"
> > on its output directory. Another way, if you really don't want to
> > create a wrapper, is to have app1() return an "external" variable,
> > and then call an app() that uses an sh -c script to find the data.
> > 
> > You'll need to make sure that app1() produces its output files in a
> > persistent, known directory rather than in its temporary
> > Swift-created "job dir" (which is the app's default current
> > working directory when Swift runs it). That's another aspect
> > that's easiest to deal with using a wrapper script around the
> > actual application.
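
The wrapper approach described above might be sketched as follows. This is an illustration under stated assumptions, not the wrapper actually used: the function name run_and_list and its argument order are hypothetical, and the wrapped command is assumed to write its outputs into its current working directory.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command inside a persistent output directory,
# then write the sorted list of files found there to a list file.
# Usage: run_and_list OUTDIR LISTFILE COMMAND [ARGS...]
# Assumptions: OUTDIR is an absolute path on the shared filesystem, LISTFILE
# lives outside OUTDIR, and any input paths in ARGS are absolute (the command
# runs after a cd into OUTDIR).
run_and_list() {
    outdir=$1
    listfile=$2
    shift 2

    mkdir -p "$outdir" || return 1

    # Run the real application in a subshell so the caller's working
    # directory (the Swift job dir) is left unchanged.
    ( cd "$outdir" && "$@" ) || return 1

    # One path per line; this text file is what the app returns.
    find "$outdir" -type f | sort > "$listfile"
}
```

The list file is the single output the app declares, and it is only written after the wrapped command has finished, so anything that reads the list runs after the files exist.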
> 
> This makes sense, but how do I tell Swift to keep the correct
> execution order? How will the runs after the map step know that
> their files are ready? They aren't in the return of any app, so
> wouldn't Swift assume that they should be there already?
> 
> > 
> > I'll try to post an example of this when time permits; another
> > illustration is in the MODIS example program in the 2011 Swift
> > paper from Parallel Computing.
> > 
> > - Mike
> > 
> > 
> > ----- Original Message -----
> >> From: "Lorenzo Pesce" <lpesce at uchicago.edu>
> >> To: "Swift User Discussion List" <swift-user at ci.uchicago.edu>
> >> Sent: Wednesday, May 8, 2013 2:43:13 PM
> >> Subject: [Swift-user] Issue with map reduce step one app to many
> >> 
> >> This is more or less the step I would do. My problem is that I am
> >> not sure how to arrange the return of a set of files without
> >> connecting them first, and I can't connect them since they are not
> >> made yet.
> >> I could conceivably create a list first and use that, but I was
> >> curious to know whether there is a shortcut. The files in the
> >> intermediate step, at least at this point, are not important to us
> >> and don't need to be tracked.
> >> 
> >> file inbam;
> >> file [] RGinfile;
> >> file [] RGoutfile;
> >> 
> >> 
> >> (RGinfile) = app1 (inbam);
> >> 
> >> foreach f, idx in RGinfile {
> >> 
> >>  RGoutfile[idx] = app2 (f);
> >> 
> >> }
> >> 
> >> (BAM) = app3 (RGoutfile);
> >> 
> >> _______________________________________________
> >> Swift-user mailing list
> >> Swift-user at ci.uchicago.edu
> >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
> >> 
> 
> 


