[Swift-user] How can one force ordering into swift operations

Michael Wilde wilde at mcs.anl.gov
Thu Jan 24 11:59:43 CST 2013


Here's a split-and-process example:

$ cat SplitAndProcess.swift

type file;

app (file flist) split (file i)
{
  sh "-c" @strcat("split -l 50 ", @filename(i), " /tmp/segment ; /bin/ls -1 /tmp/segment??") stdout=@filename(flist);
}

app (file counts) wc (file i)
{
  sh "-c" @strcat("wc ", @filename(i)) stdout=@filename(counts);
}

file infile<"infile">;

string segnames[] = readData(split(infile));

foreach s,i in segnames {
  file segment <single_file_mapper; file=s>;
  string counts = readData(wc(segment));
  tracef("segment %i is file %s, counts=%s\n", i, s, counts );
}

$ wc -l infile

460 infile

$ swift -config cf -tc.file tc -sites.file local.xml SplitAndProcess.swift 

Warning: Procedure split is deprecated, at 15
Warning: Procedure wc is deprecated, at 19
Swift trunk swift-r6151 cog-r3552 (cog modified locally)

RunID: 20130124-1152-gg21bvq8
Progress:  time: Thu, 24 Jan 2013 11:52:06 -0600
Progress:  time: Thu, 24 Jan 2013 11:52:08 -0600  Active:9  Checking status:1  Finished successfully:1
segment 0 is file /tmp/segmentaa, counts=  50  267 2584 tmp/segmentaa
segment 5 is file /tmp/segmentaf, counts=  50  597 7284 tmp/segmentaf
segment 1 is file /tmp/segmentab, counts=  50  350 4196 tmp/segmentab
segment 8 is file /tmp/segmentai, counts=  50  579 7082 tmp/segmentai
segment 2 is file /tmp/segmentac, counts=  50  452 4949 tmp/segmentac
segment 9 is file /tmp/segmentaj, counts= 10  71 835 tmp/segmentaj
segment 4 is file /tmp/segmentae, counts=  50  490 6093 tmp/segmentae
segment 3 is file /tmp/segmentad, counts=  50  589 7026 tmp/segmentad
segment 7 is file /tmp/segmentah, counts=  50  498 6047 tmp/segmentah
segment 6 is file /tmp/segmentag, counts=  50  591 7046 tmp/segmentag
Final status: Thu, 24 Jan 2013 11:52:08 -0600  Finished successfully:11

Note that the script forces the split segments to be written to /tmp; otherwise they would be written to the job directory in which the split() app runs. This is not "location independent", but it works fine when you run split on the local host. To use $PWD instead of /tmp, pass it to swift as a script argument (e.g. -cwd=$PWD) and adjust the script accordingly.
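
A rough sketch of that adjustment, showing only the parts that change. It assumes the directory arrives as a script argument named "cwd" (given after the script name, e.g. SplitAndProcess.swift -cwd=$PWD) and is read with @arg; the extra "dir" parameter is a hypothetical addition, and none of this has been run against this Swift revision:

```
// Directory for the segments, taken from the -cwd=... script argument (assumed name)
string cwd = @arg("cwd");

// Same split app as above, with the target directory passed in
// instead of the hard-coded /tmp ("dir" is a hypothetical parameter)
app (file flist) split (file i, string dir)
{
  sh "-c" @strcat("split -l 50 ", @filename(i), " ", dir, "/segment ; /bin/ls -1 ", dir, "/segment??") stdout=@filename(flist);
}

string segnames[] = readData(split(infile, cwd));
```

The rest of the script stays as it is, since the foreach loop only uses the names that split() returns.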

- Mike

----- Original Message -----
> From: "Michael Wilde" <wilde at mcs.anl.gov>
> To: "Daniel S. Katz" <dsk at ci.uchicago.edu>
> Cc: "Glen Hocky" <hockyg at gmail.com>, "Swift User Discussion List" <swift-user at ci.uchicago.edu>
> Sent: Thursday, January 24, 2013 10:42:55 AM
> Subject: Re: [Swift-user] How can one force ordering into swift operations
> 
> 
> A few brief additional tips to help you make progress with this:
> 
> - your split app can create and return a single file containing a
> list of file names
> 
> - use readData to read that list into an array; then use one of the
> array mappers to map the list of files.
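> 
> A minimal sketch of those two steps, assuming split() writes one
> segment name per line to its output file and that array_mapper takes
> the names through its "files" parameter. The split command line is
> only an illustration, and the sketch is untested:
> 
> ```
> type file;
> 
> // Returns a single file listing the names of the segments it created
> app (file flist) split (file i)
> {
>   sh "-c" @strcat("split -l 50 ", @filename(i), " /tmp/segment ; /bin/ls -1 /tmp/segment??") stdout=@filename(flist);
> }
> 
> file infile <"infile">;
> 
> // readData cannot proceed until split() has produced the list
> string segnames[] = readData(split(infile));
> 
> // Map the whole list at once rather than one file at a time
> file segments[] <array_mapper; files=segnames>;
> 
> foreach seg, i in segments {
>   tracef("segment %i is file %s\n", i, @filename(seg));
> }
> ```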
> 
> Separately: the "flag" Dan suggests can also be done using a variable
> of type "external", which allows you to do explicit synchronization.
> It's only honored as a return or an input of an app() function.
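> 
> A small sketch of that pattern, with hypothetical app names (step1,
> step2), a hypothetical output file (segments.txt) and illustrative
> command lines. The external value carries no data; it only makes
> step2 wait for step1. Untested:
> 
> ```
> type file;
> 
> // Step 1 returns an external, which is set when the app finishes
> app (external flag) step1 (file i)
> {
>   sh "-c" @strcat("split -l 50 ", @filename(i), " /tmp/segment");
> }
> 
> // Step 2 takes the external as an input, so it cannot start earlier
> app (file o) step2 (external flag)
> {
>   sh "-c" "/bin/ls -1 /tmp/segment??" stdout=@filename(o);
> }
> 
> file infile <"infile">;
> file listing <"segments.txt">;
> external flag;
> 
> flag = step1(infile);
> listing = step2(flag);
> ```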
> 
> - Mike
> 
> ----- Original Message -----
> > From: "Daniel S. Katz" <dsk at ci.uchicago.edu>
> > To: "Lorenzo Pesce" <lpesce at uchicago.edu>
> > Cc: "Glen Hocky" <hockyg at gmail.com>, "Swift User Discussion List"
> > <swift-user at ci.uchicago.edu>
> > Sent: Thursday, January 24, 2013 10:20:52 AM
> > Subject: Re: [Swift-user] How can one force ordering into swift
> > operations
> > 
> > 
> > You could just add an artificial dependency. Make step one output a
> > file "flag" when it is done.
> > 
> > 
> > Make step 2 dependent on file "flag"
> > 
> > 
> > Dan
> > 
> > 
> > 
> > 
> > 
> > On Jan 24, 2013, at 11:13 AM, Lorenzo Pesce < lpesce at uchicago.edu >
> > wrote:
> > 
> > 
> > 
> > So I would return an array of files of unknown size (I don't know
> > how many files there will be) to the calling swift script?
> > 
> > 
> > 
> > 
> > 
> > On Jan 24, 2013, at 10:08 AM, Glen Hocky wrote:
> > 
> > 
> > 
> > Lorenzo,
> > This may not work for your purposes, but a simple solution, similar to
> > what I do, is to actually do step 1 in the wrapper before the mapping
> > is done. This guarantees that all files are in place.
> > 
> > 
> > Best,
> > Glen
> > 
> > 
> > 
> > On Thu, Jan 24, 2013 at 10:43 AM, Lorenzo Pesce <lpesce at uchicago.edu> wrote:
> > 
> > 
> > I have a simple problem:
> > step 1: I run an app that splits a file into a group of files, and we
> > don't know how many there will be.
> > step 2: I want to map those files using a mapper after the fact.
> > 
> > The problem is that the mapper doesn't know that it can't run until
> > step 1 is done, because it has no input files. How can I tell the
> > mapper (and, by consequence, what follows it, since those files will
> > not be there) that it has to wait for step 1 to be finished?
> > 
> > Thanks,
> > 
> > Lorenzo
> > _______________________________________________
> > Swift-user mailing list
> > Swift-user at ci.uchicago.edu
> > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
> > 
> > 
> > 
> > --
> > 
> > 
> > 
> > 
> > 
> > Daniel S. Katz
> > University of Chicago
> > (773) 834-7186 (voice)
> > (773) 834-6818 (fax)
> > d.katz at ieee.org or dsk at ci.uchicago.edu
> > http://www.ci.uchicago.edu/~dsk/
> > 
> > 
> > 
> > 
> > 
> 


