[Swift-user] How can one force ordering into swift operations
Michael Wilde
wilde at mcs.anl.gov
Thu Jan 24 11:59:43 CST 2013
Here's a split-and-process example:
$ cat SplitAndProcess.swift
type file;
app (file flist) split (file i)
{
  sh "-c" @strcat("split -l 50 ", @filename(i), " /tmp/segment ; /bin/ls -1 /tmp/segment??") stdout=@filename(flist);
}
app (file counts) wc (file i)
{
  sh "-c" @strcat("wc ", @filename(i)) stdout=@filename(counts);
}
file infile<"infile">;
string segnames[] = readData(split(infile));
foreach s,i in segnames {
  file segment <single_file_mapper; file=s>;
  string counts = readData(wc(segment));
  tracef("segment %i is file %s, counts=%s\n", i, s, counts );
}
$ wc -l infile
460 infile
$ swift -config cf -tc.file tc -sites.file local.xml SplitAndProcess.swift
Warning: Procedure split is deprecated, at 15
Warning: Procedure wc is deprecated, at 19
Swift trunk swift-r6151 cog-r3552 (cog modified locally)
RunID: 20130124-1152-gg21bvq8
Progress: time: Thu, 24 Jan 2013 11:52:06 -0600
Progress: time: Thu, 24 Jan 2013 11:52:08 -0600 Active:9 Checking status:1 Finished successfully:1
segment 0 is file /tmp/segmentaa, counts= 50 267 2584 tmp/segmentaa
segment 5 is file /tmp/segmentaf, counts= 50 597 7284 tmp/segmentaf
segment 1 is file /tmp/segmentab, counts= 50 350 4196 tmp/segmentab
segment 8 is file /tmp/segmentai, counts= 50 579 7082 tmp/segmentai
segment 2 is file /tmp/segmentac, counts= 50 452 4949 tmp/segmentac
segment 9 is file /tmp/segmentaj, counts= 10 71 835 tmp/segmentaj
segment 4 is file /tmp/segmentae, counts= 50 490 6093 tmp/segmentae
segment 3 is file /tmp/segmentad, counts= 50 589 7026 tmp/segmentad
segment 7 is file /tmp/segmentah, counts= 50 498 6047 tmp/segmentah
segment 6 is file /tmp/segmentag, counts= 50 591 7046 tmp/segmentag
Final status: Thu, 24 Jan 2013 11:52:08 -0600 Finished successfully:11
Note that the script forces the split segments to be written to /tmp; otherwise they would be written to the job directory in which the split() app runs. This is not "location independent", but it works fine when you run split on a local host. You can use $PWD instead of /tmp by passing it into Swift as a script argument, e.g. -cwd=$PWD (read inside the script with @arg("cwd")), and adjusting the script accordingly.
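The tip further down in this thread suggests mapping the whole list of names with an array mapper instead of mapping each name with single_file_mapper inside the loop. Below is a minimal, untested sketch of that variant. It assumes the same split and wc apps, the same tc/sites configuration as the run above, and the built-in array_mapper with its files= parameter. It also relies on Swift deferring the mapping of segments until segnames has been completely filled by readData().

```swift
type file;

// Same split app as above: writes the segments to /tmp and returns
// a file that lists their names, one per line.
app (file flist) split (file i)
{
  sh "-c" @strcat("split -l 50 ", @filename(i), " /tmp/segment ; /bin/ls -1 /tmp/segment??") stdout=@filename(flist);
}

app (file counts) wc (file i)
{
  sh "-c" @strcat("wc ", @filename(i)) stdout=@filename(counts);
}

file infile<"infile">;

// readData() cannot complete until split() has produced flist, so
// segnames, and everything mapped from it, waits for step 1.
string segnames[] = readData(split(infile));

// Map the whole list at once rather than one single_file_mapper per name.
file segments[] <array_mapper; files=segnames>;

foreach seg, i in segments {
  string counts = readData(wc(seg));
  tracef("segment %i is file %s, counts=%s\n", i, @filename(seg), counts);
}
```

It would be run with the same command line as SplitAndProcess.swift. The wc jobs still report in whatever order they finish, as in the output above. The only ordering the script forces is that step 1 completes before any segment is mapped.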
- Mike
----- Original Message -----
> From: "Michael Wilde" <wilde at mcs.anl.gov>
> To: "Daniel S. Katz" <dsk at ci.uchicago.edu>
> Cc: "Glen Hocky" <hockyg at gmail.com>, "Swift User Discussion List" <swift-user at ci.uchicago.edu>
> Sent: Thursday, January 24, 2013 10:42:55 AM
> Subject: Re: [Swift-user] How can one force ordering into swift operations
>
>
> A few brief additional tips to help you make progress with this:
>
> - your split app can create and return a single file containing a
> list of file names
>
> - use readData to read that list into an array; then use one of the
> array mappers to map the list of files.
>
> Separately: the "flag" Dan suggests can also be done using a variable
> of type "external", which allows you to do explicit synchronization.
> It's only honored as a return value or an input of an app() function.
>
> - Mike
>
> ----- Original Message -----
> > From: "Daniel S. Katz" <dsk at ci.uchicago.edu>
> > To: "Lorenzo Pesce" <lpesce at uchicago.edu>
> > Cc: "Glen Hocky" <hockyg at gmail.com>, "Swift User Discussion List"
> > <swift-user at ci.uchicago.edu>
> > Sent: Thursday, January 24, 2013 10:20:52 AM
> > Subject: Re: [Swift-user] How can one force ordering into swift
> > operations
> >
> >
> > You could just add an artificial dependency: make step 1 output a
> > file "flag" when it is done.
> >
> >
> > Make step 2 dependent on file "flag".
> >
> >
> > Dan
> >
> >
> >
> >
> >
> > On Jan 24, 2013, at 11:13 AM, Lorenzo Pesce <lpesce at uchicago.edu>
> > wrote:
> >
> >
> >
> > So how do I return an array of files of unknown size (I don't know
> > how many files there will be) to the calling Swift script?
> >
> >
> >
> >
> >
> > On Jan 24, 2013, at 10:08 AM, Glen Hocky wrote:
> >
> >
> >
> > Lorenzo,
> > This may not work for your purposes, but a simple solution, similar
> > to what I do, is to do step 1 in the wrapper before the mapping is
> > done. This guarantees that all files are in place.
> >
> >
> > Best,
> > Glen
> >
> >
> >
> > On Thu, Jan 24, 2013 at 10:43 AM, Lorenzo Pesce <lpesce at uchicago.edu>
> > wrote:
> >
> >
> > I have a simple problem:
> > step 1: I run an app that splits a file into a group of files, and
> > we don't know how many there will be.
> > step 2: I want to map those files using a mapper after the fact.
> >
> > The problem is that the mapper doesn't know that it can't run until
> > step 1 is done, because it has no input files. How can I tell the
> > mapper (and, by consequence, whatever follows it, since those files
> > will not be there yet) that it has to wait for step 1 to be finished?
> >
> > Thanks,
> >
> > Lorenzo
> > _______________________________________________
> > Swift-user mailing list
> > Swift-user at ci.uchicago.edu
> > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
> >
> >
> >
> >
> >
> > --
> >
> >
> >
> >
> >
> > Daniel S. Katz
> > University of Chicago
> > (773) 834-7186 (voice)
> > (773) 834-6818 (fax)
> > d.katz at ieee.org or dsk at ci.uchicago.edu
> > http://www.ci.uchicago.edu/~dsk/
> >
> >
> >
> >
> >
>
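The "external" variable suggested in the thread can carry the ordering Dan describes without writing a real "flag" file. Below is a minimal, untested sketch. The app names splitonly and listsegs are hypothetical (they are not from the thread), and the sketch assumes, as stated in the thread, that an external is honored as an app() return value or input. Here the split step no longer lists the segments itself. Instead, a separate listing app takes the external as an input, so it should not start until the split has finished. Its output then feeds readData() and array_mapper as before.

```swift
type file;

// Hypothetical step 1: split the input. Its only output is an external,
// which carries no data and just marks that the app has completed.
app (external done) splitonly (file i)
{
  sh "-c" @strcat("split -l 50 ", @filename(i), " /tmp/segment");
}

// Hypothetical listing app: the external input is never used on the
// command line; it only makes this app wait for splitonly().
app (file flist) listsegs (external ready)
{
  sh "-c" "/bin/ls -1 /tmp/segment??" stdout=@filename(flist);
}

app (file counts) wc (file i)
{
  sh "-c" @strcat("wc ", @filename(i)) stdout=@filename(counts);
}

file infile<"infile">;

external splitdone = splitonly(infile);
string segnames[] = readData(listsegs(splitdone));
file segments[] <array_mapper; files=segnames>;

foreach seg, i in segments {
  string counts = readData(wc(seg));
  tracef("segment %i is file %s, counts=%s\n", i, @filename(seg), counts);
}
```

Dan's file-based version has the same shape. The split app would return a second, mapped output file (the "flag"), and the listing app would take that file as an input instead of the external.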