[Swift-user] using dual filesys_mappers
Michael Wilde
wilde at mcs.anl.gov
Fri Jul 20 13:48:40 CDT 2012
Thanks, Robin. I agree, Swift should have warned you. I've entered this into bugzilla:
https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=801
- Mike
----- Original Message -----
> From: "Robin Weiss" <robinweiss at uchicago.edu>
> To: swift-user at ci.uchicago.edu
> Sent: Friday, July 20, 2012 11:50:39 AM
> Subject: Re: [Swift-user] using dual filesys_mappers
> So it turns out this was a PEBKAC issue. While I think I know why
> swift is doing what it's doing, this behavior is a little tricky to
> diagnose and I was totally looking at the problem but not seeing it.
> Let me explain.
>
>
> Here is the offending swift script and the output from running it. The
> directory 'in' contains a number of .dat files and corresponding .meta
> files (name differs only in extension). The expected output in each
> outFile file should be "-meta fileX.meta -dat fileX.dat", "-meta
> fileY.meta -dat fileY.dat", etc.
>
>
>
>
> type datFile;
> type metaFile;
> type outFile;
>
> app (outFile out) testApp (datFile dat, metaFile meta) {
>     echo "-meta" @meta "-dat" @dat stdout=@out;
> }
>
> datFile datFiles[] <filesys_mapper;location="in",sufix="dat">;
> metaFile metaFiles[] <filesys_mapper;location="in",sufix="meta">;
>
> foreach f,i in datFiles {
>     outFile out <concurrent_mapper;prefix="out", suffix=".txt">;
>     out = testApp(f, metaFiles[i]);
> }
>
>
>
>
> [robinweiss at midway037 bad_swift]$ pwd
> /home/robinweiss/bad_swift
> [robinweiss at midway037 bad_swift]$ cd in
> [robinweiss at midway037 in]$ ls
> file0.dat file0.meta file1.dat file1.meta file2.dat file2.meta
> file3.dat file3.meta
> [robinweiss at midway037 in]$ cd ..
> [robinweiss at midway037 bad_swift]$ ./runLocal.sh
> Swift 0.93 swift-r5483 cog-r3339
>
>
> RunID: 20120720-1623-cpgd5xr9
> (input): found 8 files
> (input): found 8 files
> Progress: time: Fri, 20 Jul 2012 16:23:48 +0000 Initializing:1
> Final status: time: Fri, 20 Jul 2012 16:23:49 +0000 Finished
> successfully:8
> [robinweiss at midway037 bad_swift]$ cd _concurrent/
> [robinweiss at midway037 _concurrent]$ ls
> out-4-0.txt out-4-1.txt out-4-2.txt out-4-3.txt out-4-4.txt
> out-4-5.txt out-4-6.txt out-4-7.txt
> [robinweiss at midway037 _concurrent]$ cat *
> -meta in/file0.meta -dat in/file0.meta
> -meta in/file0.dat -dat in/file0.dat
> -meta in/file1.meta -dat in/file1.meta
> -meta in/file1.dat -dat in/file1.dat
> -meta in/file2.dat -dat in/file2.dat
> -meta in/file3.dat -dat in/file3.dat
> -meta in/file3.meta -dat in/file3.meta
> -meta in/file2.meta -dat in/file2.meta
>
>
>
>
> Can you spot the problem in the script? Hint: "suffix" has two f's,
> not one. Because the parameter 'sufix' is meaningless to
> filesys_mapper it ends up just mapping everything in 'location'. It
> would seem mappers will allow you to put in pretty much any garbage
> you want inside the < >'s so long as it's an assignment. For example:
>
>
> datFile datFiles[] <filesys_mapper;location="in",suffix="dat",foo="bar",biz="baz",garbage="more garbage">;
>
>
> works just fine also.
>
>
>
>
> And now for the good run:
>
>
> type datFile;
> type metaFile;
> type outFile;
>
> app (outFile out) testApp (datFile dat, metaFile meta) {
>     echo "-meta" @meta "-dat" @dat stdout=@out;
> }
>
> datFile datFiles[] <filesys_mapper;location="in",suffix="dat">;
> metaFile metaFiles[] <filesys_mapper;location="in",suffix="meta">;
>
> foreach f,i in datFiles {
>     outFile out <concurrent_mapper;prefix="out", suffix=".txt">;
>     out = testApp(f, metaFiles[i]);
> }
>
>
> [robinweiss at midway037 bad_swift]$ pwd
> /home/robinweiss/bad_swift
> [robinweiss at midway037 bad_swift]$ cd in
> [robinweiss at midway037 in]$ ls
> file0.dat file0.meta file1.dat file1.meta file2.dat file2.meta
> file3.dat file3.meta
> [robinweiss at midway037 in]$ cd ..
> [robinweiss at midway037 bad_swift]$ ./runLocal.sh
> Swift 0.93 swift-r5483 cog-r3339
>
>
> RunID: 20120720-1626-ou1juqf5
> (input): found 4 files
> (input): found 4 files
> Progress: time: Fri, 20 Jul 2012 16:26:51 +0000 Initializing:1
> Final status: time: Fri, 20 Jul 2012 16:26:51 +0000 Finished
> successfully:4
> [robinweiss at midway037 bad_swift]$ cd _concurrent/
> [robinweiss at midway037 _concurrent]$ ls
> out-4-0.txt out-4-1.txt out-4-2.txt out-4-3.txt
> [robinweiss at midway037 _concurrent]$ cat *
> -meta in/file0.meta -dat in/file0.dat
> -meta in/file1.meta -dat in/file1.dat
> -meta in/file3.meta -dat in/file2.dat
> -meta in/file2.meta -dat in/file3.dat
> [robinweiss at midway037 _concurrent]$
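
Note that the last two lines of this run still pair file3.meta with file2.dat and file2.meta with file3.dat, so indexing metaFiles[i] alongside datFiles[i] does not by itself guarantee matching names, even with the typo fixed (presumably the two mappers need not list files in the same order). A minimal Python sketch for checking output lines of this shape follows; mismatched_pairs is a hypothetical helper for illustration, not part of Swift.

```python
import os

def mismatched_pairs(lines):
    """Return the output lines whose -meta and -dat files have different base names."""
    bad = []
    for line in lines:
        tokens = line.split()
        # Expected shape of each line: -meta <path> -dat <path>
        args = dict(zip(tokens[0::2], tokens[1::2]))
        meta_stem = os.path.splitext(os.path.basename(args["-meta"]))[0]
        dat_stem = os.path.splitext(os.path.basename(args["-dat"]))[0]
        if meta_stem != dat_stem:
            bad.append(line)
    return bad

# The four lines printed by `cat *` in the run above.
good_run = [
    "-meta in/file0.meta -dat in/file0.dat",
    "-meta in/file1.meta -dat in/file1.dat",
    "-meta in/file3.meta -dat in/file2.dat",
    "-meta in/file2.meta -dat in/file3.dat",
]

for line in mismatched_pairs(good_run):
    print("mismatch:", line)
```

Run against the listing above, this flags the file2/file3 lines and leaves the file0 and file1 lines alone.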
>
>
>
>
> So like I said, it turns out this whole issue boils down to a typo. I
> would suggest that swift should throw up some sort of warning if you
> pass something to one of the pre-defined mappers that is not defined
> as one of its parameters. I would have expected swift to complain that
> "sufix" is an unknown parameter of filesys_mapper.
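
As an illustration of the kind of check suggested here, the following Python sketch scans a mapper clause and reports parameter names the mapper does not define. It is not Swift's implementation; unknown_mapper_params and KNOWN_PARAMS are hypothetical names, and the parameter set listed for filesys_mapper (location, prefix, suffix, pattern) is an assumption that should be checked against the user guide.

```python
import re

# Assumption: the parameters filesys_mapper defines. Verify against the
# Swift user guide before relying on this list.
KNOWN_PARAMS = {
    "filesys_mapper": {"location", "prefix", "suffix", "pattern"},
}

def unknown_mapper_params(declaration):
    """Return parameter names in a <mapper; k=v, ...> clause that the mapper does not define."""
    match = re.search(r"<\s*(\w+)\s*;(.*)>", declaration, re.DOTALL)
    if match is None:
        return []
    mapper, body = match.group(1), match.group(2)
    if mapper not in KNOWN_PARAMS:
        return []  # no parameter list to check this mapper against
    # Blank out string literals so '=' or ',' inside a value cannot confuse the scan.
    body = re.sub(r'"[^"]*"', '""', body)
    names = re.findall(r"(\w+)\s*=", body)
    return [name for name in names if name not in KNOWN_PARAMS[mapper]]

decl = 'datFile datFiles[] <filesys_mapper;location="in",sufix="dat">;'
for name in unknown_mapper_params(decl):
    print("warning: unknown filesys_mapper parameter:", name)
```

Applied to the misspelled declaration from the first script, the sketch reports sufix; the corrected declaration produces no warnings.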
>
>
> Cheers,
> Robin
>
>
>
>
> --
>
> Robin M. Weiss
> Research Programmer
> Research Computing Center
> The University of Chicago
> 6030 S. Ellis Ave., Suite 289C
> Chicago, IL 60637
> robinweiss at uchicago.edu
> 773.702.9030
>
>
> From: Robin Weiss < robinweiss at uchicago.edu >
> Date: Mon, 16 Jul 2012 10:43:29 -0500
> To: < swift-user at ci.uchicago.edu >
> Subject: using dual filesys_mappers
>
> Hello Swifters,
>
>
>
>
> I have a question about using two filesys_mappers and the foreach
> construct. I have attached the offending .swift script I am working
> with for reference. Here's the gist of what I'm trying to do and what
> appears to be happening instead:
>
>
>
>
> I have a program called 'boot' that takes as command line arguments 4
> file paths (log, out, data, and meta), and a number of other params
> (all numerical and seem to be getting passed in correctly, so no
> worries there). 'log' and 'out' are output files and 'data' and 'meta'
> are input files (located in directories called 'out' and 'in'
> respectively). The problem I'm having is with getting the correct
> values for 'data' and 'meta' passed in to my app call. I have an app
> section called processData that invokes boot. I will be assuming that
> the directory 'in' contains identically named data and meta files that
> differ only in their suffix ('.dat' or '.meta', respectively). This
> may or may not be safe, but for now it is fine and I'll cross that
> bridge when I come to it. Here's the relevant snippet from my script:
>
>
>
>
>
>
>
> app (clusFile out) processData (dataFile data, metaFile meta, logFile log) {
>     boot "-log" @log "-results" @out "-meta" @meta "-data" @data "-kmin" kmin
>          "-kmax" kmax "-eps" eps "-bootpct" bootpct "-maxiterations" maxiterations
>          "-maxtries" maxtries;
> }
>
>
>
>
> dataFile dataFiles[] <filesys_mapper;location="in",sufix="dat">;
> metaFile metaFiles[] <filesys_mapper;location="in",sufix="meta">;
>
> foreach f,i in dataFiles {
>     clusFile o<single_file_mapper; location="out", file=@strcat("out/clusFile.",i,".clus")>;
>     logFile l<single_file_mapper; location="out", file=@strcat("out/logFile.",i,".log")>;
>     o = processData(f, metaFiles[i], l);
> }
>
>
>
>
> This configuration causes processData to be invoked as:
>
> out/clusFile.0.clus = processData(dataFile0.dat, dataFile0.dat, out/logFile.0.log);
>
>
>
>
> If I switch around the order of the filesys_mappers so that the snippet
> reads:
>
>
>
>
> metaFile metaFiles[] <filesys_mapper;location="in",sufix="meta">;
> dataFile dataFiles[] <filesys_mapper;location="in",sufix="dat">;
>
> foreach f,i in dataFiles {
>     clusFile o<single_file_mapper; location="out", file=@strcat("out/clusFile.",i,".clus")>;
>     logFile l<single_file_mapper; location="out", file=@strcat("out/logFile.",i,".log")>;
>     o = processData(f, metaFiles[i], l);
> }
>
>
>
>
> the app is invoked as:
>
> out/clusFile.0.clus = processData(dataFile0.meta, dataFile0.meta, out/logFile.0.log);
>
>
>
>
> I guess the real question is this: what is the most appropriate way in
> Swift to pass two corresponding input files into an app invocation?
> Ideally, it would be something like 'foreach file1,file2,i in
> inputFiles[][] { … }', but that doesn't really make too much sense
> either.
>
>
>
>
> Anyway, any ideas would be appreciated.
>
>
>
>
> Cheers,
>
> Robin
>
>
>
>
>
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory