[Swift-user] using dual filesys_mappers

Robin Weiss robinweiss at uchicago.edu
Fri Jul 20 11:50:39 CDT 2012


So it turns out this was a PEBKAC issue.  While I think I know why Swift is doing what it's doing, this behavior is a little tricky to diagnose; I was looking right at the problem without seeing it.  Let me explain.

Here is the offending Swift script and the output from running it.  The directory 'in' contains a number of .dat files, each with a corresponding .meta file (the names differ only in extension).  Each output file should contain a line of the form "-meta fileX.meta -dat fileX.dat", "-meta fileY.meta -dat fileY.dat", and so on.


type datFile;
type metaFile;
type outFile;

app (outFile out) testApp (datFile dat, metaFile meta){

echo "-meta" @meta "-dat" @dat stdout=@out;

}

datFile datFiles[] <filesys_mapper;location="in",sufix="dat">;
metaFile metaFiles[] <filesys_mapper;location="in",sufix="meta">;

foreach f,i in datFiles {

outFile out <concurrent_mapper;prefix="out", suffix=".txt">;
out = testApp(f, metaFiles[i]);

}


[robinweiss at midway037 bad_swift]$ pwd
/home/robinweiss/bad_swift
[robinweiss at midway037 bad_swift]$ cd in
[robinweiss at midway037 in]$ ls
file0.dat  file0.meta  file1.dat  file1.meta  file2.dat  file2.meta  file3.dat  file3.meta
[robinweiss at midway037 in]$ cd ..
[robinweiss at midway037 bad_swift]$ ./runLocal.sh
Swift 0.93 swift-r5483 cog-r3339

RunID: 20120720-1623-cpgd5xr9
 (input): found 8 files
 (input): found 8 files
Progress:  time: Fri, 20 Jul 2012 16:23:48 +0000  Initializing:1
Final status:  time: Fri, 20 Jul 2012 16:23:49 +0000  Finished successfully:8
[robinweiss at midway037 bad_swift]$ cd _concurrent/
[robinweiss at midway037 _concurrent]$ ls
out-4-0.txt  out-4-1.txt  out-4-2.txt  out-4-3.txt  out-4-4.txt  out-4-5.txt  out-4-6.txt  out-4-7.txt
[robinweiss at midway037 _concurrent]$ cat *
-meta in/file0.meta -dat in/file0.meta
-meta in/file0.dat -dat in/file0.dat
-meta in/file1.meta -dat in/file1.meta
-meta in/file1.dat -dat in/file1.dat
-meta in/file2.dat -dat in/file2.dat
-meta in/file3.dat -dat in/file3.dat
-meta in/file3.meta -dat in/file3.meta
-meta in/file2.meta -dat in/file2.meta


Can you spot the problem in the script?  Hint: "suffix" has two f's, not one.  Because filesys_mapper does not recognize the parameter 'sufix', it ignores it and maps every file in 'location'.  As a result, both datFiles and metaFiles hold all eight files (hence "found 8 files" twice), and f and metaFiles[i] end up naming the same file.  It would seem mappers will accept pretty much any garbage you put inside the < >'s, so long as it's an assignment.  For example:

datFile datFiles[] <filesys_mapper;location="in",suffix="dat",foo="bar",biz="baz",garbage="more garbage">;

works just fine also.
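The failure mode can be made concrete with a toy model. It is written in Python, not SwiftScript, and describes a mapper that filters a directory listing by suffix and silently ignores any parameter it does not read. The function toy_filesys_mapper and its matching rule are assumptions for illustration only, not Swift's actual implementation. The listing is copied from the `ls` of the 'in' directory above.

```python
# Directory listing copied from the `ls` of in/ shown above.
LISTING = ["file0.dat", "file0.meta", "file1.dat", "file1.meta",
           "file2.dat", "file2.meta", "file3.dat", "file3.meta"]


def toy_filesys_mapper(listing, location, **params):
    """Toy model (hypothetical): keep names ending in params['suffix'] if
    that key is present; any other keyword, including a misspelled
    'sufix', is accepted and silently ignored."""
    suffix = params.get("suffix")
    return [f"{location}/{name}" for name in listing
            if suffix is None or name.endswith(suffix)]


bad_dat = toy_filesys_mapper(LISTING, "in", sufix="dat")
bad_meta = toy_filesys_mapper(LISTING, "in", sufix="meta")
good_dat = toy_filesys_mapper(LISTING, "in", suffix="dat")
good_meta = toy_filesys_mapper(LISTING, "in", suffix="meta")

print(len(bad_dat), len(good_dat))  # 8 4
# With the typo both arrays are identical, so index i yields the same file twice.
print(bad_dat == bad_meta)  # True
```

When the suffix filter is skipped, the two arrays are identical, so datFiles[i] and metaFiles[i] name the same file. That matches the "-meta in/file0.meta -dat in/file0.meta" pattern in the output above.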


And now for the good run:

type datFile;
type metaFile;
type outFile;

app (outFile out) testApp (datFile dat, metaFile meta){

echo "-meta" @meta "-dat" @dat stdout=@out;

}

datFile datFiles[] <filesys_mapper;location="in",suffix="dat">;
metaFile metaFiles[] <filesys_mapper;location="in",suffix="meta">;

foreach f,i in datFiles {

outFile out <concurrent_mapper;prefix="out", suffix=".txt">;
out = testApp(f, metaFiles[i]);

}

[robinweiss at midway037 bad_swift]$ pwd
/home/robinweiss/bad_swift
[robinweiss at midway037 bad_swift]$ cd in
[robinweiss at midway037 in]$ ls
file0.dat  file0.meta  file1.dat  file1.meta  file2.dat  file2.meta  file3.dat  file3.meta
[robinweiss at midway037 in]$ cd ..
[robinweiss at midway037 bad_swift]$ ./runLocal.sh
Swift 0.93 swift-r5483 cog-r3339

RunID: 20120720-1626-ou1juqf5
 (input): found 4 files
 (input): found 4 files
Progress:  time: Fri, 20 Jul 2012 16:26:51 +0000  Initializing:1
Final status:  time: Fri, 20 Jul 2012 16:26:51 +0000  Finished successfully:4
[robinweiss at midway037 bad_swift]$ cd _concurrent/
[robinweiss at midway037 _concurrent]$ ls
out-4-0.txt  out-4-1.txt  out-4-2.txt  out-4-3.txt
[robinweiss at midway037 _concurrent]$ cat *
-meta in/file0.meta -dat in/file0.dat
-meta in/file1.meta -dat in/file1.dat
-meta in/file3.meta -dat in/file2.dat
-meta in/file2.meta -dat in/file3.dat
[robinweiss at midway037 _concurrent]$


So, as I said, this whole issue boils down to a typo.  I would suggest that Swift issue some sort of warning when you pass one of the predefined mappers a parameter it does not define.  I would have expected Swift to complain that "sufix" is an unknown parameter of filesys_mapper.
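Below is a minimal sketch of the kind of check suggested here. It is in Python rather than anything from Swift's own code base. It compares each parameter name against a set of accepted names and, for each unknown name, suggests the closest known one using difflib.get_close_matches. The KNOWN_PARAMS set is a hypothetical stand-in, not a verified list of filesys_mapper's parameters.

```python
import difflib

# Hypothetical stand-in for a mapper's accepted parameter names;
# not a verified list of filesys_mapper's real parameters.
KNOWN_PARAMS = {"location", "prefix", "suffix", "pattern"}


def check_mapper_params(mapper, params, known=KNOWN_PARAMS):
    """Return one warning string per parameter name not in `known`."""
    warnings = []
    for name in sorted(params):
        if name in known:
            continue
        # Suggest the closest accepted name, if any is similar enough.
        hint = difflib.get_close_matches(name, sorted(known), n=1)
        msg = f"{mapper}: unknown parameter '{name}'"
        if hint:
            msg += f" (did you mean '{hint[0]}'?)"
        warnings.append(msg)
    return warnings


for w in check_mapper_params("filesys_mapper", {"location": "in", "sufix": "dat"}):
    print(w)  # filesys_mapper: unknown parameter 'sufix' (did you mean 'suffix'?)
```

A warning, rather than a hard error, would keep existing scripts running while still flagging typos like 'sufix' and garbage such as foo="bar".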

Cheers,
Robin

--
Robin M. Weiss
Research Programmer
Research Computing Center
The University of Chicago
6030 S. Ellis Ave., Suite 289C
Chicago, IL 60637
robinweiss at uchicago.edu
773.702.9030

From: Robin Weiss <robinweiss at uchicago.edu>
Date: Mon, 16 Jul 2012 10:43:29 -0500
To: <swift-user at ci.uchicago.edu>
Subject: using dual filesys_mappers


Hello Swifters,


I have a question about using two filesys_mappers with the foreach construct.  I have attached the offending .swift script for reference.  Here's the gist of what I'm trying to do and what appears to be happening instead:


I have a program called 'boot' that takes as command-line arguments four file paths (log, out, data, and meta) plus a number of other parameters (all numerical; they seem to be passed in correctly, so no worries there).  'log' and 'out' are output files, located in a directory called 'out'; 'data' and 'meta' are input files, located in a directory called 'in'.  The problem I'm having is getting the correct values for 'data' and 'meta' passed in to my app call.  I have an app section called processData that invokes boot.  I am assuming that the directory 'in' contains identically named data and meta files that differ only in their suffix ('.dat' or '.meta', respectively).  This may or may not be safe, but for now it is fine and I'll cross that bridge when I come to it.  Here's the relevant snippet from my script:



app (clusFile out) processData (dataFile data, metaFile meta, logFile log){


        boot "-log" @log "-results" @out "-meta" @meta "-data" @data "-kmin" kmin "-kmax" kmax "-eps" eps "-bootpct" bootpct "-maxiterations" maxiterations "-maxtries" maxtries;


}


dataFile dataFiles[] <filesys_mapper;location="in",sufix="dat">;

metaFile metaFiles[] <filesys_mapper;location="in",sufix="meta">;


foreach f,i in dataFiles {


        clusFile o<single_file_mapper; location="out", file=@strcat("out/clusFile.",i,".clus")>;

        logFile l<single_file_mapper; location="out", file=@strcat("out/logFile.",i,".log")>;

        o = processData(f, metaFiles[i], l);


}


This configuration causes processData to be invoked as:


out/clusFile.0.clus = processData(dataFile0.dat, dataFile0.dat, out/logFile.0.log);


If I swap the order of the two filesys_mapper declarations so that the snippet reads:


metaFile metaFiles[] <filesys_mapper;location="in",sufix="meta">;

dataFile dataFiles[] <filesys_mapper;location="in",sufix="dat">;


foreach f,i in dataFiles {


        clusFile o<single_file_mapper; location="out", file=@strcat("out/clusFile.",i,".clus")>;

        logFile l<single_file_mapper; location="out", file=@strcat("out/logFile.",i,".log")>;

        o = processData(f, metaFiles[i], l);


}


the app is invoked as:


out/clusFile.0.clus = processData(dataFile0.meta, dataFile0.meta, out/logFile.0.log);


I guess the real question is this: what is the most appropriate way in Swift to pass two corresponding input files into an app invocation?  Ideally, it would be something like 'foreach file1,file2,i in inputFiles[][] { … }', but that doesn't really make much sense either.
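One general way to make the correspondence explicit is to pair files by their shared stem instead of by array index. It is sketched here in Python rather than SwiftScript. Index-based pairing assumes both mapped arrays list files in the same order. In the "good" run above, the last two output lines pair file3.meta with file2.dat and file2.meta with file3.dat, which suggests that assumption may not always hold. The function pair_by_stem and the sample lists are hypothetical. In Swift itself, the analogue would be to derive each meta file name from its dat file name. A regexp-based mapper such as structured_regexp_mapper may support this, but check the Swift user guide for its exact parameters.

```python
from pathlib import PurePosixPath


def pair_by_stem(dat_files, meta_files):
    """Pair each .dat path with the .meta path that shares its stem.

    Returns (pairs, unmatched): `pairs` holds (dat, meta) tuples in the
    order of `dat_files`; `unmatched` lists .dat paths with no .meta file.
    """
    meta_by_stem = {PurePosixPath(m).stem: m for m in meta_files}
    pairs, unmatched = [], []
    for d in dat_files:
        m = meta_by_stem.get(PurePosixPath(d).stem)
        if m is None:
            unmatched.append(d)
        else:
            pairs.append((d, m))
    return pairs, unmatched


# The two lists are deliberately in different orders.
dats = ["in/file0.dat", "in/file1.dat", "in/file2.dat", "in/file3.dat"]
metas = ["in/file0.meta", "in/file1.meta", "in/file3.meta", "in/file2.meta"]

by_index = list(zip(dats, metas))  # position 2 pairs file2.dat with file3.meta
by_stem, missing = pair_by_stem(dats, metas)

for d, m in by_stem:
    print("-meta", m, "-dat", d)
```

Reporting unmatched .dat files separately, rather than silently dropping them, is one way to address the "I'll cross that bridge when I come to it" concern about missing metadata.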


Anyway, any ideas would be appreciated.


Cheers,

Robin

