[Swift-user] Unix find as external mapper
Michael Wilde
wilde at mcs.anl.gov
Wed Dec 5 18:55:38 CST 2012
I need to qualify what I said in the prior posting. Per the User Guide, an ext mapper must return two columns for each mapping (an array index and a file name), so you almost always need a wrapper script to produce that format.
Neil, your suggestion would work for a mapper that mapped N elements of an array, with column 1 implicitly being [0], [1], ... With that caveat, your suggestion makes sense for some other new but useful mapper type.
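For illustration, here is a minimal sketch of the kind of wrapper script described above. The script name map_months.sh and the function name map_months are hypothetical; the data/nc default and the "[index] path" line format are taken from Neil's find output later in this thread. It uses a plain shell glob rather than find -printf (a GNU extension), so treat it as one possible way to emit the two columns, not as the only one.

```shell
#!/bin/sh
# map_months.sh (hypothetical name): print one "[index] path" line per
# subdirectory of a root directory, i.e. the two-column form discussed above.

map_months() {
    root="${1:-data/nc}"            # default matches the layout in this thread
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue   # skip the unexpanded glob when nothing matches
        dir="${dir%/}"              # strip the trailing slash left by the glob
        name="${dir##*/}"           # directory base name, e.g. 197901
        printf '[%s] %s.nc\n' "$name" "$dir"
    done
}

# Run the mapping with whatever arguments the script was given.
map_months "$@"
```

Assuming the script is executable and sits in the directory where you run swift, the declaration would then name only the script, e.g. exec="./map_months.sh", with no find arguments inside the exec string.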
In Swift/T the approach to mappers is evolving somewhat differently, and we will discuss that with the community and provide some kind of backward-compatibility option or conversion guide.
You might want to do this as follows, instead:
- declare "sh" as an app.
- pass your find command to sh as the value of its -c argument, and have the app return stdout as its result.
- read that output into an array with readData(), and map the dataset with this array using the array_mapper:
string s[] = [ "a.txt", "b.txt", "c.txt" ];
file f[] <array_mapper;files=s>;
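On the shell side, the command string handed to sh -c in the steps above might look like the following sketch. The names list_cmd and list_targets are hypothetical, and the one-path-per-line output is an assumption about what readData() should be given for a string array; check the User Guide for the exact layout it expects.

```shell
#!/bin/sh
# Sketch of a command string that a Swift "sh" app could receive after -c.
# It prints one target .nc path per line on stdout.

list_cmd='for d in "$1"/*/; do if [ -d "$d" ]; then echo "${d%/}.nc"; fi; done'

list_targets() {
    # "_" fills $0 for the inner shell; the root directory arrives as $1
    sh -c "$list_cmd" _ "${1:-data/nc}"
}

# Run the listing with whatever arguments the script was given.
list_targets "$@"
```

Keeping the whole pipeline in one quoted string matters here because the app passes it to sh as a single -c value; the directory is supplied as a separate positional argument so the string itself needs no editing.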
A simple example of the above belongs in the User Guide. I'll create a ticket for that as well.
- Mike
----- Original Message -----
> From: "Michael Wilde" <wilde at mcs.anl.gov>
> To: "Neil Best" <nbest at ci.uchicago.edu>
> Cc: swift-user at ci.uchicago.edu
> Sent: Wednesday, December 5, 2012 6:46:51 PM
> Subject: Re: [Swift-user] Unix find as external mapper
> Neil,
>
> While your suggested syntax sounds reasonable, that's not how the ext
> mapper currently works. You do indeed need to create a small wrapper
> script for such situations. The exec parameter is only used to specify
> the pathname of the mapper script (absolute or relative to the working
> dir in which you are running the swift command).
>
> I'll file an enhancement ticket to record your suggestion, which in
> hindsight seems obvious, and which I like :)
>
> Thanks,
>
> - Mike
>
> ----- Original Message -----
> > From: "Neil Best" <nbest at ci.uchicago.edu>
> > To: swift-user at ci.uchicago.edu
> > Sent: Wednesday, December 5, 2012 4:17:22 PM
> > Subject: [Swift-user] Unix find as external mapper
> > I tried to use find as an external mapper in a way that seemed
> > fairly natural to me:
> >
> > file monthly[] <ext; exec="find data/nc -mindepth 1 -type d -printf
> > '[%f] %p.nc\n'\|.txt'">;
> >
> > When I run this find by itself at the command line the output looks
> > like this:
> >
> > $ find data/nc -mindepth 1 -type d -printf '[%f] %p.nc\n' | head
> > [197901] data/nc/197901.nc
> > [197903] data/nc/197903.nc
> > [197904] data/nc/197904.nc
> > [197902] data/nc/197902.nc
> > [197905] data/nc/197905.nc
> > [197906] data/nc/197906.nc
> > [197907] data/nc/197907.nc
> > [197908] data/nc/197908.nc
> > [197909] data/nc/197909.nc
> > [197910] data/nc/197910.nc
> >
> > The find actually looks at directories. These .nc files don't exist
> > yet, but each one will be an aggregation of the .nc files within the
> > folder of the same name using this command:
> >
> > cdo mergetime data/nc/197901/*.single.nc data/nc/197901.nc
> >
> > I thought I could then do a nested foreach over years and months to
> > execute the invocations.
> >
> > I presume that the external mapper is treating the exec argument as
> > a command name and not parsing the arguments. Using the other
> > parameters in the ext construct would force me to write a wrapper
> > script for the find invocation, since all of the arguments must
> > apparently have names.
> >
> > Do I have this right? Maybe there is a more straightforward approach.
> > _______________________________________________
> > Swift-user mailing list
> > Swift-user at ci.uchicago.edu
> > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory