[Swift-user] Resuming jobs when script has changed
Mihael Hategan
hategan at mcs.anl.gov
Mon May 19 16:40:11 CDT 2014
I think the problem is that you are assuming (when you say f1[0]) that
the mapper returns at least one element for the array.
I also believe that Mike's suggestion went along the following lines:
file[] list <ext;
exec="get-me-the-files-to-process-while-keeping-track-of-what-has-already-been-done.py", ...>;
etc.
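
A rough sketch of what such a mapper script might do, in Python since that is what the mapper quoted further down uses. The function name pending_mappings is made up for illustration; the "[index] path" output format is taken from the script quoted below. Note that when every file already exists the result is empty, which is exactly the case where f1[0] cannot be resolved.

```python
import os


def pending_mappings(paths):
    """Return ext-mapper lines ("[index] path") for files that do not exist yet.

    Hypothetical helper: the index is the position of the file in the
    argument list, so files that already exist leave gaps in the array.
    """
    lines = []
    for index, path in enumerate(paths):
        if not os.path.exists(path):
            lines.append("[%d] %s" % (index, path))
    return lines
```

A script built around this would print each returned line to stdout for the ext mapper to read. Any Swift code that indexes the mapped array then has to cope with the array being empty or sparse.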
Also, recent versions of Swift (after 0.94) have an exists() function.
E.g.
if (!exists(filename(outputFile))) {
outputFile = doStuff(...);
}
I have used this latter bit to do something similar to what you are
trying to do.
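
To spell out the logic of that guard outside Swift, here is a minimal Python sketch: do the work only when the output file is missing, otherwise reuse what is on disk. The names produce_if_missing and write_hello are hypothetical and are not part of Swift or of the scripts in this thread.

```python
import os


def produce_if_missing(output_path, produce):
    """Call produce(output_path) only when output_path does not exist yet.

    Returns True if the work ran, False if the existing file was reused.
    """
    if os.path.exists(output_path):
        return False
    produce(output_path)
    return True


def write_hello(path):
    # Stand-in for the real work (hypothetical; mirrors writeFile("hello")).
    with open(path, "w") as handle:
        handle.write("hello\n")
```

Calling produce_if_missing(path, write_hello) twice on the same path runs the work the first time and skips it the second, which is the behaviour wanted when a modified script revisits outputs from an earlier run.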
Mihael
On Mon, 2014-05-19 at 17:39 +0000, Bronevetsky, Greg wrote:
> Based on your suggestion, I came up with the following code. However, I'm having an issue with it. If I use an external mapper, I can choose whether to return a file based on whether it already exists. However, once I decide not to return a file from the external mapper because it has already been generated, I don't see how subsequent workflow steps can take it as input. In the code below, if dataFile exists but copyFile does not, I get the following error:
> Execution failed:
> org.griphyn.vdl.mapping.InvalidPathException: Array index '0' not found for f1 of size 0
> copyFile, testResume.swift, line 20
> copyFile, testResume.swift, line 20
>
> Greg Bronevetsky
> Lawrence Livermore National Lab
> (925) 424-5756
> bronevetsky at llnl.gov
> http://greg.bronevetsky.com
>
> Swift:
> type file;
>
> string ROOT_PATH="/g/g15/bronevet/apps/swift-0.94.1/examples/test";
>
> app (file outF) writeFile(string message) {
>     echo message stdout=@filename(outF);
> }
>
> app (file outF) copyFile(file inF) {
>     cp @filename(inF) @filename(outF);
> }
>
> file f1[] <ext;exec=@strcat(ROOT_PATH,"/immutable.py"), fName="dataFile">;
> if(@length(f1) == 1) {
>     f1[0] = writeFile("hello");
> }
>
> file f2[] <ext;exec=@strcat(ROOT_PATH,"/immutable.py"), fName="copyFile">;
> if(@length(f2) == 1) {
>     f2[0] = copyFile(f1[0]);
> }
> External mapper in Python:
> #!/usr/apps/python2.7.3/bin/python
>
> import argparse
> import sys
> import os
>
> def main(argv):
>     parser = argparse.ArgumentParser(description='Merge stats and distances files from experiments.')
>     parser.add_argument('-fName', dest='fName', action='store', nargs="+", help='List of files to check for existence')
>     args = parser.parse_args()
>
>     for i in range(0, len(args.fName)):
>         if(not (os.path.exists(args.fName[i]))):
>             print "["+str(i)+"] "+args.fName[i]
>
> if __name__ == "__main__":
>     main(sys.argv[1:])
>
> From: swift-user-bounces at ci.uchicago.edu [mailto:swift-user-bounces at ci.uchicago.edu] On Behalf Of Michael Wilde
> Sent: Friday, May 16, 2014 10:53 AM
> To: swift-user at ci.uchicago.edu
> Subject: Re: [Swift-user] Resuming jobs when script has changed
>
> Hi Greg,
>
> The Swift resume mechanism can't do this, as it's driven by internal variable names within a Swift run, not by external file existence.
>
> The best way to do what you describe below is to write an external input file mapper (e.g. a simple shell or py script) that returns only the files that still need to be produced or processed. (This is the "ext" mapper: http://swift-lang.org/guides/release-0.94/userguide/userguide.html#_external_mapper )
>
> You can also call a local app to determine what files need to be produced, then use readData() to read those file names into an array, and array_mapper to map an array of remaining work to do.
>
> Would one of these approaches meet your needs?
>
> - Mike
>
> On 5/16/14, 12:35 PM, Bronevetsky, Greg wrote:
>
> I have Swift scripts that scan some portion of a design space and after I have scanned a sub-space of all the possibilities I modify the script to target a different, overlapping portion of the design space. However, it seems that when I do this Swift ignores the fact that I've already computed many of the tasks in the new run when I performed the prior run, and re-executes them redundantly. Is there a way for me to avoid such redundant executions? Can the -resume flag be used in this case?
>
> Greg Bronevetsky
> Lawrence Livermore National Lab
> (925) 424-5756
> bronevetsky at llnl.gov
> http://greg.bronevetsky.com
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
>
> --
> Michael Wilde
> Mathematics and Computer Science Computation Institute
> Argonne National Laboratory The University of Chicago