[Swift-user] Resuming jobs when script has changed

Mihael Hategan hategan at mcs.anl.gov
Mon May 19 16:40:11 CDT 2014


I think the problem is that, by writing f1[0], you are assuming that
the mapper returns at least one element for the array. When dataFile
already exists, your mapper prints nothing, so f1 has size 0.

I also believe that Mike's suggestion went along the following lines:

file[] list <ext;
exec="get-me-the-files-to-process-while-keeping-track-of-what-has-already-been-done.py", ...>;

etc.
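
As a rough illustration of that idea, here is a minimal sketch of such a
script in Python. The -fName parameter is borrowed from Greg's example
quoted below, and the helper name pending_entries and the choice to
renumber indices from 0 are only assumptions made for this sketch. The
output format, one "[index] path" line per array element, is what the
ext mapper reads. Note that when every file already exists the script
prints nothing, so the mapped array is empty and indexing it with [0]
fails, which is the situation in the reported error.

```python
#!/usr/bin/env python3
"""Hypothetical ext-mapper helper: list only the files that still need work."""
import argparse
import os
import sys


def pending_entries(names, exists=os.path.exists):
    """Return '[index] path' lines for every name that does not exist yet.

    Indices are renumbered from 0 (an assumption of this sketch) so that
    element 0 is present whenever any work remains.
    """
    missing = [name for name in names if not exists(name)]
    return ["[{}] {}".format(i, name) for i, name in enumerate(missing)]


def main(argv):
    parser = argparse.ArgumentParser(
        description="Print the files that have not been produced yet.")
    # Swift passes extra mapper parameters as '-name value' options.
    parser.add_argument("-fName", dest="names", nargs="*", default=[],
                        help="candidate output files")
    args = parser.parse_args(argv)
    for line in pending_entries(args.names):
        print(line)


if __name__ == "__main__":
    main(sys.argv[1:])
```

The caller still has to check the array length (or iterate with foreach)
before touching any element, since an empty result is the normal case
once everything has been produced.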

Also, recent versions of Swift (after 0.94) have an exists() function.
E.g.

if (!exists(filename(outputFile))) {
   outputFile = doStuff(...);
}

I have used this latter bit to do something similar to what you are
trying to do.

Mihael

On Mon, 2014-05-19 at 17:39 +0000, Bronevetsky, Greg wrote:
> Based on your suggestion, I came up with the following code. However, I'm having an issue with it. If I use an external mapper I can choose whether to return a file or not based on whether it already exists. However, once I made a decision not to return a file from the external mapper because it has already been generated, I don't see how to enable subsequent workflow steps to take it as input. In the code below, if dataFile exists but copyFile does not, I get the following error:
> Execution failed:
>         org.griphyn.vdl.mapping.InvalidPathException: Array index '0' not found for f1 of size 0
>         copyFile, testResume.swift, line 20
>         copyFile, testResume.swift, line 20
> 
> Greg Bronevetsky
> Lawrence Livermore National Lab
> (925) 424-5756
> bronevetsky at llnl.gov<mailto:bronevetsky at llnl.gov>
> http://greg.bronevetsky.com
> 
> Swift:
> type file;
> 
> string ROOT_PATH="/g/g15/bronevet/apps/swift-0.94.1/examples/test";
> 
> app (file outF) writeFile(string message) {
>   echo message stdout=@filename(outF);
> }
> 
> app (file outF) copyFile(file inF) {
>   cp @filename(inF) @filename(outF);
> }
> 
> file f1[] <ext;exec=@strcat(ROOT_PATH,"/immutable.py"), fName="dataFile">;
> if(@length(f1) == 1) {
>   f1[0] = writeFile("hello");
> }
> 
> file f2[] <ext;exec=@strcat(ROOT_PATH,"/immutable.py"), fName="copyFile">;
> if(@length(f2) == 1) {
>   f2[0] = copyFile(f1[0]);
> }
> 
> External mapper in Python:
> #!/usr/apps/python2.7.3/bin/python
> 
> import argparse
> import sys
> import os
> 
> def main(argv):
>   parser = argparse.ArgumentParser(description='Merge stats and distances files from experiments.')
>   parser.add_argument('-fName',     dest='fName',       action='store', nargs="+", help='List of files to check for existence')
>   args = parser.parse_args()
> 
>   for i in range(0, len(args.fName)):
>     if(not (os.path.exists(args.fName[i]))):
>       print "["+str(i)+"] "+args.fName[i]
> 
> if __name__ == "__main__":
>    main(sys.argv[1:])
> 
> From: swift-user-bounces at ci.uchicago.edu [mailto:swift-user-bounces at ci.uchicago.edu] On Behalf Of Michael Wilde
> Sent: Friday, May 16, 2014 10:53 AM
> To: swift-user at ci.uchicago.edu
> Subject: Re: [Swift-user] Resuming jobs when script has changed
> 
> Hi Greg,
> 
> The Swift resume mechanism can't do this, as it's driven by internal variable names within a Swift run, not by external file existence.
> 
> The best way to do what you describe below is to write an external input file mapper (e.g. a simple shell or py script) that returns only the files that still need to be produced or processed.  (This is the "ext" mapper: http://swift-lang.org/guides/release-0.94/userguide/userguide.html#_external_mapper )
> 
> You can also call a local app to determine what files need to be produced, then use readData() to read those file names into an array, and array_mapper to map an array of remaining work to do.
> 
> Would one of these approaches meet your needs?
> 
> - Mike
> 
> On 5/16/14, 12:35 PM, Bronevetsky, Greg wrote:
> 
> I have Swift scripts that scan some portion of a design space and after I have scanned a sub-space of all the possibilities I modify the script to target a different, overlapping portion of the design space. However, it seems that when I do this Swift ignores the fact that I've already computed many of the tasks in the new run when I performed the prior run, and re-executes them redundantly. Is there a way for me to avoid such redundant executions? Can the -resume flag be used in this case?
> 
> 
> 
> Greg Bronevetsky
> 
> Lawrence Livermore National Lab
> 
> (925) 424-5756
> 
> bronevetsky at llnl.gov<mailto:bronevetsky at llnl.gov>
> 
> http://greg.bronevetsky.com
> 
> _______________________________________________
> 
> Swift-user mailing list
> 
> Swift-user at ci.uchicago.edu<mailto:Swift-user at ci.uchicago.edu>
> 
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
> 
> 
> 
> --
> 
> Michael Wilde
> 
> Mathematics and Computer Science          Computation Institute
> 
> Argonne National Laboratory               The University of Chicago




