[Swift-user] Resuming jobs when script has changed

Bronevetsky, Greg bronevetsky1 at llnl.gov
Mon May 19 14:06:24 CDT 2014


I've played around with it some more and put together the following solution, which appears to work. The main issues with this variant are:

- It turns each 1-line procedure call into 4 lines that test whether the output already exists.

- I have to explicitly keep track of my working directory to make it possible to test for the existence of files there.

- I need to explicitly copy the files that already exist from my working directory to the temporary directory used by Swift so that Swift recognizes these files' existence. Swift then copies them back to their original locations, which is wasteful but at least correct.

If anybody can suggest alternatives that overcome the above issues, I'd be grateful. I expect that mine is a common use-case because without this every small change to a script causes all of its intermediate results to be recomputed, even if the script is large and takes days to compute.

Greg Bronevetsky
Lawrence Livermore National Lab
(925) 424-5756
bronevetsky at llnl.gov
http://greg.bronevetsky.com

type file;

string WORK_PATH="/g/g15/bronevet/apps/swift-0.94.1/examples/test/work";

app (file out) fileExistsApp(string d, string f) {
  fileExists "-fName" @strcat(d, "/", f) stdout=@filename(out);
}

(boolean exists) fileExists(string d, string f) {
  file tmp <concurrent_mapper; location="tmp", prefix="fileExists">;
  tracef("tmp=%M, f=%s\n", tmp, @strcat(d, "/", f));
  (tmp) = fileExistsApp(d, f);
  (exists) = readData(tmp);
  tracef("exists=%s\n", @toString(exists));
}

app (file outF) noop(string d, string f) {
  cp @strcat(d, "/", f) @filename(outF);
}

app (file outF) writeFile(string message) {
  echo message stdout=@filename(outF);
}

app (file outF) copyFile(file inF) {
  cp @filename(inF) @filename(outF);
}

file data <single_file_mapper; file="dataF">;
if(!fileExists(WORK_PATH, "dataF"))
{ (data) = writeData("hello"); }
else
{ (data) = noop(WORK_PATH, "dataF"); }

file copy <single_file_mapper; file="copyF">;
if(!fileExists(WORK_PATH, "copyF"))
{ (copy) = copyFile(data); }
else
{ (copy) = noop(WORK_PATH, "copyF"); }
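
The script above calls an executable named fileExists that is not included in the message. A minimal sketch of what such a helper might look like follows. It assumes that readData() accepts the literal text "true" or "false" when reading into a boolean, and that the helper is registered as fileExists in the site's tc.data. The function name exists_text and the exact output format are illustrative assumptions, not the author's actual script.

```python
#!/usr/bin/env python
"""Hypothetical `fileExists` helper: prints "true" or "false" for one path."""
import argparse
import os
import sys


def exists_text(path):
    # Lowercase literals, on the assumption that readData() parses them as a boolean.
    return "true" if os.path.exists(path) else "false"


def main(argv):
    parser = argparse.ArgumentParser(description="Report whether a file exists.")
    # The single-dash long option mirrors the -fName flag passed by the Swift app.
    parser.add_argument("-fName", dest="fName", required=True, help="Path to test")
    args = parser.parse_args(argv)
    print(exists_text(args.fName))


# Run only when arguments are supplied, so importing or sourcing the file is harmless.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Invoked as `fileExists -fName /path/to/work/dataF`, the helper writes a single line to stdout, which the app declaration redirects into the temporary file that readData() then reads.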


From: swift-user-bounces at ci.uchicago.edu [mailto:swift-user-bounces at ci.uchicago.edu] On Behalf Of Bronevetsky, Greg
Sent: Monday, May 19, 2014 10:39 AM
To: Michael Wilde; swift-user at ci.uchicago.edu
Subject: Re: [Swift-user] Resuming jobs when script has changed

Based on your suggestion, I came up with the following code. However, I'm having an issue with it. If I use an external mapper I can choose whether to return a file or not based on whether it already exists. However, once I made a decision not to return a file from the external mapper because it has already been generated, I don't see how to enable subsequent workflow steps to take it as input. In the code below, if dataFile exists but copyFile does not, I get the following error:
Execution failed:
        org.griphyn.vdl.mapping.InvalidPathException: Array index '0' not found for f1 of size 0
        copyFile, testResume.swift, line 20
        copyFile, testResume.swift, line 20

Greg Bronevetsky
Lawrence Livermore National Lab
(925) 424-5756
bronevetsky at llnl.gov
http://greg.bronevetsky.com

Swift:
type file;

string ROOT_PATH="/g/g15/bronevet/apps/swift-0.94.1/examples/test";

app (file outF) writeFile(string message) {
  echo message stdout=@filename(outF);
}

app (file outF) copyFile(file inF) {
  cp @filename(inF) @filename(outF);
}

file f1[] <ext;exec=@strcat(ROOT_PATH,"/immutable.py"), fName="dataFile">;
if(@length(f1) == 1) {
  f1[0] = writeFile("hello");
}

file f2[] <ext;exec=@strcat(ROOT_PATH,"/immutable.py"), fName="copyFile">;
if(@length(f2) == 1) {
  f2[0] = copyFile(f1[0]);
}

External mapper in Python:
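#!/usr/apps/python2.7.3/bin/python

import argparse
import sys
import os

def main(argv):
  # Emit one "[index] path" mapping line for each file that does not exist yet;
  # files that already exist are left out of the mapped array.
  parser = argparse.ArgumentParser(description='Map only the files that do not exist yet.')
  parser.add_argument('-fName', dest='fName', action='store', nargs="+", help='List of files to check for existence')
  # Parse the argument list handed to main() rather than reading sys.argv implicitly.
  args = parser.parse_args(argv)

  for i, name in enumerate(args.fName):
    if not os.path.exists(name):
      # A parenthesized single-argument print works under both Python 2 and Python 3.
      print("[" + str(i) + "] " + name)

if __name__ == "__main__":
  main(sys.argv[1:])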

From: swift-user-bounces at ci.uchicago.edu [mailto:swift-user-bounces at ci.uchicago.edu] On Behalf Of Michael Wilde
Sent: Friday, May 16, 2014 10:53 AM
To: swift-user at ci.uchicago.edu
Subject: Re: [Swift-user] Resuming jobs when script has changed

Hi Greg,

The Swift resume mechanism can't do this, as it's driven by internal variable names within a Swift run, not by external file existence.

The best way to do what you describe below is to write an external input file mapper (e.g. a simple shell or py script) that returns only the files that still need to be produced or processed.  (This is the "ext" mapper: http://swift-lang.org/guides/release-0.94/userguide/userguide.html#_external_mapper )

You can also call a local app to determine what files need to be produced, then use readData() to read those file names into an array, and array_mapper to map an array of remaining work to do.
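
As a rough illustration of the second approach, the local app could be a small script that prints, one per line, the names of the outputs that are still missing, so that readData() can load them into a string array. The sketch below is only an assumption about how such a script might look: the name remaining.py, its argument order, and the one-name-per-line output format are all hypothetical.

```python
#!/usr/bin/env python
"""Hypothetical helper: print the candidate outputs that do not exist yet."""
import os
import sys


def remaining(work_dir, names):
    # Keep only the names with no file under work_dir, preserving input order.
    return [n for n in names if not os.path.exists(os.path.join(work_dir, n))]


def main(argv):
    # Assumed usage: remaining.py WORK_DIR NAME [NAME ...]
    work_dir, names = argv[0], argv[1:]
    for name in remaining(work_dir, names):
        print(name)


# Run only when a directory and at least one name are supplied.
if __name__ == "__main__" and len(sys.argv) > 2:
    main(sys.argv[1:])
```

The printed list is the remaining work; names whose files already exist never reach the array, so no tasks are scheduled for them.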

Would one of these approaches meet your needs?

- Mike
On 5/16/14, 12:35 PM, Bronevetsky, Greg wrote:

I have Swift scripts that scan some portion of a design space and after I have scanned a sub-space of all the possibilities I modify the script to target a different, overlapping portion of the design space. However, it seems that when I do this Swift ignores the fact that I've already computed many of the tasks in the new run when I performed the prior run, and re-executes them redundantly. Is there a way for me to avoid such redundant executions? Can the -resume flag be used in this case?



Greg Bronevetsky
Lawrence Livermore National Lab
(925) 424-5756
bronevetsky at llnl.gov
http://greg.bronevetsky.com

--
Michael Wilde
Mathematics and Computer Science          Computation Institute
Argonne National Laboratory               The University of Chicago