[Swift-user] Swift crash

William Catino wcatino at gmail.com
Mon May 5 15:57:10 CDT 2014


I submitted a job that contained 1.9 million documents, and got a crash.
I tried this 2 different ways:

First, I submitted files with 10000 filenames in each file - so there were
197 files.
This generated a GC error.

Then I tried 1976 files, each containing 1000 filenames.
This caused another crash - part of the error message contained the phrase
"Heap table out of memory."

Finally, I tried 100 files containing a total of 100,000 filenames.  That run
is still going (over 15 minutes so far).
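
For anyone who wants to reproduce the batching, here is a minimal sketch of how
list files like these could be produced. It is not the exact script used for
the runs above. The helper names (chunked, write_list_files), the "list_"
prefix, and the sample document names are hypothetical. It assumes one
filename per line is the layout readData expects for a string array, and it
writes the lists with a ".txt" suffix so that a filesys_mapper with
suffix=".txt" would pick them up.

```python
import os
import tempfile


def chunked(names, size):
    """Yield successive lists of at most `size` names."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(names), size):
        yield names[start:start + size]


def write_list_files(names, size, out_dir, prefix="list_"):
    """Write one .txt list file per chunk (one name per line); return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, chunk in enumerate(chunked(names, size)):
        # Zero-padded index keeps the list files in a stable sort order.
        path = os.path.join(out_dir, f"{prefix}{i:05d}.txt")
        with open(path, "w") as fh:
            fh.write("\n".join(chunk) + "\n")
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Made-up document names, only to exercise the helpers.
    docs = [f"doc_{n:07d}.dat" for n in range(2500)]
    with tempfile.TemporaryDirectory() as tmp:
        paths = write_list_files(docs, 1000, tmp)
        print(len(paths), "list files written")
```

Changing the second argument of write_list_files is the only thing that differs
between the batch sizes described above; the total number of filenames stays
the same, only the number of list files (and so of foreach iterations) changes.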


Is there any documentation about the various limitations of Swift,
especially on OSG:
-max number of files to process
-max total number of bytes contained in the files processed
-max length of command line to app
-max number of nodes
-max number of slots


The swift script looks like this:
[wcatino@login01 df]$ cat df.swift
type file;
type script;

// Note that to use `bash` here, it has to be in SWIFT's 'apps' file:
app (file df, file err, file out) wrapper (script wrap, script df_script,
file all_files[]) {
    // this uses bash to call the wrapper script with two CLAs: directory
    // (of input files) and target output file df
    // It also sets stderr and stdout to two passed files, err and out.
    bash @wrap @df @all_files stderr=@err stdout=@out;
}

string dir    = (@arg("data"));
script calc_df  <"df.py">;
script wrap     <"wrapper.sh">;

// This grabs all *.txt files in dir:
file[] all_docs <filesys_mapper; location=dir, suffix=".txt">;

foreach list_of_files, index in all_docs {
    // Top file is for the results -- if this is not created by our python,
    // swift will fail.

    file df  <single_file_mapper; file=@strcat(@list_of_files, ".csv")>;
    file err <single_file_mapper; file=@strcat(@list_of_files, ".err")>;
    file out <single_file_mapper; file=@strcat(@list_of_files, ".out")>;

    // Read the list of files as an array of string which is then passed
    // to the array mapper as the array of filenames for mapping.
    string names[] = readData(list_of_files);
    file all_files[] <array_mapper; files=names >;
    // Do we need to ?
    // Note we have to pass both dir and doc along with the wrapper script
    // and the actual python script:

    (df,err,out) = wrapper (wrap, calc_df, all_files);
}


Any help would be appreciated.
Sincerely,


William Catino, Ph.D.

Principal Software Engineer

Knowledge Lab | Computation Institute | University of Chicago

*wcatino at uchicago.edu <wcatino at uchicago.edu>*