[Swift-user] Swift crash
William Catino
wcatino at gmail.com
Mon May 5 15:57:10 CDT 2014
I submitted a job that contained 1.9 million documents, and got a crash.
I tried this 2 different ways:
First, I submitted files with 10000 filenames in each file - so there were
197 files.
This generated a GC error.
Then I tried 1976 files, each containing 1000 filenames.
This caused another crash - part of the error message contained the phrase
"Heap table out of memory."
I then tried 100 files containing a total of 100,000 filenames. It is running
now (and has been for over 15 minutes).
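For reference, the batching described above (N filenames per list file) can be sketched in a few lines of Python. This is only an illustration: the function name `write_batches`, the `batch_NNNNN.txt` naming scheme, and the one-name-per-line layout are assumptions, not necessarily what was used for the runs above.

```python
from pathlib import Path


def write_batches(filenames, out_dir, batch_size):
    """Write filenames into list files of at most batch_size names each.

    Returns the paths of the list files written, in order.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for start in range(0, len(filenames), batch_size):
        batch = filenames[start:start + batch_size]
        # Hypothetical naming scheme: batch_00000.txt, batch_00001.txt, ...
        path = out_dir / f"batch_{len(written):05d}.txt"
        # Assumed layout: one filename per line.
        path.write_text("\n".join(batch) + "\n")
        written.append(path)
    return written
```

A call such as `write_batches(names, "lists", 1000)` writes one list file per 1000 names, with any remainder going into the last file.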
Is there any documentation about the various limitations of Swift,
especially on OSG? In particular:
- max number of files to process
- max total number of bytes contained in the files processed
- max length of command line to app
- max number of nodes
- max number of slots
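Of these, the command-line length is also bounded by the operating system on whichever node runs the app, independent of Swift. A rough way to estimate whether a batch of filenames would fit is sketched below. The function names and the `headroom` allowance are assumptions for illustration, and the check is only approximate, since environment variables and per-argument limits also count on some systems.

```python
import os


def command_line_bytes(args):
    """Approximate bytes the arguments occupy: each one plus a NUL terminator."""
    return sum(len(os.fsencode(a)) + 1 for a in args)


def fits_in_arg_max(args, headroom=4096):
    """Return True if args appear to fit under the POSIX ARG_MAX limit.

    headroom is an assumed allowance for environment variables,
    which share the same limit.
    """
    limit = os.sysconf("SC_ARG_MAX")  # POSIX systems only
    return command_line_bytes(args) + headroom <= limit
```

Running this on a worker node with the argument list for one batch gives a quick indication of whether 10000 filenames per invocation is plausible there, or whether smaller batches are needed.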
The Swift script looks like this:
[wcatino@login01 df]$ cat df.swift
type file;
type script;

// Note that to use `bash` here, it has to be in SWIFT's 'apps' file:
app (file df, file err, file out) wrapper (script wrap, script df_script,
                                           file all_files[]) {
    // this uses bash to call the wrapper script with two CLAs: directory
    // (of input files) and target output file df
    // It also sets stderr and stdout to two passed files, err and out.
    bash @wrap @df @all_files stderr=@err stdout=@out;
}

string dir = (@arg("data"));
script calc_df <"df.py">;
script wrap <"wrapper.sh">;

// This grabs all *.txt files in dir:
file[] all_docs <filesys_mapper; location=dir, suffix=".txt">;

foreach list_of_files, index in all_docs {
    // Top file is for the results -- if this is not created by our python,
    // swift will fail.
    file df <single_file_mapper; file=@strcat(@list_of_files, ".csv")>;
    file err <single_file_mapper; file=@strcat(@list_of_files, ".err")>;
    file out <single_file_mapper; file=@strcat(@list_of_files, ".out")>;

    // Read the list of files as an array of string which is then passed
    // to the array mapper as the array of filenames for mapping.
    string names[] = readData(list_of_files);
    file all_files[] <array_mapper; files=names >;

    // Do we need to ?
    // Note we have to pass both dir and doc along with the wrapper script
    // and the actual python script:
    (df,err,out) = wrapper (wrap, calc_df, all_files);
}
Any help would be appreciated.
Sincerely,
William Catino, Ph.D.
Principal Software Engineer
Knowledge Lab | Computation Institute | University of Chicago
wcatino at uchicago.edu