[Swift-user] Swift crash

Mihael Hategan hategan at mcs.anl.gov
Sat May 10 16:26:48 CDT 2014


Hi,

On Mon, 2014-05-05 at 15:57 -0500, William Catino wrote:
> I tried 100 files containing a total of 100,000 filenames.  It is running
> now (for over 15 minutes).

Just wanted to mention that even for relatively short path names you
would need on the order of 500MB for the above just to store those file
names.

So I think most of the problems that you are seeing are due to this
being a large problem, rather than some limitation with file staging
performance.

Your best bet would be to bias the splitting towards a larger number of
application invocations with smaller arrays for each. Then move as much
as possible of the app instance information inside a foreach loop and
use foreach.max.threads to limit the number of concurrent iterations in
that loop to something that is reasonable for the amount of heap given
to the Swift process. An alternative strategy is to set
foreach.max.threads on the order of the number of jobs you actually
expect to be able to run at a time.
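
As a minimal sketch of the second strategy, assuming the option is set
in swift.properties and that you expect roughly 100 jobs to run at once
(the value 100 is purely illustrative, not a recommendation):

```properties
# Cap on the number of concurrent foreach iterations.
# 100 is an illustrative value; choose one that fits the heap given
# to the Swift process or the number of job slots you expect to get.
foreach.max.threads=100
```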

If those are not viable options, I suggest looking into Swift/T, which
has a distributed implementation of the runtime engine and can scale
better for large problems.

> 
> 
> Is there any documentation about the various limitations of Swift,
> especially on OSG:
> -max number of files to process

Unfortunately these numbers depend on a lot of factors and it is hard to
come up with something that is universally true. A file in /tmp will
result in different memory use than a file in
/gpfs/projects/john-d-jones/2014/projectx/version6/swift/input/mixed/,
but then it really depends on whether this uses a mapper that uses full
file names explicitly or a mapper that builds file names procedurally.
The complexity of the Swift script is also a factor.

> -max total number of bytes contained in the files processed

This should not be a limitation of swift. Larger files will take more
time to transfer, but should not cause any errors.

> -max length of command line to app

This unfortunately does not depend on Swift. Swift itself imposes no
limit (besides RAM) on how large an app command line can be. However,
the operating system or job submission mechanism can impose its own
limits. I believe that Condor has a limit of something like 4096 bytes
for the command line.

In any event, if this is a problem and you are submitting jobs directly
to Condor (instead of through coasters), you can use
"wrapper.parameter.mode=files" in swift.properties. If you are using
coasters, the OS limit is the one you are likely to hit first.
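
For the direct-to-Condor case, the setting mentioned above would be a
single line in swift.properties (a sketch only; the rest of the file is
omitted):

```properties
# Pass app parameters to the wrapper script through a file rather than
# on the command line, which avoids command-line length limits.
wrapper.parameter.mode=files
```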

> -max number of nodes

This isn't limited by Swift. Sites may limit this number and it is also
possible that for short apps you will not get much speedup after a
certain number of nodes.

> -max number of slots

Again, not limited by Swift, but certain sites/queues may only allow a
limited number of jobs in the queue.

Mihael



