[Swift-user] Question about packing jobs in Cray XE6 nodes
Michael Wilde
wilde at mcs.anl.gov
Mon Mar 26 14:08:26 CDT 2012
Hi Lorenzo,
----- Original Message -----
> From: "Lorenzo Pesce" <lpesce at uchicago.edu>
> To: swift-user at ci.uchicago.edu
> Sent: Monday, March 26, 2012 1:38:56 PM
> Subject: [Swift-user] Question about packing jobs in Cray XE6 nodes
> Hi all --
> Thanks a lot for the help so far.
>
> Most jobs work fine, but some of them crash. Crashing appears to be
> caused by either:
> a) Node runs out of memory (but it seems that it affects only one job,
> not the whole node -- however, when I send out the job alone it works
> fine)
If a node is truly running out of memory, to the point where the Linux kernel's out-of-memory (OOM) action is triggered, the entire PBS job will be killed. I think that would be more visible to you (likely through PBS errors reported back to Swift).
> b) Lack of convergence (algorithm needs to be changed)
>
>
> I am testing my hypothesis right now.
>
> Is it possible to split the pool of nodes into two groups, one where I
> run them more packed and one where the more demanding ones are sent?
Yes. You can create multiple "pool" entries in your sites file, each with a different jobsPerNode value. Then you can create multiple versions of your app entry (or entries) in your tc file, with a different app name (2nd field) for each site.
Then, in your Swift script, you need to create and call multiple app() functions, using the app name to determine which site each call runs on.
That's a bit crude, but it works. A future Swift enhancement might let you force an app call to run on a specific site via a settable parameter.
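A minimal sketch, not a tested configuration, of the two-pool setup described above. Python is used here only to generate the two files. It assumes the Swift 0.93-era sites.xml layout (a `<pool>` with a coaster `<execution>` element and a `globus`-namespace `jobsPerNode` profile) and the six-column tc.data layout seen in Swift examples. The pool handles (`beagle_packed`, `beagle_sparse`), the app name `simulate`, the jobsPerNode values, and all paths are hypothetical placeholders; check element and key names against the user guide for your Swift version.

```python
import xml.etree.ElementTree as ET

# Hypothetical pools on the same machine, differing only in packing density.
# Handles, jobsPerNode values and paths are illustrative assumptions.
POOLS = {
    "beagle_packed": 24,  # many jobs per node for light tasks
    "beagle_sparse": 4,   # fewer jobs per node for memory-hungry tasks
}


def build_sites(pools):
    """Return a sites.xml string with one <pool> element per entry in pools."""
    config = ET.Element("config")
    for handle, jobs_per_node in pools.items():
        pool = ET.SubElement(config, "pool", handle=handle)
        # Assumed coaster-over-PBS execution element.
        ET.SubElement(pool, "execution", provider="coaster", jobmanager="local:pbs")
        profile = ET.SubElement(pool, "profile", namespace="globus", key="jobsPerNode")
        profile.text = str(jobs_per_node)
        ET.SubElement(pool, "filesystem", provider="local")
        workdir = ET.SubElement(pool, "workdirectory")
        workdir.text = "/path/to/swiftwork"  # placeholder path
    return ET.tostring(config, encoding="unicode")


def build_tc(pools, app, executable):
    """Return tc lines, one per pool; the app name (2nd field) gets a per-pool suffix."""
    lines = []
    for handle in pools:
        suffix = handle.split("_", 1)[1]  # "packed" or "sparse"
        fields = [handle, f"{app}_{suffix}", executable, "INSTALLED", "INTEL32::LINUX", "null"]
        lines.append("\t".join(fields))
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_sites(POOLS))
    print(build_tc(POOLS, "simulate", "/path/to/simulate"))
```

With a tc file like this, the Swift script would declare two app functions, one whose body invokes `simulate_packed` and one that invokes `simulate_sparse`, and call whichever matches a job's memory needs. The app name is what ties each call to a pool, and therefore to a packing density.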
Depending on what you are trying to vary between sites, you might be able to do something clever by varying an environment variable within a single app function definition. I'll look for that info and post a pointer.
- Mike
> Thanks a lot,
>
> Lorenzo
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory