[mpich-discuss] Large memory allocations in MPI applications under Linux
Pavan Balaji
balaji at mcs.anl.gov
Wed Apr 15 12:25:29 CDT 2009
One of the research groups that I have been working with had a similar
problem with heavy swapping. The solution they used was to run
/sbin/swapoff, which essentially tells the OS to use only physical
memory and return a NULL pointer when none is left. They were
using this to optimize their LINPACK performance, so they didn't really
care if this returned a NULL pointer (for graceful termination) or just
aborted the application. But you can try it.
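
In case it helps, here is a minimal sketch of how the application side
could terminate cleanly once malloc does return NULL (e.g. with swap
turned off as above). The per-rank allocation size and the
MPI_Allreduce-based agreement are assumptions for illustration, not
anything from the original application:

/* Minimal sketch (not from the original application): each rank checks
 * its malloc result, and all ranks agree on whether to continue, so a
 * failed allocation on any rank leads to a clean, collective shutdown
 * instead of an abort. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    size_t bytes_per_rank = (size_t)3 * 1024 * 1024 * 1024; /* example size */
    void *buf = malloc(bytes_per_rank);

    /* 1 if this rank's allocation failed, 0 otherwise */
    int local_failed = (buf == NULL);
    int any_failed = 0;
    MPI_Allreduce(&local_failed, &any_failed, 1, MPI_INT, MPI_MAX,
                  MPI_COMM_WORLD);

    if (any_failed) {
        if (rank == 0)
            fprintf(stderr, "allocation failed on at least one rank; exiting\n");
        free(buf);            /* free(NULL) is a no-op */
        MPI_Finalize();
        return 1;
    }

    /* ... real work on buf ... */

    free(buf);
    MPI_Finalize();
    return 0;
}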
-- Pavan
Sudarshan Raghunathan wrote:
> Dear all,
> My question does not pertain to MPICH per se, but I was curious to
> know if I've run into a previously-solved problem.
>
> I am running on an AMD Opteron SMP machine with 8 cores in total and 20
> GB of physical memory running Linux kernel 2.6.18. My MPI application
> (a hugely simplified version attached) is such that all processes
> together need to allocate slightly more than 20GB at the same time
> (i.e., when running P MPI processes, each rank allocates slightly more
> than 20/P GB). When running with P=1, I get a NULL pointer from malloc
> when allocating more than the amount of physical memory and can
> gracefully terminate my application. However, when running with P > 1,
> I see the OS swapping very heavily and the machine becomes totally
> unresponsive for a long time (for larger values of P, the only way to
> get the
> machine back into a responsive state is to reboot it). Clearly, this
> is a major annoyance for me and for other users of the machine.
>
> I am wondering if there is any way to restructure/rewrite my
> application (or tweak settings for malloc), so that irrespective of
> how many processes I'm running with, malloc returns a NULL pointer on
> at least a subset of the ranks as soon as the total physical memory is
> exhausted. The "obvious" solution is to look at /proc/meminfo to see
> the physical amount of available memory and allocate only if
> sufficient memory is available, but this seems to be highly
> sub-optimal and fragile. Has anyone in the MPICH community run into
> this problem before, and if so, are there best practices for how one
> should deal with memory allocations?
>
> Thank you very much in advance.
>
> Sudarshan
>
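
For reference, a minimal sketch of the /proc/meminfo check mentioned
above might look like the following. Parsing MemFree and using it as a
go/no-go threshold is only an illustration (the helper names are made
up), and, as noted in the question, the approach is fragile: the value
is stale by the time the allocation happens and it ignores what other
ranks or processes are about to allocate.

/* Sketch of the "obvious" /proc/meminfo check described above. */
#include <stdio.h>
#include <stdlib.h>

/* Return the "MemFree" value from /proc/meminfo in bytes, or 0 on error. */
static size_t free_memory_bytes(void)
{
    FILE *fp = fopen("/proc/meminfo", "r");
    if (!fp)
        return 0;

    char line[256];
    size_t kb = 0;
    while (fgets(line, sizeof(line), fp)) {
        if (sscanf(line, "MemFree: %zu kB", &kb) == 1)
            break;
    }
    fclose(fp);
    return kb * 1024;
}

/* Allocate only if the requested size fits in currently free memory. */
static void *guarded_malloc(size_t bytes)
{
    if (bytes > free_memory_bytes())
        return NULL;          /* treat it as a failed allocation */
    return malloc(bytes);
}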
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji