[mpich-discuss] Large memory allocations in MPI applications under Linux
Dave Goodell
goodell at mcs.anl.gov
Wed Apr 15 13:53:59 CDT 2009
Are you actually touching all of this memory? If so, and you care
about the performance of your application, you generally shouldn't
allocate more memory than is physically available. In most
scenarios swapping will absolutely kill the
performance of your application. Is there a hard requirement that you
process >20GiB of data in-core on a 20GiB-sized machine, or are you
just trying to squeeze every last drop of performance, precision, or
problem size out of the system?
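
As a rough sketch of how a program might find out how much physical
memory it has to work with, assuming Linux with glibc (where
sysconf(_SC_PHYS_PAGES) is available as an extension): the helper names
physical_memory_bytes and per_rank_budget, the fraction, and the idea of
dividing evenly among the ranks on a node are all illustrative
assumptions, not anything MPICH2 provides.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Physical RAM in bytes, or 0 if it cannot be determined.
 * _SC_PHYS_PAGES is a glibc/Linux extension, not strict POSIX. */
uint64_t physical_memory_bytes(void)
{
    long pages = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);

    if (pages <= 0 || page_size <= 0)
        return 0;
    return (uint64_t)pages * (uint64_t)page_size;
}

/* Hypothetical helper: the share of physical RAM each MPI rank on a
 * node allows itself, leaving (1 - fraction) for the OS and libraries. */
uint64_t per_rank_budget(unsigned ranks_on_node, double fraction)
{
    assert(ranks_on_node > 0);
    assert(fraction > 0.0 && fraction <= 1.0);

    double total = (double)physical_memory_bytes();
    return (uint64_t)(total * fraction / ranks_on_node);
}
```

An even split like this does not cover the case where one rank needs far
more than the others; there the per-rank shares would have to be
negotiated by the application itself.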
Alternatively, if you use the rlimit trick described earlier and
malloc until it returns NULL, you will probably cause problems for
any libraries that you use, including the MPI library. Many libraries
assume that at least small to moderate amounts of memory are available
via malloc and will bail out if they are unable to allocate that
memory. This is definitely true of MPICH2 and is also the case for
anything that uses certain libc functions such as strdup or mergesort.
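
For reference, a minimal sketch of the rlimit trick itself, assuming
Linux (where RLIMIT_AS is enforced) and a malloc that returns NULL when
the address-space limit is hit. The function names limit_address_space
and can_allocate are made up for illustration. Once the limit is
reached, every later malloc in the process fails too, including the
ones made inside libraries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/resource.h>

/* Lower the soft RLIMIT_AS limit of the calling process to `bytes`.
 * Returns 0 on success, -1 on failure (for example when `bytes`
 * exceeds the existing hard limit). The hard limit is left alone. */
int limit_address_space(size_t bytes)
{
    struct rlimit rl;

    assert(bytes > 0);
    if (getrlimit(RLIMIT_AS, &rl) != 0)
        return -1;
    rl.rlim_cur = (rlim_t)bytes;
    return setrlimit(RLIMIT_AS, &rl);
}

/* Returns 1 if malloc(bytes) succeeds, 0 if it returns NULL.
 * The volatile pointer keeps the compiler from optimizing the
 * malloc/free pair away. */
int can_allocate(size_t bytes)
{
    void *volatile p = malloc(bytes);

    if (p == NULL)
        return 0;
    free(p);
    return 1;
}
```

With a 1 GiB limit in place, a request for 2 GiB should fail while small
requests still succeed, at least until the process gets close to the
limit.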
In either case, you should self-impose limits for your memory usage
rather than relying on the operating system to impose limits on your
memory usage. When you run all the way up against OS resource limits,
bad things usually start to happen; exactly what depends on the
resource in question. Doing so also usually leads to portability
problems down the road when you try to move your software to a new
platform.
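
One way to self-impose such a limit is to route the application's large
allocations through a small accounting layer. This is only a sketch
under stated assumptions: mem_budget, budget_init, budget_alloc and
budget_free are hypothetical names, the caller is assumed to remember
the size of each allocation, and nothing here is thread-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical application-level memory budget (not an MPI or libc API). */
typedef struct {
    size_t limit;  /* most bytes the application allows itself */
    size_t used;   /* bytes currently handed out */
} mem_budget;

void budget_init(mem_budget *b, size_t limit)
{
    assert(b != NULL);
    b->limit = limit;
    b->used = 0;
}

/* Returns NULL if the request would exceed the budget or malloc fails.
 * The comparison is written so that it cannot overflow. */
void *budget_alloc(mem_budget *b, size_t n)
{
    assert(b != NULL);
    if (n > b->limit - b->used)
        return NULL;

    void *p = malloc(n);
    if (p != NULL)
        b->used += n;
    return p;
}

/* `n` must be the size originally passed to budget_alloc for `p`. */
void budget_free(mem_budget *b, void *p, size_t n)
{
    assert(b != NULL);
    if (p == NULL)
        return;
    assert(n <= b->used);
    free(p);
    b->used -= n;
}
```

Because the budget is refused before malloc is ever called, the process
stays well short of the OS limit and libraries such as MPICH2 can still
get the small allocations they need.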
-Dave
On Apr 15, 2009, at 1:28 PM, Sudarshan Raghunathan wrote:
> Thank you Jed, I will try your approach and see if it works. I suppose
> the simplest solution is to set the rlimit per MPI process assuming an
> almost equal distribution of the allocation, but this will not work
> when one of the ranks has to allocate a lot more than the others and
> the total is still near the available physical memory.
>
> Regards,
> Sudarshan
>
> 2009/4/15 Jed Brown <jed at 59a2.org>:
>> A lightweight solution is to set `ulimit -v' in your shell.
>> Alternatively, look at setrlimit(2), RLIMIT_AS. This limits the total
>> amount of virtual memory available to your process. If you try to
>> malloc beyond this limit, it will fail (return NULL).
>>
>> Most kernels are configured to have no problem wildly over-committing
>> memory. I can malloc 10 GB on a machine with 4 GB memory and 4 GB swap.
>> Clearly I can't actually touch all of that memory, but malloc doesn't
>> mind. This is a feature and disabling overcommitment may cause problems
>> for other programs (depending on what else is running on your machine).
>>
>> Jed
>>