[mpich-discuss] Large memory allocations in MPI applications under Linux

Dave Goodell goodell at mcs.anl.gov
Wed Apr 15 13:53:59 CDT 2009


Are you actually touching all of this memory?  If so, then as a
general rule you shouldn't be trying to allocate more memory than is
physically available, assuming you care about the performance of your
application.  In most scenarios swapping will absolutely kill your
application's performance.  Is there a hard requirement that you
process >20 GiB of data in-core on a 20 GiB machine, or are you just
trying to squeeze every last drop of performance/precision/problem
size out of the system?

Alternatively, if you use the rlimit trick described earlier and you
malloc until it returns NULL, then you will probably cause problems
for any libraries that you use, including the MPI library.  Many
libraries assume that at least small to moderate amounts of memory are
available via malloc and will bail out if they are unable to allocate
that memory.  This is definitely true of MPICH2, and it is also the
case for anything that uses certain libc functions such as strdup or
mergesort.

In either case, you should self-impose limits on your memory usage
rather than relying on the operating system to impose them for you.
When you run all the way up against OS resource limits, bad things
usually start to happen, depending on the exact resource in question.
It also usually leads to portability problems down the road when you
try to move your software to a new platform.

-Dave

On Apr 15, 2009, at 1:28 PM, Sudarshan Raghunathan wrote:

> Thank you Jed, I will try your approach and see if it works. I suppose
> the simplest solution is to set the rlimit per MPI process assuming an
> almost equal distribution of the allocation, but this will not work
> when one of the ranks has to allocate a lot more than the others and
> the total is still near the available physical memory.
>
> Regards,
> Sudarshan
>
> 2009/4/15 Jed Brown <jed at 59a2.org>:
>> A lightweight solution is to set `ulimit -v' in your shell.
>> Alternatively, look at setrlimit(2), RLIMIT_AS.  This limits the
>> total amount of virtual memory available to your process.  If you
>> try to malloc beyond this limit, it will fail (return NULL).
>>
>> Most kernels are configured to have no problem wildly
>> over-committing memory.  I can malloc 10 GB on a machine with 4 GB
>> of memory and 4 GB of swap.  Clearly I can't actually touch all of
>> that memory, but malloc doesn't mind.  This is a feature, and
>> disabling overcommitment may cause problems for other programs
>> (depending on what else is running on your machine).
>>
>> Jed
>>



More information about the mpich-discuss mailing list