[MPICH2-dev] maximum number of processes per node

William Gropp gropp at mcs.anl.gov
Fri Jan 12 14:55:19 CST 2007


Are you running with the latest version of MPICH2 (1.0.5)?  There are  
some changes in mpd that might help here.  There are also, I believe,  
some timeouts in mpd that may need to be increased when running many  
processes on the same node.

Another option when running all of the processes on the same node is
to use the gforker process manager; to use it, configure mpich2 with
the --with-pm=gforker option.  In fact, you can have both mpd and
gforker available by configuring with the --with-pm=gforker:mpd option.

One problem that you may run into is running out of file descriptors;
in the gforker case, the mpiexec process will need several file
descriptors for each process.  Make sure that your OS allows more than
1024 file descriptors per process.
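
As a rough way to check this, here is a minimal Python sketch that reads the
per-process file descriptor limit and compares it with an estimate of what
mpiexec would need.  The function name fd_headroom, the figure of 3
descriptors per MPI process, and the 16 reserved descriptors are all
assumptions for illustration, not values taken from gforker; substitute
whatever you measure on your system.

```python
import resource


def fd_headroom(num_procs, fds_per_proc=3, reserved=16):
    """Estimate whether the current fd limit covers num_procs processes.

    fds_per_proc and reserved are illustrative guesses, not gforker's
    actual requirements.
    Returns (needed, soft_limit, hard_limit, ok).
    """
    # Soft limit is what the process is held to; hard limit is the ceiling
    # to which an unprivileged user may raise the soft limit.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    needed = num_procs * fds_per_proc + reserved
    # RLIM_INFINITY means no limit is enforced.
    ok = soft == resource.RLIM_INFINITY or needed <= soft
    return needed, soft, hard, ok


if __name__ == "__main__":
    for n in (256, 512):
        needed, soft, hard, ok = fd_headroom(n)
        status = "ok" if ok else "limit too low"
        print(f"{n} processes: need ~{needed} fds, "
              f"soft limit {soft}, hard limit {hard}: {status}")
```

If the soft limit turns out to be too low, it can usually be raised up to the
hard limit from the shell with "ulimit -n" before starting mpiexec; going
beyond the hard limit requires administrator changes.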

Bill

On Jan 11, 2007, at 4:09 PM, Eric Grobelny wrote:

> Hello,
>
> I am running some experiments to see the effects of time sharing  
> when running many processes on a single node.  I have been able to  
> run up to 256 processes, but when I try to increase this value to  
> 512, I get the following error:
>
> mpiexec_compute-0-10.local (mpiexec 375): no msg recvd from mpd  
> when expecting ack of request
>
> I am guessing that the problem is caused by a limit set on the
> number of processes that can run on a single node.  Is this the
> case?  Can I solve this problem by redefining some #define in the
> source code?  Or is this a limitation imposed by the OS?
>
> By the way, the single node I am running my experiments on is  
> running CentOS with kernel version 2.6.9-22.ELsmp.
>
> Thanks,
>
> Eric Grobelny
>
> =======================================================
> Eric Grobelny
> ECE Ph.D. Candidate, Research Assistant
> Advanced Space Computing (ASC) group member
> Modeling and Simulation (Performance Prediction) focus
> High-performance Computing and Simulation (HCS) Research Laboratory
>
> Dept. of Electrical and Computer Engineering, University of Florida
> PO Box 116200, 330 Benton Hall, Gainesville, FL 32611-6200
> Lab: (352)392-9034/9046      FAX: (352)392-8671
