[mpich-discuss] mpirun on 1500~2000 cores

Rajeev Thakur thakur at mcs.anl.gov
Mon Jul 6 09:10:41 CDT 2009


Bob,
        If you are using a large number of cores and the default Nemesis channel and the MPD process manager, it will affect you. It
has been fixed in the current source (available via nightly snapshots) and will be in the 1.1.1 release later this week.
 
(The problem was that in MPI_Init, each process was doing p queries to the process manager for info about other processes, resulting
in p^2 queries across all processes. On small p's it didn't matter, but on large p's, the p^2 queries took too long. All that has
been fixed now.)
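
To put rough numbers on the p^2 growth described above, here is a small illustrative sketch in Python. The function name is made up for this example and is not part of MPICH2 or MPD. It only counts queries under the assumption that each of the p processes issues exactly p of them, and it says nothing about how long each query takes.

```python
def total_startup_queries(p: int) -> int:
    """Total process-manager queries if each of p processes issues p queries."""
    return p * p


# Compare a small job with jobs in the 1500~2000 core range.
for p in (16, 256, 1500, 2000):
    print(f"p = {p:>4}: {total_startup_queries(p):>9,} queries")
```

Under that assumption, 16 processes issue 256 queries in total, while 1500 processes issue 2,250,000 and 2000 processes issue 4,000,000, which is why the cost only became noticeable on large runs.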
 
Rajeev
 


  _____  

From: mpich-discuss-bounces at mcs.anl.gov [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of bob ilgner
Sent: Monday, July 06, 2009 5:05 AM
To: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] mpirun on 1500~2000 cores


Hi Dmitry,
 
What sort of cluster are you running the 1500-2000 cores on (for example, an e1350), and what is the nature of the application that you are
running?
 
I did see Rajeev's response and noted the improved loading time with the latest build.
 
Regards, bob

On Sun, Jul 5, 2009 at 5:03 AM, dvg <dvg at ieee.org> wrote:


Hello,

What would be considered a reasonable time for mpirun to start a job on
1500~2000 cores on a 1 GigE cluster?

Are there any kernel (linux) or eth-related parameters which can be
tuned to speed it up?  MPICH2 libraries were compiled with most/all
optimization options enabled.

Thank you,
Dmitry



