[mpich-discuss] core 2 quad and other multiple core processors

Robert Kubrick robertkubrick at gmail.com
Fri Jul 4 12:26:40 CDT 2008


I wonder why ch3:nemesis or ch3:ssm isn't the default in MPICH2. Why
ch3:sock?

Robert

On Jul 2, 2008, at 10:59 PM, Rajeev Thakur wrote:

> For best performance, configure with --with-device=ch3:nemesis. It  
> will use the Nemesis device within MPICH2 that communicates using  
> shared memory within a node and TCP across nodes.
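>
> A minimal build along those lines might look like this (the install
> prefix here is just a placeholder; adjust it for your systems):
>
>   ./configure --with-device=ch3:nemesis --prefix=/opt/mpich2
>   make && make install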
>
> Rajeev
>
> From: owner-mpich-discuss at mcs.anl.gov [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Ariovaldo de Souza Junior
> Sent: Wednesday, July 02, 2008 3:15 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: [mpich-discuss] core 2 quad and other multiple core  
> processors
>
> Hello everybody!
>
> I'm really a newbie at clustering, so I have some, let's say, stupid
> questions. When I start a job like "mpiexec -l -n 6 ./cpi" on my small
> cluster of (so far) 6 Core 2 Quad machines, I'm sending one process to
> each node, right? Assuming that's correct, will each process use only
> one core of its node? How can I make one process use the full capacity
> of the processor, all 4 cores? Is there a way to do this, or will each
> process always use just one core? And if I change the command to
> "mpiexec -l -n 24 ./cpi", will the same program run as 24 processes,
> 4 per node (perhaps simultaneously), one process per core?
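>
> Just as an aside, here is a tiny test I could imagine running to see
> the placement for myself (standard MPI calls only; the file name is
> made up):
>
>   /* where_am_i.c -- print the host name each MPI rank runs on */
>   #include <mpi.h>
>   #include <stdio.h>
>
>   int main(int argc, char **argv)
>   {
>       int rank, size, len;
>       char host[MPI_MAX_PROCESSOR_NAME];
>
>       MPI_Init(&argc, &argv);
>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>       MPI_Comm_size(MPI_COMM_WORLD, &size);
>       MPI_Get_processor_name(host, &len);
>       printf("rank %d of %d is running on %s\n", rank, size, host);
>       MPI_Finalize();
>       return 0;
>   }
>
>   mpicc where_am_i.c -o where_am_i
>   mpiexec -n 24 ./where_am_i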
>
> I'm asking all this because I find it a bit strange that the
> processing time increases each time I add one more process, when in
> my mind it should be the opposite. Here are some examples:
>
> mpiexec -n 1 ./cpi
> wall clock time = 0.000579
>
> mpiexec -n 2 ./cpi
> wall clock time = 0.002442
>
> mpiexec -n 3 ./cpi
> wall clock time = 0.004568
>
> mpiexec -n 4 ./cpi
> wall clock time = 0.005150
>
> mpiexec -n 5 ./cpi
> wall clock time = 0.008923
>
> mpiexec -n 6 ./cpi
> wall clock time = 0.009309
>
> mpiexec -n 12 ./cpi
> wall clock time = 0.019445
>
> mpiexec -n 18 ./cpi
> wall clock time = 0.032204
>
> mpiexec -n 24 ./cpi
> wall clock time = 0.045413
>
> mpiexec -n 48 ./cpi
> wall clock time = 0.089815
>
> mpiexec -n 96 ./cpi
> wall clock time = 0.218894
>
> mpiexec -n 192 ./cpi
> wall clock time = 0.492870
>
> So, as you can all see, the more processes I add, the more time it
> takes, which makes me think that MPI ran the same test 192 times in
> the end and that this is why the time increased. Is it correct that
> MPI ran the same test 192 times? Or did it divide the work into 192
> pieces, compute them, and then gather and assemble the results again?
> I would really like to understand this relationship between the
> number of processors, the number of processes, and the run time.
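>
> (To make the question concrete: what I imagine by "dividing the work"
> is roughly the sketch below, where each rank handles every size-th
> slice of the integration and MPI_Reduce sums the partial results. The
> file and variable names are my own guess, not the real cpi source.)
>
>   /* pi_sketch.c -- rough idea of splitting a pi integration over ranks */
>   #include <mpi.h>
>   #include <stdio.h>
>
>   int main(int argc, char **argv)
>   {
>       int rank, size, i, n = 10000;
>       double h, x, mypi = 0.0, pi = 0.0;
>
>       MPI_Init(&argc, &argv);
>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>       MPI_Comm_size(MPI_COMM_WORLD, &size);
>
>       h = 1.0 / (double)n;
>       /* each rank integrates slices rank, rank+size, rank+2*size, ... */
>       for (i = rank; i < n; i += size) {
>           x = h * ((double)i + 0.5);
>           mypi += h * (4.0 / (1.0 + x * x));
>       }
>       /* combine the partial sums on rank 0 */
>       MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
>
>       if (rank == 0)
>           printf("pi is approximately %.16f\n", pi);
>       MPI_Finalize();
>       return 0;
>   }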
>
> I have the feeling that my questions are a bit "poor" and really
> newbie-level, but the answers will help me use other programs that
> need MPI to run.
>
> Thanks to all!
>
> Ari - UFAM - Brazil
>
