[mpich-discuss] confusing range of cpu usage for mpi job

Rajeev Thakur thakur at mcs.anl.gov
Mon Sep 21 14:44:55 CDT 2009


Try using the Hydra process manager, which supports process-to-core binding.
Try
mpiexec.hydra -binding rr -f hostfile -n 8 a.out
or
mpiexec.hydra -binding pack -f hostfile -n 8 a.out

See http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager#Process-core_Binding
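
To confirm where the ranks actually end up, you could also run a small
probe in place of a.out. Below is a minimal sketch (not part of MPICH,
and Linux-specific since it calls glibc's sched_getcpu through
ISO_C_BINDING) that prints the core each rank is running on:

    program where_am_i
       use mpi
       implicit none
       interface
          ! Linux-specific: ask glibc which core this process is on
          function sched_getcpu() bind(c, name='sched_getcpu') result(cpu)
             use iso_c_binding, only: c_int
             integer(c_int) :: cpu
          end function sched_getcpu
       end interface
       integer :: ierr, rank, nprocs, namelen
       character(len=MPI_MAX_PROCESSOR_NAME) :: host

       call MPI_Init(ierr)
       call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
       call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
       call MPI_Get_processor_name(host, namelen, ierr)
       write (*, '(a,i3,a,i3,a,i4,a,a)') 'rank ', rank, ' of ', nprocs, &
          ' on core ', sched_getcpu(), ' of ', host(1:namelen)
       call MPI_Finalize(ierr)
    end program where_am_i

Compile it with mpif90 and launch it with the same mpiexec.hydra options
as above; with binding enabled, each rank should report a distinct core
and stay on it, whereas without binding the ranks are free to migrate.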

Rajeev

> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov 
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Iain Hannah
> Sent: Monday, September 21, 2009 11:54 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: [mpich-discuss] confusing range of cpu usage for mpi job
> 
> I'm an MPI newbie so please forgive my ignorance/stupid question.
> 
> I'm running some Absoft-compiled Fortran 90 simulation code on a
> cluster and seeing some strange performance issues. At the moment I'm
> limiting my runs to a single machine in the cluster, containing four
> quad-core Opterons (8378), so 16 cores.
> 
> The simulation is a grid in velocity and space which evolves in time,
> so it is relatively easy to split the grid across several CPUs with
> MPI. But when I run the code on multiple cores I don't get 100% on
> each one. I don't even get an equal % of use across them, but a fairly
> linear spread of usage (info via top).
> 
> i.e.
> mpiexec -n 4 ./code gives 90%, 60%, 50%, 40% (90 sec to reach t_test)
> mpiexec -n 8 ./code gives 70% through to 30% (61 sec to reach t_test)
> mpiexec -n 16 ./code gives 60% through to 15% (53 sec to reach t_test)
> 
> I wouldn't expect the code to run 2x or 4x faster going from -n 4 to
> -n 8 or -n 16, but the increase I'm getting is surprisingly small
> (90 sec down to 61 sec from -n 4 to -n 8 is only about a 1.5x speedup).
> 
> If this was purely latency between cores, then surely they would all
> give the same % of usage? I thought MPI was only as fast as the
> slowest processor? The simulation is solving the same equations over
> an equal-sized part of the grid per cpu, so I don't understand why
> there is such a range of cpu usage.
> 
> So is this normal, or have I configured mpich2 wrongly, or am I
> running it wrongly?
> 
> Cheers
> Iain
> 


