[mpich-discuss] confusing range of cpu usage for mpi job
Rajeev Thakur
thakur at mcs.anl.gov
Mon Sep 21 14:44:55 CDT 2009
Try using the Hydra process manager, which supports process-to-core binding.
For example:
mpiexec.hydra -binding rr -f hostfile -n 8 a.out
or
mpiexec.hydra -binding pack -f hostfile -n 8 a.out
See http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager#Process-core_Binding
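The hostfile is just a list of host names, one per line, optionally with a
process count after a colon. For a single 16-core node it could be as simple
as the following (the node name here is only a placeholder):

node01:16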
Rajeev
> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Iain Hannah
> Sent: Monday, September 21, 2009 11:54 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: [mpich-discuss] confusing range of cpu usage for mpi job
>
> I'm an MPI newbie, so please forgive my ignorance/stupid question.
>
> I'm running some Absoft-compiled Fortran 90 simulation code on a
> cluster and seeing some strange performance issues. At the moment
> I'm limiting my runs to a single machine in the cluster, which has
> 4 quad-core Opterons (8378), i.e. 16 cores.
>
> The simulation is a grid in velocity and space which evolves in time,
> so it is relatively easy to split the grid across several CPUs with
> MPI. But when I run the code on multiple cores I don't get 100% on
> each one. I don't even get an equal % of usage across them, but a
> fairly linear spread of usage (info via top).
>
> e.g.
> mpiexec -n 4 ./code gives 90%, 60%, 50%, 40% (90 sec to reach t_test)
> mpiexec -n 8 ./code gives 70% through to 30% (61 sec " ")
> mpiexec -n 16 ./code gives 60% through to 15% (53 sec " ")
>
> I wouldn't expect the code to run 2x or 4x faster going from -n 4 to
> -n 8 or -n 16, but I'm getting only a small increase.
>
> If this were purely latency between cores, then surely they would all
> show the same % of usage? I thought MPI was only as fast as the
> slowest processor? The simulation is solving the same equations over
> an equal-sized part of the grid per CPU, so I don't understand why
> there is such a range of CPU usage.
>
> So is this normal, or have I configured MPICH2 wrongly, or am I
> running it wrongly?
>
> Cheers
> Iain
>
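One quick way to check whether the spread seen in top is real load imbalance
or just ranks waiting on each other (e.g. because unbound ranks are sharing a
core) is to time each rank's local work separately from the time it spends
waiting at a barrier. A minimal sketch, assuming the Fortran 'mpi' module is
available; the busy loop below merely stands in for one step of the real grid
update:

program imbalance_check
  ! Time the local work on each rank, then time the wait at a barrier.
  ! Large wait times on the "idle-looking" ranks mean they are being held
  ! back by slower (or core-sharing) ranks, not doing less work.
  use mpi
  implicit none
  integer :: ierr, rank, nprocs, i
  double precision :: t0, t_compute, t_wait, x

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  t0 = MPI_Wtime()
  x = 0.0d0
  do i = 1, 50000000                       ! stand-in for one grid-update step
     x = x + sin(dble(i))
  end do
  t_compute = MPI_Wtime() - t0

  t0 = MPI_Wtime()
  call MPI_Barrier(MPI_COMM_WORLD, ierr)   ! ranks that finish early wait here
  t_wait = MPI_Wtime() - t0

  print *, 'rank', rank, 'of', nprocs, ': compute', t_compute, &
           's, wait', t_wait, 's  (checksum', x, ')'

  call MPI_Finalize(ierr)
end program imbalance_check

Compile with mpif90 and launch it the same way as the real code. If the ranks
that look idle in top report large wait times but similar compute times, they
are stalled behind slower or core-sharing ranks, which is exactly what the
binding options above are meant to address.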