[mpich-discuss] hydra_pmi_proxy using 100% CPU

Pavan Balaji balaji at mcs.anl.gov
Thu Jan 20 09:32:08 CST 2011


On 01/20/2011 09:07 AM, Colin Hercus wrote:
> I have a fairly simple MPICH2 job where the master process reads
> transaction files and sends them to the slaves for processing. It also
> collects output from the slaves and writes it to stdout.
>
> The other day I noticed that the whole job slows to a crawl, with
> hydra_pmi_proxy using 100% CPU, mpiexec showing up in top at 50-100%
> CPU, and my slaves and master recording almost no CPU time. Sometimes
> this clears itself and the job speeds up again; sometimes it does not.

This is surprising. I tried out trunk, and both mpiexec and 
hydra_pmi_proxy take nearly 0% CPU for most tests. I did some tests 
where 16 MPI processes blast out 100MB of stdout in about 5 
seconds -- in that case, mpiexec did take around 50% CPU for about a 
second in between, but that was probably because it was busy dumping 
out all of the stdout (to a file in my case).
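
Something along the lines of the sketch below is what I mean (this is 
not the actual test program; the 128-byte lines and the 50000 
iterations are just made-up values that come to roughly 100MB across 
16 ranks):

/* Each rank writes a few MB to stdout as fast as it can, so the
 * launcher (mpiexec / hydra_pmi_proxy) has to forward all of it.
 * Run with, e.g.:  mpiexec -n 16 ./stdout_blast > out.txt
 */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char line[128];
    int i;

    MPI_Init(&argc, &argv);

    memset(line, 'x', sizeof(line) - 1);
    line[sizeof(line) - 1] = '\n';

    /* ~6.4 MB per rank; with 16 ranks that is roughly 100 MB of stdout */
    for (i = 0; i < 50000; i++)
        fwrite(line, 1, sizeof(line), stdout);

    fflush(stdout);
    MPI_Finalize();
    return 0;
}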

> I tried ch3:nem and ch3:sock with no difference

Right, this should not have anything to do with what channel you use.

> The problem doesn't occur until the program has been running for about
> an hour; in that hour there would be about 1 GB of messages. It then
> occurs intermittently during the remainder of the run.

I'm assuming these messages are MPI messages, not stdout/stderr, 
correct? If so, they don't matter for "mpiexec" or "hydra_pmi_proxy": 
MPI traffic flows directly between the application processes, while the 
launcher only handles stdout/stderr forwarding and PMI control messages.

> So I'm stuck, not sure what to try next and would appreciate any
> suggestions or help.

I'm puzzled as well. For starters, which version of MPICH2 are you 
using? Can you try upgrading to the latest version, so we are not 
chasing something that has already been fixed?

If you'd like to try out the very bleeding edge, the nightly snapshots 
of "mpiexec" and "hydra_pmi_proxy" can be downloaded from here: 
http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/nightly/hydra

You can just replace your existing mpiexec and hydra_pmi_proxy with these.

  -- Pavan

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji
