[mpich-discuss] hydra_pmi_proxy using 100% CPU

Colin Hercus colin at novocraft.com
Thu Jan 20 09:51:48 CST 2011


Hi Pavan,

Thanks for your quick reply.

I'm using MPICH2-V1.3.1

The 1GB is MPI messages; about the same amount is written to stdout.
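
For reference, the job is structured roughly like this (a simplified sketch,
not the actual code; the buffer size, tags, file handling and round-robin
scheduling are just illustrative):

/* Sketch of the master/slave structure: rank 0 reads transactions,
 * farms them out to the slaves, and writes their results to stdout
 * (which is redirected to a file). For brevity this sketch sends one
 * transaction and waits for its result before sending the next. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define TAG_WORK 1
#define TAG_DONE 2
#define MAXLINE  4096

int main(int argc, char **argv)
{
    int rank, size;
    char buf[MAXLINE];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) { MPI_Finalize(); return 1; }

    if (rank == 0) {
        /* Master: read transactions, hand them to the slaves, collect
         * each result and write it to stdout. */
        FILE *fp = (argc > 1) ? fopen(argv[1], "r") : NULL;
        int next = 1;
        while (fp && fgets(buf, sizeof(buf), fp)) {
            MPI_Send(buf, (int)strlen(buf) + 1, MPI_CHAR, next, TAG_WORK,
                     MPI_COMM_WORLD);
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, next, TAG_WORK,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            fputs(buf, stdout);          /* ends up in the redirected file */
            next = (next % (size - 1)) + 1;
        }
        if (fp) fclose(fp);
        for (int w = 1; w < size; w++)   /* tell the slaves to stop */
            MPI_Send(buf, 0, MPI_CHAR, w, TAG_DONE, MPI_COMM_WORLD);
    } else {
        /* Slave: receive a transaction, process it, return the result. */
        MPI_Status st;
        for (;;) {
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, 0, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_DONE)
                break;
            /* ... real processing would happen here ... */
            MPI_Send(buf, (int)strlen(buf) + 1, MPI_CHAR, 0, TAG_WORK,
                     MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}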

The PC I compile and (statically) link on was at 1.2.1, and I was running on a
server with 1.3.1 when I first noticed this. I then upgraded it to 1.3.1 as
well, but it made no difference.

stdout is redirected to a file.

When would mpiexec and hydra_pmi_proxy use CPU? Is it just related to
stdout?

I'm at GMT+8 and it's just before midnight; I can't think straight at the
moment, so I'm calling it quits for the night.

I'll try the nightlies tomorrow.

Cheers, Colin

On Thu, Jan 20, 2011 at 11:32 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:

>
> On 01/20/2011 09:07 AM, Colin Hercus wrote:
>
>> I have a fairly simple MPICH2 job where the master process reads a
>> transaction file and sends the transactions to the slaves for processing. It
>> also collects output from the slaves and writes it to stdout.
>>
>> The other day I noticed that the whole process seems to slow to a crawl,
>> with hydra_pmi_proxy using 100% CPU, mpiexec popping up in top at
>> 50-100% CPU, and the master and slaves recording almost no CPU time.
>> Sometimes this clears itself and the process speeds up again, sometimes
>> not.
>>
>
> This is surprising. I tried out trunk, and both mpiexec and hydra_pmi_proxy
> take nearly 0% CPU for most tests. I did some tests where 16 MPI processes
> blast out 100MB of stdout in a matter of 5 seconds -- in this case, mpiexec
> did take around 50% CPU for about a second in between, but that was probably
> because it was trying to dump out all of the stdout (to a file in my case).
>
>
>  I tried ch3:nem and ch3:sock with no difference
>>
>
> Right, this should not have anything to do with what channel you use.
>
>
>  Problem doesn't occur until the program has been running for about an
>> hour. In that hour there would be about 1Gbyte of messages. It then
>> occurs intermittently during the remainder of the run.
>>
>
> I'm assuming these messages are MPI messages, not stdout/stderr, correct?
> If yes, then that doesn't matter for "mpiexec" or "hydra_pmi_proxy".
>
>
>  So I'm stuck, not sure what to try next and would appreciate any
>> suggestions or help.
>>
>
> I'm puzzled as well. For starters, which version of MPICH2 are you using?
> Can you try upgrading to the latest version, so we're not chasing something
> that has already been fixed?
>
> If you'd like to try out the very bleeding edge, the nightly snapshots of
> "mpiexec" and "hydra_pmi_proxy" can be downloaded from here:
> http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/nightly/hydra
>
> You can just replace your existing mpiexec and hydra_pmi_proxy with these.
>
>  -- Pavan
>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
>