Hi Pavan,

Thanks for your quick reply.

I'm using MPICH2 1.3.1.

The 1 GB is MPI messages; about the same amount is written to stdout.

The PC I compile and (statically) link on was at 1.2.1, and I was running on a server with 1.3.1 when I first noticed this. I then upgraded to 1.3.1, but it made no difference.

stdout is redirected to a file.

When would mpiexec and hydra_pmi_proxy use CPU? Is it just related to stdout?

I'm at GMT+8 and it's just before midnight. I can't think straight at the moment, so I'm calling it quits for the night.

I'll try the nightlies tomorrow.

Cheers, Colin

On Thu, Jan 20, 2011 at 11:32 PM, Pavan Balaji <balaji@mcs.anl.gov> wrote:
> On 01/20/2011 09:07 AM, Colin Hercus wrote:
>> I have a fairly simple MPICH2 job where the master process reads
>> transaction files and sends them to the slaves for processing. It also
>> collects output from the slaves and writes it to stdout.
>>
>> The other day I noticed that the whole process seems to slow to a crawl,
>> with hydra_pmi_proxy using 100% CPU, mpiexec popping up in top at
>> 50-100% CPU, and my slaves and master recording almost no CPU time.
>> Sometimes this clears itself and the process speeds up again, sometimes not.
>
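For reference, a minimal sketch of the master/slave layout described above might look like the following. The helpers read_record() and process_record(), the message tags, and the buffer sizes are placeholders of my own, not code from this thread:

    /*
     * Minimal sketch of the master/slave pattern described above. The helpers
     * read_record() and process_record() are placeholders; a real job would
     * read actual transaction data. Run with at least two processes,
     * e.g. mpiexec -n 8 ./job
     */
    #include <mpi.h>
    #include <stdio.h>

    #define MAXLEN   4096
    #define TAG_WORK 1
    #define TAG_DONE 2

    /* Placeholder: fetch the next transaction record, return 0 at end of input. */
    static int read_record(char *buf, int maxlen) { (void)buf; (void)maxlen; return 0; }

    /* Placeholder: process one record into a line of output. */
    static void process_record(const char *in, char *out, int maxlen)
    {
        snprintf(out, maxlen, "processed: %s\n", in);
    }

    int main(int argc, char **argv)
    {
        int rank, size;
        char buf[MAXLEN], out[MAXLEN];
        MPI_Status st;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {                        /* master */
            int next = 1;
            while (read_record(buf, MAXLEN)) {  /* hand records out round-robin */
                MPI_Send(buf, MAXLEN, MPI_CHAR, next, TAG_WORK, MPI_COMM_WORLD);
                MPI_Recv(out, MAXLEN, MPI_CHAR, next, TAG_DONE, MPI_COMM_WORLD, &st);
                fputs(out, stdout);             /* collected results go to stdout */
                next = next % (size - 1) + 1;
            }
            for (int i = 1; i < size; i++)      /* tell the slaves to stop */
                MPI_Send("", 1, MPI_CHAR, i, TAG_DONE, MPI_COMM_WORLD);
        } else {                                /* slave */
            for (;;) {
                MPI_Recv(buf, MAXLEN, MPI_CHAR, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
                if (st.MPI_TAG == TAG_DONE)
                    break;
                process_record(buf, out, MAXLEN);
                MPI_Send(out, MAXLEN, MPI_CHAR, 0, TAG_DONE, MPI_COMM_WORLD);
            }
        }

        MPI_Finalize();
        return 0;
    }

The only point of the sketch is that all result output funnels through the master's stdout, which is the stream mpiexec and hydra_pmi_proxy end up forwarding.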
> This is surprising. I tried out trunk, and both mpiexec and hydra_pmi_proxy
> take nearly 0% CPU for most tests. I did some tests where 16 MPI processes
> blast out 100 MB of stdout in a matter of 5 seconds -- in this case, mpiexec
> did take around 50% CPU for about a second in between, but that was probably
> because it was trying to dump out all of the stdout (to a file in my case).
>
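A rough sketch of that kind of stdout stress test, assuming the ~100 MB is split evenly across the ranks in roughly 1 KB lines (the sizes and the suggested ./stdout_blast command line are my own illustration, not taken from the thread):

    /*
     * Rough sketch of a stdout stress test in the spirit of the one described
     * above: each rank writes its share of roughly 100 MB to stdout as fast
     * as it can. Sizes are illustrative. Run with e.g.:
     *   mpiexec -n 16 ./stdout_blast
     */
    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        char line[1024];
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        memset(line, 'x', sizeof(line) - 2);
        line[sizeof(line) - 2] = '\n';
        line[sizeof(line) - 1] = '\0';

        /* ~100 MB in total across all ranks, written as ~1 KB lines */
        long lines_per_rank = (100L * 1024 * 1024) / ((long)sizeof(line) * size);
        for (long i = 0; i < lines_per_rank; i++)
            fputs(line, stdout);

        fflush(stdout);
        MPI_Finalize();
        return 0;
    }

With stdout redirected to a file, this should make the forwarding load on mpiexec and hydra_pmi_proxy easy to watch in top.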
>> I tried ch3:nem and ch3:sock with no difference.
>
> Right, this should not have anything to do with what channel you use.
>
>> The problem doesn't occur until the program has been running for about an
>> hour. In that hour there would be about 1 GB of messages. It then occurs
>> intermittently during the remainder of the run.
>
> I'm assuming these messages are MPI messages, not stdout/stderr, correct? If
> yes, then that doesn't matter for "mpiexec" or "hydra_pmi_proxy".
>
>> So I'm stuck, not sure what to try next, and would appreciate any
>> suggestions or help.
>
> I'm puzzled as well. For starters, which version of MPICH2 are you using? Can
> you try upgrading to the latest version, so we aren't chasing something that
> has already been fixed?
>
> If you'd like to try out the very bleeding edge, the nightly snapshots of
> "mpiexec" and "hydra_pmi_proxy" can be downloaded from here:
> http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/nightly/hydra
>
> You can just replace your existing mpiexec and hydra_pmi_proxy with these.
>
>  -- Pavan
>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji