[mpich-discuss] hydra_pmi_proxy using 100% CPU

Colin Hercus colin at novocraft.com
Thu Jan 20 09:07:03 CST 2011


Hi,

This is my first post here and I apologise if this has been discussed before
but I keep getting timeouts while trying to access the archives.

I have a fairly simple MPICH2 job where the master process reads transaction
files and sends them to the slaves for processing. It also collects output
from the slaves and writes it to stdout.

The other day I noticed that the whole process seems to slow to a crawl,
with hydra_pmi_proxy using 100% CPU, mpiexec popping up in top at 50-100%
CPU, and my slaves and master recording almost no CPU time. Sometimes this
clears itself and the process speeds up again, sometimes not.

I figured this is probably my problem, so I spent all week trying to fix it,
but I don't know why it happens so I'm not really getting anywhere.

I tried ch3:nem and ch3:sock with no difference.
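
(By ch3:nem and ch3:sock I mean MPICH2 builds configured with the
corresponding device, i.e. roughly ./configure --with-device=ch3:nemesis
versus ./configure --with-device=ch3:sock.)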

My slave processes are multi-threaded but don't make any MPI calls from the
threads, and I can also run them with just a single thread.
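
Since only the main thread of each slave makes MPI calls, the initialisation
only needs funneled thread support. Roughly (a simplified sketch, not my
exact code):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;

    /* Only the main thread calls MPI, so funneled support is enough. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    if (provided < MPI_THREAD_FUNNELED) {
        fprintf(stderr, "MPI library does not provide MPI_THREAD_FUNNELED\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* ... rest of the slave ... */

    MPI_Finalize();
    return 0;
}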

My standard run on 4 servers is mpiexec -np 5, so I have one master task
with a single thread and 4 slaves, each multithreaded to the number of CPUs
on its server. This is slightly overcommitted, but the master usually uses
about 1% CPU so it's not much.
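
Concretely the launch is something like this (the host file and program
names here are just placeholders):

mpiexec -f hosts.txt -np 5 ./my_mpi_program <args>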

I then tried

mpiexec -np 49  with single-threaded slaves (I have 48 cores)
mpiexec -np 48
mpiexec -np 44
mpiexec -np 32
mpiexec -np 24

All of these exhibit the same problem.

My slaves are very simple:

while (...) {
     MPI_Probe(0, MPI_ANY_TAG, ...)
     MPI_Get_count(...)
     if (tag == EOF) break;
     if (not first) MPI_SendSz(...)
     MPI_Recv(...)
}
MPI_SendSz(...)
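
Spelled out a bit more fully, the slave side is roughly this (a simplified
sketch, not the real code: plain MPI_Send stands in for my MPI_SendSz
wrapper, and TAG_EOF is a placeholder for whatever tag signals end of
input):

#include <mpi.h>

#define TAG_EOF 0   /* placeholder: tag the master uses to say "no more work" */

void slave_loop(char *inbuf, char *outbuf)
{
    MPI_Status st;
    int count, outlen = 0, first = 1;

    for (;;) {
        MPI_Probe(0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);  /* wait for next message */
        MPI_Get_count(&st, MPI_BYTE, &count);            /* how big it is */
        if (st.MPI_TAG == TAG_EOF)
            break;                                       /* no more transactions */
        if (!first)                                      /* return previous results */
            MPI_Send(outbuf, outlen, MPI_BYTE, 0, st.MPI_TAG, MPI_COMM_WORLD);
        first = 0;
        MPI_Recv(inbuf, count, MPI_BYTE, 0, st.MPI_TAG, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        /* ... process inbuf, build results in outbuf, set outlen ... */
    }
    MPI_Send(outbuf, outlen, MPI_BYTE, 0, TAG_EOF, MPI_COMM_WORLD);  /* final results */
}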

The master process is a tad more complicated and I've rewritten it several
times in attempts to fix it.

Original

for each slave task
     MPI_Issend( first message)

while (...) {
    MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, ...)   // for receiving messages
    if (flag) {
         MPI_Recv(...)
         fwrite(stdout, buffer)
    }
    MPI_Testany(Issend requests)
    if (flag)
          MPI_Issend(next buffer)
}
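
Written out as a compilable sketch (not my exact code: next_buffer(),
buffer_len() and TAG_DATA are placeholders, and I'm assuming the probe
above is really MPI_Iprobe since it sets a flag):

#include <mpi.h>
#include <stdio.h>

#define TAG_DATA 1                     /* placeholder tag for work messages */
extern char *next_buffer(int slave);   /* placeholder: next transaction buffer */
extern int   buffer_len(int slave);    /* placeholder: its length */

void master_loop(int num_slaves, MPI_Request *send_reqs, char *recvbuf)
{
    MPI_Status st;
    int flag, idx, all_done = 0;       /* how all_done gets set is left out */

    while (!all_done) {
        /* Any results waiting from any slave? */
        MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &st);
        if (flag) {
            int count;
            MPI_Get_count(&st, MPI_BYTE, &count);
            MPI_Recv(recvbuf, count, MPI_BYTE, st.MPI_SOURCE, st.MPI_TAG,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            fwrite(recvbuf, 1, count, stdout);
        }
        /* Has any outstanding Issend completed?  If so, give that slave
         * its next buffer. */
        MPI_Testany(num_slaves, send_reqs, &idx, &flag, MPI_STATUS_IGNORE);
        if (flag && idx != MPI_UNDEFINED)
            MPI_Issend(next_buffer(idx), buffer_len(idx), MPI_BYTE, idx + 1,
                       TAG_DATA, MPI_COMM_WORLD, &send_reqs[idx]);
    }
}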

This loop was using Probe with Any Source / Any Tag, and I read somewhere
this could be a problem, so I rewrote the above loop to use MPI_Irecv. In
the first attempt I used two MPI_Request arrays, one for sends and one for
receives, and the main loop had two MPI_Testany calls.

I then changed to a single array of requests where the receives are the
first part of the array and the sends the second. This allowed me to use
MPI_Waitany, so now my master code looks like:

for each slave task {
     MPI_Issend( first message)
     MPI_Irecv()
}

while (...) {
    MPI_Waitany(....)
    if (idx != MPI_UNDEFINED) {
         if (idx < num_slaves) {
             fwrite(stdout, buffer)
             MPI_Irecv(...)
         } else
             MPI_Issend(next buffer)
    }
}
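
Again as a compilable sketch (same placeholder names as in the previous
sketch; the request array holds the num_slaves receives first, then the
num_slaves sends, and winding things down at end of input is omitted):

/* Uses TAG_DATA, next_buffer() and buffer_len() from the previous sketch. */
void master_loop2(int num_slaves, MPI_Request *reqs, char **recvbufs,
                  int maxlen)
{
    MPI_Status st;
    int idx;

    for (;;) {
        /* Block until any receive or send request completes. */
        MPI_Waitany(2 * num_slaves, reqs, &idx, &st);
        if (idx == MPI_UNDEFINED)
            break;                           /* every request is inactive */
        if (idx < num_slaves) {              /* a receive completed */
            int count;
            MPI_Get_count(&st, MPI_BYTE, &count);
            fwrite(recvbufs[idx], 1, count, stdout);
            MPI_Irecv(recvbufs[idx], maxlen, MPI_BYTE, idx + 1, MPI_ANY_TAG,
                      MPI_COMM_WORLD, &reqs[idx]);
        } else {                             /* an Issend completed */
            int slave = idx - num_slaves;
            MPI_Issend(next_buffer(slave), buffer_len(slave), MPI_BYTE,
                       slave + 1, TAG_DATA, MPI_COMM_WORLD, &reqs[idx]);
        }
    }
}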

These changes haven't made the slightest difference.

A few more details:

Message buffers are 16 Kbytes (I've also tried 200-byte, 2K and 4K buffers).

The problem doesn't occur until the program has been running for about an
hour; in that hour there would be about 1 Gbyte of messages. It then occurs
intermittently during the remainder of the run.

Running with exactly the same settings and data again, the slowdown will
occur at a different time in the run.

The servers are on a 1 Gbit Ethernet LAN.

Disk files are on the same server as the master, so there is no other
network traffic.

I have exclusive use of the servers for this test; only top and some other
job monitoring that uses minimal CPU are running.

So I'm stuck, not sure what to try next and would appreciate any suggestions
or help.

Thanks, Colin