[MPICH] mpich2 "Internal MPI error!"

Thu Jun 16 10:03:00 CDT 2005

sage weil <sage at newdream.net> writes:

>>> I compiled with --enable-threads=multiple, and I'm doing some sends
>>> between threads on the same node.  Could that be part of the problem?
>>> I'm not sure how stable the thread support is...
>>
>> That could well be the problem. The multithreaded part has not been tested
>> extensively yet.
>
> Hmm.  That brings up a different question, then.  All of my MPI
> traffic is already isolated to a single thread, with this one
> exception:  I need a way to make the MPI thread respond to incoming
> messages AND local events (outgoing messages).  Right now I do that by
> having the MPI thread do a Recv to wait for incoming messages.  Other
> local threads that need to wake it up send a dummy message to self
> using Isend.

So that is where you are multithreaded: your MPI thread is doing an
MPI_Recv while other threads may be doing MPI_Send.  This will only
work with MPI_THREAD_MULTIPLE compliant (don't know if that is the
right word) MPI implementations.  If you are already doing this, why
not have two MPI threads:

t1:  listens for incoming traffic via MPI_Waitall on receives from all
     potential incoming ranks, etc.
t2:  waits on a queue that your local threads enqueue to; whenever
     they enqueue a message, this thread will wake up and call
     MPI_Send; it should be able to use a blocking send, since the t1
     on your other host should be waiting in a Waitall state

This way, Waitall and Send will be executing concurrently, without the
need to send a dummy message.
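
A minimal sketch of the t2 side of that design, in plain Python so it
runs without an MPI installation.  Everything named here is an
assumption made for illustration: blocking_send is a hypothetical
stand-in for MPI_Send that only records the message, and STOP is a
sentinel added so the example terminates.  The t1 listener is left out.

```python
import queue
import threading

sent = []  # records what the stand-in "sent"


def blocking_send(msg):
    """Hypothetical stand-in for a blocking send such as MPI_Send."""
    sent.append(msg)


STOP = object()  # sentinel telling the sender thread to exit


def sender_loop(outbox):
    """t2: block on the queue, wake when something is enqueued, then send."""
    while True:
        msg = outbox.get()  # sleeps here without spinning until work arrives
        if msg is STOP:
            break
        blocking_send(msg)


outbox = queue.Queue()
t2 = threading.Thread(target=sender_loop, args=(outbox,))
t2.start()

# Local worker threads only enqueue; they never call the send themselves.
workers = [
    threading.Thread(target=outbox.put, args=(f"msg-{i}",)) for i in range(3)
]
for w in workers:
    w.start()
for w in workers:
    w.join()

outbox.put(STOP)  # enqueued after all workers finished, so it is last
t2.join()
```

Because the local threads wake t2 through the queue, no dummy message
to self is needed.  In a real program the send and the listener's wait
would still run concurrently, so this layout still depends on
MPI_THREAD_MULTIPLE support.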

This approach may be more or less clean, depending on your perspective
(two threads versus one could make debugging harder).

The approach I take instead to mix MPI and threads: since the world is
currently not an MPI_THREAD_MULTIPLE-compliant world, and I want my
code to be portable and run everywhere, I have a single MPI thread
that handles both incoming and outgoing traffic.  It has an array of
incoming requests from all potential incoming ranks, and an array of
outgoing requests created as needed (when local threads enqueue
something).  It loops over each array and calls MPI_Test on each
request.  If it makes no progress, then to avoid hogging the CPU it
sleeps with an exponentially increasing delay (first sleep with no
progress 0.0001 seconds, second 0.0002 seconds, and so on up to
0.001 seconds).  I don't see how else it should be done until
MPI_THREAD_MULTIPLE is ubiquitous.
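
A rough sketch of that polling loop, again in plain Python so it runs
without MPI.  The assumptions: each entry in requests is a hypothetical
zero-argument callable standing in for MPI_Test on one request, the
delay doubles each idle round (inferred from the 0.0001 and 0.0002
values above), the delay resets after progress, and max_idle_rounds
exists only so the example terminates.

```python
import time

FIRST_SLEEP = 0.0001  # seconds, first idle sleep from the description
MAX_SLEEP = 0.001     # seconds, cap from the description


def next_sleep(current):
    """Double the idle sleep, capped at MAX_SLEEP (doubling is assumed)."""
    return min(current * 2, MAX_SLEEP)


def poll(requests, max_idle_rounds, sleep=time.sleep):
    """Test every pending request each round; back off while idle.

    Each request is a callable standing in for MPI_Test: it returns
    True once its operation has completed.  Returns the list of sleep
    durations used, so the backoff schedule can be inspected.
    """
    pending = list(requests)
    delay = FIRST_SLEEP
    sleeps = []
    idle = 0
    while pending and idle < max_idle_rounds:
        still_pending = [r for r in pending if not r()]
        if len(still_pending) < len(pending):
            # Progress was made: reset the backoff.
            delay = FIRST_SLEEP
            idle = 0
        else:
            # No request completed this round: sleep, then lengthen the delay.
            sleep(delay)
            sleeps.append(delay)
            delay = next_sleep(delay)
            idle += 1
        pending = still_pending
    return sleeps
```

With a 0.001 second cap, an idle thread wakes about a thousand times a
second, which bounds the added latency for a new message at roughly one
millisecond while leaving the CPU mostly free.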
-- 
Benjamin