[MPICH] exception hangs app

Tom Hilinski tom.hilinski at comcast.net
Thu Aug 31 07:48:16 CDT 2006


Follow-up to my example:
I discovered that moving the MPI code out of the LogsMessages 
constructor and into a separate member function lets the example run 
correctly. The altered code looks like this:

class LogsMessages
{
  public:

    LogsMessages ()
    {
    }

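    void Activate ()
    {
        cout << "LogsMessages: Probe/Recv" << endl;
        // Block until a message from any source is available, without
        // receiving it yet; status then describes that message.
        MPI::Status status;
        MPI::COMM_WORLD.Probe ( MPI_ANY_SOURCE, MPI_ANY_TAG, status );
        // Size the receive buffer from the probed message length.
        int const count = status.Get_count (MPI::CHAR);
        char * buffer = new char [count + 1];
        // Receive exactly the message that was probed.
        MPI::COMM_WORLD.Recv (
                buffer, count, MPI::CHAR,
                status.Get_source(), status.Get_tag() );
        std::string msg (buffer, buffer + count);
        cout << "LogsMessages msg: " << msg << endl;
        delete [] buffer;
    }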
};

class Task
{
  public:

    Task (int const rank)
    {
      if ( rank == 0 )
        ThrowException (rank);
      else if ( rank == 1 )
      {
        // following works
        LogsMessages msgs;
        msgs.Activate ();
        // following hangs:
        // LogsMessages msgs ();
      }
      else
        DoTask (rank);
    }


Any ideas on why the constructor must complete before the blocking 
messaging works?

BTW, I get the same result with g++ 3.4.4 and 4.0.1, on both Cygwin and 
Linux, using the same version of MPICH.
...Tom




William Gropp wrote:
> Can you send a short program that illustrates the problem?  What you 
> describe should work.
>
> Bill
>
> On Aug 30, 2006, at 4:00 PM, Tom Hilinski wrote:
>
>> For mpi/c++ gurus: I have an app that has classes that can throw an 
>> exception, specifically std::runtime_error. main() has a try-catch 
>> for these as well as MPI::Exception. When a process (rank == 0) 
>> throws an exception, the process seems to go off into space - I can't 
>> trace it in a debugger, and the exception is not caught by the catch 
>> (which immediately writes e.what() to cout).  I can kill it with 
>> mpdkilljob.
>>
>> I've reproduced this behavior on both suse linux 10.0/x86_64 and 
>> Win2K/cygwin. MPICH version is mpich2-1.0.4p1, which I've built with 
>> debugging enabled (--enable-debuginfo --enable-g=dbg,handle) and no 
>> threads.
>>
>> Any suggestions you have would be great.
>>
>
>

-- 


Tom Hilinski
-----------------------------------------------
e-mail    Personal: Tom.Hilinski at comcast.net
          CSU:      Tom.Hilinski at ColoState.edu
-----------------------------------------------



