[MPICH] Failure of MPI_Recv related to timing - need help

Glaser, Amnon aglaser at efw.com
Wed Nov 16 11:32:03 CST 2005


Hello All

I have a problem with the MPICH2 implementation that seems to be highly
sensitive to timing (!)

When I use many processes, each communicating less, the program works
fine.
When I increase the amount of work and communication per worker, either by
lowering the number of workers or by turning on clog2 logging,
I start getting frequent failures of MPI_Recv.

Am I doing something wrong, or is it a known problem?

//------------{ usage sample }-----------
MPI_Status msgStatus ;

MyAnswerClass answer ;

MPI_Recv((void*)&answer, sizeof(MyAnswerClass), MPI_CHAR, MPI_ANY_SOURCE,
         MPI_ANY_TAG, MPI_COMM_WORLD, &msgStatus);

if (msgStatus.MPI_ERROR == MPI_SUCCESS)
{
	// Process incoming message according to its TAG
}
else
{
	char errorStr[MPI_MAX_ERROR_STRING];
	int errorStrLen ;
	MPI_Error_string(msgStatus.MPI_ERROR,errorStr,&errorStrLen);
	printf("Rank: %d, Error code (%d) while receiving message: %s\n",
	       m_myRank, msgStatus.MPI_ERROR, errorStr);
	fflush(stdout);
}
//---------{ end of usage sample }---------


// The resulting error printout from the usage sample:
Rank: 1, Error code (-858993460) while receiving message: Undefined dynamic error code

The value -858993460 is 0xCCCCCCCC, the fill pattern of uninitialized
memory in the debug build.
In the release build the value I get is -1.
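The arithmetic behind that reading: a 32-bit two's-complement int whose top bit is set holds the bit pattern minus 2^32, so 0xCCCCCCCC = 3435973836 reads as 3435973836 - 4294967296 = -858993460. A small C helper that does this conversion explicitly (the name as_signed32 is just for illustration):

```c
#include <stdint.h>

/* Illustrative helper: returns the value a 32-bit two's-complement int
   holds when its bits equal `pattern` (what %d prints for it).
   Patterns with the top bit set represent pattern - 2^32. */
int32_t as_signed32(uint32_t pattern)
{
    if (pattern >= UINT32_C(0x80000000))
        return (int32_t)((int64_t)pattern - INT64_C(4294967296));
    return (int32_t)pattern;
}
```

For example, as_signed32(0xCCCCCCCCu) gives -858993460, and as_signed32(0xFFFFFFFFu) gives -1.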

// Additional information:
// I link the release version with "mpi.lib cxx.lib" and /MD
// I link the debug version with "mpi.lib cxxd.lib" and /MDd

To sum up: the problem appears or goes away just by changing the number
of processes or by turning the logging on/off, without changing anything
in the executable (!)

Any ideas?

Thank you for your time
Amnon


_______________________________________________________________________

This e-mail message has been sent by EFW Inc. and is for the use
of the intended recipients only. The message may contain privileged
or confidential information. If you are not the intended recipient
you are hereby notified that any use, distribution or copying of
this communication is strictly prohibited, and you are requested to
delete the e-mail and any attachments and notify the sender immediately.
More information about the mpich-discuss mailing list