[mpich-discuss] "unexpected messages" Question

Hiatt, Dave M dave.m.hiatt at citi.com
Thu Jan 7 11:27:44 CST 2010


I'm following up on an earlier question.  I'm auditing the number of Bcast and Send calls I make against an exception message that is thrown during processing.  The message says "261894 unexpected messages queued".  This number is dramatically different from the message counts the app appears to be sending (I'm counting a Bcast as 1 message, and counting the messages sent and received between node 0 and the compute nodes).  This cluster has 496 total nodes.  When I run on a 60-node cluster I never see any hint of a problem like this.  Network utilization does not indicate any large congestion, but clearly something is happening, so I'm assuming it's my app.  To that end, a few questions, if I might ask:

First question  - For the purposes of this count, is a Bcast considered 1 message, or N messages where N is the number of active nodes?
Second question - What constitutes an "unexpected message"?  I am assuming any Send or Bcast is expected.  Am I confused about this nomenclature?
Third question  - I've assumed that the message count stated for this queue translates directly to the number of MPI::Send and MPI::Bcast calls I make.  Is that correct?
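
To make the two counting conventions in the first question concrete, here is a minimal sketch of the kind of tally involved.  MessageAudit and its members are hypothetical names used only for illustration; they are not part of MPI or MPICH, and nothing here claims to show how the library itself arrives at the number in the error message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical audit helper (not part of MPI): tallies calls and derives
// two totals, depending on how a broadcast is counted.
struct MessageAudit {
    std::uint64_t ranks;       // total ranks in the communicator
    std::uint64_t sends = 0;   // number of MPI::Send calls observed
    std::uint64_t bcasts = 0;  // number of MPI::Bcast calls observed

    explicit MessageAudit(std::uint64_t n) : ranks(n) { assert(n >= 1); }

    void on_send()  { ++sends; }
    void on_bcast() { ++bcasts; }

    // Convention A: every Bcast counts as a single message.
    std::uint64_t logical_count() const { return sends + bcasts; }

    // Convention B: every Bcast counts once per non-root rank.
    std::uint64_t delivery_count() const {
        return sends + bcasts * (ranks - 1);
    }
};
```

With 496 ranks, one Send plus two Bcasts gives 3 under convention A and 991 under convention B, so the two conventions diverge quickly as the node count grows.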

So far I have not been able to reproduce this problem on my test clusters (which are much smaller, typically 60 nodes), and I have no indication that my app could create some kind of "message storm" through a race condition.

Thanks
dave

"Consequences, Schmonsequences, as long as I'm rich". - Daffy Duck
Dave Hiatt
Market Risk Systems Integration
CitiMortgage, Inc.
1000 Technology Dr.
Third Floor East, M.S. 55
O'Fallon, MO 63368-2240

Phone:  636-261-1408
Mobile: 314-452-9165
FAX:    636-261-1312
Email:     Dave.M.Hiatt at citigroup.com





