[mpich-discuss] Question about MPI - Message Truncation Problem in MPI_Recv

Rajeev Thakur thakur at mcs.anl.gov
Sun Dec 25 00:40:14 CST 2011


Try adding printfs after the first MPI_Send and first MPI_Recv to print elements_in_stack, ecount, and status.MPI_SOURCE to see why the second receive gets 21 bytes instead of the 19 it is expecting.
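
For example, dropped into the snippets below (the variable names are taken from your code; the fflush calls are only there so the output is not lost when the job aborts):

/* sender: immediately after the first MPI_Send */
printf("sender %d: sent count=%d to rank %d\n",
       myrank, elements_in_stack, (myrank + actions_to_skip) % mysize);
fflush(stdout);

/* receiver thread: immediately after the first MPI_Recv */
printf("receiver: got e_count=%d from source=%d\n", e_count, status.MPI_SOURCE);
fflush(stdout);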

Rajeev


On Dec 25, 2011, at 12:34 AM, Ayaz ul Hassan Khan wrote:

> Yes, I double-checked these.
> The ecount and status variables are local to the thread (not shared).
> 
> Any other ideas?
> 
> 
> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Rajeev Thakur
> Sent: Monday, December 19, 2011 8:18 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] Question about MPI - Message Truncation Problem in MPI_Recv
> 
> Since you are using threads, it could be a race condition. Can you make sure the ecount and status variables are local to each thread (not shared with other threads)?
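> 
> For example, by declaring them inside the receiver thread's routine rather than at file scope (just a sketch; action_receiver is a made-up name for your thread function):
> 
> void *action_receiver(void *arg)
> {
>     int e_count;          /* per-thread copy, not visible to other threads */
>     MPI_Status status;    /* per-thread copy as well */
> 
>     /* ... your MPI_Recv loop goes here, using these local variables ... */
>     return NULL;
> }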
> 
> Rajeev 
> 
> On Dec 19, 2011, at 12:49 AM, Ayaz ul Hassan Khan wrote:
> 
>> I am having a problem in one of my projects involving MPI development. I am implementing an RNA parsing algorithm using MPI, in which a master node starts parsing an input string based on a set of parsing rules and a parsing table (containing the different states and their associated actions). In the parsing table there are multiple actions for each state that can be performed in parallel, so I have to distribute these actions among different processes. To do that, I send the current state and parsing information (the current parse stack) to the other nodes; each process uses a separate thread to receive actions from other nodes while its main thread keeps parsing based on the actions already received. Here are the code snippets of the sender and the receiver:
>> 
>> Sender Code:
>> StackFlush(&snd_stack);
>> StackPush(&snd_stack, state_index);
>> StackPush(&snd_stack, current_ch);
>> StackPush(&snd_stack, actions_to_skip);
>> elements_in_stack = stack.top + 1;
>> for (int a = elements_in_stack - 1; a >= 0; a--)
>>     StackPush(&snd_stack, stack.contents[a]);
>> StackPush(&snd_stack, elements_in_stack);
>> elements_in_stack = parse_tree.top + 1;
>> for (int a = elements_in_stack - 1; a >= 0; a--)
>>     StackPush(&snd_stack, parse_tree.contents[a]);
>> StackPush(&snd_stack, elements_in_stack);
>> elements_in_stack = snd_stack.top + 1;
>> MPI_Send(&elements_in_stack, 1, MPI_INT, (myrank + actions_to_skip) % mysize, MSG_ACTION_STACK_COUNT, MPI_COMM_WORLD);
>> MPI_Send(&snd_stack.contents[0], elements_in_stack, MPI_CHAR, (myrank + actions_to_skip) % mysize, MSG_ACTION_STACK, MPI_COMM_WORLD);
>> 
>> Receiver Code:
>> MPI_Recv(&e_count, 1, MPI_INT, MPI_ANY_SOURCE, MSG_ACTION_STACK_COUNT, MPI_COMM_WORLD, &status);
>> if (e_count == 0) {
>>     break;
>> }
>> while ((bt_stack.top + e_count) >= bt_stack.maxSize - 1) { usleep(500); }
>> pthread_mutex_lock(&mutex_bt_stack);  // using mutex for accessing shared data among threads
>> MPI_Recv(&bt_stack.contents[bt_stack.top + 1], e_count, MPI_CHAR, status.MPI_SOURCE, MSG_ACTION_STACK, MPI_COMM_WORLD, &status);
>> bt_stack.top += e_count;
>> pthread_mutex_unlock(&mutex_bt_stack);
>> 
>> The program runs fine for small inputs with little communication, but as we increase the input size the amount of communication grows, so the receiver gets many requests while it has processed only a few, and then it crashes with the following errors:
>> Fatal error in MPI_Recv: Message truncated, error stack:
>> MPI_Recv(186) ..........................................: MPI_Recv(buf=0x5b8d7b1, count=19, MPI_CHAR, src=3, tag=1, MPI_COMM_WORLD, status=0x41732100) failed
>> MPIDI_CH3U_Request_unpack_uebuf(625): Message truncated; 21 bytes received but buffer size is 19
>> Rank 0 in job 73 hpc081_56549 caused collective abort of all ranks exit status of rank 0: killed by signal 9.
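>> 
>> For illustration, a minimal two-rank program produces the same failure whenever the posted receive count is smaller than the message that matches it; the sizes below are made up to mirror the 19-vs-21 case:
>> 
>> #include <mpi.h>
>> 
>> int main(int argc, char **argv)
>> {
>>     char buf[32] = "twenty-one chars here";   /* exactly 21 characters */
>>     int rank;
>> 
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>     if (rank == 0)
>>         MPI_Send(buf, 21, MPI_CHAR, 1, 1, MPI_COMM_WORLD);
>>     else if (rank == 1)
>>         /* posted count (19) < incoming size (21): "Message truncated" */
>>         MPI_Recv(buf, 19, MPI_CHAR, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>>     MPI_Finalize();
>>     return 0;
>> }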
>> 
>> I have also tried this using non-blocking MPI calls, but I still get similar errors.
>> 
>> 
>> Ayaz ul Hassan Khan
>> Lecturer-B (PhD Student), Information and Computer Sciences
>> King Fahd University of Petroleum & Minerals
>> Dhahran 31261, Kingdom of Saudi Arabia
>> 


