[mpich-discuss] Question about MPI - Message Truncation Problem in MPI_Recv

Rajeev Thakur thakur at mcs.anl.gov
Mon Dec 19 11:18:02 CST 2011


Since you are using threads, it could be a race condition. Can you make sure the e_count and status variables are local to a thread (not shared with other threads)?
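
For example, here is a minimal (untested) sketch of what I mean, assuming your receiver loop runs in its own thread function started with pthread_create, and reusing your bt_stack, mutex_bt_stack, and message tags; the function name action_receiver is just for illustration:

void *action_receiver(void *arg)
{
    int        e_count;   /* local to this thread, not a global */
    MPI_Status status;    /* local to this thread, not a global */

    while (1) {
        /* first message: number of elements the sender will ship */
        MPI_Recv(&e_count, 1, MPI_INT, MPI_ANY_SOURCE,
                 MSG_ACTION_STACK_COUNT, MPI_COMM_WORLD, &status);
        if (e_count == 0)
            break;

        /* wait for room in the shared stack */
        while ((bt_stack.top + e_count) >= bt_stack.maxSize - 1)
            usleep(500);

        pthread_mutex_lock(&mutex_bt_stack);
        /* second message: the packed stack contents from the same sender */
        MPI_Recv(&bt_stack.contents[bt_stack.top + 1], e_count, MPI_CHAR,
                 status.MPI_SOURCE, MSG_ACTION_STACK, MPI_COMM_WORLD, &status);
        bt_stack.top += e_count;
        pthread_mutex_unlock(&mutex_bt_stack);
    }
    return NULL;
}

If e_count or status is instead a shared (e.g., file-scope) variable and more than one thread executes this loop, one thread's count can be overwritten by another's before the second MPI_Recv is posted, which would explain the "21 bytes received but buffer size is 19" truncation.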

Rajeev 

On Dec 19, 2011, at 12:49 AM, Ayaz ul Hassan Khan wrote:

> I am having problems in one of my projects related to MPI development. I am working on an MPI implementation of an RNA parsing algorithm: a master node starts parsing an input string according to a set of parsing rules and a parsing table (which contains the different states and their associated actions). In the parsing table there are multiple actions for each state, and these can be executed in parallel, so I have to distribute the actions among different processes. To do that, I send the current state and parsing info (the current parsing stack) to the other nodes; each process uses a separate thread to receive actions from other nodes while its main thread is busy parsing based on the received actions. Following are code snippets of the sender and receiver:
>  
> Sender Code:
> StackFlush(&snd_stack);
> StackPush(&snd_stack, state_index);
> StackPush(&snd_stack, current_ch);
> StackPush(&snd_stack, actions_to_skip);
> /* pack the parsing stack (top element first), followed by its element count */
> elements_in_stack = stack.top + 1;
> for (int a = elements_in_stack - 1; a >= 0; a--)
>     StackPush(&snd_stack, stack.contents[a]);
> StackPush(&snd_stack, elements_in_stack);
> /* pack the parse tree (top element first), followed by its element count */
> elements_in_stack = parse_tree.top + 1;
> for (int a = elements_in_stack - 1; a >= 0; a--)
>     StackPush(&snd_stack, parse_tree.contents[a]);
> StackPush(&snd_stack, elements_in_stack);
> /* send the total element count first, then the packed stack itself */
> elements_in_stack = snd_stack.top + 1;
> MPI_Send(&elements_in_stack, 1, MPI_INT, (myrank + actions_to_skip) % mysize, MSG_ACTION_STACK_COUNT, MPI_COMM_WORLD);
> MPI_Send(&snd_stack.contents[0], elements_in_stack, MPI_CHAR, (myrank + actions_to_skip) % mysize, MSG_ACTION_STACK, MPI_COMM_WORLD);
>  
> Receiver Code:
> /* first receive the element count from any sender */
> MPI_Recv(&e_count, 1, MPI_INT, MPI_ANY_SOURCE, MSG_ACTION_STACK_COUNT, MPI_COMM_WORLD, &status);
> if (e_count == 0) {
>     break;
> }
> /* wait until the shared stack has room for e_count more elements */
> while ((bt_stack.top + e_count) >= bt_stack.maxSize - 1) { usleep(500); }
> pthread_mutex_lock(&mutex_bt_stack); /* mutex protects bt_stack, which is shared among threads */
> /* then receive the packed stack contents from the same sender */
> MPI_Recv(&bt_stack.contents[bt_stack.top + 1], e_count, MPI_CHAR, status.MPI_SOURCE, MSG_ACTION_STACK, MPI_COMM_WORLD, &status);
> bt_stack.top += e_count;
> pthread_mutex_unlock(&mutex_bt_stack);
>  
> The program runs fine for small inputs with little communication, but as the input size grows the amount of communication grows with it; the receiver then gets many requests while it is still processing only a few, and the program crashes with the following errors:
> Fatal error in MPI_Recv: Message truncated, error stack:
> MPI_Recv(186) ……………………………………: MPI_Recv(buf=0x5b8d7b1, count=19, MPI_CHAR, src=3, tag=1, MPI_COMM_WORLD, status=0x41732100) failed
> MPIDI_CH3U_Request_unpack_uebuf(625): Message truncated; 21 bytes received but buffer size is 19
> Rank 0 in job 73 hpc081_56549 caused collective abort of all ranks
> exit status of rank 0: killed by signal 9
>  
> I have also tried this using non-blocking MPI calls, but I still get similar errors.
>  
>  
> Ayaz ul Hassan Khan
> Lecturer-B (PhD Student), Information and Computer Sciences
> King Fahd University of Petroleum & Minerals
> Dhahran 31261, Kingdom of Saudi Arabia
> 
> _______________________________________________
> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
> To manage subscription options or unsubscribe:
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss


