[MPICH] Problems with MPI_Iprobe, MPI_Recv

Rajeev Thakur thakur at mcs.anl.gov
Tue Sep 25 08:14:12 CDT 2007


There is probably a bug in your program somewhere. If you can send us a
small version of the program that fails, we can look at it.

Rajeev

> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Wenhao Xu
> Sent: Monday, September 24, 2007 10:07 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: [MPICH] Problems with MPI_Iprobe, MPI_Recv
> 
> Hi all,
> 
> In my program, I wrote the following code:
>     while (1) {
>         ....
>         MPI_Iprobe(MPI_ANY_SOURCE, MSG_FROM_WORKER, MPI_COMM_WORLD,
>                    &flag, &status);
>         if (flag) {
>             /* a message is pending: receive that specific message */
>             MPI_Recv(&msg_from_worker, sizeof(msg_worker_completion_t),
>                      MPI_BYTE, status.MPI_SOURCE, MSG_FROM_WORKER,
>                      MPI_COMM_WORLD, &status);
>             handle_completion_msg(&msg_from_worker, &computing_list,
>                                   &idle_queue, &waiting_list, &ready_queue);
>         } else {
>             /* do something */
>
>             /* then block until the next message arrives from any worker */
>             MPI_Recv(&msg_from_worker, sizeof(msg_worker_completion_t),
>                      MPI_BYTE, MPI_ANY_SOURCE, MSG_FROM_WORKER,
>                      MPI_COMM_WORLD, &status);
>             handle_completion_msg(&msg_from_worker, &computing_list,
>                                   &idle_queue, &waiting_list, &ready_queue);
>         }
>     }
> 
> I got the following message when I ran with the command: mpiexec -n 14 ./a.out
> 
> fatal error in MPI_Recv: Internal MPI error!, error stack:
> MPI_Recv(186).............................: MPI_Recv(buf=0xbff15eac, count=24, MPI_BYTE, src=MPI_ANY_SOURCE, tag=36, MPI_COMM_WORLD, status=0xbff15ec8) failed
> MPIDI_CH3_Progress_wait(212)..............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
> MPIDI_CH3I_Progress_handle_sock_event(637):
> MPIDI_CH3_Sockconn_handle_conn_event(809).: [ch3:sock] received packet of unknown type (0)
> 
> But when I ran the program with the command mpiexec -n 9 ./a.out, I got
> different messages:
> *** glibc detected *** ./checker: free(): invalid size: 0x09f7a8f8 ***
> ======= Backtrace: =========
> /lib/libc.so.6[0x4c630a68]
> /lib/libc.so.6[0x4c6317f5]
> /lib/libc.so.6(malloc+0x73)[0x4c6327f4]
> ./a.out[0x806d515]
> ./a.out[0x806f15a]
> ./a.out[0x80602e8]
> ./a.out[0x8097b33]
> ./a.out[0x8067498]
> ./a.out[0x806a987]
> ./a.out[0x805da09]
> ./a.out[0x804cd56]
> ./a.out[0x804d4b1]
> ./a.out[0x804d86f]
> /lib/libc.so.6(__libc_start_main+0xdc)[0x4c5e24e4]
> 
> Finally, when I ran the program with the command mpiexec -n 7 ./a.out,
> it ran normally and no error occurred.
> 
> Why do I get different errors with different numbers of processes?
> 
> Thanks in advance!
> 
> Best,
> Peter
