[mpich-discuss] MPI_Recv: MPI + Pthreads

Olya Krachina okrachin at purdue.edu
Tue Apr 22 21:00:37 CDT 2008


Hello list,

I am fairly new to MPI and Pthreads and am trying to understand MPI_Recv.
I am trying to write a parallel matrix multiply (square matrices with
power-of-two dimensions: A x B = C). I am using 3 machines with 8 cores each.
My program should create the MPI processes (via the -np option to mpiexec),
and each MPI process should in turn spawn 8 threads on its "home" machine.

I have a regular MPI version that runs fine with up to 32 MPI processes. My
1-MPI-process + Pthreads version runs fine, and so does a regular 8-thread
version. But once I increase the number of MPI processes to 2, with 8 threads
each (i.e. the matrices are 16x16), I get:

$rank 1 in job 147 .... caused collective abort of all ranks, killed by signal 9

My algorithm is very basic:
the root broadcasts B and sends a strip of rows of A to each non-root; the
non-roots receive their strip of A and take part in the broadcast of B;
then all processes perform the threaded computation on their part of A and B,
place the result in C, and send it back to the root.
I do not use any mutexes with Pthreads, since the writes to the resulting C
are totally independent.
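
The mutex-free threaded step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this post: the names strip_task_t, strip_worker, and multiply_strip are hypothetical, and the row-major layout and the contiguous split of rows among threads are assumed. Each thread owns a disjoint range of rows of the local strip of C, so no two threads write the same element. The function joins every thread before it returns, so the strip of C is complete before anything would send it.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread work description (not from the original post). */
typedef struct {
    const double *a;  /* local strip of A: rows x n, row-major */
    const double *b;  /* full B: n x n, row-major */
    double *c;        /* local strip of C: rows x n, row-major */
    int n;            /* matrix dimension */
    int row_begin;    /* first strip row owned by this thread */
    int row_end;      /* one past the last strip row owned by this thread */
} strip_task_t;

/* Thread body: computes rows [row_begin, row_end) of the local strip of C. */
static void *strip_worker(void *arg)
{
    const strip_task_t *t = arg;
    for (int i = t->row_begin; i < t->row_end; i++) {
        for (int j = 0; j < t->n; j++) {
            double sum = 0.0;
            for (int k = 0; k < t->n; k++)
                sum += t->a[i * t->n + k] * t->b[k * t->n + j];
            t->c[i * t->n + j] = sum;  /* only this thread writes row i */
        }
    }
    return NULL;
}

/* Multiplies a (rows x n) strip of A by B (n x n) into a strip of C using
 * nthreads threads. Returns 0 on success, -1 on failure. */
int multiply_strip(const double *a, const double *b, double *c,
                   int rows, int n, int nthreads)
{
    if (rows <= 0 || n <= 0 || nthreads <= 0)
        return -1;

    pthread_t *tid = malloc((size_t)nthreads * sizeof *tid);
    strip_task_t *task = malloc((size_t)nthreads * sizeof *task);
    if (tid == NULL || task == NULL) {
        free(tid);
        free(task);
        return -1;
    }

    int started = 0;
    int rc = 0;
    for (int t = 0; t < nthreads; t++) {
        /* Contiguous, non-overlapping row ranges; some may be empty. */
        int begin = (int)((long)rows * t / nthreads);
        int end = (int)((long)rows * (t + 1) / nthreads);
        task[t] = (strip_task_t){ a, b, c, n, begin, end };
        if (pthread_create(&tid[t], NULL, strip_worker, &task[t]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }

    /* Join every started thread before the result is used or sent. */
    for (int t = 0; t < started; t++)
        if (pthread_join(tid[t], NULL) != 0)
            rc = -1;

    free(tid);
    free(task);
    return rc;
}
```

With 2 processes and 16x16 matrices, each process would hold an 8x16 strip and could call multiply_strip(a_strip, b, c_strip, 8, 16, 8), giving each thread one row.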

Somehow I get the abort signal on the receive in the root. Another thing is
that the root always finishes first, while the non-root doesn't get to collect
its threads before the abort.

So, my question is: what is the timing involved with MPI_Recv? It seems to
fail if there is no data to receive. I tried using MPI_Irecv followed by a
wait, but there was no change. Another thing: how can I find out what the
error status is? Isn't that what the last argument of MPI_Irecv is?

Thank you in advance,
Olya



