[mpich-discuss] Problem running MPI code on multiple nodes

Chavez, Andres andres.chavez.53 at my.csun.edu
Fri Nov 18 15:59:11 CST 2011


My Fortran code runs fine when restricted to one node, but when I try to
run on multiple nodes, the following error occurs. (When restricted to
one host the code runs perfectly.)

Run line:
mpiexec -hosts n13,n02 -np 4 ./reg

Error:
Fatal error in PMPI_Gather: Other MPI error, error stack:
PMPI_Gather(863)..........: MPI_Gather(sbuf=0x12cc3e0, scount=5120,
MPI_DOUBLE_COMPLEX, rbuf=(nil), rcount=5120, MPI_DOUBLE_COMPLEX, root=0,
MPI_COMM_WORLD) failed
MPIR_Gather_impl(693).....:
MPIR_Gather(655)..........:
MPIR_Gather_intra(283)....:
MPIC_Send(63).............:
MPIDI_EagerContigSend(186): failure occurred while attempting to send an
eager message
MPIDI_CH3_iStartMsgv(44)..: Communication error with rank 2


These are all the instances of MPI_GATHER in the code:

call MPI_GATHER(xi_dot_matrix_transp, na*n_elements*nsd/numtasks, MPI_DOUBLE_COMPLEX, &
     xi_dot_matrix_gath, na*n_elements*nsd/numtasks, MPI_DOUBLE_COMPLEX, 0, MPI_COMM_WORLD, ierr)
call MPI_GATHER(Matrix_A_hat_3d_transp, 5*na*size_matrix*nsd/numtasks, MPI_DOUBLE_COMPLEX, &
     Matrix_A_hat_3d_gath, 5*na*size_matrix*nsd/numtasks, MPI_DOUBLE_COMPLEX, 0, MPI_COMM_WORLD, ierr)
call MPI_GATHER(JR_matrix_transp, 5*na*size_matrix*nsd/numtasks, MPI_INTEGER, JR_matrix_gath, &
     5*na*size_matrix*nsd/numtasks, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
call MPI_GATHER(JC_matrix_transp, 5*na*size_matrix*nsd/numtasks, MPI_INTEGER, JC_matrix_gath, &
     5*na*size_matrix*nsd/numtasks, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
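In case it helps to isolate the problem, below is a minimal stand-alone gather test (a sketch only; the program name, buffer contents, and the count of 5120 are illustrative, the count taken from the error stack rather than from the real code). If this small program also fails when launched with the same run line across n13 and n02, the problem is likely node-to-node communication rather than the gather arguments themselves.

```
! Minimal MPI_Gather sketch: each rank sends a block of
! MPI_DOUBLE_COMPLEX values to root (illustrative sizes only).
program gather_test
  use mpi
  implicit none
  integer, parameter :: dp = kind(0.d0)
  integer, parameter :: scount = 5120
  integer :: ierr, rank, numtasks
  complex(dp) :: sbuf(scount)
  complex(dp), allocatable :: rbuf(:)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)

  sbuf = cmplx(rank, 0, kind=dp)

  ! The receive buffer is only significant at root, but allocating it
  ! on every rank is harmless and rules out null-pointer surprises.
  allocate(rbuf(scount*numtasks))

  call MPI_GATHER(sbuf, scount, MPI_DOUBLE_COMPLEX, &
                  rbuf, scount, MPI_DOUBLE_COMPLEX, 0, MPI_COMM_WORLD, ierr)

  if (rank == 0) print *, 'gather completed, ierr =', ierr

  deallocate(rbuf)
  call MPI_FINALIZE(ierr)
end program gather_test
```

Run it the same way as the failing job, e.g. mpiexec -hosts n13,n02 -np 4 ./gather_test.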

Any help is greatly appreciated.

Thank you
