[mpich-discuss] Problem running MPI code on multiple nodes
Pavan Balaji
balaji at mcs.anl.gov
Mon Nov 28 22:40:17 CST 2011
Please keep mpich-discuss cc'ed.
My guess is that there's a firewall setup on one or both of the machines.
-- Pavan
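(Not part of the original exchange: a sketch of how one might check for and work around a firewall between the nodes, assuming a Linux cluster with iptables and an MPICH2 build that honors MPICH_PORT_RANGE; the 50000:51000 range and the node names are illustrative.)

```shell
# On each node, list firewall rules to see whether TCP traffic is filtered
# (needs root).
iptables -L -n

# Basic connectivity check: each node must reach the other by hostname.
ping -c 1 n02    # run from n13
ping -c 1 n13    # run from n02

# MPICH can be pinned to a fixed port range, which can then be opened in
# the firewall on every node (range is an arbitrary example).
export MPICH_PORT_RANGE=50000:51000
iptables -A INPUT -p tcp --dport 50000:51000 -j ACCEPT
```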
On 11/29/2011 05:06 AM, Chavez, Andres wrote:
> What are some causes for this? Do you think it is because I have to
> enter a password whenever I move from one node to another?
>
> Thank you
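(An aside not in the original thread: a password prompt on every hop usually means passwordless ssh is not set up between the nodes, which mpiexec generally requires; a minimal sketch assuming OpenSSH, with the node names taken from the run line below.)

```shell
# Generate a key pair once, with no passphrase so mpiexec can log in
# non-interactively.
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Install the public key on each compute node.
ssh-copy-id n02
ssh-copy-id n13

# Verify: this should print the remote hostname without asking for a password.
ssh n02 hostname
```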
>
> On Wed, Nov 23, 2011 at 12:06 AM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>
>
> n13 and n02 are not able to communicate with each other.
>
> -- Pavan
>
>
> On 11/18/2011 03:59 PM, Chavez, Andres wrote:
>
> My Fortran code runs fine when restricted to one node, but when I try
> to run it on multiple nodes, the following error occurs (when
> restricted to one host the code runs perfectly).
>
> Run line:
>
> mpiexec -hosts n13,n02 -np 4 ./reg
>
> Error:
> Fatal error in PMPI_Gather: Other MPI error, error stack:
>
> PMPI_Gather(863)..........: MPI_Gather(sbuf=0x12cc3e0, scount=5120,
> MPI_DOUBLE_COMPLEX, rbuf=(nil), rcount=5120, MPI_DOUBLE_COMPLEX,
> root=0, MPI_COMM_WORLD) failed
> MPIR_Gather_impl(693).....:
> MPIR_Gather(655)..........:
> MPIR_Gather_intra(283)....:
> MPIC_Send(63).............:
> MPIDI_EagerContigSend(186): failure occurred while attempting to send
> an eager message
> MPIDI_CH3_iStartMsgv(44)..: Communication error with rank 2
>
>
>
These are all the instances of MPI_GATHER:
>
> call MPI_GATHER(xi_dot_matrix_transp,na*n_elements*nsd/numtasks,MPI_DOUBLE_COMPLEX,xi_dot_matrix_gath,&
>      na*n_elements*nsd/numtasks,MPI_DOUBLE_COMPLEX,0,MPI_COMM_WORLD,ierr)
> call MPI_GATHER(Matrix_A_hat_3d_transp,5*na*size_matrix*nsd/numtasks,MPI_DOUBLE_COMPLEX,&
>      Matrix_A_hat_3d_gath,5*na*size_matrix*nsd/numtasks,MPI_DOUBLE_COMPLEX,0,MPI_COMM_WORLD,ierr)
> call MPI_GATHER(JR_matrix_transp,5*na*size_matrix*nsd/numtasks,MPI_INTEGER,JR_matrix_gath,&
>      5*na*size_matrix*nsd/numtasks,MPI_INTEGER,0,MPI_COMM_WORLD,ierr)
> call MPI_GATHER(JC_matrix_transp,5*na*size_matrix*nsd/numtasks,MPI_INTEGER,JC_matrix_gath,&
>      5*na*size_matrix*nsd/numtasks,MPI_INTEGER,0,MPI_COMM_WORLD,ierr)
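(Not from the original code: one way to separate the application from the transport is a minimal standalone gather test. The sketch below uses arbitrary buffer sizes; it can be built with mpif90 and launched with the same -hosts line — if it fails with the same error stack, the problem is the node-to-node communication, not the application.)

```fortran
program gather_test
  implicit none
  include 'mpif.h'
  integer :: ierr, rank, ntasks
  double complex :: sbuf(4)
  double complex, allocatable :: rbuf(:)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, ntasks, ierr)

  sbuf = dcmplx(rank, 0)
  ! The receive buffer is significant only at the root, but it must be
  ! allocated there (rbuf=(nil) at root would be an error on its own).
  if (rank == 0) allocate(rbuf(4*ntasks))

  call MPI_GATHER(sbuf, 4, MPI_DOUBLE_COMPLEX, rbuf, 4, &
                  MPI_DOUBLE_COMPLEX, 0, MPI_COMM_WORLD, ierr)

  if (rank == 0) print *, 'gather ok, ntasks = ', ntasks
  call MPI_FINALIZE(ierr)
end program gather_test
```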
>
>
> Any help is greatly appreciated.
>
> Thank you
>
>
> _______________________________________________
> mpich-discuss mailing list    mpich-discuss at mcs.anl.gov
> To manage subscription options or unsubscribe:
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>
>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
>
>
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji