[mpich-discuss] Problem running MPI code on multiple nodes

Pavan Balaji balaji at mcs.anl.gov
Mon Nov 28 22:40:17 CST 2011


Please keep mpich-discuss cc'ed.

My guess is that there's a firewall setup on one or both of the machines.
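
A quick way to test this, assuming your hostnames resolve the same way as in
your run line, is to first launch a non-MPI program across both nodes and
then a trivial MPI program (e.g., the cpi example that ships in MPICH's
examples/ directory; path here assumed relative to your build tree):

   mpiexec -hosts n13,n02 -np 2 hostname
   mpiexec -hosts n13,n02 -np 2 ./examples/cpi

If the first command works but the second hangs or fails, the ssh-based
launch is fine and the MPI communication ports are likely being blocked by a
firewall. The fact that your job starts and only fails when the processes
try to talk to each other points the same way.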

  -- Pavan

On 11/29/2011 05:06 AM, Chavez, Andres wrote:
> What are some causes for this? Do you think it is because I have to
> enter a password whenever I move from one node to another?
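>
> (If it helps, I could set up key-based ssh so that no password is needed
> when moving between the nodes, e.g. with something like
>
>     ssh-keygen -t rsa
>     ssh-copy-id n02
>
> assuming ssh-copy-id is available on these machines.)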
>
> Thank you
>
> On Wed, Nov 23, 2011 at 12:06 AM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>
>
>     n13 and n02 are not able to communicate with each other.
>
>       -- Pavan
>
>
>     On 11/18/2011 03:59 PM, Chavez, Andres wrote:
>
>         My Fortran code runs fine when restricted to one node, but when
>         I try to run on multiple nodes, the following error occurs
>         (when restricted to one host the code runs perfectly).
>
>         Run line:
>
>         mpiexec -hosts n13,n02 -np 4 ./reg
>
>         Error:
>
>         Fatal error in PMPI_Gather: Other MPI error, error stack:
>         PMPI_Gather(863)..........: MPI_Gather(sbuf=0x12cc3e0, scount=5120,
>         MPI_DOUBLE_COMPLEX, rbuf=(nil), rcount=5120, MPI_DOUBLE_COMPLEX,
>         root=0, MPI_COMM_WORLD) failed
>         MPIR_Gather_impl(693).....:
>         MPIR_Gather(655)..........:
>         MPIR_Gather_intra(283)....:
>         MPIC_Send(63).............:
>         MPIDI_EagerContigSend(186): failure occurred while attempting to
>         send an eager message
>         MPIDI_CH3_iStartMsgv(44)..: Communication error with rank 2
>
>
>
>         These are all the instances of MPI_GATHER:
>
>         call MPI_GATHER(xi_dot_matrix_transp,na*n_elements*nsd/numtasks,MPI_DOUBLE_COMPLEX,xi_dot_matrix_gath,&
>              na*n_elements*nsd/numtasks,MPI_DOUBLE_COMPLEX,0,MPI_COMM_WORLD,ierr)
>
>         call MPI_GATHER(Matrix_A_hat_3d_transp,5*na*size_matrix*nsd/numtasks,MPI_DOUBLE_COMPLEX,&
>              Matrix_A_hat_3d_gath,5*na*size_matrix*nsd/numtasks,MPI_DOUBLE_COMPLEX,0,MPI_COMM_WORLD,ierr)
>
>         call MPI_GATHER(JR_matrix_transp,5*na*size_matrix*nsd/numtasks,MPI_INTEGER,JR_matrix_gath,&
>              5*na*size_matrix*nsd/numtasks,MPI_INTEGER,0,MPI_COMM_WORLD,ierr)
>
>         call MPI_GATHER(JC_matrix_transp,5*na*size_matrix*nsd/numtasks,MPI_INTEGER,JC_matrix_gath,&
>              5*na*size_matrix*nsd/numtasks,MPI_INTEGER,0,MPI_COMM_WORLD,ierr)
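>
>         For reference, here is a minimal gather-only test I can run
>         across the two nodes to check whether MPI_GATHER itself works
>         between them (a sketch: the count 5120 is taken from the error
>         trace, and the real arrays are replaced by a dummy buffer):
>
>         program gather_test
>           use mpi
>           implicit none
>           integer, parameter :: n = 5120   ! per-rank count, as in the error
>           integer :: ierr, rank, numtasks
>           double complex :: sbuf(n)
>           double complex, allocatable :: rbuf(:)
>
>           call MPI_INIT(ierr)
>           call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
>           call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)
>
>           sbuf = dcmplx(rank, 0)
>           if (rank == 0) then
>             allocate(rbuf(n*numtasks))     ! only the root receives
>           else
>             allocate(rbuf(1))              ! dummy; ignored on non-root ranks
>           end if
>
>           call MPI_GATHER(sbuf, n, MPI_DOUBLE_COMPLEX, rbuf, n, &
>                           MPI_DOUBLE_COMPLEX, 0, MPI_COMM_WORLD, ierr)
>
>           if (rank == 0) print *, 'gather ok, rbuf(1) = ', rbuf(1)
>           call MPI_FINALIZE(ierr)
>         end program gather_test
>
>         run with:
>
>         mpiexec -hosts n13,n02 -np 4 ./gather_test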
>
>
>         Any help is greatly appreciated.
>
>         Thank you
>
>
>
>
>     --
>     Pavan Balaji
>     http://www.mcs.anl.gov/~balaji
>
>

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

