[mpich-discuss] Problem running MPI code on multiple nodes

Pavan Balaji balaji at mcs.anl.gov
Mon Nov 28 22:43:35 CST 2011


Based on your other email to mpich-discuss, it looks like cpi works 
fine. So it's not a firewall issue.

Try some other small MPI benchmarks and see if they work fine (google 
will be able to find a ton of them for you).
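
For instance, a minimal Fortran test along the lines below should exercise
the same MPI_Gather path as your code (this is only an untested sketch; the
program name is a placeholder and the 5120-element double-complex buffer just
mirrors the scount in your error stack):

    program gather_test
      use mpi
      implicit none
      integer, parameter :: n = 5120          ! mirrors scount from the error output
      integer :: ierr, rank, ntasks
      double complex :: sbuf(n)
      double complex, allocatable :: rbuf(:)

      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, ntasks, ierr)

      sbuf = dcmplx(rank, 0)
      allocate(rbuf(n*ntasks))                ! only the root needs this, but it's cheap

      call MPI_GATHER(sbuf, n, MPI_DOUBLE_COMPLEX, rbuf, n, MPI_DOUBLE_COMPLEX, &
                      0, MPI_COMM_WORLD, ierr)

      if (rank == 0) print *, 'gather ok, last element = ', rbuf(n*ntasks)

      call MPI_FINALIZE(ierr)
    end program gather_test

Building it with something like "mpif90 gather_test.f90 -o gather_test" and
running "mpiexec -hosts n13,n02 -np 4 ./gather_test" should tell you whether
the gather itself fails across those two nodes.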

  -- Pavan

On 11/29/2011 12:40 PM, Pavan Balaji wrote:
>
> Please keep mpich-discuss cc'ed.
>
> My guess is that there's a firewall setup on one or both of the machines.
>
>    -- Pavan
>
> On 11/29/2011 05:06 AM, Chavez, Andres wrote:
>> What are some causes for this? Do you think it is because I have to
>> enter a password whenever I move from one node to another?
>>
>> Thank you
>>
>> On Wed, Nov 23, 2011 at 12:06 AM, Pavan Balaji<balaji at mcs.anl.gov
>> <mailto:balaji at mcs.anl.gov>>  wrote:
>>
>>
>>      n13 and n02 are not able to communicate with each other.
>>
>>        -- Pavan
>>
>>
>>      On 11/18/2011 03:59 PM, Chavez, Andres wrote:
>>
>>          My Fortran code runs fine when restricted to one node, but when
>>          I try to run on multiple nodes, the following error occurs
>>          (when restricted to one host the code runs perfectly).
>>
>>          Run line:
>>
>>          mpiexec -hosts n13,n02 -np 4 ./reg
>>
>>          Error:
>>          Fatal error in PMPI_Gather: Other MPI error, error stack:
>>
>>          PMPI_Gather(863)..........: MPI_Gather(sbuf=0x12cc3e0, scount=5120,
>>          MPI_DOUBLE_COMPLEX, rbuf=(nil), rcount=5120, MPI_DOUBLE_COMPLEX,
>>          root=0,
>>          MPI_COMM_WORLD) failed
>>          MPIR_Gather_impl(693).....:
>>          MPIR_Gather(655)..........:
>>          MPIR_Gather_intra(283)....:
>>          MPIC_Send(63).............:
>>          MPIDI_EagerContigSend(186): failure occurred while attempting to
>>          send an
>>          eager message
>>          MPIDI_CH3_iStartMsgv(44)..: Communication error with rank 2
>>
>>
>>
>>          These are all the instances of MPI_GATHER
>>
>>          call MPI_GATHER(xi_dot_matrix_transp,na*n_elements*nsd/numtasks,MPI_DOUBLE_COMPLEX,xi_dot_matrix_gath,&
>>               na*n_elements*nsd/numtasks,MPI_DOUBLE_COMPLEX,0,MPI_COMM_WORLD,ierr)
>>          call MPI_GATHER(Matrix_A_hat_3d_transp,5*na*size_matrix*nsd/numtasks,MPI_DOUBLE_COMPLEX,&
>>               Matrix_A_hat_3d_gath,5*na*size_matrix*nsd/numtasks,MPI_DOUBLE_COMPLEX,0,MPI_COMM_WORLD,ierr)
>>          call MPI_GATHER(JR_matrix_transp,5*na*size_matrix*nsd/numtasks,MPI_INTEGER,JR_matrix_gath,&
>>               5*na*size_matrix*nsd/numtasks,MPI_INTEGER,0,MPI_COMM_WORLD,ierr)
>>          call MPI_GATHER(JC_matrix_transp,5*na*size_matrix*nsd/numtasks,MPI_INTEGER,JC_matrix_gath,&
>>               5*na*size_matrix*nsd/numtasks,MPI_INTEGER,0,MPI_COMM_WORLD,ierr)
>>
>>
>>          Any help is greatly appreciated.
>>
>>          Thank you
>>
>>
>>
>>
>>      --
>>      Pavan Balaji
>>      http://www.mcs.anl.gov/~balaji
>>
>>
>

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

