[mpich-discuss] Problem running MPI code on multiple nodes
Pavan Balaji
balaji at mcs.anl.gov
Mon Nov 28 22:43:35 CST 2011
Based on your other email to mpich-discuss, it looks like cpi works
fine. So it's not a firewall issue.
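(Separately, on the password prompts mentioned later in the thread: mpiexec's ssh-based launcher generally needs passwordless ssh between all the nodes. A typical setup sketch, using the node name n02 from the thread; the key type and paths are just common defaults, not something from this thread:)

```shell
# Sketch: enable passwordless ssh from this node to n02; repeat for each
# pair of nodes that mpiexec will use.
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa   # create a key with an empty passphrase
ssh-copy-id n02                            # install the public key on n02
ssh n02 true && echo "passwordless ssh to n02 works"
```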
Try some other small MPI benchmarks and see if they work fine (google
will be able to find a ton of them for you).
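A minimal gather test along those lines might look like this (a sketch, not one of the standard benchmarks; only the hostnames and the mpiexec line come from the thread):

```c
/* Minimal MPI_Gather sanity test: each rank contributes its rank number
 * and rank 0 gathers them. Build with mpicc and run with, e.g.:
 *   mpiexec -hosts n13,n02 -np 4 ./gather_test
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* The receive buffer is only significant at the root; non-root
     * ranks may legally pass NULL (which is why rbuf=(nil) appears in
     * the error output below without being a bug in itself). */
    int *recv = NULL;
    if (rank == 0)
        recv = malloc(size * sizeof(int));

    MPI_Gather(&rank, 1, MPI_INT, recv, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 0; i < size; i++)
            printf("rank %d reported %d\n", i, recv[i]);
        free(recv);
    }
    MPI_Finalize();
    return 0;
}
```

If this also fails across n13 and n02 with a "Communication error with rank N" stack, the problem is the inter-node connection itself (as in the original error), not the Fortran code.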
-- Pavan
On 11/29/2011 12:40 PM, Pavan Balaji wrote:
>
> Please keep mpich-discuss cc'ed.
>
> My guess is that there's a firewall setup on one or both of the machines.
>
> -- Pavan
>
> On 11/29/2011 05:06 AM, Chavez, Andres wrote:
>> What are some causes for this? Do you think it is because I have to
>> enter a password whenever I move from one node to another?
>>
>> Thank you
>>
>> On Wed, Nov 23, 2011 at 12:06 AM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>>
>>
>> n13 and n02 are not able to communicate with each other.
>>
>> -- Pavan
>>
>>
>> On 11/18/2011 03:59 PM, Chavez, Andres wrote:
>>
>> My Fortran code runs fine when restricted to one node, but when
>> I try to run on multiple nodes, the following error occurs
>> (when restricted to one host the code runs perfectly).
>>
>> Run line:
>>
>> mpiexec -hosts n13,n02 -np 4 ./reg
>>
>> Error:
>> Fatal error in PMPI_Gather: Other MPI error, error stack:
>> PMPI_Gather(863)..........: MPI_Gather(sbuf=0x12cc3e0, scount=5120,
>> MPI_DOUBLE_COMPLEX, rbuf=(nil), rcount=5120, MPI_DOUBLE_COMPLEX, root=0,
>> MPI_COMM_WORLD) failed
>> MPIR_Gather_impl(693).....:
>> MPIR_Gather(655)..........:
>> MPIR_Gather_intra(283)....:
>> MPIC_Send(63).............:
>> MPIDI_EagerContigSend(186): failure occurred while attempting to send an
>> eager message
>> MPIDI_CH3_iStartMsgv(44)..: Communication error with rank 2
>>
>>
>>
>> These are all the instances of MPI_GATHER:
>>
>> call MPI_GATHER(xi_dot_matrix_transp,na*n_elements*nsd/numtasks,MPI_DOUBLE_COMPLEX,xi_dot_matrix_gath,&
>>      na*n_elements*nsd/numtasks,MPI_DOUBLE_COMPLEX,0,MPI_COMM_WORLD,ierr)
>> call MPI_GATHER(Matrix_A_hat_3d_transp,5*na*size_matrix*nsd/numtasks,MPI_DOUBLE_COMPLEX,&
>>      Matrix_A_hat_3d_gath,5*na*size_matrix*nsd/numtasks,MPI_DOUBLE_COMPLEX,0,MPI_COMM_WORLD,ierr)
>> call MPI_GATHER(JR_matrix_transp,5*na*size_matrix*nsd/numtasks,MPI_INTEGER,JR_matrix_gath,&
>>      5*na*size_matrix*nsd/numtasks,MPI_INTEGER,0,MPI_COMM_WORLD,ierr)
>> call MPI_GATHER(JC_matrix_transp,5*na*size_matrix*nsd/numtasks,MPI_INTEGER,JC_matrix_gath,&
>>      5*na*size_matrix*nsd/numtasks,MPI_INTEGER,0,MPI_COMM_WORLD,ierr)
>>
>>
>> Any help is greatly appreciated.
>>
>> Thank you
>>
>>
>> _______________________________________________
>> mpich-discuss mailing list
>> mpich-discuss at mcs.anl.gov
>> To manage subscription options or unsubscribe:
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>
>>
>> --
>> Pavan Balaji
>> http://www.mcs.anl.gov/~balaji
>>
>>
>
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji