I debugged the error with mpich2-1.2.1p1 and mpd.
It turned out to be very odd: the problem only appeared because some of the nodes had a DNS server address configured.

When I unset the DNS setting on those nodes, it works. Thank you for replying.
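For anyone who runs into the same failure: the underlying issue was that the hosts did not resolve each other's names consistently once DNS was involved. A minimal sketch of the check, assuming the two host names from my setup (the IP addresses below are placeholders, not my real ones):

# run on every node; both hosts should resolve, and to the same addresses everywhere
getent hosts hksbs-s11.com hksbs-s13.com

# entries pinned in /etc/hosts on every node (placeholder addresses),
# so name resolution no longer depends on the DNS server that only some nodes had set:
10.33.15.211   hksbs-s11.com
10.33.15.213   hksbs-s13.com

With the names pinned in /etc/hosts and the DNS setting removed, the run across both nodes completes.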
<DIV style="PADDING-RIGHT: 0px; PADDING-LEFT: 0px; FONT-SIZE: 12px; PADDING-BOTTOM: 2px; PADDING-TOP: 2px; FONT-FAMILY: Arial Narrow">------------------ 原始邮件 ------------------</DIV>
<DIV style="PADDING-RIGHT: 8px; PADDING-LEFT: 8px; FONT-SIZE: 12px; BACKGROUND: #efefef; PADDING-BOTTOM: 8px; PADDING-TOP: 8px">
<DIV id=menu_sender><B>发件人:</B> "Pavan Balaji"<balaji@mcs.anl.gov>;</DIV>
<DIV><B>发送时间:</B> 2011年7月26日(星期二) 晚上9:35</DIV>
<DIV><B>收件人:</B> "mpich-discuss"<mpich-discuss@mcs.anl.gov>; <WBR></DIV>
<DIV><B>抄送:</B> "游手好闲"<66152764@qq.com>; <WBR></DIV>
<DIV><B>主题:</B> Re: [mpich-discuss] Fatal error in PMPI_Bcast: Other MPI error, errorstack:</DIV></DIV>
Did you do the checks listed on this FAQ entry?

http://wiki.mcs.anl.gov/mpich2/index.php/Frequently_Asked_Questions#Q:_My_MPI_program_aborts_with_an_error_saying_it_cannot_communicate_with_other_processes

 -- Pavan

On 07/26/2011 01:55 AM, 游手好闲 wrote:
> Hi,
> My hosts file:
> hksbs-s13.com:8
> hksbs-s11.com:8
> When I run on a single node, it works:
> [root@hksbs-s13 examples_collchk]# mpiexec -f hosts -n 8 ./time_bcast_nochk
> time taken by 1X1 MPI_Bcast() at rank 0 = 0.000005
> time taken by 1X1 MPI_Bcast() at rank 1 = 0.000002
> time taken by 1X1 MPI_Bcast() at rank 2 = 0.000003
> time taken by 1X1 MPI_Bcast() at rank 3 = 0.000002
> time taken by 1X1 MPI_Bcast() at rank 4 = 0.000004
> time taken by 1X1 MPI_Bcast() at rank 5 = 0.000002
> time taken by 1X1 MPI_Bcast() at rank 6 = 0.000003
> time taken by 1X1 MPI_Bcast() at rank 7 = 0.000002
> But when the job spans the other node, it fails:
> [root@hksbs-s13 examples_logging]# mpiexec -f hosts -n 9 ./srtest
> Fatal error in PMPI_Bcast: Other MPI error, error stack:
> PMPI_Bcast(1478)......................: MPI_Bcast(buf=0x16fc2aa8,
> count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed
> MPIR_Bcast_impl(1321).................:
> MPIR_Bcast_intra(1119)................:
> MPIR_Bcast_scatter_ring_allgather(961):
> MPIR_Bcast_binomial(213)..............: Failure during collective
> MPIR_Bcast_scatter_ring_allgather(952):
> MPIR_Bcast_binomial(189)..............:
> MPIC_Send(63).........................:
> MPIDI_EagerContigShortSend(262).......: failure occurred while
> attempting to send an eager message
> MPIDI_CH3_iStartMsg(36)...............: Communication error with rank 8
> When I ssh to the other node, for example:
>
> [root@hksbs-s13 examples_logging]# ssh hksbs-s11.com
> Last login: Tue Jul 26 15:45:22 2011 from 10.33.15.233
> [root@hksbs-s11 ~]#
> it works.
> How can I find the reason?
>
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss@mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji