[mpich-discuss] cryptic (to me) error

Rajeev Thakur thakur at mcs.anl.gov
Wed Aug 4 11:44:37 CDT 2010


Not cpilog. Can you run just cpi from the mpich2/examples directory?

Rajeev


On Aug 4, 2010, at 11:37 AM, SULLIVAN David (AREVA) wrote:

> Rajeev, Darius,
> 
> Thanks for your response.
> cpi yields the following:
> 
> [dfs at aramis examples_logging]$ mpiexec -host aramis -n 12 ./cpilog
> Process 0 running on aramis
> Process 2 running on aramis
> Process 3 running on aramis
> Process 1 running on aramis
> Process 6 running on aramis
> Process 7 running on aramis
> Process 8 running on aramis
> Process 4 running on aramis
> Process 5 running on aramis
> Process 9 running on aramis
> Process 10 running on aramis
> Process 11 running on aramis
> pi is approximately 3.1415926535898762, Error is 0.0000000000000830
> wall clock time = 0.058131
> Writing logfile....
> Enabling the Default clock synchronization...
> clog_merger.c:CLOG_Merger_init() -
>        Could not open file ./cpilog.clog2 for merging!
> Backtrace of the callstack at rank 0:
>        At [0]: ./cpilog(CLOG_Util_abort+0x92)[0x456326]
>        At [1]: ./cpilog(CLOG_Merger_init+0x11f)[0x45db7c]
>        At [2]: ./cpilog(CLOG_Converge_init+0x8e)[0x45a691]
>        At [3]: ./cpilog(MPE_Finish_log+0xea)[0x4560aa]
>        At [4]: ./cpilog(MPI_Finalize+0x50c)[0x4268af]
>        At [5]: ./cpilog(main+0x428)[0x415963]
>        At [6]: /lib64/libc.so.6(__libc_start_main+0xf4)[0x3c1881d994]
>        At [7]: ./cpilog[0x415449]
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
> APPLICATION TERMINATED WITH THE EXIT STRING: Terminated (signal 15) 
> 
> So it looks like it works, with some issues.
> 
> When does it fail? Immediately.
> 
> Is there a bug? Many people successfully use the application (MCNP5, from
> LANL) with MPI, so I think a bug there is unlikely.
> 
> Core files, unfortunately, reveal some ignorance on my part. Where
> exactly should I be looking for them?
> 
> Thanks again,
> 
> Dave
> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Darius Buntinas
> Sent: Wednesday, August 04, 2010 12:19 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] cryptic (to me) error
> 
> 
> This error message says that two processes terminated because they were
> unable to communicate with another (or two other) process.  It's
> possible that another process died, so the others got errors trying to
> communicate with them.  It's also possible that there is something
> preventing some processes from communicating with each other.
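One way to test the second possibility (something blocking traffic between processes) is a plain TCP probe outside of MPI. The following is only a minimal sketch: the helper name can_connect is made up for illustration, and the demonstration runs entirely on the local machine so that it is self-contained. To check two nodes such as athos and aramis, you would start a listener on one and call the probe from the other, using a port of your choosing.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False

# Local demonstration: listen on an ephemeral port, then probe it.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))      # port 0 lets the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

reachable = can_connect("127.0.0.1", port)
listener.close()
print("listener reachable:", reachable)
```

If a probe like this succeeds on the ssh port but fails on arbitrary high ports between two nodes, a firewall rule is a likely suspect, since the TCP channel does not restrict itself to the ssh port.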
> 
> Are you able to run cpi from the examples directory with 12 processes?
> 
> At what point in your code does this fail?  Are there any other
> communication operations before the MPI_Comm_dup?
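The question about earlier communication matters because the error stack shows MPI_Comm_dup itself calling MPI_Allreduce with MPI_BAND over 64 ints inside MPIR_Get_contextid. A plausible reading is that every process contributes a bitmask of the context ids it still has free, and the bitwise AND leaves only the ids that are free everywhere. The sketch below simulates that idea in plain Python without MPI. The 32-bit word size, the bit ordering, and the "pick the lowest common bit" rule are assumptions for illustration, not MPICH's actual implementation.

```python
from functools import reduce

MASK_WORDS = 64   # matches count=64 in the error stack
WORD_BITS = 32    # assumption: 32-bit MPI_INT

def allreduce_band(masks):
    """Element-wise bitwise AND across every rank's mask (what MPI_BAND computes)."""
    return [reduce(lambda a, b: a & b, words) for words in zip(*masks)]

def lowest_common_id(combined):
    """Index of the first bit still set in the combined mask, or None."""
    for word_index, word in enumerate(combined):
        if word:
            bit = (word & -word).bit_length() - 1   # position of lowest set bit
            return word_index * WORD_BITS + bit
    return None

full = (1 << WORD_BITS) - 1
rank0 = [full] * MASK_WORDS
rank1 = [full] * MASK_WORDS
rank2 = [full] * MASK_WORDS
rank0[0] &= ~0b0011   # rank 0 already uses ids 0 and 1
rank1[0] &= ~0b0101   # rank 1 already uses ids 0 and 2
rank2[0] &= ~0b0001   # rank 2 already uses id 0

combined = allreduce_band([rank0, rank1, rank2])
chosen = lowest_common_id(combined)
print("first id free on all ranks:", chosen)
```

The point of the sketch is that the reduction needs a contribution from every rank, so a single unreachable or dead process is enough to make MPI_Comm_dup fail on the others, even if it is the first MPI call the application makes after initialization.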
> 
> Enable core files (add "ulimit -c unlimited" to your .bashrc or .tcshrc)
> then run your app and look for core files.  If there is a bug in your
> application that causes a process to die this might tell you which one
> and why.
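To confirm that the limit really is lifted in the environment where the job runs, the same setting can be inspected and raised from a script. This is a rough sketch using Python's standard resource module; it only affects the current process and anything it launches, and the function names are made up for illustration.

```python
import resource

def enable_core_dumps():
    """Raise the soft core-file size limit to the hard limit for this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)

def describe(limit):
    """Render a limit value the way 'ulimit -c' would."""
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit} bytes"

soft, hard = enable_core_dumps()
print("core soft limit:", describe(soft))
print("core hard limit:", describe(hard))
```

If the hard limit itself is 0, core files stay disabled until an administrator raises it. On Linux, the location and name of a core file are controlled by /proc/sys/kernel/core_pattern; with the traditional default it appears in the working directory of the process that died, named core or core.<pid>.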
> 
> Let us know how this goes.
> 
> -d
> 
> 
> On Aug 4, 2010, at 11:03 AM, SULLIVAN David (AREVA) wrote:
> 
>> Since I have had no responses, is there any additional information I
>> could provide to solicit some direction for overcoming this latest
>> string of MPI errors?
>> 
>> Thanks,
>> 
>> Dave
>> 
>> From: mpich-discuss-bounces at mcs.anl.gov
>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of SULLIVAN David
>> F (AREVA NP INC)
>> Sent: Friday, July 23, 2010 4:29 PM
>> To: mpich-discuss at mcs.anl.gov
>> Subject: [mpich-discuss] cryptic (to me) error
>> 
>> With my firewall issues firmly behind me, I have a new problem for the
>> collective wisdom. I am attempting to run a program to which the
>> response is as follows:
>> 
>> [mcnp5_1-4 at athos ~]$ mpiexec -f nodes -n 12 mcnp5.mpi i=TN04 o=TN04.o
>> Fatal error in MPI_Comm_dup: Other MPI error, error stack:
>> MPI_Comm_dup(168).................: MPI_Comm_dup(MPI_COMM_WORLD, new_comm=0x7fff58edb450) failed
>> MPIR_Comm_copy(923)...............:
>> MPIR_Get_contextid(639)...........:
>> MPI_Allreduce(773)................: MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x7fff58edb1a0, count=64, MPI_INT, MPI_BAND, MPI_COMM_WORLD) failed
>> MPIR_Allreduce(228)...............:
>> MPIC_Send(41).....................:
>> MPIC_Wait(513)....................:
>> MPIDI_CH3I_Progress(150)..........:
>> MPID_nem_mpich2_blocking_recv(933):
>> MPID_nem_tcp_connpoll(1709).......: Communication error
>> Fatal error in MPI_Comm_dup: Other MPI error, error stack:
>> MPI_Comm_dup(168).................: MPI_Comm_dup(MPI_COMM_WORLD, new_comm=0x7fff97dca620) failed
>> MPIR_Comm_copy(923)...............:
>> MPIR_Get_contextid(639)...........:
>> MPI_Allreduce(773)................: MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x7fff97dca370, count=64, MPI_INT, MPI_BAND, MPI_COMM_WORLD) failed
>> MPIR_Allreduce(289)...............:
>> MPIC_Sendrecv(161)................:
>> MPIC_Wait(513)....................:
>> MPIDI_CH3I_Progress(150)..........:
>> MPID_nem_mpich2_blocking_recv(948):
>> MPID_nem_tcp_connpoll(1709).......: Communication error
>> Killed by signal 2.
>> Ctrl-C caught... cleaning up processes
>> [mpiexec at athos] HYDT_dmx_deregister_fd (./tools/demux/demux.c:142): could not find fd to deregister: -2
>> [mpiexec at athos] HYD_pmcd_pmiserv_cleanup (./pm/pmiserv/pmiserv_cb.c:401): error deregistering fd
>> [press Ctrl-C again to force abort]
>> APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
>> [mcnp5_1-4 at athos ~]$
>> 
>> Any ideas?
>> 
>> Thanks in advance,
>> 
>> David Sullivan
>> 
>> 
>> 
>> AREVA NP INC
>> 400 Donald Lynch Boulevard
>> Marlborough, MA, 01752
>> Phone: (508) 573-6721
>> Fax: (434) 382-5597
>> David.Sullivan at AREVA.com
>> 
>> _______________________________________________
>> mpich-discuss mailing list
>> mpich-discuss at mcs.anl.gov
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
> 


