[mpich-discuss] cryptic (to me) error
Rajeev Thakur
thakur at mcs.anl.gov
Wed Aug 4 11:44:37 CDT 2010
Not cpilog. Can you run just cpi from the mpich2/examples directory?
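For example (the path below is only illustrative; use wherever your MPICH2
build's examples directory actually lives):

    cd /path/to/mpich2/examples
    mpiexec -host aramis -n 12 ./cpi

cpi does the same pi computation without MPE logging, so it separates plain
MPI communication from the logfile merge that cpilog runs inside
MPI_Finalize.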
Rajeev
On Aug 4, 2010, at 11:37 AM, SULLIVAN David (AREVA) wrote:
> Rajeev, Darius,
>
> Thanks for your response.
> cpi yields the following:
>
> [dfs at aramis examples_logging]$ mpiexec -host aramis -n 12 ./cpilog
> Process 0 running on aramis
> Process 2 running on aramis
> Process 3 running on aramis
> Process 1 running on aramis
> Process 6 running on aramis
> Process 7 running on aramis
> Process 8 running on aramis
> Process 4 running on aramis
> Process 5 running on aramis
> Process 9 running on aramis
> Process 10 running on aramis
> Process 11 running on aramis
> pi is approximately 3.1415926535898762, Error is 0.0000000000000830
> wall clock time = 0.058131
> Writing logfile....
> Enabling the Default clock synchronization...
> clog_merger.c:CLOG_Merger_init() - Could not open file ./cpilog.clog2 for merging!
> Backtrace of the callstack at rank 0:
> At [0]: ./cpilog(CLOG_Util_abort+0x92)[0x456326]
> At [1]: ./cpilog(CLOG_Merger_init+0x11f)[0x45db7c]
> At [2]: ./cpilog(CLOG_Converge_init+0x8e)[0x45a691]
> At [3]: ./cpilog(MPE_Finish_log+0xea)[0x4560aa]
> At [4]: ./cpilog(MPI_Finalize+0x50c)[0x4268af]
> At [5]: ./cpilog(main+0x428)[0x415963]
> At [6]: /lib64/libc.so.6(__libc_start_main+0xf4)[0x3c1881d994]
> At [7]: ./cpilog[0x415449]
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
> APPLICATION TERMINATED WITH THE EXIT STRING: Terminated (signal 15)
>
> So it looks like the computation itself works, but the run aborts while
> writing the logfile.
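>
> From the backtrace, the abort looks like MPE's log merge failing to
> create ./cpilog.clog2 in the working directory, rather than an MPI
> communication problem. If that is right, something like the following
> might avoid it (a sketch: the /tmp paths are only examples,
> MPE_LOGFILE_PREFIX only helps if this MPE build honors it, and
> mpiexec's -env/-genv options may be needed to propagate the variable):
>
>     # run from a directory the node can write to
>     cd /tmp && mpiexec -host aramis -n 12 /path/to/cpilog
>
>     # or point MPE's merged logfile at a writable prefix
>     MPE_LOGFILE_PREFIX=/tmp/cpilog mpiexec -host aramis -n 12 ./cpilog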
>
> When does it fail? Immediately.
>
> Is there a bug? Many people successfully use the application (MCNP5, from
> LANL) with MPI, so I think a bug there is unlikely.
>
> Core files, unfortunately, reveal some ignorance on my part. Where
> exactly should I be looking for them?
>
> Thanks again,
>
> Dave
> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Darius Buntinas
> Sent: Wednesday, August 04, 2010 12:19 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] cryptic (to me) error
>
>
> This error message says that two processes terminated because they were
> unable to communicate with one other process (or two other processes).
> It's possible that a process died and the others then got errors when
> they tried to communicate with it. It's also possible that something is
> preventing some of the processes from communicating with each other.
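>
> One quick thing to check along those lines (the loopback issue is a
> common MPICH2/nemesis pitfall, not something your output proves; the
> host names are taken from your command lines): make sure each node
> resolves the other nodes to a routable address, not to 127.0.0.1.
>
>     # run on each node
>     getent hosts athos aramis
>     ping -c 1 aramis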
>
> Are you able to run cpi from the examples directory with 12 processes?
>
> At what point in your code does this fail? Are there any other
> communication operations before the MPI_Comm_dup?
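>
> If it helps to isolate the failure, here is a minimal reproducer that
> does nothing but the dup (a sketch; dup_test is a made-up name, and
> mpicc/mpiexec are assumed to be the ones from your MPICH2 install):
>
> cat > dup_test.c <<'EOF'
> #include <mpi.h>
> #include <stdio.h>
> int main(int argc, char **argv)
> {
>     MPI_Comm dup;
>     int rank;
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     /* the same call that fails in MCNP5 */
>     MPI_Comm_dup(MPI_COMM_WORLD, &dup);
>     printf("rank %d: dup ok\n", rank);
>     MPI_Comm_free(&dup);
>     MPI_Finalize();
>     return 0;
> }
> EOF
> mpicc dup_test.c -o dup_test
> mpiexec -f nodes -n 12 ./dup_test
>
> If this fails the same way, the problem is in the setup rather than in
> MCNP5.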
>
> Enable core files (add "ulimit -c unlimited" to your .bashrc, or "limit
> coredumpsize unlimited" to your .tcshrc)
> then run your app and look for core files. If there is a bug in your
> application that causes a process to die this might tell you which one
> and why.
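>
> For example (core file names and locations vary by distro; "core.<pid>"
> in the working directory is just the common default, and the gdb step
> assumes mcnp5.mpi was built with symbols):
>
>     ulimit -c unlimited
>     mpiexec -f nodes -n 12 mcnp5.mpi i=TN04 o=TN04.o
>     ls core*                  # check the working directory on each node
>     gdb mcnp5.mpi core.<pid>  # then "bt" prints the backtrace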
>
> Let us know how this goes.
>
> -d
>
>
> On Aug 4, 2010, at 11:03 AM, SULLIVAN David (AREVA) wrote:
>
>> Since I have had no responses: is there any additional information I
>> could provide to solicit some direction for overcoming this latest
>> string of MPI errors?
>>
>> Thanks,
>>
>> Dave
>>
>> From: mpich-discuss-bounces at mcs.anl.gov
>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of SULLIVAN David F (AREVA NP INC)
>> Sent: Friday, July 23, 2010 4:29 PM
>> To: mpich-discuss at mcs.anl.gov
>> Subject: [mpich-discuss] cryptic (to me) error
>>
>> With my firewall issues firmly behind me, I have a new problem for the
>> collective wisdom. I am attempting to run a program, which fails as
>> follows:
>>
>> [mcnp5_1-4 at athos ~]$ mpiexec -f nodes -n 12 mcnp5.mpi i=TN04 o=TN04.o
>> Fatal error in MPI_Comm_dup: Other MPI error, error stack:
>> MPI_Comm_dup(168).................: MPI_Comm_dup(MPI_COMM_WORLD, new_comm=0x7fff58edb450) failed
>> MPIR_Comm_copy(923)...............:
>> MPIR_Get_contextid(639)...........:
>> MPI_Allreduce(773)................: MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x7fff58edb1a0, count=64, MPI_INT, MPI_BAND, MPI_COMM_WORLD) failed
>> MPIR_Allreduce(228)...............:
>> MPIC_Send(41).....................:
>> MPIC_Wait(513)....................:
>> MPIDI_CH3I_Progress(150)..........:
>> MPID_nem_mpich2_blocking_recv(933):
>> MPID_nem_tcp_connpoll(1709).......: Communication error
>> Fatal error in MPI_Comm_dup: Other MPI error, error stack:
>> MPI_Comm_dup(168).................: MPI_Comm_dup(MPI_COMM_WORLD, new_comm=0x7fff97dca620) failed
>> MPIR_Comm_copy(923)...............:
>> MPIR_Get_contextid(639)...........:
>> MPI_Allreduce(773)................: MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x7fff97dca370, count=64, MPI_INT, MPI_BAND, MPI_COMM_WORLD) failed
>> MPIR_Allreduce(289)...............:
>> MPIC_Sendrecv(161)................:
>> MPIC_Wait(513)....................:
>> MPIDI_CH3I_Progress(150)..........:
>> MPID_nem_mpich2_blocking_recv(948):
>> MPID_nem_tcp_connpoll(1709).......: Communication error
>> Killed by signal 2.
>> Ctrl-C caught... cleaning up processes
>> [mpiexec at athos] HYDT_dmx_deregister_fd (./tools/demux/demux.c:142): could not find fd to deregister: -2
>> [mpiexec at athos] HYD_pmcd_pmiserv_cleanup (./pm/pmiserv/pmiserv_cb.c:401): error deregistering fd
>> [press Ctrl-C again to force abort]
>> APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
>> [mcnp5_1-4 at athos ~]$
>>
>> Any ideas?
>>
>> Thanks in advance,
>>
>> David Sullivan
>>
>>
>>
>> AREVA NP INC
>> 400 Donald Lynch Boulevard
>> Marlborough, MA, 01752
>> Phone: (508) 573-6721
>> Fax: (434) 382-5597
>> David.Sullivan at AREVA.com
>>