[mpich-discuss] cryptic (to me) error

SULLIVAN David (AREVA) David.Sullivan at areva.com
Wed Aug 4 12:03:34 CDT 2010


Yeah, that always bothered me. There is no such folder.
There are:
bin
etc
include
lib
sbin
share

The only examples I found were in the share folder, where there are
examples for collchk, graphics, and logging.

-----Original Message-----
From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Rajeev Thakur
Sent: Wednesday, August 04, 2010 12:45 PM
To: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] cryptic (to me) error

Not cpilog. Can you run just cpi from the mpich2/examples directory?
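
(For anyone reading this in the archive: cpi is the small pi-integration
example shipped in the MPICH2 source tree. From memory it is roughly
equivalent to the sketch below, not the shipped source verbatim; a clean
12-process run of it exercises MPI_Bcast and MPI_Reduce across every
rank.)

    /* Rough stand-in for MPICH2's examples/cpi.c (a sketch from
       memory): midpoint-rule integration of 4/(1+x^2) over [0,1],
       strided across ranks. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int n = 1000000, rank, size, i;
        double h, sum = 0.0, pi = 0.0, x;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* rank 0's interval count is broadcast to everyone */
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

        h = 1.0 / (double)n;
        for (i = rank + 1; i <= n; i += size) {
            x = h * ((double)i - 0.5);
            sum += 4.0 / (1.0 + x * x);
        }
        sum *= h;

        /* partial sums are combined on rank 0 */
        MPI_Reduce(&sum, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("pi is approximately %.16f\n", pi);

        MPI_Finalize();
        return 0;
    }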

Rajeev
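
One more thought, on the cpilog failure itself: "Could not open file
./cpilog.clog2 for merging" suggests that rank 0 may simply be unable
to create that file in the current working directory. A throwaway check
(hypothetical file name wtest.c, built with plain cc rather than mpicc)
would be:

    /* wtest.c: confirm the working directory is writable, since the
       CLOG2 merge step has to create ./cpilog.clog2 there. */
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("./wtest.tmp", "w");
        if (f == NULL) {
            perror("fopen");   /* prints the reason, e.g. EACCES */
            return 1;
        }
        fclose(f);
        remove("./wtest.tmp");
        puts("working directory is writable");
        return 0;
    }

If that fails in the directory where rank 0 runs, the logging merge
step cannot create its output either.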


On Aug 4, 2010, at 11:37 AM, SULLIVAN David (AREVA) wrote:

> Rajeev, Darius,
> 
> Thanks for your response.
> cpi yields the following:
> 
> [dfs at aramis examples_logging]$ mpiexec -host aramis -n 12 ./cpilog
> Process 0 running on aramis
> Process 2 running on aramis
> Process 3 running on aramis
> Process 1 running on aramis
> Process 6 running on aramis
> Process 7 running on aramis
> Process 8 running on aramis
> Process 4 running on aramis
> Process 5 running on aramis
> Process 9 running on aramis
> Process 10 running on aramis
> Process 11 running on aramis
> pi is approximately 3.1415926535898762, Error is 0.0000000000000830
> wall clock time = 0.058131
> Writing logfile....
> Enabling the Default clock synchronization...
> clog_merger.c:CLOG_Merger_init() -
>        Could not open file ./cpilog.clog2 for merging!
> Backtrace of the callstack at rank 0:
>        At [0]: ./cpilog(CLOG_Util_abort+0x92)[0x456326]
>        At [1]: ./cpilog(CLOG_Merger_init+0x11f)[0x45db7c]
>        At [2]: ./cpilog(CLOG_Converge_init+0x8e)[0x45a691]
>        At [3]: ./cpilog(MPE_Finish_log+0xea)[0x4560aa]
>        At [4]: ./cpilog(MPI_Finalize+0x50c)[0x4268af]
>        At [5]: ./cpilog(main+0x428)[0x415963]
>        At [6]: /lib64/libc.so.6(__libc_start_main+0xf4)[0x3c1881d994]
>        At [7]: ./cpilog[0x415449]
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0 
> APPLICATION TERMINATED WITH THE EXIT STRING: Terminated (signal 15)
> 
> So it looks like it works, with some issues.
> 
> When does it fail? Immediately.
> 
> Is there a bug? Many people successfully use the application (MCNP5,
> from LANL) with MPI, so I think a bug there is unlikely.
> 
> Core files, unfortunately, reveal some ignorance on my part. Where
> exactly should I be looking for them?
> 
> Thanks again,
> 
> Dave
> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov 
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Darius 
> Buntinas
> Sent: Wednesday, August 04, 2010 12:19 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] cryptic (to me) error
> 
> 
> This error message says that two processes terminated because they
> were unable to communicate with another process (or two other
> processes). It's possible that another process died, so the others
> got errors trying to communicate with it. It's also possible that
> something is preventing some processes from communicating with each
> other.
> 
> Are you able to run cpi from the examples directory with 12 processes?
> 
> At what point in your code does this fail?  Are there any other 
> communication operations before the MPI_Comm_dup?
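> 
> If it helps to isolate things, a minimal test that does nothing but
> the failing call (hypothetical file name dup_test.c; build with mpicc
> and run with the same mpiexec/host arguments as the failing job)
> would be:
> 
>     /* dup_test.c: call MPI_Comm_dup immediately after MPI_Init, to
>        separate the dup failure from the rest of the application. */
>     #include <mpi.h>
>     #include <stdio.h>
> 
>     int main(int argc, char *argv[])
>     {
>         MPI_Comm dup;
>         int rank;
> 
>         MPI_Init(&argc, &argv);
>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>         MPI_Comm_dup(MPI_COMM_WORLD, &dup);
>         printf("rank %d: dup ok\n", rank);
>         MPI_Comm_free(&dup);
>         MPI_Finalize();
>         return 0;
>     }
> 
> If this fails the same way, the problem is in the communication setup
> between the hosts rather than in the application.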
> 
> Enable core files (add "ulimit -c unlimited" to your .bashrc or
> .tcshrc), then run your app and look for core files. If there is a
> bug in your application that causes a process to die, this might tell
> you which one and why.
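> 
> To confirm core dumps are actually enabled before re-running the real
> job, a throwaway test (hypothetical file name coretest.c) is enough:
> 
>     /* coretest.c: abort() should leave a core file in the working
>        directory once "ulimit -c unlimited" is in effect. */
>     #include <stdlib.h>
> 
>     int main(void)
>     {
>         abort();
>     }
> 
> Compile it with cc, run it, and check that a core file appears in the
> working directory; "gdb ./yourapp corefile" followed by "bt" will
> then show where the real application died.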
> 
> Let us know how this goes.
> 
> -d
> 
> 
> On Aug 4, 2010, at 11:03 AM, SULLIVAN David (AREVA) wrote:
> 
>> Since I have had no responses, is there any other information I
>> could provide to solicit some direction for overcoming this latest
>> string of MPI errors?
>> 
>> Thanks,
>> 
>> Dave
>> 
>> From: mpich-discuss-bounces at mcs.anl.gov
>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of SULLIVAN
>> David F (AREVA NP INC)
>> Sent: Friday, July 23, 2010 4:29 PM
>> To: mpich-discuss at mcs.anl.gov
>> Subject: [mpich-discuss] cryptic (to me) error
>> 
>> With my firewall issues firmly behind me, I have a new problem for
>> the collective wisdom. I am attempting to run a program, to which
>> the response is as follows:
>> 
>> [mcnp5_1-4 at athos ~]$ mpiexec -f nodes -n 12 mcnp5.mpi i=TN04 o=TN04.o
>> Fatal error in MPI_Comm_dup: Other MPI error, error stack:
>> MPI_Comm_dup(168).................: MPI_Comm_dup(MPI_COMM_WORLD,
>> new_comm=0x7fff58edb450) failed
>> MPIR_Comm_copy(923)...............:
>> MPIR_Get_contextid(639)...........:
>> MPI_Allreduce(773)................: MPI_Allreduce(sbuf=MPI_IN_PLACE,
>> rbuf=0x7fff58edb1a0, count=64, MPI_INT, MPI_BAND, MPI_COMM_WORLD) failed
>> MPIR_Allreduce(228)...............:
>> MPIC_Send(41).....................:
>> MPIC_Wait(513)....................:
>> MPIDI_CH3I_Progress(150)..........:
>> MPID_nem_mpich2_blocking_recv(933):
>> MPID_nem_tcp_connpoll(1709).......: Communication error
>> Fatal error in MPI_Comm_dup: Other MPI error, error stack:
>> MPI_Comm_dup(168).................: MPI_Comm_dup(MPI_COMM_WORLD,
>> new_comm=0x7fff97dca620) failed
>> MPIR_Comm_copy(923)...............:
>> MPIR_Get_contextid(639)...........:
>> MPI_Allreduce(773)................: MPI_Allreduce(sbuf=MPI_IN_PLACE,
>> rbuf=0x7fff97dca370, count=64, MPI_INT, MPI_BAND, MPI_COMM_WORLD) failed
>> MPIR_Allreduce(289)...............:
>> MPIC_Sendrecv(161)................:
>> MPIC_Wait(513)....................:
>> MPIDI_CH3I_Progress(150)..........:
>> MPID_nem_mpich2_blocking_recv(948):
>> MPID_nem_tcp_connpoll(1709).......: Communication error
>> Killed by signal 2.
>> Ctrl-C caught... cleaning up processes
>> [mpiexec at athos] HYDT_dmx_deregister_fd (./tools/demux/demux.c:142):
>> could not find fd to deregister: -2
>> [mpiexec at athos] HYD_pmcd_pmiserv_cleanup
>> (./pm/pmiserv/pmiserv_cb.c:401): error deregistering fd
>> [press Ctrl-C again to force abort]
>> APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
>> [mcnp5_1-4 at athos ~]$
>> 
>> Any ideas?
>> 
>> Thanks in advance,
>> 
>> David Sullivan
>> 
>> 
>> 
>> AREVA NP INC
>> 400 Donald Lynch Boulevard
>> Marlborough, MA, 01752
>> Phone: (508) 573-6721
>> Fax: (434) 382-5597
>> David.Sullivan at AREVA.com
>> 
> 

_______________________________________________
mpich-discuss mailing list
mpich-discuss at mcs.anl.gov
https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss

