[mpich-discuss] cryptic (to me) error
Rajeev Thakur
thakur at mcs.anl.gov
Wed Aug 4 14:06:16 CDT 2010
Then one level above that directory (in the main MPICH2 source directory), type make testing, which will run through the entire MPICH2 test suite.
Rajeev
On Aug 4, 2010, at 2:04 PM, SULLIVAN David (AREVA) wrote:
> Oh. That's embarrassing. Yea. I have those examples. It runs with no
> problems:
>
> [dfs at aramis examples]$ mpiexec -host aramis -n 4 ./cpi
> Process 2 of 4 is on aramis
> Process 3 of 4 is on aramis
> Process 0 of 4 is on aramis
> Process 1 of 4 is on aramis
> pi is approximately 3.1415926544231239, Error is 0.0000000008333307
> wall clock time = 0.000652
>
>
> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Gus Correa
> Sent: Wednesday, August 04, 2010 1:13 PM
> To: Mpich Discuss
> Subject: Re: [mpich-discuss] cryptic (to me) error
>
> Hi David
>
> I think the "examples" dir is not copied to the installation directory.
> You may find it where you decompressed the MPICH2 tarball, in case you
> installed it from source.
> At least, this is what I have here.
>
> Gus Correa
>
>
> SULLIVAN David (AREVA) wrote:
>> Yea, that always bothered me. There is no such folder.
>> There are :
>> bin
>> etc
>> include
>> lib
>> sbin
>> share
>>
>> The only examples I found were in the share folder, where there are
>> examples for collchk, graphics, and logging.
>>
>> -----Original Message-----
>> From: mpich-discuss-bounces at mcs.anl.gov
>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Rajeev Thakur
>> Sent: Wednesday, August 04, 2010 12:45 PM
>> To: mpich-discuss at mcs.anl.gov
>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>
>> Not cpilog. Can you run just cpi from the mpich2/examples directory?
>>
>> Rajeev
>>
>>
>> On Aug 4, 2010, at 11:37 AM, SULLIVAN David (AREVA) wrote:
>>
>>> Rajeev, Darius,
>>>
>>> Thanks for your response.
>>> cpi yields the following-
>>>
>>> [dfs at aramis examples_logging]$ mpiexec -host aramis -n 12 ./cpilog
>>> Process 0 running on aramis
>>> Process 2 running on aramis
>>> Process 3 running on aramis
>>> Process 1 running on aramis
>>> Process 6 running on aramis
>>> Process 7 running on aramis
>>> Process 8 running on aramis
>>> Process 4 running on aramis
>>> Process 5 running on aramis
>>> Process 9 running on aramis
>>> Process 10 running on aramis
>>> Process 11 running on aramis
>>> pi is approximately 3.1415926535898762, Error is 0.0000000000000830
>>> wall clock time = 0.058131
>>> Writing logfile....
>>> Enabling the Default clock synchronization...
>>> clog_merger.c:CLOG_Merger_init() -
>>> Could not open file ./cpilog.clog2 for merging!
>>> Backtrace of the callstack at rank 0:
>>> At [0]: ./cpilog(CLOG_Util_abort+0x92)[0x456326]
>>> At [1]: ./cpilog(CLOG_Merger_init+0x11f)[0x45db7c]
>>> At [2]: ./cpilog(CLOG_Converge_init+0x8e)[0x45a691]
>>> At [3]: ./cpilog(MPE_Finish_log+0xea)[0x4560aa]
>>> At [4]: ./cpilog(MPI_Finalize+0x50c)[0x4268af]
>>> At [5]: ./cpilog(main+0x428)[0x415963]
>>> At [6]: /lib64/libc.so.6(__libc_start_main+0xf4)[0x3c1881d994]
>>> At [7]: ./cpilog[0x415449]
>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>> APPLICATION TERMINATED WITH THE EXIT STRING: Terminated (signal 15)
>>>
>>> So it looks like it works with some issues.
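One possible reading, offered only as a guess and not confirmed anywhere in the thread: the "Could not open file ./cpilog.clog2 for merging!" abort suggests MPE could not create its logfile in the current working directory, e.g. because that directory is not writable. A minimal POSIX-shell check (the probe filename is arbitrary):

```shell
# Check whether the current directory is writable; MPE's CLOG merger
# tries to create ./cpilog.clog2 here when the run finishes.
if touch .mpe_probe 2>/dev/null; then
    rm -f .mpe_probe
    echo "cwd is writable"
else
    echo "cwd is NOT writable"
fi
```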
>>>
>>> When does it fail? Immediately
>>>
>>> Is there a bug? Many successfully use the application (MCNP5, from
>>> LANL) with MPI, so I think a bug there is unlikely.
>>>
>>> Core files, unfortunately, reveal some ignorance on my part. Where
>>> exactly should I be looking for them?
>>>
>>> Thanks again,
>>>
>>> Dave
>>> -----Original Message-----
>>> From: mpich-discuss-bounces at mcs.anl.gov
>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Darius
>>> Buntinas
>>> Sent: Wednesday, August 04, 2010 12:19 PM
>>> To: mpich-discuss at mcs.anl.gov
>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>
>>>
>>> This error message says that two processes terminated because they
>>> were unable to communicate with another process (or two other
>>> processes). It's possible that one process died, so the others got
>>> errors when trying to communicate with it. It's also possible that
>>> something is preventing some processes from communicating with each
>>> other.
>>>
>>> Are you able to run cpi from the examples directory with 12
>>> processes?
>>>
>>> At what point in your code does this fail? Are there any other
>>> communication operations before the MPI_Comm_dup?
>>>
>>> Enable core files (add "ulimit -c unlimited" to your .bashrc or
>>> .tcshrc), then run your app and look for core files. If there is a
>>> bug in your application that causes a process to die, this might
>>> tell you which one and why.
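The suggestion above can be sketched as a short shell session; the core-file names and locations below are illustrative, since they depend on the system's core_pattern setting:

```shell
# Enable core dumps in the current shell; put the ulimit line in
# ~/.bashrc (bash) to make it persistent.
ulimit -c unlimited
ulimit -c    # prints "unlimited" once enabled

# After a crash, core files usually appear in the process's working
# directory, named "core" or "core.<pid>".
ls core* 2>/dev/null || echo "no core files yet"
```

A core file can then be opened with a debugger, e.g. `gdb ./cpilog core.<pid>`, to see where the process died.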
>>>
>>> Let us know how this goes.
>>>
>>> -d
>>>
>>>
>>> On Aug 4, 2010, at 11:03 AM, SULLIVAN David (AREVA) wrote:
>>>
>>>> Since I have had no responses, is there any other information I
>>>> could provide to solicit some direction for overcoming this latest
>>>> string of MPI errors?
>>>> Thanks,
>>>>
>>>> Dave
>>>>
>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of SULLIVAN
>>>> David F (AREVA NP INC)
>>>> Sent: Friday, July 23, 2010 4:29 PM
>>>> To: mpich-discuss at mcs.anl.gov
>>>> Subject: [mpich-discuss] cryptic (to me) error
>>>>
>>>> With my firewall issues firmly behind me, I have a new problem for
>>>> the collective wisdom. I am attempting to run a program, and the
>>>> response is as follows:
>>>> [mcnp5_1-4 at athos ~]$ mpiexec -f nodes -n 12 mcnp5.mpi i=TN04 o=TN04.o
>>>> Fatal error in MPI_Comm_dup: Other MPI error, error stack:
>>>> MPI_Comm_dup(168).................: MPI_Comm_dup(MPI_COMM_WORLD, new_comm=0x7fff58edb450) failed
>>>> MPIR_Comm_copy(923)...............:
>>>> MPIR_Get_contextid(639)...........:
>>>> MPI_Allreduce(773)................: MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x7fff58edb1a0, count=64, MPI_INT, MPI_BAND, MPI_COMM_WORLD) failed
>>>> MPIR_Allreduce(228)...............:
>>>> MPIC_Send(41).....................:
>>>> MPIC_Wait(513)....................:
>>>> MPIDI_CH3I_Progress(150)..........:
>>>> MPID_nem_mpich2_blocking_recv(933):
>>>> MPID_nem_tcp_connpoll(1709).......: Communication error
>>>> Fatal error in MPI_Comm_dup: Other MPI error, error stack:
>>>> MPI_Comm_dup(168).................: MPI_Comm_dup(MPI_COMM_WORLD, new_comm=0x7fff97dca620) failed
>>>> MPIR_Comm_copy(923)...............:
>>>> MPIR_Get_contextid(639)...........:
>>>> MPI_Allreduce(773)................: MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x7fff97dca370, count=64, MPI_INT, MPI_BAND, MPI_COMM_WORLD) failed
>>>> MPIR_Allreduce(289)...............:
>>>> MPIC_Sendrecv(161)................:
>>>> MPIC_Wait(513)....................:
>>>> MPIDI_CH3I_Progress(150)..........:
>>>> MPID_nem_mpich2_blocking_recv(948):
>>>> MPID_nem_tcp_connpoll(1709).......: Communication error
>>>> Killed by signal 2.
>>>> Ctrl-C caught... cleaning up processes
>>>> [mpiexec at athos] HYDT_dmx_deregister_fd (./tools/demux/demux.c:142): could not find fd to deregister: -2
>>>> [mpiexec at athos] HYD_pmcd_pmiserv_cleanup (./pm/pmiserv/pmiserv_cb.c:401): error deregistering fd
>>>> [press Ctrl-C again to force abort]
>>>> APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
>>>> [mcnp5_1-4 at athos ~]$
>>>>
>>>> Any ideas?
>>>>
>>>> Thanks in advance,
>>>>
>>>> David Sullivan
>>>>
>>>>
>>>>
>>>> AREVA NP INC
>>>> 400 Donald Lynch Boulevard
>>>> Marlborough, MA, 01752
>>>> Phone: (508) 573-6721
>>>> Fax: (434) 382-5597
>>>> David.Sullivan at AREVA.com
>>>>
>>>> _______________________________________________
>>>> mpich-discuss mailing list
>>>> mpich-discuss at mcs.anl.gov
>>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>
>