[mpich-discuss] cryptic (to me) error

SULLIVAN David (AREVA) David.Sullivan at areva.com
Thu Sep 2 14:38:23 CDT 2010


That fixed the compile. Thanks!

The latest release does not fix the issues I am having, though. cpi works
fine, and the test suite is certainly improved (see the summary.xml output),
but when I try to use MCNP it still crashes in the same way (see error.txt).

-----Original Message-----
From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Anthony Chan
Sent: Thursday, September 02, 2010 1:38 PM
To: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] cryptic (to me) error


There is a bug in 1.3b1 with the option --enable-fc.  Fortran 90 support is
enabled by default, so remove --enable-fc from your configure command and try
again.  If there is an error again, send us the configure output as you see it
on your screen (see the README) instead of config.log.
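
For example (the install prefix here is only illustrative; point CC/F77/FC at
your Intel 10.1 compilers), something along these lines should be enough:

    ./configure --prefix=/opt/mpich2-1.3b1 CC=icc CXX=icpc F77=ifort FC=ifort
    make
    make install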

A.Chan

----- "SULLIVAN David (AREVA)" <David.Sullivan at areva.com> wrote:

> Failure again.
> The 1.3 beta version will not compile with Intel 10.1. It bombs in the
> configure script:
> 
> checking for Fortran flag needed to allow free-form source... unknown
> configure: WARNING: Fortran 90 test being disabled because the 
> /home/dfs/mpich2-1.3b1/bin/mpif90 compiler does not accept a .f90 
> extension
> configure: error: Fortran does not accept free-form source
> configure: error: ./configure failed for test/mpi
> 
> I have attached the config.log.
> 
> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov 
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Rajeev Thakur
> Sent: Thursday, September 02, 2010 12:11 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] cryptic (to me) error
> 
> Just try relinking with the new library at first.
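> 
> For example, if the MCNP object files are still around, re-running only the
> final link step with the new wrappers (a sketch; the file names are
> illustrative, not the actual MCNP5 build targets) would look something like:
> 
>     mpif90 -o mcnp5.mpi *.o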
> 
> Rajeev
> 
> On Sep 2, 2010, at 9:32 AM, SULLIVAN David (AREVA) wrote:
> 
> > I saw that there was a newer beta. I was really hoping to find that I had
> > just configured something incorrectly. Will this not require me to rebuild
> > mcnp (the only program I run that uses MPI for parallel execution) if I
> > change the MPI version? If so, this is a bit of a hardship, requiring the
> > codes to be revalidated. If not, I will try it in a second.
> > 
> > Thanks,
> > 
> > Dave
> > 
> > -----Original Message-----
> > From: mpich-discuss-bounces at mcs.anl.gov 
> > [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Dave Goodell
> > Sent: Thursday, September 02, 2010 10:27 AM
> > To: mpich-discuss at mcs.anl.gov
> > Subject: Re: [mpich-discuss] cryptic (to me) error
> > 
> > Can you try the latest release (1.3b1) to see if that fixes the 
> > problems you are seeing with your application?
> > 
> > http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads
> > 
> > -Dave
> > 
> > On Sep 2, 2010, at 9:15 AM CDT, SULLIVAN David (AREVA) wrote:
> > 
> >> Another output file, hopefully of use. 
> >> 
> >> Thanks again
> >> 
> >> Dave
> >> 
> >> -----Original Message-----
> >> From: mpich-discuss-bounces at mcs.anl.gov 
> >> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of SULLIVAN David (AREVA)
> >> Sent: Thursday, September 02, 2010 8:20 AM
> >> To: mpich-discuss at mcs.anl.gov
> >> Subject: Re: [mpich-discuss] cryptic (to me) error
> >> 
> >> First, my apologies for the delay in continuing this thread.
> >> Unfortunately I have not resolved it, so if I can indulge the gurus and
> >> developers once again...
> >> 
> >> As suggested by Rajeev, I ran the test suite in the source directory.
> >> The output of errors, which are similar to what I was seeing when I ran
> >> mcnp5 (v. 1.40 and 1.51), is attached.
> >> 
> >> Any insights would be greatly appreciated,
> >> 
> >> Dave
> >> 
> >> -----Original Message-----
> >> From: mpich-discuss-bounces at mcs.anl.gov 
> >> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Rajeev Thakur
> >> Sent: Wednesday, August 04, 2010 3:06 PM
> >> To: mpich-discuss at mcs.anl.gov
> >> Subject: Re: [mpich-discuss] cryptic (to me) error
> >> 
> >> Then one level above that directory (in the main MPICH2 source 
> >> directory), type make testing, which will run through the entire
> >> MPICH2 test suite.
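> >> 
> >> For example (the path is illustrative; use wherever you unpacked the
> >> MPICH2 tarball):
> >> 
> >>     cd /path/to/mpich2-source
> >>     make testing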
> >> 
> >> Rajeev
> >> 
> >> On Aug 4, 2010, at 2:04 PM, SULLIVAN David (AREVA) wrote:
> >> 
> >>> Oh. That's embarrassing. Yea. I have those examples. It runs with no
> >>> problems:
> >>> 
> >>> [dfs at aramis examples]$ mpiexec -host aramis -n 4 ./cpi
> >>> Process 2 of 4 is on aramis
> >>> Process 3 of 4 is on aramis
> >>> Process 0 of 4 is on aramis
> >>> Process 1 of 4 is on aramis
> >>> pi is approximately 3.1415926544231239, Error is 0.0000000008333307
> >>> wall clock time = 0.000652
> >>> 
> >>> 
> >>> -----Original Message-----
> >>> From: mpich-discuss-bounces at mcs.anl.gov 
> >>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Gus Correa
> >>> Sent: Wednesday, August 04, 2010 1:13 PM
> >>> To: Mpich Discuss
> >>> Subject: Re: [mpich-discuss] cryptic (to me) error
> >>> 
> >>> Hi David
> >>> 
> >>> I think the "examples" dir is not copied to the installation directory.
> >>> You may find it where you decompressed the MPICH2 tarball, in case you
> >>> installed it from source.
> >>> At least, this is what I have here.
> >>> 
> >>> Gus Correa
> >>> 
> >>> 
> >>> SULLIVAN David (AREVA) wrote:
> >>>> Yea, that always bothered me.  There is no such folder.
> >>>> There are:
> >>>> bin
> >>>> etc
> >>>> include
> >>>> lib
> >>>> sbin
> >>>> share
> >>>> 
> >>>> The only examples I found were in the share folder, where there are
> >>>> examples for collchk, graphics, and logging.
> >>>> 
> >>>> -----Original Message-----
> >>>> From: mpich-discuss-bounces at mcs.anl.gov 
> >>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Rajeev 
> >>>> Thakur
> >>>> Sent: Wednesday, August 04, 2010 12:45 PM
> >>>> To: mpich-discuss at mcs.anl.gov
> >>>> Subject: Re: [mpich-discuss] cryptic (to me) error
> >>>> 
> >>>> Not cpilog. Can you run just cpi from the mpich2/examples directory?
> >>>> 
> >>>> Rajeev
> >>>> 
> >>>> 
> >>>> On Aug 4, 2010, at 11:37 AM, SULLIVAN David (AREVA) wrote:
> >>>> 
> >>>>> Rajeev,  Darius,
> >>>>> 
> >>>>> Thanks for your response.
> >>>>> cpi yields the following:
> >>>>> 
> >>>>> [dfs at aramis examples_logging]$ mpiexec -host aramis -n 12 ./cpilog
> >>>>> Process 0 running on aramis
> >>>>> Process 2 running on aramis
> >>>>> Process 3 running on aramis
> >>>>> Process 1 running on aramis
> >>>>> Process 6 running on aramis
> >>>>> Process 7 running on aramis
> >>>>> Process 8 running on aramis
> >>>>> Process 4 running on aramis
> >>>>> Process 5 running on aramis
> >>>>> Process 9 running on aramis
> >>>>> Process 10 running on aramis
> >>>>> Process 11 running on aramis
> >>>>> pi is approximately 3.1415926535898762, Error is 0.0000000000000830
> >>>>> wall clock time = 0.058131
> >>>>> Writing logfile....
> >>>>> clock time = 0.058131 Writing logfile....
> >>>>> Enabling the Default clock synchronization...
> >>>>> clog_merger.c:CLOG_Merger_init() -
> >>>>>     Could not open file ./cpilog.clog2 for merging!
> >>>>> Backtrace of the callstack at rank 0:
> >>>>>     At [0]: ./cpilog(CLOG_Util_abort+0x92)[0x456326]
> >>>>>     At [1]: ./cpilog(CLOG_Merger_init+0x11f)[0x45db7c]
> >>>>>     At [2]: ./cpilog(CLOG_Converge_init+0x8e)[0x45a691]
> >>>>>     At [3]: ./cpilog(MPE_Finish_log+0xea)[0x4560aa]
> >>>>>     At [4]: ./cpilog(MPI_Finalize+0x50c)[0x4268af]
> >>>>>     At [5]: ./cpilog(main+0x428)[0x415963]
> >>>>>     At [6]:
> /lib64/libc.so.6(__libc_start_main+0xf4)[0x3c1881d994]
> >>>>>     At [7]: ./cpilog[0x415449]
> >>>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0 
> >>>>> APPLICATION TERMINATED WITH THE EXIT STRING: Terminated (signal
> 
> >>>>> 15)
> >>>>> 
> >>>>> So it looks like it works, with some issues.
> >>>>> 
> >>>>> When does it fail? Immediately.
> >>>>> 
> >>>>> Is there a bug? Many successfully use the application (MCNP5, from
> >>>>> LANL) with MPI, so I think a bug there is unlikely.
> >>>>> 
> >>>>> Core files, unfortunately, reveal some ignorance on my part. Where
> >>>>> exactly should I be looking for them?
> >>>>> 
> >>>>> Thanks again,
> >>>>> 
> >>>>> Dave
> >>>>> -----Original Message-----
> >>>>> From: mpich-discuss-bounces at mcs.anl.gov 
> >>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Darius 
> >>>>> Buntinas
> >>>>> Sent: Wednesday, August 04, 2010 12:19 PM
> >>>>> To: mpich-discuss at mcs.anl.gov
> >>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
> >>>>> 
> >>>>> 
> >>>>> This error message says that two processes terminated because they
> >>>>> were unable to communicate with another (or two other) process.  It's
> >>>>> possible that another process died, so the others got errors trying to
> >>>>> communicate with them.  It's also possible that there is something
> >>>>> preventing some processes from communicating with each other.
> >>>>> 
> >>>>> Are you able to run cpi from the examples directory with 12 processes?
> >>>>> 
> >>>>> At what point in your code does this fail?  Are there any other
> >>>>> communication operations before the MPI_Comm_dup?
> >>>>> 
> >>>>> Enable core files (add "ulimit -c unlimited" to your .bashrc or
> >>>>> .tcshrc), then run your app and look for core files.  If there is a
> >>>>> bug in your application that causes a process to die, this might tell
> >>>>> you which one and why.
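> >>>>> 
> >>>>> As a concrete sketch (the mpiexec line is the one from your original
> >>>>> report; the core file name depends on your system settings):
> >>>>> 
> >>>>>     ulimit -c unlimited
> >>>>>     mpiexec -f nodes -n 12 mcnp5.mpi i=TN04 o=TN04.o
> >>>>>     ls core*                     # one core file per crashed process
> >>>>>     gdb mcnp5.mpi core.<pid>     # "bt" at the gdb prompt shows the backtrace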
> >>>>> 
> >>>>> Let us know how this goes.
> >>>>> 
> >>>>> -d
> >>>>> 
> >>>>> 
> >>>>> On Aug 4, 2010, at 11:03 AM, SULLIVAN David (AREVA) wrote:
> >>>>> 
> >>>>>> Since I have had no responses, is there any other additional
> >>>>>> information I could provide to solicit some direction for overcoming
> >>>>>> this latest string of MPI errors?
> >>>>>> Thanks,
> >>>>>> 
> >>>>>> Dave
> >>>>>> 
> >>>>>> From: mpich-discuss-bounces at mcs.anl.gov 
> >>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of SULLIVAN David F (AREVA NP INC)
> >>>>>> Sent: Friday, July 23, 2010 4:29 PM
> >>>>>> To: mpich-discuss at mcs.anl.gov
> >>>>>> Subject: [mpich-discuss] cryptic (to me) error
> >>>>>> 
> >>>>>> With my firewall issues firmly behind me, I have a new problem for
> >>>>>> the collective wisdom. I am attempting to run a program, to which the
> >>>>>> response is as follows:
> >>>>>> 
> >>>>>> [mcnp5_1-4 at athos ~]$ mpiexec -f nodes -n 12 mcnp5.mpi i=TN04 o=TN04.o
> >>>>>> Fatal error in MPI_Comm_dup: Other MPI error, error stack:
> >>>>>> MPI_Comm_dup(168).................: MPI_Comm_dup(MPI_COMM_WORLD, new_comm=0x7fff58edb450) failed
> >>>>>> MPIR_Comm_copy(923)...............:
> >>>>>> MPIR_Get_contextid(639)...........:
> >>>>>> MPI_Allreduce(773)................: MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x7fff58edb1a0, count=64, MPI_INT, MPI_BAND, MPI_COMM_WORLD) failed
> >>>>>> MPIR_Allreduce(228)...............:
> >>>>>> MPIC_Send(41).....................:
> >>>>>> MPIC_Wait(513)....................:
> >>>>>> MPIDI_CH3I_Progress(150)..........:
> >>>>>> MPID_nem_mpich2_blocking_recv(933):
> >>>>>> MPID_nem_tcp_connpoll(1709).......: Communication error
> >>>>>> 
> >>>>>> Fatal error in MPI_Comm_dup: Other MPI error, error stack:
> >>>>>> MPI_Comm_dup(168).................: MPI_Comm_dup(MPI_COMM_WORLD, new_comm=0x7fff97dca620) failed
> >>>>>> MPIR_Comm_copy(923)...............:
> >>>>>> MPIR_Get_contextid(639)...........:
> >>>>>> MPI_Allreduce(773)................: MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x7fff97dca370, count=64, MPI_INT, MPI_BAND, MPI_COMM_WORLD) failed
> >>>>>> MPIR_Allreduce(289)...............:
> >>>>>> MPIC_Sendrecv(161)................:
> >>>>>> MPIC_Wait(513)....................:
> >>>>>> MPIDI_CH3I_Progress(150)..........:
> >>>>>> MPID_nem_mpich2_blocking_recv(948):
> >>>>>> MPID_nem_tcp_connpoll(1709).......: Communication error
> >>>>>> 
> >>>>>> Killed by signal 2.
> >>>>>> Ctrl-C caught... cleaning up processes
> >>>>>> [mpiexec at athos] HYDT_dmx_deregister_fd (./tools/demux/demux.c:142): could not find fd to deregister: -2
> >>>>>> [mpiexec at athos] HYD_pmcd_pmiserv_cleanup (./pm/pmiserv/pmiserv_cb.c:401): error deregistering fd
> >>>>>> [press Ctrl-C again to force abort]
> >>>>>> APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
> >>>>>> [mcnp5_1-4 at athos ~]$
> >>>>>> 
> >>>>>> Any ideas?
> >>>>>> 
> >>>>>> Thanks in advance,
> >>>>>> 
> >>>>>> David Sullivan
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>> AREVA NP INC
> >>>>>> 400 Donald Lynch Boulevard
> >>>>>> Marlborough, MA, 01752
> >>>>>> Phone: (508) 573-6721
> >>>>>> Fax: (434) 382-5597
> >>>>>> David.Sullivan at AREVA.com
> >>>>>> 
> >> <summary.xml>
_______________________________________________
mpich-discuss mailing list
mpich-discuss at mcs.anl.gov
https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
-------------- next part --------------
A non-text attachment was scrubbed...
Name: summary.xml
Type: text/xml
Size: 66377 bytes
Desc: summary.xml
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20100902/acd14c4e/attachment-0001.bin>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: error.txt
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20100902/acd14c4e/attachment-0001.txt>

