[mpich-discuss] cryptic (to me) error
SULLIVAN David (AREVA)
David.Sullivan at areva.com
Fri Sep 3 13:10:07 CDT 2010
No, it is not. I have copied everything to each node in the same
location.
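
For what it's worth, a quick way to confirm that (a sketch; adjust the
path if the test suite was unpacked somewhere other than
/home/dfs/mpich2-1.3b1) is to check that the test binaries really do
exist at the same absolute path on every node, e.g.:

    for h in aramis porthos; do
        ssh $h "find /home/dfs/mpich2-1.3b1/test/mpi -name attrend"
    done

If that comes back empty on any node, it would explain the execvp "No
such file or directory" errors below.
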
-----Original Message-----
From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Dave Goodell
Sent: Friday, September 03, 2010 2:03 PM
To: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] cryptic (to me) error
The test suite directory must be on a shared filesystem because mpiexec
does not stage executables for you.
-Dave
On Sep 3, 2010, at 1:02 PM CDT, SULLIVAN David (AREVA) wrote:
> Oh. Well, I left it and every test results in...
>
> Unexpected output in attrend: [proxy:0:1 at porthos] HYDU_create_process
> (./utils/launch/launch.c:70): execvp error on file ./attrend (No such file or directory)
> Unexpected output in attrend: [proxy:0:1 at porthos] HYDU_create_process
> (./utils/launch/launch.c:70): execvp error on file ./attrend (No such file or directory)
> Unexpected output in attrend: [mpiexec at aramis] HYDT_dmx_deregister_fd
> (./tools/demux/demux.c:142): could not find fd to deregister: -2
> Unexpected output in attrend: [mpiexec at aramis] HYD_pmcd_pmiserv_cleanup
> (./pm/pmiserv/pmiserv_cb.c:398): error deregistering fd
> Unexpected output in attrend: APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
> Program attrend exited without No Errors
> Unexpected output in attrend2: [proxy:0:1 at porthos] HYDU_create_process
> (./utils/launch/launch.c:70): execvp error on file ./attrend2 (No such file or directory)
> Unexpected output in attrend2: [proxy:0:1 at porthos] HYDU_create_process
> (./utils/launch/launch.c:70): execvp error on file ./attrend2 (No such file or directory)
> Unexpected output in attrend2: [mpiexec at aramis] HYDT_dmx_deregister_fd
> (./tools/demux/demux.c:142): could not find fd to deregister: -2
> Unexpected output in attrend2: [mpiexec at aramis] HYD_pmcd_pmiserv_cleanup
> (./pm/pmiserv/pmiserv_cb.c:398): error deregistering fd
> Unexpected output in attrend2: APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
> Program attrend2 exited without No Errors
>
>
>
> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Darius
> Buntinas
> Sent: Friday, September 03, 2010 1:45 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] cryptic (to me) error
>
> The tests run quietly unless there's an error. Set VERBOSE=1 on the
> command line to see each test as it runs:
>
> HYDRA_HOST_FILE=XXX VERBOSE=1 make testing
>
> -d
>
> On Sep 3, 2010, at 12:40 PM, SULLIVAN David (AREVA) wrote:
>
>> Oh, OK. Well, it just hangs, so I aborted. I checked each machine with
>> top, and neither is doing anything when it hangs. Attached is the
>> screen output.
>>
>> Thanks again.
>>
>>
>> Dave
>>
>> -----Original Message-----
>> From: mpich-discuss-bounces at mcs.anl.gov
>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Dave Goodell
>> Sent: Friday, September 03, 2010 12:14 PM
>> To: mpich-discuss at mcs.anl.gov
>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>
>> Give the full path to your host file: /path/to/node_list
>>
>> When the test script changes directories to the different test
>> directories, a relative path isn't correct anymore.
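>>
>> For example (node_list here is just the file from your command; put in
>> whatever path it actually lives at):
>>
>>     # relative path -- breaks once the harness cd's into a test directory
>>     HYDRA_HOST_FILE=node_list make testing
>>
>>     # absolute path -- resolves the same way from every test directory
>>     HYDRA_HOST_FILE=/path/to/node_list make testing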
>>
>> -Dave
>>
>> On Sep 3, 2010, at 9:48 AM CDT, SULLIVAN David (AREVA) wrote:
>>
>>> Wow, that made a difference. Not a good one, but a big one. I stopped
>>> the test since each process resulted in the attached errors. It looks
>>> like it is trying to access files that are not there to parse the
>>> HYDRA_HOST_FILE. I checked whether they were there and it was just a
>>> permissions issue, but that was not the case.
>>>
>>> Dave
>>>
>>> -----Original Message-----
>>> From: mpich-discuss-bounces at mcs.anl.gov
>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Dave Goodell
>>> Sent: Friday, September 03, 2010 10:36 AM
>>> To: mpich-discuss at mcs.anl.gov
>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>
>>> I was unclear. I meant to use HYDRA_HOST_FILE when running the MPICH2
>>> test suite, in order to make it run across the same set of machines
>>> that your MCNP code runs on.
>>>
>>> -Dave
>>>
>>> On Sep 3, 2010, at 9:25 AM CDT, SULLIVAN David (AREVA) wrote:
>>>
>>>> Same result; MPI doesn't even get to call mcnp5.mpi. It just returns
>>>> the "Fatal error in PMPI_Comm_dup: Other MPI error, error stack:"
>>>> error.
>>>>
>>>> -----Original Message-----
>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Dave
>>>> Goodell
>>>> Sent: Friday, September 03, 2010 10:21 AM
>>>> To: mpich-discuss at mcs.anl.gov
>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>
>>>> Try setting HYDRA_HOST_FILE=your_machine_file_here
>>>>
>>>> This will make hydra act as though you passed "-f
>>>> your_machine_file_here" to every mpiexec.
>>>>
>>>> http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager#Environment_Settings
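>>>>
>>>> As a sketch (substitute the real path to your machine file), setting
>>>>
>>>>     export HYDRA_HOST_FILE=/path/to/your_machine_file_here
>>>>     mpiexec -n 12 mcnp5.mpi i=TN04 o=TN04.o
>>>>
>>>> should behave the same as
>>>>
>>>>     mpiexec -f /path/to/your_machine_file_here -n 12 mcnp5.mpi i=TN04 o=TN04.o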
>>>>
>>>> -Dave
>>>>
>>>> On Sep 3, 2010, at 5:00 AM CDT, SULLIVAN David (AREVA) wrote:
>>>>
>>>>> I was wondering about that. Is there a configuration file that sets
>>>>> up the cluster and defines which node to run on? Would that make the
>>>>> issue any clearer?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Dave
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: mpich-discuss-bounces at mcs.anl.gov on behalf of Rajeev Thakur
>>>>> Sent: Thu 9/2/2010 10:22 PM
>>>>> To: mpich-discuss at mcs.anl.gov
>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>
>>>>> There might be some connection issues between the two machines. The
>>>>> MPICH2 test suite that you ran with "make testing" probably ran on a
>>>>> single machine.
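>>>>>
>>>>> As a quick check (assuming cpi sits at the same path on both machines
>>>>> and your "nodes" file lists both of them), something like
>>>>>
>>>>>     mpiexec -f nodes -n 2 ./cpi
>>>>>
>>>>> should tell you whether basic cross-machine communication works. If it
>>>>> hangs or fails with the same "Communication error", the problem is
>>>>> connectivity (firewall, hostname resolution) rather than MCNP.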
>>>>>
>>>>> On Sep 2, 2010, at 6:27 PM, SULLIVAN David (AREVA) wrote:
>>>>>
>>>>>> The error occurs immediately; I don't think it even starts the
>>>>>> executable. It does work on the single machine with 4 processes.
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: mpich-discuss-bounces at mcs.anl.gov on behalf of Rajeev
>>>>>> Thakur
>>>>>> Sent: Thu 9/2/2010 4:34 PM
>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>
>>>>>> Does it run with 2 processes on a single machine?
>>>>>>
>>>>>>
>>>>>> On Sep 2, 2010, at 2:38 PM, SULLIVAN David (AREVA) wrote:
>>>>>>
>>>>>>> That fixed the compile. Thanks!
>>>>>>>
>>>>>>> The latest release does not fix the issues I am having, though.
>>>>>>> Cpi works fine, and the test suite is certainly improved (see
>>>>>>> summary.xml output), though when I try to use mcnp it still
>>>>>>> crashes in the same way (see error.txt).
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Anthony
>>>>>>> Chan
>>>>>>> Sent: Thursday, September 02, 2010 1:38 PM
>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>
>>>>>>>
>>>>>>> There is a bug in 1.3b1 related to the option --enable-fc. Since
>>>>>>> Fortran 90 is enabled by default, remove --enable-fc from your
>>>>>>> configure command and try again. If there is an error again, send
>>>>>>> us the configure output as you see it on your screen (see README)
>>>>>>> instead of config.log.
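>>>>>>>
>>>>>>> Something along these lines (just a sketch; your prefix and
>>>>>>> compiler settings will differ):
>>>>>>>
>>>>>>>     ./configure --prefix=/where/to/install 2>&1 | tee c.txt
>>>>>>>     make 2>&1 | tee m.txt
>>>>>>>     make install
>>>>>>>
>>>>>>> If configure fails again, send c.txt rather than config.log.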
>>>>>>>
>>>>>>> A.Chan
>>>>>>>
>>>>>>> ----- "SULLIVAN David (AREVA)" <David.Sullivan at areva.com> wrote:
>>>>>>>
>>>>>>>> Failure again.
>>>>>>>> The 1.3 beta version will not compile with Intel 10.1. It bombs
>>>>>>>> at the configuration script:
>>>>>>>>
>>>>>>>> checking for Fortran flag needed to allow free-form source... unknown
>>>>>>>> configure: WARNING: Fortran 90 test being disabled because the
>>>>>>>> /home/dfs/mpich2-1.3b1/bin/mpif90 compiler does not accept a .f90
>>>>>>>> extension
>>>>>>>> configure: error: Fortran does not accept free-form source
>>>>>>>> configure: error: ./configure failed for test/mpi
>>>>>>>>
>>>>>>>> I have attached the config.log.
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Rajeev
>>>>>>>> Thakur
>>>>>>>> Sent: Thursday, September 02, 2010 12:11 PM
>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>>
>>>>>>>> Just try relinking with the new library at first.
>>>>>>>>
>>>>>>>> Rajeev
>>>>>>>>
>>>>>>>> On Sep 2, 2010, at 9:32 AM, SULLIVAN David (AREVA) wrote:
>>>>>>>>
>>>>>>>>> I saw that there was a newer beta. I was really hoping to find I
>>>>>>>>> just configured something incorrectly. Will this not require me
>>>>>>>>> to rebuild mcnp (the only program I run that uses MPI for
>>>>>>>>> parallel) if I change the MPI version? If so, this is a bit of a
>>>>>>>>> hardship, requiring codes to be revalidated. If not, I will try
>>>>>>>>> it in a second.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Dave
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Dave
>>>>>>>> Goodell
>>>>>>>>> Sent: Thursday, September 02, 2010 10:27 AM
>>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>>>
>>>>>>>>> Can you try the latest release (1.3b1) to see if that fixes the
>>>>>>>>> problems you are seeing with your application?
>>>>>>>>>
>>>>>>>>> http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads
>>>>>>>>>
>>>>>>>>> -Dave
>>>>>>>>>
>>>>>>>>> On Sep 2, 2010, at 9:15 AM CDT, SULLIVAN David (AREVA) wrote:
>>>>>>>>>
>>>>>>>>>> Another output file, hopefully of use.
>>>>>>>>>>
>>>>>>>>>> Thanks again
>>>>>>>>>>
>>>>>>>>>> Dave
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of
>>>>>>>>>> SULLIVAN
>>>
>>>>>>>>>> David
>>>>>>>>>> (AREVA)
>>>>>>>>>> Sent: Thursday, September 02, 2010 8:20 AM
>>>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>>>>
>>>>>>>>>> First, my apologies for the delay in continuing this thread.
>>>>>>>>>> Unfortunately, I have not resolved it, so if I can indulge the
>>>>>>>>>> gurus and developers once again...
>>>>>>>>>> As suggested by Rajeev, I ran the testing suite in the source
>>>>>>>>>> directory. The output of errors, which are similar to what I
>>>>>>>>>> was seeing when I ran mcnp5 (v. 1.40 and 1.51), is attached.
>>>>>>>>>>
>>>>>>>>>> Any insights would be greatly appreciated,
>>>>>>>>>>
>>>>>>>>>> Dave
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of
>>>>>>>>>> Rajeev
>>>>>>>> Thakur
>>>>>>>>>> Sent: Wednesday, August 04, 2010 3:06 PM
>>>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>>>>
>>>>>>>>>> Then, one level above that directory (in the main MPICH2 source
>>>>>>>>>> directory), type "make testing", which will run through the
>>>>>>>>>> entire MPICH2 test suite.
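>>>>>>>>>>
>>>>>>>>>> For example (path is illustrative):
>>>>>>>>>>
>>>>>>>>>>     cd /path/to/mpich2-source
>>>>>>>>>>     make testing
>>>>>>>>>>
>>>>>>>>>> A summary of the results should end up in summary.xml under
>>>>>>>>>> test/mpi.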
>>>>>>>>>>
>>>>>>>>>> Rajeev
>>>>>>>>>>
>>>>>>>>>> On Aug 4, 2010, at 2:04 PM, SULLIVAN David (AREVA) wrote:
>>>>>>>>>>
>>>>>>>>>>> Oh. That's embarrassing. Yea. I have those examples. It runs
>>>>>>>>>>> with no problems:
>>>>>>>>>>>
>>>>>>>>>>> [dfs at aramis examples]$ mpiexec -host aramis -n 4 ./cpi
>>>>>>>>>>> Process 2 of 4 is on aramis
>>>>>>>>>>> Process 3 of 4 is on aramis
>>>>>>>>>>> Process 0 of 4 is on aramis
>>>>>>>>>>> Process 1 of 4 is on aramis
>>>>>>>>>>> pi is approximately 3.1415926544231239, Error is 0.0000000008333307
>>>>>>>>>>> wall clock time = 0.000652
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Gus
>>>>>>>> Correa
>>>>>>>>>>> Sent: Wednesday, August 04, 2010 1:13 PM
>>>>>>>>>>> To: Mpich Discuss
>>>>>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>>>>>
>>>>>>>>>>> Hi David
>>>>>>>>>>>
>>>>>>>>>>> I think the "examples" dir is not copied to the installation
>>>>>>>>>>> directory. You may find it where you decompressed the MPICH2
>>>>>>>>>>> tarball, in case you installed it from source.
>>>>>>>>>>> At least, this is what I have here.
>>>>>>>>>>>
>>>>>>>>>>> Gus Correa
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> SULLIVAN David (AREVA) wrote:
>>>>>>>>>>>> Yea, that always bothered me. There is no such folder.
>>>>>>>>>>>> There are:
>>>>>>>>>>>> bin
>>>>>>>>>>>> etc
>>>>>>>>>>>> include
>>>>>>>>>>>> lib
>>>>>>>>>>>> sbin
>>>>>>>>>>>> share
>>>>>>>>>>>>
>>>>>>>>>>>> The only examples I found were in the share folder, where
>>>>>>>>>>>> there are examples for collchk, graphics and logging.
>>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>>>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of
>>>>>>>>>>>> Rajeev
>>>
>>>>>>>>>>>> Thakur
>>>>>>>>>>>> Sent: Wednesday, August 04, 2010 12:45 PM
>>>>>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>>>>>>
>>>>>>>>>>>> Not cpilog. Can you run just cpi from the mpich2/examples
>>>>>>>>>>>> directory?
>>>>>>>>>>>>
>>>>>>>>>>>> Rajeev
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Aug 4, 2010, at 11:37 AM, SULLIVAN David (AREVA) wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Rajeev, Darius,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for your response.
>>>>>>>>>>>>> cpi yields the following-
>>>>>>>>>>>>>
>>>>>>>>>>>>> [dfs at aramis examples_logging]$ mpiexec -host aramis -n 12 ./cpilog
>>>>>>>>>>>>> Process 0 running on aramis
>>>>>>>>>>>>> Process 2 running on aramis
>>>>>>>>>>>>> Process 3 running on aramis
>>>>>>>>>>>>> Process 1 running on aramis
>>>>>>>>>>>>> Process 6 running on aramis
>>>>>>>>>>>>> Process 7 running on aramis
>>>>>>>>>>>>> Process 8 running on aramis
>>>>>>>>>>>>> Process 4 running on aramis
>>>>>>>>>>>>> Process 5 running on aramis
>>>>>>>>>>>>> Process 9 running on aramis
>>>>>>>>>>>>> Process 10 running on aramis
>>>>>>>>>>>>> Process 11 running on aramis
>>>>>>>>>>>>> pi is approximately 3.1415926535898762, Error is 0.0000000000000830
>>>>>>>>>>>>> wall clock time = 0.058131
>>>>>>>>>>>>> Writing logfile....
>>>>>>>>>>>>> Enabling the Default clock synchronization...
>>>>>>>>>>>>> clog_merger.c:CLOG_Merger_init() - Could not open file
>>>>>>>>>>>>> ./cpilog.clog2 for merging!
>>>>>>>>>>>>> Backtrace of the callstack at rank 0:
>>>>>>>>>>>>> At [0]: ./cpilog(CLOG_Util_abort+0x92)[0x456326]
>>>>>>>>>>>>> At [1]: ./cpilog(CLOG_Merger_init+0x11f)[0x45db7c]
>>>>>>>>>>>>> At [2]: ./cpilog(CLOG_Converge_init+0x8e)[0x45a691]
>>>>>>>>>>>>> At [3]: ./cpilog(MPE_Finish_log+0xea)[0x4560aa]
>>>>>>>>>>>>> At [4]: ./cpilog(MPI_Finalize+0x50c)[0x4268af]
>>>>>>>>>>>>> At [5]: ./cpilog(main+0x428)[0x415963]
>>>>>>>>>>>>> At [6]: /lib64/libc.so.6(__libc_start_main+0xf4)[0x3c1881d994]
>>>>>>>>>>>>> At [7]: ./cpilog[0x415449]
>>>>>>>>>>>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>>>>>>>>>>>> APPLICATION TERMINATED WITH THE EXIT STRING: Terminated (signal 15)
>>>>>>>>>>>>>
>>>>>>>>>>>>> So it looks like it works with some issues.
>>>>>>>>>>>>>
>>>>>>>>>>>>> When does it fail? Immediately
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is there a bug? Many people successfully use the application
>>>>>>>>>>>>> (MCNP5, from LANL) with MPI, so I think a bug there is
>>>>>>>>>>>>> unlikely.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Core files, unfortunately, reveal some ignorance on my part.
>>>>>>>>>>>>> Where exactly should I be looking for them?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks again,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dave
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>>>>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of
>>>>>>>>>>>>> Darius
>>>>
>>>>>>>>>>>>> Buntinas
>>>>>>>>>>>>> Sent: Wednesday, August 04, 2010 12:19 PM
>>>>>>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> This error message says that two processes terminated
>>>>>>>>>>>>> because they were unable to communicate with another (or two
>>>>>>>>>>>>> other) process. It's possible that another process died, so
>>>>>>>>>>>>> the others got errors trying to communicate with them. It's
>>>>>>>>>>>>> also possible that there is something preventing some
>>>>>>>>>>>>> processes from communicating with each other.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Are you able to run cpi from the examples directory with 12
>>>>>>>>>>>>> processes?
>>>>>>>>>>>>>
>>>>>>>>>>>>> At what point in your code does this fail? Are there any
>>>>>>>>>>>>> other communication operations before the MPI_Comm_dup?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Enable core files (add "ulimit -c unlimited" to your .bashrc
>>>>>>>>>>>>> or .tcshrc), then run your app and look for core files. If
>>>>>>>>>>>>> there is a bug in your application that causes a process to
>>>>>>>>>>>>> die, this might tell you which one and why.
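>>>>>>>>>>>>>
>>>>>>>>>>>>> A minimal sketch of what I mean (core file naming and
>>>>>>>>>>>>> location depend on your system settings):
>>>>>>>>>>>>>
>>>>>>>>>>>>>     ulimit -c unlimited                   # or put it in .bashrc
>>>>>>>>>>>>>     mpiexec -f nodes -n 12 mcnp5.mpi i=TN04 o=TN04.o
>>>>>>>>>>>>>     ls core*                              # usually in the working directory
>>>>>>>>>>>>>     gdb mcnp5.mpi core.<pid>              # "bt" shows where it died
>>>>>>>>>>>>>
>>>>>>>>>>>>> Check each node, since the process that died may not be on
>>>>>>>>>>>>> the node where you launched mpiexec.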
>>>>>>>>>>>>>
>>>>>>>>>>>>> Let us know how this goes.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -d
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Aug 4, 2010, at 11:03 AM, SULLIVAN David (AREVA) wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Since I have had no responses, is there any additional
>>>>>>>>>>>>>> information I could provide to solicit some direction for
>>>>>>>>>>>>>> overcoming this latest string of MPI errors?
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dave
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>>>>>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of
>>>>>>>> SULLIVAN
>>>>>>>>>>>>>> David F (AREVA NP INC)
>>>>>>>>>>>>>> Sent: Friday, July 23, 2010 4:29 PM
>>>>>>>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>>>>>>>> Subject: [mpich-discuss] cryptic (to me) error
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> With my firewall issues firmly behind me, I have a new
>>>>>>>>>>>>>> problem for the collective wisdom. I am attempting to run a
>>>>>>>>>>>>>> program to which the response is as follows:
>>>>>>>>>>>>>> [mcnp5_1-4 at athos ~]$ mpiexec -f nodes -n 12 mcnp5.mpi i=TN04 o=TN04.o
>>>>>>>>>>>>>> Fatal error in MPI_Comm_dup: Other MPI error, error stack:
>>>>>>>>>>>>>> MPI_Comm_dup(168).................: MPI_Comm_dup(MPI_COMM_WORLD,
>>>>>>>>>>>>>> new_comm=0x7fff58edb450) failed
>>>>>>>>>>>>>> MPIR_Comm_copy(923)...............:
>>>>>>>>>>>>>> MPIR_Get_contextid(639)...........:
>>>>>>>>>>>>>> MPI_Allreduce(773)................: MPI_Allreduce(sbuf=MPI_IN_PLACE,
>>>>>>>>>>>>>> rbuf=0x7fff58edb1a0, count=64, MPI_INT, MPI_BAND, MPI_COMM_WORLD) failed
>>>>>>>>>>>>>> MPIR_Allreduce(228)...............:
>>>>>>>>>>>>>> MPIC_Send(41).....................:
>>>>>>>>>>>>>> MPIC_Wait(513)....................:
>>>>>>>>>>>>>> MPIDI_CH3I_Progress(150)..........:
>>>>>>>>>>>>>> MPID_nem_mpich2_blocking_recv(933):
>>>>>>>>>>>>>> MPID_nem_tcp_connpoll(1709).......: Communication error
>>>>>>>>>>>>>> Fatal error in MPI_Comm_dup: Other MPI error, error stack:
>>>>>>>>>>>>>> MPI_Comm_dup(168).................: MPI_Comm_dup(MPI_COMM_WORLD,
>>>>>>>>>>>>>> new_comm=0x7fff97dca620) failed
>>>>>>>>>>>>>> MPIR_Comm_copy(923)...............:
>>>>>>>>>>>>>> MPIR_Get_contextid(639)...........:
>>>>>>>>>>>>>> MPI_Allreduce(773)................: MPI_Allreduce(sbuf=MPI_IN_PLACE,
>>>>>>>>>>>>>> rbuf=0x7fff97dca370, count=64, MPI_INT, MPI_BAND, MPI_COMM_WORLD) failed
>>>>>>>>>>>>>> MPIR_Allreduce(289)...............:
>>>>>>>>>>>>>> MPIC_Sendrecv(161)................:
>>>>>>>>>>>>>> MPIC_Wait(513)....................:
>>>>>>>>>>>>>> MPIDI_CH3I_Progress(150)..........:
>>>>>>>>>>>>>> MPID_nem_mpich2_blocking_recv(948):
>>>>>>>>>>>>>> MPID_nem_tcp_connpoll(1709).......: Communication error
>>>>>>>>>>>>>> Killed by signal 2.
>>>>>>>>>>>>>> Ctrl-C caught... cleaning up processes
>>>>>>>>>>>>>> [mpiexec at athos] HYDT_dmx_deregister_fd (./tools/demux/demux.c:142):
>>>>>>>>>>>>>> could not find fd to deregister: -2
>>>>>>>>>>>>>> [mpiexec at athos] HYD_pmcd_pmiserv_cleanup (./pm/pmiserv/pmiserv_cb.c:401):
>>>>>>>>>>>>>> error deregistering fd
>>>>>>>>>>>>>> [press Ctrl-C again to force abort]
>>>>>>>>>>>>>> APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
>>>>>>>>>>>>>> [mcnp5_1-4 at athos ~]$
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any ideas?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> David Sullivan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> AREVA NP INC
>>>>>>>>>>>>>> 400 Donald Lynch Boulevard Marlborough, MA, 01752
>>>>>>>>>>>>>> Phone: (508) 573-6721
>>>>>>>>>>>>>> Fax: (434) 382-5597
>>>>>>>>>>>>>> David.Sullivan at AREVA.com
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> <summary.xml>
>>>>>>>>>
>>>>>>>>
>>>>>>> <summary.xml><error.txt>
>>>>>>
>>>>>> <winmail.dat>
>>>>>
>>>>> <winmail.dat>
>>>>
>>>
>>> <testing_screen.txt>
>>
>> <testing_screen.txt>
>
_______________________________________________
mpich-discuss mailing list
mpich-discuss at mcs.anl.gov
https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss