[mpich-discuss] cryptic (to me) error
Dave Goodell
goodell at mcs.anl.gov
Fri Sep 3 13:18:18 CDT 2010
If you are going to take that approach for the test suite, then you must make sure to do a plain "make" in "test/mpi" before copying to the other nodes.
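For example, a sequence along these lines (the source tree path and node name below are only placeholders taken from earlier in this thread):

    cd /home/dfs/mpich2-1.3b1/test/mpi
    make
    # copy the built test tree to the same path on the other node
    scp -r /home/dfs/mpich2-1.3b1/test/mpi porthos:/home/dfs/mpich2-1.3b1/test/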
-Dave
On Sep 3, 2010, at 1:10 PM CDT, SULLIVAN David (AREVA) wrote:
> No, it is not. I have copied everything to each node in the same location.
>
>
>
> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Dave Goodell
> Sent: Friday, September 03, 2010 2:03 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] cryptic (to me) error
>
> The test suite directory must be on a shared filesystem because mpiexec
> does not stage executables for you.
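> If the directory is not shared, at least verify that each test binary ends up at the same path on every node before running, e.g. (the path here is only an example):
>
>   ssh porthos ls /home/dfs/mpich2-1.3b1/test/mpi/attr/attrend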
>
> -Dave
>
> On Sep 3, 2010, at 1:02 PM CDT, SULLIVAN David (AREVA) wrote:
>
>> Oh, Well I left it and every test results in...
>>
>> Unexpected output in attrend: [proxy:0:1 at porthos] HYDU_create_process (./utils/launch/launch.c:70): execvp error on file ./attrend (No such file or directory)
>> Unexpected output in attrend: [proxy:0:1 at porthos] HYDU_create_process (./utils/launch/launch.c:70): execvp error on file ./attrend (No such file or directory)
>> Unexpected output in attrend: [mpiexec at aramis] HYDT_dmx_deregister_fd (./tools/demux/demux.c:142): could not find fd to deregister: -2
>> Unexpected output in attrend: [mpiexec at aramis] HYD_pmcd_pmiserv_cleanup (./pm/pmiserv/pmiserv_cb.c:398): error deregistering fd
>> Unexpected output in attrend: APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
>> Program attrend exited without No Errors
>> Unexpected output in attrend2: [proxy:0:1 at porthos] HYDU_create_process (./utils/launch/launch.c:70): execvp error on file ./attrend2 (No such file or directory)
>> Unexpected output in attrend2: [proxy:0:1 at porthos] HYDU_create_process (./utils/launch/launch.c:70): execvp error on file ./attrend2 (No such file or directory)
>> Unexpected output in attrend2: [mpiexec at aramis] HYDT_dmx_deregister_fd (./tools/demux/demux.c:142): could not find fd to deregister: -2
>> Unexpected output in attrend2: [mpiexec at aramis] HYD_pmcd_pmiserv_cleanup (./pm/pmiserv/pmiserv_cb.c:398): error deregistering fd
>> Unexpected output in attrend2: APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
>> Program attrend2 exited without No Errors
>>
>>
>>
>> -----Original Message-----
>> From: mpich-discuss-bounces at mcs.anl.gov
>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Darius
>> Buntinas
>> Sent: Friday, September 03, 2010 1:45 PM
>> To: mpich-discuss at mcs.anl.gov
>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>
>> The tests run quietly unless there's an error. Set VERBOSE=1 on the
>> command line to see each test as it runs:
>>
>> HYDRA_HOST_FILE=XXX VERBOSE=1 make testing
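>>
>> You can also run a single failing test by hand to isolate it, e.g. (the directory and process count here are just an illustration):
>>
>>   cd test/mpi/attr
>>   mpiexec -n 2 ./attrend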
>>
>> -d
>>
>> On Sep 3, 2010, at 12:40 PM, SULLIVAN David (AREVA) wrote:
>>
>>> Oh, OK. Well, it just hangs, so I aborted. I checked each machine with top, and neither is doing anything when it hangs. Attached is the screen output.
>>>
>>> Thanks again.
>>>
>>>
>>> Dave
>>>
>>> -----Original Message-----
>>> From: mpich-discuss-bounces at mcs.anl.gov
>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Dave Goodell
>>> Sent: Friday, September 03, 2010 12:14 PM
>>> To: mpich-discuss at mcs.anl.gov
>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>
>>> Give the full path to your host file: /path/to/node_list
>>>
>>> When the test script changes into the different test directories, a relative path no longer resolves correctly.
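>>>
>>> In other words, something like:
>>>
>>>   HYDRA_HOST_FILE=/path/to/node_list make testing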
>>>
>>> -Dave
>>>
>>> On Sep 3, 2010, at 9:48 AM CDT, SULLIVAN David (AREVA) wrote:
>>>
>>>> Wow, that made a difference. Not a good one, but a big one. I stopped the test, since each process resulted in the attached errors. It looks like it is trying to access files that are not there to parse the HYDRA_HOST_FILE. I checked whether they were there and it was just a permissions issue, but that was not the case.
>>>>
>>>> Dave
>>>>
>>>> -----Original Message-----
>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Dave Goodell
>>>> Sent: Friday, September 03, 2010 10:36 AM
>>>> To: mpich-discuss at mcs.anl.gov
>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>
>>>> I was unclear. I meant: use HYDRA_HOST_FILE when running the MPICH2 test suite, in order to make it run across the same set of machines that your MCNP code runs on.
>>>>
>>>> -Dave
>>>>
>>>> On Sep 3, 2010, at 9:25 AM CDT, SULLIVAN David (AREVA) wrote:
>>>>
>>>>> Same result; MPI doesn't even get to call mcnp5.mpi. It just returns the "Fatal error in PMPI_Comm_dup: Other MPI error, error stack:" error.
>>>>>
>>>>> -----Original Message-----
>>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Dave
>>>>> Goodell
>>>>> Sent: Friday, September 03, 2010 10:21 AM
>>>>> To: mpich-discuss at mcs.anl.gov
>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>
>>>>> Try setting HYDRA_HOST_FILE=your_machine_file_here
>>>>>
>>>>> This will make hydra act as though you passed "-f your_machine_file_here" to every mpiexec.
>>>>>
>>>>> http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager#Environment_Settings
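>>>>>
>>>>> The machine file itself is just one host per line, optionally with a process count after a colon, e.g. (these hostnames are from your earlier mails; the counts are made up):
>>>>>
>>>>>   aramis:4
>>>>>   porthos:4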
>>>>>
>>>>> -Dave
>>>>>
>>>>> On Sep 3, 2010, at 5:00 AM CDT, SULLIVAN David (AREVA) wrote:
>>>>>
>>>>>> I was wondering about that. Is there a configuration file that sets up the cluster and defines which node to run on? Would that make the issue any clearer?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Dave
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: mpich-discuss-bounces at mcs.anl.gov on behalf of Rajeev Thakur
>>>>>> Sent: Thu 9/2/2010 10:22 PM
>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>
>>>>>> There might be some connection issues between the two machines. The MPICH2 test suite that you ran with "make testing" probably ran on a single machine.
>>>>>>
>>>>>> On Sep 2, 2010, at 6:27 PM, SULLIVAN David (AREVA) wrote:
>>>>>>
>>>>>>> The error occurs immediately; I don't think it even starts the executable. It does work on the single machine with 4 processes.
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov on behalf of Rajeev
>>>>>>> Thakur
>>>>>>> Sent: Thu 9/2/2010 4:34 PM
>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>
>>>>>>> Does it run with 2 processes on a single machine?
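>>>>>>>
>>>>>>> For example, on one node and with the same input files as your failing run (adjust the output file name as needed):
>>>>>>>
>>>>>>>   mpiexec -n 2 mcnp5.mpi i=TN04 o=TN04.o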
>>>>>>>
>>>>>>>
>>>>>>> On Sep 2, 2010, at 2:38 PM, SULLIVAN David (AREVA) wrote:
>>>>>>>
>>>>>>>> That fixed the compile. Thanks!
>>>>>>>>
>>>>>>>> The latest release does not fix the issues I am having, though. Cpi works fine, and the test suite is certainly improved (see the summary.xml output), but when I try to use mcnp it still crashes in the same way (see error.txt).
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Anthony
>>>>>>>> Chan
>>>>>>>> Sent: Thursday, September 02, 2010 1:38 PM
>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>>
>>>>>>>>
>>>>>>>> There is a bug in 1.3b1 involving the option --enable-fc. Since Fortran 90 is enabled by default, remove --enable-fc from your configure command and try again. If there is an error again, send us the configure output as you see it on your screen (see the README) instead of config.log.
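>>>>>>>>
>>>>>>>> That is, roughly (the install prefix here is only an example):
>>>>>>>>
>>>>>>>>   ./configure --prefix=/home/dfs/mpich2-install
>>>>>>>>   make && make install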
>>>>>>>>
>>>>>>>> A.Chan
>>>>>>>>
>>>>>>>> ----- "SULLIVAN David (AREVA)" <David.Sullivan at areva.com> wrote:
>>>>>>>>
>>>>>>>>> Failure again.
>>>>>>>>> The 1.3 beta version will not compile with Intel 10.1. It bombs in the configure script:
>>>>>>>>>
>>>>>>>>> checking for Fortran flag needed to allow free-form source... unknown
>>>>>>>>> configure: WARNING: Fortran 90 test being disabled because the /home/dfs/mpich2-1.3b1/bin/mpif90 compiler does not accept a .f90 extension
>>>>>>>>> configure: error: Fortran does not accept free-form source
>>>>>>>>> configure: error: ./configure failed for test/mpi
>>>>>>>>>
>>>>>>>>> I have attached the config.log.
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Rajeev
>>>>>>>>> Thakur
>>>>>>>>> Sent: Thursday, September 02, 2010 12:11 PM
>>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>>>
>>>>>>>>> Just try relinking with the new library at first.
>>>>>>>>>
>>>>>>>>> Rajeev
>>>>>>>>>
>>>>>>>>> On Sep 2, 2010, at 9:32 AM, SULLIVAN David (AREVA) wrote:
>>>>>>>>>
>>>>>>>>>> I saw that there was a newer beta. I was really hoping to find that I had just configured something incorrectly. Will this not require me to rebuild mcnp (the only program I run that uses MPI for parallel work) if I change the MPI version? If so, this is a bit of a hardship, requiring the codes to be revalidated. If not, I will try it in a second.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Dave
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Dave
>>>>>>>>> Goodell
>>>>>>>>>> Sent: Thursday, September 02, 2010 10:27 AM
>>>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>>>>
>>>>>>>>>> Can you try the latest release (1.3b1) to see if that fixes the problems you are seeing with your application?
>>>>>>>>>>
>>>>>>>>>> http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads
>>>>>>>>>>
>>>>>>>>>> -Dave
>>>>>>>>>>
>>>>>>>>>> On Sep 2, 2010, at 9:15 AM CDT, SULLIVAN David (AREVA) wrote:
>>>>>>>>>>
>>>>>>>>>>> Another output file, hopefully of use.
>>>>>>>>>>>
>>>>>>>>>>> Thanks again
>>>>>>>>>>>
>>>>>>>>>>> Dave
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of
>>>>>>>>>>> SULLIVAN
>>>>
>>>>>>>>>>> David
>>>>>>>>>>> (AREVA)
>>>>>>>>>>> Sent: Thursday, September 02, 2010 8:20 AM
>>>>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>>>>>
>>>>>>>>>>> First, my apologies for the delay in continuing this thread. Unfortunately I have not resolved it, so if I can indulge the gurus and developers once again...
>>>>>>>>>>>
>>>>>>>>>>> As suggested by Rajeev, I ran the test suite in the source directory. The output of errors, which are similar to what I was seeing when I ran mcnp5 (v. 1.40 and 1.51), is attached.
>>>>>>>>>>>
>>>>>>>>>>> Any insights would be greatly appreciated,
>>>>>>>>>>>
>>>>>>>>>>> Dave
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of
>>>>>>>>>>> Rajeev
>>>>>>>>> Thakur
>>>>>>>>>>> Sent: Wednesday, August 04, 2010 3:06 PM
>>>>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>>>>>
>>>>>>>>>>> Then, one level above that directory (in the main MPICH2 source directory), type "make testing", which will run through the entire MPICH2 test suite.
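>>>>>>>>>>>
>>>>>>>>>>> For example (the source path here is just a placeholder):
>>>>>>>>>>>
>>>>>>>>>>>   cd /path/to/mpich2-source
>>>>>>>>>>>   make testing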
>>>>>>>>>>>
>>>>>>>>>>> Rajeev
>>>>>>>>>>>
>>>>>>>>>>> On Aug 4, 2010, at 2:04 PM, SULLIVAN David (AREVA) wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Oh. That's embarrassing. Yea, I have those examples. It runs with no problems:
>>>>>>>>>>>>
>>>>>>>>>>>> [dfs at aramis examples]$ mpiexec -host aramis -n 4 ./cpi
>>>>>>>>>>>> Process 2 of 4 is on aramis
>>>>>>>>>>>> Process 3 of 4 is on aramis
>>>>>>>>>>>> Process 0 of 4 is on aramis
>>>>>>>>>>>> Process 1 of 4 is on aramis
>>>>>>>>>>>> pi is approximately 3.1415926544231239, Error is 0.0000000008333307
>>>>>>>>>>>> wall clock time = 0.000652
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>>>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Gus
>>>>>>>>> Correa
>>>>>>>>>>>> Sent: Wednesday, August 04, 2010 1:13 PM
>>>>>>>>>>>> To: Mpich Discuss
>>>>>>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>>>>>>
>>>>>>>>>>>> Hi David
>>>>>>>>>>>>
>>>>>>>>>>>> I think the "examples" dir is not copied to the installation directory. You may find it where you decompressed the MPICH2 tarball, in case you installed it from source. At least, this is what I have here.
>>>>>>>>>>>>
>>>>>>>>>>>> Gus Correa
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> SULLIVAN David (AREVA) wrote:
>>>>>>>>>>>>> Yea, that always bothered me. There is no such folder.
>>>>>>>>>>>>> There are:
>>>>>>>>>>>>> bin
>>>>>>>>>>>>> etc
>>>>>>>>>>>>> include
>>>>>>>>>>>>> lib
>>>>>>>>>>>>> sbin
>>>>>>>>>>>>> share
>>>>>>>>>>>>>
>>>>>>>>>>>>> The only examples I found were in the share folder, where there are examples for collchk, graphics, and logging.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>>>>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of
>>>>>>>>>>>>> Rajeev
>>>>
>>>>>>>>>>>>> Thakur
>>>>>>>>>>>>> Sent: Wednesday, August 04, 2010 12:45 PM
>>>>>>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>>>>>>>
>>>>>>>>>>>>> Not cpilog. Can you run just cpi from the mpich2/examples directory?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Rajeev
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Aug 4, 2010, at 11:37 AM, SULLIVAN David (AREVA) wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Rajeev, Darius,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for your response.
>>>>>>>>>>>>>> cpi yields the following:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [dfs at aramis examples_logging]$ mpiexec -host aramis -n 12 ./cpilog
>>>>>>>>>>>>>> Process 0 running on aramis
>>>>>>>>>>>>>> Process 2 running on aramis
>>>>>>>>>>>>>> Process 3 running on aramis
>>>>>>>>>>>>>> Process 1 running on aramis
>>>>>>>>>>>>>> Process 6 running on aramis
>>>>>>>>>>>>>> Process 7 running on aramis
>>>>>>>>>>>>>> Process 8 running on aramis
>>>>>>>>>>>>>> Process 4 running on aramis
>>>>>>>>>>>>>> Process 5 running on aramis
>>>>>>>>>>>>>> Process 9 running on aramis
>>>>>>>>>>>>>> Process 10 running on aramis
>>>>>>>>>>>>>> Process 11 running on aramis
>>>>>>>>>>>>>> pi is approximately 3.1415926535898762, Error is 0.0000000000000830
>>>>>>>>>>>>>> wall clock time = 0.058131
>>>>>>>>>>>>>> Writing logfile....
>>>>>>>>>>>>>> Enabling the Default clock synchronization...
>>>>>>>>>>>>>> clog_merger.c:CLOG_Merger_init() - Could not open file ./cpilog.clog2 for merging!
>>>>>>>>>>>>>> Backtrace of the callstack at rank 0:
>>>>>>>>>>>>>> At [0]: ./cpilog(CLOG_Util_abort+0x92)[0x456326]
>>>>>>>>>>>>>> At [1]: ./cpilog(CLOG_Merger_init+0x11f)[0x45db7c]
>>>>>>>>>>>>>> At [2]: ./cpilog(CLOG_Converge_init+0x8e)[0x45a691]
>>>>>>>>>>>>>> At [3]: ./cpilog(MPE_Finish_log+0xea)[0x4560aa]
>>>>>>>>>>>>>> At [4]: ./cpilog(MPI_Finalize+0x50c)[0x4268af]
>>>>>>>>>>>>>> At [5]: ./cpilog(main+0x428)[0x415963]
>>>>>>>>>>>>>> At [6]: /lib64/libc.so.6(__libc_start_main+0xf4)[0x3c1881d994]
>>>>>>>>>>>>>> At [7]: ./cpilog[0x415449]
>>>>>>>>>>>>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>>>>>>>>>>>>> APPLICATION TERMINATED WITH THE EXIT STRING: Terminated (signal 15)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So it looks like it works with some issues.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> When does it fail? Immediately
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is there a bug? Many successfully use the application (MCNP5, from LANL) with MPI, so I think that a bug there is unlikely.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Core files, unfortunately, reveal some ignorance on my part. Where exactly should I be looking for them?
>>>>>>>>>>>>>> exactly should I be looking for them?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks again,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dave
>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>>>>>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of
>>>>>>>>>>>>>> Darius
>>>>>
>>>>>>>>>>>>>> Buntinas
>>>>>>>>>>>>>> Sent: Wednesday, August 04, 2010 12:19 PM
>>>>>>>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This error message says that two processes terminated because they were unable to communicate with another (or two other) process. It's possible that another process died, so the others got errors trying to communicate with them. It's also possible that there is something preventing some processes from communicating with each other.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Are you able to run cpi from the examples directory with 12 processes?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> At what point in your code does this fail? Are there any other communication operations before the MPI_Comm_dup?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Enable core files (add "ulimit -c unlimited" to your .bashrc or .tcshrc), then run your app and look for core files. If there is a bug in your application that causes a process to die, this might tell you which one and why.
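>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For example (the core file name below is made up; the naming pattern varies by system):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   ulimit -c unlimited
>>>>>>>>>>>>>>   mpiexec -f nodes -n 12 mcnp5.mpi i=TN04 o=TN04.o
>>>>>>>>>>>>>>   gdb mcnp5.mpi core.12345     # then type "bt" for a backtrace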
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Let us know how this goes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -d
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Aug 4, 2010, at 11:03 AM, SULLIVAN David (AREVA) wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Since I have had no responses, is there any additional information I could provide to solicit some direction for overcoming this latest string of MPI errors?
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Dave
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>>>>>>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of
>>>>>>>>> SULLIVAN
>>>>>>>>>>>>>>> David F (AREVA NP INC)
>>>>>>>>>>>>>>> Sent: Friday, July 23, 2010 4:29 PM
>>>>>>>>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>>>>>>>>> Subject: [mpich-discuss] cryptic (to me) error
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> With my firewall issues firmly behind me, I have a new problem for the collective wisdom. I am attempting to run a program, and the response is as follows:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [mcnp5_1-4 at athos ~]$ mpiexec -f nodes -n 12 mcnp5.mpi i=TN04 o=TN04.o
>>>>>>>>>>>>>>> Fatal error in MPI_Comm_dup: Other MPI error, error stack:
>>>>>>>>>>>>>>> MPI_Comm_dup(168).................: MPI_Comm_dup(MPI_COMM_WORLD, new_comm=0x7fff58edb450) failed
>>>>>>>>>>>>>>> MPIR_Comm_copy(923)...............:
>>>>>>>>>>>>>>> MPIR_Get_contextid(639)...........:
>>>>>>>>>>>>>>> MPI_Allreduce(773)................: MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x7fff58edb1a0, count=64, MPI_INT, MPI_BAND, MPI_COMM_WORLD) failed
>>>>>>>>>>>>>>> MPIR_Allreduce(228)...............:
>>>>>>>>>>>>>>> MPIC_Send(41).....................:
>>>>>>>>>>>>>>> MPIC_Wait(513)....................:
>>>>>>>>>>>>>>> MPIDI_CH3I_Progress(150)..........:
>>>>>>>>>>>>>>> MPID_nem_mpich2_blocking_recv(933):
>>>>>>>>>>>>>>> MPID_nem_tcp_connpoll(1709).......: Communication error
>>>>>>>>>>>>>>> Fatal error in MPI_Comm_dup: Other MPI error, error stack:
>>>>>>>>>>>>>>> MPI_Comm_dup(168).................: MPI_Comm_dup(MPI_COMM_WORLD, new_comm=0x7fff97dca620) failed
>>>>>>>>>>>>>>> MPIR_Comm_copy(923)...............:
>>>>>>>>>>>>>>> MPIR_Get_contextid(639)...........:
>>>>>>>>>>>>>>> MPI_Allreduce(773)................: MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x7fff97dca370, count=64, MPI_INT, MPI_BAND, MPI_COMM_WORLD) failed
>>>>>>>>>>>>>>> MPIR_Allreduce(289)...............:
>>>>>>>>>>>>>>> MPIC_Sendrecv(161)................:
>>>>>>>>>>>>>>> MPIC_Wait(513)....................:
>>>>>>>>>>>>>>> MPIDI_CH3I_Progress(150)..........:
>>>>>>>>>>>>>>> MPID_nem_mpich2_blocking_recv(948):
>>>>>>>>>>>>>>> MPID_nem_tcp_connpoll(1709).......: Communication error
>>>>>>>>>>>>>>> Killed by signal 2.
>>>>>>>>>>>>>>> Ctrl-C caught... cleaning up processes
>>>>>>>>>>>>>>> [mpiexec at athos] HYDT_dmx_deregister_fd (./tools/demux/demux.c:142): could not find fd to deregister: -2
>>>>>>>>>>>>>>> [mpiexec at athos] HYD_pmcd_pmiserv_cleanup (./pm/pmiserv/pmiserv_cb.c:401): error deregistering fd
>>>>>>>>>>>>>>> [press Ctrl-C again to force abort]
>>>>>>>>>>>>>>> APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
>>>>>>>>>>>>>>> [mcnp5_1-4 at athos ~]$
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any ideas?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> David Sullivan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> AREVA NP INC
>>>>>>>>>>>>>>> 400 Donald Lynch Boulevard Marlborough, MA, 01752
>>>>>>>>>>>>>>> Phone: (508) 573-6721
>>>>>>>>>>>>>>> Fax: (434) 382-5597
>>>>>>>>>>>>>>> David.Sullivan at AREVA.com
>>>>>>>>>>>>>>>
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
More information about the mpich-discuss mailing list