[mpich-discuss] cryptic (to me) error
SULLIVAN David (AREVA)
David.Sullivan at areva.com
Fri Sep 3 13:02:01 CDT 2010
Oh, well. I left it and every test results in...
Unexpected output in attrend: [proxy:0:1 at porthos] HYDU_create_process
(./utils/launch/launch.c:70): execvp error on file ./attrend (No such file or
directory)
Unexpected output in attrend: [proxy:0:1 at porthos] HYDU_create_process
(./utils/launch/launch.c:70): execvp error on file ./attrend (No such file or
directory)
Unexpected output in attrend: [mpiexec at aramis] HYDT_dmx_deregister_fd
(./tools/demux/demux.c:142): could not find fd to deregister: -2
Unexpected output in attrend: [mpiexec at aramis] HYD_pmcd_pmiserv_cleanup
(./pm/pmiserv/pmiserv_cb.c:398): error deregistering fd
Unexpected output in attrend: APPLICATION TERMINATED WITH THE EXIT
STRING: Hangup (signal 1)
Program attrend exited without No Errors
Unexpected output in attrend2: [proxy:0:1 at porthos] HYDU_create_process
(./utils/launch/launch.c:70): execvp error on file ./attrend2 (No such file or
directory)
Unexpected output in attrend2: [proxy:0:1 at porthos] HYDU_create_process
(./utils/launch/launch.c:70): execvp error on file ./attrend2 (No such file or
directory)
Unexpected output in attrend2: [mpiexec at aramis] HYDT_dmx_deregister_fd
(./tools/demux/demux.c:142): could not find fd to deregister: -2
Unexpected output in attrend2: [mpiexec at aramis] HYD_pmcd_pmiserv_cleanup
(./pm/pmiserv/pmiserv_cb.c:398): error deregistering fd
Unexpected output in attrend2: APPLICATION TERMINATED WITH THE EXIT
STRING: Hangup (signal 1)
Program attrend2 exited without No Errors
-----Original Message-----
From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Darius Buntinas
Sent: Friday, September 03, 2010 1:45 PM
To: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] cryptic (to me) error
The tests run quietly unless there's an error. Set VERBOSE=1 on the
command line to see each test as it runs:
HYDRA_HOST_FILE=XXX VERBOSE=1 make testing
-d
On Sep 3, 2010, at 12:40 PM, SULLIVAN David (AREVA) wrote:
> Oh, OK. Well, it just hangs, so I aborted it. I checked each machine with
> top, and neither is doing anything when it hangs. Attached is the screen
> output.
>
> Thanks again.
>
>
> Dave
>
> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Dave Goodell
> Sent: Friday, September 03, 2010 12:14 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] cryptic (to me) error
>
> Give the full path to your host file: /path/to/node_list
>
> When the test script changes directories to the different test
> directories, a relative path isn't correct anymore.
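> For example, something like this (a minimal sketch; the absolute path is
> only a placeholder for wherever your node_list actually lives):
>
>   # A relative path such as "nodes" stops resolving once the test script
>   # cd's into the individual test directories (test/mpi/attr, etc.).
>   HYDRA_HOST_FILE=/home/dfs/node_list VERBOSE=1 make testing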
>
> -Dave
>
> On Sep 3, 2010, at 9:48 AM CDT, SULLIVAN David (AREVA) wrote:
>
>> Wow, that made a difference. Not a good one, but a big one. I stopped
>> the test since each process resulted in the attached errors. It looks
>> like it is trying to access files that are not there when it parses the
>> HYDRA_HOST_FILE. I checked whether they were there and whether it was
>> just a permissions issue, but that was not the case.
>>
>> Dave
>>
>> -----Original Message-----
>> From: mpich-discuss-bounces at mcs.anl.gov
>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Dave Goodell
>> Sent: Friday, September 03, 2010 10:36 AM
>> To: mpich-discuss at mcs.anl.gov
>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>
>> I was unclear. I meant to use HYDRA_HOST_FILE when running the MPICH2
>> test suite in order to make it run across the same set of machines
>> that your MCNP code runs on.
>>
>> -Dave
>>
>> On Sep 3, 2010, at 9:25 AM CDT, SULLIVAN David (AREVA) wrote:
>>
>>> Same result; MPI doesn't even get to call mcnp5.mpi. It just returns the
>>> "Fatal error in PMPI_Comm_dup: Other MPI error, error stack:" error.
>>>
>>> -----Original Message-----
>>> From: mpich-discuss-bounces at mcs.anl.gov
>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Dave Goodell
>>> Sent: Friday, September 03, 2010 10:21 AM
>>> To: mpich-discuss at mcs.anl.gov
>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>
>>> Try setting HYDRA_HOST_FILE=your_machine_file_here
>>>
>>> This will make hydra act as though you passed "-f your_machine_file_here"
>>> to every mpiexec.
>>>
>>> http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager#Environment_Settings
>>>
>>> -Dave
>>>
>>> On Sep 3, 2010, at 5:00 AM CDT, SULLIVAN David (AREVA) wrote:
>>>
>>>> I was wondering about that. Is there a configuration file that sets up
>>>> the cluster and defines which node to run on? Would that make the
>>>> issue any clearer?
>>>>
>>>> Thanks,
>>>>
>>>> Dave
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: mpich-discuss-bounces at mcs.anl.gov on behalf of Rajeev Thakur
>>>> Sent: Thu 9/2/2010 10:22 PM
>>>> To: mpich-discuss at mcs.anl.gov
>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>
>>>> There might be some connection issues between the two machines. The
>>>> MPICH2 test suite that you ran with "make testing" probably ran on a
>>>> single machine.
>>>>
>>>> On Sep 2, 2010, at 6:27 PM, SULLIVAN David (AREVA) wrote:
>>>>
>>>>> The error occurs immediately; I don't think it even starts the
>>>>> executable. It does work on the single machine with 4 processes.
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: mpich-discuss-bounces at mcs.anl.gov on behalf of Rajeev Thakur
>>>>> Sent: Thu 9/2/2010 4:34 PM
>>>>> To: mpich-discuss at mcs.anl.gov
>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>
>>>>> Does it run with 2 processes on a single machine?
>>>>>
>>>>>
>>>>> On Sep 2, 2010, at 2:38 PM, SULLIVAN David (AREVA) wrote:
>>>>>
>>>>>> That fixed the compile. Thanks!
>>>>>>
>>>>>> The latest release does not fix the issues I am having, though. Cpi
>>>>>> works fine, and the test suite is certainly improved (see the
>>>>>> summary.xml output), though when I try to use mcnp it still crashes
>>>>>> in the same way (see error.txt).
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Anthony
>>>>>> Chan
>>>>>> Sent: Thursday, September 02, 2010 1:38 PM
>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>
>>>>>>
>>>>>> There is a bug in 1.3b1 with the option --enable-fc. Since Fortran 90
>>>>>> is enabled by default, remove --enable-fc from your configure command
>>>>>> and try again. If there is an error again, send us the configure
>>>>>> output as you see it on your screen (see README) instead of
>>>>>> config.log.
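>>>>>> For example, roughly (a sketch only; the install prefix, compiler
>>>>>> names, and log file names are placeholders, not your exact settings):
>>>>>>
>>>>>>   # Note: no --enable-fc here; Fortran 90 support is already on by default.
>>>>>>   ./configure --prefix=/home/dfs/mpich2-install CC=icc F77=ifort FC=ifort \
>>>>>>       2>&1 | tee c.txt
>>>>>>   make 2>&1 | tee m.txt
>>>>>>   make install 2>&1 | tee mi.txt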
>>>>>>
>>>>>> A.Chan
>>>>>>
>>>>>> ----- "SULLIVAN David (AREVA)" <David.Sullivan at areva.com> wrote:
>>>>>>
>>>>>>> Failure again.
>>>>>>> The 1.3 beta version will not compile with Intel 10.1. It bombs at
>>>>>>> the configure script:
>>>>>>>
>>>>>>> checking for Fortran flag needed to allow free-form source... unknown
>>>>>>> configure: WARNING: Fortran 90 test being disabled because the
>>>>>>> /home/dfs/mpich2-1.3b1/bin/mpif90 compiler does not accept a .f90
>>>>>>> extension
>>>>>>> configure: error: Fortran does not accept free-form source
>>>>>>> configure: error: ./configure failed for test/mpi
>>>>>>>
>>>>>>> I have attached the config.log.
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Rajeev
>>>>>>> Thakur
>>>>>>> Sent: Thursday, September 02, 2010 12:11 PM
>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>
>>>>>>> Just try relinking with the new library at first.
>>>>>>>
>>>>>>> Rajeev
>>>>>>>
>>>>>>> On Sep 2, 2010, at 9:32 AM, SULLIVAN David (AREVA) wrote:
>>>>>>>
>>>>>>>> I saw that there was a newer beta. I was really hoping to find that
>>>>>>>> I had just configured something incorrectly. Will this not require
>>>>>>>> me to rebuild mcnp (the only program I run that uses MPI for
>>>>>>>> parallel) if I change the MPI version? If so, this is a bit of a
>>>>>>>> hardship, since it requires the codes to be revalidated. If not, I
>>>>>>>> will try it in a second.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Dave
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Dave
>>>>>>> Goodell
>>>>>>>> Sent: Thursday, September 02, 2010 10:27 AM
>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>>
>>>>>>>> Can you try the latest release (1.3b1) to see if that fixes the
>>>>>>>> problems you are seeing with your application?
>>>>>>>>
>>>>>>>>
>>>>>>>> http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads
>>>>>>>>
>>>>>>>> -Dave
>>>>>>>>
>>>>>>>> On Sep 2, 2010, at 9:15 AM CDT, SULLIVAN David (AREVA) wrote:
>>>>>>>>
>>>>>>>>> Another output file, hopefully of use.
>>>>>>>>>
>>>>>>>>> Thanks again
>>>>>>>>>
>>>>>>>>> Dave
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of
>>>>>>>>> SULLIVAN
>>
>>>>>>>>> David
>>>>>>>>> (AREVA)
>>>>>>>>> Sent: Thursday, September 02, 2010 8:20 AM
>>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>>>
>>>>>>>>> First my apologies for the delay in continuing this thread.
>>>>>>>>> Unfortunately I have not resolved it, so if I can indulge the gurus
>>>>>>>>> and developers once again...
>>>>>>>>>
>>>>>>>>> As suggested by Rajeev, I ran the test suite in the source
>>>>>>>>> directory. The output of errors, which are similar to what I was
>>>>>>>>> seeing when I ran mcnp5 (v. 1.40 and 1.51), is attached.
>>>>>>>>>
>>>>>>>>> Any insights would be greatly appreciated,
>>>>>>>>>
>>>>>>>>> Dave
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Rajeev
>>>>>>> Thakur
>>>>>>>>> Sent: Wednesday, August 04, 2010 3:06 PM
>>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>>>
>>>>>>>>> Then one level above that directory (in the main MPICH2 source
>>>>>>>>> directory), type make testing, which will run through the
>>>>>>>>> entire
>>>>>>>>> MPICH2 test suite.
>>>>>>>>>
>>>>>>>>> Rajeev
>>>>>>>>>
>>>>>>>>> On Aug 4, 2010, at 2:04 PM, SULLIVAN David (AREVA) wrote:
>>>>>>>>>
>>>>>>>>>> Oh. That's embarrassing. Yea. I have those examples. It runs with
>>>>>>>>>> no problems:
>>>>>>>>>>
>>>>>>>>>> [dfs at aramis examples]$ mpiexec -host aramis -n 4 ./cpi
>>>>>>>>>> Process 2 of 4 is on aramis
>>>>>>>>>> Process 3 of 4 is on aramis
>>>>>>>>>> Process 0 of 4 is on aramis
>>>>>>>>>> Process 1 of 4 is on aramis
>>>>>>>>>> pi is approximately 3.1415926544231239, Error is 0.0000000008333307
>>>>>>>>>> wall clock time = 0.000652
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Gus
>>>>>>> Correa
>>>>>>>>>> Sent: Wednesday, August 04, 2010 1:13 PM
>>>>>>>>>> To: Mpich Discuss
>>>>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>>>>
>>>>>>>>>> Hi David
>>>>>>>>>>
>>>>>>>>>> I think the "examples" dir is not copied to the installation
>>>>>>>>>> directory. You may find it where you decompressed the MPICH2
>>>>>>>>>> tarball, in case you installed it from source.
>>>>>>>>>> At least, this is what I have here.
>>>>>>>>>>
>>>>>>>>>> Gus Correa
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> SULLIVAN David (AREVA) wrote:
>>>>>>>>>>> Yea, that always bothered me. There is no such folder.
>>>>>>>>>>> There are :
>>>>>>>>>>> bin
>>>>>>>>>>> etc
>>>>>>>>>>> include
>>>>>>>>>>> lib
>>>>>>>>>>> sbin
>>>>>>>>>>> share
>>>>>>>>>>>
>>>>>>>>>>> The only examples I found were in the share folder, where there
>>>>>>>>>>> are examples for collchk, graphics and logging.
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of
>>>>>>>>>>> Rajeev
>>
>>>>>>>>>>> Thakur
>>>>>>>>>>> Sent: Wednesday, August 04, 2010 12:45 PM
>>>>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>>>>>
>>>>>>>>>>> Not cpilog. Can you run just cpi from the mpich2/examples
>>>>>>>>>>> directory?
>>>>>>>>>>>
>>>>>>>>>>> Rajeev
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Aug 4, 2010, at 11:37 AM, SULLIVAN David (AREVA) wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Rajeev, Darius,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your response.
>>>>>>>>>>>> cpi yields the following-
>>>>>>>>>>>>
>>>>>>>>>>>> [dfs at aramis examples_logging]$ mpiexec -host aramis -n 12 ./cpilog
>>>>>>>>>>>> Process 0 running on aramis
>>>>>>>>>>>> Process 2 running on aramis
>>>>>>>>>>>> Process 3 running on aramis
>>>>>>>>>>>> Process 1 running on aramis
>>>>>>>>>>>> Process 6 running on aramis
>>>>>>>>>>>> Process 7 running on aramis
>>>>>>>>>>>> Process 8 running on aramis
>>>>>>>>>>>> Process 4 running on aramis
>>>>>>>>>>>> Process 5 running on aramis
>>>>>>>>>>>> Process 9 running on aramis
>>>>>>>>>>>> Process 10 running on aramis
>>>>>>>>>>>> Process 11 running on aramis
>>>>>>>>>>>> pi is approximately 3.1415926535898762, Error is 0.0000000000000830
>>>>>>>>>>>> wall clock time = 0.058131
>>>>>>>>>>>> Writing logfile....
>>>>>>>>>>>> Enabling the Default clock synchronization...
>>>>>>>>>>>> clog_merger.c:CLOG_Merger_init() - Could not open file
>>>>>>>>>>>> ./cpilog.clog2 for merging!
>>>>>>>>>>>> Backtrace of the callstack at rank 0:
>>>>>>>>>>>> At [0]: ./cpilog(CLOG_Util_abort+0x92)[0x456326]
>>>>>>>>>>>> At [1]: ./cpilog(CLOG_Merger_init+0x11f)[0x45db7c]
>>>>>>>>>>>> At [2]: ./cpilog(CLOG_Converge_init+0x8e)[0x45a691]
>>>>>>>>>>>> At [3]: ./cpilog(MPE_Finish_log+0xea)[0x4560aa]
>>>>>>>>>>>> At [4]: ./cpilog(MPI_Finalize+0x50c)[0x4268af]
>>>>>>>>>>>> At [5]: ./cpilog(main+0x428)[0x415963]
>>>>>>>>>>>> At [6]: /lib64/libc.so.6(__libc_start_main+0xf4)[0x3c1881d994]
>>>>>>>>>>>> At [7]: ./cpilog[0x415449]
>>>>>>>>>>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>>>>>>>>>>> APPLICATION TERMINATED WITH THE EXIT STRING: Terminated (signal 15)
>>>>>>>>>>>> So it looks like it works with some issues.
>>>>>>>>>>>>
>>>>>>>>>>>> When does it fail? Immediately.
>>>>>>>>>>>>
>>>>>>>>>>>> Is there a bug? Many successfully use the application (MCNP5,
>>>>>>>>>>>> from LANL) with MPI, so I think a bug there is unlikely.
>>>>>>>>>>>>
>>>>>>>>>>>> Core files, unfortunately, reveal some ignorance on my part.
>>>>>>>>>>>> Where exactly should I be looking for them?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks again,
>>>>>>>>>>>>
>>>>>>>>>>>> Dave
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>>>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of
>>>>>>>>>>>> Darius
>>>
>>>>>>>>>>>> Buntinas
>>>>>>>>>>>> Sent: Wednesday, August 04, 2010 12:19 PM
>>>>>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> This error message says that two processes terminated because
>>>>>>>>>>>> they were unable to communicate with another (or two other)
>>>>>>>>>>>> process. It's possible that another process died, so the others
>>>>>>>>>>>> got errors trying to communicate with it. It's also possible
>>>>>>>>>>>> that there is something preventing some processes from
>>>>>>>>>>>> communicating with each other.
>>>>>>>>>>>>
>>>>>>>>>>>> Are you able to run cpi from the examples directory with 12
>>>>>>>>>>>> processes?
>>>>>>>>>>>>
>>>>>>>>>>>> At what point in your code does this fail? Are there any other
>>>>>>>>>>>> communication operations before the MPI_Comm_dup?
>>>>>>>>>>>>
>>>>>>>>>>>> Enable core files (add "ulimit -c unlimited" to your .bashrc or
>>>>>>>>>>>> .tcshrc), then run your app and look for core files. If there is
>>>>>>>>>>>> a bug in your application that causes a process to die, this
>>>>>>>>>>>> might tell you which one and why.
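>>>>>>>>>>>> Roughly, that looks like this (a sketch; core file names and
>>>>>>>>>>>> locations vary by system, so treat these as examples):
>>>>>>>>>>>>
>>>>>>>>>>>>   ulimit -c unlimited          # add to your shell startup file on each node
>>>>>>>>>>>>   mpiexec -f nodes -n 12 mcnp5.mpi i=TN04 o=TN04.o
>>>>>>>>>>>>   ls core*                     # core files usually land in the run directory
>>>>>>>>>>>>   gdb ./mcnp5.mpi core.<pid>   # "bt" then shows where that rank died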
>>>>>>>>>>>>
>>>>>>>>>>>> Let us know how this goes.
>>>>>>>>>>>>
>>>>>>>>>>>> -d
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Aug 4, 2010, at 11:03 AM, SULLIVAN David (AREVA) wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Since I have had no responses, is there any additional
>>>>>>>>>>>>> information I could provide to solicit some direction for
>>>>>>>>>>>>> overcoming this latest string of MPI errors?
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dave
>>>>>>>>>>>>>
>>>>>>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>>>>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of
>>>>>>> SULLIVAN
>>>>>>>>>>>>> David F (AREVA NP INC)
>>>>>>>>>>>>> Sent: Friday, July 23, 2010 4:29 PM
>>>>>>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>>>>>>> Subject: [mpich-discuss] cryptic (to me) error
>>>>>>>>>>>>>
>>>>>>>>>>>>> With my firewall issues firmly behind me, I have a new problem
>>>>>>>>>>>>> for the collective wisdom. I am attempting to run a program, to
>>>>>>>>>>>>> which the response is as follows:
>>>>>>>>>>>>> [mcnp5_1-4 at athos ~]$ mpiexec -f nodes -n 12 mcnp5.mpi i=TN04 o=TN04.o
>>>>>>>>>>>>>
>>>>>>>>>>>>> Fatal error in MPI_Comm_dup: Other MPI error, error stack:
>>>>>>>>>>>>> MPI_Comm_dup(168).................: MPI_Comm_dup(MPI_COMM_WORLD,
>>>>>>>>>>>>> new_comm=0x7fff58edb450) failed
>>>>>>>>>>>>> MPIR_Comm_copy(923)...............:
>>>>>>>>>>>>> MPIR_Get_contextid(639)...........:
>>>>>>>>>>>>> MPI_Allreduce(773)................: MPI_Allreduce(sbuf=MPI_IN_PLACE,
>>>>>>>>>>>>> rbuf=0x7fff58edb1a0, count=64, MPI_INT, MPI_BAND, MPI_COMM_WORLD) failed
>>>>>>>>>>>>> MPIR_Allreduce(228)...............:
>>>>>>>>>>>>> MPIC_Send(41).....................:
>>>>>>>>>>>>> MPIC_Wait(513)....................:
>>>>>>>>>>>>> MPIDI_CH3I_Progress(150)..........:
>>>>>>>>>>>>> MPID_nem_mpich2_blocking_recv(933):
>>>>>>>>>>>>> MPID_nem_tcp_connpoll(1709).......: Communication error
>>>>>>>>>>>>> Fatal error in MPI_Comm_dup: Other MPI error, error stack:
>>>>>>>>>>>>> MPI_Comm_dup(168).................: MPI_Comm_dup(MPI_COMM_WORLD,
>>>>>>>>>>>>> new_comm=0x7fff97dca620) failed
>>>>>>>>>>>>> MPIR_Comm_copy(923)...............:
>>>>>>>>>>>>> MPIR_Get_contextid(639)...........:
>>>>>>>>>>>>> MPI_Allreduce(773)................: MPI_Allreduce(sbuf=MPI_IN_PLACE,
>>>>>>>>>>>>> rbuf=0x7fff97dca370, count=64, MPI_INT, MPI_BAND, MPI_COMM_WORLD) failed
>>>>>>>>>>>>> MPIR_Allreduce(289)...............:
>>>>>>>>>>>>> MPIC_Sendrecv(161)................:
>>>>>>>>>>>>> MPIC_Wait(513)....................:
>>>>>>>>>>>>> MPIDI_CH3I_Progress(150)..........:
>>>>>>>>>>>>> MPID_nem_mpich2_blocking_recv(948):
>>>>>>>>>>>>> MPID_nem_tcp_connpoll(1709).......: Communication error
>>>>>>>>>>>>> Killed by signal 2.
>>>>>>>>>>>>> Ctrl-C caught... cleaning up processes
>>>>>>>>>>>>> [mpiexec at athos] HYDT_dmx_deregister_fd (./tools/demux/demux.c:142):
>>>>>>>>>>>>> could not find fd to deregister: -2
>>>>>>>>>>>>> [mpiexec at athos] HYD_pmcd_pmiserv_cleanup
>>>>>>>>>>>>> (./pm/pmiserv/pmiserv_cb.c:401): error deregistering fd
>>>>>>>>>>>>> [press Ctrl-C again to force abort]
>>>>>>>>>>>>> APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
>>>>>>>>>>>>> [mcnp5_1-4 at athos ~]$
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any ideas?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>
>>>>>>>>>>>>> David Sullivan
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> AREVA NP INC
>>>>>>>>>>>>> 400 Donald Lynch Boulevard Marlborough, MA, 01752
>>>>>>>>>>>>> Phone: (508) 573-6721
>>>>>>>>>>>>> Fax: (434) 382-5597
>>>>>>>>>>>>> David.Sullivan at AREVA.com
>>>>>>>>>>>>>