[mpich-discuss] cryptic (to me) error

SULLIVAN David (AREVA) David.Sullivan at areva.com
Fri Sep 3 13:43:57 CDT 2010


Interesting. So that would be the same for any executable that uses
mpiexec? This is confusing, though, because the install guide says it
can be done either over NFS or as an exact duplicate. I have set this up
before (as exact duplicates) without issues (with MPICH1 on WinXP), so I
assumed, as the install guide states, that this has not changed. Thanks
again for the remedial assistance.
Dave

-----Original Message-----
From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Dave Goodell
Sent: Friday, September 03, 2010 2:03 PM
To: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] cryptic (to me) error

The test suite directory must be on a shared filesystem because mpiexec
does not stage executables for you.
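
For example, one way to satisfy this (the /nfs path and the rsync step are
illustrative, not from this thread) is either to run "make testing" from a
directory that is NFS-mounted on both hosts, or to copy the built tree to
the identical path on the other host first:

   # option 1: build tree on a shared filesystem
   cd /nfs/home/dfs/mpich2-1.3b1
   HYDRA_HOST_FILE=/nfs/home/dfs/node_list make testing

   # option 2: duplicate the built tree to the same path on the second host
   rsync -a /home/dfs/mpich2-1.3b1/ porthos:/home/dfs/mpich2-1.3b1/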

-Dave

On Sep 3, 2010, at 1:02 PM CDT, SULLIVAN David (AREVA) wrote:

> Oh, well, I left it, and every test results in...
> 
> Unexpected output in attrend: [proxy:0:1 at porthos] HYDU_create_process
> (./utils/launch/launch.c:70): execvp error on file ./attrend (No such file or directory)
> Unexpected output in attrend: [proxy:0:1 at porthos] HYDU_create_process
> (./utils/launch/launch.c:70): execvp error on file ./attrend (No such file or directory)
> Unexpected output in attrend: [mpiexec at aramis] HYDT_dmx_deregister_fd
> (./tools/demux/demux.c:142): could not find fd to deregister: -2
> Unexpected output in attrend: [mpiexec at aramis] HYD_pmcd_pmiserv_cleanup
> (./pm/pmiserv/pmiserv_cb.c:398): error deregistering fd
> Unexpected output in attrend: APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
> Program attrend exited without No Errors
> Unexpected output in attrend2: [proxy:0:1 at porthos] HYDU_create_process
> (./utils/launch/launch.c:70): execvp error on file ./attrend2 (No such file or directory)
> Unexpected output in attrend2: [proxy:0:1 at porthos] HYDU_create_process
> (./utils/launch/launch.c:70): execvp error on file ./attrend2 (No such file or directory)
> Unexpected output in attrend2: [mpiexec at aramis] HYDT_dmx_deregister_fd
> (./tools/demux/demux.c:142): could not find fd to deregister: -2
> Unexpected output in attrend2: [mpiexec at aramis] HYD_pmcd_pmiserv_cleanup
> (./pm/pmiserv/pmiserv_cb.c:398): error deregistering fd
> Unexpected output in attrend2: APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
> Program attrend2 exited without No Errors
> 
> 
> 
> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov 
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Darius 
> Buntinas
> Sent: Friday, September 03, 2010 1:45 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] cryptic (to me) error
> 
> The tests run quietly unless there's an error.  Set VERBOSE=1 on the
> command line to see each test running:
> 
> HYDRA_HOST_FILE=XXX VERBOSE=1 make testing
> 
> -d
> 
> On Sep 3, 2010, at 12:40 PM, SULLIVAN David (AREVA) wrote:
> 
>> Oh, OK. Well, it just hangs, so I aborted. I checked each machine with
>> top, and neither is doing anything when it hangs. Attached is the
>> screen output.
>> 
>> Thanks again.
>> 
>> 
>> Dave
>> 
>> -----Original Message-----
>> From: mpich-discuss-bounces at mcs.anl.gov 
>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Dave Goodell
>> Sent: Friday, September 03, 2010 12:14 PM
>> To: mpich-discuss at mcs.anl.gov
>> Subject: Re: [mpich-discuss] cryptic (to me) error
>> 
>> Give the full path to your host file: /path/to/node_list
>> 
>> When the test script changes directories to the different test 
>> directories, a relative path isn't correct anymore.
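>>
>> For example (the path is illustrative):
>>
>>   HYDRA_HOST_FILE=/home/dfs/node_list make testing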
>> 
>> -Dave
>> 
>> On Sep 3, 2010, at 9:48 AM CDT, SULLIVAN David (AREVA) wrote:
>> 
>>> Wow, that made a difference. Not a good one, but a big one. I stopped
>>> the test since each process resulted in the attached errors. It looks
>>> like it is trying to access files that are not there in order to parse
>>> the HYDRA_HOST_FILE. I checked whether they were missing or it was just
>>> a permissions issue, but that was not the case.
>>> 
>>> Dave
>>> 
>>> -----Original Message-----
>>> From: mpich-discuss-bounces at mcs.anl.gov 
>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Dave Goodell
>>> Sent: Friday, September 03, 2010 10:36 AM
>>> To: mpich-discuss at mcs.anl.gov
>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>> 
>>> I was unclear.  I meant to use HYDRA_HOST_FILE when running the MPICH2
>>> test suite in order to make it run across the same set of machines
>>> that your MCNP code runs on.
>>> 
>>> -Dave
>>> 
>>> On Sep 3, 2010, at 9:25 AM CDT, SULLIVAN David (AREVA) wrote:
>>> 
>>>> Same result; MPI doesn't even get to call mcnp5.mpi. It just returns the
>>>> "Fatal error in PMPI_Comm_dup: Other MPI error, error stack:" error.
>>>> 
>>>> -----Original Message-----
>>>> From: mpich-discuss-bounces at mcs.anl.gov 
>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Dave 
>>>> Goodell
>>>> Sent: Friday, September 03, 2010 10:21 AM
>>>> To: mpich-discuss at mcs.anl.gov
>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>> 
>>>> Try setting HYDRA_HOST_FILE=your_machine_file_here
>>>> 
>>>> This will make hydra act as though you passed "-f your_machine_file_here"
>>>> to every mpiexec.
>>>> 
>>>> http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager#Environment_Settings
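>>>> 
>>>> For example, something along these lines (host names are the ones from
>>>> this thread; the file name is illustrative):
>>>> 
>>>>   cat > ~/machine_file <<EOF
>>>>   aramis
>>>>   porthos
>>>>   EOF
>>>>   export HYDRA_HOST_FILE=$HOME/machine_file
>>>>   mpiexec -n 4 ./cpi   # now behaves like "mpiexec -f ~/machine_file -n 4 ./cpi"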
>>>> 
>>>> -Dave
>>>> 
>>>> On Sep 3, 2010, at 5:00 AM CDT, SULLIVAN David (AREVA) wrote:
>>>> 
>>>>> I was wondering about that. Is there a configuration file that sets
>>>>> up the cluster and defines which node to run on? Would that make the
>>>>> issue any clearer?
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Dave
>>>>> 
>>>>> 
>>>>> -----Original Message-----
>>>>> From: mpich-discuss-bounces at mcs.anl.gov on behalf of Rajeev Thakur
>>>>> Sent: Thu 9/2/2010 10:22 PM
>>>>> To: mpich-discuss at mcs.anl.gov
>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>> 
>>>>> There might be some connection issues between the two machines. The
>>>>> MPICH2 test suite that you ran with "make testing" probably ran on a
>>>>> single machine.
>>>>> 
>>>>> On Sep 2, 2010, at 6:27 PM, SULLIVAN David (AREVA) wrote:
>>>>> 
>>>>>> The error occurs immediately; I don't think it even starts the
>>>>>> executable. It does work on the single machine with 4 processes.
>>>>>> 
>>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: mpich-discuss-bounces at mcs.anl.gov on behalf of Rajeev 
>>>>>> Thakur
>>>>>> Sent: Thu 9/2/2010 4:34 PM
>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>> 
>>>>>> Does it run with 2 processes on a single machine?
>>>>>> 
>>>>>> 
>>>>>> On Sep 2, 2010, at 2:38 PM, SULLIVAN David (AREVA) wrote:
>>>>>> 
>>>>>>> That fixed the compile. Thanks!
>>>>>>> 
>>>>>>> The latest release does not fix the issues I am having, though. Cpi
>>>>>>> works fine, and the test suite is certainly improved (see summary.xml
>>>>>>> output), but when I try to use mcnp it still crashes in the same way
>>>>>>> (see error.txt).
>>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov 
>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Anthony 
>>>>>>> Chan
>>>>>>> Sent: Thursday, September 02, 2010 1:38 PM
>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>> 
>>>>>>> 
>>>>>>> There is a bug in 1.3b1 with the option --enable-fc.  Since Fortran
>>>>>>> 90 is enabled by default, remove the --enable-fc from your configure
>>>>>>> command and try again.  If there is an error again, send us the
>>>>>>> configure output as you see it on your screen (see README) instead
>>>>>>> of config.log.
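>>>>>>> 
>>>>>>> For example, capturing the screen output with tee (the install prefix
>>>>>>> and output file name are illustrative):
>>>>>>> 
>>>>>>>   ./configure --prefix=/home/dfs/mpich2-install 2>&1 | tee configure-output.txt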
>>>>>>> 
>>>>>>> A.Chan
>>>>>>> 
>>>>>>> ----- "SULLIVAN David (AREVA)" <David.Sullivan at areva.com> wrote:
>>>>>>> 
>>>>>>>> Failure again.
>>>>>>>> The 1.3 beta version will not compile with Intel 10.1. It bombs in
>>>>>>>> the configure script:
>>>>>>>> 
>>>>>>>> checking for Fortran flag needed to allow free-form source... unknown
>>>>>>>> configure: WARNING: Fortran 90 test being disabled because the
>>>>>>>> /home/dfs/mpich2-1.3b1/bin/mpif90 compiler does not accept a .f90 extension
>>>>>>>> configure: error: Fortran does not accept free-form source
>>>>>>>> configure: error: ./configure failed for test/mpi
>>>>>>>> 
>>>>>>>> I have attached the config.log.
>>>>>>>> 
>>>>>>>> -----Original Message-----
>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov 
>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Rajeev 
>>>>>>>> Thakur
>>>>>>>> Sent: Thursday, September 02, 2010 12:11 PM
>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>> 
>>>>>>>> Just try relinking with the new library at first.
>>>>>>>> 
>>>>>>>> Rajeev
>>>>>>>> 
>>>>>>>> On Sep 2, 2010, at 9:32 AM, SULLIVAN David (AREVA) wrote:
>>>>>>>> 
>>>>>>>>> I saw that there was a newer beta. I was really hoping to find that
>>>>>>>>> I had just configured something incorrectly. Will this not require
>>>>>>>>> me to re-build mcnp (the only program I run that uses MPI for
>>>>>>>>> parallel) if I change the MPI version? If so, this is a bit of a
>>>>>>>>> hardship, requiring codes to be revalidated. If not, I will try it
>>>>>>>>> in a second.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> 
>>>>>>>>> Dave
>>>>>>>>> 
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov 
>>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Dave
>>>>>>>> Goodell
>>>>>>>>> Sent: Thursday, September 02, 2010 10:27 AM
>>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>>> 
>>>>>>>>> Can you try the latest release (1.3b1) to see if that fixes the
>>>>>>>>> problems you are seeing with your application?
>>>>>>>>> 
>>>>>>>>> http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads
>>>>>>>>> 
>>>>>>>>> -Dave
>>>>>>>>> 
>>>>>>>>> On Sep 2, 2010, at 9:15 AM CDT, SULLIVAN David (AREVA) wrote:
>>>>>>>>> 
>>>>>>>>>> Another output file, hopefully of use. 
>>>>>>>>>> 
>>>>>>>>>> Thanks again
>>>>>>>>>> 
>>>>>>>>>> Dave
>>>>>>>>>> 
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov 
>>>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of 
>>>>>>>>>> SULLIVAN
>>> 
>>>>>>>>>> David
>>>>>>>>>> (AREVA)
>>>>>>>>>> Sent: Thursday, September 02, 2010 8:20 AM
>>>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>>>> 
>>>>>>>>>> First, my apologies for the delay in continuing this thread.
>>>>>>>>>> Unfortunately I have not resolved it, so if I can indulge the gurus
>>>>>>>>>> and developers once again...
>>>>>>>>>> 
>>>>>>>>>> As suggested by Rajeev, I ran the test suite in the source
>>>>>>>>>> directory. The output of errors, which are similar to what I was
>>>>>>>>>> seeing when I ran mcnp5 (v. 1.40 and 1.51), is attached.
>>>>>>>>>> 
>>>>>>>>>> Any insights would be greatly appreciated,
>>>>>>>>>> 
>>>>>>>>>> Dave
>>>>>>>>>> 
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov 
>>>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of 
>>>>>>>>>> Rajeev
>>>>>>>> Thakur
>>>>>>>>>> Sent: Wednesday, August 04, 2010 3:06 PM
>>>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>>>> 
>>>>>>>>>> Then one level above that directory (in the main MPICH2 source
>>>>>>>>>> directory), type make testing, which will run through the entire
>>>>>>>>>> MPICH2 test suite.
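>>>>>>>>>> 
>>>>>>>>>> For instance (using the source path that appears elsewhere in this
>>>>>>>>>> thread; substitute your own):
>>>>>>>>>> 
>>>>>>>>>>   cd /home/dfs/mpich2-1.3b1
>>>>>>>>>>   make testing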
>>>>>>>>>> 
>>>>>>>>>> Rajeev
>>>>>>>>>> 
>>>>>>>>>> On Aug 4, 2010, at 2:04 PM, SULLIVAN David (AREVA) wrote:
>>>>>>>>>> 
>>>>>>>>>>> Oh. That's embarrassing. Yea. I have those examples. It runs with
>>>>>>>>>>> no problems:
>>>>>>>>>>> 
>>>>>>>>>>> [dfs at aramis examples]$ mpiexec -host aramis -n 4 ./cpi
>>>>>>>>>>> Process 2 of 4 is on aramis
>>>>>>>>>>> Process 3 of 4 is on aramis
>>>>>>>>>>> Process 0 of 4 is on aramis
>>>>>>>>>>> Process 1 of 4 is on aramis
>>>>>>>>>>> pi is approximately 3.1415926544231239, Error is 0.0000000008333307
>>>>>>>>>>> wall clock time = 0.000652
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov 
>>>>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Gus
>>>>>>>> Correa
>>>>>>>>>>> Sent: Wednesday, August 04, 2010 1:13 PM
>>>>>>>>>>> To: Mpich Discuss
>>>>>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>>>>> 
>>>>>>>>>>> Hi David
>>>>>>>>>>> 
>>>>>>>>>>> I think the "examples" dir is not copied to the installation
>>>>>>>>>>> directory. You may find it where you decompressed the MPICH2
>>>>>>>>>>> tarball, in case you installed it from source.
>>>>>>>>>>> At least, this is what I have here.
>>>>>>>>>>> 
>>>>>>>>>>> Gus Correa
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> SULLIVAN David (AREVA) wrote:
>>>>>>>>>>>> Yea, that always bothered me.  There is no such folder.
>>>>>>>>>>>> There are :
>>>>>>>>>>>> bin
>>>>>>>>>>>> etc
>>>>>>>>>>>> include
>>>>>>>>>>>> lib
>>>>>>>>>>>> sbin
>>>>>>>>>>>> share
>>>>>>>>>>>> 
>>>>>>>>>>>> The only examples I found were in the share folder, where there
>>>>>>>>>>>> are examples for collchk, graphics and logging.
>>>>>>>>>>>> 
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov 
>>>>>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of 
>>>>>>>>>>>> Rajeev
>>> 
>>>>>>>>>>>> Thakur
>>>>>>>>>>>> Sent: Wednesday, August 04, 2010 12:45 PM
>>>>>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>>>>>> 
>>>>>>>>>>>> Not cpilog. Can you run just cpi from the mpich2/examples
>>>>>>>>>>>> directory?
>>>>>>>>>>>> 
>>>>>>>>>>>> Rajeev
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Aug 4, 2010, at 11:37 AM, SULLIVAN David (AREVA) wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Rajeev,  Darius,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks for your response.
>>>>>>>>>>>>> cpi yields the following:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> [dfs at aramis examples_logging]$ mpiexec -host aramis -n 12 ./cpilog
>>>>>>>>>>>>> Process 0 running on aramis
>>>>>>>>>>>>> Process 2 running on aramis
>>>>>>>>>>>>> Process 3 running on aramis
>>>>>>>>>>>>> Process 1 running on aramis
>>>>>>>>>>>>> Process 6 running on aramis
>>>>>>>>>>>>> Process 7 running on aramis
>>>>>>>>>>>>> Process 8 running on aramis
>>>>>>>>>>>>> Process 4 running on aramis
>>>>>>>>>>>>> Process 5 running on aramis
>>>>>>>>>>>>> Process 9 running on aramis
>>>>>>>>>>>>> Process 10 running on aramis
>>>>>>>>>>>>> Process 11 running on aramis
>>>>>>>>>>>>> pi is approximately 3.1415926535898762, Error is 0.0000000000000830
>>>>>>>>>>>>> wall clock time = 0.058131
>>>>>>>>>>>>> Writing logfile....
>>>>>>>>>>>>> Enabling the Default clock synchronization...
>>>>>>>>>>>>> clog_merger.c:CLOG_Merger_init() - Could not open file ./cpilog.clog2 for merging!
>>>>>>>>>>>>> Backtrace of the callstack at rank 0:
>>>>>>>>>>>>> At [0]: ./cpilog(CLOG_Util_abort+0x92)[0x456326]
>>>>>>>>>>>>> At [1]: ./cpilog(CLOG_Merger_init+0x11f)[0x45db7c]
>>>>>>>>>>>>> At [2]: ./cpilog(CLOG_Converge_init+0x8e)[0x45a691]
>>>>>>>>>>>>> At [3]: ./cpilog(MPE_Finish_log+0xea)[0x4560aa]
>>>>>>>>>>>>> At [4]: ./cpilog(MPI_Finalize+0x50c)[0x4268af]
>>>>>>>>>>>>> At [5]: ./cpilog(main+0x428)[0x415963]
>>>>>>>>>>>>> At [6]: /lib64/libc.so.6(__libc_start_main+0xf4)[0x3c1881d994]
>>>>>>>>>>>>> At [7]: ./cpilog[0x415449]
>>>>>>>>>>>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>>>>>>>>>>>> APPLICATION TERMINATED WITH THE EXIT STRING: Terminated (signal 15)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> So it looks like it works, with some issues.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> When does it fail? Immediately.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Is there a bug? Many successfully use the application (MCNP5,
>>>>>>>>>>>>> from LANL) with MPI, so I think a bug there is unlikely.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Core files, unfortunately, reveal some ignorance on my part.
>>>>>>>>>>>>> Where exactly should I be looking for them?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks again,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Dave
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov 
>>>>>>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of 
>>>>>>>>>>>>> Darius
>>>> 
>>>>>>>>>>>>> Buntinas
>>>>>>>>>>>>> Sent: Wednesday, August 04, 2010 12:19 PM
>>>>>>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>>>>>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> This error message says that two processes terminated because
>>>>>>>>>>>>> they were unable to communicate with another (or two other)
>>>>>>>>>>>>> process.  It's possible that another process died, so the others
>>>>>>>>>>>>> got errors trying to communicate with them.  It's also possible
>>>>>>>>>>>>> that there is something preventing some processes from
>>>>>>>>>>>>> communicating with each other.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Are you able to run cpi from the examples directory with 12
>>>>>>>>>>>>> processes?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> At what point in your code does this fail?  Are there any other
>>>>>>>>>>>>> communication operations before the MPI_Comm_dup?
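>>>>>>>>>>>>> 
>>>>>>>>>>>>> For example, reusing the "nodes" host file from your original
>>>>>>>>>>>>> mpiexec command:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>   mpiexec -f nodes -n 12 ./cpi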
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Enable core files (add "ulimit -c unlimited" to your .bashrc or
>>>>>>>>>>>>> .tcshrc) then run your app and look for core files.  If there is
>>>>>>>>>>>>> a bug in your application that causes a process to die this
>>>>>>>>>>>>> might tell you which one and why.
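>>>>>>>>>>>>> 
>>>>>>>>>>>>> A possible sequence (assuming gdb is available and the core file
>>>>>>>>>>>>> lands in the working directory; its exact name varies by system):
>>>>>>>>>>>>> 
>>>>>>>>>>>>>   ulimit -c unlimited
>>>>>>>>>>>>>   mpiexec -f nodes -n 12 mcnp5.mpi i=TN04 o=TN04.o
>>>>>>>>>>>>>   gdb ./mcnp5.mpi core.12345   # "bt" shows where the process died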
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Let us know how this goes.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -d
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Aug 4, 2010, at 11:03 AM, SULLIVAN David (AREVA) wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Since I have had no responses, is there any additional
>>>>>>>>>>>>>> information I could provide to solicit some direction for
>>>>>>>>>>>>>> overcoming this latest string of MPI errors?
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Dave
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> From: mpich-discuss-bounces at mcs.anl.gov 
>>>>>>>>>>>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of
>>>>>>>> SULLIVAN
>>>>>>>>>>>>>> David F (AREVA NP INC)
>>>>>>>>>>>>>> Sent: Friday, July 23, 2010 4:29 PM
>>>>>>>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>>>>>>>> Subject: [mpich-discuss] cryptic (to me) error
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> With my firewall issues firmly behind me, I have a new problem
>>>>>>>>>>>>>> for the collective wisdom. I am attempting to run a program,
>>>>>>>>>>>>>> and the response is as follows:
>>>>>>>>>>>>>> [mcnp5_1-4 at athos ~]$ mpiexec -f nodes -n 12 mcnp5.mpi i=TN04 o=TN04.o
>>>>>>>>>>>>>> Fatal error in MPI_Comm_dup: Other MPI error, error stack:
>>>>>>>>>>>>>> MPI_Comm_dup(168).................: MPI_Comm_dup(MPI_COMM_WORLD, new_comm=0x7fff58edb450) failed
>>>>>>>>>>>>>> MPIR_Comm_copy(923)...............:
>>>>>>>>>>>>>> MPIR_Get_contextid(639)...........:
>>>>>>>>>>>>>> MPI_Allreduce(773)................: MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x7fff58edb1a0, count=64, MPI_INT, MPI_BAND, MPI_COMM_WORLD) failed
>>>>>>>>>>>>>> MPIR_Allreduce(228)...............:
>>>>>>>>>>>>>> MPIC_Send(41).....................:
>>>>>>>>>>>>>> MPIC_Wait(513)....................:
>>>>>>>>>>>>>> MPIDI_CH3I_Progress(150)..........:
>>>>>>>>>>>>>> MPID_nem_mpich2_blocking_recv(933):
>>>>>>>>>>>>>> MPID_nem_tcp_connpoll(1709).......: Communication error
>>>>>>>>>>>>>> Fatal error in MPI_Comm_dup: Other MPI error, error stack:
>>>>>>>>>>>>>> MPI_Comm_dup(168).................: MPI_Comm_dup(MPI_COMM_WORLD, new_comm=0x7fff97dca620) failed
>>>>>>>>>>>>>> MPIR_Comm_copy(923)...............:
>>>>>>>>>>>>>> MPIR_Get_contextid(639)...........:
>>>>>>>>>>>>>> MPI_Allreduce(773)................: MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x7fff97dca370, count=64, MPI_INT, MPI_BAND, MPI_COMM_WORLD) failed
>>>>>>>>>>>>>> MPIR_Allreduce(289)...............:
>>>>>>>>>>>>>> MPIC_Sendrecv(161)................:
>>>>>>>>>>>>>> MPIC_Wait(513)....................:
>>>>>>>>>>>>>> MPIDI_CH3I_Progress(150)..........:
>>>>>>>>>>>>>> MPID_nem_mpich2_blocking_recv(948):
>>>>>>>>>>>>>> MPID_nem_tcp_connpoll(1709).......: Communication error
>>>>>>>>>>>>>> Killed by signal 2.
>>>>>>>>>>>>>> Ctrl-C caught... cleaning up processes
>>>>>>>>>>>>>> [mpiexec at athos] HYDT_dmx_deregister_fd (./tools/demux/demux.c:142): could not find fd to deregister: -2
>>>>>>>>>>>>>> [mpiexec at athos] HYD_pmcd_pmiserv_cleanup (./pm/pmiserv/pmiserv_cb.c:401): error deregistering fd
>>>>>>>>>>>>>> [press Ctrl-C again to force abort]
>>>>>>>>>>>>>> APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
>>>>>>>>>>>>>> [mcnp5_1-4 at athos ~]$
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Any ideas?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> David Sullivan
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> AREVA NP INC
>>>>>>>>>>>>>> 400 Donald Lynch Boulevard Marlborough, MA, 01752
>>>>>>>>>>>>>> Phone: (508) 573-6721
>>>>>>>>>>>>>> Fax: (434) 382-5597
>>>>>>>>>>>>>> David.Sullivan at AREVA.com
>>>>>>>>>>>>>> 

_______________________________________________
mpich-discuss mailing list
mpich-discuss at mcs.anl.gov
https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss


More information about the mpich-discuss mailing list