[mpich-discuss] cryptic (to me) error

Dave Goodell goodell at mcs.anl.gov
Tue Sep 7 12:56:36 CDT 2010


If I'm following you correctly, the summary.xml that you attached supports the theory that your network is broken somehow.  The test suite is experiencing random network failures in various MPI communication routines, especially collective ones that are probably stressing the network and networking stack.  There is no way that we know of to configure/install MPICH2 such that you would experience this type of problem, and the test suite is known good MPI code.  The problem is almost certainly outside of MPICH2.

Check your system logs, the output from "dmesg", and any diagnostics you have in your network switch(es).  If you don't know how to troubleshoot networking problems, you should contact your system/network administrators.
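
As a rough starting point for the "dmesg" check, here is a minimal Python sketch that filters kernel log text for lines that often accompany link or driver trouble. The pattern list and the function name find_network_events are my own assumptions for illustration, not anything MPICH2 provides, and the exact messages vary by NIC driver and kernel version, so treat a clean result as inconclusive.

```python
import re

# Assumed, non-exhaustive patterns; real messages depend on the NIC driver
# and kernel version, so extend this list for your hardware.
NETWORK_PATTERNS = re.compile(
    r"link (is )?down|link is not ready|tx timeout|transmit timed out|carrier lost",
    re.IGNORECASE,
)


def find_network_events(log_text):
    """Return the lines of log_text that match any pattern above, in order."""
    return [
        line
        for line in log_text.splitlines()
        if NETWORK_PATTERNS.search(line)
    ]
```

One way to use it is to save the output of "dmesg" to a file on each node, read the file into a string, and pass that string to find_network_events; any node that reports many matches around the time of the failed tests is worth a closer look.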

-Dave
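
On the earlier point in this thread that the test suite directory must live on a shared filesystem, the following is a small sketch for confirming that a directory really is NFS-mounted on a given node. It assumes Linux, where /proc/mounts lists "device mount-point type options ..." per line; the function names and the example path are hypothetical.

```python
import posixpath


def filesystem_type_for(path, mounts_text):
    """Return the filesystem type of the mount containing the absolute `path`.

    `mounts_text` is the content of /proc/mounts (Linux-specific).
    The longest matching mount point wins. Returns None if nothing matches.
    """
    path = posixpath.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def is_nfs(path, mounts_text):
    """True if `path` sits on a mount whose type starts with "nfs" (nfs, nfs4)."""
    fs_type = filesystem_type_for(path, mounts_text)
    return fs_type is not None and fs_type.startswith("nfs")
```

For example, on each node in the machine file you could read /proc/mounts into a string and call is_nfs("/shared/mpich2-test", text), where /shared/mpich2-test stands in for wherever you copied the test suite. This only shows that the path is on an NFS mount; it does not prove that every node mounts the same export.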

On Sep 7, 2010, at 12:40 PM CDT, SULLIVAN David (AREVA) wrote:

> Had some time to work some more on this...
> I have copied the test suite folder to an NFS shared folder. The machine
> file is passed by way of HYDRA_HOST_FILE=/home/dfs/shared/mpich2 make
> testing, as suggested. It is still running, but the results so far
> indicate I am still messing up the process somehow.
> 
> Thanks again,
> 
> Dave
> 
> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Dave Goodell
> Sent: Friday, September 03, 2010 2:56 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] cryptic (to me) error
> 
> Based on what you've sent us in the past, your mpiexec is actually
> running mcnp.mpi.  The error occurs during an MPI_Comm_dup somewhere in
> that code.  Running the test suite over the network will help us figure
> out whether there is a problem with your network.
> 
> -Dave
> 
> On Sep 3, 2010, at 1:50 PM CDT, SULLIVAN David (AREVA) wrote:
> 
>> That makes sense. Since my real problem is that mpiexec doesn't get to
>> starting mcnp.mpi do we need the testing suite to troubleshoot or is 
>> there a source of clues elsewhere? I can get an NFS set up and all, 
>> but the testing suite isn't my true aim so if we don't need it...
>> 
>> 
>> Dave
>> 
>> -----Original Message-----
>> From: mpich-discuss-bounces at mcs.anl.gov 
>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Dave Goodell
>> Sent: Friday, September 03, 2010 2:47 PM
>> To: mpich-discuss at mcs.anl.gov
>> Subject: Re: [mpich-discuss] cryptic (to me) error
>> 
>> The general statement is true.  The problem is that "make testing" 
>> does not first build all executables and then run mpiexec on each
>> executable.
>> 
>> Instead it builds each test (on the machine where you invoked "make
>> testing") just before it is executed.  So the built executables only 
>> end up on the node where you ran "make testing".
>> 
>> -Dave
>> 
>> On Sep 3, 2010, at 1:43 PM CDT, SULLIVAN David (AREVA) wrote:
>> 
>>> Interesting. So that would be the same for any executable that uses 
>>> mpiexec? This is confusing though because the install guide says that
>>> it can be done either as NFS or an exact duplicate. I have set this up
>>> before (as exact duplicates) without issues (with MPICH1 on WinXP) so
>>> I assumed, as it states in the install guide, this has not changed.
>>> Thanks again for the remedial assistance.
>>> Dave
>>> 
>>> -----Original Message-----
>>> From: mpich-discuss-bounces at mcs.anl.gov 
>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Dave Goodell
>>> Sent: Friday, September 03, 2010 2:03 PM
>>> To: mpich-discuss at mcs.anl.gov
>>> Subject: Re: [mpich-discuss] cryptic (to me) error
>>> 
>>> The test suite directory must be on a shared filesystem because 
>>> mpiexec does not stage executables for you.
>>> 
>>> -Dave
> 
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
> <summary_3.txt>
