[petsc-dev] Bug in petsc-dev?

Barry Smith bsmith at mcs.anl.gov
Tue Mar 22 13:09:49 CDT 2011


  Were the "current petsc-dev" and "PETSc 3.1-p8" both built with the exact same MPI?

  Are you using shared or static libraries for OpenMPI and PETSc? 

  Are you using the exact same mpiexec to start up all the cases?

  If you change the order of the four nodes that you run this on, does the rank with the "oddball" result always correspond to the same physical node? That is, if the machine that is now used as the fourth node is instead used as the third node, does the wrong answer then appear on the third node or still on the fourth? If you use a different physical machine for the fourth node, does the problem persist?

  If you get rid of the rand() call and just set fileRandomNumber to, say, 450385, does it behave the same way?

  The reason I am asking all these questions is that this is a very strange error that defies easy explanation: since it is just an MPI call, the fact that PETSc is used shouldn't matter (yet it does).


   Barry

On Mar 22, 2011, at 12:50 PM, Thomas Witkowski wrote:

> Zitat von Barry Smith <bsmith at mcs.anl.gov>:
> 
>> 
>> On Mar 22, 2011, at 11:08 AM, Thomas Witkowski wrote:
>> 
>>> Could some of you test the very small attached example? I am using the current petsc-dev, OpenMPI 1.4.1 and GCC 4.2.4. In this environment, using 4 nodes, I get the following output, which is wrong:
>>> 
>>> [3] BCAST-RESULT: 812855920
>>> [2] BCAST-RESULT: 450385
>>> [1] BCAST-RESULT: 450385
>>> [0] BCAST-RESULT: 450385
>>> 
>>> The problem occurs only when I run the code on different nodes.  When I start mpirun on only one node with four threads
>> 
>>   You mean 4 MPI processes?
> 
> Yes.
> 
>> 
>> 
>>> or I use a four-core system, everything is fine. Valgrind and Allinea DDT both say that everything is fine, so I'm really not sure where the problem is. Using PETSc 3.1-p8 there is no problem with this example. It would be quite interesting to know whether some of you can reproduce this problem or not. Thanks for trying!
>> 
>>   Replace the PetscInitialize() and PetscFinalize() calls with MPI_Init() and MPI_Finalize(), remove the #include of petsc.h, then link under the old and new PETSc and run under the different systems.
>> 
>>   I'm thinking you'll still get the wrong result without the PETSc calls, indicating that it is an MPI issue.
> 
> No! I already did this test. In that case I get the correct results!
> 
> Thomas
> 
> 
>> 
>>   Barry
>> 
>>> 
>>> Thomas
>>> 
>>> <test.c>
>> 
>> 
>> 
> 
> 