[petsc-dev] Bug in petsc-dev?
Barry Smith
bsmith at mcs.anl.gov
Tue Mar 22 13:09:49 CDT 2011
Are the "current petsc-dev" and "Using PETSc 3.1-p8" cases both built with the exact same MPI?
Are you using shared or static libraries for OpenMPI and PETSc?
Are you using the exact same mpiexec to start up all the cases?
If you change the order of the four nodes that you run this on, does the "oddball" result always come from the same physical node? That is, if the machine that is now used as the fourth node is instead used as the third node, does the wrong answer then appear on the third node or still on the fourth? If you use a different physical machine for the fourth node, does the problem persist?
If you get rid of the rand() call and just set the fileRandomNumber value to, say, 450385, does it behave the same way?
The reason I am asking you all these questions is that this is a very strange error that defies easy explanation; since it is just an MPI call the fact that PETSc is used shouldn't matter (yet it does).
Barry
On Mar 22, 2011, at 12:50 PM, Thomas Witkowski wrote:
> Quoting Barry Smith <bsmith at mcs.anl.gov>:
>
>>
>> On Mar 22, 2011, at 11:08 AM, Thomas Witkowski wrote:
>>
>>> Could some of you test the very small attached example? I make use of the current petsc-dev, OpenMPI 1.4.1 and GCC 4.2.4. In this environment, using 4 nodes, I get the following output, which is wrong:
>>>
>>> [3] BCAST-RESULT: 812855920
>>> [2] BCAST-RESULT: 450385
>>> [1] BCAST-RESULT: 450385
>>> [0] BCAST-RESULT: 450385
>>>
>>> The problem occurs only when I run the code on different nodes. When I start mpirun on only one node with four threads
>>
>> You mean 4 MPI processes?
>
> Yes.
>
>>
>>
>>> or I make use of a four core system, everything is fine. Valgrind and Allinea DDT both say that everything is fine. So I'm really not sure where the problem is. Using PETSc 3.1-p8 there is no problem with this example. It would be quite interesting to know whether some of you can reproduce this problem. Thanks for trying!
>>
>> Replace PetscInitialize() and PetscFinalize() with MPI_Init() and MPI_Finalize() and remove the petsc.h include. Then link against the old and the new PETSc and run on the different systems.
>>
>> I'm thinking you'll still get the wrong result without the PETSc calls, indicating that it is an MPI issue.
>
> No! I already did this test, and in this case I get the correct results!
>
> Thomas
>
>
>>
>> Barry
>>
>>>
>>> Thomas
>>>
>>> <test.c>
>>
>>
>>
>
>