Non-repeatability issue and difference between 2.3.0 and 2.3.3

Matthew Knepley knepley at gmail.com
Thu Sep 25 06:58:48 CDT 2008


On Thu, Sep 25, 2008 at 5:09 AM, Etienne PERCHAT
<etienne.perchat at transvalor.com> wrote:
> Hi Matt,
>
> I am sure that the partitioning is exactly the same:
> I have an external tool that partitions the mesh before launching the FE code, so for all runs the mesh partitions were created only once and then reused.
>
> For the case where I wanted every ghost node to be shared by two and only two processors, I used simple geometries such as rings or bars with structured meshes. Once again the partitions were created once and then reused.
>
> The initial residuals and the initial matrix are exactly the same.
>
> I have added some lines to my code:
> after calling MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY),
> I compute a matrix-vector product between A and a vector of all ones, then the norm of the resulting vector. Below are the results for 4 linear system solves (two with 2.3.0 and two with 2.3.3p8).
>
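A minimal sketch of this check, using current PETSc call signatures (which differ slightly from the 2.3.x API); the variable names are illustrative and error checking is omitted:

  Vec       ones, Aones;
  PetscReal nrm;

  MatCreateVecs(A, &ones, &Aones);   /* called MatGetVecs() in the 2.3.x series */
  VecSet(ones, 1.0);                 /* vector of all ones */
  MatMult(A, ones, Aones);           /* Aones = A * ones */
  VecNorm(Aones, NORM_2, &nrm);      /* should be identical across runs if A is bitwise identical */
  PetscPrintf(PETSC_COMM_WORLD, "Norm A*One = %1.14e\n", (double)nrm);
  VecDestroy(&ones);
  VecDestroy(&Aones);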
> In summary, for all runs:
>
> 1/ the result of the matrix * unit vector product is the same: 6838.31173987650
>
> 2/ the initial residual is also the same: 1.50972105381228e+006
>
> 3/ at iteration 40 all runs give exactly the same residual:
> Iteration= 40   residual= 2.64670054e+003       tolerance=  3.01944211e+000
>
> 4/ with 2.3.0 the final residual is always the same: 3.19392726797939e+000
>
> 5/ with 2.3.3p8 the final residual varies from run to run after iteration 40.

The problem here is that we run a collection of parallel regression tests
every night, covering 40+ configurations of OS, compilers, and algorithms,
and we have never seen this behavior. So, in order to investigate further,
can you

  1) Send us the matrix and RHS in PETSc binary format (a minimal sketch of
     one way to write them follows this list)

  2) Run this problem with GMRES instead
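A sketch of 1), assuming the assembled matrix A and right-hand side b (current PETSc call signatures; the file name is illustrative and error checking is omitted):

  PetscViewer viewer;

  PetscViewerBinaryOpen(PETSC_COMM_WORLD, "system.bin", FILE_MODE_WRITE, &viewer);
  MatView(A, viewer);            /* write the assembled matrix */
  VecView(b, viewer);            /* append the right-hand side to the same file */
  PetscViewerDestroy(&viewer);

For 2), no code change should be needed: passing -ksp_type gmres on the command line is enough, provided KSPSetFromOptions() is called.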

  Thanks,

     Matt

> Some statistics from 12 successive runs:
>
> We obtained 3.19515221050523 five times, 3.19369843187027 twice, 3.19373947848208 three times, and two other values for the remaining two runs.
>
> RUN1:  3.19515221050523e+000
> RUN2:  3.19515221050523e+000
> RUN3:  3.19369843187027e+000
> RUN4:  3.19588480582213e+000
> RUN5:  3.19515221050523e+000
> RUN6:  3.19373947848208e+000
> RUN7:  3.19515221050523e+000
> RUN8:  3.19384417350916e+000
> RUN9:  3.19515221050523e+000
> RUN10: 3.19373947848208e+000
> RUN11: 3.19369843187027e+000
> RUN12: 3.19373947848208e+000
>
>
> So: same initial residual, same result for the matrix * unit vector product, same residual at iteration 40.
> I always used the options:
>
> OptionTable: -ksp_truemonitor
> OptionTable: -log_summary
>
> Any ideas would be very welcome; don't hesitate to ask if you need additional tests.
>
> Could it perhaps be reuse of a buffer that has not been properly released?
>
> Best regards,
> Etienne
>
> ------------------------------------------------------------------------
> With 2.3.0: Using Petsc Release Version 2.3.0, Patch 44, April, 26, 2005
>
> RUN1:
>
> Norm A*One =   6838.31173987650
>
> *     Resolution method              : Preconditioned Conjugate Residual
> *     Preconditioner                 : BJACOBI with ILU, Blocks of 1
> *
> *     Initial Residual               : 1.50972105381228e+006
> Iteration= 1    residual= 9.59236416e+004       tolerance=  7.54860527e-002
> Iteration= 2    residual= 8.46044988e+004       tolerance=  1.50972105e-001
>
> Iteration= 66   residual= 3.73014307e+001       tolerance=  4.98207948e+000
> Iteration= 67   residual= 3.75579067e+001       tolerance=  5.05756553e+000
> Iteration= 68   residual= 3.19392727e+000       tolerance=  5.13305158e+000
> *
> *     Number of iterations           : 68
> *     Convergence code               : 3
> *     Final Residual Norm            : 3.19392726797939e+000
> *     PETSC : Resolution time        : 1.000389 seconds
>
>
> RUN2:
>
> Norm A*One =   6838.31173987650
> *     Resolution method              : Preconditioned Conjugate Residual
> *     Preconditioner                 : BJACOBI with ILU, Blocks of 1
> *
> *     Initial Residual               : 1.50972105381228e+006
> Iteration= 1    residual= 9.59236416e+004       tolerance=  7.54860527e-002
> Iteration= 2    residual= 8.46044988e+004       tolerance=  1.50972105e-001
> Iteration= 10   residual= 2.73382943e+004       tolerance=  7.54860527e-001
> Iteration= 20   residual= 7.27122933e+003       tolerance=  1.50972105e+000
> Iteration= 30   residual= 8.42209039e+003       tolerance=  2.26458158e+000
> Iteration= 40   residual= 2.64670054e+003       tolerance=  3.01944211e+000
> Iteration= 50   residual= 3.17446784e+002       tolerance=  3.77430263e+000
> Iteration= 60   residual= 3.53234217e+001       tolerance=  4.52916316e+000
> Iteration= 66   residual= 3.73014307e+001       tolerance=  4.98207948e+000
> Iteration= 67   residual= 3.75579067e+001       tolerance=  5.05756553e+000
> Iteration= 68   residual= 3.19392727e+000       tolerance=  5.13305158e+000
> *
> *     Number of iterations           : 68
> *     Convergence code               : 3
> *     Final Residual Norm            : 3.19392726797939e+000
> *     PETSC : Resolution time        : 0.888913 seconds
>
>
> ********************************************************************************************************************************************************
>
> WITH 2.3.3p8:
>
>
> Using Petsc Release Version 2.3.3, Patch 8, Fri Nov 16 17:03:40 CST 2007 HG revision: 414581156e67e55c761739b0deb119f7590d0f4b
>
> RUN1:
> Norm A*One =   6838.31173987650
> *     Resolution method              : Preconditioned Conjugate Residual
> *     Preconditioner                 : BJACOBI with ILU, Blocks of 1
> *
> *     Initial Residual               : 1.50972105381228e+006
> Iteration= 1    residual= 9.59236416e+004       tolerance=  7.54860527e-002
> Iteration= 2    residual= 8.46044988e+004       tolerance=  1.50972105e-001
> Iteration= 10   residual= 2.73382943e+004       tolerance=  7.54860527e-001
> Iteration= 20   residual= 7.27122933e+003       tolerance=  1.50972105e+000
> Iteration= 30   residual= 8.42209039e+003       tolerance=  2.26458158e+000
> Iteration= 40   residual= 2.64670054e+003       tolerance=  3.01944211e+000
> Iteration= 50   residual= 3.17446756e+002       tolerance=  3.77430263e+000
> Iteration= 60   residual= 3.53234489e+001       tolerance=  4.52916316e+000
> Iteration= 65   residual= 7.12874932e+000       tolerance=  4.90659342e+000
> Iteration= 66   residual= 3.72396571e+001       tolerance=  4.98207948e+000
> Iteration= 67   residual= 3.75096723e+001       tolerance=  5.05756553e+000
> Iteration= 68   residual= 3.19515221e+000       tolerance=  5.13305158e+000
> *
> *     Number of iterations           : 68
> *     Convergence code               : 3
> *     Final Residual Norm            : 3.19515221050523e+000
> *     PETSC : Resolution time        : 0.928915 seconds
>
> RUN2:
>
> Norm A*One =   6838.31173987650
> *     Resolution method              : Preconditioned Conjugate Residual
> *     Preconditioner                 : BJACOBI with ILU, Blocks of 1
> *
> *     Initial Residual               : 1.50972105381228e+006
> Iteration= 1    residual= 9.59236416e+004       tolerance=  7.54860527e-002
> Iteration= 2    residual= 8.46044988e+004       tolerance=  1.50972105e-001
> Iteration= 10   residual= 2.73382943e+004       tolerance=  7.54860527e-001
> Iteration= 20   residual= 7.27122933e+003       tolerance=  1.50972105e+000
> Iteration= 30   residual= 8.42209039e+003       tolerance=  2.26458158e+000
> Iteration= 40   residual= 2.64670054e+003       tolerance=  3.01944211e+000
> Iteration= 50   residual= 3.17446774e+002       tolerance=  3.77430263e+000
> Iteration= 60   residual= 3.53233608e+001       tolerance=  4.52916316e+000
> Iteration= 65   residual= 7.12937602e+000       tolerance=  4.90659342e+000
> Iteration= 66   residual= 3.72832632e+001       tolerance=  4.98207948e+000
> Iteration= 67   residual= 3.75447170e+001       tolerance=  5.05756553e+000
> Iteration= 68   residual= 3.19369843e+000       tolerance=  5.13305158e+000
> *
> *     Number of iterations           : 68
> *     Convergence code               : 3
> *     Final Residual Norm            : 3.19369843187027e+000
> *     PETSC : Resolution time        : 0.872702 seconds
> Etienne
>
>
> -----Original Message-----
> From: owner-petsc-users at mcs.anl.gov [mailto:owner-petsc-users at mcs.anl.gov] On behalf of Matthew Knepley
> Sent: Wednesday, September 24, 2008 19:15
> To: petsc-users at mcs.anl.gov
> Subject: Re: Non-repeatability issue and difference between 2.3.0 and 2.3.3
>
> On Wed, Sep 24, 2008 at 11:21 AM, Etienne PERCHAT
> <etienne.perchat at transvalor.com> wrote:
>> Dear Petsc users,
>>
>>
>>
>> I am coming back to you with my comparisons between v2.3.0 and v2.3.3p8.
>>
>>
>>
>> I am facing a non-repeatability issue with v2.3.3 that I did not have with v2.3.0.
>>
>> I have read the exchanges from March on a related subject, but in my case two
>> successive runs already differ at the first linear system solve.
>>
>> It happens when the number of processors used is greater than 2, even on a
>> standard PC.
>>
>> I am solving symmetric MPIBAIJ systems with the Conjugate Residual method,
>> preconditioned with block Jacobi between subdomains and ILU(1) within each block.
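For reference, a sketch of how such a configuration is typically set up in PETSc (current object and option names; ksp, pc and A are illustrative; error checking omitted):

  KSP ksp;
  PC  pc;

  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetOperators(ksp, A, A);    /* A is the assembled MPIBAIJ matrix */
  KSPSetType(ksp, KSPCR);        /* Conjugate Residual */
  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCBJACOBI);      /* block Jacobi across subdomains */
  KSPSetFromOptions(ksp);        /* e.g. -sub_pc_type ilu -sub_pc_factor_levels 1 for ILU(1) on each block */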
>>
>> This system is the result of an FE assembly on an unstructured mesh.
>>
>>
>>
>> I made all the runs using -log_summary and -ksp_truemonitor.
>>
>>
>>
>> Starting with the same initial matrix and RHS, each run using 2.3.3p8
>> provides slightly different results while we obtain exactly the same
>> solution with v2.3.0.
>>
>>
>>
>> With Petsc 2.3.3p8:
>>
>>
>>
>> Run1:   Iteration= 68      residual= 3.19515221e+000       tolerance= 5.13305158e+000 0
>>
>> Run2:   Iteration= 68      residual= 3.19588481e+000       tolerance= 5.13305158e+000 0
>>
>> Run3:   Iteration= 68      residual= 3.19384417e+000       tolerance= 5.13305158e+000 0
>>
>>
>>
>> With Petsc 2.3.0:
>>
>>
>>
>> Run1:   Iteration= 68      residual= 3.19369843e+000       tolerance= 5.13305158e+000 0
>>
>> Run2:   Iteration= 68      residual= 3.19369843e+000       tolerance= 5.13305158e+000 0
>>
>>
>>
>> If I make a 4-processor run with a mesh partitioning such that no node can be
>> shared by more than 2 processors, I do not face the problem.
>
> It is not clear whether you have verified that on different runs, the
> partitioning is
> exactly the same.
>
>  Matt
>
>> I first thought about an MPI issue related to the order in which messages are
>> received and then summed.
>>
>> But wouldn't that have been exactly the same with 2.3.0?
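That suspicion is plausible in general: parallel reductions can add the same floating-point contributions in a different order from run to run, and floating-point addition is not associative, so dot products and norms can differ in their last digits. A tiny standalone C illustration of the effect (values chosen only to make it visible):

  #include <stdio.h>

  int main(void)
  {
      /* Two groupings of the same three terms, mimicking two reduction orders. */
      double a = 1.0e16, b = -1.0e16, c = 1.0;
      printf("(a + b) + c = %.17g\n", (a + b) + c);   /* prints 1 */
      printf("a + (b + c) = %.17g\n", a + (b + c));   /* prints 0 */
      return 0;
  }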
>>
>>
>>
>> Any tips/ideas?
>>
>>
>>
>> Thanks in advance.
>>
>> Best regards,
>>
>>
>>
>> Etienne Perchat
>



-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener



