[petsc-users] Petsc has generated inconsistent data!

Dominik Szczerba dominik at itis.ethz.ch
Fri Aug 26 16:58:25 CDT 2011


OK I got it: initial guess.
Was zero, why not, but perturbing it a bit off zero seems to work. I
saw this already before with my own bicgstab implementation, and
always thought this was some bug. So you either have a similar one, or
its a bicgstab's feature :)

Dominik

On Fri, Aug 26, 2011 at 11:51 PM, Dominik Szczerba <dominik at itis.ethz.ch> wrote:
> Valgrind excerpts below, none seems related, and both are there also with gmres.
>
> I just cant believe bicgstab would quit on such a trivial problem...
>
> Regards,
> Dominik
>
> ==10423== Syscall param writev(vector[...]) points to uninitialised byte(s)
> ==10423==    at 0x6B5E789: writev (writev.c:56)
> ==10423==    by 0x515658: MPIDU_Sock_writev (sock_immed.i:610)
> ==10423==    by 0x517943: MPIDI_CH3_iSendv (ch3_isendv.c:84)
> ==10423==    by 0x4FB756: MPIDI_CH3_EagerContigIsend (ch3u_eager.c:509)
> ==10423==    by 0x4FD6E4: MPID_Isend (mpid_isend.c:118)
> ==10423==    by 0x4E502A: MPIC_Isend (helper_fns.c:210)
> ==10423==    by 0xED701D: MPIR_Alltoall (alltoall.c:420)
> ==10423==    by 0xED7880: PMPI_Alltoall (alltoall.c:685)
> ==10423==    by 0xE4BC95: SetUp__ (setup.c:122)
> ==10423==    by 0xE4C4E4: PartitionSmallGraph__ (weird.c:39)
> ==10423==    by 0xE497D0: ParMETIS_V3_PartKway (kmetis.c:131)
> ==10423==    by 0x71B774: MatPartitioningApply_Parmetis (pmetis.c:97)
> ==10423==    by 0x717D70: MatPartitioningApply (partition.c:236)
> ==10423==    by 0x5287D7: PetscSolver::LoadMesh(std::string const&)
> (PetscSolver.cxx:676)
> ==10423==    by 0x4C6157: CD3T10_USER::ProcessInputFile()
> (cd3t10mpi_main.cxx:321)
> ==10423==    by 0x4C3B57: main (cd3t10mpi_main.cxx:568)
> ==10423==  Address 0x71ce9b4 is 4 bytes inside a block of size 72 alloc'd
> ==10423==    at 0x4C28FAC: malloc (vg_replace_malloc.c:236)
> ==10423==    by 0xE5C792: GKmalloc__ (util.c:151)
> ==10423==    by 0xE57C09: PreAllocateMemory__ (memory.c:38)
> ==10423==    by 0xE496B3: ParMETIS_V3_PartKway (kmetis.c:116)
> ==10423==    by 0x71B774: MatPartitioningApply_Parmetis (pmetis.c:97)
> ==10423==    by 0x717D70: MatPartitioningApply (partition.c:236)
> ==10423==    by 0x5287D7: PetscSolver::LoadMesh(std::string const&)
> (PetscSolver.cxx:676)
> ==10423==    by 0x4C6157: CD3T10_USER::ProcessInputFile()
> (cd3t10mpi_main.cxx:321)
> ==10423==    by 0x4C3B57: main (cd3t10mpi_main.cxx:568)
> ==10423==
> ==10423== Conditional jump or move depends on uninitialised value(s)
> ==10423==    at 0x55B6510: inflateReset2 (in
> /lib/x86_64-linux-gnu/libz.so.1.2.3.4)
> ==10423==    by 0x55B6605: inflateInit2_ (in
> /lib/x86_64-linux-gnu/libz.so.1.2.3.4)
> ==10423==    by 0x5308C13: H5Z_filter_deflate (H5Zdeflate.c:110)
> ==10423==    by 0x5308170: H5Z_pipeline (H5Z.c:1103)
> ==10423==    by 0x518BB69: H5D_chunk_lock (H5Dchunk.c:2758)
> ==10423==    by 0x518CB20: H5D_chunk_read (H5Dchunk.c:1728)
> ==10423==    by 0x519BDDA: H5D_read (H5Dio.c:447)
> ==10423==    by 0x519C248: H5Dread (H5Dio.c:173)
> ==10423==    by 0x4CEB99: HDF5::HDF5Reader::readData(std::string
> const&) (HDF5Reader.cxx:634)
> ==10423==    by 0x4CE305: HDF5::HDF5Reader::read(std::vector<int,
> std::allocator<int> >&, std::string const&) (HDF5Reader.cxx:527)
> ==10423==    by 0x4C71A3: CD3T10_USER::SetupConstraints()
> (cd3t10mpi_main.cxx:404)
> ==10423==    by 0x4BA02A: CD3T10::Solve() (CD3T10mpi.cxx:658)
> ==10423==    by 0x4C3BE1: main (cd3t10mpi_main.cxx:590)
> ==10423==
>
> On Fri, Aug 26, 2011 at 11:31 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>
>>  First run with valgrind http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind to make sure it does not have some "code bug cause".
>>
>>   Do you get the same message on one process.
>>
>>   My previous message still holds, if it is not a "code bug" then bi-CG-stab is just breaking down on that matrix and preconditioner combination.
>>
>>   Barry
>>
>> On Aug 26, 2011, at 4:27 PM, Dominik Szczerba wrote:
>>
>>> Later in the message he only requested that I use "-ksp_norm_type
>>> unpreconditioned". So I did, and the error comes back, now fully
>>> documented below. As I wrote, it works fine with gmres, and the
>>> problem is very simple, diagonal dominant steady state diffusion.
>>>
>>> Any hints are highly appreciated.
>>>
>>> Dominik
>>>
>>> #PETSc Option Table entries:
>>> -ksp_converged_reason
>>> -ksp_converged_use_initial_residual_norm
>>> -ksp_monitor_true_residual
>>> -ksp_norm_type unpreconditioned
>>> -ksp_rtol 1e-3
>>> -ksp_type bcgs
>>> -ksp_view
>>> -log_summary
>>> -pc_type jacobi
>>> #End of PETSc Option Table entries
>>>  0 KSP preconditioned resid norm 1.166190378969e+01 true resid norm
>>> 1.166190378969e+01 ||Ae||/||Ax|| 1.000000000000e+00
>>>  1 KSP preconditioned resid norm 5.658835826231e-01 true resid norm
>>> 5.658835826231e-01 ||Ae||/||Ax|| 4.852411688762e-02
>>> [0]PETSC ERROR: --------------------- Error Message
>>> ------------------------------------
>>> [0]PETSC ERROR: Petsc has generated inconsistent data!
>>> [0]PETSC ERROR: Divide by zero!
>>> [0]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [2]PETSC ERROR: --------------------- Error Message
>>> ------------------------------------
>>> [2]PETSC ERROR: Petsc has generated inconsistent data!
>>> [2]PETSC ERROR: Divide by zero!
>>> [2]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [2]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
>>> 13:37:48 CDT 2011
>>> [2]PETSC ERROR: See docs/changes/index.html for recent updates.
>>> [2]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>>> [2]PETSC ERROR: See docs/index.html for manual pages.
>>> [2]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [2]PETSC ERROR: /home/dsz/build/framework-debug/trunk/bin/cd3t10mpi on
>>> a linux-gnu named tharsis by dsz Fri Aug 26 23:23:47 2011
>>> [2]PETSC ERROR: Libraries linked from
>>> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
>>> [2]PETSC ERROR: Configure run at Mon Jul 25 14:20:10 2011
>>> [2]PETSC ERROR: Configure options
>>> PETSC_DIR=/home/dsz/pack/petsc-3.1-p8 PETSC_ARCH=linux-gnu-c-debug
>>> --download-f-blas-lapack=CD3T10::SaveSolution()
>>> [0]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
>>> 13:37:48 CDT 2011
>>> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
>>> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>>> [0]PETSC ERROR: See docs/index.html for manual pages.
>>> [0]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [0]PETSC ERROR: /home/dsz/build/framework-debug/trunk/bin/cd3t10mpi on
>>> a linux-gnu named tharsis by dsz Fri Aug 26 23:23:47 2011
>>> [0]PETSC ERROR: Libraries linked from
>>> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
>>> [0]PETSC ERROR: Configure run at Mon Jul 25 14:20:10 2011
>>> [0]PETSC ERROR: Configure options
>>> PETSC_DIR=/home/dsz/pack/petsc-3.1-p8 PETSC_ARCH=linux-gnu-c-debug
>>> --download-f-blas-lapack=1 --download-mpich=1 --download-hypre=1
>>> --with-parmetis=1 --download-parmetis=1 --with-x=0 --with-debugging=1
>>> [0]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [0]PETSC ERROR: KSPSolve_BCGS() line 75 in
>>> src/ksp/ksp/impls/bcgs/[1]PETSC ERROR: --------------------- Error
>>> Message ------------------------------------
>>> [1]PETSC ERROR: Petsc has generated inconsistent data!
>>> [1]PETSC ERROR: Divide by zero!
>>> [1]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [1]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
>>> 13:37:48 CDT 2011
>>> [1]PETSC ERROR: See docs/changes/index.html for recent updates.
>>> [1]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>>> [1]PETSC ERROR: See docs/index.html for manual pages.
>>> [1]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [1]PETSC ERROR: /home/dsz/build/framework-debug/trunk/bin/cd3t10mpi on
>>> a linux-gnu named tharsis by dsz Fri Aug 26 23:23:47 2011
>>> [1]PETSC ERROR: Libraries linked from
>>> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
>>> [1]PETSC ERROR: Configure run at Mon Jul 25 14:20:10 2011
>>> [1]PETSC ERROR: Configure options
>>> PETSC_DIR=/home/dsz/pack/petsc-3.1-p8 PETSC_ARCH=linux-gnu-c-debug
>>> --download-f-blas-lapack=1 --download-mpich=1 --download-hypre=1
>>> --with-parmetis=1 --download-parmetis=1 --with-x=0 --with-debugging=1
>>> [2]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [2]PETSC ERROR: KSPSolve_BCGS() line 75 in src/ksp/ksp/impls/bcgs/bcgs.c
>>> [2]PETSC ERROR: KSPSolve() line 396 in src/ksp/ksp/interface/itfunc.c
>>> [2]PETSC ERROR: User provided function() line 1215 in
>>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/PetscSolver.cxx
>>> [3]PETSC ERROR: --------------------- Error Message
>>> ------------------------------------
>>> [3]PETSC ERROR: Petsc has generated inconsistent data!
>>> [3]PETSC ERROR: Divide by zero!
>>> [3]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [3]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
>>> 13:37:48 CDT 2011
>>> [3]PETSC ERROR: See docs/changes/index.html for recent updates.
>>> [3]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>>> [3]PETSC ERROR: See docs/index.html for manual pages.
>>> [3]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [3]PETSC ERROR: /home/dsz/build/framework-debug/trunk/bin/cd3t10mpi on
>>> a linux-gnu named tharsis by dsz Fri Aug 26 23:23:47 2011
>>> [3]PETSC ERROR: Libraries linked from
>>> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
>>> [3]PETSC ERROR: Configure run at Mon Jul 25 14:20:10 2011
>>> [3]PETSC ERROR: Configure options
>>> PETSC_DIR=/home/dsz/pack/petsc-3.1-p8 PETSC_ARCH=linux-gnu-c-debug
>>> --download-f-blas-lapack=bcgs.c
>>> [0]PETSC ERROR: KSPSolve() line 396 in src/ksp/ksp/interface/itfunc.c
>>> [0]PETSC ERROR: User provided function() line 1215 in
>>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/PetscSolver.cxx
>>> 1 --download-mpich=1 --download-hypre=1 --with-parmetis=1
>>> --download-parmetis=1 --with-x=0 --with-debugging=1
>>> [1]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [1]PETSC ERROR: KSPSolve_BCGS() line 75 in src/ksp/ksp/impls/bcgs/bcgs.c
>>> [1]PETSC ERROR: KSPSolve() line 396 in src/ksp/ksp/interface/itfunc.c
>>> [1]PETSC ERROR: User provided function() line 1215 in
>>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/PetscSolver.cxx
>>> 1 --download-mpich=1 --download-hypre=1 --with-parmetis=1
>>> --download-parmetis=1 --with-x=0 --with-debugging=1
>>> [3]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [3]PETSC ERROR: KSPSolve_BCGS() line 75 in src/ksp/ksp/impls/bcgs/bcgs.c
>>> [3]PETSC ERROR: KSPSolve() line 396 in src/ksp/ksp/interface/itfunc.c
>>> [3]PETSC ERROR: User provided function() line 1215 in
>>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/PetscSolver.cxx
>>> PetscSolver::Finalize()
>>> PetscSolver::FinalizePetsc()
>>>
>>>
>>> On Fri, Aug 26, 2011 at 11:17 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>>
>>>>  Are you sure that is the entire error message. It should print the routine and the line number where this happens.
>>>>
>>>>   Likely it is at
>>>>
>>>>  do {
>>>>    ierr = VecDot(R,RP,&rho);CHKERRQ(ierr);       /*   rho <- (r,rp)      */
>>>>    beta = (rho/rhoold) * (alpha/omegaold);
>>>>    ierr = VecAXPBYPCZ(P,1.0,-omegaold*beta,beta,R,V);CHKERRQ(ierr);  /* p <- r - omega * beta* v + beta * p */
>>>>    ierr = KSP_PCApplyBAorAB(ksp,P,V,T);CHKERRQ(ierr);  /*   v <- K p           */
>>>>    ierr = VecDot(V,RP,&d1);CHKERRQ(ierr);
>>>>    if (d1 == 0.0) SETERRQ(((PetscObject)ksp)->comm,PETSC_ERR_PLIB,"Divide by zero");
>>>>    alpha = rho / d1;                 /*   a <- rho / (v,rp)  */
>>>>
>>>>  Which means bi-cg-stab has broken down. You'll need to consult references to Bi-CG-stab to see why this might happen (it can happen while GMRES is happy). It may be KSPBCGSL can proceed past this point with a problem values for
>>>>   Options Database Keys:
>>>> +  -ksp_bcgsl_ell <ell> Number of Krylov search directions
>>>> -  -ksp_bcgsl_cxpol Use a convex function of the MR and OR polynomials after the BiCG step
>>>> -  -ksp_bcgsl_xres <res> Threshold used to decide when to refresh computed residuals
>>>>
>>>> but most likely the preconditioner or matrix is bogus in some way since I think Bi-CG-stab rarely breaks down in practice.
>>>>
>>>>
>>>>  Barry
>>>>
>>>>
>>>>
>>>> On Aug 26, 2011, at 4:05 PM, Dominik Szczerba wrote:
>>>>
>>>>> When solving my linear system with -ksp_type bcgs I get:
>>>>>
>>>>>  0 KSP preconditioned resid norm 1.166190378969e+01 true resid norm
>>>>> 1.166190378969e+01 ||Ae||/||Ax|| 1.000000000000e+00
>>>>>  1 KSP preconditioned resid norm 5.658835826231e-01 true resid norm
>>>>> 5.658835826231e-01 ||Ae||/||Ax|| 4.852411688762e-02
>>>>> [1]PETSC ERROR: --------------------- Error Message
>>>>> ------------------------------------
>>>>> [1]PETSC ERROR: Petsc has generated inconsistent data!
>>>>> [1]PETSC ERROR: Divide by zero!
>>>>> [1]PETSC ERROR:
>>>>> ------------------------------------------------------------------------
>>>>> [1]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
>>>>> 13:37:48 CDT 2011
>>>>> [1]PETSC ERROR: See docs/changes/index.html for recent updates.
>>>>> [1]PETSC ERROR: [3]PETSC ERROR: --------------------- Error Message
>>>>> ------------------------------------
>>>>> [3]PETSC ERROR: Petsc has generated inconsistent data!
>>>>> [3]PETSC ERROR: Divide by zero!
>>>>>
>>>>> PETSc Option Table entries:
>>>>> -ksp_converged_reason
>>>>> -ksp_monitor
>>>>> -ksp_monitor_true_residual
>>>>> -ksp_norm_type unpreconditioned
>>>>> -ksp_right_pc
>>>>> -ksp_rtol 1e-3
>>>>> -ksp_type bcgs
>>>>> -ksp_view
>>>>> -log_summary
>>>>> -pc_type jacobi
>>>>>
>>>>> When solving the same system with GMRES all works fine. This is a
>>>>> simple test diffusion problem. How can I find out what the problem is?
>>>>>
>>>>> Dominik
>>>>
>>>>
>>
>>
>


More information about the petsc-users mailing list