[petsc-users] Petsc has generated inconsistent data!

Barry Smith bsmith at mcs.anl.gov
Fri Aug 26 16:56:27 CDT 2011


On Aug 26, 2011, at 4:51 PM, Dominik Szczerba wrote:

> Valgrind excerpts below, none seems related, and both are there also with gmres.
> 
> I just cant believe bicgstab would quit on such a trivial problem...

   The bicgstab is used extensively so it is unlikely it has a trivial bug in it or it would have come up before.  Fortunately it stops during the first iteration so you can just run the program with the option -start_in_debugger  then use cont in the debugger then when it errors out you can look at the numerical values and use VecView(vector,0) to look at the various vectors (run on one process, of course,) and look at the matrix to see why it is breaking down. 

   Barry

> 
> Regards,
> Dominik
> 
> ==10423== Syscall param writev(vector[...]) points to uninitialised byte(s)
> ==10423==    at 0x6B5E789: writev (writev.c:56)
> ==10423==    by 0x515658: MPIDU_Sock_writev (sock_immed.i:610)
> ==10423==    by 0x517943: MPIDI_CH3_iSendv (ch3_isendv.c:84)
> ==10423==    by 0x4FB756: MPIDI_CH3_EagerContigIsend (ch3u_eager.c:509)
> ==10423==    by 0x4FD6E4: MPID_Isend (mpid_isend.c:118)
> ==10423==    by 0x4E502A: MPIC_Isend (helper_fns.c:210)
> ==10423==    by 0xED701D: MPIR_Alltoall (alltoall.c:420)
> ==10423==    by 0xED7880: PMPI_Alltoall (alltoall.c:685)
> ==10423==    by 0xE4BC95: SetUp__ (setup.c:122)
> ==10423==    by 0xE4C4E4: PartitionSmallGraph__ (weird.c:39)
> ==10423==    by 0xE497D0: ParMETIS_V3_PartKway (kmetis.c:131)
> ==10423==    by 0x71B774: MatPartitioningApply_Parmetis (pmetis.c:97)
> ==10423==    by 0x717D70: MatPartitioningApply (partition.c:236)
> ==10423==    by 0x5287D7: PetscSolver::LoadMesh(std::string const&)
> (PetscSolver.cxx:676)
> ==10423==    by 0x4C6157: CD3T10_USER::ProcessInputFile()
> (cd3t10mpi_main.cxx:321)
> ==10423==    by 0x4C3B57: main (cd3t10mpi_main.cxx:568)
> ==10423==  Address 0x71ce9b4 is 4 bytes inside a block of size 72 alloc'd
> ==10423==    at 0x4C28FAC: malloc (vg_replace_malloc.c:236)
> ==10423==    by 0xE5C792: GKmalloc__ (util.c:151)
> ==10423==    by 0xE57C09: PreAllocateMemory__ (memory.c:38)
> ==10423==    by 0xE496B3: ParMETIS_V3_PartKway (kmetis.c:116)
> ==10423==    by 0x71B774: MatPartitioningApply_Parmetis (pmetis.c:97)
> ==10423==    by 0x717D70: MatPartitioningApply (partition.c:236)
> ==10423==    by 0x5287D7: PetscSolver::LoadMesh(std::string const&)
> (PetscSolver.cxx:676)
> ==10423==    by 0x4C6157: CD3T10_USER::ProcessInputFile()
> (cd3t10mpi_main.cxx:321)
> ==10423==    by 0x4C3B57: main (cd3t10mpi_main.cxx:568)
> ==10423==
> ==10423== Conditional jump or move depends on uninitialised value(s)
> ==10423==    at 0x55B6510: inflateReset2 (in
> /lib/x86_64-linux-gnu/libz.so.1.2.3.4)
> ==10423==    by 0x55B6605: inflateInit2_ (in
> /lib/x86_64-linux-gnu/libz.so.1.2.3.4)
> ==10423==    by 0x5308C13: H5Z_filter_deflate (H5Zdeflate.c:110)
> ==10423==    by 0x5308170: H5Z_pipeline (H5Z.c:1103)
> ==10423==    by 0x518BB69: H5D_chunk_lock (H5Dchunk.c:2758)
> ==10423==    by 0x518CB20: H5D_chunk_read (H5Dchunk.c:1728)
> ==10423==    by 0x519BDDA: H5D_read (H5Dio.c:447)
> ==10423==    by 0x519C248: H5Dread (H5Dio.c:173)
> ==10423==    by 0x4CEB99: HDF5::HDF5Reader::readData(std::string
> const&) (HDF5Reader.cxx:634)
> ==10423==    by 0x4CE305: HDF5::HDF5Reader::read(std::vector<int,
> std::allocator<int> >&, std::string const&) (HDF5Reader.cxx:527)
> ==10423==    by 0x4C71A3: CD3T10_USER::SetupConstraints()
> (cd3t10mpi_main.cxx:404)
> ==10423==    by 0x4BA02A: CD3T10::Solve() (CD3T10mpi.cxx:658)
> ==10423==    by 0x4C3BE1: main (cd3t10mpi_main.cxx:590)
> ==10423==
> 
> On Fri, Aug 26, 2011 at 11:31 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>> 
>>  First run with valgrind http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind to make sure it does not have some "code bug cause".
>> 
>>   Do you get the same message on one process.
>> 
>>   My previous message still holds, if it is not a "code bug" then bi-CG-stab is just breaking down on that matrix and preconditioner combination.
>> 
>>   Barry
>> 
>> On Aug 26, 2011, at 4:27 PM, Dominik Szczerba wrote:
>> 
>>> Later in the message he only requested that I use "-ksp_norm_type
>>> unpreconditioned". So I did, and the error comes back, now fully
>>> documented below. As I wrote, it works fine with gmres, and the
>>> problem is very simple, diagonal dominant steady state diffusion.
>>> 
>>> Any hints are highly appreciated.
>>> 
>>> Dominik
>>> 
>>> #PETSc Option Table entries:
>>> -ksp_converged_reason
>>> -ksp_converged_use_initial_residual_norm
>>> -ksp_monitor_true_residual
>>> -ksp_norm_type unpreconditioned
>>> -ksp_rtol 1e-3
>>> -ksp_type bcgs
>>> -ksp_view
>>> -log_summary
>>> -pc_type jacobi
>>> #End of PETSc Option Table entries
>>>  0 KSP preconditioned resid norm 1.166190378969e+01 true resid norm
>>> 1.166190378969e+01 ||Ae||/||Ax|| 1.000000000000e+00
>>>  1 KSP preconditioned resid norm 5.658835826231e-01 true resid norm
>>> 5.658835826231e-01 ||Ae||/||Ax|| 4.852411688762e-02
>>> [0]PETSC ERROR: --------------------- Error Message
>>> ------------------------------------
>>> [0]PETSC ERROR: Petsc has generated inconsistent data!
>>> [0]PETSC ERROR: Divide by zero!
>>> [0]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [2]PETSC ERROR: --------------------- Error Message
>>> ------------------------------------
>>> [2]PETSC ERROR: Petsc has generated inconsistent data!
>>> [2]PETSC ERROR: Divide by zero!
>>> [2]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [2]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
>>> 13:37:48 CDT 2011
>>> [2]PETSC ERROR: See docs/changes/index.html for recent updates.
>>> [2]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>>> [2]PETSC ERROR: See docs/index.html for manual pages.
>>> [2]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [2]PETSC ERROR: /home/dsz/build/framework-debug/trunk/bin/cd3t10mpi on
>>> a linux-gnu named tharsis by dsz Fri Aug 26 23:23:47 2011
>>> [2]PETSC ERROR: Libraries linked from
>>> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
>>> [2]PETSC ERROR: Configure run at Mon Jul 25 14:20:10 2011
>>> [2]PETSC ERROR: Configure options
>>> PETSC_DIR=/home/dsz/pack/petsc-3.1-p8 PETSC_ARCH=linux-gnu-c-debug
>>> --download-f-blas-lapack=CD3T10::SaveSolution()
>>> [0]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
>>> 13:37:48 CDT 2011
>>> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
>>> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>>> [0]PETSC ERROR: See docs/index.html for manual pages.
>>> [0]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [0]PETSC ERROR: /home/dsz/build/framework-debug/trunk/bin/cd3t10mpi on
>>> a linux-gnu named tharsis by dsz Fri Aug 26 23:23:47 2011
>>> [0]PETSC ERROR: Libraries linked from
>>> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
>>> [0]PETSC ERROR: Configure run at Mon Jul 25 14:20:10 2011
>>> [0]PETSC ERROR: Configure options
>>> PETSC_DIR=/home/dsz/pack/petsc-3.1-p8 PETSC_ARCH=linux-gnu-c-debug
>>> --download-f-blas-lapack=1 --download-mpich=1 --download-hypre=1
>>> --with-parmetis=1 --download-parmetis=1 --with-x=0 --with-debugging=1
>>> [0]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [0]PETSC ERROR: KSPSolve_BCGS() line 75 in
>>> src/ksp/ksp/impls/bcgs/[1]PETSC ERROR: --------------------- Error
>>> Message ------------------------------------
>>> [1]PETSC ERROR: Petsc has generated inconsistent data!
>>> [1]PETSC ERROR: Divide by zero!
>>> [1]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [1]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
>>> 13:37:48 CDT 2011
>>> [1]PETSC ERROR: See docs/changes/index.html for recent updates.
>>> [1]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>>> [1]PETSC ERROR: See docs/index.html for manual pages.
>>> [1]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [1]PETSC ERROR: /home/dsz/build/framework-debug/trunk/bin/cd3t10mpi on
>>> a linux-gnu named tharsis by dsz Fri Aug 26 23:23:47 2011
>>> [1]PETSC ERROR: Libraries linked from
>>> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
>>> [1]PETSC ERROR: Configure run at Mon Jul 25 14:20:10 2011
>>> [1]PETSC ERROR: Configure options
>>> PETSC_DIR=/home/dsz/pack/petsc-3.1-p8 PETSC_ARCH=linux-gnu-c-debug
>>> --download-f-blas-lapack=1 --download-mpich=1 --download-hypre=1
>>> --with-parmetis=1 --download-parmetis=1 --with-x=0 --with-debugging=1
>>> [2]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [2]PETSC ERROR: KSPSolve_BCGS() line 75 in src/ksp/ksp/impls/bcgs/bcgs.c
>>> [2]PETSC ERROR: KSPSolve() line 396 in src/ksp/ksp/interface/itfunc.c
>>> [2]PETSC ERROR: User provided function() line 1215 in
>>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/PetscSolver.cxx
>>> [3]PETSC ERROR: --------------------- Error Message
>>> ------------------------------------
>>> [3]PETSC ERROR: Petsc has generated inconsistent data!
>>> [3]PETSC ERROR: Divide by zero!
>>> [3]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [3]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
>>> 13:37:48 CDT 2011
>>> [3]PETSC ERROR: See docs/changes/index.html for recent updates.
>>> [3]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>>> [3]PETSC ERROR: See docs/index.html for manual pages.
>>> [3]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [3]PETSC ERROR: /home/dsz/build/framework-debug/trunk/bin/cd3t10mpi on
>>> a linux-gnu named tharsis by dsz Fri Aug 26 23:23:47 2011
>>> [3]PETSC ERROR: Libraries linked from
>>> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
>>> [3]PETSC ERROR: Configure run at Mon Jul 25 14:20:10 2011
>>> [3]PETSC ERROR: Configure options
>>> PETSC_DIR=/home/dsz/pack/petsc-3.1-p8 PETSC_ARCH=linux-gnu-c-debug
>>> --download-f-blas-lapack=bcgs.c
>>> [0]PETSC ERROR: KSPSolve() line 396 in src/ksp/ksp/interface/itfunc.c
>>> [0]PETSC ERROR: User provided function() line 1215 in
>>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/PetscSolver.cxx
>>> 1 --download-mpich=1 --download-hypre=1 --with-parmetis=1
>>> --download-parmetis=1 --with-x=0 --with-debugging=1
>>> [1]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [1]PETSC ERROR: KSPSolve_BCGS() line 75 in src/ksp/ksp/impls/bcgs/bcgs.c
>>> [1]PETSC ERROR: KSPSolve() line 396 in src/ksp/ksp/interface/itfunc.c
>>> [1]PETSC ERROR: User provided function() line 1215 in
>>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/PetscSolver.cxx
>>> 1 --download-mpich=1 --download-hypre=1 --with-parmetis=1
>>> --download-parmetis=1 --with-x=0 --with-debugging=1
>>> [3]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [3]PETSC ERROR: KSPSolve_BCGS() line 75 in src/ksp/ksp/impls/bcgs/bcgs.c
>>> [3]PETSC ERROR: KSPSolve() line 396 in src/ksp/ksp/interface/itfunc.c
>>> [3]PETSC ERROR: User provided function() line 1215 in
>>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/PetscSolver.cxx
>>> PetscSolver::Finalize()
>>> PetscSolver::FinalizePetsc()
>>> 
>>> 
>>> On Fri, Aug 26, 2011 at 11:17 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>> 
>>>>  Are you sure that is the entire error message. It should print the routine and the line number where this happens.
>>>> 
>>>>   Likely it is at
>>>> 
>>>>  do {
>>>>    ierr = VecDot(R,RP,&rho);CHKERRQ(ierr);       /*   rho <- (r,rp)      */
>>>>    beta = (rho/rhoold) * (alpha/omegaold);
>>>>    ierr = VecAXPBYPCZ(P,1.0,-omegaold*beta,beta,R,V);CHKERRQ(ierr);  /* p <- r - omega * beta* v + beta * p */
>>>>    ierr = KSP_PCApplyBAorAB(ksp,P,V,T);CHKERRQ(ierr);  /*   v <- K p           */
>>>>    ierr = VecDot(V,RP,&d1);CHKERRQ(ierr);
>>>>    if (d1 == 0.0) SETERRQ(((PetscObject)ksp)->comm,PETSC_ERR_PLIB,"Divide by zero");
>>>>    alpha = rho / d1;                 /*   a <- rho / (v,rp)  */
>>>> 
>>>>  Which means bi-cg-stab has broken down. You'll need to consult references to Bi-CG-stab to see why this might happen (it can happen while GMRES is happy). It may be KSPBCGSL can proceed past this point with a problem values for
>>>>   Options Database Keys:
>>>> +  -ksp_bcgsl_ell <ell> Number of Krylov search directions
>>>> -  -ksp_bcgsl_cxpol Use a convex function of the MR and OR polynomials after the BiCG step
>>>> -  -ksp_bcgsl_xres <res> Threshold used to decide when to refresh computed residuals
>>>> 
>>>> but most likely the preconditioner or matrix is bogus in some way since I think Bi-CG-stab rarely breaks down in practice.
>>>> 
>>>> 
>>>>  Barry
>>>> 
>>>> 
>>>> 
>>>> On Aug 26, 2011, at 4:05 PM, Dominik Szczerba wrote:
>>>> 
>>>>> When solving my linear system with -ksp_type bcgs I get:
>>>>> 
>>>>>  0 KSP preconditioned resid norm 1.166190378969e+01 true resid norm
>>>>> 1.166190378969e+01 ||Ae||/||Ax|| 1.000000000000e+00
>>>>>  1 KSP preconditioned resid norm 5.658835826231e-01 true resid norm
>>>>> 5.658835826231e-01 ||Ae||/||Ax|| 4.852411688762e-02
>>>>> [1]PETSC ERROR: --------------------- Error Message
>>>>> ------------------------------------
>>>>> [1]PETSC ERROR: Petsc has generated inconsistent data!
>>>>> [1]PETSC ERROR: Divide by zero!
>>>>> [1]PETSC ERROR:
>>>>> ------------------------------------------------------------------------
>>>>> [1]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
>>>>> 13:37:48 CDT 2011
>>>>> [1]PETSC ERROR: See docs/changes/index.html for recent updates.
>>>>> [1]PETSC ERROR: [3]PETSC ERROR: --------------------- Error Message
>>>>> ------------------------------------
>>>>> [3]PETSC ERROR: Petsc has generated inconsistent data!
>>>>> [3]PETSC ERROR: Divide by zero!
>>>>> 
>>>>> PETSc Option Table entries:
>>>>> -ksp_converged_reason
>>>>> -ksp_monitor
>>>>> -ksp_monitor_true_residual
>>>>> -ksp_norm_type unpreconditioned
>>>>> -ksp_right_pc
>>>>> -ksp_rtol 1e-3
>>>>> -ksp_type bcgs
>>>>> -ksp_view
>>>>> -log_summary
>>>>> -pc_type jacobi
>>>>> 
>>>>> When solving the same system with GMRES all works fine. This is a
>>>>> simple test diffusion problem. How can I find out what the problem is?
>>>>> 
>>>>> Dominik
>>>> 
>>>> 
>> 
>> 



More information about the petsc-users mailing list