[petsc-users] no-debug-mode problem

Barry Smith bsmith at mcs.anl.gov
Sat Jun 29 01:18:26 CDT 2013


   These warnings are not the problem, but they are annoying because they make it hard to find the real problems.

   Install PETSc with --download-mpich instead of using the MPI you have been using. PETSc then installs a valgrind-clean MPICH that does not produce these MPI false positives. Then compile your code against that PETSc build, run it with valgrind, and send us the output.

   Barry



On Jun 29, 2013, at 1:10 AM, Longxiang Chen <suifengls at gmail.com> wrote:

> When I don't use -O2 or -O3, both compilers produce the correct answers.
> When I use ifort with -O2/-O3, it produces the wrong answer.
> When I use gfortran-4.7 with -O2/-O3, it cannot even read in the input file.
> When I use gfortran-4.4 with -O2/-O3, it also produces the correct answer.
> I think the problem is in the Fortran code.
> 
> All the errors reported by valgrind are about the use of uninitialised values:
> 
> ==8397== Memcheck, a memory error detector
> ==8397== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
> ==8397== Using Valgrind-3.6.0 and LibVEX; rerun with -h for copyright info
> ==8397== Command: ./eco2n
> ==8397== 
> ==8397== Conditional jump or move depends on uninitialised value(s)
> ==8397==    at 0x11E1F367: __intel_sse2_strcat (in /opt/intel/composer_xe_2013.3.163/compiler/lib/intel64/libirc.so)
> ==8397==    by 0xC728F0: read_configuration_files (in /home/lchen/seq_v2/eco2n)
> ==8397==    by 0xC20DE3: MPID_Init (in /home/lchen/seq_v2/eco2n)
> ==8397==    by 0xBEAC54: MPIR_Init_thread (in /home/lchen/seq_v2/eco2n)
> ==8397==    by 0xBEA443: PMPI_Init (in /home/lchen/seq_v2/eco2n)
> ==8397==    by 0xBB6CAC: mpi_init (in /home/lchen/seq_v2/eco2n)
> ==8397==    by 0x43C724: MAIN__ (t2cg22.f:319)
> ==8397==    by 0x40C2CB: main (in /home/lchen/seq_v2/eco2n)
> ==8397== 
> ==8397== Use of uninitialised value of size 8
> ==8397==    at 0x11E1F3DA: __intel_sse2_strcat (in /opt/intel/composer_xe_2013.3.163/compiler/lib/intel64/libirc.so)
> ==8397==    by 0xC728F0: read_configuration_files (in /home/lchen/seq_v2/eco2n)
> ==8397==    by 0xC20DE3: MPID_Init (in /home/lchen/seq_v2/eco2n)
> ==8397==    by 0xBEAC54: MPIR_Init_thread (in /home/lchen/seq_v2/eco2n)
> ==8397==    by 0xBEA443: PMPI_Init (in /home/lchen/seq_v2/eco2n)
> ==8397==    by 0xBB6CAC: mpi_init (in /home/lchen/seq_v2/eco2n)
> ==8397==    by 0x43C724: MAIN__ (t2cg22.f:319)
> ==8397==    by 0x40C2CB: main (in /home/lchen/seq_v2/eco2n)
> 
> 
> 
> ...
> hundreds of similar errors.
> 
> Best regards,
> Longxiang Chen
> 
> 
> 
> On Fri, Jun 28, 2013 at 11:05 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> 
> On Jun 29, 2013, at 12:24 AM, Longxiang Chen <suifengls at gmail.com> wrote:
> 
> > I find that I cannot add an optimization flag (-O2 or -O3) to the Fortran program if I use gfortran-4.7.2 or ifort to compile.
> 
>    What do you mean? Do you mean to say that if you use the -O2 or -O3 flags with gfortran 4.7.2 or ifort it produces the wrong answer? Does it produce the same wrong answer with both compilers? If you use no optimization with both compilers, does it produce the "correct" answer?
> 
> > When I change the compiler to gfortran-4.4, --with-debugging=0 (adding -O3) now works.
> 
>    But if you use the -O3 flag with gfortran 4.4, it runs and gives the correct answer?
> 
>    Since the optimized build gives incorrect results with two very different Fortran compilers, it is more likely that there is something wrong with your code than with both compilers. You most definitely should run with valgrind and make sure there is no memory corruption in your code.
> 
>    Barry
> 
> >
> > Thank you all.
> >
> > Best regards,
> > Longxiang Chen
> >
> >
> >
> >
> > On Fri, Jun 28, 2013 at 3:25 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> >   Run both debugged and optimized versions with valgrind: http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> >
> >    Barry
> >
> > On Jun 28, 2013, at 3:30 PM, Longxiang Chen <suifengls at gmail.com> wrote:
> >
> > > A is formed by three arrays IA[NZ], JA[NZ] and VA[NZ]
> > > IA[i] is row index, JA[i] is column index and VA[i] is the value in (IA[i], JA[i]).
> > >
> > > For intel compiler:
> > > when I use --with-debugging=0, the VA[] is not correct. I don't know what kind of optimization it does.
> > >
> > > A[0][0] = 1.000000e-25
> > > A[0][1] = 0.000000e+00
> > > A[0][2] = 0.000000e+00
> > > A[1][0] = 0.000000e+00
> > > A[1][1] = -3.479028e+02
> > > A[1][2] = 0.000000e+00
> > > A[2][0] = 0.000000e+00
> > > A[2][1] = 0.000000e+00
> > > A[2][2] = -3.479028e+02
> > > A[3][3] = 1.000000e-25
> > > ...
> > >
> > > CORRECT:
> > > A[0][0] = -2.961372e-07
> > > A[0][1] = 1.160201e+02
> > > A[0][2] = 2.744589e+02
> > > A[1][0] = 0.000000e+00
> > > A[1][1] = -3.479028e+02
> > > A[1][2] = 0.000000e+00
> > > A[2][0] = -8.332708e-08
> > > A[2][1] = 0.000000e+00
> > > A[2][2] = -3.479028e+02
> > > A[3][3] = -3.027917e-07
> > > ...
> > >
> > > For gcc-4.7.2:
> > > > With --with-debugging=0, the Fortran main program cannot read the input data before it starts the loop; ksp_solver() is called inside that loop.
> > > ===>
> > >  ERRONEOUS DATA INITIALIZATION            STOP EXECUTION---------
> > >
> > > Best regards,
> > > Longxiang Chen
> > >
> > > Do something everyday that gets you closer to being done.
> > >
> > >
> > >
> > > On Thu, Jun 27, 2013 at 6:42 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> > >
> > > On Jun 27, 2013, at 2:23 PM, Tabrez Ali <stali at geology.wisc.edu> wrote:
> > >
> > > > Fortran can be tricky.
> > > >
> > > > > Try to run the program in valgrind and/or recheck all your arguments. I once forgot MatAssemblyType in the call to MatAssembly and the code still ran fine on one machine but failed on another. It is better to test with at least 2-3 compilers (GNU, Solaris and Open64 Fortran/C compilers are all free on Linux).
> > > >
> > > > T
> > >
> > >    You can also run both versions with -snes_monitor -ksp_monitor and see if they both start out the same way.
> > >
> > >    Barry
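
Checking whether the two runs "start out the same way" amounts to finding the first iteration at which their residual norms disagree. The following is a minimal sketch in C, assuming the residual norms printed by -ksp_monitor for each build have already been collected into two arrays (parsing the monitor output is not shown); the function name first_divergence and the tolerance rtol are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: given the residual norms of a debug run and an
 * optimized run, return the index of the first iteration at which they
 * differ by more than rtol (relative to the larger magnitude).
 * Returns -1 if the runs agree over their common length. */
long first_divergence(const double *debug_res, size_t n_debug,
                      const double *opt_res, size_t n_opt, double rtol)
{
    size_t n = n_debug < n_opt ? n_debug : n_opt;
    for (size_t k = 0; k < n; ++k) {
        double a = debug_res[k] < 0 ? -debug_res[k] : debug_res[k];
        double b = opt_res[k] < 0 ? -opt_res[k] : opt_res[k];
        double scale = a > b ? a : b;
        double diff = debug_res[k] - opt_res[k];
        if (diff < 0) diff = -diff;
        /* diff != diff is true only for NaN, which always counts as
         * a divergence. */
        if (diff != diff || diff > rtol * scale)
            return (long)k;
    }
    return -1;
}
```

A result of 0 would mean the two builds already disagree on the initial residual, which points at the matrix or right-hand side handed to the solver rather than at the solver itself. A later index would suggest the inputs agree at first and the runs drift apart during the iteration.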
> > >
> > > >
> > > >
> > > > On 27.06.2013 14:52, Longxiang Chen wrote:
> > > >> Dear all,
> > > >>
> > > >> I use ksp_solver to solve Ax=b, where A is from an outer loop of PDE.
> > > > >> In debug mode (the default), it solves the problem in about 4000
> > > >> iterations.
> > > >> And the final answer is correct (comparing to another solver).
> > > >>
> > > >> I use intel compiler.
> > > > >> The program is in Fortran (compiled with mpif90), except the solver,
> > > > >> which is in C (compiled with mpicc).
> > > >>
> > > >> However, when I re-configure with --with-debugging=0 (the only
> > > >> change),
> > > >> the program terminates in about 30 iterations with the wrong final
> > > >> solution.
> > > >>
> > > >> Thank you.
> > > >>
> > > >> Best regards,
> > > >> Longxiang Chen
> > > >>
> > > >> Do something everyday that gets you closer to being done.
> > > >
> > >
> > >
> >
> >
> 
> 


