[petsc-users] The PETSC ERROR: VecMAXPY() when using GMRES
Barry Smith
bsmith at mcs.anl.gov
Wed Mar 11 22:22:14 CDT 2015
Interesting; I was wrong, it has nothing to do with NaN. Either the MPI_Allreduce() in the sum or the one in the max (for the consistency check) is producing ever so slightly different numbers on different processes, thus triggering our error (though it appears no true error has occurred). I have never seen this before. We may have to rethink the consistency checker in the code.
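For anyone wondering how a parallel reduction can differ in the last bits: floating-point addition is not associative, so combining the same partial sums in a different order changes the rounding. A minimal illustration (plain C, not PETSc code; the values are chosen only to make the effect obvious):

#include <stdio.h>

int main(void)
{
  /* the same four numbers, summed in two different orders */
  double a[4] = {1e16, 1.0, -1e16, 1.0};
  double s1 = ((a[0] + a[1]) + a[2]) + a[3];  /* 1e16+1 rounds back to 1e16, so s1 = 1.0 */
  double s2 = ((a[0] + a[2]) + a[1]) + a[3];  /* cancellation happens first, so s2 = 2.0 */
  printf("s1 = %g  s2 = %g  equal: %d\n", s1, s2, s1 == s2);
  return 0;
}

If the reduction combines the per-process partial results in a different order on different ranks, the ranks can end up with values that differ in the last bits, which is exactly what an exact != comparison catches.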
You can force the debugging version of PETSc to not do the consistency checking by editing include/petsc-private/petscimpl.h and locating the lines
#if !defined(PETSC_USE_DEBUG)
#define PetscCheckSameType(a,arga,b,argb) do {} while (0)
and changing it to start with
#if 1
then run make in the PETSc root directory and relink your program. With this change the debug version should not produce the error message you have been seeing.
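For reference, the edited block would then start like this (a sketch based only on the lines quoted above; the surrounding macros in the 3.3 header may differ slightly):

#if 1   /* was: #if !defined(PETSC_USE_DEBUG) */
#define PetscCheckSameType(a,arga,b,argb) do {} while (0)

Since that #if guards the whole group of empty do {} while (0) validation macros, the consistency checks (including PetscValidLogicalCollectiveScalar) then compile away even in a debug build.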
Has anyone else ever seen anything like this?
Barry
> On Mar 11, 2015, at 8:35 PM, Song Gao <song.gao2 at mail.mcgill.ca> wrote:
>
> Thanks for your suggestion. Unfortunately, I'm not allowed to upgrade it.
>
> As for the preconditioner: I tried with -pc_type none and still get the same error message.
>
> Sorry, I was not clear. The code crashes at PetscValidLogicalCollectiveScalar(y,alpha[i],3), and I have seen that across the two processors b2 = {-0.013169008988739605, 0.013169008988739673}. Forgive me if I ask a silly question: is it because we compare two double-precision numbers with the operator '!='? Looking at line 309, (-b2[0] != b2[1]) is of course true for these values, right?
>
> Thanks for your help.
>
> I pasted PetscValidLogicalCollectiveScalar here:
>
> 303 #define PetscValidLogicalCollectiveScalar(a,b,c) \
> 304 do { \
> 305 PetscErrorCode _7_ierr; \
> 306 PetscReal b1[2],b2[2]; \
> 307 b1[0] = -PetscRealPart(b); b1[1] = PetscRealPart(b); \
> 308 _7_ierr = MPI_Allreduce(b1,b2,2,MPIU_REAL,MPIU_MAX,((PetscObject)a)->comm);CHKERRQ(_7_ierr); \
> 309 if (-b2[0] != b2[1]) SETERRQ1(((PetscObject)a)->comm,PETSC_ERR_ARG_WRONG,"Scalar value must be same on all processes, argument # %d",c); \
> 310 } while (0)
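> If I read the macro right: b1[0] = -x and b1[1] = x, so after MPI_MAX we get b2[0] = max(-x_i) = -min(x_i) and b2[1] = max(x_i), and -b2[0] == b2[1] holds exactly when every process has the same x. Here is a minimal standalone MPI program (not PETSc code; the two values are just the ones I printed from gdb) that makes the same check fire:
>
> #include <mpi.h>
> #include <stdio.h>
>
> int main(int argc, char **argv)
> {
>   int    rank;
>   double x, b1[2], b2[2];
>
>   MPI_Init(&argc, &argv);
>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>   /* per-rank values differing only in the last bits */
>   x = rank ? 0.013169008988739673 : 0.013169008988739605;
>   b1[0] = -x; b1[1] = x;  /* same min/max trick as line 307 */
>   MPI_Allreduce(b1, b2, 2, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);
>   if (-b2[0] != b2[1] && !rank) printf("values differ across processes\n");
>   MPI_Finalize();
>   return 0;
> }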
>
>
> Regards,
> Song
>
> 2015-03-11 20:16 GMT-04:00 Barry Smith <bsmith at mcs.anl.gov>:
>
> In the IEEE floating-point standard, NaN is not equal to itself. This is what triggers these kinds of non-intuitive error messages.
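> A short demonstration (plain C, nothing PETSc-specific):
>
> #include <stdio.h>
> #include <math.h>
>
> int main(void)
> {
>   double x = NAN;                   /* quiet NaN from math.h (C99) */
>   printf("x == x: %d\n", x == x);   /* prints 0: NaN is not equal to itself */
>   printf("x != x: %d\n", x != x);   /* prints 1 */
>   return 0;
> }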
>
> My guess is that the preconditioner is producing a NaN; for example, ILU sometimes produces NaNs due to very small pivots.
>
> You are using an old version of PETSc; the first step I recommend is to upgrade (see http://www.mcs.anl.gov/petsc/documentation/changes/index.html). More recent versions of PETSc have more checks for NaN etc. in the code.
>
> Barry
>
>
>
> > On Mar 11, 2015, at 7:05 PM, Song Gao <song.gao2 at mail.mcgill.ca> wrote:
> >
> > Thank you. That's cool. Sorry, I'm not good at gdb.
> >
> > I did that. They are the same. One gave
> > (gdb) p *alpha
> > $1 = 0.013169008988739605
> >
> > and another gave
> > (gdb) p *alpha
> > $1 = 0.013169008988739673
> >
> > 2015-03-11 17:35 GMT-04:00 Matthew Knepley <knepley at gmail.com>:
> > On Wed, Mar 11, 2015 at 3:39 PM, Song Gao <song.gao2 at mail.mcgill.ca> wrote:
> > Thanks.
> >
> > I run with two processes. When the code stops, I'm in raise() and alpha is not in the current context.
> >
> > Here you would use:
> >
> > (gdb) up 4
> >
> > (gdb) p *alpha
> >
> > Matt
> >
> > (gdb) p alpha
> > No symbol "alpha" in current context.
> > (gdb) bt
> > #0 0x0000003764432625 in raise () from /lib64/libc.so.6
> > #1 0x0000003764433e05 in abort () from /lib64/libc.so.6
> > #2 0x00000000015d02f5 in PetscAbortErrorHandler (comm=0x36e17a0, line=1186,
> > fun=0x279cad4 "VecMAXPY", file=0x279c404 "rvector.c",
> > dir=0x279c1c8 "src/vec/vec/interface/", n=62, p=PETSC_ERROR_INITIAL,
> > mess=0x7fff33ffa4c0 "Scalar value must be same on all processes, argument # 3", ctx=0x0) at errabort.c:62
> > #3 0x000000000130cf44 in PetscError (comm=0x36e17a0, line=1186,
> > func=0x279cad4 "VecMAXPY", file=0x279c404 "rvector.c",
> > dir=0x279c1c8 "src/vec/vec/interface/", n=62, p=PETSC_ERROR_INITIAL,
> > mess=0x279c720 "Scalar value must be same on all processes, argument # %d")
> > at err.c:356
> > #4 0x00000000013f8184 in VecMAXPY (y=0x3b35000, nv=20, alpha=0x3b31840,
> > x=0x3b33080) at rvector.c:1186
> > #5 0x0000000001581062 in KSPGMRESBuildSoln (nrs=0x3b31840, vs=0x3ab2090,
> > vdest=0x3ab2090, ksp=0x39a9700, it=19) at gmres.c:345
> >
> >
> > But I set a breakpoint at VecMAXPY and printed out alpha on both processes. For the first few times the breakpoint was hit, I checked the values on both processes and they were the same.
> >
> >
> > (gdb) b VecMAXPY
> > Breakpoint 1 at 0x13f73e0: file rvector.c, line 1174.
> > (gdb) c
> > Continuing.
> > Breakpoint 1, VecMAXPY (y=0x3f2b790, nv=1, alpha=0x3f374e0, x=0x3f1fde0)
> > at rvector.c:1174
> > 1174 PetscFunctionBegin;
> > (gdb) p alpha
> > $1 = (const PetscScalar *) 0x3f374e0
> > (gdb) p *alpha
> > $2 = -0.54285016977140765
> > (gdb)
> >
> > 2015-03-11 15:52 GMT-04:00 Matthew Knepley <knepley at gmail.com>:
> > On Wed, Mar 11, 2015 at 2:33 PM, Song Gao <song.gao2 at mail.mcgill.ca> wrote:
> > Hello,
> >
> > I'm solving the Navier-Stokes equations by the finite element method, using KSP as the linear system solver and running with 2 CPUs. The code runs fine in the non-debug version, but when I switch to the debug version, it gives the following error.
> >
> > I output the matrix and RHS before calling KSPSolve to make sure there are no NaN or Inf values in them. The condition number of the matrix is ~2e4, which seems okay.
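> > A sketch of such a pre-solve check (not my actual code; A and b stand for the assembled matrix and RHS) — any single NaN/Inf entry makes the corresponding norm non-finite:
> >
> > #include <math.h>      /* for isfinite() */
> > #include <petscksp.h>
> >
> > /* sketch: verify the assembled system is free of NaN/Inf before KSPSolve */
> > static PetscErrorCode CheckSystemFinite(Mat A, Vec b)
> > {
> >   PetscErrorCode ierr;
> >   PetscReal      anorm, bnorm;
> >
> >   PetscFunctionBegin;
> >   ierr = MatNorm(A, NORM_FROBENIUS, &anorm);CHKERRQ(ierr);
> >   ierr = VecNorm(b, NORM_2, &bnorm);CHKERRQ(ierr);
> >   if (!isfinite((double)anorm) || !isfinite((double)bnorm))
> >     SETERRQ(PETSC_COMM_WORLD, PETSC_ERR_FP, "NaN or Inf in matrix or RHS");
> >   PetscFunctionReturn(0);
> > }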
> >
> > I also ran the code with valgrind but didn't find any other errors. The valgrind output is attached. Any ideas of what I can do next?
> >
> > Is there any chance you could spawn the debugger with -start_in_debugger and, when you get the error, print out the value of 'alpha' on each process?
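> > For example (a sketch; ./app stands for your executable):
> >
> >   mpiexec -n 2 ./app -start_in_debugger
> >
> > This attaches a debugger to each process in its own xterm; in each one, continue until the error, then go up the stack to the VecMAXPY frame and print *alpha.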
> >
> > Otherwise the best thing to do is output your Mat and RHS in binary and send them so we can try to reproduce.
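> > A sketch of the dump (A, b, and the filename are placeholders):
> >
> > PetscViewer viewer;
> > ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "linearsystem.bin", FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
> > ierr = MatView(A, viewer);CHKERRQ(ierr);  /* matrix first */
> > ierr = VecView(b, viewer);CHKERRQ(ierr);  /* then the RHS, into the same file */
> > ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);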
> >
> > Matt
> >
> > Thanks,
> >
> > Matt
> >
> > Thanks in advance.
> >
> >
> > [0]PETSC ERROR: --------------------- Error Message ------------------------------------
> > [0]PETSC ERROR: Invalid argument!
> > [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3!
> > [0]PETSC ERROR: ------------------------------------------------------------------------
> > [0]PETSC ERROR: Petsc Release Version 3.3.0, Patch 7, Sat May 11 22:15:24 CDT 2013
> > [0]PETSC ERROR: See docs/changes/index.html for recent updates.
> > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> > [0]PETSC ERROR: See docs/index.html for manual pages.
> > [0]PETSC ERROR: ------------------------------------------------------------------------
> > [0]PETSC ERROR: /home/cfd/sgao/mycodes/fensap_new_edge_coefficient/fensapng-mf-newmuscl-overledg_org/bin/fensapMPI_LINUX64_DEBUG on a linux named anakin by sgao Wed Mar 11 15:07:53 2015
> > [0]PETSC ERROR: Libraries linked from /tmp/PETSC33/petsc-3.3-p7/linux/lib
> > [0]PETSC ERROR: Configure run at Wed Jan 15 12:04:54 2014
> > [0]PETSC ERROR: Configure options --with-mpi-dir=/usr/local.linux64/lib64/MPI-openmpi-1.4.5/ --with-shared-libraries=0 --COPTFLAGS=-g --FOPTFLAGS=-g --with-debugging=yes
> > [0]PETSC ERROR: ------------------------------------------------------------------------
> > [0]PETSC ERROR: VecMAXPY() line 1186 in src/vec/vec/interface/rvector.c
> > [0]PETSC ERROR: KSPGMRESBuildSoln() line 345 in src/ksp/ksp/impls/gmres/gmres.c
> > [0]PETSC ERROR: KSPGMRESCycle() line 206 in src/ksp/ksp/impls/gmres/gmres.c
> > [0]PETSC ERROR: KSPSolve_GMRES() line 231 in src/ksp/ksp/impls/gmres/gmres.c
> > [0]PETSC ERROR: KSPSolve() line 446 in src/ksp/ksp/interface/itfunc.c
> >
> >
> >
> >
> > --
> > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> > -- Norbert Wiener
> >
> >
> >
> >
> > --
> > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> > -- Norbert Wiener
> >
>
>