bug in tfs preconditioner

Barry Smith bsmith at mcs.anl.gov
Wed Oct 7 11:33:57 CDT 2009


On Oct 7, 2009, at 9:34 AM, Stephan Kramer wrote:

> I think I found a nasty bug in the tfs preconditioner/solver. The
> first symptoms were MPI complaints about corrupted messages in
> MPI_WAIT on line 1552 of ksp/pc/impls/tfs/gs.c (the file is the same
> in petsc-dev and 3.0.0). Valgrind suggested a buffer overrun (in
> reading) in the MPI_Isend on line 1533:
>
>
>  ierr = MPI_Isend(dptr3, *msg_size++, MPIU_SCALAR, *list++,
>                   MSGTAG1+my_id, gs->gs_comm, msg_ids_out++);CHKERRQ(ierr);
>
> Stepping through with a debugger, however, it looked like everything
> going into the MPI_Isends and MPI_Irecvs was perfectly fine, until I
> realised that both calls were replaced by macros from petsclog.h:
>
> #define MPI_Isend(buf,count,datatype,dest,tag,comm,request) \
> ((isend_ct++,0) || TypeSize(&isend_len,count,datatype) || \
> MPI_Isend(buf,count,datatype,dest,tag,comm,request))
>
> Because count appears twice in that expansion, the argument
> *msg_size++ is evaluated twice: the first evaluation passes the right
> integer value to TypeSize, but the actual MPI_Isend call then receives
> the next entry of the array. The same thing happens on line 1327 of
> gs.c, by the way. If someone has time to look into this, it would be
> much appreciated. Has the tfs solver been used much, as far as people
> know?
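The double evaluation described above can be reproduced without MPI. The following is a minimal sketch, not PETSc code: `type_size`, `fake_isend`, `LOGGED_ISEND`, `send_buggy` and `send_fixed` are hypothetical stand-ins that only mimic the shape of the petsclog.h macro. It also shows one common way to avoid the problem, which is to hoist the increment out of the macro argument; this is an assumption about how such a bug is typically fixed, not a description of the fix that was actually pushed.

```c
/* Hypothetical stand-ins; none of these names come from PETSc or MPI. */
static int logged_len = 0; /* plays the role of isend_len */
static int sent_count = 0; /* the count the "real" send received */

/* Mimics TypeSize: accumulates the logged length, returns 0 on success. */
static int type_size(int *len, int count) { *len += count; return 0; }

/* Mimics the real MPI_Isend: just records the count it was given. */
static int fake_isend(int count) { sent_count = count; return 0; }

/* Same shape as the logging macro: count appears twice in the expansion. */
#define LOGGED_ISEND(count) \
  (type_size(&logged_len, (count)) || fake_isend(count))

/* Buggy pattern: an argument with a side effect is expanded twice, so
   the logger sees sizes[0] but the send sees sizes[1]. */
int send_buggy(const int *sizes)
{
  const int *msg_size = sizes;
  logged_len = 0;
  sent_count = 0;
  (void)LOGGED_ISEND(*msg_size++);
  return sent_count;
}

/* One possible remedy: evaluate the side effect once, then pass a
   plain variable to the macro. */
int send_fixed(const int *sizes)
{
  const int *msg_size = sizes;
  int count = *msg_size++;
  logged_len = 0;
  sent_count = 0;
  (void)LOGGED_ISEND(count);
  return sent_count;
}
```

With `int sizes[2] = {3, 7};`, `send_buggy(sizes)` returns 7 (the second entry), while `send_fixed(sizes)` returns 3. In both cases the logged length is 3, which is why the logging side looks correct while the send itself reads the wrong count.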

   Thanks for finding and reporting this bug.

   I have pushed a fix into petsc-3.0.0 and petsc-dev (it will be in
the next 3.0.0 patch); please let us know if it does not resolve the
problem.

   There is seemingly very little use of tfs.

   Barry



