[petsc-users] Valgrind Issue With Ghosted Vectors

Stefano Zampini stefano.zampini at gmail.com
Thu Mar 21 04:11:47 CDT 2019


Derek,

Can you run with --track-origins=yes? There are a few possible sources of the
uninitialized-value warning in the code below (the candidates are the arrays
starts and indices, and the variables bs and n), and this valgrind option will
help narrow them down.

PetscErrorCode VecScatterMemcpyPlanCreate_Index(PetscInt n,const PetscInt *starts,const PetscInt *indices,PetscInt bs,VecScatterMemcpyPlan *plan)
{
  PetscErrorCode ierr;
  PetscInt       i,j,k,my_copies,n_copies=0,step;
  PetscBool      strided,has_strided;

  PetscFunctionBegin;
  ierr    = PetscMemzero(plan,sizeof(VecScatterMemcpyPlan));CHKERRQ(ierr);
  plan->n = n;
  ierr    = PetscMalloc2(n,&plan->optimized,n+1,&plan->copy_offsets);CHKERRQ(ierr);

  /* check if each remote part of the scatter is made of copies, and count total_copies */
  for (i=0; i<n; i++) { /* for each target processor procs[i] */
    my_copies = 1; /* count num. of copies for procs[i] */
    for (j=starts[i]; j<starts[i+1]-1; j++) { /* go through indices from the first to the second to last */
      if (indices[j]+bs != indices[j+1]) my_copies++;
    }
    if (bs*(starts[i+1]-starts[i])*sizeof(PetscScalar)/my_copies >= 256) { /* worth using memcpy? */
      plan->optimized[i] = PETSC_TRUE;
      n_copies += my_copies;
    } else {
      plan->optimized[i] = PETSC_FALSE;
    }
  }

  /* do malloc with the recently known n_copies */

-> THIS IS THE VALGRIND WARNING
  ierr = PetscMalloc2(n_copies,&plan->copy_starts,n_copies,&plan->copy_lengths);CHKERRQ(ierr);
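
To make the dependency easier to see, here is a minimal standalone sketch of
the counting loop above, using plain int instead of PetscInt. The name
count_copies is hypothetical and is not a PETSc function. It shows that the
result for target i depends on starts[i], starts[i+1], bs, and every entries
of indices in that range, so if any of those is uninitialized, n_copies is
tainted and valgrind would be expected to complain at the first branch that
uses it, which here is inside the malloc call.

```c
/* Hypothetical helper, not part of PETSc: counts how many contiguous
   runs (one memcpy each) the indices destined for target i form. */
static int count_copies(const int *starts, const int *indices, int bs, int i)
{
  int j, copies = 1; /* a non-empty range needs at least one copy */

  /* walk the indices of target i from the first to the second to last */
  for (j = starts[i]; j < starts[i+1]-1; j++) {
    /* a gap between consecutive blocks of size bs starts a new copy */
    if (indices[j] + bs != indices[j+1]) copies++;
  }
  return copies;
}
```

For example, with starts = {0, 4, 7}, indices = {0, 1, 2, 3, 10, 12, 13} and
bs = 1, the first target needs one copy and the second needs two.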

On Thu, Mar 21, 2019 at 09:00, Stefano Zampini <
stefano.zampini at gmail.com> wrote:

> Derek
>
> I fixed the optimized plan a few weeks ago
>
>
> https://bitbucket.org/petsc/petsc/commits/c3caad8634d376283f7053f3b388606b45b3122c
>
> Maybe this will fix your problem too?
>
> Stefano
>
>
> On Thu, Mar 21, 2019, 04:21 Zhang, Junchao via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
>> Hi, Derek,
>>   Try to apply this tiny (but dirty) patch on your version of PETSc to
>> disable the VecScatterMemcpyPlan optimization to see if it helps.
>>   Thanks.
>> --Junchao Zhang
>>
>> On Wed, Mar 20, 2019 at 6:33 PM Junchao Zhang <jczhang at mcs.anl.gov>
>> wrote:
>>
>>> Did you see the warning with small scale runs?  Is it possible to
>>> provide a test code?
>>> You mentioned "changing PETSc now would be pretty painful". Is it
>>> because it will affect your performance (but not your code)?  If yes, could
>>> you try PETSc master and run your code with or without -vecscatter_type sf?
>>> I want to isolate the problem and see if it is due to possible bugs in
>>> VecScatter.
>>> If the above suggestion is not feasible, I will disable
>>> VecScatterMemcpy. It is an optimization I added. Sorry I did not have an
>>> option to turn it off because I thought it was always useful:)  I will
>>> provide you a patch later to disable it. With that you can run again to
>>> isolate possible bugs in VecScatterMemcpy.
>>> Thanks.
>>> --Junchao Zhang
>>>
>>>
>>> On Wed, Mar 20, 2019 at 5:40 PM Derek Gaston via petsc-users <
>>> petsc-users at mcs.anl.gov> wrote:
>>>
>>>> Trying to track down some memory corruption I'm seeing on larger scale
>>>> runs (3.5B+ unknowns).  Was able to run Valgrind on it... and I'm seeing
>>>> quite a lot of uninitialized value errors coming from ghost updating.  Here
>>>> are some of the traces:
>>>>
>>>> ==87695== Conditional jump or move depends on uninitialised value(s)
>>>> ==87695==    at 0x73236D3: PetscMallocAlign (mal.c:28)
>>>> ==87695==    by 0x7323C70: PetscMallocA (mal.c:390)
>>>> ==87695==    by 0x739048E: VecScatterMemcpyPlanCreate_Index (vscat.c:284)
>>>> ==87695==    by 0x73A5D97: VecScatterMemcpyPlanCreate_PtoP (vpscat_mpi1.c:312)
>>>> ==64730==    by 0x7393E8A: VecScatterSetUp_vectype_private (vscat.c:857)
>>>> ==64730==    by 0x7395E5D: VecScatterSetUp_MPI1 (vpscat_mpi1.c:2543)
>>>> ==64730==    by 0x73DDD39: VecScatterSetUp (vscatfce.c:212)
>>>> ==64730==    by 0x73DCD73: VecScatterCreateWithData (vscreate.c:333)
>>>> ==64730==    by 0x7444232: VecCreateGhostWithArray (pbvec.c:685)
>>>> ==64730==    by 0x744490D: VecCreateGhost (pbvec.c:741)
>>>>
>>>> ==133582== Conditional jump or move depends on uninitialised value(s)
>>>> ==133582==    at 0x4030384: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1034)
>>>> ==133582==    by 0x739E4F9: PetscMemcpy (petscsys.h:1649)
>>>> ==133582==    by 0x739E4F9: VecScatterMemcpyPlanExecute_Pack (vecscatterimpl.h:150)
>>>> ==133582==    by 0x739E4F9: VecScatterBeginMPI1_1 (vpscat_mpi1.h:69)
>>>> ==133582==    by 0x73DD964: VecScatterBegin (vscatfce.c:110)
>>>> ==133582==    by 0x744E195: VecGhostUpdateBegin (commonmpvec.c:225)
>>>>
>>>> This is from a Git checkout of PETSc... the hash I branched from is:
>>>> 0e667e8fea4aa from December 23rd (updating would be really hard at this
>>>> point as I've completed 90% of my dissertation with this version... and
>>>> changing PETSc now would be pretty painful!).
>>>>
>>>> Any ideas?  Is it possible it's in my code?  Is it possible that there
>>>> are later PETSc commits that already fix this?
>>>>
>>>> Thanks for any help,
>>>> Derek
>>>>
>>>>

-- 
Stefano