[petsc-users] Memory leak related to MatSetValue

Barry Smith bsmith at petsc.dev
Mon Jun 9 20:56:18 CDT 2025


src/mat/utils/matstash.c

In MatStashBlockTypeSetUp()

    PetscCallMPI(MPI_Type_create_resized(stype, 0, stash->blocktype_size, &stash->blocktype));

In MatStashScatterDestroy_BTS(MatStash *stash)

  if (stash->blocktype != MPI_DATATYPE_NULL) PetscCallMPI(MPI_Type_free(&stash->blocktype));

So either 

1) PETSc logic is preventing the correct MPI_Type_free() call from being made, or 

2) a bug has crept into MPICH that prevents the MPI_Type_free() from freeing everything it needs to.

You can use a debugger (or even print statements inserted in the PETSc source) to confirm that, in your very simple code, MPI_Type_create_resized() is called exactly once and that the matching MPI_Type_free() is also called; that will tell you whether the problem is in the PETSc logic or the MPICH logic.
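
For example, a print-statement sketch (placement is approximate, and PetscGlobalRank is just PETSc's global rank variable; adjust to the matstash.c you actually have):

  /* in MatStashBlockTypeSetUp(), just before the MPI_Type_create_resized() shown above */
  fprintf(stderr, "[rank %d] creating stash blocktype\n", PetscGlobalRank);

  /* in MatStashScatterDestroy_BTS(), just before the MPI_Type_free() shown above */
  fprintf(stderr, "[rank %d] freeing stash blocktype\n", PetscGlobalRank);

Every "creating" line should be matched by a "freeing" line on the same rank; a missing "freeing" line points at the PETSc logic, while a matched pair that still leaks points at MPICH.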

Since we test PETSc in our CI with Valgrind, it is unlikely to be a PETSc bug.
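
If you want to take PETSc out of the picture entirely, a tiny standalone MPI program that runs through the same general datatype lifecycle (struct type -> resized type -> commit -> free) can be run under Valgrind with your MPICH build alone. This is only a sketch; the block layout and names below are made up for illustration and only approximate what the stash builds:

  #include <mpi.h>
  #include <stddef.h>

  typedef struct {
    int    row, col;
    double val;
  } Block;

  int main(int argc, char **argv)
  {
    MPI_Datatype stype, blocktype;
    int          lens[2]   = {2, 1};
    MPI_Aint     displs[2] = {offsetof(Block, row), offsetof(Block, val)};
    MPI_Datatype types[2]  = {MPI_INT, MPI_DOUBLE};

    MPI_Init(&argc, &argv);
    MPI_Type_create_struct(2, lens, displs, types, &stype);
    MPI_Type_create_resized(stype, 0, (MPI_Aint)sizeof(Block), &blocktype);
    MPI_Type_commit(&blocktype);
    MPI_Type_free(&stype);     /* free the intermediate struct type */
    MPI_Type_free(&blocktype); /* free the resized, committed type  */
    MPI_Finalize();
    return 0;
  }

If this program by itself leaks blocks out of MPIR_Datatype_set_contents under the same MPICH, the problem is on the MPICH side and worth reporting there; if it is clean, the suspicion shifts back to whether PETSc ever reaches the MPI_Type_free() above.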

Barry


> On Jun 9, 2025, at 7:47 PM, neil liu <liufield at gmail.com> wrote:
> 
> Dear Petsc community, 
> Recently, I encountered a memory leak while using Valgrind (3.25.1) with MPI (2 processes, MPICH 4.21.1) to test a PETSc-based code, which is a variant of one of PETSc's built-in examples. 
> 
> #include <petscksp.h>
> static char help[] = "Demonstrate PCFIELDSPLIT after MatZeroRowsColumns() inside PCREDISTRIBUTE";
> int main(int argc, char **argv)
> {
>   PetscMPIInt rank, size;
>   Mat         A;
> 
>   PetscCall(PetscInitialize(&argc, &argv, NULL, help));
>   PetscCallMPI(MPI_Comm_size(PETSC_COMM_WORLD, &size));
>   PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));
>   PetscCheck(size == 2, PETSC_COMM_WORLD, PETSC_ERR_WRONG_MPI_SIZE, "Must be run with 2 MPI processes");
> 
>   // Set up a small problem with 2 dofs on rank 0 and 4 on rank 1
>   PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
>   PetscCall(MatSetSizes(A, !rank ? 2 : 4, !rank ? 2 : 4, PETSC_DETERMINE, PETSC_DETERMINE));
>   PetscCall(MatSetFromOptions(A));
>   if (rank == 0) {
>     PetscCall(MatSetValue(A, 0, 0, 2.0, ADD_VALUES));
>     PetscCall(MatSetValue(A, 0, 1, -1.0, ADD_VALUES));
>     PetscCall(MatSetValue(A, 1, 1, 3.0, ADD_VALUES));
>     PetscCall(MatSetValue(A, 1, 2, -1.0, ADD_VALUES));
>   } else if (rank == 1) {
>     PetscCall(MatSetValue(A, 1, 2, 40.0, ADD_VALUES));//Additional line added
>     PetscCall(MatSetValue(A, 2, 2, 4.0, ADD_VALUES));
>     PetscCall(MatSetValue(A, 2, 3, -1.0, ADD_VALUES));
>     PetscCall(MatSetValue(A, 3, 3, 5.0, ADD_VALUES));
>     PetscCall(MatSetValue(A, 3, 4, -1.0, ADD_VALUES));
>     PetscCall(MatSetValue(A, 4, 4, 6.0, ADD_VALUES));
>     PetscCall(MatSetValue(A, 4, 5, -1.0, ADD_VALUES));
>     PetscCall(MatSetValue(A, 5, 5, 7.0, ADD_VALUES));
>     PetscCall(MatSetValue(A, 5, 4, -0.5, ADD_VALUES));
>   }
>   PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
>   PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
>   PetscCall(MatView(A, PETSC_VIEWER_STDOUT_WORLD));
> 
>   PetscCall(MatDestroy(&A));
>   PetscCall(PetscFinalize());
>   return 0;
> }
> 
> Ranks 0 and 1 own 2 (rows 0 to 1) and 4 (rows 2 to 5) local rows, respectively. 
> I tried to add 
>     PetscCall(MatSetValue(A, 1, 2, 40.0, ADD_VALUES));
> for rank 1. // Row 1 is owned by rank 0 only, but is now also modified on rank 1. 
> After adding this line, a memory leak occurred. Does this imply that we cannot assign values to entries owned by other processors? In my case, I am assembling a global matrix from a DMPlex. With overlap=0, it seems necessary to use MatSetValues for rows owned by other processes. I'm not certain whether these two scenarios are equivalent, but they both appear to trigger the same memory leak.
> Did I miss something? 
> 
> Thanks a lot,
> 
> Xiaodong 
> 
> ==3932339== 96 bytes in 1 blocks are definitely lost in loss record 2 of 3
> ==3932339==    at 0x4C392E1: malloc (vg_replace_malloc.c:446)
> ==3932339==    by 0xA267A97: MPL_malloc (mpl_trmem.h:373)
> ==3932339==    by 0xA267CE4: MPIR_Datatype_set_contents (mpir_datatype.h:420)
> ==3932339==    by 0xA26E26E: MPIR_Type_create_struct_impl (type_create.c:919)
> ==3932339==    by 0xA068F45: internal_Type_create_struct (c_binding.c:36491)
> ==3932339==    by 0xA06911F: PMPI_Type_create_struct (c_binding.c:36551)
> ==3932339==    by 0x4E78222: PMPI_Type_create_struct (libmpiwrap.c:2752)
> ==3932339==    by 0x6F5C7D6: MatStashBlockTypeSetUp (matstash.c:772)
> ==3932339==    by 0x6F61162: MatStashScatterBegin_BTS (matstash.c:838)
> ==3932339==    by 0x6F54511: MatStashScatterBegin_Private (matstash.c:437)
> ==3932339==    by 0x60BA9BD: MatAssemblyBegin_MPI_Hash (mpihashmat.h:59)
> ==3932339==    by 0x6E8D3AE: MatAssemblyBegin (matrix.c:5749)
> ==3932339== 
> ==3932339== 96 bytes in 1 blocks are definitely lost in loss record 3 of 3
> ==3932339==    at 0x4C392E1: malloc (vg_replace_malloc.c:446)
> ==3932339==    by 0xA2385FF: MPL_malloc (mpl_trmem.h:373)
> ==3932339==    by 0xA2395F8: MPII_Dataloop_alloc_and_copy (dataloop.c:400)
> ==3932339==    by 0xA2393DC: MPII_Dataloop_alloc (dataloop.c:319)
> ==3932339==    by 0xA23B239: MPIR_Dataloop_create_contiguous (dataloop_create_contig.c:56)
> ==3932339==    by 0xA23BFF9: MPIR_Dataloop_create_indexed (dataloop_create_indexed.c:89)
> ==3932339==    by 0xA23D5DC: create_basic_all_bytes_struct (dataloop_create_struct.c:252)
> ==3932339==    by 0xA23D178: MPIR_Dataloop_create_struct (dataloop_create_struct.c:146)
> ==3932339==    by 0xA25D4DB: MPIR_Typerep_commit (typerep_dataloop_commit.c:284)
> ==3932339==    by 0xA261549: MPIR_Type_commit_impl (datatype_impl.c:185)
> ==3932339==    by 0xA0624CA: internal_Type_commit (c_binding.c:34506)
> ==3932339==    by 0xA062679: PMPI_Type_commit (c_binding.c:34553)
> 
> 
