[petsc-users] Valgrind Errors
Barry Smith
bsmith at mcs.anl.gov
Fri Sep 12 15:11:48 CDT 2014
James (and Hong),
Do you ever see this problem in parallel runs?
You are not doing anything wrong.
Here is what is happening.
MatGetBrowsOfAoCols_MPIAIJ(), which is used by MatMatMult_MPIAIJ_MPIAIJ(), assumes that the VecScatters for the matrix-vector products are of the form
gen_to = (VecScatter_MPI_General*)ctx->todata;
gen_from = (VecScatter_MPI_General*)ctx->fromdata;
but when run on one process the scatters are not of that form, so the code reads fields from what it thinks is a VecScatter_MPI_General struct but is actually a different one. Those reads land past the end of the allocated block, hence the valgrind errors.
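To illustrate the kind of mismatch described above, here is a minimal standalone sketch. The struct names, their fields, and the helper read_is_out_of_bounds() are hypothetical stand-ins chosen for illustration; they are not the actual PETSc definitions. The sketch only computes whether a field read through the larger struct type would fall outside an allocation of a given size, without performing the invalid read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the real PETSc structs: a small layout for a
   one-process scatter and a larger layout for a parallel scatter. */
typedef struct {
  int  n;        /* number of entries */
  int *indices;  /* local indices */
} SeqScatterData;

typedef struct {
  int  n;
  int *indices;
  int  nprocs;   /* number of neighbor processes */
  int *procs;    /* neighbor ranks */
  int *starts;   /* per-neighbor offsets into indices */
} GeneralScatterData;

/* Returns 1 if reading the `starts` field through a GeneralScatterData
   pointer would touch bytes beyond an allocation of alloc_size bytes,
   and 0 otherwise. No memory is actually read. */
int read_is_out_of_bounds(size_t alloc_size)
{
  size_t end_of_field;

  assert(alloc_size > 0); /* an empty allocation makes no sense here */
  end_of_field = offsetof(GeneralScatterData, starts) + sizeof(int *);
  return end_of_field > alloc_size;
}
```

Calling read_is_out_of_bounds(sizeof(SeqScatterData)) reports that the read would fall outside the smaller allocation, while read_is_out_of_bounds(sizeof(GeneralScatterData)) does not. This is the same pattern as casting a pointer to the wrong struct type and then accessing one of its later fields.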
But since the matrix lives on only one process, there is nothing to move between processes, so no error occurs in the computation itself. You can avoid the issue completely by using MATAIJ as the matrix type instead of MATMPIAIJ; on one process it then automatically uses MATSEQAIJ.
I don’t think the bug has anything in particular to do with the MatTranspose.
Hong,
Can you please fix this code? Essentially you can bypass parts of the code when the Mat is on only one process. (Maybe this also happens for MPIBAIJ matrices?) Send a response letting me know you saw this.
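As a rough sketch of what such a bypass could look like (not the actual fix): the function checks the number of processes first and returns before the scatter data is ever cast or dereferenced. GeneralData, nrecvs, and count_remote_messages() are made-up names for illustration and are not PETSc identifiers; in real code the process count would come from the matrix's communicator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the parallel scatter layout (not a PETSc struct). */
typedef struct {
  int nrecvs; /* number of messages this process expects to receive */
} GeneralData;

/* Returns how many messages must be received. On a single process there is
   nothing to exchange, so fromdata is never cast or dereferenced. */
int count_remote_messages(int comm_size, const void *fromdata)
{
  if (comm_size == 1) return 0; /* bypass: the layout may differ here */
  assert(fromdata != NULL);     /* the parallel path needs real data */
  return ((const GeneralData *)fromdata)->nrecvs;
}
```

With this guard, count_remote_messages(1, NULL) returns 0 without touching the pointer, and the cast only happens on the parallel path where the layout assumption holds.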
Thanks
Barry
On Sep 12, 2014, at 1:39 PM, James Balasalle <James.Balasalle at digitalglobe.com> wrote:
> Hello,
>
> I’m getting some valgrind errors in my PETSc code that look like they’re related to MatTranspose(). I just figured I was doing something wrong. But I ran one of the examples (snes/ex70), which uses MatTranspose(), through valgrind and see the same errors there as well. It seems that when the result of a MatTranspose() is used as input to a MatMatMult() call, valgrind is unhappy.
>
> Here’s the valgrind output. I’m not concerned with the first MPI uninitialized error. But that invalid read of size 8 in mpiaij.c looks a bit concerning.
>
> I’m probably doing something wrong. Any ideas?
>
> Thanks,
>
> James
>
>
> bash-4.1$ valgrind ./ex70
> ==21117== Memcheck, a memory error detector
> ==21117== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
> ==21117== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
> ==21117== Command: ./ex70
> ==21117==
> ==21117== Syscall param writev(vector[...]) points to uninitialised byte(s)
> ==21117== at 0x39898E0B2B: writev (in /lib64/libc-2.12.so)
> ==21117== by 0x8996F16: mca_oob_tcp_msg_send_handler (oob_tcp_msg.c:249)
> ==21117== by 0x8997F3C: mca_oob_tcp_peer_send (oob_tcp_peer.c:204)
> ==21117== by 0x899A2DC: mca_oob_tcp_send_nb (oob_tcp_send.c:167)
> ==21117== by 0x8388955: orte_rml_oob_send (rml_oob_send.c:136)
> ==21117== by 0x8388B9F: orte_rml_oob_send_buffer (rml_oob_send.c:270)
> ==21117== by 0x8DA4F97: modex (grpcomm_bad_module.c:573)
> ==21117== by 0x6E31E6A: ompi_mpi_init (ompi_mpi_init.c:541)
> ==21117== by 0x6E4860F: PMPI_Init_thread (pinit_thread.c:84)
> ==21117== by 0x4DAA379: PetscInitialize (pinit.c:781)
> ==21117== by 0x409E29: main (ex70.c:668)
> ==21117== Address 0x9c7e261 is 161 bytes inside a block of size 256 alloc'd
> ==21117== at 0x4A06C9C: realloc (vg_replace_malloc.c:687)
> ==21117== by 0x6EB7FF2: opal_dss_buffer_extend (dss_internal_functions.c:63)
> ==21117== by 0x6EB81B4: opal_dss_copy_payload (dss_load_unload.c:164)
> ==21117== by 0x6E90C36: orte_grpcomm_base_pack_modex_entries (grpcomm_base_modex.c:861)
> ==21117== by 0x8DA4F4C: modex (grpcomm_bad_module.c:563)
> ==21117== by 0x6E31E6A: ompi_mpi_init (ompi_mpi_init.c:541)
> ==21117== by 0x6E4860F: PMPI_Init_thread (pinit_thread.c:84)
> ==21117== by 0x4DAA379: PetscInitialize (pinit.c:781)
> ==21117== by 0x409E29: main (ex70.c:668)
> ==21117==
> ==21117== Invalid read of size 8
> ==21117== at 0x553E504: MatGetBrowsOfAoCols_MPIAIJ (mpiaij.c:5220)
> ==21117== by 0x557D53D: MatMatMultSymbolic_MPIAIJ_MPIAIJ (mpimatmatmult.c:677)
> ==21117== by 0x55758FC: MatMatMult_MPIAIJ_MPIAIJ (mpimatmatmult.c:33)
> ==21117== by 0x5601808: MatMatMult (matrix.c:8714)
> ==21117== by 0x4067D0: StokesSetupApproxSchur (ex70.c:379)
> ==21117== by 0x406DB5: StokesSetupMatrix (ex70.c:399)
> ==21117== by 0x40A0D3: main (ex70.c:679)
> ==21117== Address 0x9cdd420 is 0 bytes after a block of size 48 alloc'd
> ==21117== at 0x4A055DC: memalign (vg_replace_malloc.c:755)
> ==21117== by 0x4D42117: PetscMallocAlign (mal.c:27)
> ==21117== by 0x5016A8A: VecScatterCreate (vscat.c:1168)
> ==21117== by 0x5547B10: MatSetUpMultiply_MPIAIJ (mmaij.c:116)
> ==21117== by 0x5509F30: MatAssemblyEnd_MPIAIJ (mpiaij.c:702)
> ==21117== by 0x55D978A: MatAssemblyEnd (matrix.c:4901)
> ==21117== by 0x551D7AD: MatTranspose_MPIAIJ (mpiaij.c:2024)
> ==21117== by 0x55D394A: MatTranspose (matrix.c:4382)
> ==21117== by 0x405CE4: StokesSetupMatBlock10 (ex70.c:337)
> ==21117== by 0x406C60: StokesSetupMatrix (ex70.c:396)
> ==21117== by 0x40A0D3: main (ex70.c:679)
> ==21117==
> ==21117== Invalid read of size 8
> ==21117== at 0x553E516: MatGetBrowsOfAoCols_MPIAIJ (mpiaij.c:5221)
> ==21117== by 0x557D53D: MatMatMultSymbolic_MPIAIJ_MPIAIJ (mpimatmatmult.c:677)
> ==21117== by 0x55758FC: MatMatMult_MPIAIJ_MPIAIJ (mpimatmatmult.c:33)
> ==21117== by 0x5601808: MatMatMult (matrix.c:8714)
> ==21117== by 0x4067D0: StokesSetupApproxSchur (ex70.c:379)
> ==21117== by 0x406DB5: StokesSetupMatrix (ex70.c:399)
> ==21117== by 0x40A0D3: main (ex70.c:679)
> ==21117== Address 0x9cdd3d0 is not stack'd, malloc'd or (recently) free'd
> ==21117==
> ==21117== Invalid read of size 8
> ==21117== at 0x553E64D: MatGetBrowsOfAoCols_MPIAIJ (mpiaij.c:5226)
> ==21117== by 0x557D53D: MatMatMultSymbolic_MPIAIJ_MPIAIJ (mpimatmatmult.c:677)
> ==21117== by 0x55758FC: MatMatMult_MPIAIJ_MPIAIJ (mpimatmatmult.c:33)
> ==21117== by 0x5601808: MatMatMult (matrix.c:8714)
> ==21117== by 0x4067D0: StokesSetupApproxSchur (ex70.c:379)
> ==21117== by 0x406DB5: StokesSetupMatrix (ex70.c:399)
> ==21117== by 0x40A0D3: main (ex70.c:679)
> ==21117== Address 0x9cdd3b0 is 0 bytes after a block of size 16 alloc'd
> ==21117== at 0x4A055DC: memalign (vg_replace_malloc.c:755)
> ==21117== by 0x4D42117: PetscMallocAlign (mal.c:27)
> ==21117== by 0x5016A58: VecScatterCreate (vscat.c:1168)
> ==21117== by 0x5547B10: MatSetUpMultiply_MPIAIJ (mmaij.c:116)
> ==21117== by 0x5509F30: MatAssemblyEnd_MPIAIJ (mpiaij.c:702)
> ==21117== by 0x55D978A: MatAssemblyEnd (matrix.c:4901)
> ==21117== by 0x551D7AD: MatTranspose_MPIAIJ (mpiaij.c:2024)
> ==21117== by 0x55D394A: MatTranspose (matrix.c:4382)
> ==21117== by 0x405CE4: StokesSetupMatBlock10 (ex70.c:337)
> ==21117== by 0x406C60: StokesSetupMatrix (ex70.c:396)
> ==21117== by 0x40A0D3: main (ex70.c:679)
> ==21117==
> ==21117== Invalid read of size 8
> ==21117== at 0x553E66B: MatGetBrowsOfAoCols_MPIAIJ (mpiaij.c:5228)
> ==21117== by 0x557D53D: MatMatMultSymbolic_MPIAIJ_MPIAIJ (mpimatmatmult.c:677)
> ==21117== by 0x55758FC: MatMatMult_MPIAIJ_MPIAIJ (mpimatmatmult.c:33)
> ==21117== by 0x5601808: MatMatMult (matrix.c:8714)
> ==21117== by 0x4067D0: StokesSetupApproxSchur (ex70.c:379)
> ==21117== by 0x406DB5: StokesSetupMatrix (ex70.c:399)
> ==21117== by 0x40A0D3: main (ex70.c:679)
> ==21117== Address 0x9cdd3b8 is 8 bytes after a block of size 16 alloc'd
> ==21117== at 0x4A055DC: memalign (vg_replace_malloc.c:755)
> ==21117== by 0x4D42117: PetscMallocAlign (mal.c:27)
> ==21117== by 0x5016A58: VecScatterCreate (vscat.c:1168)
> ==21117== by 0x5547B10: MatSetUpMultiply_MPIAIJ (mmaij.c:116)
> ==21117== by 0x5509F30: MatAssemblyEnd_MPIAIJ (mpiaij.c:702)
> ==21117== by 0x55D978A: MatAssemblyEnd (matrix.c:4901)
> ==21117== by 0x551D7AD: MatTranspose_MPIAIJ (mpiaij.c:2024)
> ==21117== by 0x55D394A: MatTranspose (matrix.c:4382)
> ==21117== by 0x405CE4: StokesSetupMatBlock10 (ex70.c:337)
> ==21117== by 0x406C60: StokesSetupMatrix (ex70.c:396)
> ==21117== by 0x40A0D3: main (ex70.c:679)
> ==21117==
> ==21117== Invalid read of size 8
> ==21117== at 0x553E504: MatGetBrowsOfAoCols_MPIAIJ (mpiaij.c:5220)
> ==21117== by 0x557C680: MatMatMultNumeric_MPIAIJ_MPIAIJ_Scalable (mpimatmatmult.c:560)
> ==21117== by 0x5575BBE: MatMatMult_MPIAIJ_MPIAIJ (mpimatmatmult.c:39)
> ==21117== by 0x5601808: MatMatMult (matrix.c:8714)
> ==21117== by 0x4067D0: StokesSetupApproxSchur (ex70.c:379)
> ==21117== by 0x406DB5: StokesSetupMatrix (ex70.c:399)
> ==21117== by 0x40A0D3: main (ex70.c:679)
> ==21117== Address 0x9cdd420 is 0 bytes after a block of size 48 alloc'd
> ==21117== at 0x4A055DC: memalign (vg_replace_malloc.c:755)
> ==21117== by 0x4D42117: PetscMallocAlign (mal.c:27)
> ==21117== by 0x5016A8A: VecScatterCreate (vscat.c:1168)
> ==21117== by 0x5547B10: MatSetUpMultiply_MPIAIJ (mmaij.c:116)
> ==21117== by 0x5509F30: MatAssemblyEnd_MPIAIJ (mpiaij.c:702)
> ==21117== by 0x55D978A: MatAssemblyEnd (matrix.c:4901)
> ==21117== by 0x551D7AD: MatTranspose_MPIAIJ (mpiaij.c:2024)
> ==21117== by 0x55D394A: MatTranspose (matrix.c:4382)
> ==21117== by 0x405CE4: StokesSetupMatBlock10 (ex70.c:337)
> ==21117== by 0x406C60: StokesSetupMatrix (ex70.c:396)
> ==21117== by 0x40A0D3: main (ex70.c:679)
> ==21117==
> ==21117== Invalid read of size 8
> ==21117== at 0x553E516: MatGetBrowsOfAoCols_MPIAIJ (mpiaij.c:5221)
> ==21117== by 0x557C680: MatMatMultNumeric_MPIAIJ_MPIAIJ_Scalable (mpimatmatmult.c:560)
> ==21117== by 0x5575BBE: MatMatMult_MPIAIJ_MPIAIJ (mpimatmatmult.c:39)
> ==21117== by 0x5601808: MatMatMult (matrix.c:8714)
> ==21117== by 0x4067D0: StokesSetupApproxSchur (ex70.c:379)
> ==21117== by 0x406DB5: StokesSetupMatrix (ex70.c:399)
> ==21117== by 0x40A0D3: main (ex70.c:679)
> ==21117== Address 0x9cdd3d0 is not stack'd, malloc'd or (recently) free'd
> ==21117==
> ==21117== Invalid read of size 8
> ==21117== at 0x553E64D: MatGetBrowsOfAoCols_MPIAIJ (mpiaij.c:5226)
> ==21117== by 0x557C680: MatMatMultNumeric_MPIAIJ_MPIAIJ_Scalable (mpimatmatmult.c:560)
> ==21117== by 0x5575BBE: MatMatMult_MPIAIJ_MPIAIJ (mpimatmatmult.c:39)
> ==21117== by 0x5601808: MatMatMult (matrix.c:8714)
> ==21117== by 0x4067D0: StokesSetupApproxSchur (ex70.c:379)
> ==21117== by 0x406DB5: StokesSetupMatrix (ex70.c:399)
> ==21117== by 0x40A0D3: main (ex70.c:679)
> ==21117== Address 0x9cdd3b0 is 0 bytes after a block of size 16 alloc'd
> ==21117== at 0x4A055DC: memalign (vg_replace_malloc.c:755)
> ==21117== by 0x4D42117: PetscMallocAlign (mal.c:27)
> ==21117== by 0x5016A58: VecScatterCreate (vscat.c:1168)
> ==21117== by 0x5547B10: MatSetUpMultiply_MPIAIJ (mmaij.c:116)
> ==21117== by 0x5509F30: MatAssemblyEnd_MPIAIJ (mpiaij.c:702)
> ==21117== by 0x55D978A: MatAssemblyEnd (matrix.c:4901)
> ==21117== by 0x551D7AD: MatTranspose_MPIAIJ (mpiaij.c:2024)
> ==21117== by 0x55D394A: MatTranspose (matrix.c:4382)
> ==21117== by 0x405CE4: StokesSetupMatBlock10 (ex70.c:337)
> ==21117== by 0x406C60: StokesSetupMatrix (ex70.c:396)
> ==21117== by 0x40A0D3: main (ex70.c:679)
> ==21117==
> ==21117== Invalid read of size 8
> ==21117== at 0x553E66B: MatGetBrowsOfAoCols_MPIAIJ (mpiaij.c:5228)
> ==21117== by 0x557C680: MatMatMultNumeric_MPIAIJ_MPIAIJ_Scalable (mpimatmatmult.c:560)
> ==21117== by 0x5575BBE: MatMatMult_MPIAIJ_MPIAIJ (mpimatmatmult.c:39)
> ==21117== by 0x5601808: MatMatMult (matrix.c:8714)
> ==21117== by 0x4067D0: StokesSetupApproxSchur (ex70.c:379)
> ==21117== by 0x406DB5: StokesSetupMatrix (ex70.c:399)
> ==21117== by 0x40A0D3: main (ex70.c:679)
> ==21117== Address 0x9cdd3b8 is 8 bytes after a block of size 16 alloc'd
> ==21117== at 0x4A055DC: memalign (vg_replace_malloc.c:755)
> ==21117== by 0x4D42117: PetscMallocAlign (mal.c:27)
> ==21117== by 0x5016A58: VecScatterCreate (vscat.c:1168)
> ==21117== by 0x5547B10: MatSetUpMultiply_MPIAIJ (mmaij.c:116)
> ==21117== by 0x5509F30: MatAssemblyEnd_MPIAIJ (mpiaij.c:702)
> ==21117== by 0x55D978A: MatAssemblyEnd (matrix.c:4901)
> ==21117== by 0x551D7AD: MatTranspose_MPIAIJ (mpiaij.c:2024)
> ==21117== by 0x55D394A: MatTranspose (matrix.c:4382)
> ==21117== by 0x405CE4: StokesSetupMatBlock10 (ex70.c:337)
> ==21117== by 0x406C60: StokesSetupMatrix (ex70.c:396)
> ==21117== by 0x40A0D3: main (ex70.c:679)
> ==21117==
> residual u = 3.56267e-06
> residual p = 1.14951e-05
> residual [u,p] = 1.20346e-05
> discretization error u = 0.0106477
> discretization error p = 1.85783
> discretization error [u,p] = 1.85786
> ==21117==
> ==21117== HEAP SUMMARY:
> ==21117== in use at exit: 345,301 bytes in 3,773 blocks
> ==21117== total heap usage: 24,730 allocs, 20,957 frees, 16,608,714 bytes allocated
> ==21117==
> ==21117== LEAK SUMMARY:
> ==21117== definitely lost: 42,743 bytes in 40 blocks
> ==21117== indirectly lost: 11,134 bytes in 28 blocks
> ==21117== possibly lost: 0 bytes in 0 blocks
> ==21117== still reachable: 291,424 bytes in 3,705 blocks
> ==21117== suppressed: 0 bytes in 0 blocks
> ==21117== Rerun with --leak-check=full to see details of leaked memory
> ==21117==
> ==21117== For counts of detected and suppressed errors, rerun with: -v
> ==21117== Use --track-origins=yes to see where uninitialised values come from
> ==21117== ERROR SUMMARY: 9 errors from 9 contexts (suppressed: 6 from 6)
>
>
>