<div dir="ltr"><div>Stefano: the stupidity was all mine and had nothing to do with PETSc. Valgrind helped me track down a memory corruption issue that ultimately came down to a bad input file for my code (and, obviously, not enough error checking on input files!).</div><div><br></div><div>The issue is fixed.</div><div><br></div><div>Now, I'd like to understand a bit more about what happened here on the PETSc side. Was this Valgrind issue already known, with a fix that just wasn't on maint yet? Or was I simply using too old a version of PETSc to have the fix?</div><div><br></div><div>Derek<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Mar 22, 2019 at 4:29 AM Stefano Zampini &lt;<a href="mailto:stefano.zampini@gmail.com">stefano.zampini@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;"><br><div><br><blockquote type="cite"><div>On Mar 21, 2019, at 7:59 PM, Derek Gaston &lt;<a href="mailto:friedmud@gmail.com" target="_blank">friedmud@gmail.com</a>&gt; wrote:</div><br class="gmail-m_-3868866012181494595Apple-interchange-newline"><div><div dir="ltr"><div dir="ltr"><div>It sounds like you already tracked this down... 
but for completeness, here is what track-origins gives:</div><div><br></div><div>==262923== Conditional jump or move depends on uninitialised value(s)<br>==262923== at 0x73C6548: VecScatterMemcpyPlanCreate_Index (vscat.c:294)<br>==262923== by 0x73DBD97: VecScatterMemcpyPlanCreate_PtoP (vpscat_mpi1.c:312)<br>==262923== by 0x73DE6AE: VecScatterCreateCommon_PtoS_MPI1 (vpscat_mpi1.c:2328)<br>==262923== by 0x73DFFEA: VecScatterCreateLocal_PtoS_MPI1 (vpscat_mpi1.c:2202)<br>==262923== by 0x73C7A51: VecScatterCreate_PtoS (vscat.c:608)<br>==262923== by 0x73C9E8A: VecScatterSetUp_vectype_private (vscat.c:857)<br>==262923== by 0x73CBE5D: VecScatterSetUp_MPI1 (vpscat_mpi1.c:2543)<br>==262923== by 0x7413D39: VecScatterSetUp (vscatfce.c:212)<br>==262923== by 0x7412D73: VecScatterCreateWithData (vscreate.c:333)<br>==262923== by 0x747A232: VecCreateGhostWithArray (pbvec.c:685)<br>==262923== by 0x747A90D: VecCreateGhost (pbvec.c:741)<br>==262923== by 0x5C7FFD6: libMesh::PetscVector&lt;double&gt;::init(unsigned long, unsigned long, std::vector&lt;unsigned long, std::allocator&lt;unsigned long&gt; &gt; const&amp;, bool, libMesh::ParallelType) (petsc_vector.h:752)<br>==262923== Uninitialised value was created by a heap allocation<br>==262923== at 0x402DDC6: memalign (vg_replace_malloc.c:899)<br>==262923== by 0x7359702: PetscMallocAlign (mal.c:41)<br>==262923== by 0x7359C70: PetscMallocA (mal.c:390)<br>==262923== by 0x73DECF0: VecScatterCreateLocal_PtoS_MPI1 (vpscat_mpi1.c:2061)<br>==262923== by 0x73C7A51: VecScatterCreate_PtoS (vscat.c:608)<br>==262923== by 0x73C9E8A: VecScatterSetUp_vectype_private (vscat.c:857)<br>==262923== by 0x73CBE5D: VecScatterSetUp_MPI1 (vpscat_mpi1.c:2543)<br>==262923== by 0x7413D39: VecScatterSetUp (vscatfce.c:212)<br>==262923== by 0x7412D73: VecScatterCreateWithData (vscreate.c:333)<br>==262923== by 0x747A232: VecCreateGhostWithArray (pbvec.c:685)<br>==262923== by 0x747A90D: VecCreateGhost (pbvec.c:741)<br>==262923== by 0x5C7FFD6: libMesh::PetscVector&lt;double&gt;::init(unsigned long, 
unsigned long, std::vector&lt;unsigned long, std::allocator&lt;unsigned long&gt; &gt; const&amp;, bool, libMesh::ParallelType) (petsc_vector.h:752)</div><div><br></div><div><br></div><div>BTW: This turned out not to be my actual problem. My actual problem was just some stupidity on my part... just a simple input parameter issue in my code (should have had better error checking!).</div><div><br></div><div>But: It sounds like my digging may have uncovered something real here... so it wasn't completely useless :-)<br></div></div></div></div></blockquote><div><br></div><div>Derek,</div><div><br></div><div>I don’t understand. Is your problem fixed or not? It would be nice to understand what the “some stupidity on your part” was, and whether it still led to valid PETSc code or just to a broken setup.</div><div>In the first case, we should investigate the Valgrind error you reported.</div><div>In the second case, this is not a PETSc issue.</div><br><blockquote type="cite"><div><div dir="ltr"><div dir="ltr"><div><br></div><div>Thanks for your help, everyone!</div><div><br></div><div>Derek</div><div><br></div><div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Mar 21, 2019 at 10:38 AM Stefano Zampini &lt;<a href="mailto:stefano.zampini@gmail.com" target="_blank">stefano.zampini@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Mar 20, 2019 at 11:40 PM Derek Gaston via petsc-users &lt;<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div>Trying to track down some memory corruption 
I'm seeing on larger-scale runs (3.5B+ unknowns).</div></div></div></div></div></div></blockquote><div><br></div><div>Uhm... are you using 32-bit indices? Is it possible there's an integer overflow somewhere?</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div>Was able to run Valgrind on it... and I'm seeing quite a lot of uninitialized value errors coming from ghost updating. Here are some of the traces:</div><div><br></div><div>==87695== Conditional jump or move depends on uninitialised value(s)<br>==87695== at 0x73236D3: PetscMallocAlign (mal.c:28)<br>==87695== by 0x7323C70: PetscMallocA (mal.c:390)<br>==87695== by 0x739048E: VecScatterMemcpyPlanCreate_Index (vscat.c:284)<br>==87695== by 0x73A5D97: VecScatterMemcpyPlanCreate_PtoP (vpscat_mpi1.c:312)<br>==64730== by 0x7393E8A: VecScatterSetUp_vectype_private (vscat.c:857)<br>==64730== by 0x7395E5D: VecScatterSetUp_MPI1 (vpscat_mpi1.c:2543)<br>==64730== by 0x73DDD39: VecScatterSetUp (vscatfce.c:212)<br>==64730== by 0x73DCD73: VecScatterCreateWithData (vscreate.c:333)<br>==64730== by 0x7444232: VecCreateGhostWithArray (pbvec.c:685)<br>==64730== by 0x744490D: VecCreateGhost (pbvec.c:741)<br></div><div><br></div><div>==133582== Conditional jump or move depends on uninitialised value(s)<br>==133582== at 0x4030384: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1034)<br>==133582== by 0x739E4F9: PetscMemcpy (petscsys.h:1649)<br>==133582== by 0x739E4F9: VecScatterMemcpyPlanExecute_Pack (vecscatterimpl.h:150)<br>==133582== by 0x739E4F9: VecScatterBeginMPI1_1 (vpscat_mpi1.h:69)<br>==133582== by 0x73DD964: VecScatterBegin (vscatfce.c:110)<br>==133582== by 0x744E195: VecGhostUpdateBegin (commonmpvec.c:225)<br></div><div><br></div><div>This is from a Git checkout of PETSc... 
the hash I branched from is: 0e667e8fea4aa from December 23rd (updating would be really hard at this point as I've completed 90% of my dissertation with this version... and changing PETSc now would be pretty painful!).</div><div><br></div><div>Any ideas? Is it possible it's in my code? Is it possible that there are later PETSc commits that already fix this?</div><div><br></div><div>Thanks for any help,</div><div>Derek<br></div><div><br></div></div></div></div></div></div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail-m_-3868866012181494595gmail-m_-5560039915233761407gmail_signature">Stefano</div></div>
</blockquote></div>
</div></blockquote></div><br></div></blockquote></div>