From thibault.bridelbertomeu at gmail.com Tue Sep 1 01:24:23 2020 From: thibault.bridelbertomeu at gmail.com (Thibault Bridel-Bertomeu) Date: Tue, 1 Sep 2020 08:24:23 +0200 Subject: [petsc-users] DMAdaptLabel with triangle mesh In-Reply-To: References: Message-ID: Hi everyone, hi Matt, Le lun. 31 ao?t 2020 ? 22:03, Matthew Knepley a ?crit : > On Mon, Aug 31, 2020 at 4:00 PM Thibault Bridel-Bertomeu < > thibault.bridelbertomeu at gmail.com> wrote: > >> >> >> Le lun. 31 ao?t 2020 ? 20:35, Matthew Knepley a >> ?crit : >> >>> On Mon, Aug 31, 2020 at 9:45 AM Thibault Bridel-Bertomeu < >>> thibault.bridelbertomeu at gmail.com> wrote: >>> >>>> Hi Matt, >>>> >>>> OK so I tried to replicate the problem starting from one of the tests >>>> in PETSc repo. >>>> I found >>>> https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/tests/ex20.c that >>>> actually uses DMAdaptLabel. >>>> Just add >>>> >>>> { >>>> >>>> DM gdm; >>>> >>>> >>>> >>>> DMPlexConstructGhostCells (dm, NULL, NULL, &gdm); >>>> >>>> DMDestroy (&dm); >>>> >>>> dm = gdm; >>>> >>>> } >>>> >>>> after line 24 where the box mesh is generated. Then compile and run with ex20 -dim 2. >>>> >>>> It should tell you that Triangle 18 has an invalid vertex index. >>>> >>>> That's the minimal example that I found that replicates the problem. >>>> >>>> Ah, okay. p4est knows to discard the ghost cells. I can add that to >>> Triangle. >>> >> >> I thought it was something like that, seeing what addition of code >> triggers the problem. >> Thanks for adding the treatment to Triangle ! >> >>> Regarding the serial character of the technique, I tried with a distributed mesh and it works. >>>> >>>> Hmm, it can't work. Maybe it appears to work. Triangle knows nothing >>> about parallelism. So this must be feeding the local mesh to triangle and >>> replacing it by >>> a refined mesh, but the parallel boundaries will not be correct, and >>> might not even match up. >>> >> >> Ok, yea, it appears to work. When asked to refine from scratch, not from >> AdaptLabel but with a -dm_refine order, the mesh is funky as if it was >> entirely re-made and the previous mesh thrown away. >> Can you think of a way where each processor would be able to call on >> Triangle on it?s own, with its own piece of mesh and maybe the surrounding >> ghost cells ? I imagine it could work for parallel refining of triangular >> meshes, couldn?t it ? >> > > It turns out that his is a very hairy problem. That is why almost no > parallel refinement packages exist. To my knowledge, this is only one: > Pragmatic. We support that package, but > it is in development, and we really need to update our interface. I am > working on it, but too much stuff gets in the way. > Oh really, only the one ? Okay okay, I guess I was too optimistic ! I'll look into Pragmatic even though it is in dev, maybe it'll be enough for what I wanna do for now. By the way, talking about things that get in the way, should I open an issue on the gitlab regarding Triangle not ignoring the ghost cells, would that be easier for you guys ? Thanks & have a great day, Thibault > Thanks, > > Matt > > >> Thanks for your replies, >> Have a great afternoon/evening ! >> >> Thibault >> >> >>> Thanks, >>> >>> Matt >>> >>>> So do you mean that intrinsically it gathers all the cells on the master proc before proceeding to the coarsening & refinement and only then broadcast the info back to the other processors ? >>>> >>>> Thanks, >>>> >>>> Thibault >>>> >>>> Le lun. 31 ao?t 2020 ? 
12:55, Matthew Knepley a >>>> ?crit : >>>> >>>>> On Mon, Aug 31, 2020 at 5:34 AM Thibault Bridel-Bertomeu < >>>>> thibault.bridelbertomeu at gmail.com> wrote: >>>>> >>>>>> Dear all, >>>>>> >>>>>> I have recently been playing around with the AMR capabilities >>>>>> embedded in PETSc for quad meshes using p4est. Based on the TS tutorial >>>>>> ex11, I was able to incorporate the AMR into a pre-existing code with >>>>>> different metrics for the adaptation process. >>>>>> Now I would like to do something similar using tri meshes. I read >>>>>> that compiling PETSc with Triangle (in 2D and Tetgen for 3D) gives access >>>>>> to refinement and coarsening capabilities on triangular meshes.When I try >>>>>> to execute the code with a triangular mesh (that i manipulate as a DMPLEX), >>>>>> it yields "Triangle 1700 has an invalid vertex index" when trying to adapt >>>>>> the mesh (the initial mesh indeed has 1700 cells). From what i could tell, >>>>>> it comes from the reconstruct method called by the triangulate method of >>>>>> triangle.c, the latter being called by either >>>>>> *DMPlexGenerate_Triangle *or *DMPlexRefine_Triangle *in PETSc, I >>>>>> cannot be sure. >>>>>> >>>>>> In substance, the code is the same as in ex11.c and the crash occurs >>>>>> in the first adaptation pass, i.e. an equivalent in ex11 is that it crashes >>>>>> after the SetInitialCondition in the first if (useAMR) located line 1835 >>>>>> when it calls adaptToleranceFVM (which I copied basically so the code is >>>>>> the same). >>>>>> >>>>>> Is the automatic mesh refinement feature on tri meshes supposed to >>>>>> work or am I trying something that has not been completed yet ? >>>>>> >>>>> >>>>> It is supposed to work, and does for some tests in the library. I >>>>> stopped using it because it is inherently serial and it is isotropic. >>>>> However, it should be fixed. >>>>> Is there something I can run to help me track down the problem? >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> Thank you very much for your help, as always. >>>>>> >>>>>> Thibault Bridel-Bertomeu >>>>>> ? >>>>>> Eng, MSc, PhD >>>>>> Research Engineer >>>>>> CEA/CESTA >>>>>> 33114 LE BARP >>>>>> Tel.: (+33)557046924 >>>>>> Mob.: (+33)611025322 >>>>>> Mail: thibault.bridelbertomeu at gmail.com >>>>>> >>>>>> >>>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>>> >>>>> >>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> >>> -- >> Thibault Bridel-Bertomeu >> ? >> Eng, MSc, PhD >> Research Engineer >> CEA/CESTA >> 33114 LE BARP >> Tel.: (+33)557046924 >> Mob.: (+33)611025322 >> Mail: thibault.bridelbertomeu at gmail.com >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Tue Sep 1 06:03:57 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 1 Sep 2020 07:03:57 -0400 Subject: [petsc-users] =?utf-8?q?error=3A_invalid_types_=E2=80=98PetscSca?= =?utf-8?q?lar*_=7Baka_double*=7D=5BPetscScalar_=7Baka_double=7D=5D?= =?utf-8?q?=E2=80=99_for_array_subscript?= In-Reply-To: <424b7a10861d4a2bbc81ad7b428e485a@hest.ethz.ch> References: <3a9eee1ee0214244b52790cd49737d88@hest.ethz.ch> <424b7a10861d4a2bbc81ad7b428e485a@hest.ethz.ch> Message-ID: On Tue, Sep 1, 2020 at 1:16 AM Smit Thijs wrote: > Hi Matt, > > > > Thanks for your replay. > > > > I am working with a structured grid and in this case the content of the > vector to be mapped does not overlap between processes. My understanding > (although I am a beginner with PETSc) is that I don?t have to use IS in > that case? > > > > ierr = DMCreateGlobalVector(da_elem, &xPhys); > > CHKERRQ(ierr); > > > > Is there a way to solve this problem without using IS? > What I say below is that you cannot index a C array with a real number. You have to do it with an integer. A PETSc IS is a list of integers, which is why I suggested that. Thanks, Matt > Best, Thijs > > > > *From:* Matthew Knepley > *Sent:* 21 August 2020 18:20 > *To:* Smit Thijs > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] error: invalid types ?PetscScalar* {aka > double*}[PetscScalar {aka double}]? for array subscript > > > > On Fri, Aug 21, 2020 at 11:49 AM Smit Thijs > wrote: > > Hi All, > > > > I am having the following error when I try to do a mapping with vectors > and I can?t figure out how to solve this or what is going wrong: > > error: invalid types ?PetscScalar* {aka double*}[PetscScalar {aka > double}]? for array subscript > > xpMMA[i] = xp[indicesMap[i]]; > > > > Herewith two code snippets: > > // total number of elements on core > > PetscInt nel; > > VecGetLocalSize(xPhys, &nel); > > > > // create xPassive vector > > ierr = VecDuplicate(xPhys, &xPassive); > > CHKERRQ(ierr); > > > > // create mapping vector > > ierr = VecDuplicate(xPhys, &indicator); > > CHKERRQ(ierr); > > > > // index set for xPassive and indicator > > PetscScalar *xpPassive, *xpIndicator; > > ierr = VecGetArray(xPassive, &xpPassive); > > CHKERRQ(ierr); > > ierr = VecGetArray(indicator, &xpIndicator); > > CHKERRQ(ierr); > > > > // counters for total and active elements on this processor > > PetscInt tcount = 0; // total number of elements > > PetscInt acount = 0; // number of active elements > > PetscInt scount = 0; // number of solid elements > > PetscInt rcount = 0; // number of rigid element > > > > // loop over all elements and update xPassive from wrapper data > > // count number of active elements, acount > > // set indicator vector > > for (PetscInt el = 0; el < nel; el++) { > > if (data.xPassive_w.size() > 1) { > > xpPassive[el] = data.xPassive_w[el]; > > tcount++; > > if (xpPassive[el] < 0) { > > xpIndicator[acount] = el; > > acount++; > > } > > } else { > > xpPassive[el] = -1.0; // default, if no xPassive_w than all > elements are active = -1.0 > > } > > } > > > > // printing > > //PetscPrintf(PETSC_COMM_WORLD, "tcount: %i\n", tcount); > > //PetscPrintf(PETSC_COMM_WORLD, "acount: %i\n", acount); > > > > // Allreduce, get number of active elements over all processes > > // tmp number of var on proces > > // acount total number of var sumed > > PetscInt tmp = acount; > > acount = 0.0; > > MPI_Allreduce(&tmp, &(acount), 1, MPIU_INT, MPI_SUM, PETSC_COMM_WORLD); > > > > //// create xMMA vector > > 
VecCreateMPI(PETSC_COMM_WORLD, tmp, acount, &xMMA); > > > > // Pointers to the vectors > > PetscScalar *xp, *xpMMA, *indicesMap; > > > > Here you declare indicesMap as PetscScalar[]. You cannot index an array > with this. I see that you > > want to store these indices in a Vec. You should use an IS instead. > > > > Thanks, > > > > Matt > > > > //PetscInt indicesMap; > > ierr = VecGetArray(MMAVector, &xpMMA); > > CHKERRQ(ierr); > > ierr = VecGetArray(elementVector, &xp); > > CHKERRQ(ierr); > > // Index set > > PetscInt nLocalVar; > > VecGetLocalSize(xMMA, &nLocalVar); > > > > // print number of var on pocessor > > PetscPrintf(PETSC_COMM_WORLD, "Local var: %i\n", nLocalVar); > > > > ierr = VecGetArray(indicator, &indicesMap); > > CHKERRQ(ierr); > > > > // Run through the indices > > for (PetscInt i = 0; i < nLocalVar; i++) { > > if (updateDirection > 0) { > > //PetscPrintf(PETSC_COMM_WORLD, "i: %i, xp[%i] = %f\n", i, > indicesMap[i], xp[indicesMap[i]]); > > xpMMA[i] = xp[indicesMap[i]]; > > } else if (updateDirection < 0) { > > xp[indicesMap[i]] = xpMMA[i]; > > //PetscPrintf(PETSC_COMM_WORLD, "i: %i, xp[%i] = %f\n", i, > indicesMap[i], xp[indicesMap[i]]); > > } > > } > > // Restore > > ierr = VecRestoreArray(elementVector, &xp); > > CHKERRQ(ierr); > > ierr = VecRestoreArray(MMAVector, &xpMMA); > > CHKERRQ(ierr); > > ierr = VecRestoreArray(indicator, &indicesMap); > > CHKERRQ(ierr); > > PetscPrintf(PETSC_COMM_WORLD, "FINISHED UpdateVariables \n"); > > > > The error message says that the type with which I try to index is wrong, I > think. But VecGetArray only excepts scalars. Furthermore, the el variable > is an int, but is seams like to turn out to be a scalar. Does anybody see > how to proceed with this? > > > > Best regards, > > > > Thijs Smit > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From thibault.bridelbertomeu at gmail.com Tue Sep 1 06:14:54 2020 From: thibault.bridelbertomeu at gmail.com (Thibault Bridel-Bertomeu) Date: Tue, 1 Sep 2020 13:14:54 +0200 Subject: [petsc-users] DMAdaptLabel with triangle mesh In-Reply-To: References: Message-ID: Hi Matthew, So I turned the pages of the User guide real quick and the only reference I found to Pragmatic was in DMAdaptMetric. It says it is based on a vertex-based metric, but I could not find any more information regarding the characteristics of the expected metric ... Would you by any chance have more documentation or maybe a test/tutorial/example that builds such a metric and calls DMAdaptMetric ? Thanks, Thibault Le mar. 1 sept. 2020 ? 08:24, Thibault Bridel-Bertomeu < thibault.bridelbertomeu at gmail.com> a ?crit : > Hi everyone, hi Matt, > > Le lun. 31 ao?t 2020 ? 22:03, Matthew Knepley a > ?crit : > >> On Mon, Aug 31, 2020 at 4:00 PM Thibault Bridel-Bertomeu < >> thibault.bridelbertomeu at gmail.com> wrote: >> >>> >>> >>> Le lun. 31 ao?t 2020 ? 
20:35, Matthew Knepley a >>> ?crit : >>> >>>> On Mon, Aug 31, 2020 at 9:45 AM Thibault Bridel-Bertomeu < >>>> thibault.bridelbertomeu at gmail.com> wrote: >>>> >>>>> Hi Matt, >>>>> >>>>> OK so I tried to replicate the problem starting from one of the tests >>>>> in PETSc repo. >>>>> I found >>>>> https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/tests/ex20.c that >>>>> actually uses DMAdaptLabel. >>>>> Just add >>>>> >>>>> { >>>>> >>>>> DM gdm; >>>>> >>>>> >>>>> >>>>> DMPlexConstructGhostCells (dm, NULL, NULL, &gdm); >>>>> >>>>> DMDestroy (&dm); >>>>> >>>>> dm = gdm; >>>>> >>>>> } >>>>> >>>>> after line 24 where the box mesh is generated. Then compile and run with ex20 -dim 2. >>>>> >>>>> It should tell you that Triangle 18 has an invalid vertex index. >>>>> >>>>> That's the minimal example that I found that replicates the problem. >>>>> >>>>> Ah, okay. p4est knows to discard the ghost cells. I can add that to >>>> Triangle. >>>> >>> >>> I thought it was something like that, seeing what addition of code >>> triggers the problem. >>> Thanks for adding the treatment to Triangle ! >>> >>>> Regarding the serial character of the technique, I tried with a distributed mesh and it works. >>>>> >>>>> Hmm, it can't work. Maybe it appears to work. Triangle knows nothing >>>> about parallelism. So this must be feeding the local mesh to triangle and >>>> replacing it by >>>> a refined mesh, but the parallel boundaries will not be correct, and >>>> might not even match up. >>>> >>> >>> Ok, yea, it appears to work. When asked to refine from scratch, not from >>> AdaptLabel but with a -dm_refine order, the mesh is funky as if it was >>> entirely re-made and the previous mesh thrown away. >>> Can you think of a way where each processor would be able to call on >>> Triangle on it?s own, with its own piece of mesh and maybe the surrounding >>> ghost cells ? I imagine it could work for parallel refining of triangular >>> meshes, couldn?t it ? >>> >> >> It turns out that his is a very hairy problem. That is why almost no >> parallel refinement packages exist. To my knowledge, this is only one: >> Pragmatic. We support that package, but >> it is in development, and we really need to update our interface. I am >> working on it, but too much stuff gets in the way. >> > > Oh really, only the one ? Okay okay, I guess I was too optimistic ! I'll > look into Pragmatic even though it is in dev, maybe it'll be enough for > what I wanna do for now. > By the way, talking about things that get in the way, should I open an > issue on the gitlab regarding Triangle not ignoring the ghost cells, would > that be easier for you guys ? > > Thanks & have a great day, > > Thibault > > > >> Thanks, >> >> Matt >> >> >>> Thanks for your replies, >>> Have a great afternoon/evening ! >>> >>> Thibault >>> >>> >>>> Thanks, >>>> >>>> Matt >>>> >>>>> So do you mean that intrinsically it gathers all the cells on the master proc before proceeding to the coarsening & refinement and only then broadcast the info back to the other processors ? >>>>> >>>>> Thanks, >>>>> >>>>> Thibault >>>>> >>>>> Le lun. 31 ao?t 2020 ? 12:55, Matthew Knepley a >>>>> ?crit : >>>>> >>>>>> On Mon, Aug 31, 2020 at 5:34 AM Thibault Bridel-Bertomeu < >>>>>> thibault.bridelbertomeu at gmail.com> wrote: >>>>>> >>>>>>> Dear all, >>>>>>> >>>>>>> I have recently been playing around with the AMR capabilities >>>>>>> embedded in PETSc for quad meshes using p4est. 
Based on the TS tutorial >>>>>>> ex11, I was able to incorporate the AMR into a pre-existing code with >>>>>>> different metrics for the adaptation process. >>>>>>> Now I would like to do something similar using tri meshes. I read >>>>>>> that compiling PETSc with Triangle (in 2D and Tetgen for 3D) gives access >>>>>>> to refinement and coarsening capabilities on triangular meshes.When I try >>>>>>> to execute the code with a triangular mesh (that i manipulate as a DMPLEX), >>>>>>> it yields "Triangle 1700 has an invalid vertex index" when trying to adapt >>>>>>> the mesh (the initial mesh indeed has 1700 cells). From what i could tell, >>>>>>> it comes from the reconstruct method called by the triangulate method of >>>>>>> triangle.c, the latter being called by either >>>>>>> *DMPlexGenerate_Triangle *or *DMPlexRefine_Triangle *in PETSc, I >>>>>>> cannot be sure. >>>>>>> >>>>>>> In substance, the code is the same as in ex11.c and the crash occurs >>>>>>> in the first adaptation pass, i.e. an equivalent in ex11 is that it crashes >>>>>>> after the SetInitialCondition in the first if (useAMR) located line 1835 >>>>>>> when it calls adaptToleranceFVM (which I copied basically so the code is >>>>>>> the same). >>>>>>> >>>>>>> Is the automatic mesh refinement feature on tri meshes supposed to >>>>>>> work or am I trying something that has not been completed yet ? >>>>>>> >>>>>> >>>>>> It is supposed to work, and does for some tests in the library. I >>>>>> stopped using it because it is inherently serial and it is isotropic. >>>>>> However, it should be fixed. >>>>>> Is there something I can run to help me track down the problem? >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> Thank you very much for your help, as always. >>>>>>> >>>>>>> Thibault Bridel-Bertomeu >>>>>>> ? >>>>>>> Eng, MSc, PhD >>>>>>> Research Engineer >>>>>>> CEA/CESTA >>>>>>> 33114 LE BARP >>>>>>> Tel.: (+33)557046924 >>>>>>> Mob.: (+33)611025322 >>>>>>> Mail: thibault.bridelbertomeu at gmail.com >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> >>>> -- >>> Thibault Bridel-Bertomeu >>> ? >>> Eng, MSc, PhD >>> Research Engineer >>> CEA/CESTA >>> 33114 LE BARP >>> Tel.: (+33)557046924 >>> Mob.: (+33)611025322 >>> Mail: thibault.bridelbertomeu at gmail.com >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From karl.linkui at gmail.com Tue Sep 1 11:18:32 2020 From: karl.linkui at gmail.com (Karl Lin) Date: Tue, 1 Sep 2020 11:18:32 -0500 Subject: [petsc-users] is there a function to append matrix In-Reply-To: <877dtektwr.fsf@jedbrown.org> References: <87lfhukwd1.fsf@jedbrown.org> <87a6yakux4.fsf@jedbrown.org> <877dtektwr.fsf@jedbrown.org> Message-ID: Thanks Jed. 
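A minimal sketch of the pattern discussed in the quoted exchange below -- copying the rows of a small matrix B into a larger, already-assembled matrix A with MatGetRow/MatSetValues and then assembling again -- assuming A was preallocated with room for those entries and that B shares A's global column numbering (function and variable names here are hypothetical, not from the thread; untested):

    PetscErrorCode AppendBIntoA(Mat A, Mat B, PetscInt rowOffset)
    {
      PetscInt           rstart, rend, row, arow, ncols;
      const PetscInt    *cols;
      const PetscScalar *vals;
      PetscErrorCode     ierr;

      /* loop over the locally owned rows of B and copy them into A */
      ierr = MatGetOwnershipRange(B, &rstart, &rend);CHKERRQ(ierr);
      for (row = rstart; row < rend; ++row) {
        ierr = MatGetRow(B, row, &ncols, &cols, &vals);CHKERRQ(ierr);
        arow = rowOffset + row;   /* destination row in A; rowOffset is hypothetical */
        ierr = MatSetValues(A, 1, &arow, ncols, cols, vals, INSERT_VALUES);CHKERRQ(ierr);
        ierr = MatRestoreRow(B, row, &ncols, &cols, &vals);CHKERRQ(ierr);
      }
      /* as noted below, another assembly is needed after the new MatSetValues calls */
      ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      return 0;
    }

Keep in mind Jed's caveat in the quoted reply below about distribution: if the extra rows all sit "at the end" of A, they will all land on the last rank.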
On Mon, Aug 31, 2020 at 11:22 PM Jed Brown wrote: > Karl Lin writes: > > > Thanks for the feedback. What about if I build A to have as many rows as > A > > and B and then later on use MatGetRow and MatSetValues to add B matrix > > entries to A? Can MatGetRow and MatSetValues be used after MatAssembly is > > called? B is much much smaller than A so the number of rows can be added > to > > just the portion of A on one process. Will this work? Thanks. Regards. > > That would work fine, you'll just need to MatAssembly after your new > MatSetValues. > > Note that you'll likely want to think about the distribution of B relative > to A; you may not want B to come "at the end" because it'll all be on the > last rank, versus dispersed over the ranks. This is especially true if > those rows are heavier. > > > On Mon, Aug 31, 2020 at 11:00 PM Jed Brown wrote: > > > >> Karl Lin writes: > >> > >> > I guess another way to look at this is if I already build matrix A and > >> > MatAssembly has been called. Can I populate more rows to matrix A > later > >> on? > >> > With the number of columns and column ownership pattern not changed of > >> > course. Thank you. > >> > >> No. > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sblondel at utk.edu Tue Sep 1 12:45:09 2020 From: sblondel at utk.edu (Blondel, Sophie) Date: Tue, 1 Sep 2020 17:45:09 +0000 Subject: [petsc-users] Matrix Free Method questions In-Reply-To: <3E68F0AF-2F7D-4394-894A-3099EC80B9BC@petsc.dev> References: <5BDE8465-76BE-4132-BF4E-6784548AADC0@petsc.dev> <3329269A-EB37-41C9-9698-BA4631A1E18A@petsc.dev> , <3E68F0AF-2F7D-4394-894A-3099EC80B9BC@petsc.dev> Message-ID: Hi Barry, I'm working through step 1) but I think I am doing something wrong. I'm using DMDASetBlockFillsSparse to set the non-zeros only for the diffusing clusters (small He clusters here, from size 1 to 7) and all the diagonal entries. Then I added a few lines in the code: Mat mat; DMCreateMatrix(da, &mat); MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); When I try to run with the following options: -snes_mf_operator -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 2 -snes_force_iteration -pc_fieldsplit_detect_coupling -pc_type fieldsplit -fieldsplit_0_pc_type jacobi -fieldsplit_1_pc_type redundant -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -ts_max_snes_failures -1 -ts_monitor -ts_max_steps 20 I get an error: [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: No support for this operation for this object type [0]PETSC ERROR: Matrix type mffd does not have a find off block diagonal entries defined [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[0]PETSC ERROR: Petsc Development GIT revision: v3.13.4-851-gde18fec8da GIT Date: 2020-08-28 16:47:50 +0000 [0]PETSC ERROR: Unknown Name on a 20200828 named sophie-Precision-5530 by sophie Tue Sep 1 10:58:44 2020 [0]PETSC ERROR: Configure options PETSC_DIR=/home/sophie/Code/petsc PETSC_ARCH=20200828 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no --with-shared-libraries [0]PETSC ERROR: #1 MatFindOffBlockDiagonalEntries() line 9847 in /home/sophie/Code/petsc/src/mat/interface/matrix.c [0]PETSC ERROR: #2 PCFieldSplitSetDefaults() line 504 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c [0]PETSC ERROR: #3 PCSetUp_FieldSplit() line 606 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c [0]PETSC ERROR: #4 PCSetUp() line 1009 in /home/sophie/Code/petsc/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #5 KSPSetUp() line 406 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #6 KSPSolve_Private() line 658 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #7 KSPSolve() line 889 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in /home/sophie/Code/petsc/src/snes/impls/ls/ls.c [0]PETSC ERROR: #9 SNESSolve() line 4524 in /home/sophie/Code/petsc/src/snes/interface/snes.c [0]PETSC ERROR: #10 TSStep_ARKIMEX() line 811 in /home/sophie/Code/petsc/src/ts/impls/arkimex/arkimex.c [0]PETSC ERROR: #11 TSStep() line 3731 in /home/sophie/Code/petsc/src/ts/interface/ts.c [0]PETSC ERROR: #12 TSSolve() line 4128 in /home/sophie/Code/petsc/src/ts/interface/ts.c PetscSolver::solve: TSSolve failed. Cheers, Sophie ________________________________ De : Barry Smith Envoyé : lundi 31 août 2020 14:50 À : Blondel, Sophie Cc : petsc-users at mcs.anl.gov ; xolotl-psi-development at lists.sourceforge.net Objet : Re: [petsc-users] Matrix Free Method questions Sophie, Thanks. The factor of 4 is a lot, the 1.5 not so bad. You will definitely want to retain the full matrix assembly codes for speed and to verify a reduced matrix version. It is worth trying a "reduced matrix version" with matrix-free multiply based on these numbers. This reduced matrix Jacobian will only have the diagonals and all the terms connected to the cluster sizes that move. In other words you will be building just the part of the Jacobian needed for the new preconditioner (PC subtype for Jacobi) and doing the matrix-vector product matrix free. (SOR requires all the Jacobian entries). Fortunately this is hopefully pretty straightforward for this code. You will not have to change the structure of the main code at all. Step 1) create a new "sparse matrix" that will be passed to DMDASetBlockFillsSparse(). This new "sparse matrix" needs to retain all the diagonal entries and also all the entries that are associated with the variables that diffuse. If I remember correctly these are just the smallest cluster size, plain Helium? Call MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); Then you would run the code with -snes_mf_operator and the new PC subtype for Jacobi. A test that the new reduced Jacobian is correct will be that you get almost the same iterations as the runs you just made using the PC subtype of Jacobi. Hopefully not slower and using a great deal less memory. The iterations will not be identical because of the matrix-free multiply. Step 2) create a new version of the Jacobian computation routine. 
This routine should only compute the elements of the Jacobian needed for this reduced matrix Jacobian, so the diagonals and the diffusion/convection terms. Again run with with -snes_mf_operator and the new PC subtype for Jacobi and you should again get the same convergence history. I made two steps because it makes it easier to validate and debug to get the same results as before. The first step cheats in that it still computes the full Jacobian but ignores the entries that we don't need to store for the preconditioner. The second step is more efficient because it only computes the Jacobian entries needed for the preconditioner but it requires you going through the Jacobian code and making sure only the needed parts are computed. If you have any questions please let me know. Barry On Aug 31, 2020, at 1:13 PM, Blondel, Sophie > wrote: Hi Barry, I ran the 2 cases to look at the effect of the Jacobi pre-conditionner: * 1D with 200 grid points and 7759 DOF per grid point (for the PSI application), for 20 TS: the factor between SOR and Jacobi is ~4 (976 MatMult for SOR and 4162 MatMult for Jacobi) * 2D with 63x63 grid points and 4124 DOF per grid point (for the NE application), for 20 TS: the factor is 1.5 (6657 for SOR, 10379 for Jacobi) Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : vendredi 28 ao?t 2020 18:31 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions On Aug 28, 2020, at 4:11 PM, Blondel, Sophie > wrote: Thank you Jed and Barry, First, attached are the logs from the benchmark runs I did without (log_std.txt) and with MF method (log_mf.txt). It took me some trouble to get the -log_view to work because I'm using push and pop for the options which means that PETSc is initialized with no argument so the command line argument was not taken into account, but I guess this is for a separate discussion. To answer questions about the current per-conditioners: * I used the same pre-conditioner options as listed in my previous email when I added the -snes_mf option; I did try to remove all the PC related options at one point with the MF method but didn't see a change in runtime so I put them back in * this benchmark is for a 1D DMDA using 20 grid points; when running in 2D or 3D I switch the PC options to: -pc_type fieldsplit -fieldsplit_0_pc_type sor -fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type gmres -ksp_type fgmres -fieldsplit_1_pc_gamg_threshold -1 I haven't tried a Jacobi PC instead of SOR, I will run a set of more realistic runs (1D and 2D) without MF but with Jacobi and report on it next week. When you say "iterations" do you mean what is given by -ksp_monitor? Yes, the number of MatMult is a good enough surrogate. So using matrix-free (which means no preconditioning) has 35846/160 ans = 224.0375 or 224 as many iterations. So even for this modest 1d problem preconditioning is doing a great deal. Barry Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : vendredi 28 ao?t 2020 12:12 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions [External Email] Sophie, This is exactly what i would expect. If you run with -ksp_monitor you will see the -snes_mf run takes many more iterations. 
I am puzzled that the argument -pc_type fieldsplit did not stop the run since this is under normal circumstances not a viable preconditioner with -snes_mf. Did you also remove the -pc_type fieldsplit argument? In order to see how one can avoid forming the entire matrix and use matrix-free to do the matrix-vector but still have an effective preconditioner let's look at what the current preconditioner options do. -pc_fieldsplit_detect_coupling creates two sub-preconditioners, the first for all the variables and the second for those that are coupled by the matrix to variables in neighboring cells Since only the smallest cluster sizes have diffusion/advection this second set contains only the cluster size one variables. -fieldsplit_0_pc_type sor Runs SOR on all the variables; you can think of this as running SOR on the reactions, it is a pretty good preconditioner for the reactions since the reactions are local, per cell. -fieldsplit_1_pc_type redundant This runs the default preconditioner (ILU) on just the variables that diffuse, i.e. the elliptic part. For smallish problems this is fine, for larger problems and 2d and 3d presumably you have also -redundant_pc_type gamg to use algebraic multigrid for the diffusion. This part of the matrix will always need to be formed and used in the preconditioner. It is very important since the diffusion is what brings in most of the ill-conditioning for larger problems into the linear system. Note that it only needs the matrix entries for the cluster size of 1 so it is very small compared to the entire sparse matrix. ---- The first preconditioner SOR requires ALL the matrix entries which are almost all (except for the diffusion terms) the coupling between different size clusters within a cell. Especially each cell has its own sparse matrix of the size of total number of clusters, it is sparse but not super sparse. So the to significantly lower memory usage we need to remove the SOR and the storing of all the matrix entries but still have an efficient preconditioner for the "reaction" terms. The simplest thing would be to use Jacobi instead of SOR for the first subpreconditioner since it only requires the diagonal entries in the matrix. But Jacobi is a worse preconditioner than SOR (since it totally ignores the matrix coupling) and sometimes can be much worse. Before anyone writes additional code we need to know if doing something along these lines does not ruin the convergence that. Have you used the same options as before but with -fieldsplit_0_pc_type jacobi ? (Not using any matrix free). We need to get an idea of how many more linear iterations it requires (not time, comparing time won't be helpful for this exercise.) We also need this information for realistic size problems in 2 or 3 dimensions that you really want to run; for small problems this approach will work ok and give misleading information about what happens for large problems. I suspect the iteration counts will shot up. Can you run some cases and see how the iteration counts change? Based on that we can decide if we still retain "good convergence" by changing the SOR to Jacobi and then change the code to make this change efficient (basically by skipping the explicit computation of the reaction Jacobian terms and using matrix-free on the outside of the PCFIELDSPLIT.) Barry On Aug 28, 2020, at 9:49 AM, Blondel, Sophie via petsc-users > wrote: Hi everyone, I have been using PETSc for a few years with a fully implicit TS ARKIMEX method and am now exploring the matrix free method option. 
Here is the list of PETSc options I typically use: -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 5 -snes_force_iteration -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -fieldsplit_0_pc_type sor -ts_max_snes_failures -1 -pc_fieldsplit_detect_coupling -ts_monitor -pc_type fieldsplit -fieldsplit_1_pc_type redundant -ts_max_steps 100 I started to compare the performance of the code without changing anything of the executable and simply adding "-snes_mf", I see a reduction of memory usage as expected and a benchmark that would usually take ~5min to run now takes ~50min. Reading the documentation I saw that there are a few option to play with the matrix free method like -snes_mf_err, -snes_mf_umin, or switching to -snes_mf_type wp. I used and modified the values of each of these options separately but never saw a sizable change in runtime, is it expected? And are there other ways to make the matrix free method faster? I saw in the documentation that you can define your own per-conditioner for instance. Let me know if you need additional information about the PETSc setup in the application I use. Best, Sophie -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Sep 1 13:12:07 2020 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 1 Sep 2020 13:12:07 -0500 Subject: [petsc-users] Matrix Free Method questions In-Reply-To: References: <5BDE8465-76BE-4132-BF4E-6784548AADC0@petsc.dev> <3329269A-EB37-41C9-9698-BA4631A1E18A@petsc.dev> <3E68F0AF-2F7D-4394-894A-3099EC80B9BC@petsc.dev> Message-ID: Sophie, Sorry, looks like an old bug in PETSc that was undetected due to lack of use. The code is trying to use the first of the two matrices to determine the preconditioner which won't work in your case since it is matrix-free. It should be using the second matrix. I hope the branch barry/2020-09-01/fix-fieldsplit-mf resolves this issue for you. Barry > On Sep 1, 2020, at 12:45 PM, Blondel, Sophie wrote: > > Hi Barry, > > I'm working through step 1) but I think I am doing something wrong. I'm using DMDASetBlockFillsSparse to set the non-zeros only for the diffusing clusters (small He clusters here, from size 1 to 7) and all the diagonal entries. Then I added a few lines in the code: > Mat mat; > DMCreateMatrix(da, &mat); > MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); > > When I try to run with the following options: -snes_mf_operator -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 2 -snes_force_iteration -pc_fieldsplit_detect_coupling -pc_type fieldsplit -fieldsplit_0_pc_type jacobi -fieldsplit_1_pc_type redundant -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -ts_max_snes_failures -1 -ts_monitor -ts_max_steps 20 > > I get an error: > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: No support for this operation for this object type > [0]PETSC ERROR: Matrix type mffd does not have a find off block diagonal entries defined > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [0]PETSC ERROR: Petsc Development GIT revision: v3.13.4-851-gde18fec8da GIT Date: 2020-08-28 16:47:50 +0000 > [0]PETSC ERROR: Unknown Name on a 20200828 named sophie-Precision-5530 by sophie Tue Sep 1 10:58:44 2020 > [0]PETSC ERROR: Configure options PETSC_DIR=/home/sophie/Code/petsc PETSC_ARCH=20200828 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no --with-shared-libraries > [0]PETSC ERROR: #1 MatFindOffBlockDiagonalEntries() line 9847 in /home/sophie/Code/petsc/src/mat/interface/matrix.c > [0]PETSC ERROR: #2 PCFieldSplitSetDefaults() line 504 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c > [0]PETSC ERROR: #3 PCSetUp_FieldSplit() line 606 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c > [0]PETSC ERROR: #4 PCSetUp() line 1009 in /home/sophie/Code/petsc/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: #5 KSPSetUp() line 406 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #6 KSPSolve_Private() line 658 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #7 KSPSolve() line 889 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in /home/sophie/Code/petsc/src/snes/impls/ls/ls.c > [0]PETSC ERROR: #9 SNESSolve() line 4524 in /home/sophie/Code/petsc/src/snes/interface/snes.c > [0]PETSC ERROR: #10 TSStep_ARKIMEX() line 811 in /home/sophie/Code/petsc/src/ts/impls/arkimex/arkimex.c > [0]PETSC ERROR: #11 TSStep() line 3731 in /home/sophie/Code/petsc/src/ts/interface/ts.c > [0]PETSC ERROR: #12 TSSolve() line 4128 in /home/sophie/Code/petsc/src/ts/interface/ts.c > PetscSolver::solve: TSSolve failed. > > Cheers, > > Sophie > De : Barry Smith > > Envoy? : lundi 31 ao?t 2020 14:50 > ? : Blondel, Sophie > > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > > Objet : Re: [petsc-users] Matrix Free Method questions > > > Sophie, > > Thanks. > > The factor of 4 is lot, the 1.5 not so bad. > > You will definitely want to retain the full matrix assembly codes for speed and to verify a reduced matrix version. > > It is worth trying a "reduced matrix version" with matrix-free multiply based on these numbers. This reduced matrix Jacobian will only have the diagonals and all the terms connected to the cluster sizes that move. In other words you will be building just the part of the Jacobian needed for the new preconditioner (PC subtype for Jacobi) and doing the matrix-vector product matrix free. (SOR requires all the Jacobian entries). > > Fortunately this is hopefully pretty straightforward for this code. You will not have to change the structure of the main code at all. > > Step 1) create a new "sparse matrix" that will be passed to DMDASetBlockFillsSparse(). This new "sparse matrix" needs to retain all the diagonal entries and also all the entries that are associated with the variables that diffuse. If I remember correctly these are just the smallest cluster size, plain Helium? > > Call MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); > > Then you would run the code with -snes_mf_operator and the new PC subtype for Jacobi. > > A test that the new reduced Jacobian is correct will be that you get almost the same iterations as the runs you just make using the PC subtype of Jacobi. Hopefully not slower and using a great deal less memory. The iterations will not be identical because of the matrix-free multiple. > > Step 2) create a new version of the Jacobian computation routine. 
This routine should only compute the elements of the Jacobian needed for this reduced matrix Jacobian, so the diagonals and the diffusion/convection terms. > > Again run with with -snes_mf_operator and the new PC subtype for Jacobi and you should again get the same convergence history. > > I made two steps because it makes it easier to validate and debug to get the same results as before. The first step cheats in that it still computes the full Jacobian but ignores the entries that we don't need to store for the preconditioner. The second step is more efficient because it only computes the Jacobian entries needed for the preconditioner but it requires you going through the Jacobian code and making sure only the needed parts are computed. > > > If you have any questions please let me know. > > Barry > > > > >> On Aug 31, 2020, at 1:13 PM, Blondel, Sophie > wrote: >> >> Hi Barry, >> >> I ran the 2 cases to look at the effect of the Jacobi pre-conditionner: >> 1D with 200 grid points and 7759 DOF per grid point (for the PSI application), for 20 TS: the factor between SOR and Jacobi is ~4 (976 MatMult for SOR and 4162 MatMult for Jacobi) >> 2D with 63x63 grid points and 4124 DOF per grid point (for the NE application), for 20 TS: the factor is 1.5 (6657 for SOR, 10379 for Jacobi) >> Cheers, >> >> Sophie >> De : Barry Smith > >> Envoy? : vendredi 28 ao?t 2020 18:31 >> ? : Blondel, Sophie > >> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >> Objet : Re: [petsc-users] Matrix Free Method questions >> >> >> >>> On Aug 28, 2020, at 4:11 PM, Blondel, Sophie > wrote: >>> >>> Thank you Jed and Barry, >>> >>> First, attached are the logs from the benchmark runs I did without (log_std.txt) and with MF method (log_mf.txt). It took me some trouble to get the -log_view to work because I'm using push and pop for the options which means that PETSc is initialized with no argument so the command line argument was not taken into account, but I guess this is for a separate discussion. >>> >>> To answer questions about the current per-conditioners: >>> I used the same pre-conditioner options as listed in my previous email when I added the -snes_mf option; I did try to remove all the PC related options at one point with the MF method but didn't see a change in runtime so I put them back in >>> this benchmark is for a 1D DMDA using 20 grid points; when running in 2D or 3D I switch the PC options to: -pc_type fieldsplit -fieldsplit_0_pc_type sor -fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type gmres -ksp_type fgmres -fieldsplit_1_pc_gamg_threshold -1 >>> I haven't tried a Jacobi PC instead of SOR, I will run a set of more realistic runs (1D and 2D) without MF but with Jacobi and report on it next week. When you say "iterations" do you mean what is given by -ksp_monitor? >> >> Yes, the number of MatMult is a good enough surrogate. >> >> So using matrix-free (which means no preconditioning) has >> >> 35846/160 >> >> ans = >> >> 224.0375 >> >> or 224 as many iterations. So even for this modest 1d problem preconditioning is doing a great deal. >> >> Barry >> >> >> >>> >>> Cheers, >>> >>> Sophie >>> De : Barry Smith > >>> Envoy? : vendredi 28 ao?t 2020 12:12 >>> ? : Blondel, Sophie > >>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>> Objet : Re: [petsc-users] Matrix Free Method questions >>> >>> [External Email] >>> >>> Sophie, >>> >>> This is exactly what i would expect. 
If you run with -ksp_monitor you will see the -snes_mf run takes many more iterations. >>> >>> I am puzzled that the argument -pc_type fieldsplit did not stop the run since this is under normal circumstances not a viable preconditioner with -snes_mf. Did you also remove the -pc_type fieldsplit argument? >>> >>> In order to see how one can avoid forming the entire matrix and use matrix-free to do the matrix-vector but still have an effective preconditioner let's look at what the current preconditioner options do. >>> >>>> -pc_fieldsplit_detect_coupling >>> >>> creates two sub-preconditioners, the first for all the variables and the second for those that are coupled by the matrix to variables in neighboring cells Since only the smallest cluster sizes have diffusion/advection this second set contains only the cluster size one variables. >>> >>>> -fieldsplit_0_pc_type sor >>> >>> Runs SOR on all the variables; you can think of this as running SOR on the reactions, it is a pretty good preconditioner for the reactions since the reactions are local, per cell. >>> >>>> -fieldsplit_1_pc_type redundant >>> >>> >>> This runs the default preconditioner (ILU) on just the variables that diffuse, i.e. the elliptic part. For smallish problems this is fine, for larger problems and 2d and 3d presumably you have also -redundant_pc_type gamg to use algebraic multigrid for the diffusion. This part of the matrix will always need to be formed and used in the preconditioner. It is very important since the diffusion is what brings in most of the ill-conditioning for larger problems into the linear system. Note that it only needs the matrix entries for the cluster size of 1 so it is very small compared to the entire sparse matrix. >>> >>> ---- >>> The first preconditioner SOR requires ALL the matrix entries which are almost all (except for the diffusion terms) the coupling between different size clusters within a cell. Especially each cell has its own sparse matrix of the size of total number of clusters, it is sparse but not super sparse. >>> >>> So the to significantly lower memory usage we need to remove the SOR and the storing of all the matrix entries but still have an efficient preconditioner for the "reaction" terms. >>> >>> The simplest thing would be to use Jacobi instead of SOR for the first subpreconditioner since it only requires the diagonal entries in the matrix. But Jacobi is a worse preconditioner than SOR (since it totally ignores the matrix coupling) and sometimes can be much worse. >>> >>> Before anyone writes additional code we need to know if doing something along these lines does not ruin the convergence that. >>> >>> Have you used the same options as before but with -fieldsplit_0_pc_type jacobi ? (Not using any matrix free). We need to get an idea of how many more linear iterations it requires (not time, comparing time won't be helpful for this exercise.) We also need this information for realistic size problems in 2 or 3 dimensions that you really want to run; for small problems this approach will work ok and give misleading information about what happens for large problems. >>> >>> I suspect the iteration counts will shot up. Can you run some cases and see how the iteration counts change? >>> >>> Based on that we can decide if we still retain "good convergence" by changing the SOR to Jacobi and then change the code to make this change efficient (basically by skipping the explicit computation of the reaction Jacobian terms and using matrix-free on the outside of the PCFIELDSPLIT.) 
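(Restating the suggested experiment, as a convenience rather than an exact recipe: it amounts to rerunning with the original option list quoted further below, unchanged except that -fieldsplit_0_pc_type sor becomes -fieldsplit_0_pc_type jacobi and no -snes_mf is added, e.g.

    -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 5 -snes_force_iteration
    -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY
    -ts_exact_final_time stepover -ts_max_snes_failures -1 -ts_monitor -ts_max_steps 100
    -pc_type fieldsplit -pc_fieldsplit_detect_coupling
    -fieldsplit_0_pc_type jacobi -fieldsplit_1_pc_type redundant

and then comparing linear iteration counts, for example the MatMult counts reported by -log_view, against the corresponding SOR runs.)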
>>> >>> Barry >>> >>> >>> >>> >>> >>> >>> >>> >>> >>>> On Aug 28, 2020, at 9:49 AM, Blondel, Sophie via petsc-users > wrote: >>>> >>>> Hi everyone, >>>> >>>> I have been using PETSc for a few years with a fully implicit TS ARKIMEX method and am now exploring the matrix free method option. Here is the list of PETSc options I typically use: -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 5 -snes_force_iteration -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -fieldsplit_0_pc_type sor -ts_max_snes_failures -1 -pc_fieldsplit_detect_coupling -ts_monitor -pc_type fieldsplit -fieldsplit_1_pc_type redundant -ts_max_steps 100 >>>> >>>> I started to compare the performance of the code without changing anything of the executable and simply adding "-snes_mf", I see a reduction of memory usage as expected and a benchmark that would usually take ~5min to run now takes ~50min. Reading the documentation I saw that there are a few option to play with the matrix free method like -snes_mf_err, -snes_mf_umin, or switching to -snes_mf_type wp. I used and modified the values of each of these options separately but never saw a sizable change in runtime, is it expected? >>>> >>>> And are there other ways to make the matrix free method faster? I saw in the documentation that you can define your own per-conditioner for instance. Let me know if you need additional information about the PETSc setup in the application I use. >>>> >>>> Best, >>>> >>>> Sophie >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Sep 1 14:13:10 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 1 Sep 2020 15:13:10 -0400 Subject: [petsc-users] DMAdaptLabel with triangle mesh In-Reply-To: References: Message-ID: On Tue, Sep 1, 2020 at 2:24 AM Thibault Bridel-Bertomeu < thibault.bridelbertomeu at gmail.com> wrote: > Hi everyone, hi Matt, > > Le lun. 31 ao?t 2020 ? 22:03, Matthew Knepley a > ?crit : > >> On Mon, Aug 31, 2020 at 4:00 PM Thibault Bridel-Bertomeu < >> thibault.bridelbertomeu at gmail.com> wrote: >> >>> >>> >>> Le lun. 31 ao?t 2020 ? 20:35, Matthew Knepley a >>> ?crit : >>> >>>> On Mon, Aug 31, 2020 at 9:45 AM Thibault Bridel-Bertomeu < >>>> thibault.bridelbertomeu at gmail.com> wrote: >>>> >>>>> Hi Matt, >>>>> >>>>> OK so I tried to replicate the problem starting from one of the tests >>>>> in PETSc repo. >>>>> I found >>>>> https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/tests/ex20.c that >>>>> actually uses DMAdaptLabel. >>>>> Just add >>>>> >>>>> { >>>>> >>>>> DM gdm; >>>>> >>>>> >>>>> >>>>> DMPlexConstructGhostCells (dm, NULL, NULL, &gdm); >>>>> >>>>> DMDestroy (&dm); >>>>> >>>>> dm = gdm; >>>>> >>>>> } >>>>> >>>>> after line 24 where the box mesh is generated. Then compile and run with ex20 -dim 2. >>>>> >>>>> It should tell you that Triangle 18 has an invalid vertex index. >>>>> >>>>> That's the minimal example that I found that replicates the problem. >>>>> >>>>> Ah, okay. p4est knows to discard the ghost cells. I can add that to >>>> Triangle. >>>> >>> >>> I thought it was something like that, seeing what addition of code >>> triggers the problem. >>> Thanks for adding the treatment to Triangle ! >>> >>>> Regarding the serial character of the technique, I tried with a distributed mesh and it works. >>>>> >>>>> Hmm, it can't work. Maybe it appears to work. Triangle knows nothing >>>> about parallelism. 
So this must be feeding the local mesh to triangle and >>>> replacing it by >>>> a refined mesh, but the parallel boundaries will not be correct, and >>>> might not even match up. >>>> >>> >>> Ok, yea, it appears to work. When asked to refine from scratch, not from >>> AdaptLabel but with a -dm_refine order, the mesh is funky as if it was >>> entirely re-made and the previous mesh thrown away. >>> Can you think of a way where each processor would be able to call on >>> Triangle on it?s own, with its own piece of mesh and maybe the surrounding >>> ghost cells ? I imagine it could work for parallel refining of triangular >>> meshes, couldn?t it ? >>> >> >> It turns out that his is a very hairy problem. That is why almost no >> parallel refinement packages exist. To my knowledge, this is only one: >> Pragmatic. We support that package, but >> it is in development, and we really need to update our interface. I am >> working on it, but too much stuff gets in the way. >> > > Oh really, only the one ? Okay okay, I guess I was too optimistic ! I'll > look into Pragmatic even though it is in dev, maybe it'll be enough for > what I wanna do for now. > By the way, talking about things that get in the way, should I open an > issue on the gitlab regarding Triangle not ignoring the ghost cells, would > that be easier for you guys ? > I think it s is fixed: https://gitlab.com/petsc/petsc/-/merge_requests/3123 Thanks Matt > Thanks & have a great day, > > Thibault > > > >> Thanks, >> >> Matt >> >> >>> Thanks for your replies, >>> Have a great afternoon/evening ! >>> >>> Thibault >>> >>> >>>> Thanks, >>>> >>>> Matt >>>> >>>>> So do you mean that intrinsically it gathers all the cells on the master proc before proceeding to the coarsening & refinement and only then broadcast the info back to the other processors ? >>>>> >>>>> Thanks, >>>>> >>>>> Thibault >>>>> >>>>> Le lun. 31 ao?t 2020 ? 12:55, Matthew Knepley a >>>>> ?crit : >>>>> >>>>>> On Mon, Aug 31, 2020 at 5:34 AM Thibault Bridel-Bertomeu < >>>>>> thibault.bridelbertomeu at gmail.com> wrote: >>>>>> >>>>>>> Dear all, >>>>>>> >>>>>>> I have recently been playing around with the AMR capabilities >>>>>>> embedded in PETSc for quad meshes using p4est. Based on the TS tutorial >>>>>>> ex11, I was able to incorporate the AMR into a pre-existing code with >>>>>>> different metrics for the adaptation process. >>>>>>> Now I would like to do something similar using tri meshes. I read >>>>>>> that compiling PETSc with Triangle (in 2D and Tetgen for 3D) gives access >>>>>>> to refinement and coarsening capabilities on triangular meshes.When I try >>>>>>> to execute the code with a triangular mesh (that i manipulate as a DMPLEX), >>>>>>> it yields "Triangle 1700 has an invalid vertex index" when trying to adapt >>>>>>> the mesh (the initial mesh indeed has 1700 cells). From what i could tell, >>>>>>> it comes from the reconstruct method called by the triangulate method of >>>>>>> triangle.c, the latter being called by either >>>>>>> *DMPlexGenerate_Triangle *or *DMPlexRefine_Triangle *in PETSc, I >>>>>>> cannot be sure. >>>>>>> >>>>>>> In substance, the code is the same as in ex11.c and the crash occurs >>>>>>> in the first adaptation pass, i.e. an equivalent in ex11 is that it crashes >>>>>>> after the SetInitialCondition in the first if (useAMR) located line 1835 >>>>>>> when it calls adaptToleranceFVM (which I copied basically so the code is >>>>>>> the same). 
>>>>>>> >>>>>>> Is the automatic mesh refinement feature on tri meshes supposed to >>>>>>> work or am I trying something that has not been completed yet ? >>>>>>> >>>>>> >>>>>> It is supposed to work, and does for some tests in the library. I >>>>>> stopped using it because it is inherently serial and it is isotropic. >>>>>> However, it should be fixed. >>>>>> Is there something I can run to help me track down the problem? >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> Thank you very much for your help, as always. >>>>>>> >>>>>>> Thibault Bridel-Bertomeu >>>>>>> ? >>>>>>> Eng, MSc, PhD >>>>>>> Research Engineer >>>>>>> CEA/CESTA >>>>>>> 33114 LE BARP >>>>>>> Tel.: (+33)557046924 >>>>>>> Mob.: (+33)611025322 >>>>>>> Mail: thibault.bridelbertomeu at gmail.com >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> >>>> -- >>> Thibault Bridel-Bertomeu >>> ? >>> Eng, MSc, PhD >>> Research Engineer >>> CEA/CESTA >>> 33114 LE BARP >>> Tel.: (+33)557046924 >>> Mob.: (+33)611025322 >>> Mail: thibault.bridelbertomeu at gmail.com >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From thibault.bridelbertomeu at gmail.com Tue Sep 1 14:17:58 2020 From: thibault.bridelbertomeu at gmail.com (Thibault Bridel-Bertomeu) Date: Tue, 1 Sep 2020 21:17:58 +0200 Subject: [petsc-users] DMAdaptLabel with triangle mesh In-Reply-To: References: Message-ID: Le mar. 1 sept. 2020 ? 21:13, Matthew Knepley a ?crit : > On Tue, Sep 1, 2020 at 2:24 AM Thibault Bridel-Bertomeu < > thibault.bridelbertomeu at gmail.com> wrote: > >> Hi everyone, hi Matt, >> >> Le lun. 31 ao?t 2020 ? 22:03, Matthew Knepley a >> ?crit : >> >>> On Mon, Aug 31, 2020 at 4:00 PM Thibault Bridel-Bertomeu < >>> thibault.bridelbertomeu at gmail.com> wrote: >>> >>>> >>>> >>>> Le lun. 31 ao?t 2020 ? 20:35, Matthew Knepley a >>>> ?crit : >>>> >>>>> On Mon, Aug 31, 2020 at 9:45 AM Thibault Bridel-Bertomeu < >>>>> thibault.bridelbertomeu at gmail.com> wrote: >>>>> >>>>>> Hi Matt, >>>>>> >>>>>> OK so I tried to replicate the problem starting from one of the tests >>>>>> in PETSc repo. >>>>>> I found >>>>>> https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/tests/ex20.c that >>>>>> actually uses DMAdaptLabel. >>>>>> Just add >>>>>> >>>>>> { >>>>>> >>>>>> DM gdm; >>>>>> >>>>>> >>>>>> >>>>>> DMPlexConstructGhostCells (dm, NULL, NULL, &gdm); >>>>>> >>>>>> DMDestroy (&dm); >>>>>> >>>>>> dm = gdm; >>>>>> >>>>>> } >>>>>> >>>>>> after line 24 where the box mesh is generated. 
Then compile and run with ex20 -dim 2. >>>>>> >>>>>> It should tell you that Triangle 18 has an invalid vertex index. >>>>>> >>>>>> That's the minimal example that I found that replicates the problem. >>>>>> >>>>>> Ah, okay. p4est knows to discard the ghost cells. I can add that to >>>>> Triangle. >>>>> >>>> >>>> I thought it was something like that, seeing what addition of code >>>> triggers the problem. >>>> Thanks for adding the treatment to Triangle ! >>>> >>>>> Regarding the serial character of the technique, I tried with a distributed mesh and it works. >>>>>> >>>>>> Hmm, it can't work. Maybe it appears to work. Triangle knows nothing >>>>> about parallelism. So this must be feeding the local mesh to triangle and >>>>> replacing it by >>>>> a refined mesh, but the parallel boundaries will not be correct, and >>>>> might not even match up. >>>>> >>>> >>>> Ok, yea, it appears to work. When asked to refine from scratch, not >>>> from AdaptLabel but with a -dm_refine order, the mesh is funky as if it was >>>> entirely re-made and the previous mesh thrown away. >>>> Can you think of a way where each processor would be able to call on >>>> Triangle on it?s own, with its own piece of mesh and maybe the surrounding >>>> ghost cells ? I imagine it could work for parallel refining of triangular >>>> meshes, couldn?t it ? >>>> >>> >>> It turns out that his is a very hairy problem. That is why almost no >>> parallel refinement packages exist. To my knowledge, this is only one: >>> Pragmatic. We support that package, but >>> it is in development, and we really need to update our interface. I am >>> working on it, but too much stuff gets in the way. >>> >> >> Oh really, only the one ? Okay okay, I guess I was too optimistic ! I'll >> look into Pragmatic even though it is in dev, maybe it'll be enough for >> what I wanna do for now. >> By the way, talking about things that get in the way, should I open an >> issue on the gitlab regarding Triangle not ignoring the ghost cells, would >> that be easier for you guys ? >> > > I think it s is fixed: > https://gitlab.com/petsc/petsc/-/merge_requests/3123 > Thanks Matthew, that was very fast ! I?ll try it tomorrow. Thibault > > > Matt > > >> Thanks & have a great day, >> >> Thibault >> >> >> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thanks for your replies, >>>> Have a great afternoon/evening ! >>>> >>>> Thibault >>>> >>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>>> So do you mean that intrinsically it gathers all the cells on the master proc before proceeding to the coarsening & refinement and only then broadcast the info back to the other processors ? >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Thibault >>>>>> >>>>>> Le lun. 31 ao?t 2020 ? 12:55, Matthew Knepley a >>>>>> ?crit : >>>>>> >>>>>>> On Mon, Aug 31, 2020 at 5:34 AM Thibault Bridel-Bertomeu < >>>>>>> thibault.bridelbertomeu at gmail.com> wrote: >>>>>>> >>>>>>>> Dear all, >>>>>>>> >>>>>>>> I have recently been playing around with the AMR capabilities >>>>>>>> embedded in PETSc for quad meshes using p4est. Based on the TS tutorial >>>>>>>> ex11, I was able to incorporate the AMR into a pre-existing code with >>>>>>>> different metrics for the adaptation process. >>>>>>>> Now I would like to do something similar using tri meshes. 
I read >>>>>>>> that compiling PETSc with Triangle (in 2D and Tetgen for 3D) gives access >>>>>>>> to refinement and coarsening capabilities on triangular meshes.When I try >>>>>>>> to execute the code with a triangular mesh (that i manipulate as a DMPLEX), >>>>>>>> it yields "Triangle 1700 has an invalid vertex index" when trying to adapt >>>>>>>> the mesh (the initial mesh indeed has 1700 cells). From what i could tell, >>>>>>>> it comes from the reconstruct method called by the triangulate method of >>>>>>>> triangle.c, the latter being called by either >>>>>>>> *DMPlexGenerate_Triangle *or *DMPlexRefine_Triangle *in PETSc, I >>>>>>>> cannot be sure. >>>>>>>> >>>>>>>> In substance, the code is the same as in ex11.c and the crash >>>>>>>> occurs in the first adaptation pass, i.e. an equivalent in ex11 is that it >>>>>>>> crashes after the SetInitialCondition in the first if (useAMR) located line >>>>>>>> 1835 when it calls adaptToleranceFVM (which I copied basically so the code >>>>>>>> is the same). >>>>>>>> >>>>>>>> Is the automatic mesh refinement feature on tri meshes supposed to >>>>>>>> work or am I trying something that has not been completed yet ? >>>>>>>> >>>>>>> >>>>>>> It is supposed to work, and does for some tests in the library. I >>>>>>> stopped using it because it is inherently serial and it is isotropic. >>>>>>> However, it should be fixed. >>>>>>> Is there something I can run to help me track down the problem? >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Matt >>>>>>> >>>>>>> >>>>>>>> Thank you very much for your help, as always. >>>>>>>> >>>>>>>> Thibault Bridel-Bertomeu >>>>>>>> ? >>>>>>>> Eng, MSc, PhD >>>>>>>> Research Engineer >>>>>>>> CEA/CESTA >>>>>>>> 33114 LE BARP >>>>>>>> Tel.: (+33)557046924 >>>>>>>> Mob.: (+33)611025322 >>>>>>>> Mail: thibault.bridelbertomeu at gmail.com >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their >>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>> experiments lead. >>>>>>> -- Norbert Wiener >>>>>>> >>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>>> >>>>> -- >>>> Thibault Bridel-Bertomeu >>>> ? >>>> Eng, MSc, PhD >>>> Research Engineer >>>> CEA/CESTA >>>> 33114 LE BARP >>>> Tel.: (+33)557046924 >>>> Mob.: (+33)611025322 >>>> Mail: thibault.bridelbertomeu at gmail.com >>>> >>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> >>> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- Thibault Bridel-Bertomeu ? Eng, MSc, PhD Research Engineer CEA/CESTA 33114 LE BARP Tel.: (+33)557046924 Mob.: (+33)611025322 Mail: thibault.bridelbertomeu at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Tue Sep 1 14:24:13 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 1 Sep 2020 15:24:13 -0400 Subject: [petsc-users] DMAdaptLabel with triangle mesh In-Reply-To: References: Message-ID: On Tue, Sep 1, 2020 at 7:15 AM Thibault Bridel-Bertomeu < thibault.bridelbertomeu at gmail.com> wrote: > Hi Matthew, > > So I turned the pages of the User guide real quick and the only reference > I found to Pragmatic was in DMAdaptMetric. It says it is based on a > vertex-based metric, but I could not find any more information regarding > the characteristics of the expected metric ... Would you by any chance have > more documentation or maybe a test/tutorial/example that builds such a > metric and calls DMAdaptMetric ? > There are tests: https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/tests/ex19.c but not much explanation. The input is actually the metric tensor at the vertices. You can see me making representative things in the example. Thanks, Matt > Thanks, > > Thibault > > Le mar. 1 sept. 2020 ? 08:24, Thibault Bridel-Bertomeu < > thibault.bridelbertomeu at gmail.com> a ?crit : > >> Hi everyone, hi Matt, >> >> Le lun. 31 ao?t 2020 ? 22:03, Matthew Knepley a >> ?crit : >> >>> On Mon, Aug 31, 2020 at 4:00 PM Thibault Bridel-Bertomeu < >>> thibault.bridelbertomeu at gmail.com> wrote: >>> >>>> >>>> >>>> Le lun. 31 ao?t 2020 ? 20:35, Matthew Knepley a >>>> ?crit : >>>> >>>>> On Mon, Aug 31, 2020 at 9:45 AM Thibault Bridel-Bertomeu < >>>>> thibault.bridelbertomeu at gmail.com> wrote: >>>>> >>>>>> Hi Matt, >>>>>> >>>>>> OK so I tried to replicate the problem starting from one of the tests >>>>>> in PETSc repo. >>>>>> I found >>>>>> https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/tests/ex20.c that >>>>>> actually uses DMAdaptLabel. >>>>>> Just add >>>>>> >>>>>> { >>>>>> >>>>>> DM gdm; >>>>>> >>>>>> >>>>>> >>>>>> DMPlexConstructGhostCells (dm, NULL, NULL, &gdm); >>>>>> >>>>>> DMDestroy (&dm); >>>>>> >>>>>> dm = gdm; >>>>>> >>>>>> } >>>>>> >>>>>> after line 24 where the box mesh is generated. Then compile and run with ex20 -dim 2. >>>>>> >>>>>> It should tell you that Triangle 18 has an invalid vertex index. >>>>>> >>>>>> That's the minimal example that I found that replicates the problem. >>>>>> >>>>>> Ah, okay. p4est knows to discard the ghost cells. I can add that to >>>>> Triangle. >>>>> >>>> >>>> I thought it was something like that, seeing what addition of code >>>> triggers the problem. >>>> Thanks for adding the treatment to Triangle ! >>>> >>>>> Regarding the serial character of the technique, I tried with a distributed mesh and it works. >>>>>> >>>>>> Hmm, it can't work. Maybe it appears to work. Triangle knows nothing >>>>> about parallelism. So this must be feeding the local mesh to triangle and >>>>> replacing it by >>>>> a refined mesh, but the parallel boundaries will not be correct, and >>>>> might not even match up. >>>>> >>>> >>>> Ok, yea, it appears to work. When asked to refine from scratch, not >>>> from AdaptLabel but with a -dm_refine order, the mesh is funky as if it was >>>> entirely re-made and the previous mesh thrown away. >>>> Can you think of a way where each processor would be able to call on >>>> Triangle on it?s own, with its own piece of mesh and maybe the surrounding >>>> ghost cells ? I imagine it could work for parallel refining of triangular >>>> meshes, couldn?t it ? >>>> >>> >>> It turns out that his is a very hairy problem. That is why almost no >>> parallel refinement packages exist. 
To my knowledge, this is only one: >>> Pragmatic. We support that package, but >>> it is in development, and we really need to update our interface. I am >>> working on it, but too much stuff gets in the way. >>> >> >> Oh really, only the one ? Okay okay, I guess I was too optimistic ! I'll >> look into Pragmatic even though it is in dev, maybe it'll be enough for >> what I wanna do for now. >> By the way, talking about things that get in the way, should I open an >> issue on the gitlab regarding Triangle not ignoring the ghost cells, would >> that be easier for you guys ? >> >> Thanks & have a great day, >> >> Thibault >> >> >> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thanks for your replies, >>>> Have a great afternoon/evening ! >>>> >>>> Thibault >>>> >>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>>> So do you mean that intrinsically it gathers all the cells on the master proc before proceeding to the coarsening & refinement and only then broadcast the info back to the other processors ? >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Thibault >>>>>> >>>>>> Le lun. 31 ao?t 2020 ? 12:55, Matthew Knepley a >>>>>> ?crit : >>>>>> >>>>>>> On Mon, Aug 31, 2020 at 5:34 AM Thibault Bridel-Bertomeu < >>>>>>> thibault.bridelbertomeu at gmail.com> wrote: >>>>>>> >>>>>>>> Dear all, >>>>>>>> >>>>>>>> I have recently been playing around with the AMR capabilities >>>>>>>> embedded in PETSc for quad meshes using p4est. Based on the TS tutorial >>>>>>>> ex11, I was able to incorporate the AMR into a pre-existing code with >>>>>>>> different metrics for the adaptation process. >>>>>>>> Now I would like to do something similar using tri meshes. I read >>>>>>>> that compiling PETSc with Triangle (in 2D and Tetgen for 3D) gives access >>>>>>>> to refinement and coarsening capabilities on triangular meshes.When I try >>>>>>>> to execute the code with a triangular mesh (that i manipulate as a DMPLEX), >>>>>>>> it yields "Triangle 1700 has an invalid vertex index" when trying to adapt >>>>>>>> the mesh (the initial mesh indeed has 1700 cells). From what i could tell, >>>>>>>> it comes from the reconstruct method called by the triangulate method of >>>>>>>> triangle.c, the latter being called by either >>>>>>>> *DMPlexGenerate_Triangle *or *DMPlexRefine_Triangle *in PETSc, I >>>>>>>> cannot be sure. >>>>>>>> >>>>>>>> In substance, the code is the same as in ex11.c and the crash >>>>>>>> occurs in the first adaptation pass, i.e. an equivalent in ex11 is that it >>>>>>>> crashes after the SetInitialCondition in the first if (useAMR) located line >>>>>>>> 1835 when it calls adaptToleranceFVM (which I copied basically so the code >>>>>>>> is the same). >>>>>>>> >>>>>>>> Is the automatic mesh refinement feature on tri meshes supposed to >>>>>>>> work or am I trying something that has not been completed yet ? >>>>>>>> >>>>>>> >>>>>>> It is supposed to work, and does for some tests in the library. I >>>>>>> stopped using it because it is inherently serial and it is isotropic. >>>>>>> However, it should be fixed. >>>>>>> Is there something I can run to help me track down the problem? >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Matt >>>>>>> >>>>>>> >>>>>>>> Thank you very much for your help, as always. >>>>>>>> >>>>>>>> Thibault Bridel-Bertomeu >>>>>>>> ? 
>>>>>>>> Eng, MSc, PhD >>>>>>>> Research Engineer >>>>>>>> CEA/CESTA >>>>>>>> 33114 LE BARP >>>>>>>> Tel.: (+33)557046924 >>>>>>>> Mob.: (+33)611025322 >>>>>>>> Mail: thibault.bridelbertomeu at gmail.com >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their >>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>> experiments lead. >>>>>>> -- Norbert Wiener >>>>>>> >>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>>> >>>>> -- >>>> Thibault Bridel-Bertomeu >>>> ? >>>> Eng, MSc, PhD >>>> Research Engineer >>>> CEA/CESTA >>>> 33114 LE BARP >>>> Tel.: (+33)557046924 >>>> Mob.: (+33)611025322 >>>> Mail: thibault.bridelbertomeu at gmail.com >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed Sep 2 09:38:08 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 2 Sep 2020 09:38:08 -0500 (CDT) Subject: [petsc-users] petsc-3.13.5.tar.gz now available Message-ID: Dear PETSc users, The patch release petsc-3.13.5 is now available for download, with change list at 'PETSc-3.13 Changelog' http://www.mcs.anl.gov/petsc/download/index.html Satish From thibault.bridelbertomeu at gmail.com Wed Sep 2 10:52:09 2020 From: thibault.bridelbertomeu at gmail.com (Thibault Bridel-Bertomeu) Date: Wed, 2 Sep 2020 17:52:09 +0200 Subject: [petsc-users] DMAdaptLabel with triangle mesh In-Reply-To: References: Message-ID: Thank you Matthew for the link. I could get some doc from some articles talking about pragmatic so I guess I'll figure it out eventually. However, not all tests for ex19 are working, namely those tests involving the -do_L2 argument : it appears, when it reaches the first DMProjectFunction in the testProjectionL2 function (line 201) that it keeps failing in DMProjectLocal_Generic_Plex with the error : The section point (0) closure size 0 != dual space dimension 2 where the "dimension 2" changes to "dimension 3" if the case is 3D but nothing else. Whatever the FEM quadrature it fails at this point. Now I am just asking out of curiosity because I saw the TODO : broken mentions so I guess it is not actually supported on the master at this moment. Thank you for your help anyways ! Thibault Le mar. 1 sept. 2020 ? 21:24, Matthew Knepley a ?crit : > On Tue, Sep 1, 2020 at 7:15 AM Thibault Bridel-Bertomeu < > thibault.bridelbertomeu at gmail.com> wrote: > >> Hi Matthew, >> >> So I turned the pages of the User guide real quick and the only reference >> I found to Pragmatic was in DMAdaptMetric. It says it is based on a >> vertex-based metric, but I could not find any more information regarding >> the characteristics of the expected metric ... 
Would you by any chance have >> more documentation or maybe a test/tutorial/example that builds such a >> metric and calls DMAdaptMetric ? >> > > There are tests: > https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/tests/ex19.c > but not much explanation. The input is actually the metric tensor at the > vertices. You can see > me making representative things in the example. > > Thanks, > > Matt > > >> Thanks, >> >> Thibault >> >> Le mar. 1 sept. 2020 ? 08:24, Thibault Bridel-Bertomeu < >> thibault.bridelbertomeu at gmail.com> a ?crit : >> >>> Hi everyone, hi Matt, >>> >>> Le lun. 31 ao?t 2020 ? 22:03, Matthew Knepley a >>> ?crit : >>> >>>> On Mon, Aug 31, 2020 at 4:00 PM Thibault Bridel-Bertomeu < >>>> thibault.bridelbertomeu at gmail.com> wrote: >>>> >>>>> >>>>> >>>>> Le lun. 31 ao?t 2020 ? 20:35, Matthew Knepley a >>>>> ?crit : >>>>> >>>>>> On Mon, Aug 31, 2020 at 9:45 AM Thibault Bridel-Bertomeu < >>>>>> thibault.bridelbertomeu at gmail.com> wrote: >>>>>> >>>>>>> Hi Matt, >>>>>>> >>>>>>> OK so I tried to replicate the problem starting from one of the >>>>>>> tests in PETSc repo. >>>>>>> I found >>>>>>> https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/tests/ex20.c that >>>>>>> actually uses DMAdaptLabel. >>>>>>> Just add >>>>>>> >>>>>>> { >>>>>>> >>>>>>> DM gdm; >>>>>>> >>>>>>> >>>>>>> >>>>>>> DMPlexConstructGhostCells (dm, NULL, NULL, &gdm); >>>>>>> >>>>>>> DMDestroy (&dm); >>>>>>> >>>>>>> dm = gdm; >>>>>>> >>>>>>> } >>>>>>> >>>>>>> after line 24 where the box mesh is generated. Then compile and run with ex20 -dim 2. >>>>>>> >>>>>>> It should tell you that Triangle 18 has an invalid vertex index. >>>>>>> >>>>>>> That's the minimal example that I found that replicates the problem. >>>>>>> >>>>>>> Ah, okay. p4est knows to discard the ghost cells. I can add that to >>>>>> Triangle. >>>>>> >>>>> >>>>> I thought it was something like that, seeing what addition of code >>>>> triggers the problem. >>>>> Thanks for adding the treatment to Triangle ! >>>>> >>>>>> Regarding the serial character of the technique, I tried with a distributed mesh and it works. >>>>>>> >>>>>>> Hmm, it can't work. Maybe it appears to work. Triangle knows nothing >>>>>> about parallelism. So this must be feeding the local mesh to triangle and >>>>>> replacing it by >>>>>> a refined mesh, but the parallel boundaries will not be correct, and >>>>>> might not even match up. >>>>>> >>>>> >>>>> Ok, yea, it appears to work. When asked to refine from scratch, not >>>>> from AdaptLabel but with a -dm_refine order, the mesh is funky as if it was >>>>> entirely re-made and the previous mesh thrown away. >>>>> Can you think of a way where each processor would be able to call on >>>>> Triangle on it?s own, with its own piece of mesh and maybe the surrounding >>>>> ghost cells ? I imagine it could work for parallel refining of triangular >>>>> meshes, couldn?t it ? >>>>> >>>> >>>> It turns out that his is a very hairy problem. That is why almost no >>>> parallel refinement packages exist. To my knowledge, this is only one: >>>> Pragmatic. We support that package, but >>>> it is in development, and we really need to update our interface. I am >>>> working on it, but too much stuff gets in the way. >>>> >>> >>> Oh really, only the one ? Okay okay, I guess I was too optimistic ! I'll >>> look into Pragmatic even though it is in dev, maybe it'll be enough for >>> what I wanna do for now. 
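For reference, following the pointer to ex19.c above: the metric that DMAdaptMetric expects is a Vec holding one dim x dim tensor per vertex. A minimal sketch, assuming an isotropic target edge length h, a vertex-major block layout patterned after ex19.c, and the DMAdaptMetric signature of this PETSc version (all of these are assumptions rather than guarantees from the discussion above), could look like:

    DM          dmAdapt;
    Vec         metric;
    PetscScalar *met;
    PetscInt    dim, vStart, vEnd, v, d;
    PetscReal   h = 0.05;                          /* assumed target edge length */

    /* error checking (ierr/CHKERRQ) omitted for brevity */
    DMGetDimension(dm, &dim);
    DMPlexGetDepthStratum(dm, 0, &vStart, &vEnd);  /* depth 0 = vertices */
    VecCreate(PetscObjectComm((PetscObject) dm), &metric);
    VecSetSizes(metric, (vEnd - vStart) * dim * dim, PETSC_DETERMINE);
    VecSetFromOptions(metric);
    VecSet(metric, 0.0);
    VecGetArray(metric, &met);
    for (v = 0; v < vEnd - vStart; ++v) {
      for (d = 0; d < dim; ++d) met[v*dim*dim + d*dim + d] = 1.0/(h*h);  /* isotropic: M = I/h^2 */
    }
    VecRestoreArray(metric, &met);
    DMAdaptMetric(dm, metric, NULL, &dmAdapt);     /* bdLabel NULL; signature may differ between PETSc versions */
    VecDestroy(&metric);

An anisotropic adaptation would put a full symmetric tensor in each dim x dim block instead of a scaled identity.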
>>> By the way, talking about things that get in the way, should I open an >>> issue on the gitlab regarding Triangle not ignoring the ghost cells, would >>> that be easier for you guys ? >>> >>> Thanks & have a great day, >>> >>> Thibault >>> >>> >>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Thanks for your replies, >>>>> Have a great afternoon/evening ! >>>>> >>>>> Thibault >>>>> >>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>>> So do you mean that intrinsically it gathers all the cells on the master proc before proceeding to the coarsening & refinement and only then broadcast the info back to the other processors ? >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Thibault >>>>>>> >>>>>>> Le lun. 31 ao?t 2020 ? 12:55, Matthew Knepley a >>>>>>> ?crit : >>>>>>> >>>>>>>> On Mon, Aug 31, 2020 at 5:34 AM Thibault Bridel-Bertomeu < >>>>>>>> thibault.bridelbertomeu at gmail.com> wrote: >>>>>>>> >>>>>>>>> Dear all, >>>>>>>>> >>>>>>>>> I have recently been playing around with the AMR capabilities >>>>>>>>> embedded in PETSc for quad meshes using p4est. Based on the TS tutorial >>>>>>>>> ex11, I was able to incorporate the AMR into a pre-existing code with >>>>>>>>> different metrics for the adaptation process. >>>>>>>>> Now I would like to do something similar using tri meshes. I read >>>>>>>>> that compiling PETSc with Triangle (in 2D and Tetgen for 3D) gives access >>>>>>>>> to refinement and coarsening capabilities on triangular meshes.When I try >>>>>>>>> to execute the code with a triangular mesh (that i manipulate as a DMPLEX), >>>>>>>>> it yields "Triangle 1700 has an invalid vertex index" when trying to adapt >>>>>>>>> the mesh (the initial mesh indeed has 1700 cells). From what i could tell, >>>>>>>>> it comes from the reconstruct method called by the triangulate method of >>>>>>>>> triangle.c, the latter being called by either >>>>>>>>> *DMPlexGenerate_Triangle *or *DMPlexRefine_Triangle *in PETSc, I >>>>>>>>> cannot be sure. >>>>>>>>> >>>>>>>>> In substance, the code is the same as in ex11.c and the crash >>>>>>>>> occurs in the first adaptation pass, i.e. an equivalent in ex11 is that it >>>>>>>>> crashes after the SetInitialCondition in the first if (useAMR) located line >>>>>>>>> 1835 when it calls adaptToleranceFVM (which I copied basically so the code >>>>>>>>> is the same). >>>>>>>>> >>>>>>>>> Is the automatic mesh refinement feature on tri meshes supposed to >>>>>>>>> work or am I trying something that has not been completed yet ? >>>>>>>>> >>>>>>>> >>>>>>>> It is supposed to work, and does for some tests in the library. I >>>>>>>> stopped using it because it is inherently serial and it is isotropic. >>>>>>>> However, it should be fixed. >>>>>>>> Is there something I can run to help me track down the problem? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Matt >>>>>>>> >>>>>>>> >>>>>>>>> Thank you very much for your help, as always. >>>>>>>>> >>>>>>>>> Thibault Bridel-Bertomeu >>>>>>>>> ? >>>>>>>>> Eng, MSc, PhD >>>>>>>>> Research Engineer >>>>>>>>> CEA/CESTA >>>>>>>>> 33114 LE BARP >>>>>>>>> Tel.: (+33)557046924 >>>>>>>>> Mob.: (+33)611025322 >>>>>>>>> Mail: thibault.bridelbertomeu at gmail.com >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> What most experimenters take for granted before they begin their >>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>> experiments lead. 
>>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>> Thibault Bridel-Bertomeu >>>>> ? >>>>> Eng, MSc, PhD >>>>> Research Engineer >>>>> CEA/CESTA >>>>> 33114 LE BARP >>>>> Tel.: (+33)557046924 >>>>> Mob.: (+33)611025322 >>>>> Mail: thibault.bridelbertomeu at gmail.com >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sblondel at utk.edu Wed Sep 2 13:44:06 2020 From: sblondel at utk.edu (Blondel, Sophie) Date: Wed, 2 Sep 2020 18:44:06 +0000 Subject: [petsc-users] Matrix Free Method questions In-Reply-To: References: <5BDE8465-76BE-4132-BF4E-6784548AADC0@petsc.dev> <3329269A-EB37-41C9-9698-BA4631A1E18A@petsc.dev> <3E68F0AF-2F7D-4394-894A-3099EC80B9BC@petsc.dev> , Message-ID: Thank you Barry, The code ran with your branch but it's much slower than running with the full Jacobian and Jacobi PC subtype (around 10 times slower). It is using less memory as expected. I tried step 2 as well and it's even slower. The TS iteration for step 1 are the same as with full Jacobian. Let me know what I can look at to check if I've done something wrong. Cheers, Sophie ________________________________ De : Barry Smith Envoy? : mardi 1 septembre 2020 14:12 ? : Blondel, Sophie Cc : petsc-users at mcs.anl.gov ; xolotl-psi-development at lists.sourceforge.net Objet : Re: [petsc-users] Matrix Free Method questions Sophie, Sorry, looks like an old bug in PETSc that was undetected due to lack of use. The code is trying to use the first of the two matrices to determine the preconditioner which won't work in your case since it is matrix-free. It should be using the second matrix. I hope the branch barry/2020-09-01/fix-fieldsplit-mf resolves this issue for you. Barry On Sep 1, 2020, at 12:45 PM, Blondel, Sophie > wrote: Hi Barry, I'm working through step 1) but I think I am doing something wrong. I'm using DMDASetBlockFillsSparse to set the non-zeros only for the diffusing clusters (small He clusters here, from size 1 to 7) and all the diagonal entries. 
Then I added a few lines in the code: Mat mat; DMCreateMatrix(da, &mat); MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); When I try to run with the following options: -snes_mf_operator -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 2 -snes_force_iteration -pc_fieldsplit_detect_coupling -pc_type fieldsplit -fieldsplit_0_pc_type jacobi -fieldsplit_1_pc_type redundant -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -ts_max_snes_failures -1 -ts_monitor -ts_max_steps 20 I get an error: [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: No support for this operation for this object type [0]PETSC ERROR: Matrix type mffd does not have a find off block diagonal entries defined [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Development GIT revision: v3.13.4-851-gde18fec8da GIT Date: 2020-08-28 16:47:50 +0000 [0]PETSC ERROR: Unknown Name on a 20200828 named sophie-Precision-5530 by sophie Tue Sep 1 10:58:44 2020 [0]PETSC ERROR: Configure options PETSC_DIR=/home/sophie/Code/petsc PETSC_ARCH=20200828 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no --with-shared-libraries [0]PETSC ERROR: #1 MatFindOffBlockDiagonalEntries() line 9847 in /home/sophie/Code/petsc/src/mat/interface/matrix.c [0]PETSC ERROR: #2 PCFieldSplitSetDefaults() line 504 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c [0]PETSC ERROR: #3 PCSetUp_FieldSplit() line 606 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c [0]PETSC ERROR: #4 PCSetUp() line 1009 in /home/sophie/Code/petsc/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #5 KSPSetUp() line 406 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #6 KSPSolve_Private() line 658 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #7 KSPSolve() line 889 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in /home/sophie/Code/petsc/src/snes/impls/ls/ls.c [0]PETSC ERROR: #9 SNESSolve() line 4524 in /home/sophie/Code/petsc/src/snes/interface/snes.c [0]PETSC ERROR: #10 TSStep_ARKIMEX() line 811 in /home/sophie/Code/petsc/src/ts/impls/arkimex/arkimex.c [0]PETSC ERROR: #11 TSStep() line 3731 in /home/sophie/Code/petsc/src/ts/interface/ts.c [0]PETSC ERROR: #12 TSSolve() line 4128 in /home/sophie/Code/petsc/src/ts/interface/ts.c PetscSolver::solve: TSSolve failed. Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : lundi 31 ao?t 2020 14:50 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions Sophie, Thanks. The factor of 4 is lot, the 1.5 not so bad. You will definitely want to retain the full matrix assembly codes for speed and to verify a reduced matrix version. It is worth trying a "reduced matrix version" with matrix-free multiply based on these numbers. This reduced matrix Jacobian will only have the diagonals and all the terms connected to the cluster sizes that move. In other words you will be building just the part of the Jacobian needed for the new preconditioner (PC subtype for Jacobi) and doing the matrix-vector product matrix free. (SOR requires all the Jacobian entries). Fortunately this is hopefully pretty straightforward for this code. 
You will not have to change the structure of the main code at all. Step 1) create a new "sparse matrix" that will be passed to DMDASetBlockFillsSparse(). This new "sparse matrix" needs to retain all the diagonal entries and also all the entries that are associated with the variables that diffuse. If I remember correctly these are just the smallest cluster size, plain Helium? Call MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); Then you would run the code with -snes_mf_operator and the new PC subtype for Jacobi. A test that the new reduced Jacobian is correct will be that you get almost the same iterations as the runs you just make using the PC subtype of Jacobi. Hopefully not slower and using a great deal less memory. The iterations will not be identical because of the matrix-free multiple. Step 2) create a new version of the Jacobian computation routine. This routine should only compute the elements of the Jacobian needed for this reduced matrix Jacobian, so the diagonals and the diffusion/convection terms. Again run with with -snes_mf_operator and the new PC subtype for Jacobi and you should again get the same convergence history. I made two steps because it makes it easier to validate and debug to get the same results as before. The first step cheats in that it still computes the full Jacobian but ignores the entries that we don't need to store for the preconditioner. The second step is more efficient because it only computes the Jacobian entries needed for the preconditioner but it requires you going through the Jacobian code and making sure only the needed parts are computed. If you have any questions please let me know. Barry On Aug 31, 2020, at 1:13 PM, Blondel, Sophie > wrote: Hi Barry, I ran the 2 cases to look at the effect of the Jacobi pre-conditionner: * 1D with 200 grid points and 7759 DOF per grid point (for the PSI application), for 20 TS: the factor between SOR and Jacobi is ~4 (976 MatMult for SOR and 4162 MatMult for Jacobi) * 2D with 63x63 grid points and 4124 DOF per grid point (for the NE application), for 20 TS: the factor is 1.5 (6657 for SOR, 10379 for Jacobi) Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : vendredi 28 ao?t 2020 18:31 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions On Aug 28, 2020, at 4:11 PM, Blondel, Sophie > wrote: Thank you Jed and Barry, First, attached are the logs from the benchmark runs I did without (log_std.txt) and with MF method (log_mf.txt). It took me some trouble to get the -log_view to work because I'm using push and pop for the options which means that PETSc is initialized with no argument so the command line argument was not taken into account, but I guess this is for a separate discussion. 
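A minimal sketch of the Step 1 preallocation described above, assuming a DMDA named da with dof species per grid point of which the first nDiff diffuse (both names are placeholders), and using the dense DMDASetBlockFills variant for readability where the message above points at DMDASetBlockFillsSparse; exactly which within-cell couplings have to be kept is the application's call, and error checking is omitted:

    PetscInt *dfill, *ofill, i;
    Mat      J;

    PetscCalloc2(dof*dof, &dfill, dof*dof, &ofill);
    for (i = 0; i < dof; ++i) {
      dfill[i*dof + i] = 1;                  /* keep every diagonal entry                 */
      if (i < nDiff) ofill[i*dof + i] = 1;   /* keep neighbor coupling for diffusers only */
    }
    DMDASetBlockFills(da, dfill, ofill);     /* must be called before DMCreateMatrix      */
    PetscFree2(dfill, ofill);

    DMCreateMatrix(da, &J);
    MatSetOption(J, MAT_NEW_NONZERO_LOCATIONS, PETSC_FALSE);

The reduced J is then handed to TSSetIJacobian (or SNESSetJacobian) as the preconditioning matrix while -snes_mf_operator supplies the matrix-free operator, for example with

    -snes_mf_operator -pc_type fieldsplit -pc_fieldsplit_detect_coupling -fieldsplit_0_pc_type jacobi -fieldsplit_1_pc_type redundant -ts_monitor -ksp_monitor -ts_view

so that the KSP iteration counts can be compared against the full-Jacobian Jacobi run over the same number of time steps.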
To answer questions about the current per-conditioners: * I used the same pre-conditioner options as listed in my previous email when I added the -snes_mf option; I did try to remove all the PC related options at one point with the MF method but didn't see a change in runtime so I put them back in * this benchmark is for a 1D DMDA using 20 grid points; when running in 2D or 3D I switch the PC options to: -pc_type fieldsplit -fieldsplit_0_pc_type sor -fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type gmres -ksp_type fgmres -fieldsplit_1_pc_gamg_threshold -1 I haven't tried a Jacobi PC instead of SOR, I will run a set of more realistic runs (1D and 2D) without MF but with Jacobi and report on it next week. When you say "iterations" do you mean what is given by -ksp_monitor? Yes, the number of MatMult is a good enough surrogate. So using matrix-free (which means no preconditioning) has 35846/160 ans = 224.0375 or 224 as many iterations. So even for this modest 1d problem preconditioning is doing a great deal. Barry Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : vendredi 28 ao?t 2020 12:12 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions [External Email] Sophie, This is exactly what i would expect. If you run with -ksp_monitor you will see the -snes_mf run takes many more iterations. I am puzzled that the argument -pc_type fieldsplit did not stop the run since this is under normal circumstances not a viable preconditioner with -snes_mf. Did you also remove the -pc_type fieldsplit argument? In order to see how one can avoid forming the entire matrix and use matrix-free to do the matrix-vector but still have an effective preconditioner let's look at what the current preconditioner options do. -pc_fieldsplit_detect_coupling creates two sub-preconditioners, the first for all the variables and the second for those that are coupled by the matrix to variables in neighboring cells Since only the smallest cluster sizes have diffusion/advection this second set contains only the cluster size one variables. -fieldsplit_0_pc_type sor Runs SOR on all the variables; you can think of this as running SOR on the reactions, it is a pretty good preconditioner for the reactions since the reactions are local, per cell. -fieldsplit_1_pc_type redundant This runs the default preconditioner (ILU) on just the variables that diffuse, i.e. the elliptic part. For smallish problems this is fine, for larger problems and 2d and 3d presumably you have also -redundant_pc_type gamg to use algebraic multigrid for the diffusion. This part of the matrix will always need to be formed and used in the preconditioner. It is very important since the diffusion is what brings in most of the ill-conditioning for larger problems into the linear system. Note that it only needs the matrix entries for the cluster size of 1 so it is very small compared to the entire sparse matrix. ---- The first preconditioner SOR requires ALL the matrix entries which are almost all (except for the diffusion terms) the coupling between different size clusters within a cell. Especially each cell has its own sparse matrix of the size of total number of clusters, it is sparse but not super sparse. So the to significantly lower memory usage we need to remove the SOR and the storing of all the matrix entries but still have an efficient preconditioner for the "reaction" terms. 
The simplest thing would be to use Jacobi instead of SOR for the first subpreconditioner since it only requires the diagonal entries in the matrix. But Jacobi is a worse preconditioner than SOR (since it totally ignores the matrix coupling) and sometimes can be much worse. Before anyone writes additional code we need to know if doing something along these lines does not ruin the convergence that. Have you used the same options as before but with -fieldsplit_0_pc_type jacobi ? (Not using any matrix free). We need to get an idea of how many more linear iterations it requires (not time, comparing time won't be helpful for this exercise.) We also need this information for realistic size problems in 2 or 3 dimensions that you really want to run; for small problems this approach will work ok and give misleading information about what happens for large problems. I suspect the iteration counts will shot up. Can you run some cases and see how the iteration counts change? Based on that we can decide if we still retain "good convergence" by changing the SOR to Jacobi and then change the code to make this change efficient (basically by skipping the explicit computation of the reaction Jacobian terms and using matrix-free on the outside of the PCFIELDSPLIT.) Barry On Aug 28, 2020, at 9:49 AM, Blondel, Sophie via petsc-users > wrote: Hi everyone, I have been using PETSc for a few years with a fully implicit TS ARKIMEX method and am now exploring the matrix free method option. Here is the list of PETSc options I typically use: -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 5 -snes_force_iteration -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -fieldsplit_0_pc_type sor -ts_max_snes_failures -1 -pc_fieldsplit_detect_coupling -ts_monitor -pc_type fieldsplit -fieldsplit_1_pc_type redundant -ts_max_steps 100 I started to compare the performance of the code without changing anything of the executable and simply adding "-snes_mf", I see a reduction of memory usage as expected and a benchmark that would usually take ~5min to run now takes ~50min. Reading the documentation I saw that there are a few option to play with the matrix free method like -snes_mf_err, -snes_mf_umin, or switching to -snes_mf_type wp. I used and modified the values of each of these options separately but never saw a sizable change in runtime, is it expected? And are there other ways to make the matrix free method faster? I saw in the documentation that you can define your own per-conditioner for instance. Let me know if you need additional information about the PETSc setup in the application I use. Best, Sophie -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Sep 2 14:53:31 2020 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 2 Sep 2020 14:53:31 -0500 Subject: [petsc-users] Matrix Free Method questions In-Reply-To: References: <5BDE8465-76BE-4132-BF4E-6784548AADC0@petsc.dev> <3329269A-EB37-41C9-9698-BA4631A1E18A@petsc.dev> <3E68F0AF-2F7D-4394-894A-3099EC80B9BC@petsc.dev> Message-ID: <600E6AA4-9534-4B39-B7E0-0218AB02E19A@petsc.dev> > On Sep 2, 2020, at 1:44 PM, Blondel, Sophie wrote: > > Thank you Barry, > > The code ran with your branch but it's much slower than running with the full Jacobian and Jacobi PC subtype (around 10 times slower). It is using less memory as expected. I tried step 2 as well and it's even slower. Sophie, That is puzzling. 
It should be using the same matrix in the solver so should be the same speed and the setup time should be a bit better since it does not form the full Jacobian. (We'll get to this later) > The TS iteration for step 1 are the same as with full Jacobian. Let me know what I can look at to check if I've done something wrong. We need to see if the KSP iterations are pretty similar for four approaches (1) original code with Jacobi PC subtype (2) matrix free with Jacobi PC (just add -snes_mf_operator to case 1) (3) the new code with the MatSetOption() to not store the entire Jacobian also with the -snes_mf_operator and (4) the new code that doesn't compute the unneeded part of the Jacobian also with the -snes_mf_operator You could run each case with same 20 timesteps and -ts_monitor -ksp_monitor and -ts_view and send the four output files around. Once we are sure the four cases are behaving as expected then you can get timings for them but let's not do that until we confirm the similar results. There could easily be a flaw in my reasoning or the PETSc code somewhere that affects the correctness so its best to check that first. Barry > > Cheers, > > Sophie > De : Barry Smith > > Envoy? : mardi 1 septembre 2020 14:12 > ? : Blondel, Sophie > > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > > Objet : Re: [petsc-users] Matrix Free Method questions > > > Sophie, > > Sorry, looks like an old bug in PETSc that was undetected due to lack of use. The code is trying to use the first of the two matrices to determine the preconditioner which won't work in your case since it is matrix-free. It should be using the second matrix. > > I hope the branch barry/2020-09-01/fix-fieldsplit-mf resolves this issue for you. > > Barry > > >> On Sep 1, 2020, at 12:45 PM, Blondel, Sophie > wrote: >> >> Hi Barry, >> >> I'm working through step 1) but I think I am doing something wrong. I'm using DMDASetBlockFillsSparse to set the non-zeros only for the diffusing clusters (small He clusters here, from size 1 to 7) and all the diagonal entries. Then I added a few lines in the code: >> Mat mat; >> DMCreateMatrix(da, &mat); >> MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); >> >> When I try to run with the following options: -snes_mf_operator -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 2 -snes_force_iteration -pc_fieldsplit_detect_coupling -pc_type fieldsplit -fieldsplit_0_pc_type jacobi -fieldsplit_1_pc_type redundant -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -ts_max_snes_failures -1 -ts_monitor -ts_max_steps 20 >> >> I get an error: >> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >> [0]PETSC ERROR: No support for this operation for this object type >> [0]PETSC ERROR: Matrix type mffd does not have a find off block diagonal entries defined >> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
>> [0]PETSC ERROR: Petsc Development GIT revision: v3.13.4-851-gde18fec8da GIT Date: 2020-08-28 16:47:50 +0000 >> [0]PETSC ERROR: Unknown Name on a 20200828 named sophie-Precision-5530 by sophie Tue Sep 1 10:58:44 2020 >> [0]PETSC ERROR: Configure options PETSC_DIR=/home/sophie/Code/petsc PETSC_ARCH=20200828 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no --with-shared-libraries >> [0]PETSC ERROR: #1 MatFindOffBlockDiagonalEntries() line 9847 in /home/sophie/Code/petsc/src/mat/interface/matrix.c >> [0]PETSC ERROR: #2 PCFieldSplitSetDefaults() line 504 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c >> [0]PETSC ERROR: #3 PCSetUp_FieldSplit() line 606 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c >> [0]PETSC ERROR: #4 PCSetUp() line 1009 in /home/sophie/Code/petsc/src/ksp/pc/interface/precon.c >> [0]PETSC ERROR: #5 KSPSetUp() line 406 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c >> [0]PETSC ERROR: #6 KSPSolve_Private() line 658 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c >> [0]PETSC ERROR: #7 KSPSolve() line 889 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c >> [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in /home/sophie/Code/petsc/src/snes/impls/ls/ls.c >> [0]PETSC ERROR: #9 SNESSolve() line 4524 in /home/sophie/Code/petsc/src/snes/interface/snes.c >> [0]PETSC ERROR: #10 TSStep_ARKIMEX() line 811 in /home/sophie/Code/petsc/src/ts/impls/arkimex/arkimex.c >> [0]PETSC ERROR: #11 TSStep() line 3731 in /home/sophie/Code/petsc/src/ts/interface/ts.c >> [0]PETSC ERROR: #12 TSSolve() line 4128 in /home/sophie/Code/petsc/src/ts/interface/ts.c >> PetscSolver::solve: TSSolve failed. >> >> Cheers, >> >> Sophie >> De : Barry Smith > >> Envoy? : lundi 31 ao?t 2020 14:50 >> ? : Blondel, Sophie > >> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >> Objet : Re: [petsc-users] Matrix Free Method questions >> >> >> Sophie, >> >> Thanks. >> >> The factor of 4 is lot, the 1.5 not so bad. >> >> You will definitely want to retain the full matrix assembly codes for speed and to verify a reduced matrix version. >> >> It is worth trying a "reduced matrix version" with matrix-free multiply based on these numbers. This reduced matrix Jacobian will only have the diagonals and all the terms connected to the cluster sizes that move. In other words you will be building just the part of the Jacobian needed for the new preconditioner (PC subtype for Jacobi) and doing the matrix-vector product matrix free. (SOR requires all the Jacobian entries). >> >> Fortunately this is hopefully pretty straightforward for this code. You will not have to change the structure of the main code at all. >> >> Step 1) create a new "sparse matrix" that will be passed to DMDASetBlockFillsSparse(). This new "sparse matrix" needs to retain all the diagonal entries and also all the entries that are associated with the variables that diffuse. If I remember correctly these are just the smallest cluster size, plain Helium? >> >> Call MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); >> >> Then you would run the code with -snes_mf_operator and the new PC subtype for Jacobi. >> >> A test that the new reduced Jacobian is correct will be that you get almost the same iterations as the runs you just make using the PC subtype of Jacobi. Hopefully not slower and using a great deal less memory. The iterations will not be identical because of the matrix-free multiple. 
>> >> Step 2) create a new version of the Jacobian computation routine. This routine should only compute the elements of the Jacobian needed for this reduced matrix Jacobian, so the diagonals and the diffusion/convection terms. >> >> Again run with with -snes_mf_operator and the new PC subtype for Jacobi and you should again get the same convergence history. >> >> I made two steps because it makes it easier to validate and debug to get the same results as before. The first step cheats in that it still computes the full Jacobian but ignores the entries that we don't need to store for the preconditioner. The second step is more efficient because it only computes the Jacobian entries needed for the preconditioner but it requires you going through the Jacobian code and making sure only the needed parts are computed. >> >> >> If you have any questions please let me know. >> >> Barry >> >> >> >> >>> On Aug 31, 2020, at 1:13 PM, Blondel, Sophie > wrote: >>> >>> Hi Barry, >>> >>> I ran the 2 cases to look at the effect of the Jacobi pre-conditionner: >>> 1D with 200 grid points and 7759 DOF per grid point (for the PSI application), for 20 TS: the factor between SOR and Jacobi is ~4 (976 MatMult for SOR and 4162 MatMult for Jacobi) >>> 2D with 63x63 grid points and 4124 DOF per grid point (for the NE application), for 20 TS: the factor is 1.5 (6657 for SOR, 10379 for Jacobi) >>> Cheers, >>> >>> Sophie >>> De : Barry Smith > >>> Envoy? : vendredi 28 ao?t 2020 18:31 >>> ? : Blondel, Sophie > >>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>> Objet : Re: [petsc-users] Matrix Free Method questions >>> >>> >>> >>>> On Aug 28, 2020, at 4:11 PM, Blondel, Sophie > wrote: >>>> >>>> Thank you Jed and Barry, >>>> >>>> First, attached are the logs from the benchmark runs I did without (log_std.txt) and with MF method (log_mf.txt). It took me some trouble to get the -log_view to work because I'm using push and pop for the options which means that PETSc is initialized with no argument so the command line argument was not taken into account, but I guess this is for a separate discussion. >>>> >>>> To answer questions about the current per-conditioners: >>>> I used the same pre-conditioner options as listed in my previous email when I added the -snes_mf option; I did try to remove all the PC related options at one point with the MF method but didn't see a change in runtime so I put them back in >>>> this benchmark is for a 1D DMDA using 20 grid points; when running in 2D or 3D I switch the PC options to: -pc_type fieldsplit -fieldsplit_0_pc_type sor -fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type gmres -ksp_type fgmres -fieldsplit_1_pc_gamg_threshold -1 >>>> I haven't tried a Jacobi PC instead of SOR, I will run a set of more realistic runs (1D and 2D) without MF but with Jacobi and report on it next week. When you say "iterations" do you mean what is given by -ksp_monitor? >>> >>> Yes, the number of MatMult is a good enough surrogate. >>> >>> So using matrix-free (which means no preconditioning) has >>> >>> 35846/160 >>> >>> ans = >>> >>> 224.0375 >>> >>> or 224 as many iterations. So even for this modest 1d problem preconditioning is doing a great deal. >>> >>> Barry >>> >>> >>> >>>> >>>> Cheers, >>>> >>>> Sophie >>>> De : Barry Smith > >>>> Envoy? : vendredi 28 ao?t 2020 12:12 >>>> ? 
: Blondel, Sophie > >>>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>> Objet : Re: [petsc-users] Matrix Free Method questions >>>> >>>> [External Email] >>>> >>>> Sophie, >>>> >>>> This is exactly what i would expect. If you run with -ksp_monitor you will see the -snes_mf run takes many more iterations. >>>> >>>> I am puzzled that the argument -pc_type fieldsplit did not stop the run since this is under normal circumstances not a viable preconditioner with -snes_mf. Did you also remove the -pc_type fieldsplit argument? >>>> >>>> In order to see how one can avoid forming the entire matrix and use matrix-free to do the matrix-vector but still have an effective preconditioner let's look at what the current preconditioner options do. >>>> >>>>> -pc_fieldsplit_detect_coupling >>>> >>>> creates two sub-preconditioners, the first for all the variables and the second for those that are coupled by the matrix to variables in neighboring cells Since only the smallest cluster sizes have diffusion/advection this second set contains only the cluster size one variables. >>>> >>>>> -fieldsplit_0_pc_type sor >>>> >>>> Runs SOR on all the variables; you can think of this as running SOR on the reactions, it is a pretty good preconditioner for the reactions since the reactions are local, per cell. >>>> >>>>> -fieldsplit_1_pc_type redundant >>>> >>>> >>>> This runs the default preconditioner (ILU) on just the variables that diffuse, i.e. the elliptic part. For smallish problems this is fine, for larger problems and 2d and 3d presumably you have also -redundant_pc_type gamg to use algebraic multigrid for the diffusion. This part of the matrix will always need to be formed and used in the preconditioner. It is very important since the diffusion is what brings in most of the ill-conditioning for larger problems into the linear system. Note that it only needs the matrix entries for the cluster size of 1 so it is very small compared to the entire sparse matrix. >>>> >>>> ---- >>>> The first preconditioner SOR requires ALL the matrix entries which are almost all (except for the diffusion terms) the coupling between different size clusters within a cell. Especially each cell has its own sparse matrix of the size of total number of clusters, it is sparse but not super sparse. >>>> >>>> So the to significantly lower memory usage we need to remove the SOR and the storing of all the matrix entries but still have an efficient preconditioner for the "reaction" terms. >>>> >>>> The simplest thing would be to use Jacobi instead of SOR for the first subpreconditioner since it only requires the diagonal entries in the matrix. But Jacobi is a worse preconditioner than SOR (since it totally ignores the matrix coupling) and sometimes can be much worse. >>>> >>>> Before anyone writes additional code we need to know if doing something along these lines does not ruin the convergence that. >>>> >>>> Have you used the same options as before but with -fieldsplit_0_pc_type jacobi ? (Not using any matrix free). We need to get an idea of how many more linear iterations it requires (not time, comparing time won't be helpful for this exercise.) We also need this information for realistic size problems in 2 or 3 dimensions that you really want to run; for small problems this approach will work ok and give misleading information about what happens for large problems. >>>> >>>> I suspect the iteration counts will shot up. Can you run some cases and see how the iteration counts change? 
>>>> >>>> Based on that we can decide if we still retain "good convergence" by changing the SOR to Jacobi and then change the code to make this change efficient (basically by skipping the explicit computation of the reaction Jacobian terms and using matrix-free on the outside of the PCFIELDSPLIT.) >>>> >>>> Barry >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>>> On Aug 28, 2020, at 9:49 AM, Blondel, Sophie via petsc-users > wrote: >>>>> >>>>> Hi everyone, >>>>> >>>>> I have been using PETSc for a few years with a fully implicit TS ARKIMEX method and am now exploring the matrix free method option. Here is the list of PETSc options I typically use: -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 5 -snes_force_iteration -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -fieldsplit_0_pc_type sor -ts_max_snes_failures -1 -pc_fieldsplit_detect_coupling -ts_monitor -pc_type fieldsplit -fieldsplit_1_pc_type redundant -ts_max_steps 100 >>>>> >>>>> I started to compare the performance of the code without changing anything of the executable and simply adding "-snes_mf", I see a reduction of memory usage as expected and a benchmark that would usually take ~5min to run now takes ~50min. Reading the documentation I saw that there are a few option to play with the matrix free method like -snes_mf_err, -snes_mf_umin, or switching to -snes_mf_type wp. I used and modified the values of each of these options separately but never saw a sizable change in runtime, is it expected? >>>>> >>>>> And are there other ways to make the matrix free method faster? I saw in the documentation that you can define your own per-conditioner for instance. Let me know if you need additional information about the PETSc setup in the application I use. >>>>> >>>>> Best, >>>>> >>>>> Sophie >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From berend.vanwachem at ovgu.de Thu Sep 3 09:17:32 2020 From: berend.vanwachem at ovgu.de (Berend van Wachem) Date: Thu, 3 Sep 2020 16:17:32 +0200 Subject: [petsc-users] VTK format? Message-ID: Dear PETSc, What is the best way to write data from a DMPLEX vector to file, so it can be viewed with paraview? I've found that the standard VTK format works for a serial job, but if there is more than 1 processor, the geometry data gets messed up. I've attached a small working example for a cylinder and the visualised geometry with paraview for 1 processors and 4 processors. Any pointers or "best practice" very much appreciated. Best regards, Berend. -------------- next part -------------- A non-text attachment was scrubbed... Name: visualisemesh-1proc.png Type: image/png Size: 47186 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: visualisemesh-4proc.png Type: image/png Size: 35442 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: visualizemesh.c Type: text/x-csrc Size: 3143 bytes Desc: not available URL: From jed at jedbrown.org Thu Sep 3 09:53:55 2020 From: jed at jedbrown.org (Jed Brown) Date: Thu, 03 Sep 2020 08:53:55 -0600 Subject: [petsc-users] VTK format? In-Reply-To: References: Message-ID: Use the xml format (not the legacy format) by naming your file.vtu instead of file.vtk On Thu, Sep 3, 2020, at 8:17 AM, Berend van Wachem wrote: > Dear PETSc, > > What is the best way to write data from a DMPLEX vector to file, so it > can be viewed with paraview? 
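A minimal sketch of the .vtu route suggested above, assuming dm is the DMPlex and sol is a global Vec created from it (error checking omitted):

    PetscViewer viewer;

    PetscViewerVTKOpen(PETSC_COMM_WORLD, "solution.vtu", FILE_MODE_WRITE, &viewer);
    VecView(sol, viewer);            /* DMView(dm, viewer) writes just the mesh */
    PetscViewerDestroy(&viewer);

The same output can usually be requested from the command line with a viewer string such as -dm_view vtk:mesh.vtu, although the exact option name depends on how the application exposes its viewers.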
> I've found that the standard VTK format works for a serial job, but if > there is more than 1 processor, the geometry data gets messed up. > I've attached a small working example for a cylinder and the visualised > geometry with paraview for 1 processors and 4 processors. > Any pointers or "best practice" very much appreciated. > > Best regards, > > Berend. > > > > > *Attachments:* > * visualisemesh-1proc.png > * visualisemesh-4proc.png > * visualizemesh.c -------------- next part -------------- An HTML attachment was scrubbed... URL: From bourdin at lsu.edu Thu Sep 3 10:14:36 2020 From: bourdin at lsu.edu (Blaise A Bourdin) Date: Thu, 3 Sep 2020 15:14:36 +0000 Subject: [petsc-users] VTK format? In-Reply-To: References: Message-ID: <44CB9E13-C6E1-43AE-98EF-95BBB16C9240@lsu.edu> Hi, There is also support for the exodus format, with parallel IO. Have a look at src/dm/impls/plex/tests/ex26.c for an example. Exodus is a bit complicated and needs formatting before you can start dumping data in a file. It is supported by pretty much all post-processing software on the market. Blaise > On Sep 3, 2020, at 9:53 AM, Jed Brown wrote: > > Use the xml format (not the legacy format) by naming your file.vtu instead of file.vtk > > On Thu, Sep 3, 2020, at 8:17 AM, Berend van Wachem wrote: >> Dear PETSc, >> >> What is the best way to write data from a DMPLEX vector to file, so it >> can be viewed with paraview? >> I've found that the standard VTK format works for a serial job, but if >> there is more than 1 processor, the geometry data gets messed up. >> I've attached a small working example for a cylinder and the visualised >> geometry with paraview for 1 processors and 4 processors. >> Any pointers or "best practice" very much appreciated. >> >> Best regards, >> >> Berend. >> >> >> >> >> Attachments: >> ? visualisemesh-1proc.png >> ? visualisemesh-4proc.png >> ? visualizemesh.c -- A.K. & Shirley Barton Professor of Mathematics Adjunct Professor of Mechanical Engineering Adjunct of the Center for Computation & Technology Louisiana State University, Lockett Hall Room 344, Baton Rouge, LA 70803, USA Tel. +1 (225) 578 1612, Fax +1 (225) 578 4276 Web http://www.math.lsu.edu/~bourdin From olivier.jamond at cea.fr Thu Sep 3 10:43:13 2020 From: olivier.jamond at cea.fr (Olivier Jamond) Date: Thu, 3 Sep 2020 17:43:13 +0200 Subject: [petsc-users] Saddle point problem with nested matrix and a relatively small number of Lagrange multipliers Message-ID: <85f64541-bb8f-a71c-eb22-7c8bb5d05be3@cea.fr> Hello, I am working on a finite-elements/finite-volumes code, whose distributed solver is based on petsc. For FE, it relies on Lagrange multipliers for the imposition of various boundary conditions or interactions (simple dirichlet, contact, ...). This results in saddle point problems: [K C^t][U]=[F] [C? 0 ][L] [D] Most of the time, the relations related to the matrix C are applied to dofs?on the boundary of the domain. Then the size of L is much smaller than the size of U, which becomes more and more true as? the mesh is refined. The code construct this matrix as a nested matrix (one of the reason is that for some interactions such as contact, whereas?being quite small, the size of the matrix C change constantly, and having it 'blended' into a monolithic 'big' matrix would require to recompute its profile/ reallocate / ... each time), and use?fieldsplit preconditioner of type PC_COMPOSITE_SCHUR.?I would like to solve the system using iterative methods to access good extensibility on a large number of subdomains. 
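To fix ideas, a minimal sketch of this kind of nested-matrix / Schur field-split setup (the matrices K, Ct, C and the KSP "ksp" are assumed to already exist, the split names "u" and "l" are arbitrary, and the zero block is passed as NULL):

    Mat            blocks[4], A;
    IS             isg[2];
    PC             pc;
    PetscErrorCode ierr;

    blocks[0] = K;  blocks[1] = Ct;    /* [ K  C^T ] */
    blocks[2] = C;  blocks[3] = NULL;  /* [ C   0  ] */
    ierr = MatCreateNest(PETSC_COMM_WORLD, 2, NULL, 2, NULL, blocks, &A);CHKERRQ(ierr);
    ierr = MatNestGetISs(A, isg, NULL);CHKERRQ(ierr);          /* row index sets of the two blocks */
    ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
    ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
    ierr = PCSetType(pc, PCFIELDSPLIT);CHKERRQ(ierr);
    ierr = PCFieldSplitSetIS(pc, "u", isg[0]);CHKERRQ(ierr);
    ierr = PCFieldSplitSetIS(pc, "l", isg[1]);CHKERRQ(ierr);
    ierr = PCFieldSplitSetType(pc, PC_COMPOSITE_SCHUR);CHKERRQ(ierr);
    ierr = PCFieldSplitSetSchurFactType(pc, PC_FIELDSPLIT_SCHUR_FACT_FULL);CHKERRQ(ierr);
    ierr = PCFieldSplitSetSchurPre(pc, PC_FIELDSPLIT_SCHUR_PRE_SELFP, NULL);CHKERRQ(ierr);

With a setup of this kind, the -fieldsplit_u_* and -fieldsplit_l_* options mentioned below control the two inner solvers.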
Simple BC such as Dirichlet can be eliminated into K (and must be in order to make K invertible). My problem is the computational cost of these constraints treated with Lagrange multipliers, whereas their number becomes more and more neglectable as the mesh is refined. To give an idea, let's consider a simple elastic cube with dirichlet BCs which are all eliminated (to ensure invertibility?of K) but one on a single dof. -ksp_type preonly -pc_type fieldsplit -pc_fieldsplit_type schur -pc_fieldsplit_schur_factorization_type full -pc_fieldsplit_schur_precondition selfp -fieldsplit_u_ksp_type cg -fieldsplit_u_pc_type bjacobi -fieldsplit_l_ksp_type cg -fieldsplit_l_pc_type bjacobi it seems that my computation time is multiplied by a factor 3: 3 ksp solves of the big block 'K' are needed to apply the schur preconditioner (assuming that the ksp(S,Sp) converges in 1 iteration). It seems expensive for a single dof dirichlet! And for some relations treated by Lagrange multipliers which involve many?dofs, the number of ksp solve of the big block 'K' is ( 2 + number of iteration of?ksp(S,Sp)). To reduce this, one can think about solving the?ksp(S,Sp) with a direct solver, but then one must use "-pc_fieldsplit_schur_precondition self" which is advised against in the documentation... To illustrate this, on a small elasticity case: 32x32x32 cube on 8 processors, dirichlet on the top and bottom faces: * if all the dirichlet are eliminated?(no C matrix, monolithic?solve of the K bloc) ? - computation time for the solve: ~400ms * if only the dirichlet of the bottom face are eliminated ? -?computation time for the solve: ~35000ms ? - number of iteration?of?ksp(S,Sp): 37 ? - total number of iterations of ksp(K): 4939 *?only the dirichlet of the bottom face are eliminated with these options: -ksp_type fgmres -pc_type fieldsplit -pc_fieldsplit_type schur -pc_fieldsplit_schur_factorization_type full -pc_fieldsplit_schur_precondition selfp -fieldsplit_u_ksp_type cg -fieldsplit_u_pc_type bjacobi -fieldsplit_l_ksp_type cg -fieldsplit_l_pc_type bjacobi -fieldsplit_l_ksp_rtol 1e-10 -fieldsplit_l_inner_ksp_type preonly -fieldsplit_l_inner_pc_type jacobi -fieldsplit_l_upper_ksp_type preonly -fieldsplit_l_upper_pc_type jacobi ? -?computation time for the solve: ~50000ms ? - total number of iterations of ksp(K): 7424 ? - 'main' ksp number of iterations: 7424 Then in the end, my question is: is there a smarter way to handle such 'small' constraint matrices C, with the (maybe wrong) idea that a small number of extra dofs (the lagrange multipliers) should result in a small extra computation time ? Thanks! From bsmith at petsc.dev Thu Sep 3 10:45:30 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 3 Sep 2020 10:45:30 -0500 Subject: [petsc-users] VTK format? In-Reply-To: References: Message-ID: <753A4D7D-A9F4-4EF7-8FE7-B765CAAB60D8@petsc.dev> Shouldn't this, "just work". PETSc should not be dumping unreadable garbage into any files, does the broken PETSc code need to be removed, or error out until it can be fixed? Barry > On Sep 3, 2020, at 9:53 AM, Jed Brown wrote: > > Use the xml format (not the legacy format) by naming your file.vtu instead of file.vtk > > On Thu, Sep 3, 2020, at 8:17 AM, Berend van Wachem wrote: >> Dear PETSc, >> >> What is the best way to write data from a DMPLEX vector to file, so it >> can be viewed with paraview? >> I've found that the standard VTK format works for a serial job, but if >> there is more than 1 processor, the geometry data gets messed up. 
>> I've attached a small working example for a cylinder and the visualised >> geometry with paraview for 1 processors and 4 processors. >> Any pointers or "best practice" very much appreciated. >> >> Best regards, >> >> Berend. >> >> >> >> >> Attachments: >> visualisemesh-1proc.png >> visualisemesh-4proc.png >> visualizemesh.c -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu Sep 3 10:47:40 2020 From: jed at jedbrown.org (Jed Brown) Date: Thu, 03 Sep 2020 09:47:40 -0600 Subject: [petsc-users] VTK format? In-Reply-To: <753A4D7D-A9F4-4EF7-8FE7-B765CAAB60D8@petsc.dev> References: <753A4D7D-A9F4-4EF7-8FE7-B765CAAB60D8@petsc.dev> Message-ID: I'd have deleted the legacy vtk many years ago, but Matt says he uses it. On Thu, Sep 3, 2020, at 9:45 AM, Barry Smith wrote: > > Shouldn't this, "just work". PETSc should not be dumping unreadable garbage into any files, does the broken PETSc code need to be removed, or error out until it can be fixed? > > Barry > > >> On Sep 3, 2020, at 9:53 AM, Jed Brown wrote: >> >> Use the xml format (not the legacy format) by naming your file.vtu instead of file.vtk >> >> On Thu, Sep 3, 2020, at 8:17 AM, Berend van Wachem wrote: >>> Dear PETSc, >>> >>> What is the best way to write data from a DMPLEX vector to file, so it >>> can be viewed with paraview? >>> I've found that the standard VTK format works for a serial job, but if >>> there is more than 1 processor, the geometry data gets messed up. >>> I've attached a small working example for a cylinder and the visualised >>> geometry with paraview for 1 processors and 4 processors. >>> Any pointers or "best practice" very much appreciated. >>> >>> Best regards, >>> >>> Berend. >>> >>> >>> >>> >>> *Attachments:* >>> * visualisemesh-1proc.png >>> * visualisemesh-4proc.png >>> * visualizemesh.c -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Sep 3 11:19:09 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 3 Sep 2020 11:19:09 -0500 Subject: [petsc-users] VTK format? In-Reply-To: References: <753A4D7D-A9F4-4EF7-8FE7-B765CAAB60D8@petsc.dev> Message-ID: <99C78421-D4C9-41A5-BFAF-B9BF8B836A09@petsc.dev> > On Sep 3, 2020, at 10:47 AM, Jed Brown wrote: > > I'd have deleted the legacy vtk many years ago, but Matt says he uses it. So he is plotting garbage or he never uses the broken stuff so it can be errorred out? > > On Thu, Sep 3, 2020, at 9:45 AM, Barry Smith wrote: >> >> Shouldn't this, "just work". PETSc should not be dumping unreadable garbage into any files, does the broken PETSc code need to be removed, or error out until it can be fixed? >> >> Barry >> >> >>> On Sep 3, 2020, at 9:53 AM, Jed Brown > wrote: >>> >>> Use the xml format (not the legacy format) by naming your file.vtu instead of file.vtk >>> >>> On Thu, Sep 3, 2020, at 8:17 AM, Berend van Wachem wrote: >>>> Dear PETSc, >>>> >>>> What is the best way to write data from a DMPLEX vector to file, so it >>>> can be viewed with paraview? >>>> I've found that the standard VTK format works for a serial job, but if >>>> there is more than 1 processor, the geometry data gets messed up. >>>> I've attached a small working example for a cylinder and the visualised >>>> geometry with paraview for 1 processors and 4 processors. >>>> Any pointers or "best practice" very much appreciated. >>>> >>>> Best regards, >>>> >>>> Berend. 
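Following the suggestion to use the XML format, a minimal sketch of writing a DMPLEX solution vector so ParaView can read it (the global vector "sol" is assumed to exist; file and field names are arbitrary):

    PetscViewer    viewer;
    PetscErrorCode ierr;

    ierr = PetscObjectSetName((PetscObject)sol, "solution");CHKERRQ(ierr);   /* name shown in ParaView */
    ierr = PetscViewerVTKOpen(PETSC_COMM_WORLD, "solution.vtu", FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
    ierr = VecView(sol, viewer);CHKERRQ(ierr);
    ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

The same can likely be done from the command line with something like -vec_view vtk:solution.vtu; the .vtu extension is what selects the XML writer rather than the legacy one.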
>>>> >>>> >>>> >>>> >>>> Attachments: >>>> visualisemesh-1proc.png >>>> visualisemesh-4proc.png >>>> visualizemesh.c -------------- next part -------------- An HTML attachment was scrubbed... URL: From sajidsyed2021 at u.northwestern.edu Thu Sep 3 12:19:01 2020 From: sajidsyed2021 at u.northwestern.edu (Sajid Ali) Date: Thu, 3 Sep 2020 12:19:01 -0500 Subject: [petsc-users] Questions about HDF5 parallel I/O Message-ID: Hi PETSc-developers, Currently, VecView loads an entire dataset from an hdf5 file into a PETSc vector (with any number of mpi ranks). Given that there is no routine to load a subset of an HDF5 dataset into a PETSc vector, the next best thing is to load the entire data-set into memory and select a smaller region as a sub-vector. Is there an example that demonstrates this ? (Mainly to get an idea on how to select a 2d array from a 3d array using a PETSc IS. Given that it's a regular 3D vector is it best to use a DMDA 3Dvec which gives ownership ranges that may aid with creating the IS ?) I've seen on earlier threads that XDMF can be used to create a map of where data is present in hdf5 files, is there an example for doing this with regular vectors to select subvectors as described above ? Also, is it possible to have different sub-comms read different hdf5 groups ? Thank You, Sajid Ali | PhD Candidate Applied Physics Northwestern University s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From sblondel at utk.edu Thu Sep 3 12:26:48 2020 From: sblondel at utk.edu (Blondel, Sophie) Date: Thu, 3 Sep 2020 17:26:48 +0000 Subject: [petsc-users] Matrix Free Method questions In-Reply-To: <600E6AA4-9534-4B39-B7E0-0218AB02E19A@petsc.dev> References: <5BDE8465-76BE-4132-BF4E-6784548AADC0@petsc.dev> <3329269A-EB37-41C9-9698-BA4631A1E18A@petsc.dev> <3E68F0AF-2F7D-4394-894A-3099EC80B9BC@petsc.dev> , <600E6AA4-9534-4B39-B7E0-0218AB02E19A@petsc.dev> Message-ID: Hi Barry, Attached are the log files for the 1D case, for each of the 4 steps. I don't know how I did it yesterday but the differences between steps look better today, except for step 4 that takes many more iterations and smaller time steps. Cheers, Sophie ________________________________ De : Barry Smith Envoy? : mercredi 2 septembre 2020 15:53 ? : Blondel, Sophie Cc : petsc-users at mcs.anl.gov ; xolotl-psi-development at lists.sourceforge.net Objet : Re: [petsc-users] Matrix Free Method questions On Sep 2, 2020, at 1:44 PM, Blondel, Sophie > wrote: Thank you Barry, The code ran with your branch but it's much slower than running with the full Jacobian and Jacobi PC subtype (around 10 times slower). It is using less memory as expected. I tried step 2 as well and it's even slower. Sophie, That is puzzling. It should be using the same matrix in the solver so should be the same speed and the setup time should be a bit better since it does not form the full Jacobian. (We'll get to this later) The TS iteration for step 1 are the same as with full Jacobian. Let me know what I can look at to check if I've done something wrong. 
We need to see if the KSP iterations are pretty similar for four approaches (1) original code with Jacobi PC subtype (2) matrix free with Jacobi PC (just add -snes_mf_operator to case 1) (3) the new code with the MatSetOption() to not store the entire Jacobian also with the -snes_mf_operator and (4) the new code that doesn't compute the unneeded part of the Jacobian also with the -snes_mf_operator You could run each case with same 20 timesteps and -ts_monitor -ksp_monitor and -ts_view and send the four output files around. Once we are sure the four cases are behaving as expected then you can get timings for them but let's not do that until we confirm the similar results. There could easily be a flaw in my reasoning or the PETSc code somewhere that affects the correctness so its best to check that first. Barry Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : mardi 1 septembre 2020 14:12 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions Sophie, Sorry, looks like an old bug in PETSc that was undetected due to lack of use. The code is trying to use the first of the two matrices to determine the preconditioner which won't work in your case since it is matrix-free. It should be using the second matrix. I hope the branch barry/2020-09-01/fix-fieldsplit-mf resolves this issue for you. Barry On Sep 1, 2020, at 12:45 PM, Blondel, Sophie > wrote: Hi Barry, I'm working through step 1) but I think I am doing something wrong. I'm using DMDASetBlockFillsSparse to set the non-zeros only for the diffusing clusters (small He clusters here, from size 1 to 7) and all the diagonal entries. Then I added a few lines in the code: Mat mat; DMCreateMatrix(da, &mat); MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); When I try to run with the following options: -snes_mf_operator -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 2 -snes_force_iteration -pc_fieldsplit_detect_coupling -pc_type fieldsplit -fieldsplit_0_pc_type jacobi -fieldsplit_1_pc_type redundant -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -ts_max_snes_failures -1 -ts_monitor -ts_max_steps 20 I get an error: [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: No support for this operation for this object type [0]PETSC ERROR: Matrix type mffd does not have a find off block diagonal entries defined [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[0]PETSC ERROR: Petsc Development GIT revision: v3.13.4-851-gde18fec8da GIT Date: 2020-08-28 16:47:50 +0000 [0]PETSC ERROR: Unknown Name on a 20200828 named sophie-Precision-5530 by sophie Tue Sep 1 10:58:44 2020 [0]PETSC ERROR: Configure options PETSC_DIR=/home/sophie/Code/petsc PETSC_ARCH=20200828 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no --with-shared-libraries [0]PETSC ERROR: #1 MatFindOffBlockDiagonalEntries() line 9847 in /home/sophie/Code/petsc/src/mat/interface/matrix.c [0]PETSC ERROR: #2 PCFieldSplitSetDefaults() line 504 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c [0]PETSC ERROR: #3 PCSetUp_FieldSplit() line 606 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c [0]PETSC ERROR: #4 PCSetUp() line 1009 in /home/sophie/Code/petsc/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #5 KSPSetUp() line 406 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #6 KSPSolve_Private() line 658 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #7 KSPSolve() line 889 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in /home/sophie/Code/petsc/src/snes/impls/ls/ls.c [0]PETSC ERROR: #9 SNESSolve() line 4524 in /home/sophie/Code/petsc/src/snes/interface/snes.c [0]PETSC ERROR: #10 TSStep_ARKIMEX() line 811 in /home/sophie/Code/petsc/src/ts/impls/arkimex/arkimex.c [0]PETSC ERROR: #11 TSStep() line 3731 in /home/sophie/Code/petsc/src/ts/interface/ts.c [0]PETSC ERROR: #12 TSSolve() line 4128 in /home/sophie/Code/petsc/src/ts/interface/ts.c PetscSolver::solve: TSSolve failed. Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : lundi 31 ao?t 2020 14:50 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions Sophie, Thanks. The factor of 4 is lot, the 1.5 not so bad. You will definitely want to retain the full matrix assembly codes for speed and to verify a reduced matrix version. It is worth trying a "reduced matrix version" with matrix-free multiply based on these numbers. This reduced matrix Jacobian will only have the diagonals and all the terms connected to the cluster sizes that move. In other words you will be building just the part of the Jacobian needed for the new preconditioner (PC subtype for Jacobi) and doing the matrix-vector product matrix free. (SOR requires all the Jacobian entries). Fortunately this is hopefully pretty straightforward for this code. You will not have to change the structure of the main code at all. Step 1) create a new "sparse matrix" that will be passed to DMDASetBlockFillsSparse(). This new "sparse matrix" needs to retain all the diagonal entries and also all the entries that are associated with the variables that diffuse. If I remember correctly these are just the smallest cluster size, plain Helium? Call MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); Then you would run the code with -snes_mf_operator and the new PC subtype for Jacobi. A test that the new reduced Jacobian is correct will be that you get almost the same iterations as the runs you just make using the PC subtype of Jacobi. Hopefully not slower and using a great deal less memory. The iterations will not be identical because of the matrix-free multiple. Step 2) create a new version of the Jacobian computation routine. 
This routine should only compute the elements of the Jacobian needed for this reduced matrix Jacobian, so the diagonals and the diffusion/convection terms. Again run with with -snes_mf_operator and the new PC subtype for Jacobi and you should again get the same convergence history. I made two steps because it makes it easier to validate and debug to get the same results as before. The first step cheats in that it still computes the full Jacobian but ignores the entries that we don't need to store for the preconditioner. The second step is more efficient because it only computes the Jacobian entries needed for the preconditioner but it requires you going through the Jacobian code and making sure only the needed parts are computed. If you have any questions please let me know. Barry On Aug 31, 2020, at 1:13 PM, Blondel, Sophie > wrote: Hi Barry, I ran the 2 cases to look at the effect of the Jacobi pre-conditionner: * 1D with 200 grid points and 7759 DOF per grid point (for the PSI application), for 20 TS: the factor between SOR and Jacobi is ~4 (976 MatMult for SOR and 4162 MatMult for Jacobi) * 2D with 63x63 grid points and 4124 DOF per grid point (for the NE application), for 20 TS: the factor is 1.5 (6657 for SOR, 10379 for Jacobi) Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : vendredi 28 ao?t 2020 18:31 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions On Aug 28, 2020, at 4:11 PM, Blondel, Sophie > wrote: Thank you Jed and Barry, First, attached are the logs from the benchmark runs I did without (log_std.txt) and with MF method (log_mf.txt). It took me some trouble to get the -log_view to work because I'm using push and pop for the options which means that PETSc is initialized with no argument so the command line argument was not taken into account, but I guess this is for a separate discussion. To answer questions about the current per-conditioners: * I used the same pre-conditioner options as listed in my previous email when I added the -snes_mf option; I did try to remove all the PC related options at one point with the MF method but didn't see a change in runtime so I put them back in * this benchmark is for a 1D DMDA using 20 grid points; when running in 2D or 3D I switch the PC options to: -pc_type fieldsplit -fieldsplit_0_pc_type sor -fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type gmres -ksp_type fgmres -fieldsplit_1_pc_gamg_threshold -1 I haven't tried a Jacobi PC instead of SOR, I will run a set of more realistic runs (1D and 2D) without MF but with Jacobi and report on it next week. When you say "iterations" do you mean what is given by -ksp_monitor? Yes, the number of MatMult is a good enough surrogate. So using matrix-free (which means no preconditioning) has 35846/160 ans = 224.0375 or 224 as many iterations. So even for this modest 1d problem preconditioning is doing a great deal. Barry Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : vendredi 28 ao?t 2020 12:12 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions [External Email] Sophie, This is exactly what i would expect. If you run with -ksp_monitor you will see the -snes_mf run takes many more iterations. 
I am puzzled that the argument -pc_type fieldsplit did not stop the run since this is under normal circumstances not a viable preconditioner with -snes_mf. Did you also remove the -pc_type fieldsplit argument? In order to see how one can avoid forming the entire matrix and use matrix-free to do the matrix-vector but still have an effective preconditioner let's look at what the current preconditioner options do. -pc_fieldsplit_detect_coupling creates two sub-preconditioners, the first for all the variables and the second for those that are coupled by the matrix to variables in neighboring cells Since only the smallest cluster sizes have diffusion/advection this second set contains only the cluster size one variables. -fieldsplit_0_pc_type sor Runs SOR on all the variables; you can think of this as running SOR on the reactions, it is a pretty good preconditioner for the reactions since the reactions are local, per cell. -fieldsplit_1_pc_type redundant This runs the default preconditioner (ILU) on just the variables that diffuse, i.e. the elliptic part. For smallish problems this is fine, for larger problems and 2d and 3d presumably you have also -redundant_pc_type gamg to use algebraic multigrid for the diffusion. This part of the matrix will always need to be formed and used in the preconditioner. It is very important since the diffusion is what brings in most of the ill-conditioning for larger problems into the linear system. Note that it only needs the matrix entries for the cluster size of 1 so it is very small compared to the entire sparse matrix. ---- The first preconditioner SOR requires ALL the matrix entries which are almost all (except for the diffusion terms) the coupling between different size clusters within a cell. Especially each cell has its own sparse matrix of the size of total number of clusters, it is sparse but not super sparse. So the to significantly lower memory usage we need to remove the SOR and the storing of all the matrix entries but still have an efficient preconditioner for the "reaction" terms. The simplest thing would be to use Jacobi instead of SOR for the first subpreconditioner since it only requires the diagonal entries in the matrix. But Jacobi is a worse preconditioner than SOR (since it totally ignores the matrix coupling) and sometimes can be much worse. Before anyone writes additional code we need to know if doing something along these lines does not ruin the convergence that. Have you used the same options as before but with -fieldsplit_0_pc_type jacobi ? (Not using any matrix free). We need to get an idea of how many more linear iterations it requires (not time, comparing time won't be helpful for this exercise.) We also need this information for realistic size problems in 2 or 3 dimensions that you really want to run; for small problems this approach will work ok and give misleading information about what happens for large problems. I suspect the iteration counts will shot up. Can you run some cases and see how the iteration counts change? Based on that we can decide if we still retain "good convergence" by changing the SOR to Jacobi and then change the code to make this change efficient (basically by skipping the explicit computation of the reaction Jacobian terms and using matrix-free on the outside of the PCFIELDSPLIT.) Barry On Aug 28, 2020, at 9:49 AM, Blondel, Sophie via petsc-users > wrote: Hi everyone, I have been using PETSc for a few years with a fully implicit TS ARKIMEX method and am now exploring the matrix free method option. 
Here is the list of PETSc options I typically use: -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 5 -snes_force_iteration -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -fieldsplit_0_pc_type sor -ts_max_snes_failures -1 -pc_fieldsplit_detect_coupling -ts_monitor -pc_type fieldsplit -fieldsplit_1_pc_type redundant -ts_max_steps 100 I started to compare the performance of the code without changing anything of the executable and simply adding "-snes_mf", I see a reduction of memory usage as expected and a benchmark that would usually take ~5min to run now takes ~50min. Reading the documentation I saw that there are a few option to play with the matrix free method like -snes_mf_err, -snes_mf_umin, or switching to -snes_mf_type wp. I used and modified the values of each of these options separately but never saw a sizable change in runtime, is it expected? And are there other ways to make the matrix free method faster? I saw in the documentation that you can define your own per-conditioner for instance. Let me know if you need additional information about the PETSc setup in the application I use. Best, Sophie -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log_1D_1.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log_1D_2.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log_1D_3.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log_1D_4.txt URL: From bsmith at petsc.dev Thu Sep 3 13:21:47 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 3 Sep 2020 13:21:47 -0500 Subject: [petsc-users] Questions about HDF5 parallel I/O In-Reply-To: References: Message-ID: <66EB797E-8ADD-49ED-BC5F-FCA342B1438B@petsc.dev> > On Sep 3, 2020, at 12:19 PM, Sajid Ali wrote: > > Hi PETSc-developers, > > Currently, VecView loads an entire dataset from an hdf5 file into a PETSc vector (with any number of mpi ranks). Given that there is no routine to load a subset of an HDF5 dataset into a PETSc vector, the next best thing is to load the entire data-set into memory and select a smaller region as a sub-vector. Is there an example that demonstrates this ? (Mainly to get an idea on how to select a 2d array from a 3d array using a PETSc IS. Given that it's a regular 3D vector is it best to use a DMDA 3Dvec which gives ownership ranges that may aid with creating the IS ?) Sajid, DMDACreatePatchIS() might be what you need. > > I've seen on earlier threads that XDMF can be used to create a map of where data is present in hdf5 files, is there an example for doing this with regular vectors to select subvectors as described above ? > > Also, is it possible to have different sub-comms read different hdf5 groups ? I think this should be possible if you have each sub-comm separately open the file. Barry > > Thank You, > Sajid Ali | PhD Candidate > Applied Physics > Northwestern University > s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Sep 3 19:46:49 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 3 Sep 2020 20:46:49 -0400 Subject: [petsc-users] VTK format? 
In-Reply-To: <99C78421-D4C9-41A5-BFAF-B9BF8B836A09@petsc.dev> References: <753A4D7D-A9F4-4EF7-8FE7-B765CAAB60D8@petsc.dev> <99C78421-D4C9-41A5-BFAF-B9BF8B836A09@petsc.dev> Message-ID: I do not use it anymore. It can be thrown away. Matt On Thu, Sep 3, 2020 at 12:20 PM Barry Smith wrote: > > > On Sep 3, 2020, at 10:47 AM, Jed Brown wrote: > > I'd have deleted the legacy vtk many years ago, but Matt says he uses it. > > > So he is plotting garbage or he never uses the broken stuff so it can be > errorred out? > > > > On Thu, Sep 3, 2020, at 9:45 AM, Barry Smith wrote: > > > Shouldn't this, "just work". PETSc should not be dumping unreadable > garbage into any files, does the broken PETSc code need to be removed, or > error out until it can be fixed? > > Barry > > > On Sep 3, 2020, at 9:53 AM, Jed Brown wrote: > > Use the xml format (not the legacy format) by naming your file.vtu instead > of file.vtk > > On Thu, Sep 3, 2020, at 8:17 AM, Berend van Wachem wrote: > > Dear PETSc, > > What is the best way to write data from a DMPLEX vector to file, so it > can be viewed with paraview? > I've found that the standard VTK format works for a serial job, but if > there is more than 1 processor, the geometry data gets messed up. > I've attached a small working example for a cylinder and the visualised > geometry with paraview for 1 processors and 4 processors. > Any pointers or "best practice" very much appreciated. > > Best regards, > > Berend. > > > > > *Attachments:* > > - visualisemesh-1proc.png > - visualisemesh-4proc.png > - visualizemesh.c > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Sep 4 00:06:54 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 4 Sep 2020 00:06:54 -0500 Subject: [petsc-users] Matrix Free Method questions In-Reply-To: References: <5BDE8465-76BE-4132-BF4E-6784548AADC0@petsc.dev> <3329269A-EB37-41C9-9698-BA4631A1E18A@petsc.dev> <3E68F0AF-2F7D-4394-894A-3099EC80B9BC@petsc.dev> <600E6AA4-9534-4B39-B7E0-0218AB02E19A@petsc.dev> Message-ID: <60260FA5-BDAE-4F18-8310-D0F3C03B318D@petsc.dev> Sophie, Thanks. I have started looking through the logs The change to matrix-free multiple (from 1 to 2) which reduces the accuracy of the multiply to about half the digits is not surprising. * It roughly doubles the time since doing the matrix-free product requires a function evaluation * It increases the iteration count, but not significantly since the reduced precision of the multiple induces some additional linear iterations The change from 2 to 3 (not storing the entire matrix) * number of nonzeros goes from 49459966 to 1558766 = 3.15 percent so it succeds in not storing the unneeded part of the matrix * the number of MatMult_MF goes from 2331 to 2418. I don't understand this, I expected it to be identical because it should be using the same preconditioner in 3 as in 2 and thus get the same convergence. Could possibility be due to the variability in convergence due to different runs with the matrix-free preconditioner preconditioner and not related to not storing the entire matrix. 
* the KSPSolve() time goes from 3.8774e+0 to 3.7855e+02 a trivial difference which is what I would expect * the SNESSolve time goes from 5.0047e+02 to 4.3275e+02 about a 14 percent drop which is reasonable because 3 doesn't spend as much time inserting matrix values (it still computes them but doesn't insert the ones we don't want for the preconditioner). The change from 3 to 4 * something goes seriously wrong here. The total number of linear solve iterations goes from 2282 to 97403 so something has gone seriously wrong with the preconditioner, but since the preconditioner operations are the same it seems something has gone wrong with the new reduced preconditioner. I think there is an error in computing the reduced matrix entries, that is the new compute Jacobian code is not computing the entries it needs to correctly. To debug this you can run case 3 and case 4 for a single time step with -ksp_view_pmat binary This should create a binary file with the initial Jacobian matrices in each. You can use Matlab or Python to do the difference in the matrices and see how possibly the new Jacobian computation code is not producing the correct values in some locations. Good luck, Barry > On Sep 3, 2020, at 12:26 PM, Blondel, Sophie wrote: > > Hi Barry, > > Attached are the log files for the 1D case, for each of the 4 steps. I don't know how I did it yesterday but the differences between steps look better today, except for step 4 that takes many more iterations and smaller time steps. > > Cheers, > > Sophie > De : Barry Smith > > Envoy? : mercredi 2 septembre 2020 15:53 > ? : Blondel, Sophie > > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > > Objet : Re: [petsc-users] Matrix Free Method questions > > > >> On Sep 2, 2020, at 1:44 PM, Blondel, Sophie > wrote: >> >> Thank you Barry, >> >> The code ran with your branch but it's much slower than running with the full Jacobian and Jacobi PC subtype (around 10 times slower). It is using less memory as expected. I tried step 2 as well and it's even slower. > > Sophie, > > That is puzzling. It should be using the same matrix in the solver so should be the same speed and the setup time should be a bit better since it does not form the full Jacobian. (We'll get to this later) > >> The TS iteration for step 1 are the same as with full Jacobian. Let me know what I can look at to check if I've done something wrong. > > We need to see if the KSP iterations are pretty similar for four approaches (1) original code with Jacobi PC subtype (2) matrix free with Jacobi PC (just add -snes_mf_operator to case 1) (3) the new code with the MatSetOption() to not store the entire Jacobian also with the -snes_mf_operator and (4) the new code that doesn't compute the unneeded part of the Jacobian also with the -snes_mf_operator > > You could run each case with same 20 timesteps and -ts_monitor -ksp_monitor and -ts_view and send the four output files around. > > Once we are sure the four cases are behaving as expected then you can get timings for them but let's not do that until we confirm the similar results. There could easily be a flaw in my reasoning or the PETSc code somewhere that affects the correctness so its best to check that first. > > > Barry > >> >> Cheers, >> >> Sophie >> De : Barry Smith > >> Envoy? : mardi 1 septembre 2020 14:12 >> ? 
: Blondel, Sophie > >> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >> Objet : Re: [petsc-users] Matrix Free Method questions >> >> >> Sophie, >> >> Sorry, looks like an old bug in PETSc that was undetected due to lack of use. The code is trying to use the first of the two matrices to determine the preconditioner which won't work in your case since it is matrix-free. It should be using the second matrix. >> >> I hope the branch barry/2020-09-01/fix-fieldsplit-mf resolves this issue for you. >> >> Barry >> >> >>> On Sep 1, 2020, at 12:45 PM, Blondel, Sophie > wrote: >>> >>> Hi Barry, >>> >>> I'm working through step 1) but I think I am doing something wrong. I'm using DMDASetBlockFillsSparse to set the non-zeros only for the diffusing clusters (small He clusters here, from size 1 to 7) and all the diagonal entries. Then I added a few lines in the code: >>> Mat mat; >>> DMCreateMatrix(da, &mat); >>> MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); >>> >>> When I try to run with the following options: -snes_mf_operator -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 2 -snes_force_iteration -pc_fieldsplit_detect_coupling -pc_type fieldsplit -fieldsplit_0_pc_type jacobi -fieldsplit_1_pc_type redundant -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -ts_max_snes_failures -1 -ts_monitor -ts_max_steps 20 >>> >>> I get an error: >>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>> [0]PETSC ERROR: No support for this operation for this object type >>> [0]PETSC ERROR: Matrix type mffd does not have a find off block diagonal entries defined >>> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >>> [0]PETSC ERROR: Petsc Development GIT revision: v3.13.4-851-gde18fec8da GIT Date: 2020-08-28 16:47:50 +0000 >>> [0]PETSC ERROR: Unknown Name on a 20200828 named sophie-Precision-5530 by sophie Tue Sep 1 10:58:44 2020 >>> [0]PETSC ERROR: Configure options PETSC_DIR=/home/sophie/Code/petsc PETSC_ARCH=20200828 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no --with-shared-libraries >>> [0]PETSC ERROR: #1 MatFindOffBlockDiagonalEntries() line 9847 in /home/sophie/Code/petsc/src/mat/interface/matrix.c >>> [0]PETSC ERROR: #2 PCFieldSplitSetDefaults() line 504 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c >>> [0]PETSC ERROR: #3 PCSetUp_FieldSplit() line 606 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c >>> [0]PETSC ERROR: #4 PCSetUp() line 1009 in /home/sophie/Code/petsc/src/ksp/pc/interface/precon.c >>> [0]PETSC ERROR: #5 KSPSetUp() line 406 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c >>> [0]PETSC ERROR: #6 KSPSolve_Private() line 658 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c >>> [0]PETSC ERROR: #7 KSPSolve() line 889 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c >>> [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in /home/sophie/Code/petsc/src/snes/impls/ls/ls.c >>> [0]PETSC ERROR: #9 SNESSolve() line 4524 in /home/sophie/Code/petsc/src/snes/interface/snes.c >>> [0]PETSC ERROR: #10 TSStep_ARKIMEX() line 811 in /home/sophie/Code/petsc/src/ts/impls/arkimex/arkimex.c >>> [0]PETSC ERROR: #11 TSStep() line 3731 in /home/sophie/Code/petsc/src/ts/interface/ts.c >>> [0]PETSC ERROR: #12 TSSolve() line 4128 in /home/sophie/Code/petsc/src/ts/interface/ts.c >>> PetscSolver::solve: TSSolve failed. 
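To make the reduced-sparsity idea above concrete, a minimal sketch of the fill-pattern call, shown here with the dense variant DMDASetBlockFills() and a tiny component count (the real application would use DMDASetBlockFillsSparse() since it has thousands of components; all names and sizes are illustrative only):

    #define NDOF 4                      /* pretend component 0 is the only diffusing species */
    PetscInt       dfill[NDOF*NDOF] = {0}, ofill[NDOF*NDOF] = {0};
    PetscInt       i;
    Mat            J;
    PetscErrorCode ierr;

    for (i = 0; i < NDOF; i++) dfill[i*NDOF + i] = 1;  /* keep every diagonal entry              */
    ofill[0*NDOF + 0] = 1;                             /* only component 0 couples to neighbors  */
    ierr = DMDASetBlockFills(da, dfill, ofill);CHKERRQ(ierr);   /* must precede DMCreateMatrix()  */
    ierr = DMCreateMatrix(da, &J);CHKERRQ(ierr);
    ierr = MatSetOption(J, MAT_NEW_NONZERO_LOCATIONS, PETSC_FALSE);CHKERRQ(ierr);

The two lines that fill dfill/ofill are only meant to show the mechanics; the diagonal block would in practice also keep whichever within-cell couplings are wanted in the preconditioner.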
>>> >>> Cheers, >>> >>> Sophie >>> De : Barry Smith > >>> Envoy? : lundi 31 ao?t 2020 14:50 >>> ? : Blondel, Sophie > >>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>> Objet : Re: [petsc-users] Matrix Free Method questions >>> >>> >>> Sophie, >>> >>> Thanks. >>> >>> The factor of 4 is lot, the 1.5 not so bad. >>> >>> You will definitely want to retain the full matrix assembly codes for speed and to verify a reduced matrix version. >>> >>> It is worth trying a "reduced matrix version" with matrix-free multiply based on these numbers. This reduced matrix Jacobian will only have the diagonals and all the terms connected to the cluster sizes that move. In other words you will be building just the part of the Jacobian needed for the new preconditioner (PC subtype for Jacobi) and doing the matrix-vector product matrix free. (SOR requires all the Jacobian entries). >>> >>> Fortunately this is hopefully pretty straightforward for this code. You will not have to change the structure of the main code at all. >>> >>> Step 1) create a new "sparse matrix" that will be passed to DMDASetBlockFillsSparse(). This new "sparse matrix" needs to retain all the diagonal entries and also all the entries that are associated with the variables that diffuse. If I remember correctly these are just the smallest cluster size, plain Helium? >>> >>> Call MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); >>> >>> Then you would run the code with -snes_mf_operator and the new PC subtype for Jacobi. >>> >>> A test that the new reduced Jacobian is correct will be that you get almost the same iterations as the runs you just make using the PC subtype of Jacobi. Hopefully not slower and using a great deal less memory. The iterations will not be identical because of the matrix-free multiple. >>> >>> Step 2) create a new version of the Jacobian computation routine. This routine should only compute the elements of the Jacobian needed for this reduced matrix Jacobian, so the diagonals and the diffusion/convection terms. >>> >>> Again run with with -snes_mf_operator and the new PC subtype for Jacobi and you should again get the same convergence history. >>> >>> I made two steps because it makes it easier to validate and debug to get the same results as before. The first step cheats in that it still computes the full Jacobian but ignores the entries that we don't need to store for the preconditioner. The second step is more efficient because it only computes the Jacobian entries needed for the preconditioner but it requires you going through the Jacobian code and making sure only the needed parts are computed. >>> >>> >>> If you have any questions please let me know. >>> >>> Barry >>> >>> >>> >>> >>>> On Aug 31, 2020, at 1:13 PM, Blondel, Sophie > wrote: >>>> >>>> Hi Barry, >>>> >>>> I ran the 2 cases to look at the effect of the Jacobi pre-conditionner: >>>> 1D with 200 grid points and 7759 DOF per grid point (for the PSI application), for 20 TS: the factor between SOR and Jacobi is ~4 (976 MatMult for SOR and 4162 MatMult for Jacobi) >>>> 2D with 63x63 grid points and 4124 DOF per grid point (for the NE application), for 20 TS: the factor is 1.5 (6657 for SOR, 10379 for Jacobi) >>>> Cheers, >>>> >>>> Sophie >>>> De : Barry Smith > >>>> Envoy? : vendredi 28 ao?t 2020 18:31 >>>> ? 
: Blondel, Sophie > >>>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>> Objet : Re: [petsc-users] Matrix Free Method questions >>>> >>>> >>>> >>>>> On Aug 28, 2020, at 4:11 PM, Blondel, Sophie > wrote: >>>>> >>>>> Thank you Jed and Barry, >>>>> >>>>> First, attached are the logs from the benchmark runs I did without (log_std.txt) and with MF method (log_mf.txt). It took me some trouble to get the -log_view to work because I'm using push and pop for the options which means that PETSc is initialized with no argument so the command line argument was not taken into account, but I guess this is for a separate discussion. >>>>> >>>>> To answer questions about the current per-conditioners: >>>>> I used the same pre-conditioner options as listed in my previous email when I added the -snes_mf option; I did try to remove all the PC related options at one point with the MF method but didn't see a change in runtime so I put them back in >>>>> this benchmark is for a 1D DMDA using 20 grid points; when running in 2D or 3D I switch the PC options to: -pc_type fieldsplit -fieldsplit_0_pc_type sor -fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type gmres -ksp_type fgmres -fieldsplit_1_pc_gamg_threshold -1 >>>>> I haven't tried a Jacobi PC instead of SOR, I will run a set of more realistic runs (1D and 2D) without MF but with Jacobi and report on it next week. When you say "iterations" do you mean what is given by -ksp_monitor? >>>> >>>> Yes, the number of MatMult is a good enough surrogate. >>>> >>>> So using matrix-free (which means no preconditioning) has >>>> >>>> 35846/160 >>>> >>>> ans = >>>> >>>> 224.0375 >>>> >>>> or 224 as many iterations. So even for this modest 1d problem preconditioning is doing a great deal. >>>> >>>> Barry >>>> >>>> >>>> >>>>> >>>>> Cheers, >>>>> >>>>> Sophie >>>>> De : Barry Smith > >>>>> Envoy? : vendredi 28 ao?t 2020 12:12 >>>>> ? : Blondel, Sophie > >>>>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>>> Objet : Re: [petsc-users] Matrix Free Method questions >>>>> >>>>> [External Email] >>>>> >>>>> Sophie, >>>>> >>>>> This is exactly what i would expect. If you run with -ksp_monitor you will see the -snes_mf run takes many more iterations. >>>>> >>>>> I am puzzled that the argument -pc_type fieldsplit did not stop the run since this is under normal circumstances not a viable preconditioner with -snes_mf. Did you also remove the -pc_type fieldsplit argument? >>>>> >>>>> In order to see how one can avoid forming the entire matrix and use matrix-free to do the matrix-vector but still have an effective preconditioner let's look at what the current preconditioner options do. >>>>> >>>>>> -pc_fieldsplit_detect_coupling >>>>> >>>>> creates two sub-preconditioners, the first for all the variables and the second for those that are coupled by the matrix to variables in neighboring cells Since only the smallest cluster sizes have diffusion/advection this second set contains only the cluster size one variables. >>>>> >>>>>> -fieldsplit_0_pc_type sor >>>>> >>>>> Runs SOR on all the variables; you can think of this as running SOR on the reactions, it is a pretty good preconditioner for the reactions since the reactions are local, per cell. >>>>> >>>>>> -fieldsplit_1_pc_type redundant >>>>> >>>>> >>>>> This runs the default preconditioner (ILU) on just the variables that diffuse, i.e. the elliptic part. 
For smallish problems this is fine, for larger problems and 2d and 3d presumably you have also -redundant_pc_type gamg to use algebraic multigrid for the diffusion. This part of the matrix will always need to be formed and used in the preconditioner. It is very important since the diffusion is what brings in most of the ill-conditioning for larger problems into the linear system. Note that it only needs the matrix entries for the cluster size of 1 so it is very small compared to the entire sparse matrix. >>>>> >>>>> ---- >>>>> The first preconditioner SOR requires ALL the matrix entries which are almost all (except for the diffusion terms) the coupling between different size clusters within a cell. Especially each cell has its own sparse matrix of the size of total number of clusters, it is sparse but not super sparse. >>>>> >>>>> So the to significantly lower memory usage we need to remove the SOR and the storing of all the matrix entries but still have an efficient preconditioner for the "reaction" terms. >>>>> >>>>> The simplest thing would be to use Jacobi instead of SOR for the first subpreconditioner since it only requires the diagonal entries in the matrix. But Jacobi is a worse preconditioner than SOR (since it totally ignores the matrix coupling) and sometimes can be much worse. >>>>> >>>>> Before anyone writes additional code we need to know if doing something along these lines does not ruin the convergence that. >>>>> >>>>> Have you used the same options as before but with -fieldsplit_0_pc_type jacobi ? (Not using any matrix free). We need to get an idea of how many more linear iterations it requires (not time, comparing time won't be helpful for this exercise.) We also need this information for realistic size problems in 2 or 3 dimensions that you really want to run; for small problems this approach will work ok and give misleading information about what happens for large problems. >>>>> >>>>> I suspect the iteration counts will shot up. Can you run some cases and see how the iteration counts change? >>>>> >>>>> Based on that we can decide if we still retain "good convergence" by changing the SOR to Jacobi and then change the code to make this change efficient (basically by skipping the explicit computation of the reaction Jacobian terms and using matrix-free on the outside of the PCFIELDSPLIT.) >>>>> >>>>> Barry >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>> On Aug 28, 2020, at 9:49 AM, Blondel, Sophie via petsc-users > wrote: >>>>>> >>>>>> Hi everyone, >>>>>> >>>>>> I have been using PETSc for a few years with a fully implicit TS ARKIMEX method and am now exploring the matrix free method option. Here is the list of PETSc options I typically use: -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 5 -snes_force_iteration -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -fieldsplit_0_pc_type sor -ts_max_snes_failures -1 -pc_fieldsplit_detect_coupling -ts_monitor -pc_type fieldsplit -fieldsplit_1_pc_type redundant -ts_max_steps 100 >>>>>> >>>>>> I started to compare the performance of the code without changing anything of the executable and simply adding "-snes_mf", I see a reduction of memory usage as expected and a benchmark that would usually take ~5min to run now takes ~50min. Reading the documentation I saw that there are a few option to play with the matrix free method like -snes_mf_err, -snes_mf_umin, or switching to -snes_mf_type wp. 
I used and modified the values of each of these options separately but never saw a sizable change in runtime, is it expected? >>>>>> >>>>>> And are there other ways to make the matrix free method faster? I saw in the documentation that you can define your own per-conditioner for instance. Let me know if you need additional information about the PETSc setup in the application I use. >>>>>> >>>>>> Best, >>>>>> >>>>>> Sophie >>>>> >>>>> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thibault.bridelbertomeu at gmail.com Fri Sep 4 10:09:43 2020 From: thibault.bridelbertomeu at gmail.com (Thibault Bridel-Bertomeu) Date: Fri, 4 Sep 2020 17:09:43 +0200 Subject: [petsc-users] TS tutorial 11 and periodic DM Message-ID: Dear all, I am trying to figure out a working set of options for TS tuto ex11 involving the 3D Euler Linear Wave test. I cannot seem to go anywhere as it keeps crashing saying the physics did not compute the wave speed ... anyone could advise ? It is not however completely about ex11. I am trying to figure out how to set up periodic boundary conditions for a box mesh. I tried to get some infos here and there following examples : I got that you have to set DM_BOUNDARY_PERIODIC for the dimensions that you want in the array that you pass to DM Create Box Mesh, but i am not clear whether I have to do anything else after. Do I have to call DM Set Periodicity ? I also saw sometimes calls to DMLocalizeCoordinates ? ... ? And are there any constraints regarding the number of cells in the periodic dimensions ? I would like to just have 1 cell in Y dimension for instance to have a pseudo 1D computation, is that ok ? Thank you for your help, Thibault -- Thibault Bridel-Bertomeu ? Eng, MSc, PhD Research Engineer CEA/CESTA 33114 LE BARP Tel.: (+33)557046924 Mob.: (+33)611025322 Mail: thibault.bridelbertomeu at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Sep 4 10:43:45 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 4 Sep 2020 11:43:45 -0400 Subject: [petsc-users] TS tutorial 11 and periodic DM In-Reply-To: References: Message-ID: On Fri, Sep 4, 2020 at 11:09 AM Thibault Bridel-Bertomeu < thibault.bridelbertomeu at gmail.com> wrote: > Dear all, > > I am trying to figure out a working set of options for TS tuto ex11 > involving the 3D Euler Linear Wave test. I cannot seem to go anywhere as it > keeps crashing saying the physics did not compute the wave speed ... anyone > could advise ? > Does the shock test run for you? test: suffix: shock_0 requires: p4est !single !complex args: -ufv_vtk_interval 0 -monitor density,energy -f -grid_size 2,1 -grid_bounds -1,1.,0.,1 -bc_wall 1,2,3,4 -dm_type p4est -dm_forest_partition_overlap 1 -dm_forest_maximum_refinement 6 -dm_forest_minimum_refinement 2 -dm_forest_initial_refinement 2 -ufv_use_amr -refine_vec_tagger_box 0.5,inf -coarsen_vec_tagger_box 0,1.e-2 -refine_tag_view -coarsen_tag_view -physics euler -eu_type iv_shock -ufv_cfl 10 -eu_alpha 60. -grid_skew_60 -eu_gamma 1.4 -eu_amach 2.02 -eu_rho2 3. -petscfv_type leastsquares -petsclimiter_type minmod -petscfv_compute_gradients 0 -ts_max_time 0.5 -ts_ssp_type rks2 -ts_ssp_nstages 10 -ufv_vtk_basename ${wPETSC_DIR}/ex11 > It is not however completely about ex11. I am trying to figure out how to > set up periodic boundary conditions for a box mesh. 
I tried to get some > infos here and there following examples : I got that you have to set > DM_BOUNDARY_PERIODIC for the dimensions that you want in the array that you > pass to DM Create Box Mesh, but i am not clear whether I have to do > anything else after. Do I have to call DM Set Periodicity ? I also saw > sometimes calls to DMLocalizeCoordinates ? ... ? > DMSetPeriodicity() will be called by CreateBoxMesh(). You will need to call DMLocalizeCoordinates(). > And are there any constraints regarding the number of cells in the > periodic dimensions ? I would like to just have 1 cell in Y dimension for > instance to have a pseudo 1D computation, is that ok ? > Yes. 1 cell will not work right now. We now have everything in place to make it work, but have not put the code into CreateBoxMesh() to do it. For now, you can just put 3 cells in this direction. If you make an Issue, I will fix it as soon as I can. Thanks, Matt > Thank you for your help, > > Thibault > -- > Thibault Bridel-Bertomeu > ? > Eng, MSc, PhD > Research Engineer > CEA/CESTA > 33114 LE BARP > Tel.: (+33)557046924 > Mob.: (+33)611025322 > Mail: thibault.bridelbertomeu at gmail.com > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlohry at gmail.com Sat Sep 5 07:54:27 2020 From: mlohry at gmail.com (Mark Lohry) Date: Sat, 5 Sep 2020 08:54:27 -0400 Subject: [petsc-users] Bus Error In-Reply-To: <4463F108-D33B-46C2-80BC-EDEBB3BBE140@petsc.dev> References: <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> <2C14E111-0ABC-4322-AD1C-FC83E4BD4798@petsc.dev> <87h7ssgg0g.fsf@jedbrown.org> <80DC4DC1-8D9B-43BF-9947-F47694FE68E9@petsc.dev> <87blj0gew3.fsf@jedbrown.org> <878se4gdd6.fsf@jedbrown.org> <3B4D8471-F863-49C8-AAD7-4A4ACA3BE78A@petsc.dev> <87364cgczg.fsf@jedbrown.org> <79E082F4-0261-4F32-9781-861B2B650511@petsc.dev> <87y2m3g7mp.fsf@jedbrown.org> <1BA78983-882E-404D-983D-B432D17E6421@petsc.dev> <87a6yjg3o5.fsf@jedbrown.org> <9EEB2628-D6ED-4466-A629-33EAC73BCE4C@petsc.dev> <386068BC-7972-455E-A9E8-C09F9DCF58BD@petsc.dev> <4463F108-D33B-46C2-80BC-EDEBB3BBE140@petsc.dev> Message-ID: Root cause update: failing DIMMs. Kinda boring, but there it is. On Thu, Aug 27, 2020 at 5:34 PM Barry Smith wrote: > > Mark, > > No problem, we'll have a few more automatic checks in PETSc due to this > to help everyone in the future debug these difficult situations a little > easier. > > Barry > > > On Aug 27, 2020, at 3:26 PM, Mark Lohry wrote: > > Alright, this time it crashed with a bus error before petsc had even been > initialized or anything in blas had ever been called. I'm told there was > also a known network failure on this cluster a few days ago that took out > one rack, so now I'm reasonably convinced there are legitimate hardware > faults elsewhere. > > Looking like a wild goose chase on the software side, but all the help is > hugely appreciated. > > On Thu, Aug 27, 2020 at 10:52 AM Barry Smith wrote: > >> >> Thanks, >> >> So this means that all the double precision array pointers that PETSc >> is passing into these BLAS calls are addressable. Which means nothing has >> corrupted any of these pointers before the calls. >> >> What my patch did. 
>> Before each BLAS call, for each double-precision array argument it set a
>> special exception handler and then accessed the first entry in the array.
>> Since the exception handler was never called, this means that the first
>> entry of each array was accessible and would not produce a SEGV or SIGBUS.
>>
>> What else could be corrupted?
>>
>> 1) The size arguments passed to the BLAS calls: if they were too large,
>> they could result in accessing incorrect memory, but IMHO that would
>> usually produce a SEGV, not a SIGBUS. It is hard to put a check in the
>> code because these sizes are problem-dependent and there is no way to
>> know if they are wrong.
>>
>> 2) Corruption of the stack?
>>
>> 3) A hardware issue due to overheating, bad memory, etc. I assume the MPI
>> rank that crashes changes for each crashing run. I am adding code to our
>> patch branch to print the node name, which hopefully is constant for all
>> runs; then one can see if the problem is always on the same node. Patch
>> attached.
>>
>> Can you try with a very different BLAS implementation? What are you
>> using now?
>>
>> For example, you could configure PETSc with --download-f2cblaslapack, or
>> if you are using MKL switch to non-MKL, or if you are using the system
>> BLAS switch to MKL.
>>
>> Barry
>>
>> We can also replace the BLAS calls with direct C and see what happens,
>> but let's only do that after you try a different BLAS.
>>
>> On Aug 27, 2020, at 8:53 AM, Mark Lohry wrote:
>>
>> It was built with --with-debugging=1
>>
>> On Thu, Aug 27, 2020 at 9:44 AM Barry Smith wrote:
>>
>>>
>>> Mark,
>>>
>>> Did I tell you that this has to be built with the configure option
>>> --with-debugging=1 and won't be turned off with --with-debugging=0?
>>>
>>> Barry
>>>
>>>
>>> On Aug 27, 2020, at 8:10 AM, Mark Lohry wrote:
>>>
>>> Barry, no output from that patch I'm afraid:
>>>
>>> 54 KSP Residual norm 3.215013886664e+03
>>> 55 KSP Residual norm 3.049105434513e+03
>>> 56 KSP Residual norm 2.859123916860e+03
>>> [929]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [929]PETSC ERROR: Caught signal number 7 BUS: Bus Error, possibly
>>> illegal memory access
>>> [929]PETSC ERROR: Try option -start_in_debugger or
>>> -on_error_attach_debugger
>>> [929]PETSC ERROR: or see
>>> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>> [929]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac
>>> OS X to find memory corruption errors
>>> [929]PETSC ERROR: likely location of problem given in stack below
>>> [929]PETSC ERROR: --------------------- Stack Frames
>>> ------------------------------------
>>> [929]PETSC ERROR: Note: The EXACT line numbers in the stack are not
>>> available,
>>> [929]PETSC ERROR: INSTEAD the line number of the start of the
>>> function
>>> [929]PETSC ERROR: is given.
>>> [929]PETSC ERROR: [929] BLASgemv line 1406 >>> /home/mlohry/petsc/src/mat/impls/baij/seq/baijfact.c >>> [929]PETSC ERROR: [929] MatSolve_SeqBAIJ_N_NaturalOrdering line 1378 >>> /home/mlohry/petsc/src/mat/impls/baij/seq/baijfact.c >>> [929]PETSC ERROR: [929] MatSolve line 3354 >>> /home/mlohry/petsc/src/mat/interface/matrix.c >>> [929]PETSC ERROR: [929] PCApply_ILU line 201 >>> /home/mlohry/petsc/src/ksp/pc/impls/factor/ilu/ilu.c >>> [929]PETSC ERROR: [929] PCApply line 426 >>> /home/mlohry/petsc/src/ksp/pc/interface/precon.c >>> [929]PETSC ERROR: [929] KSP_PCApply line 279 >>> /home/mlohry/petsc/include/petsc/private/kspimpl.h >>> [929]PETSC ERROR: [929] KSPSolve_PREONLY line 16 >>> /home/mlohry/petsc/src/ksp/ksp/impls/preonly/preonly.c >>> [929]PETSC ERROR: [929] KSPSolve_Private line 590 >>> /home/mlohry/petsc/src/ksp/ksp/interface/itfunc.c >>> [929]PETSC ERROR: [929] KSPSolve line 848 >>> /home/mlohry/petsc/src/ksp/ksp/interface/itfunc.c >>> [929]PETSC ERROR: [929] PCApply_ASM line 441 >>> /home/mlohry/petsc/src/ksp/pc/impls/asm/asm.c >>> [929]PETSC ERROR: [929] PCApply line 426 >>> /home/mlohry/petsc/src/ksp/pc/interface/precon.c >>> [929]PETSC ERROR: [929] KSP_PCApply line 279 >>> /home/mlohry/petsc/include/petsc/private/kspimpl.h >>> srun: Job step aborted: Waiting up to 47 seconds for job step to finish. >>> [929]PETSC ERROR: [929] KSPFGMRESCycle line 108 >>> /home/mlohry/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c >>> [929]PETSC ERROR: [929] KSPSolve_FGMRES line 274 >>> /home/mlohry/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c >>> [929]PETSC ERROR: [929] KSPSolve_Private line 590 >>> /home/mlohry/petsc/src/ksp/ksp/interface/itfunc.c >>> >>> On Mon, Aug 24, 2020 at 6:47 PM Mark Lohry wrote: >>> >>>> I don't think I do. Running a much smaller case with the same models I >>>> get the attached report from valgrind --show-leak-kinds=all >>>> --leak-check=full --track-origins=yes. I only see some HDF5 stuff and >>>> OpenMPI that I think are false positives. >>>> >>>> ==1286950== Memcheck, a memory error detector >>>> ==1286950== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et >>>> al. >>>> ==1286950== Using Valgrind-3.15.0-608cb11914-20190413 and LibVEX; rerun >>>> with -h for copyright info >>>> ==1286950== Command: ./verification_testing >>>> --gtest_filter=DrivenCavity3D.Re100_BackwardEulerILU1_16x16N2_Quadrature1 >>>> --petsc_time_integrator=arkimex --petsc_arkimex_type=l2 >>>> ==1286950== Parent PID: 1286932 >>>> ==1286950== >>>> --1286950-- >>>> --1286950-- Valgrind options: >>>> --1286950-- --show-leak-kinds=all >>>> --1286950-- --leak-check=full >>>> --1286950-- --track-origins=yes >>>> --1286950-- --log-file=valgrind-out.txt >>>> --1286950-- -v >>>> --1286950-- Contents of /proc/version: >>>> --1286950-- Linux version 5.4.0-29-generic (buildd at lgw01-amd64-035) >>>> (gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)) #33-Ubuntu SMP Wed Apr 29 >>>> 14:32:27 UTC 2020 >>>> --1286950-- >>>> --1286950-- Arch and hwcaps: AMD64, LittleEndian, >>>> amd64-cx16-rdtscp-sse3-ssse3-avx >>>> --1286950-- Page sizes: currently 4096, max supported 4096 >>>> --1286950-- Valgrind library directory: >>>> /usr/lib/x86_64-linux-gnu/valgrind >>>> --1286950-- Reading syms from >>>> /home/mlohry/dev/cmake-build/verification_testing >>>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/ld-2.31.so >>>> --1286950-- Considering /usr/lib/x86_64-linux-gnu/ld-2.31.so .. >>>> --1286950-- .. 
CRC mismatch (computed 387b17ea wanted d28cf5ef) >>>> --1286950-- Considering /lib/x86_64-linux-gnu/ld-2.31.so .. >>>> --1286950-- .. CRC mismatch (computed 387b17ea wanted d28cf5ef) >>>> --1286950-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ >>>> ld-2.31.so .. >>>> --1286950-- .. CRC is valid >>>> --1286950-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/valgrind/memcheck-amd64-linux >>>> --1286950-- object doesn't have a symbol table >>>> --1286950-- object doesn't have a dynamic symbol table >>>> --1286950-- Scheduler: using generic scheduler lock implementation. >>>> --1286950-- Reading suppressions file: >>>> /usr/lib/x86_64-linux-gnu/valgrind/default.supp >>>> ==1286950== embedded gdbserver: reading from >>>> /tmp/vgdb-pipe-from-vgdb-to-1286950-by-mlohry-on-??? >>>> ==1286950== embedded gdbserver: writing to >>>> /tmp/vgdb-pipe-to-vgdb-from-1286950-by-mlohry-on-??? >>>> ==1286950== embedded gdbserver: shared mem >>>> /tmp/vgdb-pipe-shared-mem-vgdb-1286950-by-mlohry-on-??? >>>> ==1286950== >>>> ==1286950== TO CONTROL THIS PROCESS USING vgdb (which you probably >>>> ==1286950== don't want to do, unless you know exactly what you're doing, >>>> ==1286950== or are doing some strange experiment): >>>> ==1286950== /usr/lib/x86_64-linux-gnu/valgrind/../../bin/vgdb >>>> --pid=1286950 ...command... >>>> ==1286950== >>>> ==1286950== TO DEBUG THIS PROCESS USING GDB: start GDB like this >>>> ==1286950== /path/to/gdb ./verification_testing >>>> ==1286950== and then give GDB the following command >>>> ==1286950== target remote | >>>> /usr/lib/x86_64-linux-gnu/valgrind/../../bin/vgdb --pid=1286950 >>>> ==1286950== --pid is optional if only one valgrind process is running >>>> ==1286950== >>>> --1286950-- REDIR: 0x4022d80 (ld-linux-x86-64.so.2:strlen) redirected >>>> to 0x580c9ce2 (???) >>>> --1286950-- REDIR: 0x4022b50 (ld-linux-x86-64.so.2:index) redirected to >>>> 0x580c9cfc (???) >>>> --1286950-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_core-amd64-linux.so >>>> --1286950-- object doesn't have a symbol table >>>> --1286950-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so >>>> --1286950-- object doesn't have a symbol table >>>> ==1286950== WARNING: new redirection conflicts with existing -- >>>> ignoring it >>>> --1286950-- old: 0x04022d80 (strlen ) R-> (0000.0) >>>> 0x580c9ce2 ??? 
>>>> --1286950-- new: 0x04022d80 (strlen ) R-> (2007.0) >>>> 0x0483f060 strlen >>>> --1286950-- REDIR: 0x401f560 (ld-linux-x86-64.so.2:strcmp) redirected >>>> to 0x483ffd0 (strcmp) >>>> --1286950-- REDIR: 0x40232e0 (ld-linux-x86-64.so.2:mempcpy) redirected >>>> to 0x4843a20 (mempcpy) >>>> --1286950-- Reading syms from >>>> /home/mlohry/dev/cmake-build/initialization/libinitialization.so >>>> --1286950-- Reading syms from >>>> /home/mlohry/dev/cmake-build/governing_equations/libgoverning_equations.so >>>> --1286950-- Reading syms from >>>> /home/mlohry/dev/cmake-build/time_stepping/libtime_stepping.so >>>> --1286950-- Reading syms from >>>> /home/mlohry/dev/cmake-build/governing_equations/libboundary_conditions.so >>>> --1286950-- Reading syms from >>>> /home/mlohry/dev/cmake-build/governing_equations/libsolution_monitors.so >>>> --1286950-- Reading syms from >>>> /home/mlohry/dev/cmake-build/governing_equations/libfluxtypes.so >>>> --1286950-- Reading syms from >>>> /home/mlohry/dev/cmake-build/algebraic_solvers/libalgebraic_solvers.so >>>> --1286950-- Reading syms from >>>> /home/mlohry/dev/cmake-build/program_options/libprogram_options.so >>>> --1286950-- Reading syms from >>>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_filesystem.so.1.73.0 >>>> --1286950-- Reading syms from >>>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0 >>>> --1286950-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so.40.20.1 >>>> --1286950-- object doesn't have a symbol table >>>> --1286950-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3 >>>> --1286950-- object doesn't have a symbol table >>>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/ >>>> libpthread-2.31.so >>>> --1286950-- Considering >>>> /usr/lib/debug/.build-id/77/5cbbfff814456660786780b0b3b40096b4c05e.debug .. >>>> --1286950-- .. build-id is valid >>>> --1286948-- Reading syms from >>>> /home/mlohry/dev/cmake-build/external/petsc/arch-linux2-c-opt/lib/libpetsc.so.3.13.3 >>>> --1286937-- Reading syms from >>>> /home/mlohry/dev/cmake-build/parallel/libparallel.so >>>> --1286937-- Reading syms from >>>> /home/mlohry/dev/cmake-build/logger/liblogger.so >>>> --1286937-- Reading syms from >>>> /home/mlohry/dev/cmake-build/spatial_discretization/libdiscretization.so >>>> --1286945-- Reading syms from >>>> /home/mlohry/dev/cmake-build/utils/libutils.so >>>> --1286944-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28 >>>> --1286938-- object doesn't have a symbol table >>>> --1286949-- Reading syms from /usr/lib/x86_64-linux-gnu/libm-2.31.so >>>> --1286949-- Considering /usr/lib/x86_64-linux-gnu/libm-2.31.so .. >>>> --1286947-- .. CRC mismatch (computed 327d785f wanted 751f5509) >>>> --1286947-- Considering /lib/x86_64-linux-gnu/libm-2.31.so .. >>>> --1286938-- .. CRC mismatch (computed 327d785f wanted 751f5509) >>>> --1286937-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ >>>> libm-2.31.so .. >>>> --1286950-- .. CRC is valid >>>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libgcc_s.so.1 >>>> --1286950-- object doesn't have a symbol table >>>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libc-2.31.so >>>> --1286950-- Considering /usr/lib/x86_64-linux-gnu/libc-2.31.so .. >>>> --1286951-- .. CRC mismatch (computed a6f43087 wanted 6555436e) >>>> --1286951-- Considering /lib/x86_64-linux-gnu/libc-2.31.so .. >>>> --1286947-- .. 
CRC mismatch (computed a6f43087 wanted 6555436e) >>>> --1286947-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ >>>> libc-2.31.so .. >>>> --1286950-- .. CRC is valid >>>> --1286940-- Reading syms from >>>> /home/mlohry/dev/cmake-build/file_io/libfileio.so >>>> --1286950-- Reading syms from >>>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_program_options.so.1.73.0 >>>> --1286950-- Reading syms from >>>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_serialization.so.1.73.0 >>>> --1286950-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3 >>>> --1286950-- object doesn't have a symbol table >>>> --1286950-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3 >>>> --1286950-- object doesn't have a symbol table >>>> --1286950-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/libhwloc.so.15.1.0 >>>> --1286950-- object doesn't have a symbol table >>>> --1286950-- Reading syms from >>>> /home/mlohry/dev/cmake-build/external/petsc/arch-linux2-c-opt/lib/libsuperlu_dist.so.6.3.0 >>>> --1286950-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0 >>>> --1286950-- object doesn't have a symbol table >>>> --1286937-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 >>>> --1286937-- object doesn't have a symbol table >>>> --1286947-- Reading syms from /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0 >>>> --1286939-- object doesn't have a symbol table >>>> --1286947-- Reading syms from /usr/lib/x86_64-linux-gnu/libdl-2.31.so >>>> --1286947-- Considering /usr/lib/x86_64-linux-gnu/libdl-2.31.so .. >>>> --1286947-- .. CRC mismatch (computed 4fd191ca wanted df8dd39a) >>>> --1286947-- Considering /lib/x86_64-linux-gnu/libdl-2.31.so .. >>>> --1286947-- .. CRC mismatch (computed 4fd191ca wanted df8dd39a) >>>> --1286947-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ >>>> libdl-2.31.so .. >>>> --1286947-- .. 
CRC is valid >>>> --1286937-- Reading syms from >>>> /home/mlohry/dev/cmake-build/external/petsc/arch-linux2-c-opt/lib/libmetis.so >>>> --1286937-- Reading syms from >>>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_log.so.1.73.0 >>>> --1286942-- Reading syms from >>>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_log_setup.so.1.73.0 >>>> --1286942-- Reading syms from >>>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_thread.so.1.73.0 >>>> --1286942-- Reading syms from >>>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_regex.so.1.73.0 >>>> --1286949-- Reading syms from >>>> /home/mlohry/dev/cmake-build/basis_functions/libbasis_functions.so >>>> --1286944-- Reading syms from /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0 >>>> --1286944-- object doesn't have a symbol table >>>> --1286951-- Reading syms from >>>> /home/mlohry/dev/cmake-build/external_install/lib/libcgns.so >>>> --1286951-- object doesn't have a symbol table >>>> --1286943-- Reading syms from >>>> /home/mlohry/dev/cmake-build/external_install/lib/libhdf5.so.103.1.0 >>>> --1286951-- Reading syms from >>>> /home/mlohry/dev/cmake-build/external/tinyxml2-build/libtinyxml2.so.6.1.0 >>>> --1286944-- Reading syms from >>>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_iostreams.so.1.73.0 >>>> --1286944-- Reading syms from /usr/lib/x86_64-linux-gnu/libz.so.1.2.11 >>>> --1286944-- object doesn't have a symbol table >>>> --1286951-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0 >>>> --1286951-- object doesn't have a symbol table >>>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libutil-2.31.so >>>> --1286946-- Considering /usr/lib/x86_64-linux-gnu/libutil-2.31.so .. >>>> --1286946-- .. CRC mismatch (computed 4639aba5 wanted ceb246b4) >>>> --1286946-- Considering /lib/x86_64-linux-gnu/libutil-2.31.so .. >>>> --1286946-- .. CRC mismatch (computed 4639aba5 wanted ceb246b4) >>>> --1286948-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ >>>> libutil-2.31.so .. >>>> --1286939-- .. CRC is valid >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0 >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/libudev.so.1.6.17 >>>> --1286950-- object doesn't have a symbol table >>>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libltdl.so.7.3.1 >>>> --1286950-- object doesn't have a symbol table >>>> --1286950-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/libgfortran.so.5.0.0 >>>> --1286950-- object doesn't have a symbol table >>>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libxcb.so.1.1.0 >>>> --1286950-- object doesn't have a symbol table >>>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/librt-2.31.so >>>> --1286950-- Considering /usr/lib/x86_64-linux-gnu/librt-2.31.so .. >>>> --1286950-- .. CRC mismatch (computed a9acc0ce wanted cb4905a3) >>>> --1286950-- Considering /lib/x86_64-linux-gnu/librt-2.31.so .. >>>> --1286950-- .. CRC mismatch (computed a9acc0ce wanted cb4905a3) >>>> --1286950-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ >>>> librt-2.31.so .. >>>> --1286950-- .. CRC is valid >>>> --1286950-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/libquadmath.so.0.0.0 >>>> --1286950-- object doesn't have a symbol table >>>> --1286945-- Reading syms from /usr/lib/x86_64-linux-gnu/libXau.so.6.0.0 >>>> --1286945-- Considering /usr/lib/x86_64-linux-gnu/libXau.so.6.0.0 .. >>>> --1286945-- .. 
CRC mismatch (computed 7de9b6ad wanted e8a17129) >>>> --1286945-- Considering /lib/x86_64-linux-gnu/libXau.so.6.0.0 .. >>>> --1286945-- .. CRC mismatch (computed 7de9b6ad wanted e8a17129) >>>> --1286945-- object doesn't have a symbol table >>>> --1286945-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/libXdmcp.so.6.0.0 >>>> --1286942-- object doesn't have a symbol table >>>> --1286942-- Reading syms from /usr/lib/x86_64-linux-gnu/libbsd.so.0.10.0 >>>> --1286942-- object doesn't have a symbol table >>>> --1286950-- REDIR: 0x6516600 (libc.so.6:memmove) redirected to >>>> 0x48331d0 (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x6515900 (libc.so.6:strncpy) redirected to >>>> 0x48331d0 (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x6516930 (libc.so.6:strcasecmp) redirected to >>>> 0x48331d0 (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x6515220 (libc.so.6:strcat) redirected to 0x48331d0 >>>> (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x6515960 (libc.so.6:rindex) redirected to 0x48331d0 >>>> (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x6517dd0 (libc.so.6:rawmemchr) redirected to >>>> 0x48331d0 (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x6532e60 (libc.so.6:wmemchr) redirected to >>>> 0x48331d0 (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x65329a0 (libc.so.6:wcscmp) redirected to 0x48331d0 >>>> (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x6516760 (libc.so.6:mempcpy) redirected to >>>> 0x48331d0 (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x6516590 (libc.so.6:bcmp) redirected to 0x48331d0 >>>> (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x6515890 (libc.so.6:strncmp) redirected to >>>> 0x48331d0 (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x65152d0 (libc.so.6:strcmp) redirected to 0x48331d0 >>>> (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x65166c0 (libc.so.6:memset) redirected to 0x48331d0 >>>> (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x6532960 (libc.so.6:wcschr) redirected to 0x48331d0 >>>> (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x65157f0 (libc.so.6:strnlen) redirected to >>>> 0x48331d0 (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x65153b0 (libc.so.6:strcspn) redirected to >>>> 0x48331d0 (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x6516980 (libc.so.6:strncasecmp) redirected to >>>> 0x48331d0 (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x6515350 (libc.so.6:strcpy) redirected to 0x48331d0 >>>> (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x6516ad0 (libc.so.6:memcpy@@GLIBC_2.14) redirected >>>> to 0x48331d0 (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x65340d0 (libc.so.6:wcsnlen) redirected to >>>> 0x48331d0 (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x65329e0 (libc.so.6:wcscpy) redirected to 0x48331d0 >>>> (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x65159a0 (libc.so.6:strpbrk) redirected to >>>> 0x48331d0 (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x6515280 (libc.so.6:index) redirected to 0x48331d0 >>>> (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x65157b0 (libc.so.6:strlen) redirected to 0x48331d0 >>>> (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x651ed20 (libc.so.6:memrchr) redirected to >>>> 0x48331d0 (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x65169d0 (libc.so.6:strcasecmp_l) redirected to >>>> 0x48331d0 (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x6516550 (libc.so.6:memchr) redirected to 0x48331d0 >>>> (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x6532ab0 (libc.so.6:wcslen) redirected to 0x48331d0 >>>> (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x6515c60 (libc.so.6:strspn) redirected to 0x48331d0 >>>> 
(_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x65168d0 (libc.so.6:stpncpy) redirected to >>>> 0x48331d0 (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x6516870 (libc.so.6:stpcpy) redirected to 0x48331d0 >>>> (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x6517e10 (libc.so.6:strchrnul) redirected to >>>> 0x48331d0 (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x6516a20 (libc.so.6:strncasecmp_l) redirected to >>>> 0x48331d0 (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x6516470 (libc.so.6:strstr) redirected to 0x48331d0 >>>> (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x65a3750 (libc.so.6:__memcpy_chk) redirected to >>>> 0x48331d0 (_vgnU_ifunc_wrapper) >>>> --1286938-- REDIR: 0x6527a30 (libc.so.6:__strrchr_sse2) redirected to >>>> 0x483ea70 (__strrchr_sse2) >>>> --1286938-- REDIR: 0x6511c90 (libc.so.6:calloc) redirected to 0x483dce0 >>>> (calloc) >>>> --1286938-- REDIR: 0x6510260 (libc.so.6:malloc) redirected to 0x483b780 >>>> (malloc) >>>> --1286938-- REDIR: 0x6531c40 (libc.so.6:memcpy at GLIBC_2.2.5) redirected >>>> to 0x4840100 (memcpy at GLIBC_2.2.5) >>>> --1286938-- REDIR: 0x6527d30 (libc.so.6:__strlen_sse2) redirected to >>>> 0x483efa0 (__strlen_sse2) >>>> --1286938-- REDIR: 0x65f4ac0 (libc.so.6:__strncmp_sse42) redirected to >>>> 0x483f7c0 (__strncmp_sse42) >>>> --1286938-- REDIR: 0x6510850 (libc.so.6:free) redirected to 0x483c9d0 >>>> (free) >>>> --1286938-- REDIR: 0x6532070 (libc.so.6:__memset_sse2_unaligned) >>>> redirected to 0x48428e0 (memset) >>>> --1286938-- REDIR: 0x6603350 (libc.so.6:__memcmp_sse4_1) redirected to >>>> 0x4842150 (__memcmp_sse4_1) >>>> --1286938-- REDIR: 0x6520520 (libc.so.6:__strcmp_sse2_unaligned) >>>> redirected to 0x483fed0 (strcmp) >>>> --1286938-- REDIR: 0x61d0c10 (libstdc++.so.6:operator new(unsigned >>>> long)) redirected to 0x483bdf0 (operator new(unsigned long)) >>>> --1286938-- REDIR: 0x61cee60 (libstdc++.so.6:operator delete(void*)) >>>> redirected to 0x483cf50 (operator delete(void*)) >>>> --1286938-- REDIR: 0x61d0c70 (libstdc++.so.6:operator new[](unsigned >>>> long)) redirected to 0x483c510 (operator new[](unsigned long)) >>>> --1286938-- REDIR: 0x61cee90 (libstdc++.so.6:operator delete[](void*)) >>>> redirected to 0x483d6e0 (operator delete[](void*)) >>>> --1286938-- REDIR: 0x65275f0 (libc.so.6:__strchr_sse2) redirected to >>>> 0x483eb90 (__strchr_sse2) >>>> --1286950-- REDIR: 0x6511000 (libc.so.6:realloc) redirected to >>>> 0x483df30 (realloc) >>>> --1286950-- REDIR: 0x6527820 (libc.so.6:__strchrnul_sse2) redirected to >>>> 0x4843540 (strchrnul) >>>> --1286950-- REDIR: 0x6531560 (libc.so.6:__strstr_sse2_unaligned) >>>> redirected to 0x4843c20 (strstr) >>>> --1286950-- REDIR: 0x6531c20 (libc.so.6:__mempcpy_sse2_unaligned) >>>> redirected to 0x4843660 (mempcpy) >>>> --1286950-- REDIR: 0x652d2a0 (libc.so.6:__strncpy_sse2_unaligned) >>>> redirected to 0x483f560 (__strncpy_sse2_unaligned) >>>> --1286950-- REDIR: 0x6515830 (libc.so.6:strncat) redirected to >>>> 0x48331d0 (_vgnU_ifunc_wrapper) >>>> --1286950-- REDIR: 0x65305b0 (libc.so.6:__strncat_sse2_unaligned) >>>> redirected to 0x483ede0 (strncat) >>>> --1286950-- REDIR: 0x6516120 (libc.so.6:__GI_strstr) redirected to >>>> 0x4843ca0 (__strstr_sse2) >>>> --1286950-- REDIR: 0x6522360 (libc.so.6:__rawmemchr_sse2) redirected to >>>> 0x4843580 (rawmemchr) >>>> --1286950-- REDIR: 0x65faea0 (libc.so.6:__strcasecmp_avx) redirected to >>>> 0x483f830 (strcasecmp) >>>> --1286950-- REDIR: 0x65fc520 (libc.so.6:__strncasecmp_avx) redirected >>>> to 0x483f910 (strncasecmp) >>>> --1286950-- REDIR: 0x65f98a0 
(libc.so.6:__strspn_sse42) redirected to >>>> 0x4843ef0 (strspn) >>>> --1286950-- REDIR: 0x65f9620 (libc.so.6:__strcspn_sse42) redirected to >>>> 0x4843e10 (strcspn) >>>> --1286948-- REDIR: 0x6522030 (libc.so.6:__memchr_sse2) redirected to >>>> 0x4840050 (memchr) >>>> --1286948-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_mmap.so >>>> --1286948-- object doesn't have a symbol table >>>> --1286948-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_posix.so >>>> --1286948-- object doesn't have a symbol table >>>> --1286948-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_sysv.so >>>> --1286948-- object doesn't have a symbol table >>>> --1286948-- Discarding syms at 0x4a96240-0x4a96d47 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_posix.so >>>> (have_dinfo 1) >>>> --1286948-- Discarding syms at 0x4a9b1c0-0x4a9b937 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_sysv.so >>>> (have_dinfo 1) >>>> --1286948-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_weighted.so >>>> --1286948-- object doesn't have a symbol table >>>> --1286948-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_netlink.so >>>> --1286948-- object doesn't have a symbol table >>>> --1286948-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/libnl-3.so.200.26.0 >>>> --1286948-- object doesn't have a symbol table >>>> --1286948-- Discarding syms at 0x4a96120-0x4a966b0 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_weighted.so >>>> (have_dinfo 1) >>>> --1286948-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_flux.so >>>> --1286948-- object doesn't have a symbol table >>>> --1286948-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_orte.so >>>> --1286948-- object doesn't have a symbol table >>>> --1286948-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_ompi.so >>>> --1286948-- object doesn't have a symbol table >>>> --1286948-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_slurm.so >>>> --1286948-- object doesn't have a symbol table >>>> --1286948-- REDIR: 0x64bc670 (libc.so.6:setenv) redirected to 0x4844480 >>>> (setenv) >>>> --1286948-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_pmi.so >>>> --1286948-- object doesn't have a symbol table >>>> --1286948-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_flux.so >>>> --1286948-- object doesn't have a symbol table >>>> --1286948-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext3x.so >>>> --1286948-- object doesn't have a symbol table >>>> --1286948-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25 >>>> --1286948-- object doesn't have a symbol table >>>> --1286948-- Discarding syms at 0x8d053e0-0x8d07391 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_flux.so (have_dinfo >>>> 1) >>>> --1286948-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v21.so >>>> --1286948-- object doesn't have a symbol table >>>> --1286948-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v20.so >>>> --1286948-- object doesn't have a symbol table >>>> --1286948-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v3.so >>>> 
--1286948-- object doesn't have a symbol table >>>> --1286948-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v12.so >>>> --1286948-- object doesn't have a symbol table >>>> --1286950-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_usock.so >>>> --1286950-- object doesn't have a symbol table >>>> --1286950-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_tcp.so >>>> --1286950-- object doesn't have a symbol table >>>> --1286950-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_native.so >>>> --1286950-- object doesn't have a symbol table >>>> --1286950-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_none.so >>>> --1286950-- object doesn't have a symbol table >>>> --1286950-- Discarding syms at 0x8d04180-0x8d045b0 in >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_none.so (have_dinfo 1) >>>> --1286950-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so >>>> --1286950-- object doesn't have a symbol table >>>> --1286950-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/libmca_common_dstore.so.1.0.2 >>>> --1286950-- object doesn't have a symbol table >>>> --1286950-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_hash.so >>>> --1286950-- object doesn't have a symbol table >>>> --1286950-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds12.so >>>> --1286950-- object doesn't have a symbol table >>>> --1286950-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_pshmem_mmap.so >>>> --1286950-- object doesn't have a symbol table >>>> --1286950-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_preg_native.so >>>> --1286950-- object doesn't have a symbol table >>>> --1286950-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_stdfd.so >>>> --1286950-- object doesn't have a symbol table >>>> --1286950-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_syslog.so >>>> --1286950-- object doesn't have a symbol table >>>> --1286950-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_default.so >>>> --1286950-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_tool.so >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_app.so >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_orted.so >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_hnp.so >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_novm.so >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Discarding syms at 0x9ebf0a0-0x9ebf490 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_tool.so >>>> (have_dinfo 1) >>>> --1286946-- Discarding syms at 0x9eca300-0x9ecbee8 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_orted.so >>>> (have_dinfo 1) >>>> --1286946-- Discarding syms at 0x9ed1220-0x9ed24e7 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_hnp.so (have_dinfo >>>> 1) >>>> --1286946-- Discarding syms at 
0x9ed8240-0x9ed8c88 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_novm.so >>>> (have_dinfo 1) >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_tool.so >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_app.so >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_hnp.so >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_orted.so >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Discarding syms at 0x9ebf0e0-0x9ebf417 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_tool.so >>>> (have_dinfo 1) >>>> --1286946-- Discarding syms at 0x9ecf320-0x9ed1239 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_hnp.so >>>> (have_dinfo 1) >>>> --1286946-- Discarding syms at 0x9ed73a0-0x9ed9ccc in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_orted.so >>>> (have_dinfo 1) >>>> --1286936-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_basic.so >>>> --1286936-- object doesn't have a symbol table >>>> --1286936-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_bucket.so >>>> --1286936-- object doesn't have a symbol table >>>> --1286936-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_rcache_grdma.so >>>> --1286936-- object doesn't have a symbol table >>>> --1286936-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mpool_hugepage.so >>>> --1286936-- object doesn't have a symbol table >>>> --1286936-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_bml_r2.so >>>> --1286936-- object doesn't have a symbol table >>>> --1286936-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_tcp.so >>>> --1286936-- object doesn't have a symbol table >>>> --1286936-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_sm.so >>>> --1286936-- object doesn't have a symbol table >>>> --1286936-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so >>>> --1286936-- object doesn't have a symbol table >>>> --1286936-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_openib.so >>>> --1286936-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_verbs.so.40.20.0 >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_self.so >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- REDIR: 0x652cc70 (libc.so.6:__strcpy_sse2_unaligned) >>>> redirected to 0x483f090 (strcpy) >>>> --1286946-- REDIR: 0x65a3810 (libc.so.6:__memmove_chk) redirected to >>>> 0x48331d0 (_vgnU_ifunc_wrapper) >>>> ==1286946== WARNING: new redirection conflicts with existing 
-- >>>> ignoring it >>>> --1286946-- old: 0x06531c30 (__memcpy_chk_sse2_un) R-> (2030.0) >>>> 0x04843b10 __memcpy_chk >>>> --1286946-- new: 0x06531c30 (__memcpy_chk_sse2_un) R-> (2024.0) >>>> 0x048434d0 __memmove_chk >>>> --1286946-- REDIR: 0x6531c30 (libc.so.6:__memcpy_chk_sse2_unaligned) >>>> redirected to 0x4843b10 (__memcpy_chk) >>>> --1286946-- REDIR: 0x65129b0 (libc.so.6:posix_memalign) redirected to >>>> 0x483e1e0 (posix_memalign) >>>> --1286946-- Discarding syms at 0x9f15280-0x9f32932 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_openib.so >>>> (have_dinfo 1) >>>> --1286946-- Discarding syms at 0x9f7c4c0-0x9f7ded8 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_verbs.so.40.20.0 >>>> (have_dinfo 1) >>>> --1286946-- Discarding syms at 0x9f620c0-0x9f71483 in >>>> /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 (have_dinfo 1) >>>> --1286946-- Discarding syms at 0x9f9ba10-0x9fd22ee in >>>> /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 (have_dinfo 1) >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_cm.so >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_monitoring.so >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_monitoring.so.50.10.0 >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_vprotocol_pessimist.so >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Discarding syms at 0x9f4d400-0x9f50c19 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_vprotocol_pessimist.so >>>> (have_dinfo 1) >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm.so >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from /usr/lib/libpsm1/libpsm_infinipath.so.1.16 >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/libinfinipath.so.4.0 >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libuuid.so.1.3.0 >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_ofi.so >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/libfabric.so.1.9.15 >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/librdmacm.so.1.2.28.0 >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libpsm2.so.2.1 >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libnuma.so.1.0.0 >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- REDIR: 0x6517140 (libc.so.6:strcasestr) 
redirected to >>>> 0x4843f80 (strcasestr) >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm2.so >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Discarding syms at 0x9f4d5c0-0x9f4f5a1 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm.so (have_dinfo 1) >>>> --1286946-- Discarding syms at 0x9fee680-0x9ff096c in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm2.so (have_dinfo >>>> 1) >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_inter.so >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_basic.so >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sm.so >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_sm.so.40.20.0 >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_self.so >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sync.so >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_monitoring.so >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_libnbc.so >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_sm.so >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_pt2pt.so >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_rdma.so >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_monitoring.so >>>> --1286946-- object doesn't have a symbol table >>>> --1286946-- Discarding syms at 0x9f724a0-0x9f787b5 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_ofi.so (have_dinfo 1) >>>> --1286946-- Discarding syms at 0xa827f80-0xa8e14c4 in >>>> /usr/lib/x86_64-linux-gnu/libfabric.so.1.9.15 (have_dinfo 1) >>>> --1286946-- Discarding syms at 0x9f94830-0x9fbafce in >>>> /usr/lib/libpsm1/libpsm_infinipath.so.1.16 (have_dinfo 1) >>>> --1286946-- Discarding syms at 0x9fe5580-0x9fe8f71 in >>>> /usr/lib/x86_64-linux-gnu/libuuid.so.1.3.0 (have_dinfo 1) >>>> --1286946-- Discarding syms at 0x9f56420-0x9f5cec0 in >>>> /usr/lib/x86_64-linux-gnu/libinfinipath.so.4.0 (have_dinfo 1) >>>> --1286946-- Discarding syms at 0xa929f10-0xa93d5fc in >>>> /usr/lib/x86_64-linux-gnu/librdmacm.so.1.2.28.0 (have_dinfo 1) >>>> --1286946-- Discarding syms at 0xa94b0c0-0xa95a483 in >>>> /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 (have_dinfo 1) >>>> --1286946-- Discarding syms at 0xa968860-0xa9adf12 in >>>> /usr/lib/x86_64-linux-gnu/libpsm2.so.2.1 (have_dinfo 1) >>>> --1286946-- 
Discarding syms at 0xa9e7a10-0xaa1e2ee in >>>> /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 (have_dinfo 1) >>>> --1286946-- Discarding syms at 0x9f80410-0x9f84e27 in >>>> /usr/lib/x86_64-linux-gnu/libnuma.so.1.0.0 (have_dinfo 1) >>>> --1286946-- Discarding syms at 0x9f103e0-0x9f15fd5 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_cm.so (have_dinfo 1) >>>> --1286946-- Discarding syms at 0x9f471e0-0x9f47ce0 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_monitoring.so >>>> (have_dinfo 1) >>>> ==1286946== Thread 3: >>>> ==1286946== Syscall param writev(vector[...]) points to uninitialised >>>> byte(s) >>>> ==1286946== at 0x658A48D: __writev (writev.c:26) >>>> ==1286946== by 0x658A48D: writev (writev.c:24) >>>> ==1286946== by 0x8DF9B4C: pmix_ptl_base_send_handler (in >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>>> ==1286946== by 0x7CC413E: ??? (in >>>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>>> ==1286946== by 0x7CC487E: event_base_loop (in >>>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>>> ==1286946== by 0x8DBDD55: ??? (in >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>>> ==1286946== by 0x4BF7608: start_thread (pthread_create.c:477) >>>> ==1286946== by 0x6595102: clone (clone.S:95) >>>> ==1286946== Address 0xa28fdcf is 127 bytes inside a block of size >>>> 5,120 alloc'd >>>> ==1286946== at 0x483DFAF: realloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286946== by 0x8DE155A: pmix_bfrop_buffer_extend (in >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>>> ==1286946== by 0x8DE3F4A: pmix_bfrops_base_pack_byte (in >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>>> ==1286946== by 0x8DE4900: pmix_bfrops_base_pack_buf (in >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>>> ==1286946== by 0x8DE4175: pmix_bfrops_base_pack (in >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>>> ==1286946== by 0x8D7CF91: ??? (in >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>>> ==1286946== by 0x7CC3FDD: ??? (in >>>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>>> ==1286946== by 0x7CC487E: event_base_loop (in >>>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>>> ==1286946== by 0x8DBDD55: ??? (in >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>>> ==1286946== by 0x4BF7608: start_thread (pthread_create.c:477) >>>> ==1286946== by 0x6595102: clone (clone.S:95) >>>> ==1286946== Uninitialised value was created by a stack allocation >>>> ==1286946== at 0x9F048D6: ??? 
(in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so) >>>> ==1286946== >>>> --1286944-- Discarding syms at 0xaa4d220-0xaa5796a in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmp--1286945-- Discarding syms at >>>> 0xaa4d220---1286948-- Discarding syms at 0xaae1100-0xaae7d70 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmp--1286945-- Discarding syms at >>>> 0xaae1100-0xaae7d70 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_monitoring.so >>>> (have_dinfo 1) >>>> --1286945-- Discarding syms at 0x9f69420-0x9f--1286938-- REDIR: >>>> 0x61cee70 (libstdc++.so.6:operator delete(void*, unsigned long)) redirected >>>> to --1286937-- REDIR: 0x61cee70 (libstdc++.so.6:opera--1286946-- REDIR: >>>> 0x652e970 (libc.so.6:__stpncpy_sse2_unaligned) redirected to 0x48427e0 >>>> (stpncpy) >>>> --1286942-- REDIR: 0x6527ed0 (libc.so.6:__strnlen_sse2) redirected to >>>> 0x483eee0 (strnlen) >>>> --1286944-- REDIR: 0x652fcc0 (libc.so.6:__strcat_sse2_unaligned) >>>> redirected to 0x483ec20 (strcat) >>>> --1286951-- REDIR: 0x65113d0 (libc.so.6:memalign) redirected to >>>> 0x483e2a0 (memalign) >>>> --1286951-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_romio321.so >>>> --1286951-- object doesn't have a symbol table >>>> --1286951-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_ompio.so >>>> --1286951-- object doesn't have a symbol table >>>> --1286941-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_ompio.so.41.19.3 >>>> --1286941-- object doesn't have a symbol table >>>> --1286951-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fs_ufs.so >>>> --1286951-- object doesn't have a symbol table >>>> --1286939-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_two_phase.so >>>> --1286939-- object doesn't have a symbol table >>>> --1286939-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_individual.so >>>> --1286939-- object doesn't have a symbol table >>>> --1286939-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic.so >>>> --1286939-- object doesn't have a symbol table >>>> --1286939-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic_gen2.so >>>> --1286939-- object doesn't have a symbol table >>>> --1286939-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_vulcan.so >>>> --1286939-- object doesn't have a symbol table >>>> --1286939-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fbtl_posix.so >>>> --1286939-- object doesn't have a symbol table >>>> --1286943-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_sm.so >>>> --1286943-- object doesn't have a symbol table >>>> --1286943-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_individual.so >>>> --1286943-- object doesn't have a symbol table >>>> --1286943-- Reading syms from >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_lockedfile.so >>>> --1286943-- object doesn't have a symbol table >>>> --1286938-- REDIR: 0x65a3b00 (libc.so.6:__strcpy_chk) redirected to >>>> 0x48435c0 (__strcpy_chk) >>>> --1286939-- Discarding syms at 0x9f1d660-0x9f371d6 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x9f5afa0-0x9f8f8b6 in >>>> 
/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_romio321.so >>>> (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x9fa0640-0x9fa42d9 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_ompio.so (have_dinfo >>>> 1) >>>> --1286939-- Discarding syms at 0x9f4c160-0x9f4dc58 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_inter.so >>>> (have_dinfo 1) >>>> --1286939-- Discarding syms at 0xa7fc270-0xa804f00 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_basic.so >>>> (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x9fee3a0-0x9ff134e in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sm.so (have_dinfo 1) >>>> --1286939-- Discarding syms at 0xa80a240-0xa80aa8d in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_sm.so.40.20.0 >>>> (have_dinfo 1) >>>> --1286939-- Discarding syms at 0xa80f0e0-0xa80f8bb in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_self.so (have_dinfo >>>> 1) >>>> --1286939-- Discarding syms at 0xaa460c0-0xaa47947 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sync.so (have_dinfo >>>> 1) >>>> --1286939-- Discarding syms at 0xaa613e0-0xaa7730f in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_libnbc.so >>>> (have_dinfo 1) >>>> --1286939-- Discarding syms at 0xaa849c0-0xaa8a845 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so >>>> (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x9ee1320-0x9ee3567 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_bml_r2.so (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x9eebc40-0x9ef4ad7 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_tcp.so (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x9f02600-0x9f08cd8 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so (have_dinfo >>>> 1) >>>> --1286939-- Discarding syms at 0x9f40200-0x9f4126e in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_self.so (have_dinfo >>>> 1) >>>> --1286939-- Discarding syms at 0x9eda4e0-0x9edb4c5 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mpool_hugepage.so >>>> (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x9ed32c0-0x9ed4afe in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_rcache_grdma.so >>>> (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x9ebf160-0x9ebfe95 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_basic.so >>>> (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x9ece140-0x9ecebed in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_bucket.so >>>> (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x9ec92a0-0x9ec9aa2 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_app.so >>>> (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x8eae0e0-0x8eae4a7 in >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_stdfd.so (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x8eb3220-0x8eb3c27 in >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_syslog.so (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x8eb80e0-0x8eb90b7 in >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_default.so (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x8ea6380-0x8ea97b3 in >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_preg_native.so (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x8e5a740-0x8e5f859 in >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_usock.so (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x8e67be0-0x8e743f0 in >>>> 
/usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_tcp.so (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x84da200-0x84daa5d in >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_native.so (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x8d322b0-0x8d34bfc in >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v21.so (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x8e29480-0x8e3b70a in >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v20.so (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x8d3c2b0-0x8d3ed5c in >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v3.so (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x8e45340-0x8e502da in >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v12.so (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x8e901a0-0x8e908a7 in >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_pshmem_mmap.so (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x8d05520-0x8d06783 in >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x8e7b460-0x8e8aaa4 in >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_hash.so (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x8d44520-0x8d4556a in >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds12.so (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x8e97600-0x8ea0fa1 in >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/libmca_common_dstore.so.1.0.2 >>>> (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x8d109c0-0x8d27dcf in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext3x.so >>>> (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x8d5b280-0x8dfdffb in >>>> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25 (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x9ec40a0-0x9ec4490 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_app.so (have_dinfo >>>> 1) >>>> --1286939-- Discarding syms at 0x84d2580-0x84d518f in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_pmi.so (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x4a96120-0x4a9644f in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_flux.so >>>> (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x4aa0100-0x4aa03e7 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_orte.so >>>> (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x84c74a0-0x84c901f in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_ompi.so >>>> (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x4aa5260-0x4aa58e9 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_slurm.so >>>> (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x4a9b420-0x4a9bcdf in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_netlink.so >>>> (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x84e7460-0x84f52ca in >>>> /usr/lib/x86_64-linux-gnu/libnl-3.so.200.26.0 (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x4a90360-0x4a91107 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_mmap.so >>>> (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x9f46220-0x9f474cc in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fbtl_posix.so >>>> (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x9f0f180-0x9f0f78d in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fs_ufs.so (have_dinfo 1) >>>> --1286939-- Discarding syms at 0xaa94540-0xaa96a4a in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_sm.so (have_dinfo 1) >>>> --1286939-- Discarding syms at 0xaa9f6c0-0xaab44d0 in >>>> 
/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_pt2pt.so (have_dinfo >>>> 1) >>>> --1286939-- Discarding syms at 0xaabe820-0xaad8ee0 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_rdma.so (have_dinfo >>>> 1) >>>> --1286939-- Discarding syms at 0x9efc080-0x9efc1e1 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_sm.so (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x9fab2a0-0x9fb1341 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_two_phase.so >>>> (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x9f140c0-0x9f14299 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_individual.so >>>> (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x9fb72a0-0x9fbb791 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic.so >>>> (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x9fd52a0-0x9fda794 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic_gen2.so >>>> (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x9fe02e0-0x9fe59a5 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_vulcan.so >>>> (have_dinfo 1) >>>> --1286939-- Discarding syms at 0xa815460-0xa8177ab in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_sm.so >>>> (have_dinfo 1) >>>> --1286939-- Discarding syms at 0xa81e260-0xa82033d in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_individual.so >>>> (have_dinfo 1) >>>> --1286939-- Discarding syms at 0xa8273e0-0xa8297d8 in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_lockedfile.so >>>> (have_dinfo 1) >>>> --1286939-- Discarding syms at 0x9fc85e0-0x9fce8ef in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_ompio.so.41.19.3 >>>> (have_dinfo 1) >>>> ==1286939== >>>> ==1286939== HEAP SUMMARY: >>>> ==1286939== in use at exit: 74,054 bytes in 223 blocks >>>> ==1286939== total heap usage: 22,405,782 allocs, 22,405,559 frees, >>>> 34,062,479,959 bytes allocated >>>> ==1286939== >>>> ==1286939== Searching for pointers to 223 not-freed blocks >>>> ==1286939== Checked 3,415,912 bytes >>>> ==1286939== >>>> ==1286939== Thread 1: >>>> ==1286939== 1 bytes in 1 blocks are definitely lost in loss record 1 of >>>> 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x651550E: strdup (strdup.c:42) >>>> ==1286939== by 0x9F6A4B6: ??? >>>> ==1286939== by 0x9F47373: ??? 
>>>> ==1286939== by 0x68E3B9B: mca_base_framework_components_register (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68E3F35: mca_base_framework_register (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68E3F93: mca_base_framework_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x4BA1734: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>>> ==1286939== >>>> ==1286939== 8 bytes in 1 blocks are still reachable in loss record 2 of >>>> 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x764724C: ??? (in >>>> /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) >>>> ==1286939== by 0x7657B9A: ??? (in >>>> /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) >>>> ==1286939== by 0x7645679: ??? (in >>>> /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) >>>> ==1286939== by 0x4011B89: call_init.part.0 (dl-init.c:72) >>>> ==1286939== by 0x4011C90: call_init (dl-init.c:30) >>>> ==1286939== by 0x4011C90: _dl_init (dl-init.c:119) >>>> ==1286939== by 0x4001139: ??? (in /usr/lib/x86_64-linux-gnu/ >>>> ld-2.31.so) >>>> ==1286939== by 0x3: ??? >>>> ==1286939== by 0x1FFEFFF926: ??? >>>> ==1286939== by 0x1FFEFFF93D: ??? >>>> ==1286939== by 0x1FFEFFF987: ??? >>>> ==1286939== by 0x1FFEFFF9A7: ??? >>>> ==1286939== >>>> ==1286939== 8 bytes in 1 blocks are definitely lost in loss record 3 of >>>> 44 >>>> ==1286939== at 0x483DD99: calloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x9F69B6F: ??? >>>> ==1286939== by 0x9F1CDED: ??? >>>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x9EE3527: ??? >>>> ==1286939== by 0x4B6170A: mca_bml_base_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4BA1714: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>>> ==1286939== >>>> ==1286939== 13 bytes in 2 blocks are still reachable in loss record 4 >>>> of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x651550E: strdup (strdup.c:42) >>>> ==1286939== by 0x7CC3657: event_config_avoid_method (in >>>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>>> ==1286939== by 0x68FEB5A: opal_event_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68FE8CA: ??? 
(in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68E4008: mca_base_framework_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68B8BCF: opal_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x6860120: orte_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>>> ==1286939== >>>> ==1286939== 15 bytes in 1 blocks are indirectly lost in loss record 5 >>>> of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x651550E: strdup (strdup.c:42) >>>> ==1286939== by 0x9EDB189: ??? >>>> ==1286939== by 0x68D98FC: mca_base_framework_components_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x6907C25: ??? (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68E4008: mca_base_framework_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x4BA16D5: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>>> ==1286939== >>>> ==1286939== 15 bytes in 1 blocks are definitely lost in loss record 6 >>>> of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x651550E: strdup (strdup.c:42) >>>> ==1286939== by 0x9F5655C: ??? >>>> ==1286939== by 0x4011B89: call_init.part.0 (dl-init.c:72) >>>> ==1286939== by 0x4011C90: call_init (dl-init.c:30) >>>> ==1286939== by 0x4011C90: _dl_init (dl-init.c:119) >>>> ==1286939== by 0x65D6784: _dl_catch_exception >>>> (dl-error-skeleton.c:182) >>>> ==1286939== by 0x401642C: dl_open_worker (dl-open.c:758) >>>> ==1286939== by 0x65D6727: _dl_catch_exception >>>> (dl-error-skeleton.c:208) >>>> ==1286939== by 0x40155F9: _dl_open (dl-open.c:837) >>>> ==1286939== by 0x72DE34B: dlopen_doit (dlopen.c:66) >>>> ==1286939== by 0x65D6727: _dl_catch_exception >>>> (dl-error-skeleton.c:208) >>>> ==1286939== by 0x65D67F2: _dl_catch_error (dl-error-skeleton.c:227) >>>> ==1286939== >>>> ==1286939== 16 bytes in 1 blocks are definitely lost in loss record 7 >>>> of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x9F1CBEB: ??? >>>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x9EE3527: ??? 
>>>> ==1286939== by 0x4B6170A: mca_bml_base_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4BA1714: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>>> ==1286939== >>>> ==1286939== 16 bytes in 1 blocks are definitely lost in loss record 8 >>>> of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x9F1CC66: ??? >>>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x9EE3527: ??? >>>> ==1286939== by 0x4B6170A: mca_bml_base_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4BA1714: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>>> ==1286939== >>>> ==1286939== 16 bytes in 1 blocks are definitely lost in loss record 9 >>>> of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x9F1CCDA: ??? >>>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x9EE3527: ??? >>>> ==1286939== by 0x4B6170A: mca_bml_base_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4BA1714: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>>> ==1286939== >>>> ==1286939== 25 bytes in 1 blocks are still reachable in loss record 10 >>>> of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x651550E: strdup (strdup.c:42) >>>> ==1286939== by 0x68F27BD: ??? (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x4B956B6: ompi_pml_v_output_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B95259: ??? 
(in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x68D98FC: mca_base_framework_components_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x4B93FAE: ??? (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x68E4008: mca_base_framework_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x4BA1734: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>>> ==1286939== >>>> ==1286939== 30 bytes in 1 blocks are definitely lost in loss record 11 >>>> of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0xA9A859B: ??? >>>> ==1286939== by 0x4011B89: call_init.part.0 (dl-init.c:72) >>>> ==1286939== by 0x4011C90: call_init (dl-init.c:30) >>>> ==1286939== by 0x4011C90: _dl_init (dl-init.c:119) >>>> ==1286939== by 0x65D6784: _dl_catch_exception >>>> (dl-error-skeleton.c:182) >>>> ==1286939== by 0x401642C: dl_open_worker (dl-open.c:758) >>>> ==1286939== by 0x65D6727: _dl_catch_exception >>>> (dl-error-skeleton.c:208) >>>> ==1286939== by 0x40155F9: _dl_open (dl-open.c:837) >>>> ==1286939== by 0x72DE34B: dlopen_doit (dlopen.c:66) >>>> ==1286939== by 0x65D6727: _dl_catch_exception >>>> (dl-error-skeleton.c:208) >>>> ==1286939== by 0x65D67F2: _dl_catch_error (dl-error-skeleton.c:227) >>>> ==1286939== by 0x72DEB58: _dlerror_run (dlerror.c:170) >>>> ==1286939== >>>> ==1286939== 32 bytes in 1 blocks are still reachable in loss record 12 >>>> of 44 >>>> ==1286939== at 0x483DD99: calloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x7CC353E: event_get_supported_methods (in >>>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>>> ==1286939== by 0x68FEA98: opal_event_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68FE8CA: ??? (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68E4008: mca_base_framework_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68B8BCF: opal_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x6860120: orte_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>>> ==1286939== >>>> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 13 >>>> of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x8E9D3EB: ??? 
>>>> ==1286939== by 0x8E9F1C1: ??? >>>> ==1286939== by 0x8D0578C: ??? >>>> ==1286939== by 0x8D8605A: ??? >>>> ==1286939== by 0x8D87FE8: ??? >>>> ==1286939== by 0x8D88E4D: ??? >>>> ==1286939== by 0x8D1A5EB: ??? >>>> ==1286939== by 0x84D2B0A: ??? >>>> ==1286939== by 0x68602FB: orte_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== >>>> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 14 >>>> of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x8E9D3EB: ??? >>>> ==1286939== by 0x8E9F1C1: ??? >>>> ==1286939== by 0x8D0578C: ??? >>>> ==1286939== by 0x8D8605A: ??? >>>> ==1286939== by 0x8D87FE8: ??? >>>> ==1286939== by 0x8D88E4D: ??? >>>> ==1286939== by 0x8D1A5EB: ??? >>>> ==1286939== by 0x84D2BCE: ??? >>>> ==1286939== by 0x68602FB: orte_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== >>>> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 15 >>>> of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x8E9D3EB: ??? >>>> ==1286939== by 0x8E9F1C1: ??? >>>> ==1286939== by 0x8D0578C: ??? >>>> ==1286939== by 0x8D8605A: ??? >>>> ==1286939== by 0x8D87FE8: ??? >>>> ==1286939== by 0x8D88E4D: ??? >>>> ==1286939== by 0x8D1A5EB: ??? >>>> ==1286939== by 0x84D2CB2: ??? >>>> ==1286939== by 0x68602FB: orte_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== >>>> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 16 >>>> of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x8E9D3EB: ??? >>>> ==1286939== by 0x8E9F1C1: ??? >>>> ==1286939== by 0x8D0578C: ??? >>>> ==1286939== by 0x8D8605A: ??? >>>> ==1286939== by 0x8D87FE8: ??? >>>> ==1286939== by 0x8D88E4D: ??? >>>> ==1286939== by 0x8D1A5EB: ??? >>>> ==1286939== by 0x84D2D91: ??? >>>> ==1286939== by 0x68602FB: orte_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== >>>> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 17 >>>> of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x8E81BD8: ??? >>>> ==1286939== by 0x8E89F4B: ??? >>>> ==1286939== by 0x8D84A0D: ??? >>>> ==1286939== by 0x8DF79C1: ??? >>>> ==1286939== by 0x7CC3FDD: ??? 
(in >>>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>>> ==1286939== by 0x7CC487E: event_base_loop (in >>>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>>> ==1286939== by 0x8DBDD55: ??? >>>> ==1286939== by 0x4BF7608: start_thread (pthread_create.c:477) >>>> ==1286939== by 0x6595102: clone (clone.S:95) >>>> ==1286939== >>>> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 18 >>>> of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x8E9D3EB: ??? >>>> ==1286939== by 0x8E9F1C1: ??? >>>> ==1286939== by 0x8D0578C: ??? >>>> ==1286939== by 0x8D8605A: ??? >>>> ==1286939== by 0x8D87FE8: ??? >>>> ==1286939== by 0x8D88E4D: ??? >>>> ==1286939== by 0x8D1A767: ??? >>>> ==1286939== by 0x84D330E: ??? >>>> ==1286939== by 0x68602FB: orte_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== >>>> ==1286939== 36 (32 direct, 4 indirect) bytes in 1 blocks are definitely >>>> lost in loss record 19 of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x8E9D3EB: ??? >>>> ==1286939== by 0x8E9F1C1: ??? >>>> ==1286939== by 0x8D0578C: ??? >>>> ==1286939== by 0x8D8605A: ??? >>>> ==1286939== by 0x8D87FE8: ??? >>>> ==1286939== by 0x8D88E4D: ??? >>>> ==1286939== by 0x8D1A5EB: ??? >>>> ==1286939== by 0x4B94C09: mca_pml_base_pml_check_selected (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x9F1E1E1: ??? >>>> ==1286939== by 0x4BA1A09: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== >>>> ==1286939== 40 bytes in 1 blocks are still reachable in loss record 20 >>>> of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x7CFF4B6: ??? (in >>>> /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >>>> ==1286939== by 0x7CC5E26: event_global_setup_locks_ (in >>>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>>> ==1286939== by 0x7CFF68F: evthread_use_pthreads (in >>>> /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >>>> ==1286939== by 0x68FE8E4: ??? 
(in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68E4008: mca_base_framework_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68B8BCF: opal_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x6860120: orte_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>>> ==1286939== >>>> ==1286939== 40 bytes in 1 blocks are still reachable in loss record 21 >>>> of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x7CFF4B6: ??? (in >>>> /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >>>> ==1286939== by 0x7CCF377: evsig_global_setup_locks_ (in >>>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>>> ==1286939== by 0x7CC5E39: event_global_setup_locks_ (in >>>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>>> ==1286939== by 0x7CFF68F: evthread_use_pthreads (in >>>> /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >>>> ==1286939== by 0x68FE8E4: ??? (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68E4008: mca_base_framework_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68B8BCF: opal_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x6860120: orte_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>>> ==1286939== >>>> ==1286939== 40 bytes in 1 blocks are still reachable in loss record 22 >>>> of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x7CFF4B6: ??? (in >>>> /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >>>> ==1286939== by 0x7CCB997: evutil_secure_rng_global_setup_locks_ (in >>>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>>> ==1286939== by 0x7CC5E4F: event_global_setup_locks_ (in >>>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>>> ==1286939== by 0x7CFF68F: evthread_use_pthreads (in >>>> /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >>>> ==1286939== by 0x68FE8E4: ??? 
(in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68E4008: mca_base_framework_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68B8BCF: opal_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x6860120: orte_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>>> ==1286939== >>>> ==1286939== 48 bytes in 1 blocks are still reachable in loss record 23 >>>> of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x68D9043: mca_base_component_repository_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68D7F7A: mca_base_component_find (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68E3F35: mca_base_framework_register (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68E3F93: mca_base_framework_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x4B8560C: mca_io_base_file_select (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B0E68A: ompi_file_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B3ADB8: PMPI_File_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) >>>> ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) >>>> ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) >>>> ==1286939== >>>> ==1286939== 48 bytes in 1 blocks are still reachable in loss record 24 >>>> of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x68D9043: mca_base_component_repository_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68D7F7A: mca_base_component_find (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68E3F35: mca_base_framework_register (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68E3F93: mca_base_framework_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x4B85638: mca_io_base_file_select (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B0E68A: ompi_file_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B3ADB8: PMPI_File_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) >>>> ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) >>>> ==1286939== by 
0x78B953B: H5F_open (H5Fint.c:1493) >>>> ==1286939== >>>> ==1286939== 48 bytes in 2 blocks are still reachable in loss record 25 >>>> of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x7CC3647: event_config_avoid_method (in >>>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>>> ==1286939== by 0x68FEB5A: opal_event_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68FE8CA: ??? (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68E4008: mca_base_framework_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68B8BCF: opal_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x6860120: orte_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>>> ==1286939== >>>> ==1286939== 55 (32 direct, 23 indirect) bytes in 1 blocks are >>>> definitely lost in loss record 26 of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x8E9D3EB: ??? >>>> ==1286939== by 0x8E9F1C1: ??? >>>> ==1286939== by 0x8D0578C: ??? >>>> ==1286939== by 0x8D8605A: ??? >>>> ==1286939== by 0x8D87FE8: ??? >>>> ==1286939== by 0x8D88E4D: ??? >>>> ==1286939== by 0x8D1A767: ??? >>>> ==1286939== by 0x4AF6CD6: ompi_comm_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4BA194D: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>>> ==1286939== >>>> ==1286939== 56 bytes in 1 blocks are still reachable in loss record 27 >>>> of 44 >>>> ==1286939== at 0x483DD99: calloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x7CC1C86: event_config_new (in >>>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>>> ==1286939== by 0x68FEAC0: opal_event_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68FE8CA: ??? 
(in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68E4008: mca_base_framework_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68B8BCF: opal_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x6860120: orte_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>>> ==1286939== >>>> ==1286939== 56 bytes in 1 blocks are definitely lost in loss record 28 >>>> of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x9F6E008: ??? >>>> ==1286939== by 0x9F7C654: ??? >>>> ==1286939== by 0x9F1CD3E: ??? >>>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x9EE3527: ??? >>>> ==1286939== by 0x4B6170A: mca_bml_base_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4BA1714: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>>> ==1286939== >>>> ==1286939== 56 bytes in 1 blocks are definitely lost in loss record 29 >>>> of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0xA957008: ??? >>>> ==1286939== by 0xA86B017: ??? >>>> ==1286939== by 0xA862FD8: ??? >>>> ==1286939== by 0xA828E15: ??? >>>> ==1286939== by 0xA829624: ??? >>>> ==1286939== by 0x9F77910: ??? >>>> ==1286939== by 0x4B85C53: ompi_mtl_base_select (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x9F13E4D: ??? >>>> ==1286939== by 0x4B94673: mca_pml_base_select (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4BA1789: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== >>>> ==1286939== 76 (32 direct, 44 indirect) bytes in 1 blocks are >>>> definitely lost in loss record 30 of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x8E9D3EB: ??? >>>> ==1286939== by 0x8E9F1C1: ??? >>>> ==1286939== by 0x8D0578C: ??? >>>> ==1286939== by 0x8D8605A: ??? >>>> ==1286939== by 0x8D87FE8: ??? >>>> ==1286939== by 0x8D88E4D: ??? >>>> ==1286939== by 0x8D1A767: ??? >>>> ==1286939== by 0x84D387F: ??? 
>>>> ==1286939== by 0x68602FB: orte_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== >>>> ==1286939== 79 (64 direct, 15 indirect) bytes in 1 blocks are >>>> definitely lost in loss record 31 of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x9EDB12E: ??? >>>> ==1286939== by 0x68D98FC: mca_base_framework_components_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x6907C25: ??? (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68E4008: mca_base_framework_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x4BA16D5: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>>> ==1286939== >>>> ==1286939== 144 bytes in 3 blocks are still reachable in loss record 32 >>>> of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x68D9043: mca_base_component_repository_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68D7F7A: mca_base_component_find (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68E3F35: mca_base_framework_register (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68E3F93: mca_base_framework_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x4B8564E: mca_io_base_file_select (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B0E68A: ompi_file_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B3ADB8: PMPI_File_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) >>>> ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) >>>> ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) >>>> ==1286939== >>>> ==1286939== 231 bytes in 12 blocks are definitely lost in loss record >>>> 33 of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x651550E: strdup (strdup.c:42) >>>> ==1286939== by 0x9F2B4B3: ??? >>>> ==1286939== by 0x9F2B85C: ??? >>>> ==1286939== by 0x9F2BBD7: ??? >>>> ==1286939== by 0x9F1CAAC: ??? >>>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x9EE3527: ??? 
>>>> ==1286939== by 0x4B6170A: mca_bml_base_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4BA1714: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>>> ==1286939== >>>> ==1286939== 240 bytes in 5 blocks are still reachable in loss record 34 >>>> of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x68D9043: mca_base_component_repository_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68D7F7A: mca_base_component_find (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68E3F35: mca_base_framework_register (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x68E3F93: mca_base_framework_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x4B85622: mca_io_base_file_select (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B0E68A: ompi_file_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B3ADB8: PMPI_File_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) >>>> ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) >>>> ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) >>>> ==1286939== >>>> ==1286939== 272 bytes in 44 blocks are definitely lost in loss record >>>> 35 of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x9FCAEDB: ??? >>>> ==1286939== by 0x9FE42B2: ??? >>>> ==1286939== by 0x9FE47BB: ??? >>>> ==1286939== by 0x9FCDDBF: ??? >>>> ==1286939== by 0x9FA324A: ??? >>>> ==1286939== by 0x4B3DD7F: PMPI_File_write_at_all (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x7B6DB32: H5FD_mpio_write (H5FDmpio.c:1833) >>>> ==1286939== by 0x78DF11D: H5FD_write (H5FDint.c:257) >>>> ==1286939== by 0x78AE86B: H5F__accum_write (H5Faccum.c:825) >>>> ==1286939== by 0x7A1FBE9: H5PB_write (H5PB.c:1027) >>>> ==1286939== by 0x78BBC7A: H5F_block_write (H5Fio.c:164) >>>> ==1286939== >>>> ==1286939== 585 (480 direct, 105 indirect) bytes in 15 blocks are >>>> definitely lost in loss record 36 of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x8E9D3EB: ??? >>>> ==1286939== by 0x8E9F1C1: ??? >>>> ==1286939== by 0x8D0578C: ??? >>>> ==1286939== by 0x8D8605A: ??? >>>> ==1286939== by 0x8D87FE8: ??? >>>> ==1286939== by 0x8D88E4D: ??? >>>> ==1286939== by 0x8D1A767: ??? 
>>>> ==1286939== by 0x4B14036: ompi_proc_complete_init_single (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B146C3: ompi_proc_complete_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4BA19A9: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== >>>> ==1286939== 776 bytes in 32 blocks are indirectly lost in loss record >>>> 37 of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x8DE9816: ??? >>>> ==1286939== by 0x8DEB1D2: ??? >>>> ==1286939== by 0x8DEB49A: ??? >>>> ==1286939== by 0x8DE8B12: ??? >>>> ==1286939== by 0x8E9D492: ??? >>>> ==1286939== by 0x8E9F1C1: ??? >>>> ==1286939== by 0x8D0578C: ??? >>>> ==1286939== by 0x8D8605A: ??? >>>> ==1286939== by 0x8D87FE8: ??? >>>> ==1286939== by 0x8D88E4D: ??? >>>> ==1286939== by 0x8D1A767: ??? >>>> ==1286939== >>>> ==1286939== 840 (480 direct, 360 indirect) bytes in 15 blocks are >>>> definitely lost in loss record 38 of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x8E9D3EB: ??? >>>> ==1286939== by 0x8E9F1C1: ??? >>>> ==1286939== by 0x8D0578C: ??? >>>> ==1286939== by 0x8D8605A: ??? >>>> ==1286939== by 0x8D87FE8: ??? >>>> ==1286939== by 0x8D88E4D: ??? >>>> ==1286939== by 0x8D1A5EB: ??? >>>> ==1286939== by 0x9EF2F00: ??? >>>> ==1286939== by 0x9EEBF17: ??? >>>> ==1286939== by 0x9EE2F54: ??? >>>> ==1286939== by 0x9F1E1FB: ??? >>>> ==1286939== >>>> ==1286939== 1,084 (480 direct, 604 indirect) bytes in 15 blocks are >>>> definitely lost in loss record 39 of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x8E9D3EB: ??? >>>> ==1286939== by 0x8E9F1C1: ??? >>>> ==1286939== by 0x8D0578C: ??? >>>> ==1286939== by 0x8D8605A: ??? >>>> ==1286939== by 0x8D87FE8: ??? >>>> ==1286939== by 0x8D88E4D: ??? >>>> ==1286939== by 0x8D1A767: ??? >>>> ==1286939== by 0x84D4800: ??? >>>> ==1286939== by 0x68602FB: orte_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== >>>> ==1286939== 1,344 bytes in 1 blocks are definitely lost in loss record >>>> 40 of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x68AE702: opal_free_list_grow_st (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x9F1CD2D: ??? >>>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x9EE3527: ??? 
>>>> ==1286939== by 0x4B6170A: mca_bml_base_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4BA1714: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>>> ==1286939== >>>> ==1286939== 2,752 bytes in 1 blocks are definitely lost in loss record >>>> 41 of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x68AE702: opal_free_list_grow_st (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x9F1CC50: ??? >>>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x9EE3527: ??? >>>> ==1286939== by 0x4B6170A: mca_bml_base_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4BA1714: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>>> ==1286939== >>>> ==1286939== 2,752 bytes in 1 blocks are definitely lost in loss record >>>> 42 of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x68AE702: opal_free_list_grow_st (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x9F1CCC4: ??? >>>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286939== by 0x9EE3527: ??? >>>> ==1286939== by 0x4B6170A: mca_bml_base_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4BA1714: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>>> ==1286939== >>>> ==1286939== 62,644 bytes in 31 blocks are indirectly lost in loss >>>> record 43 of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x8DE9FA8: ??? >>>> ==1286939== by 0x8DEB032: ??? >>>> ==1286939== by 0x8DEB49A: ??? >>>> ==1286939== by 0x8DE8B12: ??? >>>> ==1286939== by 0x8E9D492: ??? >>>> ==1286939== by 0x8E9F1C1: ??? 
>>>> ==1286939== by 0x8D0578C: ??? >>>> ==1286939== by 0x8D8605A: ??? >>>> ==1286939== by 0x8D87FE8: ??? >>>> ==1286939== by 0x8D88E4D: ??? >>>> ==1286939== by 0x8D1A5EB: ??? >>>> ==1286939== >>>> ==1286939== 62,760 (480 direct, 62,280 indirect) bytes in 15 blocks are >>>> definitely lost in loss record 44 of 44 >>>> ==1286939== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x8E9D3EB: ??? >>>> ==1286939== by 0x8E9F1C1: ??? >>>> ==1286939== by 0x8D0578C: ??? >>>> ==1286939== by 0x8D8605A: ??? >>>> ==1286939== by 0x8D87FE8: ??? >>>> ==1286939== by 0x8D88E4D: ??? >>>> ==1286939== by 0x8D1A5EB: ??? >>>> ==1286939== by 0x9F0398A: ??? >>>> ==1286939== by 0x9EE2F54: ??? >>>> ==1286939== by 0x9F1E1FB: ??? >>>> ==1286939== by 0x4BA1A09: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286939== >>>> ==1286939== LEAK SUMMARY: >>>> ==1286939== definitely lost: 9,837 bytes in 138 blocks >>>> ==1286939== indirectly lost: 63,435 bytes in 64 blocks >>>> ==1286939== possibly lost: 0 bytes in 0 blocks >>>> ==1286939== still reachable: 782 bytes in 21 blocks >>>> ==1286939== suppressed: 0 bytes in 0 blocks >>>> ==1286939== >>>> ==1286939== ERROR SUMMARY: 29 errors from 29 contexts (suppressed: 0 >>>> from 0) >>>> ==1286939== >>>> ==1286939== 1 errors in context 1 of 29: >>>> ==1286939== Thread 3: >>>> ==1286939== Syscall param writev(vector[...]) points to uninitialised >>>> byte(s) >>>> ==1286939== at 0x658A48D: __writev (writev.c:26) >>>> ==1286939== by 0x658A48D: writev (writev.c:24) >>>> ==1286939== by 0x8DF9B4C: ??? >>>> ==1286939== by 0x7CC413E: ??? (in >>>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>>> ==1286939== by 0x7CC487E: event_base_loop (in >>>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>>> ==1286939== by 0x8DBDD55: ??? >>>> ==1286939== by 0x4BF7608: start_thread (pthread_create.c:477) >>>> ==1286939== by 0x6595102: clone (clone.S:95) >>>> ==1286939== Address 0xa28ee1f is 127 bytes inside a block of size >>>> 5,120 alloc'd >>>> ==1286939== at 0x483DFAF: realloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286939== by 0x8DE155A: ??? >>>> ==1286939== by 0x8DE3F4A: ??? >>>> ==1286939== by 0x8DE4900: ??? >>>> ==1286939== by 0x8DE4175: ??? >>>> ==1286939== by 0x8D7CF91: ??? >>>> ==1286939== by 0x7CC3FDD: ??? (in >>>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>>> ==1286939== by 0x7CC487E: event_base_loop (in >>>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>>> ==1286939== by 0x8DBDD55: ??? >>>> ==1286939== by 0x4BF7608: start_thread (pthread_create.c:477) >>>> ==1286939== by 0x6595102: clone (clone.S:95) >>>> ==1286939== Uninitialised value was created by a stack allocation >>>> ==1286939== at 0x9F048D6: ??? 
>>>> ==1286939== >>>> ==1286939== ERROR SUMMARY: 29 errors from 29 contexts (suppressed: 0 >>>> from 0) >>>> mpi/lib/libopen-pal.so.40.20.3) >>>> ==1286936== by 0x4B85622: mca_io_base_file_select (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286936== by 0x4B0E68A: ompi_file_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286936== by 0x4B3ADB8: PMPI_File_open (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286936== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) >>>> ==1286936== by 0x78D4B23: H5FD_open (H5FD.c:733) >>>> ==1286936== by 0x78B953B: H5F_open (H5Fint.c:1493) >>>> ==1286936== >>>> ==1286936== 272 bytes in 44 blocks are definitely lost in loss record >>>> 39 of 49 >>>> ==1286936== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286936== by 0x9FCAEDB: ??? >>>> ==1286936== by 0x9FE42B2: ??? >>>> ==1286936== by 0x9FE47BB: ??? >>>> ==1286936== by 0x9FCDDBF: ??? >>>> ==1286936== by 0x9FA324A: ??? >>>> ==1286936== by 0x4B3DD7F: PMPI_File_write_at_all (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286936== by 0x7B6DB32: H5FD_mpio_write (H5FDmpio.c:1833) >>>> ==1286936== by 0x78DF11D: H5FD_write (H5FDint.c:257) >>>> ==1286936== by 0x78AE86B: H5F__accum_write (H5Faccum.c:825) >>>> ==1286936== by 0x7A1FBE9: H5PB_write (H5PB.c:1027) >>>> ==1286936== by 0x78BBC7A: H5F_block_write (H5Fio.c:164) >>>> ==1286936== >>>> ==1286936== 312 bytes in 1 blocks are still reachable in loss record 40 >>>> of 49 >>>> ==1286936== at 0x483BE63: operator new(unsigned long) (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286936== by 0x74E78EB: boost::detail::make_external_thread_data() >>>> (in >>>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_thread.so.1.73.0) >>>> ==1286936== by 0x74E7C74: >>>> boost::detail::add_thread_exit_function(boost::detail::thread_exit_function_base*) >>>> (in >>>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_thread.so.1.73.0) >>>> ==1286936== by 0x73AFCEA: >>>> boost::log::v2_mt_posix::sources::aux::get_severity_level() (in >>>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_log.so.1.73.0) >>>> ==1286936== by 0x5F71A6C: set_value (severity_feature.hpp:135) >>>> ==1286936== by 0x5F71A6C: >>>> open_record_unlocked>>> const boost::log::v2_mt_posix::trivial::severity_level> > > >>>> (severity_feature.hpp:252) >>>> ==1286936== by 0x5F71A6C: >>>> open_record>>> const boost::log::v2_mt_posix::trivial::severity_level> > > >>>> (basic_logger.hpp:459) >>>> ==1286936== by 0x5F71A6C: >>>> Logger::TraceMessage(std::__cxx11::basic_string>>> std::char_traits, std::allocator >) (logger.cpp:328) >>>> ==1286936== by 0x5F729C7: >>>> Logger::Message(std::__cxx11::basic_string, >>>> std::allocator > const&, LogLevel) (logger.cpp:280) >>>> ==1286936== by 0x5F73CF1: >>>> Logger::Timer::Timer(std::__cxx11::basic_string>>> std::char_traits, std::allocator > const&, LogLevel) >>>> (logger.cpp:426) >>>> ==1286936== by 0x15718A: timer (logger.hpp:98) >>>> ==1286936== by 0x15718A: main (testing_main.cpp:9) >>>> ==1286936== >>>> ==1286936== 585 (480 direct, 105 indirect) bytes in 15 blocks are >>>> definitely lost in loss record 41 of 49 >>>> ==1286936== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286936== by 0x8E9D3EB: ??? >>>> ==1286936== by 0x8E9F1C1: ??? >>>> ==1286936== by 0x8D0578C: ??? 
>>>> ==1286936== by 0x8D8605A: ??? >>>> ==1286936== by 0x8D87FE8: ??? >>>> ==1286936== by 0x8D88E4D: ??? >>>> ==1286936== by 0x8D1A767: ??? >>>> ==1286936== by 0x4B14036: ompi_proc_complete_init_single (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286936== by 0x4B146C3: ompi_proc_complete_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286936== by 0x4BA19A9: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286936== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286936== >>>> ==1286936== 776 bytes in 32 blocks are indirectly lost in loss record >>>> 42 of 49 >>>> ==1286936== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286936== by 0x8DE9816: ??? >>>> ==1286936== by 0x8DEB1D2: ??? >>>> ==1286936== by 0x8DEB49A: ??? >>>> ==1286936== by 0x8DE8B12: ??? >>>> ==1286936== by 0x8E9D492: ??? >>>> ==1286936== by 0x8E9F1C1: ??? >>>> ==1286936== by 0x8D0578C: ??? >>>> ==1286936== by 0x8D8605A: ??? >>>> ==1286936== by 0x8D87FE8: ??? >>>> ==1286936== by 0x8D88E4D: ??? >>>> ==1286936== by 0x8D1A767: ??? >>>> ==1286936== >>>> ==1286936== 840 (480 direct, 360 indirect) bytes in 15 blocks are >>>> definitely lost in loss record 43 of 49 >>>> ==1286936== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286936== by 0x8E9D3EB: ??? >>>> ==1286936== by 0x8E9F1C1: ??? >>>> ==1286936== by 0x8D0578C: ??? >>>> ==1286936== by 0x8D8605A: ??? >>>> ==1286936== by 0x8D87FE8: ??? >>>> ==1286936== by 0x8D88E4D: ??? >>>> ==1286936== by 0x8D1A5EB: ??? >>>> ==1286936== by 0x9EF2F00: ??? >>>> ==1286936== by 0x9EEBF17: ??? >>>> ==1286936== by 0x9EE2F54: ??? >>>> ==1286936== by 0x9F1E1FB: ??? >>>> ==1286936== >>>> ==1286936== 1,091 (480 direct, 611 indirect) bytes in 15 blocks are >>>> definitely lost in loss record 44 of 49 >>>> ==1286936== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286936== by 0x8E9D3EB: ??? >>>> ==1286936== by 0x8E9F1C1: ??? >>>> ==1286936== by 0x8D0578C: ??? >>>> ==1286936== by 0x8D8605A: ??? >>>> ==1286936== by 0x8D87FE8: ??? >>>> ==1286936== by 0x8D88E4D: ??? >>>> ==1286936== by 0x8D1A767: ??? >>>> ==1286936== by 0x84D4800: ??? >>>> ==1286936== by 0x68602FB: orte_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>>> ==1286936== by 0x4BA1322: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286936== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286936== >>>> ==1286936== 1,344 bytes in 1 blocks are definitely lost in loss record >>>> 45 of 49 >>>> ==1286936== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286936== by 0x68AE702: opal_free_list_grow_st (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286936== by 0x9F1CD2D: ??? >>>> ==1286936== by 0x68FC9C8: mca_btl_base_select (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286936== by 0x9EE3527: ??? 
>>>> ==1286936== by 0x4B6170A: mca_bml_base_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286936== by 0x4BA1714: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286936== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286936== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>>> ==1286936== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>>> ==1286936== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>>> ==1286936== by 0x15710D: main (testing_main.cpp:8) >>>> ==1286936== >>>> ==1286936== 2,752 bytes in 1 blocks are definitely lost in loss record >>>> 46 of 49 >>>> ==1286936== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286936== by 0x68AE702: opal_free_list_grow_st (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286936== by 0x9F1CC50: ??? >>>> ==1286936== by 0x68FC9C8: mca_btl_base_select (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286936== by 0x9EE3527: ??? >>>> ==1286936== by 0x4B6170A: mca_bml_base_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286936== by 0x4BA1714: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286936== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286936== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>>> ==1286936== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>>> ==1286936== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>>> ==1286936== by 0x15710D: main (testing_main.cpp:8) >>>> ==1286936== >>>> ==1286936== 2,752 bytes in 1 blocks are definitely lost in loss record >>>> 47 of 49 >>>> ==1286936== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286936== by 0x68AE702: opal_free_list_grow_st (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286936== by 0x9F1CCC4: ??? >>>> ==1286936== by 0x68FC9C8: mca_btl_base_select (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>>> ==1286936== by 0x9EE3527: ??? >>>> ==1286936== by 0x4B6170A: mca_bml_base_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286936== by 0x4BA1714: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286936== by 0x4B450B0: PMPI_Init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286936== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>>> ==1286936== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>>> ==1286936== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>>> ==1286936== by 0x15710D: main (testing_main.cpp:8) >>>> ==1286936== >>>> ==1286936== 62,640 bytes in 30 blocks are indirectly lost in loss >>>> record 48 of 49 >>>> ==1286936== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286936== by 0x8DE9FA8: ??? >>>> ==1286936== by 0x8DEB032: ??? >>>> ==1286936== by 0x8DEB49A: ??? >>>> ==1286936== by 0x8DE8B12: ??? >>>> ==1286936== by 0x8E9D492: ??? >>>> ==1286936== by 0x8E9F1C1: ??? 
>>>> ==1286936== by 0x8D0578C: ??? >>>> ==1286936== by 0x8D8605A: ??? >>>> ==1286936== by 0x8D87FE8: ??? >>>> ==1286936== by 0x8D88E4D: ??? >>>> ==1286936== by 0x8D1A5EB: ??? >>>> ==1286936== >>>> ==1286936== 62,760 (480 direct, 62,280 indirect) bytes in 15 blocks are >>>> definitely lost in loss record 49 of 49 >>>> ==1286936== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286936== by 0x8E9D3EB: ??? >>>> ==1286936== by 0x8E9F1C1: ??? >>>> ==1286936== by 0x8D0578C: ??? >>>> ==1286936== by 0x8D8605A: ??? >>>> ==1286936== by 0x8D87FE8: ??? >>>> ==1286936== by 0x8D88E4D: ??? >>>> ==1286936== by 0x8D1A5EB: ??? >>>> ==1286936== by 0x9F0398A: ??? >>>> ==1286936== by 0x9EE2F54: ??? >>>> ==1286936== by 0x9F1E1FB: ??? >>>> ==1286936== by 0x4BA1A09: ompi_mpi_init (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286936== >>>> ==1286936== LEAK SUMMARY: >>>> ==1286936== definitely lost: 9,805 bytes in 137 blocks >>>> ==1286936== indirectly lost: 63,431 bytes in 63 blocks >>>> ==1286936== possibly lost: 0 bytes in 0 blocks >>>> ==1286936== still reachable: 1,174 bytes in 27 blocks >>>> ==1286936== suppressed: 0 bytes in 0 blocks >>>> ==1286936== >>>> ==1286936== ERROR SUMMARY: 34 errors from 29 contexts (suppressed: 0 >>>> from 0) >>>> ==1286936== >>>> ==1286936== 1 errors in context 1 of 29: >>>> ==1286936== Thread 3: >>>> ==1286936== Syscall param writev(vector[...]) points to uninitialised >>>> byte(s) >>>> ==1286936== at 0x658A48D: __writev (writev.c:26) >>>> ==1286936== by 0x658A48D: writev (writev.c:24) >>>> ==1286936== by 0x8DF9B4C: ??? >>>> ==1286936== by 0x7CC413E: ??? (in >>>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>>> ==1286936== by 0x7CC487E: event_base_loop (in >>>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>>> ==1286936== by 0x8DBDD55: ??? >>>> ==1286936== by 0x4BF7608: start_thread (pthread_create.c:477) >>>> ==1286936== by 0x6595102: clone (clone.S:95) >>>> ==1286936== Address 0xa290cbf is 127 bytes inside a block of size >>>> 5,120 alloc'd >>>> ==1286936== at 0x483DFAF: realloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286936== by 0x8DE155A: ??? >>>> ==1286936== by 0x8DE3F4A: ??? >>>> ==1286936== by 0x8DE4900: ??? >>>> ==1286936== by 0x8DE4175: ??? >>>> ==1286936== by 0x8D7CF91: ??? >>>> ==1286936== by 0x7CC3FDD: ??? (in >>>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>>> ==1286936== by 0x7CC487E: event_base_loop (in >>>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>>> ==1286936== by 0x8DBDD55: ??? >>>> ==1286936== by 0x4BF7608: start_thread (pthread_create.c:477) >>>> ==1286936== by 0x6595102: clone (clone.S:95) >>>> ==1286936== Uninitialised value was created by a stack allocation >>>> ==1286936== at 0x9F048D6: ??? >>>> ==1286936== >>>> ==1286936== >>>> ==1286936== 6 errors in context 2 of 29: >>>> ==1286936== Thread 1: >>>> ==1286936== Syscall param pwritev(vector[...]) points to uninitialised >>>> byte(s) >>>> ==1286936== at 0x658A608: pwritev64 (pwritev64.c:30) >>>> ==1286936== by 0x658A608: pwritev (pwritev64.c:28) >>>> ==1286936== by 0x9F46E25: ??? >>>> ==1286936== by 0x9FCE33B: ??? >>>> ==1286936== by 0x9FCDDBF: ??? >>>> ==1286936== by 0x9FA324A: ??? 
>>>> ==1286936== by 0x4B3DD7F: PMPI_File_write_at_all (in >>>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>>> ==1286936== by 0x7B6DB32: H5FD_mpio_write (H5FDmpio.c:1833) >>>> ==1286936== by 0x78DF11D: H5FD_write (H5FDint.c:257) >>>> ==1286936== by 0x78AE86B: H5F__accum_write (H5Faccum.c:825) >>>> ==1286936== by 0x7A1FBE9: H5PB_write (H5PB.c:1027) >>>> ==1286936== by 0x78BBC7A: H5F_block_write (H5Fio.c:164) >>>> ==1286936== by 0x7B5ED15: H5C__collective_write (H5Cmpio.c:1020) >>>> ==1286936== by 0x7B5ED15: H5C_apply_candidate_list (H5Cmpio.c:394) >>>> ==1286936== Address 0xedf91b0 is 96 bytes inside a block of size 216 >>>> alloc'd >>>> ==1286936== at 0x483B7F3: malloc (in >>>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==1286936== by 0x7980C45: H5MM_malloc (H5MM.c:292) >>>> ==1286936== by 0x7980C45: H5MM_malloc (H5MM.c:267) >>>> ==1286936== by 0x77FC8FF: H5C__flush_single_entry (H5C.c:6045) >>>> ==1286936== by 0x7B5DC7E: H5C__flush_candidates_in_ring >>>> (H5Cmpio.c:1371) >>>> ==1286936== by 0x7B5DC7E: H5C__flush_candidate_entries >>>> (H5Cmpio.c:1192) >>>> ==1286936== by 0x7B5DC7E: H5C_apply_candidate_list (H5Cmpio.c:385) >>>> ==1286936== by 0x7B5BA18: H5AC__rsp__dist_md_write__flush >>>> (H5ACmpio.c:1709) >>>> ==1286936== by 0x7B5BA18: H5AC__run_sync_point (H5ACmpio.c:2164) >>>> ==1286936== by 0x7B5C9D2: H5AC__flush_entries (H5ACmpio.c:2307) >>>> ==1286936== by 0x77C95E4: H5AC_flush (H5AC.c:681) >>>> ==1286936== by 0x78B306A: H5F__flush_phase2 (H5Fint.c:1831) >>>> ==1286936== by 0x78B5D7A: H5F__dest (H5Fint.c:1152) >>>> ==1286936== by 0x78B6603: H5F_try_close (H5Fint.c:2180) >>>> ==1286936== by 0x78B69F5: H5F__close_cb (H5Fint.c:2009) >>>> ==1286936== by 0x7965797: H5I_dec_ref (H5I.c:1254) >>>> ==1286936== Uninitialised value was created by a stack allocation >>>> ==1286936== at 0x7695AF0: ??? (in >>>> /home/mlohry/dev/cmake-build/external_install/lib/libcgns.so) >>>> ==1286936== >>>> ==1286936== ERROR SUMMARY: 34 errors from 29 contexts (suppressed: 0 >>>> from 0) >>>> >>>> On Mon, Aug 24, 2020 at 5:00 PM Jed Brown wrote: >>>> >>>>> Do you potentially have a memory or other resource leak? SIGBUS would >>>>> be an odd result, but the symptom of crashing after running for a long time >>>>> sometimes fits with a resource leak. >>>>> >>>>> Mark Lohry writes: >>>>> >>>>> > I queued up some jobs with Barry's patch, so we'll see. >>>>> > >>>>> > Re Jed's suggestion at checkpointing, I don't *think* this is >>>>> something >>>>> > coming from the state of the solution -- running from the same point >>>>> I'm >>>>> > seeing it crash anywhere between 1 hour and 20 hours in. I'll >>>>> increase my >>>>> > file save frequency in case I'm wrong there though. >>>>> > >>>>> > My intel build with different blas just made it through a 6 hour >>>>> time slot >>>>> > without crash, whereas yesterday the same thing crashed after 3 >>>>> hours. But >>>>> > given the randomness so far I'd bet that's just dumb luck. >>>>> > >>>>> > On Mon, Aug 24, 2020 at 4:22 PM Barry Smith >>>>> wrote: >>>>> > >>>>> >> >>>>> >> >>>>> >> > On Aug 24, 2020, at 2:34 PM, Jed Brown wrote: >>>>> >> > >>>>> >> > I'm thinking of something such as writing floating point data >>>>> into the >>>>> >> return address, which would be unaligned/garbage. >>>>> >> >>>>> >> Ok, my patch will detect this. This is what I was talking about, >>>>> messing >>>>> >> up the BLAS arguments which are the addresses of arrays. >>>>> >> >>>>> >> Valgrind is by far the preferred approach. 
>>>>> >> >>>>> >> Barry >>>>> >> >>>>> >> Another feature we could add to the malloc checking is when a >>>>> SEGV or >>>>> >> BUS error is encountered and we catch it we should run the >>>>> >> PetscMallocVerify() and check our memory for corruption reporting >>>>> any we >>>>> >> find. >>>>> >> >>>>> >> >>>>> >> >>>>> >> > >>>>> >> > Reproducing under Valgrind would help a lot. Perhaps it's >>>>> possible to >>>>> >> checkpoint such that the breakage can be reproduced more quickly? >>>>> >> > >>>>> >> > Barry Smith writes: >>>>> >> > >>>>> >> >> https://en.wikipedia.org/wiki/Bus_error < >>>>> >> https://en.wikipedia.org/wiki/Bus_error> >>>>> >> >> >>>>> >> >> But perhaps not true for Intel? >>>>> >> >> >>>>> >> >> >>>>> >> >> >>>>> >> >>> On Aug 24, 2020, at 1:06 PM, Matthew Knepley >>>> > >>>>> >> wrote: >>>>> >> >>> >>>>> >> >>> On Mon, Aug 24, 2020 at 1:46 PM Barry Smith >>>> >>>> >> bsmith at petsc.dev>> wrote: >>>>> >> >>> >>>>> >> >>> >>>>> >> >>>> On Aug 24, 2020, at 12:39 PM, Jed Brown >>>> >>>> >> jed at jedbrown.org>> wrote: >>>>> >> >>>> >>>>> >> >>>> Barry Smith > >>>>> writes: >>>>> >> >>>> >>>>> >> >>>>>> On Aug 24, 2020, at 12:31 PM, Jed Brown >>>> >>>> >> jed at jedbrown.org>> wrote: >>>>> >> >>>>>> >>>>> >> >>>>>> Barry Smith > >>>>> writes: >>>>> >> >>>>>> >>>>> >> >>>>>>> So if a BLAS errors with SIGBUS then it is always an input >>>>> error >>>>> >> of just not proper double/complex alignment? Or some other very >>>>> strange >>>>> >> thing? >>>>> >> >>>>>> >>>>> >> >>>>>> I would suspect memory corruption. >>>>> >> >>>>> >>>>> >> >>>>> >>>>> >> >>>>> Corruption meaning what specifically? >>>>> >> >>>>> >>>>> >> >>>>> The routines crashing are dgemv which only take double >>>>> precision >>>>> >> arrays, regardless of what garbage is in those arrays i don't think >>>>> there >>>>> >> can be BUS errors resulting. They don't take integer arrays whose >>>>> >> corruption could result in bad indexing and then BUS errors. >>>>> >> >>>>> >>>>> >> >>>>> So then it can only be corruption of the pointers passed in, >>>>> correct? >>>>> >> >>>> >>>>> >> >>>> Such as those pointers pointing into data on the stack with >>>>> incorrect >>>>> >> sizes. >>>>> >> >>> >>>>> >> >>> But won't incorrect sizes "usually" lead to SEGV not SEGBUS? >>>>> >> >>> >>>>> >> >>> My understanding was that roughly memory errors in the heap are >>>>> SEGV >>>>> >> and memory errors on the stack are SIGBUS. Is that not true? >>>>> >> >>> >>>>> >> >>> Matt >>>>> >> >>> >>>>> >> >>> -- >>>>> >> >>> What most experimenters take for granted before they begin their >>>>> >> experiments is infinitely more interesting than any results to >>>>> which their >>>>> >> experiments lead. >>>>> >> >>> -- Norbert Wiener >>>>> >> >>> >>>>> >> >>> https://www.cse.buffalo.edu/~knepley/ < >>>>> >> http://www.cse.buffalo.edu/~knepley/> >>>>> >> >>>>> >> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Sun Sep 6 19:41:34 2020 From: jed at jedbrown.org (Jed Brown) Date: Sun, 06 Sep 2020 18:41:34 -0600 Subject: [petsc-users] VTK format? In-Reply-To: References: <753A4D7D-A9F4-4EF7-8FE7-B765CAAB60D8@petsc.dev> <99C78421-D4C9-41A5-BFAF-B9BF8B836A09@petsc.dev> Message-ID: <87mu22mn9d.fsf@jedbrown.org> This MR deprecates it for 3.14. We can delete it for a future release. https://gitlab.com/petsc/petsc/-/merge_requests/3142 Matthew Knepley writes: > I do not use it anymore. It can be thrown away. 
> > Matt > > On Thu, Sep 3, 2020 at 12:20 PM Barry Smith wrote: > >> >> >> On Sep 3, 2020, at 10:47 AM, Jed Brown wrote: >> >> I'd have deleted the legacy vtk many years ago, but Matt says he uses it. >> >> >> So he is plotting garbage or he never uses the broken stuff so it can be >> errorred out? >> >> >> >> On Thu, Sep 3, 2020, at 9:45 AM, Barry Smith wrote: >> >> >> Shouldn't this, "just work". PETSc should not be dumping unreadable >> garbage into any files, does the broken PETSc code need to be removed, or >> error out until it can be fixed? >> >> Barry >> >> >> On Sep 3, 2020, at 9:53 AM, Jed Brown wrote: >> >> Use the xml format (not the legacy format) by naming your file.vtu instead >> of file.vtk >> >> On Thu, Sep 3, 2020, at 8:17 AM, Berend van Wachem wrote: >> >> Dear PETSc, >> >> What is the best way to write data from a DMPLEX vector to file, so it >> can be viewed with paraview? >> I've found that the standard VTK format works for a serial job, but if >> there is more than 1 processor, the geometry data gets messed up. >> I've attached a small working example for a cylinder and the visualised >> geometry with paraview for 1 processors and 4 processors. >> Any pointers or "best practice" very much appreciated. >> >> Best regards, >> >> Berend. >> >> >> >> >> *Attachments:* >> >> - visualisemesh-1proc.png >> - visualisemesh-4proc.png >> - visualizemesh.c >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From knepley at gmail.com Mon Sep 7 08:44:29 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 7 Sep 2020 09:44:29 -0400 Subject: [petsc-users] Saddle point problem with nested matrix and a relatively small number of Lagrange multipliers In-Reply-To: <85f64541-bb8f-a71c-eb22-7c8bb5d05be3@cea.fr> References: <85f64541-bb8f-a71c-eb22-7c8bb5d05be3@cea.fr> Message-ID: On Thu, Sep 3, 2020 at 11:43 AM Olivier Jamond wrote: > Hello, > > I am working on a finite-elements/finite-volumes code, whose distributed > solver is based on petsc. For FE, it relies on Lagrange multipliers for > the imposition of various boundary conditions or interactions (simple > dirichlet, contact, ...). This results in saddle point problems: > > [K C^t][U]=[F] > [C 0 ][L] [D] > > Most of the time, the relations related to the matrix C are applied to > dofs on the boundary of the domain. Then the size of L is much smaller > than the size of U, which becomes more and more true as the mesh is > refined. > > The code construct this matrix as a nested matrix (one of the reason is > that for some interactions such as contact, whereas being quite small, > the size of the matrix C change constantly, and having it 'blended' into > a monolithic 'big' matrix would require to recompute its profile/ > reallocate / ... each time), and use fieldsplit preconditioner of type > PC_COMPOSITE_SCHUR. I would like to solve the system using iterative > methods to access good extensibility on a large number of subdomains. > > Simple BC such as Dirichlet can be eliminated into K (and must be in > order to make K invertible). > > My problem is the computational cost of these constraints treated with > Lagrange multipliers, whereas their number becomes more and more > neglectable as the mesh is refined. 
To give an idea, let's consider a > simple elastic cube with dirichlet BCs which are all eliminated (to > ensure invertibility of K) but one on a single dof. > > -ksp_type preonly > -pc_type fieldsplit > -pc_fieldsplit_type schur > -pc_fieldsplit_schur_factorization_type full > -pc_fieldsplit_schur_precondition selfp > > -fieldsplit_u_ksp_type cg > -fieldsplit_u_pc_type bjacobi > > -fieldsplit_l_ksp_type cg > -fieldsplit_l_pc_type bjacobi > > it seems that my computation time is multiplied by a factor 3: 3 ksp > solves of the big block 'K' are needed to apply the schur preconditioner > (assuming that the ksp(S,Sp) converges in 1 iteration). It seems > expensive for a single dof dirichlet! > I am not sure you can get around this cost. In this case, it reduces to the well-known Sherman-Morrison formula ( https://en.wikipedia.org/wiki/Sherman%E2%80%93Morrison_formula), which Woodbury generalized. It seems to have the same number of solves. Thanks, Matt > And for some relations treated by Lagrange multipliers which involve > many dofs, the number of ksp solve of the big block 'K' is ( 2 + number > of iteration of ksp(S,Sp)). To reduce this, one can think about solving > the ksp(S,Sp) with a direct solver, but then one must use > "-pc_fieldsplit_schur_precondition self" which is advised against in the > documentation... > > To illustrate this, on a small elasticity case: 32x32x32 cube on 8 > processors, dirichlet on the top and bottom faces: > * if all the dirichlet are eliminated (no C matrix, monolithic solve of > the K bloc) > - computation time for the solve: ~400ms > * if only the dirichlet of the bottom face are eliminated > - computation time for the solve: ~35000ms > - number of iteration of ksp(S,Sp): 37 > - total number of iterations of ksp(K): 4939 > * only the dirichlet of the bottom face are eliminated with these options: > -ksp_type fgmres > -pc_type fieldsplit > -pc_fieldsplit_type schur > -pc_fieldsplit_schur_factorization_type full > -pc_fieldsplit_schur_precondition selfp > > -fieldsplit_u_ksp_type cg > -fieldsplit_u_pc_type bjacobi > > -fieldsplit_l_ksp_type cg > -fieldsplit_l_pc_type bjacobi > -fieldsplit_l_ksp_rtol 1e-10 > -fieldsplit_l_inner_ksp_type preonly > -fieldsplit_l_inner_pc_type jacobi > -fieldsplit_l_upper_ksp_type preonly > -fieldsplit_l_upper_pc_type jacobi > > - computation time for the solve: ~50000ms > - total number of iterations of ksp(K): 7424 > - 'main' ksp number of iterations: 7424 > > Then in the end, my question is: is there a smarter way to handle such > 'small' constraint matrices C, with the (maybe wrong) idea that a small > number of extra dofs (the lagrange multipliers) should result in a small > extra computation time ? > > Thanks! > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From berend.vanwachem at ovgu.de Mon Sep 7 08:58:15 2020 From: berend.vanwachem at ovgu.de (Berend van Wachem) Date: Mon, 7 Sep 2020 15:58:15 +0200 Subject: [petsc-users] VTK format? In-Reply-To: References: Message-ID: <20b990f1-b15f-e876-1d6b-62bdb34b2e49@ovgu.de> Dear Jed, Thanks for the suggestion. I've tried writing to .vtu (see attached code), but that doesn't seem to work for a section containing multiple fields? 
Upon running the attached code, I get the error: $ mpirun -np 1 visualizemesh [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Petsc has generated inconsistent data [0]PETSC ERROR: Total number of field components 1 != block size 4 [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020 [0]PETSC ERROR: visualizemesh on a linux-gcc-opt named ivt14.mb.uni-magdeburg.de by berend Mon Sep 7 15:55:10 2020 [0]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O3 -march=native -mtune=native" CXXOPTFLAGS="-O3 -march=native -mtune=native" FOPTFLAGS="-O3 -march=native -mtune=native" --with-clean --download-metis=yes --download-parmetis=yes --download-hdf5 --download-p4est --download-triangle --download-tetgen --with-zlib-lib=/usr/lib64/libz.a --with-zlib-include=/usr/include [0]PETSC ERROR: #1 DMPlexVTKWriteAll_VTU() line 334 in /usr/local/petsc-3.13.3/src/dm/impls/plex/plexvtu.c [0]PETSC ERROR: #2 DMPlexVTKWriteAll() line 688 in /usr/local/petsc-3.13.3/src/dm/impls/plex/plexvtk.c [0]PETSC ERROR: #3 PetscViewerFlush_VTK() line 100 in /usr/local/petsc-3.13.3/src/sys/classes/viewer/impls/vtk/vtkv.c [0]PETSC ERROR: #4 PetscViewerFlush() line 26 in /usr/local/petsc-3.13.3/src/sys/classes/viewer/interface/flush.c [0]PETSC ERROR: #5 PetscViewerDestroy() line 113 in /usr/local/petsc-3.13.3/src/sys/classes/viewer/interface/view.c [0]PETSC ERROR: #6 main() line 102 in visualizemesh.c [0]PETSC ERROR: No PETSc Option Table entries [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- application called MPI_Abort(MPI_COMM_SELF, 102077) - process 0 Or am I making a mistake? Thanks, best wishes, Berend. On 2020-09-03 16:53, Jed Brown wrote: > Use the xml format (not the legacy format) by naming your file.vtu > instead of file.vtk > > On Thu, Sep 3, 2020, at 8:17 AM, Berend van Wachem wrote: >> Dear PETSc, >> >> What is the best way to write data from a DMPLEX vector to file, so it >> can be viewed with paraview? >> I've found that the standard VTK format works for a serial job, but if >> there is more than 1 processor, the geometry data gets messed up. >> I've attached a small working example for a cylinder and the visualised >> geometry with paraview for 1 processors and 4 processors. >> Any pointers or "best practice" very much appreciated. >> >> Best regards, >> >> Berend. >> >> >> >> >> *Attachments:* >> >> * visualisemesh-1proc.png >> * visualisemesh-4proc.png >> * visualizemesh.c -------------- next part -------------- A non-text attachment was scrubbed... Name: visualizemesh.c Type: text/x-csrc Size: 3017 bytes Desc: not available URL: From knepley at gmail.com Mon Sep 7 09:33:11 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 7 Sep 2020 10:33:11 -0400 Subject: [petsc-users] VTK format? In-Reply-To: <20b990f1-b15f-e876-1d6b-62bdb34b2e49@ovgu.de> References: <20b990f1-b15f-e876-1d6b-62bdb34b2e49@ovgu.de> Message-ID: On Mon, Sep 7, 2020 at 9:58 AM Berend van Wachem wrote: > Dear Jed, > > Thanks for the suggestion. I've tried writing to .vtu (see attached > code), but that doesn't seem to work for a section containing multiple > fields? Upon running the attached code, I get the error: > Yes, Jed removed that error check today. It should make it to master is a few days. You can just delete that line I think. 
I still prefer to use HDF5 and XDMF since it is much more flexible, and easier to store many things in. I put lots of fields in the HDF5 file (and hopefully will get time to support many meshes in it), and also support timesteps. I recognize that VTU is familiar, which is a plus for that format. Thanks, Matt > $ mpirun -np 1 visualizemesh > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Petsc has generated inconsistent data > [0]PETSC ERROR: Total number of field components 1 != block size 4 > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020 > [0]PETSC ERROR: visualizemesh on a linux-gcc-opt named > ivt14.mb.uni-magdeburg.de by berend Mon Sep 7 15:55:10 2020 > [0]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O3 > -march=native -mtune=native" CXXOPTFLAGS="-O3 -march=native > -mtune=native" FOPTFLAGS="-O3 -march=native -mtune=native" --with-clean > --download-metis=yes --download-parmetis=yes --download-hdf5 > --download-p4est --download-triangle --download-tetgen > --with-zlib-lib=/usr/lib64/libz.a --with-zlib-include=/usr/include > [0]PETSC ERROR: #1 DMPlexVTKWriteAll_VTU() line 334 in > /usr/local/petsc-3.13.3/src/dm/impls/plex/plexvtu.c > [0]PETSC ERROR: #2 DMPlexVTKWriteAll() line 688 in > /usr/local/petsc-3.13.3/src/dm/impls/plex/plexvtk.c > [0]PETSC ERROR: #3 PetscViewerFlush_VTK() line 100 in > /usr/local/petsc-3.13.3/src/sys/classes/viewer/impls/vtk/vtkv.c > [0]PETSC ERROR: #4 PetscViewerFlush() line 26 in > /usr/local/petsc-3.13.3/src/sys/classes/viewer/interface/flush.c > [0]PETSC ERROR: #5 PetscViewerDestroy() line 113 in > /usr/local/petsc-3.13.3/src/sys/classes/viewer/interface/view.c > [0]PETSC ERROR: #6 main() line 102 in visualizemesh.c > [0]PETSC ERROR: No PETSc Option Table entries > [0]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > application called MPI_Abort(MPI_COMM_SELF, 102077) - process 0 > > > Or am I making a mistake? > > Thanks, best wishes, Berend. > > > > > On 2020-09-03 16:53, Jed Brown wrote: > > Use the xml format (not the legacy format) by naming your file.vtu > > instead of file.vtk > > > > On Thu, Sep 3, 2020, at 8:17 AM, Berend van Wachem wrote: > >> Dear PETSc, > >> > >> What is the best way to write data from a DMPLEX vector to file, so it > >> can be viewed with paraview? > >> I've found that the standard VTK format works for a serial job, but if > >> there is more than 1 processor, the geometry data gets messed up. > >> I've attached a small working example for a cylinder and the visualised > >> geometry with paraview for 1 processors and 4 processors. > >> Any pointers or "best practice" very much appreciated. > >> > >> Best regards, > >> > >> Berend. > >> > >> > >> > >> > >> *Attachments:* > >> > >> * visualisemesh-1proc.png > >> * visualisemesh-4proc.png > >> * visualizemesh.c > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
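For readers who want to follow the HDF5/XDMF route suggested above, a minimal sketch against a PETSc of this vintage (~3.13) looks roughly like the following. It assumes PETSc was configured with HDF5 support; the box mesh, the field name "temperature", and the file name "solution.h5" are placeholders, not the attachment from this thread.

#include <petscdmplex.h>
#include <petscfe.h>
#include <petscviewerhdf5.h>

int main(int argc, char **argv)
{
  DM             dm;
  PetscFE        fe;
  Vec            u;
  PetscViewer    viewer;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  /* a small 2D simplex box mesh stands in for the real geometry */
  ierr = DMPlexCreateBoxMesh(PETSC_COMM_WORLD, 2, PETSC_TRUE, NULL, NULL, NULL, NULL, PETSC_TRUE, &dm);CHKERRQ(ierr);
  /* one scalar P1 field so that the DM can hand out a global vector */
  ierr = PetscFECreateDefault(PETSC_COMM_WORLD, 2, 1, PETSC_TRUE, NULL, PETSC_DEFAULT, &fe);CHKERRQ(ierr);
  ierr = DMSetField(dm, 0, NULL, (PetscObject)fe);CHKERRQ(ierr);
  ierr = DMCreateDS(dm);CHKERRQ(ierr);
  ierr = PetscFEDestroy(&fe);CHKERRQ(ierr);

  ierr = DMCreateGlobalVector(dm, &u);CHKERRQ(ierr);
  ierr = PetscObjectSetName((PetscObject)u, "temperature");CHKERRQ(ierr); /* dataset name inside the HDF5 file */
  ierr = VecSet(u, 1.0);CHKERRQ(ierr);                                    /* placeholder field values */

  /* mesh and field go into one HDF5 file */
  ierr = PetscViewerHDF5Open(PETSC_COMM_WORLD, "solution.h5", FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
  ierr = DMView(dm, viewer);CHKERRQ(ierr);
  ierr = VecView(u, viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

  ierr = VecDestroy(&u);CHKERRQ(ierr);
  ierr = DMDestroy(&dm);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

ParaView does not read the raw HDF5 file directly; the petsc_gen_xdmf.py script shipped in $PETSC_DIR/lib/petsc/bin can be run on solution.h5 afterwards to produce the matching .xmf description. Much the same output can usually be obtained without code changes through command-line viewers, e.g. -dm_view hdf5:solution.h5 -vec_view hdf5:solution.h5::append, although the exact option syntax varies a little between releases.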
URL: From dowlinah at clarkson.edu Mon Sep 7 09:58:33 2020 From: dowlinah at clarkson.edu (Anthony Dowling) Date: Mon, 7 Sep 2020 10:58:33 -0400 Subject: [petsc-users] Converting a Parallel(MPI) Matrix to a Sequential Matrix Message-ID: Hello all, Is there a method to convert a parallel(MPI) dense matrix to a sequential dense matrix? Also to do the same in reverse? The code I am using needs to be able to convert a float** matrix to a Petsc matrix, and then later convert that Petsc matrix between MPI dense and sequential dense. How might this be achieved? The code seems to be converting float** matrices to Petsc properly, but I am unable to find a method to convert between MPI and sequential matrices. Thanks in advance, Anthony Dowling -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Sep 7 10:12:25 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 7 Sep 2020 11:12:25 -0400 Subject: [petsc-users] Converting a Parallel(MPI) Matrix to a Sequential Matrix In-Reply-To: References: Message-ID: On Mon, Sep 7, 2020 at 11:00 AM Anthony Dowling wrote: > Hello all, > > Is there a method to convert a parallel(MPI) dense matrix to a sequential > dense matrix? Also to do the same in reverse? The code I am using needs to > be able to convert a float** matrix to a Petsc matrix, and then later > convert that Petsc matrix between MPI dense and sequential dense. How might > this be achieved? The code seems to be converting float** matrices to Petsc > properly, but I am unable to find a method to convert between MPI and > sequential matrices. > If you want the serial matrix everywhere, this is easy: https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateRedundantMatrix.html If you want it just on 1 process, you can use: https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateSubMatrices.html Thanks, Matt > Thanks in advance, > Anthony Dowling > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dowlinah at clarkson.edu Mon Sep 7 10:16:54 2020 From: dowlinah at clarkson.edu (Anthony Dowling) Date: Mon, 7 Sep 2020 11:16:54 -0400 Subject: [petsc-users] Converting a Parallel(MPI) Matrix to a Sequential Matrix In-Reply-To: References: Message-ID: Thank you for your help, Matthew. With those methods, will I be able to copy the contents of the created serial matrices back to a parallel matrix if needed? If so, what is a good way to do that? Thanks, Anthony Dowling On Mon, Sep 7, 2020 at 11:12 AM Matthew Knepley wrote: > On Mon, Sep 7, 2020 at 11:00 AM Anthony Dowling > wrote: > >> Hello all, >> >> Is there a method to convert a parallel(MPI) dense matrix to a sequential >> dense matrix? Also to do the same in reverse? The code I am using needs to >> be able to convert a float** matrix to a Petsc matrix, and then later >> convert that Petsc matrix between MPI dense and sequential dense. How might >> this be achieved? The code seems to be converting float** matrices to Petsc >> properly, but I am unable to find a method to convert between MPI and >> sequential matrices. 
>> > > If you want the serial matrix everywhere, this is easy: > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateRedundantMatrix.html > If you want it just on 1 process, you can use: > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateSubMatrices.html > > Thanks, > > Matt > > >> Thanks in advance, >> Anthony Dowling >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Sep 7 10:19:57 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 7 Sep 2020 11:19:57 -0400 Subject: [petsc-users] Converting a Parallel(MPI) Matrix to a Sequential Matrix In-Reply-To: References: Message-ID: On Mon, Sep 7, 2020 at 11:17 AM Anthony Dowling wrote: > Thank you for your help, Matthew. > > With those methods, will I be able to copy the contents of the created > serial matrices back to a parallel matrix if needed? If so, what is a good > way to do that? > There is nothing that does that right now. We would have to write it. Thanks, Matt > Thanks, > Anthony Dowling > > > On Mon, Sep 7, 2020 at 11:12 AM Matthew Knepley wrote: > >> On Mon, Sep 7, 2020 at 11:00 AM Anthony Dowling >> wrote: >> >>> Hello all, >>> >>> Is there a method to convert a parallel(MPI) dense matrix to a >>> sequential dense matrix? Also to do the same in reverse? The code I am >>> using needs to be able to convert a float** matrix to a Petsc matrix, and >>> then later convert that Petsc matrix between MPI dense and sequential >>> dense. How might this be achieved? The code seems to be converting float** >>> matrices to Petsc properly, but I am unable to find a method to convert >>> between MPI and sequential matrices. >>> >> >> If you want the serial matrix everywhere, this is easy: >> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateRedundantMatrix.html >> If you want it just on 1 process, you can use: >> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateSubMatrices.html >> >> Thanks, >> >> Matt >> >> >>> Thanks in advance, >>> Anthony Dowling >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dowlinah at clarkson.edu Mon Sep 7 10:21:43 2020 From: dowlinah at clarkson.edu (Anthony Dowling) Date: Mon, 7 Sep 2020 11:21:43 -0400 Subject: [petsc-users] Converting a Parallel(MPI) Matrix to a Sequential Matrix In-Reply-To: References: Message-ID: But in theory, should I be able to get the rows or values from the sequential matrix and insert them into a parallel matrix manually? I think that would do what I need. Thanks, Anthony Dowling On Mon, Sep 7, 2020 at 11:20 AM Matthew Knepley wrote: > On Mon, Sep 7, 2020 at 11:17 AM Anthony Dowling > wrote: > >> Thank you for your help, Matthew. 
>> >> With those methods, will I be able to copy the contents of the created >> serial matrices back to a parallel matrix if needed? If so, what is a good >> way to do that? >> > > There is nothing that does that right now. We would have to write it. > > Thanks, > > Matt > > >> Thanks, >> Anthony Dowling >> >> >> On Mon, Sep 7, 2020 at 11:12 AM Matthew Knepley >> wrote: >> >>> On Mon, Sep 7, 2020 at 11:00 AM Anthony Dowling >>> wrote: >>> >>>> Hello all, >>>> >>>> Is there a method to convert a parallel(MPI) dense matrix to a >>>> sequential dense matrix? Also to do the same in reverse? The code I am >>>> using needs to be able to convert a float** matrix to a Petsc matrix, and >>>> then later convert that Petsc matrix between MPI dense and sequential >>>> dense. How might this be achieved? The code seems to be converting float** >>>> matrices to Petsc properly, but I am unable to find a method to convert >>>> between MPI and sequential matrices. >>>> >>> >>> If you want the serial matrix everywhere, this is easy: >>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateRedundantMatrix.html >>> If you want it just on 1 process, you can use: >>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateSubMatrices.html >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thanks in advance, >>>> Anthony Dowling >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Sep 7 10:25:04 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 7 Sep 2020 11:25:04 -0400 Subject: [petsc-users] Converting a Parallel(MPI) Matrix to a Sequential Matrix In-Reply-To: References: Message-ID: On Mon, Sep 7, 2020 at 11:21 AM Anthony Dowling wrote: > But in theory, should I be able to get the rows or values from the > sequential matrix and insert them into a parallel matrix manually? I think > that would do what I need. > Sure, that will always work. There just isn't a function that moves directly from a serial matrix to a parallel matrix. You could use MatCreateSubMatrix() if you first swapped the communicator with the parallel one, which is what we would have to write. Thanks, Matt > Thanks, > Anthony Dowling > > > On Mon, Sep 7, 2020 at 11:20 AM Matthew Knepley wrote: > >> On Mon, Sep 7, 2020 at 11:17 AM Anthony Dowling >> wrote: >> >>> Thank you for your help, Matthew. >>> >>> With those methods, will I be able to copy the contents of the created >>> serial matrices back to a parallel matrix if needed? If so, what is a good >>> way to do that? >>> >> >> There is nothing that does that right now. We would have to write it. >> >> Thanks, >> >> Matt >> >> >>> Thanks, >>> Anthony Dowling >>> >>> >>> On Mon, Sep 7, 2020 at 11:12 AM Matthew Knepley >>> wrote: >>> >>>> On Mon, Sep 7, 2020 at 11:00 AM Anthony Dowling >>>> wrote: >>>> >>>>> Hello all, >>>>> >>>>> Is there a method to convert a parallel(MPI) dense matrix to a >>>>> sequential dense matrix? Also to do the same in reverse? 
The code I am >>>>> using needs to be able to convert a float** matrix to a Petsc matrix, and >>>>> then later convert that Petsc matrix between MPI dense and sequential >>>>> dense. How might this be achieved? The code seems to be converting float** >>>>> matrices to Petsc properly, but I am unable to find a method to convert >>>>> between MPI and sequential matrices. >>>>> >>>> >>>> If you want the serial matrix everywhere, this is easy: >>>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateRedundantMatrix.html >>>> If you want it just on 1 process, you can use: >>>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateSubMatrices.html >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Thanks in advance, >>>>> Anthony Dowling >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dowlinah at clarkson.edu Mon Sep 7 10:26:20 2020 From: dowlinah at clarkson.edu (Anthony Dowling) Date: Mon, 7 Sep 2020 11:26:20 -0400 Subject: [petsc-users] Converting a Parallel(MPI) Matrix to a Sequential Matrix In-Reply-To: References: Message-ID: Okay, thank you for your help, Matthew. Thanks, Anthony Dowling On Mon, Sep 7, 2020 at 11:25 AM Matthew Knepley wrote: > On Mon, Sep 7, 2020 at 11:21 AM Anthony Dowling > wrote: > >> But in theory, should I be able to get the rows or values from the >> sequential matrix and insert them into a parallel matrix manually? I think >> that would do what I need. >> > > Sure, that will always work. There just isn't a function that moves > directly from a serial matrix to a parallel matrix. You could use > MatCreateSubMatrix() if you first > swapped the communicator with the parallel one, which is what we would > have to write. > > Thanks, > > Matt > > >> Thanks, >> Anthony Dowling >> >> >> On Mon, Sep 7, 2020 at 11:20 AM Matthew Knepley >> wrote: >> >>> On Mon, Sep 7, 2020 at 11:17 AM Anthony Dowling >>> wrote: >>> >>>> Thank you for your help, Matthew. >>>> >>>> With those methods, will I be able to copy the contents of the created >>>> serial matrices back to a parallel matrix if needed? If so, what is a good >>>> way to do that? >>>> >>> >>> There is nothing that does that right now. We would have to write it. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thanks, >>>> Anthony Dowling >>>> >>>> >>>> On Mon, Sep 7, 2020 at 11:12 AM Matthew Knepley >>>> wrote: >>>> >>>>> On Mon, Sep 7, 2020 at 11:00 AM Anthony Dowling >>>>> wrote: >>>>> >>>>>> Hello all, >>>>>> >>>>>> Is there a method to convert a parallel(MPI) dense matrix to a >>>>>> sequential dense matrix? Also to do the same in reverse? 
The code I am >>>>>> using needs to be able to convert a float** matrix to a Petsc matrix, and >>>>>> then later convert that Petsc matrix between MPI dense and sequential >>>>>> dense. How might this be achieved? The code seems to be converting float** >>>>>> matrices to Petsc properly, but I am unable to find a method to convert >>>>>> between MPI and sequential matrices. >>>>>> >>>>> >>>>> If you want the serial matrix everywhere, this is easy: >>>>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateRedundantMatrix.html >>>>> If you want it just on 1 process, you can use: >>>>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateSubMatrices.html >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> Thanks in advance, >>>>>> Anthony Dowling >>>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.jamond at cea.fr Tue Sep 8 11:43:43 2020 From: olivier.jamond at cea.fr (Olivier Jamond) Date: Tue, 8 Sep 2020 18:43:43 +0200 Subject: [petsc-users] Saddle point problem with nested matrix and a relatively small number of Lagrange multipliers In-Reply-To: References: <85f64541-bb8f-a71c-eb22-7c8bb5d05be3@cea.fr> Message-ID: Thanks for your answer, whereas being 'sad' to me! Do you have any idea how the structure FE code geared toward HPC deals with their Lagrange multipliers for their BCs? Do they 'accept' that cost? To my understanding, I didn't know the Sherman-Morrison formula, and I am not sure to see how it applies to my case... Could you please help me on that? Many thanks, Olivier > > I am not sure you can get around this cost. In this case, it reduces > to the well-known > Sherman-Morrison formula > (https://en.wikipedia.org/wiki/Sherman%E2%80%93Morrison_formula), > which Woodbury generalized. It seems to have the same number of solves. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Sep 8 12:00:53 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 8 Sep 2020 13:00:53 -0400 Subject: [petsc-users] Saddle point problem with nested matrix and a relatively small number of Lagrange multipliers In-Reply-To: References: <85f64541-bb8f-a71c-eb22-7c8bb5d05be3@cea.fr> Message-ID: On Tue, Sep 8, 2020 at 12:43 PM Olivier Jamond wrote: > Thanks for your answer, whereas being 'sad' to me! Do you have any idea > how the structure FE code geared toward HPC deals with their Lagrange > multipliers for their BCs? > I don't use Lagrange multipliers for Dirichlet conditions, just direct modification of the approximation space. > Do they 'accept' that cost? > If you have Lagrange multipliers, I think yes. 
> To my understanding, I didn't know the Sherman-Morrison formula, and I am > not sure to see how it applies to my case... Could you please help me on > that? > SM is a formula for inverting a matrix + a rank-one addition. This looks like your case with the Lagrange multiplier fixing one degree of freedom. Thanks, Matt > Many thanks, > Olivier > > > I am not sure you can get around this cost. In this case, it reduces to > the well-known > Sherman-Morrison formula ( > https://en.wikipedia.org/wiki/Sherman%E2%80%93Morrison_formula), > which Woodbury generalized. It seems to have the same number of solves. > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sajidsyed2021 at u.northwestern.edu Tue Sep 8 13:20:47 2020 From: sajidsyed2021 at u.northwestern.edu (Sajid Ali) Date: Tue, 8 Sep 2020 13:20:47 -0500 Subject: [petsc-users] Question about MatGetRowMax Message-ID: Hi PETSc-developers, While trying to use MatGetRowMax, I?m getting the following error : [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [0]PETSC ERROR: likely location of problem given in stack below [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [0]PETSC ERROR: INSTEAD the line number of the start of the function [0]PETSC ERROR: is given. [0]PETSC ERROR: [0] MatGetRowMax_SeqAIJ line 3182 /home/sajid/packages/petsc/src/mat/impls/aij/seq/aij.c [0]PETSC ERROR: [0] MatGetRowMax line 4798 /home/sajid/packages/petsc/src/mat/interface/matrix.c [0]PETSC ERROR: [0] MatGetRowMax_MPIAIJ line 2432 /home/sajid/packages/petsc/src/mat/impls/aij/mpi/mpiaij.c [0]PETSC ERROR: [0] MatGetRowMax line 4798 /home/sajid/packages/petsc/src/mat/interface/matrix.c [0]PETSC ERROR: [0] construct_matrix line 25 /home/sajid/Documents/intern/pirt/src/matrix.cxx [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Signal received [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Development GIT revision: v3.13.5-2756-g0264f47704 GIT Date: 2020-09-06 15:08:48 -0500 [0]PETSC ERROR: /home/sajid/Documents/intern/pirt/src/pirt on a arch-linux-c-debug named xrm-backup by sajid Tue Sep 8 13:56:53 2020 [0]PETSC ERROR: Configure options --with-hdf5=1 --with-debugging=yes [0]PETSC ERROR: #1 User provided function() line 0 in unknown file application called MPI_Abort(MPI_COMM_WORLD, 50161059) - process 0 Could someone point out what could cause such a segfault ? PS : Should it be useful, the error occurs in the (nested) call to MatGetRowMax for the off-diagonal SeqAIJ matrix and the segmentation violation occurs for the first row of a matrix whose ncols=0 (for the off diagonal part). 
MatGetOwnershipRangeColumn was used to set the diagonal and off-diagonal preallocation and all the columns were set to be in the diagonal SeqAIJ matrix as shown below : (gdb) frame #0 MatGetRowMax (mat=0x7ed340, v=0x982470, idx=0x0) at /home/sajid/packages/petsc/src/mat/interface/matrix.c:4803 4803 if (!mat->ops->getrowmax) SETERRQ1(PetscObjectComm((PetscObject)mat),PETSC_ERR_SUP,"Mat type %s",((PetscObject)mat)->type_name); (gdb) print mat->cmap->rstart $12 = 0 (gdb) print mat->cmap->rend $13 = 65536 (gdb) step 4804 MatCheckPreallocated(mat,1); (gdb) next 4806 ierr = (*mat->ops->getrowmax)(mat,v,idx);CHKERRQ(ierr); (gdb) next Program received signal SIGSEGV, Segmentation fault. 0x00007ffff60ca248 in MatGetRowMax_SeqAIJ (A=0x8d1450, v=0x9b1220, idx=0x99a620) at /home/sajid/packages/petsc/src/mat/impls/aij/seq/aij.c:3195 3195 x[i] = *aa; if (idx) idx[i] = 0; (gdb) Thank You, Sajid Ali | PhD Candidate Applied Physics Northwestern University s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Tue Sep 8 14:03:51 2020 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Tue, 8 Sep 2020 19:03:51 +0000 Subject: [petsc-users] Question about MatGetRowMax In-Reply-To: References: Message-ID: Sajid Ali, It might be a bug in petsc when off-diagonal matrix B has ncols=0. I'll check it and get back to you soon. Hong ________________________________ From: petsc-users on behalf of Sajid Ali Sent: Tuesday, September 8, 2020 1:20 PM To: PETSc Subject: [petsc-users] Question about MatGetRowMax Hi PETSc-developers, While trying to use MatGetRowMax, I?m getting the following error : [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [0]PETSC ERROR: likely location of problem given in stack below [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [0]PETSC ERROR: INSTEAD the line number of the start of the function [0]PETSC ERROR: is given. [0]PETSC ERROR: [0] MatGetRowMax_SeqAIJ line 3182 /home/sajid/packages/petsc/src/mat/impls/aij/seq/aij.c [0]PETSC ERROR: [0] MatGetRowMax line 4798 /home/sajid/packages/petsc/src/mat/interface/matrix.c [0]PETSC ERROR: [0] MatGetRowMax_MPIAIJ line 2432 /home/sajid/packages/petsc/src/mat/impls/aij/mpi/mpiaij.c [0]PETSC ERROR: [0] MatGetRowMax line 4798 /home/sajid/packages/petsc/src/mat/interface/matrix.c [0]PETSC ERROR: [0] construct_matrix line 25 /home/sajid/Documents/intern/pirt/src/matrix.cxx [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Signal received [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[0]PETSC ERROR: Petsc Development GIT revision: v3.13.5-2756-g0264f47704 GIT Date: 2020-09-06 15:08:48 -0500 [0]PETSC ERROR: /home/sajid/Documents/intern/pirt/src/pirt on a arch-linux-c-debug named xrm-backup by sajid Tue Sep 8 13:56:53 2020 [0]PETSC ERROR: Configure options --with-hdf5=1 --with-debugging=yes [0]PETSC ERROR: #1 User provided function() line 0 in unknown file application called MPI_Abort(MPI_COMM_WORLD, 50161059) - process 0 Could someone point out what could cause such a segfault ? PS : Should it be useful, the error occurs in the (nested) call to MatGetRowMax for the off-diagonal SeqAIJ matrix and the segmentation violation occurs for the first row of a matrix whose ncols=0 (for the off diagonal part). MatGetOwnershipRangeColumn was used to set the diagonal and off-diagonal preallocation and all the columns were set to be in the diagonal SeqAIJ matrix as shown below : (gdb) frame #0 MatGetRowMax (mat=0x7ed340, v=0x982470, idx=0x0) at /home/sajid/packages/petsc/src/mat/interface/matrix.c:4803 4803 if (!mat->ops->getrowmax) SETERRQ1(PetscObjectComm((PetscObject)mat),PETSC_ERR_SUP,"Mat type %s",((PetscObject)mat)->type_name); (gdb) print mat->cmap->rstart $12 = 0 (gdb) print mat->cmap->rend $13 = 65536 (gdb) step 4804 MatCheckPreallocated(mat,1); (gdb) next 4806 ierr = (*mat->ops->getrowmax)(mat,v,idx);CHKERRQ(ierr); (gdb) next Program received signal SIGSEGV, Segmentation fault. 0x00007ffff60ca248 in MatGetRowMax_SeqAIJ (A=0x8d1450, v=0x9b1220, idx=0x99a620) at /home/sajid/packages/petsc/src/mat/impls/aij/seq/aij.c:3195 3195 x[i] = *aa; if (idx) idx[i] = 0; (gdb) Thank You, Sajid Ali | PhD Candidate Applied Physics Northwestern University s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From sajidsyed2021 at u.northwestern.edu Tue Sep 8 14:37:01 2020 From: sajidsyed2021 at u.northwestern.edu (Sajid Ali) Date: Tue, 8 Sep 2020 14:37:01 -0500 Subject: [petsc-users] Question about MatGetRowMax In-Reply-To: References: Message-ID: Hi Hong, A related bugfix is that lines 2444 and 2447 from src/mat/impls/aij/mpi/mpiaij.c in the current petsc-master are missing a check for validity of idx. Adding a check ( if (idx) ...) before accessing the entries of idx might be necessary (since the docs say that the idx argument is optional). Thanks for the insight into the cause of this bug. -- Sajid Ali | PhD Candidate Applied Physics Northwestern University s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From aph at email.arizona.edu Wed Sep 9 16:23:09 2020 From: aph at email.arizona.edu (Anthony Paul Haas) Date: Wed, 9 Sep 2020 14:23:09 -0700 Subject: [petsc-users] petscvec.h90 Message-ID: Hello, Has the header file petscvec.h90 been removed from include/petsc/finclude/ in recent Petsc releases? Should it then be replaced by petscvec.h? Thanks, Anthony -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Wed Sep 9 17:05:58 2020 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 9 Sep 2020 17:05:58 -0500 Subject: [petsc-users] petscvec.h90 In-Reply-To: References: Message-ID: <660212A7-165A-4897-A532-0A39945BA0AE@petsc.dev> Anthony, See https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/UsingFortran.html https://www.mcs.anl.gov/petsc/documentation/changes/38.html Barry > On Sep 9, 2020, at 4:23 PM, Anthony Paul Haas wrote: > > Hello, > > Has the header file petscvec.h90 been removed from include/petsc/finclude/ in recent Petsc releases? > > Should it then be replaced by petscvec.h? > > Thanks, > > Anthony > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aph at email.arizona.edu Wed Sep 9 17:55:34 2020 From: aph at email.arizona.edu (Anthony Paul Haas) Date: Wed, 9 Sep 2020 15:55:34 -0700 Subject: [petsc-users] [EXT]Re: petscvec.h90 In-Reply-To: <660212A7-165A-4897-A532-0A39945BA0AE@petsc.dev> References: <660212A7-165A-4897-A532-0A39945BA0AE@petsc.dev> Message-ID: Hi Barry, I made the changes according to the information you sent me (see below snippet module LinearSolverClass please). But I am getting the following errors: error #7013 error #6457 Do you know what is going on? Thanks, Anthony .......rc/../dependencies/eigen -I/home/u7/aph/bitcart/src/driver2/material_response -c navier_stokes/LinearSolverClass.F90 navier_stokes/LinearSolverClass.F90(45): error #7013: This module file was not generated by any release of this compiler. [PETSCVEC] use petscvec ------^ navier_stokes/LinearSolverClass.F90(110): error #6457: This derived type name has not been declared. [TVEC] type(tVec) :: solution_ps,FirstSingularVec ----------^ navier_stokes/LinearSolverClass.F90(111): error #6457: This derived type name has not been declared. [TVEC] type(tVec) :: rhs_ps!,b_ps ----------^ navier_stokes/LinearSolverClass.F90(115): error #6457: This derived type name has not been declared. [TKSP] type(tKSP) :: ksp ----------^ module LinearSolverClass ! #include #include use petscvec use petscksp ! use LHS, only: matLHS,rhs,sol use paramesh_dimensions use physicaldata use tree use workspace use utilities_data, only: mkdir use typedef, only: dist_fcn,solverType,mypeno use fill_guardcells, only: fill_guardcell_res !use navierstokes_data, only: fill_guardcell_phi use amr_1blk_bcset_mod, only: updateType ! use Turb_Models, only: nvars_turb ! implicit none ! ! --------------------- ! Class type definition ! --------------------- ! TYPE, public :: LinearSolver_C ! Integer,private :: numVars Integer,private :: numVarsp ...... On Wed, Sep 9, 2020 at 3:06 PM Barry Smith wrote: > *External Email* > Anthony, > > See > > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/UsingFortran.html > > https://www.mcs.anl.gov/petsc/documentation/changes/38.html > > > Barry > > > On Sep 9, 2020, at 4:23 PM, Anthony Paul Haas > wrote: > > Hello, > > Has the header file petscvec.h90 been removed from include/petsc/finclude/ > in recent Petsc releases? > > Should it then be replaced by petscvec.h? > > Thanks, > > Anthony > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Wed Sep 9 18:32:24 2020 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 9 Sep 2020 18:32:24 -0500 Subject: [petsc-users] [EXT]Re: petscvec.h90 In-Reply-To: References: <660212A7-165A-4897-A532-0A39945BA0AE@petsc.dev> Message-ID: .......rc/../dependencies/eigen -I/home/u7/aph/bitcart/src/driver2/material_response -c navier_stokes/LinearSolverClass.F90 navier_stokes/LinearSolverClass.F90(45): error #7013: This module file was not generated by any release of this compiler. [PETSCVEC] It is not reading the module it finds so cannot find the definitions in the module. In the PETSc directory start all over again rm -rf $PETSC_ARCH ./configure your options here make all make check > On Sep 9, 2020, at 5:55 PM, Anthony Paul Haas wrote: > > Hi Barry, > > I made the changes according to the information you sent me (see below snippet module LinearSolverClass please). But I am getting the following errors: > > error #7013 > error #6457 > > Do you know what is going on? > > Thanks, > > Anthony > > .......rc/../dependencies/eigen -I/home/u7/aph/bitcart/src/driver2/material_response -c navier_stokes/LinearSolverClass.F90 > navier_stokes/LinearSolverClass.F90(45): error #7013: This module file was not generated by any release of this compiler. [PETSCVEC] > use petscvec > ------^ > navier_stokes/LinearSolverClass.F90(110): error #6457: This derived type name has not been declared. [TVEC] > type(tVec) :: solution_ps,FirstSingularVec > ----------^ > navier_stokes/LinearSolverClass.F90(111): error #6457: This derived type name has not been declared. [TVEC] > type(tVec) :: rhs_ps!,b_ps > ----------^ > navier_stokes/LinearSolverClass.F90(115): error #6457: This derived type name has not been declared. [TKSP] > type(tKSP) :: ksp > ----------^ > > > module LinearSolverClass > ! > #include > #include > > use petscvec > use petscksp > > ! use LHS, only: matLHS,rhs,sol > use paramesh_dimensions > use physicaldata > use tree > use workspace > use utilities_data, only: mkdir > use typedef, only: dist_fcn,solverType,mypeno > use fill_guardcells, only: fill_guardcell_res > !use navierstokes_data, only: fill_guardcell_phi > use amr_1blk_bcset_mod, only: updateType > ! use Turb_Models, only: nvars_turb > > ! > implicit none > > ! > ! --------------------- > ! Class type definition > ! --------------------- > ! > TYPE, public :: LinearSolver_C > ! > Integer,private :: numVars > Integer,private :: numVarsp > > > ...... > > > On Wed, Sep 9, 2020 at 3:06 PM Barry Smith > wrote: > External Email > > Anthony, > > See > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/UsingFortran.html > > https://www.mcs.anl.gov/petsc/documentation/changes/38.html > > > Barry > > >> On Sep 9, 2020, at 4:23 PM, Anthony Paul Haas > wrote: >> >> Hello, >> >> Has the header file petscvec.h90 been removed from include/petsc/finclude/ in recent Petsc releases? >> >> Should it then be replaced by petscvec.h? >> >> Thanks, >> >> Anthony >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aph at email.arizona.edu Wed Sep 9 18:52:42 2020 From: aph at email.arizona.edu (Anthony Paul Haas) Date: Wed, 9 Sep 2020 16:52:42 -0700 Subject: [petsc-users] [EXT]Re: petscvec.h90 In-Reply-To: References: <660212A7-165A-4897-A532-0A39945BA0AE@petsc.dev> Message-ID: Thanks On Wed, Sep 9, 2020 at 4:32 PM Barry Smith wrote: > *External Email* > > .......rc/../dependencies/eigen > -I/home/u7/aph/bitcart/src/driver2/material_response -c > navier_stokes/LinearSolverClass.F90 > navier_stokes/LinearSolverClass.F90(45): error #7013: This module file was > not generated by any release of this compiler. [PETSCVEC] > > It is not reading the module it finds so cannot find the definitions in > the module. > > In the PETSc directory start all over again > > rm -rf $PETSC_ARCH > ./configure your options here > make all > make check > > > > On Sep 9, 2020, at 5:55 PM, Anthony Paul Haas > wrote: > > Hi Barry, > > I made the changes according to the information you sent me (see below > snippet module LinearSolverClass please). But I am getting the following > errors: > > error #7013 > error #6457 > > Do you know what is going on? > > Thanks, > > Anthony > > .......rc/../dependencies/eigen > -I/home/u7/aph/bitcart/src/driver2/material_response -c > navier_stokes/LinearSolverClass.F90 > navier_stokes/LinearSolverClass.F90(45): error #7013: This module file was > not generated by any release of this compiler. [PETSCVEC] > use petscvec > ------^ > navier_stokes/LinearSolverClass.F90(110): error #6457: This derived type > name has not been declared. [TVEC] > type(tVec) :: solution_ps,FirstSingularVec > ----------^ > navier_stokes/LinearSolverClass.F90(111): error #6457: This derived type > name has not been declared. [TVEC] > type(tVec) :: rhs_ps!,b_ps > ----------^ > navier_stokes/LinearSolverClass.F90(115): error #6457: This derived type > name has not been declared. [TKSP] > type(tKSP) :: ksp > ----------^ > > > module LinearSolverClass > ! > > > #include > #include > > use petscvec > use petscksp > > ! use LHS, only: matLHS,rhs,sol > > > use paramesh_dimensions > use physicaldata > use tree > use workspace > use utilities_data, only: mkdir > use typedef, only: dist_fcn,solverType,mypeno > use fill_guardcells, only: fill_guardcell_res > !use navierstokes_data, only: fill_guardcell_phi > > > use amr_1blk_bcset_mod, only: updateType > ! use Turb_Models, only: nvars_turb > > > > ! > > > implicit none > > ! > > > ! --------------------- > > > ! Class type definition > > > ! --------------------- > > > ! > > > TYPE, public :: LinearSolver_C > ! > > > Integer,private :: numVars > Integer,private :: numVarsp > > > ...... > > > On Wed, Sep 9, 2020 at 3:06 PM Barry Smith wrote: > >> *External Email* >> Anthony, >> >> See >> >> >> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/UsingFortran.html >> >> https://www.mcs.anl.gov/petsc/documentation/changes/38.html >> >> >> Barry >> >> >> On Sep 9, 2020, at 4:23 PM, Anthony Paul Haas >> wrote: >> >> Hello, >> >> Has the header file petscvec.h90 been removed from include/petsc/finclude/ >> in recent Petsc releases? >> >> Should it then be replaced by petscvec.h? >> >> Thanks, >> >> Anthony >> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sblondel at utk.edu Thu Sep 10 14:46:20 2020 From: sblondel at utk.edu (Blondel, Sophie) Date: Thu, 10 Sep 2020 19:46:20 +0000 Subject: [petsc-users] Matrix Free Method questions In-Reply-To: <60260FA5-BDAE-4F18-8310-D0F3C03B318D@petsc.dev> References: <5BDE8465-76BE-4132-BF4E-6784548AADC0@petsc.dev> <3329269A-EB37-41C9-9698-BA4631A1E18A@petsc.dev> <3E68F0AF-2F7D-4394-894A-3099EC80B9BC@petsc.dev> <600E6AA4-9534-4B39-B7E0-0218AB02E19A@petsc.dev> , <60260FA5-BDAE-4F18-8310-D0F3C03B318D@petsc.dev> Message-ID: Hi Barry, Going through the different changes again to understand what was going wrong with the last step, I discovered that my changes from 2 to 3 (keeping only the pure diagonal for the reaction Jacobian setup and adding MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) were wrong: the sparsity of the matrix was correct but then the RHSJacobian method was wrong. I updated it but now when I run step 3 again I get the following error: [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [2]PETSC ERROR: Argument out of range [2]PETSC ERROR: Inserting a new nonzero at global row/column (310400, 316825) into matrix [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [2]PETSC ERROR: Petsc Development GIT revision: v3.13.4-885-gf58a62b032 GIT Date: 2020-09-01 13:07:58 -0500 [2]PETSC ERROR: Unknown Name on a 20200902 named iguazu by bqo Thu Sep 10 15:38:58 2020 [2]PETSC ERROR: Configure options PETSC_DIR=/home2/bqo/libraries/petsc-barry PETSC_ARCH=20200902 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no --with-shared-libraries --download-fblaslapack=1 [2]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 606 in /home2/bqo/libraries/petsc-barry/src/mat/impls/aij/mpi/mpiaij.c [2]PETSC ERROR: #2 MatSetValues() line 1392 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c [2]PETSC ERROR: #3 MatSetValuesLocal() line 2207 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c [2]PETSC ERROR: #4 MatSetValuesStencil() line 1595 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c PetscSolverExpHandler::computeJacobian: MatSetValuesStencil (reactions) failed. Because the RHSJacobian method is trying to update the elements corresponding to the reactions. I'm not sure I understood correctly what step 3 was supposed to be. Cheers, Sophie ________________________________ From: Barry Smith Sent: Friday, September 4, 2020 01:06 To: Blondel, Sophie Cc: petsc-users at mcs.anl.gov ; xolotl-psi-development at lists.sourceforge.net Subject: Re: [petsc-users] Matrix Free Method questions Sophie, Thanks. I have started looking through the logs The change to matrix-free multiple (from 1 to 2) which reduces the accuracy of the multiply to about half the digits is not surprising. * It roughly doubles the time since doing the matrix-free product requires a function evaluation * It increases the iteration count, but not significantly since the reduced precision of the multiple induces some additional linear iterations The change from 2 to 3 (not storing the entire matrix) * number of nonzeros goes from 49459966 to 1558766 = 3.15 percent so it succeds in not storing the unneeded part of the matrix * the number of MatMult_MF goes from 2331 to 2418. I don't understand this, I expected it to be identical because it should be using the same preconditioner in 3 as in 2 and thus get the same convergence. 
Could possibility be due to the variability in convergence due to different runs with the matrix-free preconditioner preconditioner and not related to not storing the entire matrix. * the KSPSolve() time goes from 3.8774e+0 to 3.7855e+02 a trivial difference which is what I would expect * the SNESSolve time goes from 5.0047e+02 to 4.3275e+02 about a 14 percent drop which is reasonable because 3 doesn't spend as much time inserting matrix values (it still computes them but doesn't insert the ones we don't want for the preconditioner). The change from 3 to 4 * something goes seriously wrong here. The total number of linear solve iterations goes from 2282 to 97403 so something has gone seriously wrong with the preconditioner, but since the preconditioner operations are the same it seems something has gone wrong with the new reduced preconditioner. I think there is an error in computing the reduced matrix entries, that is the new compute Jacobian code is not computing the entries it needs to correctly. To debug this you can run case 3 and case 4 for a single time step with -ksp_view_pmat binary This should create a binary file with the initial Jacobian matrices in each. You can use Matlab or Python to do the difference in the matrices and see how possibly the new Jacobian computation code is not producing the correct values in some locations. Good luck, Barry On Sep 3, 2020, at 12:26 PM, Blondel, Sophie > wrote: Hi Barry, Attached are the log files for the 1D case, for each of the 4 steps. I don't know how I did it yesterday but the differences between steps look better today, except for step 4 that takes many more iterations and smaller time steps. Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : mercredi 2 septembre 2020 15:53 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions On Sep 2, 2020, at 1:44 PM, Blondel, Sophie > wrote: Thank you Barry, The code ran with your branch but it's much slower than running with the full Jacobian and Jacobi PC subtype (around 10 times slower). It is using less memory as expected. I tried step 2 as well and it's even slower. Sophie, That is puzzling. It should be using the same matrix in the solver so should be the same speed and the setup time should be a bit better since it does not form the full Jacobian. (We'll get to this later) The TS iteration for step 1 are the same as with full Jacobian. Let me know what I can look at to check if I've done something wrong. We need to see if the KSP iterations are pretty similar for four approaches (1) original code with Jacobi PC subtype (2) matrix free with Jacobi PC (just add -snes_mf_operator to case 1) (3) the new code with the MatSetOption() to not store the entire Jacobian also with the -snes_mf_operator and (4) the new code that doesn't compute the unneeded part of the Jacobian also with the -snes_mf_operator You could run each case with same 20 timesteps and -ts_monitor -ksp_monitor and -ts_view and send the four output files around. Once we are sure the four cases are behaving as expected then you can get timings for them but let's not do that until we confirm the similar results. There could easily be a flaw in my reasoning or the PETSc code somewhere that affects the correctness so its best to check that first. Barry Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : mardi 1 septembre 2020 14:12 ? 
: Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions Sophie, Sorry, looks like an old bug in PETSc that was undetected due to lack of use. The code is trying to use the first of the two matrices to determine the preconditioner which won't work in your case since it is matrix-free. It should be using the second matrix. I hope the branch barry/2020-09-01/fix-fieldsplit-mf resolves this issue for you. Barry On Sep 1, 2020, at 12:45 PM, Blondel, Sophie > wrote: Hi Barry, I'm working through step 1) but I think I am doing something wrong. I'm using DMDASetBlockFillsSparse to set the non-zeros only for the diffusing clusters (small He clusters here, from size 1 to 7) and all the diagonal entries. Then I added a few lines in the code: Mat mat; DMCreateMatrix(da, &mat); MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); When I try to run with the following options: -snes_mf_operator -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 2 -snes_force_iteration -pc_fieldsplit_detect_coupling -pc_type fieldsplit -fieldsplit_0_pc_type jacobi -fieldsplit_1_pc_type redundant -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -ts_max_snes_failures -1 -ts_monitor -ts_max_steps 20 I get an error: [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: No support for this operation for this object type [0]PETSC ERROR: Matrix type mffd does not have a find off block diagonal entries defined [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Development GIT revision: v3.13.4-851-gde18fec8da GIT Date: 2020-08-28 16:47:50 +0000 [0]PETSC ERROR: Unknown Name on a 20200828 named sophie-Precision-5530 by sophie Tue Sep 1 10:58:44 2020 [0]PETSC ERROR: Configure options PETSC_DIR=/home/sophie/Code/petsc PETSC_ARCH=20200828 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no --with-shared-libraries [0]PETSC ERROR: #1 MatFindOffBlockDiagonalEntries() line 9847 in /home/sophie/Code/petsc/src/mat/interface/matrix.c [0]PETSC ERROR: #2 PCFieldSplitSetDefaults() line 504 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c [0]PETSC ERROR: #3 PCSetUp_FieldSplit() line 606 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c [0]PETSC ERROR: #4 PCSetUp() line 1009 in /home/sophie/Code/petsc/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #5 KSPSetUp() line 406 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #6 KSPSolve_Private() line 658 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #7 KSPSolve() line 889 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in /home/sophie/Code/petsc/src/snes/impls/ls/ls.c [0]PETSC ERROR: #9 SNESSolve() line 4524 in /home/sophie/Code/petsc/src/snes/interface/snes.c [0]PETSC ERROR: #10 TSStep_ARKIMEX() line 811 in /home/sophie/Code/petsc/src/ts/impls/arkimex/arkimex.c [0]PETSC ERROR: #11 TSStep() line 3731 in /home/sophie/Code/petsc/src/ts/interface/ts.c [0]PETSC ERROR: #12 TSSolve() line 4128 in /home/sophie/Code/petsc/src/ts/interface/ts.c PetscSolver::solve: TSSolve failed. Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : lundi 31 ao?t 2020 14:50 ? 
: Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions Sophie, Thanks. The factor of 4 is lot, the 1.5 not so bad. You will definitely want to retain the full matrix assembly codes for speed and to verify a reduced matrix version. It is worth trying a "reduced matrix version" with matrix-free multiply based on these numbers. This reduced matrix Jacobian will only have the diagonals and all the terms connected to the cluster sizes that move. In other words you will be building just the part of the Jacobian needed for the new preconditioner (PC subtype for Jacobi) and doing the matrix-vector product matrix free. (SOR requires all the Jacobian entries). Fortunately this is hopefully pretty straightforward for this code. You will not have to change the structure of the main code at all. Step 1) create a new "sparse matrix" that will be passed to DMDASetBlockFillsSparse(). This new "sparse matrix" needs to retain all the diagonal entries and also all the entries that are associated with the variables that diffuse. If I remember correctly these are just the smallest cluster size, plain Helium? Call MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); Then you would run the code with -snes_mf_operator and the new PC subtype for Jacobi. A test that the new reduced Jacobian is correct will be that you get almost the same iterations as the runs you just make using the PC subtype of Jacobi. Hopefully not slower and using a great deal less memory. The iterations will not be identical because of the matrix-free multiple. Step 2) create a new version of the Jacobian computation routine. This routine should only compute the elements of the Jacobian needed for this reduced matrix Jacobian, so the diagonals and the diffusion/convection terms. Again run with with -snes_mf_operator and the new PC subtype for Jacobi and you should again get the same convergence history. I made two steps because it makes it easier to validate and debug to get the same results as before. The first step cheats in that it still computes the full Jacobian but ignores the entries that we don't need to store for the preconditioner. The second step is more efficient because it only computes the Jacobian entries needed for the preconditioner but it requires you going through the Jacobian code and making sure only the needed parts are computed. If you have any questions please let me know. Barry On Aug 31, 2020, at 1:13 PM, Blondel, Sophie > wrote: Hi Barry, I ran the 2 cases to look at the effect of the Jacobi pre-conditionner: * 1D with 200 grid points and 7759 DOF per grid point (for the PSI application), for 20 TS: the factor between SOR and Jacobi is ~4 (976 MatMult for SOR and 4162 MatMult for Jacobi) * 2D with 63x63 grid points and 4124 DOF per grid point (for the NE application), for 20 TS: the factor is 1.5 (6657 for SOR, 10379 for Jacobi) Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : vendredi 28 ao?t 2020 18:31 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions On Aug 28, 2020, at 4:11 PM, Blondel, Sophie > wrote: Thank you Jed and Barry, First, attached are the logs from the benchmark runs I did without (log_std.txt) and with MF method (log_mf.txt). 
It took me some trouble to get the -log_view to work because I'm using push and pop for the options which means that PETSc is initialized with no argument so the command line argument was not taken into account, but I guess this is for a separate discussion. To answer questions about the current per-conditioners: * I used the same pre-conditioner options as listed in my previous email when I added the -snes_mf option; I did try to remove all the PC related options at one point with the MF method but didn't see a change in runtime so I put them back in * this benchmark is for a 1D DMDA using 20 grid points; when running in 2D or 3D I switch the PC options to: -pc_type fieldsplit -fieldsplit_0_pc_type sor -fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type gmres -ksp_type fgmres -fieldsplit_1_pc_gamg_threshold -1 I haven't tried a Jacobi PC instead of SOR, I will run a set of more realistic runs (1D and 2D) without MF but with Jacobi and report on it next week. When you say "iterations" do you mean what is given by -ksp_monitor? Yes, the number of MatMult is a good enough surrogate. So using matrix-free (which means no preconditioning) has 35846/160 ans = 224.0375 or 224 as many iterations. So even for this modest 1d problem preconditioning is doing a great deal. Barry Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : vendredi 28 ao?t 2020 12:12 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions [External Email] Sophie, This is exactly what i would expect. If you run with -ksp_monitor you will see the -snes_mf run takes many more iterations. I am puzzled that the argument -pc_type fieldsplit did not stop the run since this is under normal circumstances not a viable preconditioner with -snes_mf. Did you also remove the -pc_type fieldsplit argument? In order to see how one can avoid forming the entire matrix and use matrix-free to do the matrix-vector but still have an effective preconditioner let's look at what the current preconditioner options do. -pc_fieldsplit_detect_coupling creates two sub-preconditioners, the first for all the variables and the second for those that are coupled by the matrix to variables in neighboring cells Since only the smallest cluster sizes have diffusion/advection this second set contains only the cluster size one variables. -fieldsplit_0_pc_type sor Runs SOR on all the variables; you can think of this as running SOR on the reactions, it is a pretty good preconditioner for the reactions since the reactions are local, per cell. -fieldsplit_1_pc_type redundant This runs the default preconditioner (ILU) on just the variables that diffuse, i.e. the elliptic part. For smallish problems this is fine, for larger problems and 2d and 3d presumably you have also -redundant_pc_type gamg to use algebraic multigrid for the diffusion. This part of the matrix will always need to be formed and used in the preconditioner. It is very important since the diffusion is what brings in most of the ill-conditioning for larger problems into the linear system. Note that it only needs the matrix entries for the cluster size of 1 so it is very small compared to the entire sparse matrix. ---- The first preconditioner SOR requires ALL the matrix entries which are almost all (except for the diffusion terms) the coupling between different size clusters within a cell. 
Especially each cell has its own sparse matrix of the size of total number of clusters, it is sparse but not super sparse. So the to significantly lower memory usage we need to remove the SOR and the storing of all the matrix entries but still have an efficient preconditioner for the "reaction" terms. The simplest thing would be to use Jacobi instead of SOR for the first subpreconditioner since it only requires the diagonal entries in the matrix. But Jacobi is a worse preconditioner than SOR (since it totally ignores the matrix coupling) and sometimes can be much worse. Before anyone writes additional code we need to know if doing something along these lines does not ruin the convergence that. Have you used the same options as before but with -fieldsplit_0_pc_type jacobi ? (Not using any matrix free). We need to get an idea of how many more linear iterations it requires (not time, comparing time won't be helpful for this exercise.) We also need this information for realistic size problems in 2 or 3 dimensions that you really want to run; for small problems this approach will work ok and give misleading information about what happens for large problems. I suspect the iteration counts will shot up. Can you run some cases and see how the iteration counts change? Based on that we can decide if we still retain "good convergence" by changing the SOR to Jacobi and then change the code to make this change efficient (basically by skipping the explicit computation of the reaction Jacobian terms and using matrix-free on the outside of the PCFIELDSPLIT.) Barry On Aug 28, 2020, at 9:49 AM, Blondel, Sophie via petsc-users > wrote: Hi everyone, I have been using PETSc for a few years with a fully implicit TS ARKIMEX method and am now exploring the matrix free method option. Here is the list of PETSc options I typically use: -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 5 -snes_force_iteration -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -fieldsplit_0_pc_type sor -ts_max_snes_failures -1 -pc_fieldsplit_detect_coupling -ts_monitor -pc_type fieldsplit -fieldsplit_1_pc_type redundant -ts_max_steps 100 I started to compare the performance of the code without changing anything of the executable and simply adding "-snes_mf", I see a reduction of memory usage as expected and a benchmark that would usually take ~5min to run now takes ~50min. Reading the documentation I saw that there are a few option to play with the matrix free method like -snes_mf_err, -snes_mf_umin, or switching to -snes_mf_type wp. I used and modified the values of each of these options separately but never saw a sizable change in runtime, is it expected? And are there other ways to make the matrix free method faster? I saw in the documentation that you can define your own per-conditioner for instance. Let me know if you need additional information about the PETSc setup in the application I use. Best, Sophie -------------- next part -------------- An HTML attachment was scrubbed... 
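As a reference point for the "Step 1" recipe quoted above, here is a minimal C sketch of the reduced fill pattern. The sizes and the choice of a single diffusing component 0 are made up, and the dense-fill call DMDASetBlockFills() is used only because it is easier to read; the code discussed in this thread uses DMDASetBlockFillsSparse() since the number of dof per grid point is in the thousands, but the pattern being encoded is the same.

#include <petscdmda.h>

/* Keep every diagonal entry inside a block (the reaction diagonal) plus
   the off-block, neighbor-cell coupling of the one diffusing component.
   ndof and "only component 0 diffuses" are illustrative assumptions. */
static PetscErrorCode SetReducedFill(DM da, PetscInt ndof)
{
  PetscInt       *dfill, *ofill, i;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = PetscCalloc2(ndof*ndof, &dfill, ndof*ndof, &ofill);CHKERRQ(ierr);
  for (i = 0; i < ndof; i++) dfill[i*ndof + i] = 1;  /* all diagonal entries */
  ofill[0] = 1;                 /* component 0 couples to itself in neighbor cells */
  ierr = DMDASetBlockFills(da, dfill, ofill);CHKERRQ(ierr);
  ierr = PetscFree2(dfill, ofill);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

DMCreateMatrix() on this DMDA then preallocates only those entries, and setting MAT_NEW_NONZERO_LOCATIONS to PETSC_FALSE on the resulting matrix is what is meant to make the unchanged Jacobian routine's extra insertions be dropped rather than raise an error.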
URL: From bsmith at petsc.dev Thu Sep 10 16:04:37 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 10 Sep 2020 16:04:37 -0500 Subject: [petsc-users] Matrix Free Method questions In-Reply-To: References: <5BDE8465-76BE-4132-BF4E-6784548AADC0@petsc.dev> <3329269A-EB37-41C9-9698-BA4631A1E18A@petsc.dev> <3E68F0AF-2F7D-4394-894A-3099EC80B9BC@petsc.dev> <600E6AA4-9534-4B39-B7E0-0218AB02E19A@petsc.dev> <60260FA5-BDAE-4F18-8310-D0F3C03B318D@petsc.dev> Message-ID: <4BA6D58A-89C3-44E2-AF34-F4AE94211DC4@petsc.dev> > On Sep 10, 2020, at 2:46 PM, Blondel, Sophie wrote: > > Hi Barry, > > Going through the different changes again to understand what was going wrong with the last step, I discovered that my changes from 2 to 3 (keeping only the pure diagonal for the reaction Jacobian setup and adding MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) were wrong: the sparsity of the matrix was correct but then the RHSJacobian method was wrong. I updated it I'm not sure what you mean here. My hope was that in step 3 you won't need to change RHSJacobian at all (that is just for step 4). > but now when I run step 3 again I get the following error: > > [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [2]PETSC ERROR: Argument out of range > [2]PETSC ERROR: Inserting a new nonzero at global row/column (310400, 316825) into matrix > [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [2]PETSC ERROR: Petsc Development GIT revision: v3.13.4-885-gf58a62b032 GIT Date: 2020-09-01 13:07:58 -0500 > [2]PETSC ERROR: Unknown Name on a 20200902 named iguazu by bqo Thu Sep 10 15:38:58 2020 > [2]PETSC ERROR: Configure options PETSC_DIR=/home2/bqo/libraries/petsc-barry PETSC_ARCH=20200902 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no --with-shared-libraries --download-fblaslapack=1 > [2]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 606 in /home2/bqo/libraries/petsc-barry/src/mat/impls/aij/mpi/mpiaij.c > [2]PETSC ERROR: #2 MatSetValues() line 1392 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c > [2]PETSC ERROR: #3 MatSetValuesLocal() line 2207 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c > [2]PETSC ERROR: #4 MatSetValuesStencil() line 1595 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c > PetscSolverExpHandler::computeJacobian: MatSetValuesStencil (reactions) failed. > > Because the RHSJacobian method is trying to update the elements corresponding to the reactions. I'm not sure I understood correctly what step 3 was supposed to be. In step the three the RHSJacobian was suppose to be unchanged, only the option to ignore the "unneeded" Jacobian entries inside MatSetValues (set with MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) was needed (plus changing the DMDASetBlockFillsXXX argument). The error message Inserting a new nonzero at global row/column (310400, 316825) into matrix indicates that somehow the MatOption MAT_NEW_NONZERO_LOCATION_ERR is in control instead of the option MAT_NEW_NONZERO_LOCATIONS, when it is setting values the Jacobian values. The MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS_ERR,PETSC_TRUE);) is normally called inside the DMCreateMatrix() so I am not sure how they could be getting called in the wrong order but it seems somehow it is When do you call MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) in the code? You can call it at the beginning of computeJacobian(). 
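Concretely, a minimal sketch of that placement (the routine and variable names are illustrative, not Xolotl's actual ones):

#include <petscts.h>

/* Re-assert the option on the preconditioner matrix before filling it, so
   insertions that fall outside the reduced preallocation pattern are
   ignored instead of raising "Inserting a new nonzero at global
   row/column". */
static PetscErrorCode ComputeRHSJacobian(TS ts, PetscReal t, Vec X, Mat J, Mat Jpre, void *ctx)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatSetOption(Jpre, MAT_NEW_NONZERO_LOCATIONS, PETSC_FALSE);CHKERRQ(ierr);
  /* ... the unchanged MatSetValuesStencil() calls for all Jacobian entries ... */
  ierr = MatAssemblyBegin(Jpre, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(Jpre, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  if (J != Jpre) {
    ierr = MatAssemblyBegin(J, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(J, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}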
If this still doesn't work and you get the same error you can run in the debugger on one process and put a breakpoint for MatSetOptions() to found out how the MAT_NEW_NONZERO_LOCATIONS_ERR comes in late to upset the apple cart. You should see MatSetOption() called at least twice and the last one should have the MAT_NEW_NONZERO_LOCATION flag. Barry > > Cheers, > > Sophie > > > From: Barry Smith > > Sent: Friday, September 4, 2020 01:06 > To: Blondel, Sophie > > Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > > Subject: Re: [petsc-users] Matrix Free Method questions > > > Sophie, > > Thanks. I have started looking through the logs > > The change to matrix-free multiple (from 1 to 2) which reduces the accuracy of the multiply to about half the digits is not surprising. > > * It roughly doubles the time since doing the matrix-free product requires a function evaluation > > * It increases the iteration count, but not significantly since the reduced precision of the multiple induces some additional linear iterations > > The change from 2 to 3 (not storing the entire matrix) > > * number of nonzeros goes from 49459966 to 1558766 = 3.15 percent so it succeds in not storing the unneeded part of the matrix > > * the number of MatMult_MF goes from 2331 to 2418. I don't understand this, I expected it to be identical because it should be using the same preconditioner in 3 as in 2 and thus get the same convergence. Could possibility be due to the variability in convergence due to different runs with the matrix-free preconditioner preconditioner and not related to not storing the entire matrix. > > * the KSPSolve() time goes from 3.8774e+0 to 3.7855e+02 a trivial difference which is what I would expect > > * the SNESSolve time goes from 5.0047e+02 to 4.3275e+02 about a 14 percent drop which is reasonable because 3 doesn't spend as much time inserting matrix values (it still computes them but doesn't insert the ones we don't want for the preconditioner). > > The change from 3 to 4 > > * something goes seriously wrong here. The total number of linear solve iterations goes from 2282 to 97403 so something has gone seriously wrong with the preconditioner, but since the preconditioner operations are the same it seems something has gone wrong with the new reduced preconditioner. > > I think there is an error in computing the reduced matrix entries, that is the new compute Jacobian code is not computing the entries it needs to correctly. > > To debug this you can run case 3 and case 4 for a single time step with -ksp_view_pmat binary This should create a binary file with the initial Jacobian matrices in each. You can use Matlab or Python to do the difference in the matrices and see how possibly the new Jacobian computation code is not producing the correct values in some locations. > > Good luck, > > Barry > > > > >> On Sep 3, 2020, at 12:26 PM, Blondel, Sophie > wrote: >> >> Hi Barry, >> >> Attached are the log files for the 1D case, for each of the 4 steps. I don't know how I did it yesterday but the differences between steps look better today, except for step 4 that takes many more iterations and smaller time steps. >> >> Cheers, >> >> Sophie >> De : Barry Smith > >> Envoy? : mercredi 2 septembre 2020 15:53 >> ? 
: Blondel, Sophie > >> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >> Objet : Re: [petsc-users] Matrix Free Method questions >> >> >> >>> On Sep 2, 2020, at 1:44 PM, Blondel, Sophie > wrote: >>> >>> Thank you Barry, >>> >>> The code ran with your branch but it's much slower than running with the full Jacobian and Jacobi PC subtype (around 10 times slower). It is using less memory as expected. I tried step 2 as well and it's even slower. >> >> Sophie, >> >> That is puzzling. It should be using the same matrix in the solver so should be the same speed and the setup time should be a bit better since it does not form the full Jacobian. (We'll get to this later) >> >>> The TS iteration for step 1 are the same as with full Jacobian. Let me know what I can look at to check if I've done something wrong. >> >> We need to see if the KSP iterations are pretty similar for four approaches (1) original code with Jacobi PC subtype (2) matrix free with Jacobi PC (just add -snes_mf_operator to case 1) (3) the new code with the MatSetOption() to not store the entire Jacobian also with the -snes_mf_operator and (4) the new code that doesn't compute the unneeded part of the Jacobian also with the -snes_mf_operator >> >> You could run each case with same 20 timesteps and -ts_monitor -ksp_monitor and -ts_view and send the four output files around. >> >> Once we are sure the four cases are behaving as expected then you can get timings for them but let's not do that until we confirm the similar results. There could easily be a flaw in my reasoning or the PETSc code somewhere that affects the correctness so its best to check that first. >> >> >> Barry >> >>> >>> Cheers, >>> >>> Sophie >>> De : Barry Smith > >>> Envoy? : mardi 1 septembre 2020 14:12 >>> ? : Blondel, Sophie > >>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>> Objet : Re: [petsc-users] Matrix Free Method questions >>> >>> >>> Sophie, >>> >>> Sorry, looks like an old bug in PETSc that was undetected due to lack of use. The code is trying to use the first of the two matrices to determine the preconditioner which won't work in your case since it is matrix-free. It should be using the second matrix. >>> >>> I hope the branch barry/2020-09-01/fix-fieldsplit-mf resolves this issue for you. >>> >>> Barry >>> >>> >>>> On Sep 1, 2020, at 12:45 PM, Blondel, Sophie > wrote: >>>> >>>> Hi Barry, >>>> >>>> I'm working through step 1) but I think I am doing something wrong. I'm using DMDASetBlockFillsSparse to set the non-zeros only for the diffusing clusters (small He clusters here, from size 1 to 7) and all the diagonal entries. 
Then I added a few lines in the code: >>>> Mat mat; >>>> DMCreateMatrix(da, &mat); >>>> MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); >>>> >>>> When I try to run with the following options: -snes_mf_operator -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 2 -snes_force_iteration -pc_fieldsplit_detect_coupling -pc_type fieldsplit -fieldsplit_0_pc_type jacobi -fieldsplit_1_pc_type redundant -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -ts_max_snes_failures -1 -ts_monitor -ts_max_steps 20 >>>> >>>> I get an error: >>>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>>> [0]PETSC ERROR: No support for this operation for this object type >>>> [0]PETSC ERROR: Matrix type mffd does not have a find off block diagonal entries defined >>>> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.13.4-851-gde18fec8da GIT Date: 2020-08-28 16:47:50 +0000 >>>> [0]PETSC ERROR: Unknown Name on a 20200828 named sophie-Precision-5530 by sophie Tue Sep 1 10:58:44 2020 >>>> [0]PETSC ERROR: Configure options PETSC_DIR=/home/sophie/Code/petsc PETSC_ARCH=20200828 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no --with-shared-libraries >>>> [0]PETSC ERROR: #1 MatFindOffBlockDiagonalEntries() line 9847 in /home/sophie/Code/petsc/src/mat/interface/matrix.c >>>> [0]PETSC ERROR: #2 PCFieldSplitSetDefaults() line 504 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c >>>> [0]PETSC ERROR: #3 PCSetUp_FieldSplit() line 606 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c >>>> [0]PETSC ERROR: #4 PCSetUp() line 1009 in /home/sophie/Code/petsc/src/ksp/pc/interface/precon.c >>>> [0]PETSC ERROR: #5 KSPSetUp() line 406 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c >>>> [0]PETSC ERROR: #6 KSPSolve_Private() line 658 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c >>>> [0]PETSC ERROR: #7 KSPSolve() line 889 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c >>>> [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in /home/sophie/Code/petsc/src/snes/impls/ls/ls.c >>>> [0]PETSC ERROR: #9 SNESSolve() line 4524 in /home/sophie/Code/petsc/src/snes/interface/snes.c >>>> [0]PETSC ERROR: #10 TSStep_ARKIMEX() line 811 in /home/sophie/Code/petsc/src/ts/impls/arkimex/arkimex.c >>>> [0]PETSC ERROR: #11 TSStep() line 3731 in /home/sophie/Code/petsc/src/ts/interface/ts.c >>>> [0]PETSC ERROR: #12 TSSolve() line 4128 in /home/sophie/Code/petsc/src/ts/interface/ts.c >>>> PetscSolver::solve: TSSolve failed. >>>> >>>> Cheers, >>>> >>>> Sophie >>>> De : Barry Smith > >>>> Envoy? : lundi 31 ao?t 2020 14:50 >>>> ? : Blondel, Sophie > >>>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>> Objet : Re: [petsc-users] Matrix Free Method questions >>>> >>>> >>>> Sophie, >>>> >>>> Thanks. >>>> >>>> The factor of 4 is lot, the 1.5 not so bad. >>>> >>>> You will definitely want to retain the full matrix assembly codes for speed and to verify a reduced matrix version. >>>> >>>> It is worth trying a "reduced matrix version" with matrix-free multiply based on these numbers. This reduced matrix Jacobian will only have the diagonals and all the terms connected to the cluster sizes that move. 
In other words you will be building just the part of the Jacobian needed for the new preconditioner (PC subtype for Jacobi) and doing the matrix-vector product matrix free. (SOR requires all the Jacobian entries). >>>> >>>> Fortunately this is hopefully pretty straightforward for this code. You will not have to change the structure of the main code at all. >>>> >>>> Step 1) create a new "sparse matrix" that will be passed to DMDASetBlockFillsSparse(). This new "sparse matrix" needs to retain all the diagonal entries and also all the entries that are associated with the variables that diffuse. If I remember correctly these are just the smallest cluster size, plain Helium? >>>> >>>> Call MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); >>>> >>>> Then you would run the code with -snes_mf_operator and the new PC subtype for Jacobi. >>>> >>>> A test that the new reduced Jacobian is correct will be that you get almost the same iterations as the runs you just make using the PC subtype of Jacobi. Hopefully not slower and using a great deal less memory. The iterations will not be identical because of the matrix-free multiple. >>>> >>>> Step 2) create a new version of the Jacobian computation routine. This routine should only compute the elements of the Jacobian needed for this reduced matrix Jacobian, so the diagonals and the diffusion/convection terms. >>>> >>>> Again run with with -snes_mf_operator and the new PC subtype for Jacobi and you should again get the same convergence history. >>>> >>>> I made two steps because it makes it easier to validate and debug to get the same results as before. The first step cheats in that it still computes the full Jacobian but ignores the entries that we don't need to store for the preconditioner. The second step is more efficient because it only computes the Jacobian entries needed for the preconditioner but it requires you going through the Jacobian code and making sure only the needed parts are computed. >>>> >>>> >>>> If you have any questions please let me know. >>>> >>>> Barry >>>> >>>> >>>> >>>> >>>>> On Aug 31, 2020, at 1:13 PM, Blondel, Sophie > wrote: >>>>> >>>>> Hi Barry, >>>>> >>>>> I ran the 2 cases to look at the effect of the Jacobi pre-conditionner: >>>>> 1D with 200 grid points and 7759 DOF per grid point (for the PSI application), for 20 TS: the factor between SOR and Jacobi is ~4 (976 MatMult for SOR and 4162 MatMult for Jacobi) >>>>> 2D with 63x63 grid points and 4124 DOF per grid point (for the NE application), for 20 TS: the factor is 1.5 (6657 for SOR, 10379 for Jacobi) >>>>> Cheers, >>>>> >>>>> Sophie >>>>> De : Barry Smith > >>>>> Envoy? : vendredi 28 ao?t 2020 18:31 >>>>> ? : Blondel, Sophie > >>>>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>>> Objet : Re: [petsc-users] Matrix Free Method questions >>>>> >>>>> >>>>> >>>>>> On Aug 28, 2020, at 4:11 PM, Blondel, Sophie > wrote: >>>>>> >>>>>> Thank you Jed and Barry, >>>>>> >>>>>> First, attached are the logs from the benchmark runs I did without (log_std.txt) and with MF method (log_mf.txt). It took me some trouble to get the -log_view to work because I'm using push and pop for the options which means that PETSc is initialized with no argument so the command line argument was not taken into account, but I guess this is for a separate discussion. 
>>>>>> >>>>>> To answer questions about the current per-conditioners: >>>>>> I used the same pre-conditioner options as listed in my previous email when I added the -snes_mf option; I did try to remove all the PC related options at one point with the MF method but didn't see a change in runtime so I put them back in >>>>>> this benchmark is for a 1D DMDA using 20 grid points; when running in 2D or 3D I switch the PC options to: -pc_type fieldsplit -fieldsplit_0_pc_type sor -fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type gmres -ksp_type fgmres -fieldsplit_1_pc_gamg_threshold -1 >>>>>> I haven't tried a Jacobi PC instead of SOR, I will run a set of more realistic runs (1D and 2D) without MF but with Jacobi and report on it next week. When you say "iterations" do you mean what is given by -ksp_monitor? >>>>> >>>>> Yes, the number of MatMult is a good enough surrogate. >>>>> >>>>> So using matrix-free (which means no preconditioning) has >>>>> >>>>> 35846/160 >>>>> >>>>> ans = >>>>> >>>>> 224.0375 >>>>> >>>>> or 224 as many iterations. So even for this modest 1d problem preconditioning is doing a great deal. >>>>> >>>>> Barry >>>>> >>>>> >>>>> >>>>>> >>>>>> Cheers, >>>>>> >>>>>> Sophie >>>>>> De : Barry Smith > >>>>>> Envoy? : vendredi 28 ao?t 2020 12:12 >>>>>> ? : Blondel, Sophie > >>>>>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>>>> Objet : Re: [petsc-users] Matrix Free Method questions >>>>>> >>>>>> [External Email] >>>>>> >>>>>> Sophie, >>>>>> >>>>>> This is exactly what i would expect. If you run with -ksp_monitor you will see the -snes_mf run takes many more iterations. >>>>>> >>>>>> I am puzzled that the argument -pc_type fieldsplit did not stop the run since this is under normal circumstances not a viable preconditioner with -snes_mf. Did you also remove the -pc_type fieldsplit argument? >>>>>> >>>>>> In order to see how one can avoid forming the entire matrix and use matrix-free to do the matrix-vector but still have an effective preconditioner let's look at what the current preconditioner options do. >>>>>> >>>>>>> -pc_fieldsplit_detect_coupling >>>>>> >>>>>> creates two sub-preconditioners, the first for all the variables and the second for those that are coupled by the matrix to variables in neighboring cells Since only the smallest cluster sizes have diffusion/advection this second set contains only the cluster size one variables. >>>>>> >>>>>>> -fieldsplit_0_pc_type sor >>>>>> >>>>>> Runs SOR on all the variables; you can think of this as running SOR on the reactions, it is a pretty good preconditioner for the reactions since the reactions are local, per cell. >>>>>> >>>>>>> -fieldsplit_1_pc_type redundant >>>>>> >>>>>> >>>>>> This runs the default preconditioner (ILU) on just the variables that diffuse, i.e. the elliptic part. For smallish problems this is fine, for larger problems and 2d and 3d presumably you have also -redundant_pc_type gamg to use algebraic multigrid for the diffusion. This part of the matrix will always need to be formed and used in the preconditioner. It is very important since the diffusion is what brings in most of the ill-conditioning for larger problems into the linear system. Note that it only needs the matrix entries for the cluster size of 1 so it is very small compared to the entire sparse matrix. >>>>>> >>>>>> ---- >>>>>> The first preconditioner SOR requires ALL the matrix entries which are almost all (except for the diffusion terms) the coupling between different size clusters within a cell. 
Especially each cell has its own sparse matrix of the size of total number of clusters, it is sparse but not super sparse. >>>>>> >>>>>> So the to significantly lower memory usage we need to remove the SOR and the storing of all the matrix entries but still have an efficient preconditioner for the "reaction" terms. >>>>>> >>>>>> The simplest thing would be to use Jacobi instead of SOR for the first subpreconditioner since it only requires the diagonal entries in the matrix. But Jacobi is a worse preconditioner than SOR (since it totally ignores the matrix coupling) and sometimes can be much worse. >>>>>> >>>>>> Before anyone writes additional code we need to know if doing something along these lines does not ruin the convergence that. >>>>>> >>>>>> Have you used the same options as before but with -fieldsplit_0_pc_type jacobi ? (Not using any matrix free). We need to get an idea of how many more linear iterations it requires (not time, comparing time won't be helpful for this exercise.) We also need this information for realistic size problems in 2 or 3 dimensions that you really want to run; for small problems this approach will work ok and give misleading information about what happens for large problems. >>>>>> >>>>>> I suspect the iteration counts will shot up. Can you run some cases and see how the iteration counts change? >>>>>> >>>>>> Based on that we can decide if we still retain "good convergence" by changing the SOR to Jacobi and then change the code to make this change efficient (basically by skipping the explicit computation of the reaction Jacobian terms and using matrix-free on the outside of the PCFIELDSPLIT.) >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> On Aug 28, 2020, at 9:49 AM, Blondel, Sophie via petsc-users > wrote: >>>>>>> >>>>>>> Hi everyone, >>>>>>> >>>>>>> I have been using PETSc for a few years with a fully implicit TS ARKIMEX method and am now exploring the matrix free method option. Here is the list of PETSc options I typically use: -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 5 -snes_force_iteration -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -fieldsplit_0_pc_type sor -ts_max_snes_failures -1 -pc_fieldsplit_detect_coupling -ts_monitor -pc_type fieldsplit -fieldsplit_1_pc_type redundant -ts_max_steps 100 >>>>>>> >>>>>>> I started to compare the performance of the code without changing anything of the executable and simply adding "-snes_mf", I see a reduction of memory usage as expected and a benchmark that would usually take ~5min to run now takes ~50min. Reading the documentation I saw that there are a few option to play with the matrix free method like -snes_mf_err, -snes_mf_umin, or switching to -snes_mf_type wp. I used and modified the values of each of these options separately but never saw a sizable change in runtime, is it expected? >>>>>>> >>>>>>> And are there other ways to make the matrix free method faster? I saw in the documentation that you can define your own per-conditioner for instance. Let me know if you need additional information about the PETSc setup in the application I use. >>>>>>> >>>>>>> Best, >>>>>>> >>>>>>> Sophie >>>>>> >>>>>> >> >> -------------- next part -------------- An HTML attachment was scrubbed... 
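On the suggestion quoted above to dump the case 3 and case 4 Jacobians with -ksp_view_pmat binary and difference them in MATLAB or Python: the same check can also be done with a few lines of PETSc C. The sketch below assumes the two runs wrote jac3.bin and jac4.bin (for instance with -ksp_view_pmat binary:jac3.bin if the viewer:filename form is used; plain -ksp_view_pmat binary writes to the default binaryoutput file).

#include <petscmat.h>

/* Load two binary matrices and report the Frobenius norm of their
   difference; a large norm points at the entries the reduced Jacobian
   routine is computing incorrectly. */
int main(int argc, char **argv)
{
  Mat            A, B;
  PetscViewer    v;
  PetscReal      nrm;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "jac3.bin", FILE_MODE_READ, &v);CHKERRQ(ierr);
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatLoad(A, v);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&v);CHKERRQ(ierr);
  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "jac4.bin", FILE_MODE_READ, &v);CHKERRQ(ierr);
  ierr = MatCreate(PETSC_COMM_WORLD, &B);CHKERRQ(ierr);
  ierr = MatLoad(B, v);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&v);CHKERRQ(ierr);
  ierr = MatAXPY(B, -1.0, A, DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); /* B <- B - A */
  ierr = MatNorm(B, NORM_FROBENIUS, &nrm);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD, "||J4 - J3||_F = %g\n", (double)nrm);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = MatDestroy(&B);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}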
URL: From alexprescott at email.arizona.edu Thu Sep 10 17:15:41 2020 From: alexprescott at email.arizona.edu (Alexander B Prescott) Date: Thu, 10 Sep 2020 15:15:41 -0700 Subject: [petsc-users] Dynamic SNESVI bounds Message-ID: Hi there, I have a quick question (hopefully) that I didn't find addressed in the documentation or user list archives. Is it possible to change the SNES Variational Inequality bounds from one solver iteration to the next? My goal is to update the bounds such that a specific entry in the solution vector remains the supremum throughout the entire execution. Best, Alexander -- Alexander Prescott alexprescott at email.arizona.edu PhD Candidate, The University of Arizona Department of Geosciences 1040 E. 4th Street Tucson, AZ, 85721 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Sep 10 18:26:48 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 10 Sep 2020 18:26:48 -0500 Subject: [petsc-users] Dynamic SNESVI bounds In-Reply-To: References: Message-ID: <0038262D-5336-4F68-B71F-8C783249A4BD@petsc.dev> Yes, it should be simple to write the code to do this. Provide a function that calls SNESVISetVariableBounds() using your criteria then call SNESSetUpdate() to have that function called on each iteration of SNES to reset the bounds. If this will converge to what you desire I have no clue. But each step will find a result that satisfies the current bounds you set. Barry > On Sep 10, 2020, at 5:15 PM, Alexander B Prescott wrote: > > Hi there, > > I have a quick question (hopefully) that I didn't find addressed in the documentation or user list archives. Is it possible to change the SNES Variational Inequality bounds from one solver iteration to the next? > My goal is to update the bounds such that a specific entry in the solution vector remains the supremum throughout the entire execution. > > Best, > Alexander > > -- > Alexander Prescott > alexprescott at email.arizona.edu > PhD Candidate, The University of Arizona > Department of Geosciences > 1040 E. 4th Street > Tucson, AZ, 85721 -------------- next part -------------- An HTML attachment was scrubbed... URL: From chenzhuotj at gmail.com Thu Sep 10 19:52:17 2020 From: chenzhuotj at gmail.com (Zhuo Chen) Date: Thu, 10 Sep 2020 18:52:17 -0600 Subject: [petsc-users] How to activate the modified Gram-Schmidt orthogonalization process in Fortran? Message-ID: Dear Petsc users, I found an ancient thread discussing this problem. https://lists.mcs.anl.gov/pipermail/petsc-users/2011-October/010607.html However, when I add call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) call PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) the program will tell me WARNING! There are options you set that were not used! WARNING! could be spelling mistake, etc! There is one unused database option. It is: Option left: name:-ksp_gmres_modifiedgramschmidt value: 1 I would like to know the most correct way to activate the modified Gram-Schmidt orthogonalization process in Fortran. Thank you very much! Best regards. -- Zhuo Chen Department of Physics University of Alberta Edmonton Alberta, Canada T6G 2E1 http://www.pas.rochester.edu/~zchen25/ -------------- next part -------------- An HTML attachment was scrubbed... 
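For the variational-inequality exchange above, here is a bare-bones C sketch of the SNESSetUpdate() / SNESVISetVariableBounds() pattern Barry describes. The bound-updating rule is left as a placeholder for the "keep one entry the supremum" criterion, and, as noted, whether the resulting iteration converges to anything useful is a separate question.

#include <petscsnes.h>

typedef struct {
  Vec xl, xu;   /* bound vectors, created once during setup */
} AppCtx;

/* Called by SNES at the start of every iteration (registered with
   SNESSetUpdate); recomputes the bounds and re-applies them. */
static PetscErrorCode UpdateBounds(SNES snes, PetscInt step)
{
  AppCtx         *user;
  Vec             X;
  PetscErrorCode  ierr;

  PetscFunctionBeginUser;
  ierr = SNESGetApplicationContext(snes, &user);CHKERRQ(ierr);
  ierr = SNESGetSolution(snes, &X);CHKERRQ(ierr);
  /* ... inspect X and tighten selected entries of user->xl / user->xu
         so the chosen entry stays the supremum (placeholder rule) ... */
  ierr = SNESVISetVariableBounds(snes, user->xl, user->xu);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* During setup, roughly:
     SNESSetType(snes, SNESVINEWTONRSLS);
     SNESSetApplicationContext(snes, &user);
     SNESVISetVariableBounds(snes, user.xl, user.xu);
     SNESSetUpdate(snes, UpdateBounds);                                  */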
URL: From hzhang at mcs.anl.gov Thu Sep 10 20:08:14 2020 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Fri, 11 Sep 2020 01:08:14 +0000 Subject: [petsc-users] How to activate the modified Gram-Schmidt orthogonalization process in Fortran? In-Reply-To: References: Message-ID: Zhuo, Run your code with option '-ksp_gmres_modifiedgramschmidt'. For example, petsc/src/ksp/ksp/tutorials mpiexec -n 2 ./ex2 -ksp_view -ksp_gmres_modifiedgramschmidt KSP Object: 2 MPI processes type: gmres restart=30, using Modified Gram-Schmidt Orthogonalization happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=0.000138889, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: 2 MPI processes type: bjacobi ... You can call KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization) in your program. Hong ________________________________ From: petsc-users on behalf of Zhuo Chen Sent: Thursday, September 10, 2020 7:52 PM To: petsc-users at mcs.anl.gov Subject: [petsc-users] How to activate the modified Gram-Schmidt orthogonalization process in Fortran? Dear Petsc users, I found an ancient thread discussing this problem. https://lists.mcs.anl.gov/pipermail/petsc-users/2011-October/010607.html However, when I add call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) call PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) the program will tell me WARNING! There are options you set that were not used! WARNING! could be spelling mistake, etc! There is one unused database option. It is: Option left: name:-ksp_gmres_modifiedgramschmidt value: 1 I would like to know the most correct way to activate the modified Gram-Schmidt orthogonalization process in Fortran. Thank you very much! Best regards. -- Zhuo Chen Department of Physics University of Alberta Edmonton Alberta, Canada T6G 2E1 http://www.pas.rochester.edu/~zchen25/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From chenzhuotj at gmail.com Thu Sep 10 20:17:12 2020 From: chenzhuotj at gmail.com (Zhuo Chen) Date: Thu, 10 Sep 2020 19:17:12 -0600 Subject: [petsc-users] How to activate the modified Gram-Schmidt orthogonalization process in Fortran? In-Reply-To: References: Message-ID: Hi Hong, Thank you very much for your help. It seems that if I simply append -ksp_gmres_modifiedgramschmidt the warning goes away. However KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization,ierr) has another issue. Error: Symbol ?kspgmresmodifiedgramschmidtorthogonalization? at (1) has no IMPLICIT type Is it because the argument is too long? I am using gcc 8.4.0 instead of ifort On Thu, Sep 10, 2020 at 7:08 PM Zhang, Hong wrote: > Zhuo, > Run your code with option '-ksp_gmres_modifiedgramschmidt'. For example, > petsc/src/ksp/ksp/tutorials > mpiexec -n 2 ./ex2 -ksp_view -ksp_gmres_modifiedgramschmidt > KSP Object: 2 MPI processes > type: gmres > restart=30, using Modified Gram-Schmidt Orthogonalization > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=0.000138889, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: 2 MPI processes > type: bjacobi > ... > > You can > call KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization) > in your program. 
> > Hong > > ------------------------------ > *From:* petsc-users on behalf of Zhuo > Chen > *Sent:* Thursday, September 10, 2020 7:52 PM > *To:* petsc-users at mcs.anl.gov > *Subject:* [petsc-users] How to activate the modified Gram-Schmidt > orthogonalization process in Fortran? > > Dear Petsc users, > > I found an ancient thread discussing this problem. > > https://lists.mcs.anl.gov/pipermail/petsc-users/2011-October/010607.html > > However, when I add > > call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) > call > PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) > > the program will tell me > > WARNING! There are options you set that were not used! > WARNING! could be spelling mistake, etc! > There is one unused database option. It is: > Option left: name:-ksp_gmres_modifiedgramschmidt value: 1 > > I would like to know the most correct way to activate the modified > Gram-Schmidt orthogonalization process in Fortran. Thank you very much! > > Best regards. > > > > > -- > Zhuo Chen > Department of Physics > University of Alberta > Edmonton Alberta, Canada T6G 2E1 > http://www.pas.rochester.edu/~zchen25/ > -- Zhuo Chen Department of Physics University of Alberta Edmonton Alberta, Canada T6G 2E1 http://www.pas.rochester.edu/~zchen25/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Thu Sep 10 20:32:28 2020 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Fri, 11 Sep 2020 01:32:28 +0000 Subject: [petsc-users] How to activate the modified Gram-Schmidt orthogonalization process in Fortran? In-Reply-To: References: , Message-ID: Zhuo, Call KSPSetType(ksp,KSPGMRES); KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization); Hong ________________________________ From: Zhuo Chen Sent: Thursday, September 10, 2020 8:17 PM To: Zhang, Hong Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] How to activate the modified Gram-Schmidt orthogonalization process in Fortran? Hi Hong, Thank you very much for your help. It seems that if I simply append -ksp_gmres_modifiedgramschmidt the warning goes away. However KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization,ierr) has another issue. Error: Symbol ?kspgmresmodifiedgramschmidtorthogonalization? at (1) has no IMPLICIT type Is it because the argument is too long? I am using gcc 8.4.0 instead of ifort On Thu, Sep 10, 2020 at 7:08 PM Zhang, Hong > wrote: Zhuo, Run your code with option '-ksp_gmres_modifiedgramschmidt'. For example, petsc/src/ksp/ksp/tutorials mpiexec -n 2 ./ex2 -ksp_view -ksp_gmres_modifiedgramschmidt KSP Object: 2 MPI processes type: gmres restart=30, using Modified Gram-Schmidt Orthogonalization happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=0.000138889, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: 2 MPI processes type: bjacobi ... You can call KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization) in your program. Hong ________________________________ From: petsc-users > on behalf of Zhuo Chen > Sent: Thursday, September 10, 2020 7:52 PM To: petsc-users at mcs.anl.gov > Subject: [petsc-users] How to activate the modified Gram-Schmidt orthogonalization process in Fortran? Dear Petsc users, I found an ancient thread discussing this problem. 
https://lists.mcs.anl.gov/pipermail/petsc-users/2011-October/010607.html However, when I add call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) call PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) the program will tell me WARNING! There are options you set that were not used! WARNING! could be spelling mistake, etc! There is one unused database option. It is: Option left: name:-ksp_gmres_modifiedgramschmidt value: 1 I would like to know the most correct way to activate the modified Gram-Schmidt orthogonalization process in Fortran. Thank you very much! Best regards. -- Zhuo Chen Department of Physics University of Alberta Edmonton Alberta, Canada T6G 2E1 http://www.pas.rochester.edu/~zchen25/ -- Zhuo Chen Department of Physics University of Alberta Edmonton Alberta, Canada T6G 2E1 http://www.pas.rochester.edu/~zchen25/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Sep 10 20:38:13 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 10 Sep 2020 20:38:13 -0500 Subject: [petsc-users] How to activate the modified Gram-Schmidt orthogonalization process in Fortran? In-Reply-To: References: Message-ID: The options database option should work also. Are you calling KSPSetFromOptions()? Also -ksp_view should include information about the orthogonalization used. Barry > On Sep 10, 2020, at 8:32 PM, Zhang, Hong via petsc-users wrote: > > Zhuo, > Call > KSPSetType(ksp,KSPGMRES); > KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization); > Hong > > From: Zhuo Chen > Sent: Thursday, September 10, 2020 8:17 PM > To: Zhang, Hong > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] How to activate the modified Gram-Schmidt orthogonalization process in Fortran? > > Hi Hong, > > Thank you very much for your help. > > It seems that if I simply append -ksp_gmres_modifiedgramschmidt the warning goes away. However KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization,ierr) has another issue. > > Error: Symbol ?kspgmresmodifiedgramschmidtorthogonalization? at (1) has no IMPLICIT type > > Is it because the argument is too long? I am using gcc 8.4.0 instead of ifort > > On Thu, Sep 10, 2020 at 7:08 PM Zhang, Hong > wrote: > Zhuo, > Run your code with option '-ksp_gmres_modifiedgramschmidt'. For example, > petsc/src/ksp/ksp/tutorials > mpiexec -n 2 ./ex2 -ksp_view -ksp_gmres_modifiedgramschmidt > KSP Object: 2 MPI processes > type: gmres > restart=30, using Modified Gram-Schmidt Orthogonalization > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=0.000138889, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: 2 MPI processes > type: bjacobi > ... > > You can call KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization) in your program. > > Hong > > From: petsc-users > on behalf of Zhuo Chen > > Sent: Thursday, September 10, 2020 7:52 PM > To: petsc-users at mcs.anl.gov > > Subject: [petsc-users] How to activate the modified Gram-Schmidt orthogonalization process in Fortran? > > Dear Petsc users, > > I found an ancient thread discussing this problem. 
> > https://lists.mcs.anl.gov/pipermail/petsc-users/2011-October/010607.html > > However, when I add > > call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) > call PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) > > the program will tell me > > WARNING! There are options you set that were not used! > WARNING! could be spelling mistake, etc! > There is one unused database option. It is: > Option left: name:-ksp_gmres_modifiedgramschmidt value: 1 > > I would like to know the most correct way to activate the modified Gram-Schmidt orthogonalization process in Fortran. Thank you very much! > > Best regards. > > > > > -- > Zhuo Chen > Department of Physics > University of Alberta > Edmonton Alberta, Canada T6G 2E1 > http://www.pas.rochester.edu/~zchen25/ > > > -- > Zhuo Chen > Department of Physics > University of Alberta > Edmonton Alberta, Canada T6G 2E1 > http://www.pas.rochester.edu/~zchen25/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From chenzhuotj at gmail.com Thu Sep 10 20:41:30 2020 From: chenzhuotj at gmail.com (Zhuo Chen) Date: Thu, 10 Sep 2020 19:41:30 -0600 Subject: [petsc-users] How to activate the modified Gram-Schmidt orthogonalization process in Fortran? In-Reply-To: References: Message-ID: Hi Hong, According to that very old thread, KSPGMRESSetOrthogonalization was not implemented in Fortran. I did as you suggested and the compiler will tell me undefined reference to `kspgmressetorthogonalization_' I think I will use the -ksp_gmres_modifiedgramschmidt method. Thank you so much! On Thu, Sep 10, 2020 at 7:32 PM Zhang, Hong wrote: > Zhuo, > Call > KSPSetType(ksp,KSPGMRES); > > KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization); > Hong > > ------------------------------ > *From:* Zhuo Chen > *Sent:* Thursday, September 10, 2020 8:17 PM > *To:* Zhang, Hong > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] How to activate the modified Gram-Schmidt > orthogonalization process in Fortran? > > Hi Hong, > > Thank you very much for your help. > > It seems that if I simply append -ksp_gmres_modifiedgramschmidt the > warning goes away. However > KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization,ierr) > has another issue. > > Error: Symbol ?kspgmresmodifiedgramschmidtorthogonalization? at (1) has no > IMPLICIT type > > Is it because the argument is too long? I am using gcc 8.4.0 instead of > ifort > > On Thu, Sep 10, 2020 at 7:08 PM Zhang, Hong wrote: > > Zhuo, > Run your code with option '-ksp_gmres_modifiedgramschmidt'. For example, > petsc/src/ksp/ksp/tutorials > mpiexec -n 2 ./ex2 -ksp_view -ksp_gmres_modifiedgramschmidt > KSP Object: 2 MPI processes > type: gmres > restart=30, using Modified Gram-Schmidt Orthogonalization > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=0.000138889, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: 2 MPI processes > type: bjacobi > ... > > You can > call KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization) > in your program. > > Hong > > ------------------------------ > *From:* petsc-users on behalf of Zhuo > Chen > *Sent:* Thursday, September 10, 2020 7:52 PM > *To:* petsc-users at mcs.anl.gov > *Subject:* [petsc-users] How to activate the modified Gram-Schmidt > orthogonalization process in Fortran? 
> > Dear Petsc users, > > I found an ancient thread discussing this problem. > > https://lists.mcs.anl.gov/pipermail/petsc-users/2011-October/010607.html > > However, when I add > > call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) > call > PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) > > the program will tell me > > WARNING! There are options you set that were not used! > WARNING! could be spelling mistake, etc! > There is one unused database option. It is: > Option left: name:-ksp_gmres_modifiedgramschmidt value: 1 > > I would like to know the most correct way to activate the modified > Gram-Schmidt orthogonalization process in Fortran. Thank you very much! > > Best regards. > > > > > -- > Zhuo Chen > Department of Physics > University of Alberta > Edmonton Alberta, Canada T6G 2E1 > http://www.pas.rochester.edu/~zchen25/ > > > > -- > Zhuo Chen > Department of Physics > University of Alberta > Edmonton Alberta, Canada T6G 2E1 > http://www.pas.rochester.edu/~zchen25/ > -- Zhuo Chen Department of Physics University of Alberta Edmonton Alberta, Canada T6G 2E1 http://www.pas.rochester.edu/~zchen25/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sblondel at utk.edu Fri Sep 11 07:45:06 2020 From: sblondel at utk.edu (Blondel, Sophie) Date: Fri, 11 Sep 2020 12:45:06 +0000 Subject: [petsc-users] Matrix Free Method questions In-Reply-To: <4BA6D58A-89C3-44E2-AF34-F4AE94211DC4@petsc.dev> References: <5BDE8465-76BE-4132-BF4E-6784548AADC0@petsc.dev> <3329269A-EB37-41C9-9698-BA4631A1E18A@petsc.dev> <3E68F0AF-2F7D-4394-894A-3099EC80B9BC@petsc.dev> <600E6AA4-9534-4B39-B7E0-0218AB02E19A@petsc.dev> <60260FA5-BDAE-4F18-8310-D0F3C03B318D@petsc.dev> , <4BA6D58A-89C3-44E2-AF34-F4AE94211DC4@petsc.dev> Message-ID: Thank you Barry, Step 3 worked after I moved MatSetOption at the beginning of computeJacobian(). Attached is the updated log which is pretty similar to what I had before. Step 4 still uses many more iterations. I checked the Jacobian using -ksp_view_pmat ascii (on a simpler case), I can see the difference between step 3 and 4 is that the contribution from the reactions is not included in the step 4 Jacobian (as expected from the fact that I removed their setting from the code). Looking back at one of your previous email, you wrote "This routine should only compute the elements of the Jacobian needed for this reduced matrix Jacobian, so the diagonals and the diffusion/convection terms. ", does it mean that I should still include the contributions from the reactions that affect the pure diagonal terms? Cheers, Sophie ________________________________ From: Barry Smith Sent: Thursday, September 10, 2020 17:04 To: Blondel, Sophie Cc: petsc-users at mcs.anl.gov ; xolotl-psi-development at lists.sourceforge.net Subject: Re: [petsc-users] Matrix Free Method questions On Sep 10, 2020, at 2:46 PM, Blondel, Sophie > wrote: Hi Barry, Going through the different changes again to understand what was going wrong with the last step, I discovered that my changes from 2 to 3 (keeping only the pure diagonal for the reaction Jacobian setup and adding MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) were wrong: the sparsity of the matrix was correct but then the RHSJacobian method was wrong. I updated it I'm not sure what you mean here. My hope was that in step 3 you won't need to change RHSJacobian at all (that is just for step 4). 
but now when I run step 3 again I get the following error: [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [2]PETSC ERROR: Argument out of range [2]PETSC ERROR: Inserting a new nonzero at global row/column (310400, 316825) into matrix [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [2]PETSC ERROR: Petsc Development GIT revision: v3.13.4-885-gf58a62b032 GIT Date: 2020-09-01 13:07:58 -0500 [2]PETSC ERROR: Unknown Name on a 20200902 named iguazu by bqo Thu Sep 10 15:38:58 2020 [2]PETSC ERROR: Configure options PETSC_DIR=/home2/bqo/libraries/petsc-barry PETSC_ARCH=20200902 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no --with-shared-libraries --download-fblaslapack=1 [2]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 606 in /home2/bqo/libraries/petsc-barry/src/mat/impls/aij/mpi/mpiaij.c [2]PETSC ERROR: #2 MatSetValues() line 1392 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c [2]PETSC ERROR: #3 MatSetValuesLocal() line 2207 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c [2]PETSC ERROR: #4 MatSetValuesStencil() line 1595 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c PetscSolverExpHandler::computeJacobian: MatSetValuesStencil (reactions) failed. Because the RHSJacobian method is trying to update the elements corresponding to the reactions. I'm not sure I understood correctly what step 3 was supposed to be. In step the three the RHSJacobian was suppose to be unchanged, only the option to ignore the "unneeded" Jacobian entries inside MatSetValues (set with MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) was needed (plus changing the DMDASetBlockFillsXXX argument). The error message Inserting a new nonzero at global row/column (310400, 316825) into matrix indicates that somehow the MatOption MAT_NEW_NONZERO_LOCATION_ERR is in control instead of the option MAT_NEW_NONZERO_LOCATIONS, when it is setting values the Jacobian values. The MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS_ERR,PETSC_TRUE);) is normally called inside the DMCreateMatrix() so I am not sure how they could be getting called in the wrong order but it seems somehow it is When do you call MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) in the code? You can call it at the beginning of computeJacobian(). If this still doesn't work and you get the same error you can run in the debugger on one process and put a breakpoint for MatSetOptions() to found out how the MAT_NEW_NONZERO_LOCATIONS_ERR comes in late to upset the apple cart. You should see MatSetOption() called at least twice and the last one should have the MAT_NEW_NONZERO_LOCATION flag. Barry Cheers, Sophie ________________________________ From: Barry Smith > Sent: Friday, September 4, 2020 01:06 To: Blondel, Sophie > Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Subject: Re: [petsc-users] Matrix Free Method questions Sophie, Thanks. I have started looking through the logs The change to matrix-free multiple (from 1 to 2) which reduces the accuracy of the multiply to about half the digits is not surprising. 
* It roughly doubles the time since doing the matrix-free product requires a function evaluation * It increases the iteration count, but not significantly since the reduced precision of the multiple induces some additional linear iterations The change from 2 to 3 (not storing the entire matrix) * number of nonzeros goes from 49459966 to 1558766 = 3.15 percent so it succeds in not storing the unneeded part of the matrix * the number of MatMult_MF goes from 2331 to 2418. I don't understand this, I expected it to be identical because it should be using the same preconditioner in 3 as in 2 and thus get the same convergence. Could possibility be due to the variability in convergence due to different runs with the matrix-free preconditioner preconditioner and not related to not storing the entire matrix. * the KSPSolve() time goes from 3.8774e+0 to 3.7855e+02 a trivial difference which is what I would expect * the SNESSolve time goes from 5.0047e+02 to 4.3275e+02 about a 14 percent drop which is reasonable because 3 doesn't spend as much time inserting matrix values (it still computes them but doesn't insert the ones we don't want for the preconditioner). The change from 3 to 4 * something goes seriously wrong here. The total number of linear solve iterations goes from 2282 to 97403 so something has gone seriously wrong with the preconditioner, but since the preconditioner operations are the same it seems something has gone wrong with the new reduced preconditioner. I think there is an error in computing the reduced matrix entries, that is the new compute Jacobian code is not computing the entries it needs to correctly. To debug this you can run case 3 and case 4 for a single time step with -ksp_view_pmat binary This should create a binary file with the initial Jacobian matrices in each. You can use Matlab or Python to do the difference in the matrices and see how possibly the new Jacobian computation code is not producing the correct values in some locations. Good luck, Barry On Sep 3, 2020, at 12:26 PM, Blondel, Sophie > wrote: Hi Barry, Attached are the log files for the 1D case, for each of the 4 steps. I don't know how I did it yesterday but the differences between steps look better today, except for step 4 that takes many more iterations and smaller time steps. Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : mercredi 2 septembre 2020 15:53 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions On Sep 2, 2020, at 1:44 PM, Blondel, Sophie > wrote: Thank you Barry, The code ran with your branch but it's much slower than running with the full Jacobian and Jacobi PC subtype (around 10 times slower). It is using less memory as expected. I tried step 2 as well and it's even slower. Sophie, That is puzzling. It should be using the same matrix in the solver so should be the same speed and the setup time should be a bit better since it does not form the full Jacobian. (We'll get to this later) The TS iteration for step 1 are the same as with full Jacobian. Let me know what I can look at to check if I've done something wrong. 
We need to see if the KSP iterations are pretty similar for four approaches (1) original code with Jacobi PC subtype (2) matrix free with Jacobi PC (just add -snes_mf_operator to case 1) (3) the new code with the MatSetOption() to not store the entire Jacobian also with the -snes_mf_operator and (4) the new code that doesn't compute the unneeded part of the Jacobian also with the -snes_mf_operator You could run each case with same 20 timesteps and -ts_monitor -ksp_monitor and -ts_view and send the four output files around. Once we are sure the four cases are behaving as expected then you can get timings for them but let's not do that until we confirm the similar results. There could easily be a flaw in my reasoning or the PETSc code somewhere that affects the correctness so its best to check that first. Barry Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : mardi 1 septembre 2020 14:12 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions Sophie, Sorry, looks like an old bug in PETSc that was undetected due to lack of use. The code is trying to use the first of the two matrices to determine the preconditioner which won't work in your case since it is matrix-free. It should be using the second matrix. I hope the branch barry/2020-09-01/fix-fieldsplit-mf resolves this issue for you. Barry On Sep 1, 2020, at 12:45 PM, Blondel, Sophie > wrote: Hi Barry, I'm working through step 1) but I think I am doing something wrong. I'm using DMDASetBlockFillsSparse to set the non-zeros only for the diffusing clusters (small He clusters here, from size 1 to 7) and all the diagonal entries. Then I added a few lines in the code: Mat mat; DMCreateMatrix(da, &mat); MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); When I try to run with the following options: -snes_mf_operator -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 2 -snes_force_iteration -pc_fieldsplit_detect_coupling -pc_type fieldsplit -fieldsplit_0_pc_type jacobi -fieldsplit_1_pc_type redundant -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -ts_max_snes_failures -1 -ts_monitor -ts_max_steps 20 I get an error: [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: No support for this operation for this object type [0]PETSC ERROR: Matrix type mffd does not have a find off block diagonal entries defined [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[0]PETSC ERROR: Petsc Development GIT revision: v3.13.4-851-gde18fec8da GIT Date: 2020-08-28 16:47:50 +0000 [0]PETSC ERROR: Unknown Name on a 20200828 named sophie-Precision-5530 by sophie Tue Sep 1 10:58:44 2020 [0]PETSC ERROR: Configure options PETSC_DIR=/home/sophie/Code/petsc PETSC_ARCH=20200828 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no --with-shared-libraries [0]PETSC ERROR: #1 MatFindOffBlockDiagonalEntries() line 9847 in /home/sophie/Code/petsc/src/mat/interface/matrix.c [0]PETSC ERROR: #2 PCFieldSplitSetDefaults() line 504 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c [0]PETSC ERROR: #3 PCSetUp_FieldSplit() line 606 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c [0]PETSC ERROR: #4 PCSetUp() line 1009 in /home/sophie/Code/petsc/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #5 KSPSetUp() line 406 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #6 KSPSolve_Private() line 658 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #7 KSPSolve() line 889 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in /home/sophie/Code/petsc/src/snes/impls/ls/ls.c [0]PETSC ERROR: #9 SNESSolve() line 4524 in /home/sophie/Code/petsc/src/snes/interface/snes.c [0]PETSC ERROR: #10 TSStep_ARKIMEX() line 811 in /home/sophie/Code/petsc/src/ts/impls/arkimex/arkimex.c [0]PETSC ERROR: #11 TSStep() line 3731 in /home/sophie/Code/petsc/src/ts/interface/ts.c [0]PETSC ERROR: #12 TSSolve() line 4128 in /home/sophie/Code/petsc/src/ts/interface/ts.c PetscSolver::solve: TSSolve failed. Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : lundi 31 ao?t 2020 14:50 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions Sophie, Thanks. The factor of 4 is lot, the 1.5 not so bad. You will definitely want to retain the full matrix assembly codes for speed and to verify a reduced matrix version. It is worth trying a "reduced matrix version" with matrix-free multiply based on these numbers. This reduced matrix Jacobian will only have the diagonals and all the terms connected to the cluster sizes that move. In other words you will be building just the part of the Jacobian needed for the new preconditioner (PC subtype for Jacobi) and doing the matrix-vector product matrix free. (SOR requires all the Jacobian entries). Fortunately this is hopefully pretty straightforward for this code. You will not have to change the structure of the main code at all. Step 1) create a new "sparse matrix" that will be passed to DMDASetBlockFillsSparse(). This new "sparse matrix" needs to retain all the diagonal entries and also all the entries that are associated with the variables that diffuse. If I remember correctly these are just the smallest cluster size, plain Helium? Call MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); Then you would run the code with -snes_mf_operator and the new PC subtype for Jacobi. A test that the new reduced Jacobian is correct will be that you get almost the same iterations as the runs you just make using the PC subtype of Jacobi. Hopefully not slower and using a great deal less memory. The iterations will not be identical because of the matrix-free multiple. Step 2) create a new version of the Jacobian computation routine. 
This routine should only compute the elements of the Jacobian needed for this reduced matrix Jacobian, so the diagonals and the diffusion/convection terms. Again run with with -snes_mf_operator and the new PC subtype for Jacobi and you should again get the same convergence history. I made two steps because it makes it easier to validate and debug to get the same results as before. The first step cheats in that it still computes the full Jacobian but ignores the entries that we don't need to store for the preconditioner. The second step is more efficient because it only computes the Jacobian entries needed for the preconditioner but it requires you going through the Jacobian code and making sure only the needed parts are computed. If you have any questions please let me know. Barry On Aug 31, 2020, at 1:13 PM, Blondel, Sophie > wrote: Hi Barry, I ran the 2 cases to look at the effect of the Jacobi pre-conditionner: * 1D with 200 grid points and 7759 DOF per grid point (for the PSI application), for 20 TS: the factor between SOR and Jacobi is ~4 (976 MatMult for SOR and 4162 MatMult for Jacobi) * 2D with 63x63 grid points and 4124 DOF per grid point (for the NE application), for 20 TS: the factor is 1.5 (6657 for SOR, 10379 for Jacobi) Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : vendredi 28 ao?t 2020 18:31 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions On Aug 28, 2020, at 4:11 PM, Blondel, Sophie > wrote: Thank you Jed and Barry, First, attached are the logs from the benchmark runs I did without (log_std.txt) and with MF method (log_mf.txt). It took me some trouble to get the -log_view to work because I'm using push and pop for the options which means that PETSc is initialized with no argument so the command line argument was not taken into account, but I guess this is for a separate discussion. To answer questions about the current per-conditioners: * I used the same pre-conditioner options as listed in my previous email when I added the -snes_mf option; I did try to remove all the PC related options at one point with the MF method but didn't see a change in runtime so I put them back in * this benchmark is for a 1D DMDA using 20 grid points; when running in 2D or 3D I switch the PC options to: -pc_type fieldsplit -fieldsplit_0_pc_type sor -fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type gmres -ksp_type fgmres -fieldsplit_1_pc_gamg_threshold -1 I haven't tried a Jacobi PC instead of SOR, I will run a set of more realistic runs (1D and 2D) without MF but with Jacobi and report on it next week. When you say "iterations" do you mean what is given by -ksp_monitor? Yes, the number of MatMult is a good enough surrogate. So using matrix-free (which means no preconditioning) has 35846/160 ans = 224.0375 or 224 as many iterations. So even for this modest 1d problem preconditioning is doing a great deal. Barry Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : vendredi 28 ao?t 2020 12:12 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions [External Email] Sophie, This is exactly what i would expect. If you run with -ksp_monitor you will see the -snes_mf run takes many more iterations. 
I am puzzled that the argument -pc_type fieldsplit did not stop the run since this is under normal circumstances not a viable preconditioner with -snes_mf. Did you also remove the -pc_type fieldsplit argument? In order to see how one can avoid forming the entire matrix and use matrix-free to do the matrix-vector but still have an effective preconditioner let's look at what the current preconditioner options do. -pc_fieldsplit_detect_coupling creates two sub-preconditioners, the first for all the variables and the second for those that are coupled by the matrix to variables in neighboring cells Since only the smallest cluster sizes have diffusion/advection this second set contains only the cluster size one variables. -fieldsplit_0_pc_type sor Runs SOR on all the variables; you can think of this as running SOR on the reactions, it is a pretty good preconditioner for the reactions since the reactions are local, per cell. -fieldsplit_1_pc_type redundant This runs the default preconditioner (ILU) on just the variables that diffuse, i.e. the elliptic part. For smallish problems this is fine, for larger problems and 2d and 3d presumably you have also -redundant_pc_type gamg to use algebraic multigrid for the diffusion. This part of the matrix will always need to be formed and used in the preconditioner. It is very important since the diffusion is what brings in most of the ill-conditioning for larger problems into the linear system. Note that it only needs the matrix entries for the cluster size of 1 so it is very small compared to the entire sparse matrix. ---- The first preconditioner SOR requires ALL the matrix entries which are almost all (except for the diffusion terms) the coupling between different size clusters within a cell. Especially each cell has its own sparse matrix of the size of total number of clusters, it is sparse but not super sparse. So the to significantly lower memory usage we need to remove the SOR and the storing of all the matrix entries but still have an efficient preconditioner for the "reaction" terms. The simplest thing would be to use Jacobi instead of SOR for the first subpreconditioner since it only requires the diagonal entries in the matrix. But Jacobi is a worse preconditioner than SOR (since it totally ignores the matrix coupling) and sometimes can be much worse. Before anyone writes additional code we need to know if doing something along these lines does not ruin the convergence that. Have you used the same options as before but with -fieldsplit_0_pc_type jacobi ? (Not using any matrix free). We need to get an idea of how many more linear iterations it requires (not time, comparing time won't be helpful for this exercise.) We also need this information for realistic size problems in 2 or 3 dimensions that you really want to run; for small problems this approach will work ok and give misleading information about what happens for large problems. I suspect the iteration counts will shot up. Can you run some cases and see how the iteration counts change? Based on that we can decide if we still retain "good convergence" by changing the SOR to Jacobi and then change the code to make this change efficient (basically by skipping the explicit computation of the reaction Jacobian terms and using matrix-free on the outside of the PCFIELDSPLIT.) Barry On Aug 28, 2020, at 9:49 AM, Blondel, Sophie via petsc-users > wrote: Hi everyone, I have been using PETSc for a few years with a fully implicit TS ARKIMEX method and am now exploring the matrix free method option. 
Here is the list of PETSc options I typically use: -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 5 -snes_force_iteration -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -fieldsplit_0_pc_type sor -ts_max_snes_failures -1 -pc_fieldsplit_detect_coupling -ts_monitor -pc_type fieldsplit -fieldsplit_1_pc_type redundant -ts_max_steps 100 I started to compare the performance of the code without changing anything of the executable and simply adding "-snes_mf", I see a reduction of memory usage as expected and a benchmark that would usually take ~5min to run now takes ~50min. Reading the documentation I saw that there are a few option to play with the matrix free method like -snes_mf_err, -snes_mf_umin, or switching to -snes_mf_type wp. I used and modified the values of each of these options separately but never saw a sizable change in runtime, is it expected? And are there other ways to make the matrix free method faster? I saw in the documentation that you can define your own per-conditioner for instance. Let me know if you need additional information about the PETSc setup in the application I use. Best, Sophie -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log_1D_3_bis.txt URL: From C.Klaij at marin.nl Fri Sep 11 07:50:26 2020 From: C.Klaij at marin.nl (Klaij, Christiaan) Date: Fri, 11 Sep 2020 12:50:26 +0000 Subject: [petsc-users] How to activate the modified Gram-Schmidt orthogonalization process in Fortran? In-Reply-To: References: Message-ID: <1599828626280.72184@marin.nl> Make me feel ancient. Would be nice to have the fortran binding though... Chris > ---------------------------------------------------------------------- > > Message: 1 > Date: Thu, 10 Sep 2020 19:41:30 -0600 > From: Zhuo Chen > To: "Zhang, Hong" > Cc: "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] How to activate the modified Gram-Schmidt > orthogonalization process in Fortran? > Message-ID: > > Content-Type: text/plain; charset="utf-8" > > Hi Hong, > > According to that very old thread, KSPGMRESSetOrthogonalization was not > implemented in Fortran. I did as you suggested and the compiler will tell > me > > undefined reference to `kspgmressetorthogonalization_' > > I think I will use the -ksp_gmres_modifiedgramschmidt method. Thank you so > much! > dr. ir. Christiaan Klaij | Senior Researcher | Research & Development MARIN | T +31 317 49 33 44 | mailto:C.Klaij at marin.nl | http://www.marin.nl > On Thu, Sep 10, 2020 at 7:32 PM Zhang, Hong wrote: > > > Zhuo, > > Call > > KSPSetType(ksp,KSPGMRES); > > > > KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization); > > Hong > > > > ------------------------------ > > *From:* Zhuo Chen > > *Sent:* Thursday, September 10, 2020 8:17 PM > > *To:* Zhang, Hong > > *Cc:* petsc-users at mcs.anl.gov > > *Subject:* Re: [petsc-users] How to activate the modified Gram-Schmidt > > orthogonalization process in Fortran? > > > > Hi Hong, > > > > Thank you very much for your help. > > > > It seems that if I simply append -ksp_gmres_modifiedgramschmidt the > > warning goes away. However > > KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization,ierr) > > has another issue. > > > > Error: Symbol ?kspgmresmodifiedgramschmidtorthogonalization? at (1) has no > > IMPLICIT type > > > > Is it because the argument is too long? 
I am using gcc 8.4.0 instead of > > ifort > > > > On Thu, Sep 10, 2020 at 7:08 PM Zhang, Hong wrote: > > > > Zhuo, > > Run your code with option '-ksp_gmres_modifiedgramschmidt'. For example, > > petsc/src/ksp/ksp/tutorials > > mpiexec -n 2 ./ex2 -ksp_view -ksp_gmres_modifiedgramschmidt > > KSP Object: 2 MPI processes > > type: gmres > > restart=30, using Modified Gram-Schmidt Orthogonalization > > happy breakdown tolerance 1e-30 > > maximum iterations=10000, initial guess is zero > > tolerances: relative=0.000138889, absolute=1e-50, divergence=10000. > > left preconditioning > > using PRECONDITIONED norm type for convergence test > > PC Object: 2 MPI processes > > type: bjacobi > > ... > > > > You can > > call KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization) > > in your program. > > > > Hong > > > > ------------------------------ > > *From:* petsc-users on behalf of Zhuo > > Chen > > *Sent:* Thursday, September 10, 2020 7:52 PM > > *To:* petsc-users at mcs.anl.gov > > *Subject:* [petsc-users] How to activate the modified Gram-Schmidt > > orthogonalization process in Fortran? > > > > Dear Petsc users, > > > > I found an ancient thread discussing this problem. > > > > https://lists.mcs.anl.gov/pipermail/petsc-users/2011-October/010607.html > > > > However, when I add > > > > call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) > > call > > PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) > > > > the program will tell me > > > > WARNING! There are options you set that were not used! > > WARNING! could be spelling mistake, etc! > > There is one unused database option. It is: > > Option left: name:-ksp_gmres_modifiedgramschmidt value: 1 > > > > I would like to know the most correct way to activate the modified > > Gram-Schmidt orthogonalization process in Fortran. Thank you very much! > > > > Best regards. > > > > From hzhang at mcs.anl.gov Fri Sep 11 09:57:54 2020 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Fri, 11 Sep 2020 14:57:54 +0000 Subject: [petsc-users] How to activate the modified Gram-Schmidt orthogonalization process in Fortran? In-Reply-To: <1599828626280.72184@marin.nl> References: , <1599828626280.72184@marin.nl> Message-ID: Sorry, we have not done it. Can you use PetscOptionsSetValue("-ksp_gmres_modifiedgramschmidt", "1") for now? We'll try to add the fortran binding later. Hong ________________________________ From: petsc-users on behalf of Klaij, Christiaan Sent: Friday, September 11, 2020 7:50 AM To: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] How to activate the modified Gram-Schmidt orthogonalization process in Fortran? Make me feel ancient. Would be nice to have the fortran binding though... Chris > ---------------------------------------------------------------------- > > Message: 1 > Date: Thu, 10 Sep 2020 19:41:30 -0600 > From: Zhuo Chen > To: "Zhang, Hong" > Cc: "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] How to activate the modified Gram-Schmidt > orthogonalization process in Fortran? > Message-ID: > > Content-Type: text/plain; charset="utf-8" > > Hi Hong, > > According to that very old thread, KSPGMRESSetOrthogonalization was not > implemented in Fortran. I did as you suggested and the compiler will tell > me > > undefined reference to `kspgmressetorthogonalization_' > > I think I will use the -ksp_gmres_modifiedgramschmidt method. Thank you so > much! > dr. ir. 
Christiaan Klaij | Senior Researcher | Research & Development MARIN | T +31 317 49 33 44 | mailto:C.Klaij at marin.nl | http://www.marin.nl > On Thu, Sep 10, 2020 at 7:32 PM Zhang, Hong wrote: > > > Zhuo, > > Call > > KSPSetType(ksp,KSPGMRES); > > > > KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization); > > Hong > > > > ------------------------------ > > *From:* Zhuo Chen > > *Sent:* Thursday, September 10, 2020 8:17 PM > > *To:* Zhang, Hong > > *Cc:* petsc-users at mcs.anl.gov > > *Subject:* Re: [petsc-users] How to activate the modified Gram-Schmidt > > orthogonalization process in Fortran? > > > > Hi Hong, > > > > Thank you very much for your help. > > > > It seems that if I simply append -ksp_gmres_modifiedgramschmidt the > > warning goes away. However > > KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization,ierr) > > has another issue. > > > > Error: Symbol ?kspgmresmodifiedgramschmidtorthogonalization? at (1) has no > > IMPLICIT type > > > > Is it because the argument is too long? I am using gcc 8.4.0 instead of > > ifort > > > > On Thu, Sep 10, 2020 at 7:08 PM Zhang, Hong wrote: > > > > Zhuo, > > Run your code with option '-ksp_gmres_modifiedgramschmidt'. For example, > > petsc/src/ksp/ksp/tutorials > > mpiexec -n 2 ./ex2 -ksp_view -ksp_gmres_modifiedgramschmidt > > KSP Object: 2 MPI processes > > type: gmres > > restart=30, using Modified Gram-Schmidt Orthogonalization > > happy breakdown tolerance 1e-30 > > maximum iterations=10000, initial guess is zero > > tolerances: relative=0.000138889, absolute=1e-50, divergence=10000. > > left preconditioning > > using PRECONDITIONED norm type for convergence test > > PC Object: 2 MPI processes > > type: bjacobi > > ... > > > > You can > > call KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization) > > in your program. > > > > Hong > > > > ------------------------------ > > *From:* petsc-users on behalf of Zhuo > > Chen > > *Sent:* Thursday, September 10, 2020 7:52 PM > > *To:* petsc-users at mcs.anl.gov > > *Subject:* [petsc-users] How to activate the modified Gram-Schmidt > > orthogonalization process in Fortran? > > > > Dear Petsc users, > > > > I found an ancient thread discussing this problem. > > > > https://lists.mcs.anl.gov/pipermail/petsc-users/2011-October/010607.html > > > > However, when I add > > > > call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) > > call > > PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) > > > > the program will tell me > > > > WARNING! There are options you set that were not used! > > WARNING! could be spelling mistake, etc! > > There is one unused database option. It is: > > Option left: name:-ksp_gmres_modifiedgramschmidt value: 1 > > > > I would like to know the most correct way to activate the modified > > Gram-Schmidt orthogonalization process in Fortran. Thank you very much! > > > > Best regards. > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From C.Klaij at marin.nl Fri Sep 11 10:00:51 2020 From: C.Klaij at marin.nl (Klaij, Christiaan) Date: Fri, 11 Sep 2020 15:00:51 +0000 Subject: [petsc-users] How to activate the modified Gram-Schmidt orthogonalization process in Fortran? In-Reply-To: References: , <1599828626280.72184@marin.nl>, Message-ID: <1599836451969.14360@marin.nl> Sure, that was the advise 9 years ago in the ancient thread. It's not a big problem. Chris dr. ir. 
Christiaan Klaij | Senior Researcher | Research & Development MARIN | T +31 317 49 33 44 | C.Klaij at marin.nl | www.marin.nl [LinkedIn] [YouTube] [Twitter] [Facebook] MARIN news: New publication on noise measurements of a cavitating propeller ________________________________ From: Zhang, Hong Sent: Friday, September 11, 2020 4:57 PM To: Klaij, Christiaan; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] How to activate the modified Gram-Schmidt orthogonalization process in Fortran? Sorry, we have not done it. Can you use PetscOptionsSetValue("-ksp_gmres_modifiedgramschmidt", "1") for now? We'll try to add the fortran binding later. Hong ________________________________ From: petsc-users on behalf of Klaij, Christiaan Sent: Friday, September 11, 2020 7:50 AM To: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] How to activate the modified Gram-Schmidt orthogonalization process in Fortran? Make me feel ancient. Would be nice to have the fortran binding though... Chris > ---------------------------------------------------------------------- > > Message: 1 > Date: Thu, 10 Sep 2020 19:41:30 -0600 > From: Zhuo Chen > To: "Zhang, Hong" > Cc: "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] How to activate the modified Gram-Schmidt > orthogonalization process in Fortran? > Message-ID: > > Content-Type: text/plain; charset="utf-8" > > Hi Hong, > > According to that very old thread, KSPGMRESSetOrthogonalization was not > implemented in Fortran. I did as you suggested and the compiler will tell > me > > undefined reference to `kspgmressetorthogonalization_' > > I think I will use the -ksp_gmres_modifiedgramschmidt method. Thank you so > much! > dr. ir. Christiaan Klaij | Senior Researcher | Research & Development MARIN | T +31 317 49 33 44 | mailto:C.Klaij at marin.nl | http://www.marin.nl > On Thu, Sep 10, 2020 at 7:32 PM Zhang, Hong wrote: > > > Zhuo, > > Call > > KSPSetType(ksp,KSPGMRES); > > > > KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization); > > Hong > > > > ------------------------------ > > *From:* Zhuo Chen > > *Sent:* Thursday, September 10, 2020 8:17 PM > > *To:* Zhang, Hong > > *Cc:* petsc-users at mcs.anl.gov > > *Subject:* Re: [petsc-users] How to activate the modified Gram-Schmidt > > orthogonalization process in Fortran? > > > > Hi Hong, > > > > Thank you very much for your help. > > > > It seems that if I simply append -ksp_gmres_modifiedgramschmidt the > > warning goes away. However > > KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization,ierr) > > has another issue. > > > > Error: Symbol ?kspgmresmodifiedgramschmidtorthogonalization? at (1) has no > > IMPLICIT type > > > > Is it because the argument is too long? I am using gcc 8.4.0 instead of > > ifort > > > > On Thu, Sep 10, 2020 at 7:08 PM Zhang, Hong wrote: > > > > Zhuo, > > Run your code with option '-ksp_gmres_modifiedgramschmidt'. For example, > > petsc/src/ksp/ksp/tutorials > > mpiexec -n 2 ./ex2 -ksp_view -ksp_gmres_modifiedgramschmidt > > KSP Object: 2 MPI processes > > type: gmres > > restart=30, using Modified Gram-Schmidt Orthogonalization > > happy breakdown tolerance 1e-30 > > maximum iterations=10000, initial guess is zero > > tolerances: relative=0.000138889, absolute=1e-50, divergence=10000. > > left preconditioning > > using PRECONDITIONED norm type for convergence test > > PC Object: 2 MPI processes > > type: bjacobi > > ... 
> > > > You can > > call KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization) > > in your program. > > > > Hong > > > > ------------------------------ > > *From:* petsc-users on behalf of Zhuo > > Chen > > *Sent:* Thursday, September 10, 2020 7:52 PM > > *To:* petsc-users at mcs.anl.gov > > *Subject:* [petsc-users] How to activate the modified Gram-Schmidt > > orthogonalization process in Fortran? > > > > Dear Petsc users, > > > > I found an ancient thread discussing this problem. > > > > https://lists.mcs.anl.gov/pipermail/petsc-users/2011-October/010607.html > > > > However, when I add > > > > call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) > > call > > PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) > > > > the program will tell me > > > > WARNING! There are options you set that were not used! > > WARNING! could be spelling mistake, etc! > > There is one unused database option. It is: > > Option left: name:-ksp_gmres_modifiedgramschmidt value: 1 > > > > I would like to know the most correct way to activate the modified > > Gram-Schmidt orthogonalization process in Fortran. Thank you very much! > > > > Best regards. > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Fri Sep 11 10:05:37 2020 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Fri, 11 Sep 2020 15:05:37 +0000 Subject: [petsc-users] How to activate the modified Gram-Schmidt orthogonalization process in Fortran? In-Reply-To: References: , Message-ID: Zhuo, I'll try to get it done after the incoming release. My hands are full with more urgent tasks at moment. I'll let you know after I'm done. Thanks for your patience. Hong ________________________________ From: Zhuo Chen Sent: Thursday, September 10, 2020 8:41 PM To: Zhang, Hong Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] How to activate the modified Gram-Schmidt orthogonalization process in Fortran? Hi Hong, According to that very old thread, KSPGMRESSetOrthogonalization was not implemented in Fortran. I did as you suggested and the compiler will tell me undefined reference to `kspgmressetorthogonalization_' I think I will use the -ksp_gmres_modifiedgramschmidt method. Thank you so much! On Thu, Sep 10, 2020 at 7:32 PM Zhang, Hong > wrote: Zhuo, Call KSPSetType(ksp,KSPGMRES); KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization); Hong ________________________________ From: Zhuo Chen > Sent: Thursday, September 10, 2020 8:17 PM To: Zhang, Hong > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] How to activate the modified Gram-Schmidt orthogonalization process in Fortran? Hi Hong, Thank you very much for your help.
It seems that if I simply append -ksp_gmres_modifiedgramschmidt the warning goes away. However KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization,ierr) has another issue. Error: Symbol ?kspgmresmodifiedgramschmidtorthogonalization? at (1) has no IMPLICIT type Is it because the argument is too long? I am using gcc 8.4.0 instead of ifort On Thu, Sep 10, 2020 at 7:08 PM Zhang, Hong > wrote: Zhuo, Run your code with option '-ksp_gmres_modifiedgramschmidt'. For example, petsc/src/ksp/ksp/tutorials mpiexec -n 2 ./ex2 -ksp_view -ksp_gmres_modifiedgramschmidt KSP Object: 2 MPI processes type: gmres restart=30, using Modified Gram-Schmidt Orthogonalization happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=0.000138889, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: 2 MPI processes type: bjacobi ... You can call KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization) in your program. Hong ________________________________ From: petsc-users > on behalf of Zhuo Chen > Sent: Thursday, September 10, 2020 7:52 PM To: petsc-users at mcs.anl.gov > Subject: [petsc-users] How to activate the modified Gram-Schmidt orthogonalization process in Fortran? Dear Petsc users, I found an ancient thread discussing this problem. https://lists.mcs.anl.gov/pipermail/petsc-users/2011-October/010607.html However, when I add call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) call PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) the program will tell me WARNING! There are options you set that were not used! WARNING! could be spelling mistake, etc! There is one unused database option. It is: Option left: name:-ksp_gmres_modifiedgramschmidt value: 1 I would like to know the most correct way to activate the modified Gram-Schmidt orthogonalization process in Fortran. Thank you very much! Best regards. -- Zhuo Chen Department of Physics University of Alberta Edmonton Alberta, Canada T6G 2E1 http://www.pas.rochester.edu/~zchen25/ -- Zhuo Chen Department of Physics University of Alberta Edmonton Alberta, Canada T6G 2E1 http://www.pas.rochester.edu/~zchen25/ -- Zhuo Chen Department of Physics University of Alberta Edmonton Alberta, Canada T6G 2E1 http://www.pas.rochester.edu/~zchen25/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From chenzhuotj at gmail.com Fri Sep 11 14:04:24 2020 From: chenzhuotj at gmail.com (Zhuo Chen) Date: Fri, 11 Sep 2020 13:04:24 -0600 Subject: [petsc-users] How to activate the modified Gram-Schmidt orthogonalization process in Fortran? In-Reply-To: References: Message-ID: Hi Hong, Thank you very much for the plan. Though it may be obvious to many Petsc gurus. I would like to summarize my solution to activate the modified Gram-Schmidt orthogonalization process in Fortran now. It may help some new Petsc users. Option 1: append the -ksp_gmres_modifiedgramschmidt at runtime, the code would look like call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) call KSPSetFromOptions(ksp,ierr);CHKERRQ(ierr) and to run the program, use mpiexec -np 2 ./run -ksp_gmres_modifiedgramschmidt Option 2: use PetscOptionsSetValue() call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) call KSPSetFromOptions(ksp,ierr);CHKERRQ(ierr) call PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) and to run the program, use mpiexec -np 2 ./run Best. 
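For readers who land on this thread later, below is a minimal self-contained sketch of what Option 2 can look like as a complete program. It is only an illustration, not code taken from this thread: the program name mgs_sketch, the KSPCreate/KSPDestroy scaffolding, the use of CHKERRA (the main-program variant of CHKERRQ), and the choice to call PetscOptionsSetValue() before KSPSetFromOptions() are assumptions here; the operators and the KSPSolve() call are elided.

      program mgs_sketch
#include <petsc/finclude/petscksp.h>
      use petscksp
      implicit none

      KSP            ksp
      PetscErrorCode ierr

      call PetscInitialize(PETSC_NULL_CHARACTER,ierr)
      if (ierr .ne. 0) stop 'Unable to initialize PETSc'
      call KSPCreate(PETSC_COMM_WORLD,ksp,ierr);CHKERRA(ierr)
      call KSPSetType(ksp,KSPGMRES,ierr);CHKERRA(ierr)
      ! assumption: put the option into the database before calling
      ! KSPSetFromOptions(), so it is certainly present when the KSP
      ! processes its options
      call PetscOptionsSetValue(PETSC_NULL_OPTIONS, &
           '-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRA(ierr)
      call KSPSetFromOptions(ksp,ierr);CHKERRA(ierr)
      ! ... KSPSetOperators(), KSPSolve(), etc. would go here ...
      call KSPDestroy(ksp,ierr);CHKERRA(ierr)
      call PetscFinalize(ierr)
      end program mgs_sketch

Running such a program with -ksp_view (or setting that option the same way) and checking for "using Modified Gram-Schmidt Orthogonalization" in the GMRES section of the output is the quickest way to confirm the option was actually taken.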
On Fri, Sep 11, 2020 at 9:05 AM Zhang, Hong wrote: > Zhuo, > I'll try to get it done after the incoming release. My hands are full with > more urgent tasks at moment. I'll let you know after I'm done. > Thanks for your patience. > Hong > > ------------------------------ > *From:* Zhuo Chen > *Sent:* Thursday, September 10, 2020 8:41 PM > *To:* Zhang, Hong > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] How to activate the modified Gram-Schmidt > orthogonalization process in Fortran? > > Hi Hong, > > According to that very old thread, KSPGMRESSetOrthogonalization was not > implemented in Fortran. I did as you suggested and the compiler will tell > me > > undefined reference to `kspgmressetorthogonalization_' > > I think I will use the -ksp_gmres_modifiedgramschmidt method. Thank you so > much! > > On Thu, Sep 10, 2020 at 7:32 PM Zhang, Hong wrote: > > Zhuo, > Call > KSPSetType(ksp,KSPGMRES); > > KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization); > Hong > > ------------------------------ > *From:* Zhuo Chen > *Sent:* Thursday, September 10, 2020 8:17 PM > *To:* Zhang, Hong > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] How to activate the modified Gram-Schmidt > orthogonalization process in Fortran? > > Hi Hong, > > Thank you very much for your help. > > It seems that if I simply append -ksp_gmres_modifiedgramschmidt the > warning goes away. However > KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization,ierr) > has another issue. > > Error: Symbol ?kspgmresmodifiedgramschmidtorthogonalization? at (1) has no > IMPLICIT type > > Is it because the argument is too long? I am using gcc 8.4.0 instead of > ifort > > On Thu, Sep 10, 2020 at 7:08 PM Zhang, Hong wrote: > > Zhuo, > Run your code with option '-ksp_gmres_modifiedgramschmidt'. For example, > petsc/src/ksp/ksp/tutorials > mpiexec -n 2 ./ex2 -ksp_view -ksp_gmres_modifiedgramschmidt > KSP Object: 2 MPI processes > type: gmres > restart=30, using Modified Gram-Schmidt Orthogonalization > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=0.000138889, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: 2 MPI processes > type: bjacobi > ... > > You can > call KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization) > in your program. > > Hong > > ------------------------------ > *From:* petsc-users on behalf of Zhuo > Chen > *Sent:* Thursday, September 10, 2020 7:52 PM > *To:* petsc-users at mcs.anl.gov > *Subject:* [petsc-users] How to activate the modified Gram-Schmidt > orthogonalization process in Fortran? > > Dear Petsc users, > > I found an ancient thread discussing this problem. > > https://lists.mcs.anl.gov/pipermail/petsc-users/2011-October/010607.html > > However, when I add > > call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) > call > PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) > > the program will tell me > > WARNING! There are options you set that were not used! > WARNING! could be spelling mistake, etc! > There is one unused database option. It is: > Option left: name:-ksp_gmres_modifiedgramschmidt value: 1 > > I would like to know the most correct way to activate the modified > Gram-Schmidt orthogonalization process in Fortran. Thank you very much! > > Best regards. 
> > > > > -- > Zhuo Chen > Department of Physics > University of Alberta > Edmonton Alberta, Canada T6G 2E1 > http://www.pas.rochester.edu/~zchen25/ > > > > -- > Zhuo Chen > Department of Physics > University of Alberta > Edmonton Alberta, Canada T6G 2E1 > http://www.pas.rochester.edu/~zchen25/ > > > > -- > Zhuo Chen > Department of Physics > University of Alberta > Edmonton Alberta, Canada T6G 2E1 > http://www.pas.rochester.edu/~zchen25/ > -- Zhuo Chen Department of Physics University of Alberta Edmonton Alberta, Canada T6G 2E1 http://www.pas.rochester.edu/~zchen25/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Sep 11 14:31:18 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 11 Sep 2020 15:31:18 -0400 Subject: [petsc-users] How to activate the modified Gram-Schmidt orthogonalization process in Fortran? In-Reply-To: References: Message-ID: On Fri, Sep 11, 2020 at 3:05 PM Zhuo Chen wrote: > Hi Hong, > > Thank you very much for the plan. > > Though it may be obvious to many Petsc gurus. I would like to summarize my > solution to activate the modified Gram-Schmidt orthogonalization process in > Fortran now. It may help some new Petsc users. > > Option 1: append the -ksp_gmres_modifiedgramschmidt at runtime, the code > would look like > > call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) > call KSPSetFromOptions(ksp,ierr);CHKERRQ(ierr) > > and to run the program, use > > mpiexec -np 2 ./run -ksp_gmres_modifiedgramschmidt > > Option 2: use PetscOptionsSetValue() > > call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) > call KSPSetFromOptions(ksp,ierr);CHKERRQ(ierr) > call > PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) > Does it work if you call SetValue() after SetFromOptions()? I would not think that would work. Thanks, Matt > and to run the program, use > > mpiexec -np 2 ./run > > Best. > > > > On Fri, Sep 11, 2020 at 9:05 AM Zhang, Hong wrote: > >> Zhuo, >> I'll try to get it done after the incoming release. My hands are full >> with more urgent tasks at moment. I'll let you know after I'm done. >> Thanks for your patience. >> Hong >> >> ------------------------------ >> *From:* Zhuo Chen >> *Sent:* Thursday, September 10, 2020 8:41 PM >> *To:* Zhang, Hong >> *Cc:* petsc-users at mcs.anl.gov >> *Subject:* Re: [petsc-users] How to activate the modified Gram-Schmidt >> orthogonalization process in Fortran? >> >> Hi Hong, >> >> According to that very old thread, KSPGMRESSetOrthogonalization was not >> implemented in Fortran. I did as you suggested and the compiler will tell >> me >> >> undefined reference to `kspgmressetorthogonalization_' >> >> I think I will use the -ksp_gmres_modifiedgramschmidt method. Thank you >> so much! >> >> On Thu, Sep 10, 2020 at 7:32 PM Zhang, Hong wrote: >> >> Zhuo, >> Call >> KSPSetType(ksp,KSPGMRES); >> >> KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization); >> Hong >> >> ------------------------------ >> *From:* Zhuo Chen >> *Sent:* Thursday, September 10, 2020 8:17 PM >> *To:* Zhang, Hong >> *Cc:* petsc-users at mcs.anl.gov >> *Subject:* Re: [petsc-users] How to activate the modified Gram-Schmidt >> orthogonalization process in Fortran? >> >> Hi Hong, >> >> Thank you very much for your help. >> >> It seems that if I simply append -ksp_gmres_modifiedgramschmidt the >> warning goes away. However >> KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization,ierr) >> has another issue. 
>> >> Error: Symbol ?kspgmresmodifiedgramschmidtorthogonalization? at (1) has >> no IMPLICIT type >> >> Is it because the argument is too long? I am using gcc 8.4.0 instead of >> ifort >> >> On Thu, Sep 10, 2020 at 7:08 PM Zhang, Hong wrote: >> >> Zhuo, >> Run your code with option '-ksp_gmres_modifiedgramschmidt'. For example, >> petsc/src/ksp/ksp/tutorials >> mpiexec -n 2 ./ex2 -ksp_view -ksp_gmres_modifiedgramschmidt >> KSP Object: 2 MPI processes >> type: gmres >> restart=30, using Modified Gram-Schmidt Orthogonalization >> happy breakdown tolerance 1e-30 >> maximum iterations=10000, initial guess is zero >> tolerances: relative=0.000138889, absolute=1e-50, divergence=10000. >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> PC Object: 2 MPI processes >> type: bjacobi >> ... >> >> You can >> call KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization) >> in your program. >> >> Hong >> >> ------------------------------ >> *From:* petsc-users on behalf of Zhuo >> Chen >> *Sent:* Thursday, September 10, 2020 7:52 PM >> *To:* petsc-users at mcs.anl.gov >> *Subject:* [petsc-users] How to activate the modified Gram-Schmidt >> orthogonalization process in Fortran? >> >> Dear Petsc users, >> >> I found an ancient thread discussing this problem. >> >> https://lists.mcs.anl.gov/pipermail/petsc-users/2011-October/010607.html >> >> However, when I add >> >> call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) >> call >> PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) >> >> the program will tell me >> >> WARNING! There are options you set that were not used! >> WARNING! could be spelling mistake, etc! >> There is one unused database option. It is: >> Option left: name:-ksp_gmres_modifiedgramschmidt value: 1 >> >> I would like to know the most correct way to activate the modified >> Gram-Schmidt orthogonalization process in Fortran. Thank you very much! >> >> Best regards. >> >> >> >> >> -- >> Zhuo Chen >> Department of Physics >> University of Alberta >> Edmonton Alberta, Canada T6G 2E1 >> http://www.pas.rochester.edu/~zchen25/ >> >> >> >> -- >> Zhuo Chen >> Department of Physics >> University of Alberta >> Edmonton Alberta, Canada T6G 2E1 >> http://www.pas.rochester.edu/~zchen25/ >> >> >> >> -- >> Zhuo Chen >> Department of Physics >> University of Alberta >> Edmonton Alberta, Canada T6G 2E1 >> http://www.pas.rochester.edu/~zchen25/ >> > > > -- > Zhuo Chen > Department of Physics > University of Alberta > Edmonton Alberta, Canada T6G 2E1 > http://www.pas.rochester.edu/~zchen25/ > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From elbueler at alaska.edu Fri Sep 11 14:55:43 2020 From: elbueler at alaska.edu (Ed Bueler) Date: Fri, 11 Sep 2020 11:55:43 -0800 Subject: [petsc-users] recommended error-checking for PetscInitialize() and PetscFinalize() Message-ID: Dear PETSc -- I notice in the users manual that the C examples show ierr = PetscInitialize(&argc,&args,(char*)0,help);if (ierr) return ierr; at the start of main() and ierr = PetscFinalize(); return ierr; at the end of main(). Is this the deliberate, recommended style? 
My understanding of these choices is that if PetscInitialize() fails then CHKERRQ(ierr) may not do the right thing, while if PetscFinalize() fails then that should be the result of main() (without any fiddling by CHKERRQ etc.). Is this the correct understanding? Thanks, Ed -- Ed Bueler Dept of Mathematics and Statistics University of Alaska Fairbanks Fairbanks, AK 99775-6660 306C Chapman -------------- next part -------------- An HTML attachment was scrubbed... URL: From chenzhuotj at gmail.com Fri Sep 11 14:56:01 2020 From: chenzhuotj at gmail.com (Zhuo Chen) Date: Fri, 11 Sep 2020 13:56:01 -0600 Subject: [petsc-users] How to activate the modified Gram-Schmidt orthogonalization process in Fortran? In-Reply-To: References: Message-ID: Hi Matthew, Yes, if use call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) call KSPSetFromOptions(ksp,ierr);CHKERRQ(ierr) call PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) call PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_view','',ierr);CHKERRQ(ierr) to see if the modified Gram-Schmidt process is actually active. The output is KSP Object: 4 MPI processes type: gmres restart=30, using Modified Gram-Schmidt Orthogonalization happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-11, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test I think that means calling PetscOptionsSetValue() after KSPSetFromOptions() works. Correct me if I am wrong. Best. On Fri, Sep 11, 2020 at 1:31 PM Matthew Knepley wrote: > On Fri, Sep 11, 2020 at 3:05 PM Zhuo Chen wrote: > >> Hi Hong, >> >> Thank you very much for the plan. >> >> Though it may be obvious to many Petsc gurus. I would like to summarize >> my solution to activate the modified Gram-Schmidt orthogonalization process >> in Fortran now. It may help some new Petsc users. >> >> Option 1: append the -ksp_gmres_modifiedgramschmidt at runtime, the code >> would look like >> >> call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) >> call KSPSetFromOptions(ksp,ierr);CHKERRQ(ierr) >> >> and to run the program, use >> >> mpiexec -np 2 ./run -ksp_gmres_modifiedgramschmidt >> >> Option 2: use PetscOptionsSetValue() >> >> call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) >> call KSPSetFromOptions(ksp,ierr);CHKERRQ(ierr) >> call >> PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) >> > > Does it work if you call SetValue() after SetFromOptions()? I would not > think that would work. > > Thanks, > > Matt > > >> and to run the program, use >> >> mpiexec -np 2 ./run >> >> Best. >> >> >> >> On Fri, Sep 11, 2020 at 9:05 AM Zhang, Hong wrote: >> >>> Zhuo, >>> I'll try to get it done after the incoming release. My hands are full >>> with more urgent tasks at moment. I'll let you know after I'm done. >>> Thanks for your patience. >>> Hong >>> >>> ------------------------------ >>> *From:* Zhuo Chen >>> *Sent:* Thursday, September 10, 2020 8:41 PM >>> *To:* Zhang, Hong >>> *Cc:* petsc-users at mcs.anl.gov >>> *Subject:* Re: [petsc-users] How to activate the modified Gram-Schmidt >>> orthogonalization process in Fortran? >>> >>> Hi Hong, >>> >>> According to that very old thread, KSPGMRESSetOrthogonalization was not >>> implemented in Fortran. I did as you suggested and the compiler will tell >>> me >>> >>> undefined reference to `kspgmressetorthogonalization_' >>> >>> I think I will use the -ksp_gmres_modifiedgramschmidt method. 
Thank you >>> so much! >>> >>> On Thu, Sep 10, 2020 at 7:32 PM Zhang, Hong wrote: >>> >>> Zhuo, >>> Call >>> KSPSetType(ksp,KSPGMRES); >>> >>> KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization); >>> Hong >>> >>> ------------------------------ >>> *From:* Zhuo Chen >>> *Sent:* Thursday, September 10, 2020 8:17 PM >>> *To:* Zhang, Hong >>> *Cc:* petsc-users at mcs.anl.gov >>> *Subject:* Re: [petsc-users] How to activate the modified Gram-Schmidt >>> orthogonalization process in Fortran? >>> >>> Hi Hong, >>> >>> Thank you very much for your help. >>> >>> It seems that if I simply append -ksp_gmres_modifiedgramschmidt the >>> warning goes away. However >>> KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization,ierr) >>> has another issue. >>> >>> Error: Symbol ?kspgmresmodifiedgramschmidtorthogonalization? at (1) has >>> no IMPLICIT type >>> >>> Is it because the argument is too long? I am using gcc 8.4.0 instead of >>> ifort >>> >>> On Thu, Sep 10, 2020 at 7:08 PM Zhang, Hong wrote: >>> >>> Zhuo, >>> Run your code with option '-ksp_gmres_modifiedgramschmidt'. For example, >>> petsc/src/ksp/ksp/tutorials >>> mpiexec -n 2 ./ex2 -ksp_view -ksp_gmres_modifiedgramschmidt >>> KSP Object: 2 MPI processes >>> type: gmres >>> restart=30, using Modified Gram-Schmidt Orthogonalization >>> happy breakdown tolerance 1e-30 >>> maximum iterations=10000, initial guess is zero >>> tolerances: relative=0.000138889, absolute=1e-50, divergence=10000. >>> left preconditioning >>> using PRECONDITIONED norm type for convergence test >>> PC Object: 2 MPI processes >>> type: bjacobi >>> ... >>> >>> You can >>> call KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization) >>> in your program. >>> >>> Hong >>> >>> ------------------------------ >>> *From:* petsc-users on behalf of Zhuo >>> Chen >>> *Sent:* Thursday, September 10, 2020 7:52 PM >>> *To:* petsc-users at mcs.anl.gov >>> *Subject:* [petsc-users] How to activate the modified Gram-Schmidt >>> orthogonalization process in Fortran? >>> >>> Dear Petsc users, >>> >>> I found an ancient thread discussing this problem. >>> >>> https://lists.mcs.anl.gov/pipermail/petsc-users/2011-October/010607.html >>> >>> However, when I add >>> >>> call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) >>> call >>> PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) >>> >>> the program will tell me >>> >>> WARNING! There are options you set that were not used! >>> WARNING! could be spelling mistake, etc! >>> There is one unused database option. It is: >>> Option left: name:-ksp_gmres_modifiedgramschmidt value: 1 >>> >>> I would like to know the most correct way to activate the modified >>> Gram-Schmidt orthogonalization process in Fortran. Thank you very much! >>> >>> Best regards. 
>>> >>> >>> >>> >>> -- >>> Zhuo Chen >>> Department of Physics >>> University of Alberta >>> Edmonton Alberta, Canada T6G 2E1 >>> http://www.pas.rochester.edu/~zchen25/ >>> >>> >>> >>> -- >>> Zhuo Chen >>> Department of Physics >>> University of Alberta >>> Edmonton Alberta, Canada T6G 2E1 >>> http://www.pas.rochester.edu/~zchen25/ >>> >>> >>> >>> -- >>> Zhuo Chen >>> Department of Physics >>> University of Alberta >>> Edmonton Alberta, Canada T6G 2E1 >>> http://www.pas.rochester.edu/~zchen25/ >>> >> >> >> -- >> Zhuo Chen >> Department of Physics >> University of Alberta >> Edmonton Alberta, Canada T6G 2E1 >> http://www.pas.rochester.edu/~zchen25/ >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- Zhuo Chen Department of Physics University of Alberta Edmonton Alberta, Canada T6G 2E1 http://www.pas.rochester.edu/~zchen25/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Sep 11 14:58:10 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 11 Sep 2020 15:58:10 -0400 Subject: [petsc-users] recommended error-checking for PetscInitialize() and PetscFinalize() In-Reply-To: References: Message-ID: On Fri, Sep 11, 2020 at 3:57 PM Ed Bueler wrote: > Dear PETSc -- > > I notice in the users manual that the C examples show > > ierr = PetscInitialize(&argc,&args,(char*)0,help);if (ierr) return ierr; > > at the start of main() and > > ierr = PetscFinalize(); > return ierr; > > at the end of main(). Is this the deliberate, recommended style? > Yes. > My understanding of these choices is that if PetscInitialize() fails > then CHKERRQ(ierr) may not do the right thing, > Yes, failure early-on in Initialize() can predate setting up error handling. > while if PetscFinalize() fails then that should be the result of main() > (without any fiddling by CHKERRQ etc.). Is this the correct understanding? > Yes. Thanks, Matt > Thanks, > > Ed > > -- > Ed Bueler > Dept of Mathematics and Statistics > University of Alaska Fairbanks > Fairbanks, AK 99775-6660 > 306C Chapman > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Sep 11 15:06:53 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 11 Sep 2020 16:06:53 -0400 Subject: [petsc-users] How to activate the modified Gram-Schmidt orthogonalization process in Fortran? In-Reply-To: References: Message-ID: On Fri, Sep 11, 2020 at 3:56 PM Zhuo Chen wrote: > Hi Matthew, > > Yes, if use > > call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) > call KSPSetFromOptions(ksp,ierr);CHKERRQ(ierr) > call > PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) > call > PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_view','',ierr);CHKERRQ(ierr) > Are you running in a loop? Thanks, Matt > to see if the modified Gram-Schmidt process is actually active. 
The output > is > > KSP Object: 4 MPI processes > type: gmres > restart=30, using Modified Gram-Schmidt Orthogonalization > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-11, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > > I think that means calling PetscOptionsSetValue() > after KSPSetFromOptions() works. Correct me if I am wrong. > > Best. > > On Fri, Sep 11, 2020 at 1:31 PM Matthew Knepley wrote: > >> On Fri, Sep 11, 2020 at 3:05 PM Zhuo Chen wrote: >> >>> Hi Hong, >>> >>> Thank you very much for the plan. >>> >>> Though it may be obvious to many Petsc gurus. I would like to summarize >>> my solution to activate the modified Gram-Schmidt orthogonalization process >>> in Fortran now. It may help some new Petsc users. >>> >>> Option 1: append the -ksp_gmres_modifiedgramschmidt at runtime, the code >>> would look like >>> >>> call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) >>> call KSPSetFromOptions(ksp,ierr);CHKERRQ(ierr) >>> >>> and to run the program, use >>> >>> mpiexec -np 2 ./run -ksp_gmres_modifiedgramschmidt >>> >>> Option 2: use PetscOptionsSetValue() >>> >>> call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) >>> call KSPSetFromOptions(ksp,ierr);CHKERRQ(ierr) >>> call >>> PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) >>> >> >> Does it work if you call SetValue() after SetFromOptions()? I would not >> think that would work. >> >> Thanks, >> >> Matt >> >> >>> and to run the program, use >>> >>> mpiexec -np 2 ./run >>> >>> Best. >>> >>> >>> >>> On Fri, Sep 11, 2020 at 9:05 AM Zhang, Hong wrote: >>> >>>> Zhuo, >>>> I'll try to get it done after the incoming release. My hands are full >>>> with more urgent tasks at moment. I'll let you know after I'm done. >>>> Thanks for your patience. >>>> Hong >>>> >>>> ------------------------------ >>>> *From:* Zhuo Chen >>>> *Sent:* Thursday, September 10, 2020 8:41 PM >>>> *To:* Zhang, Hong >>>> *Cc:* petsc-users at mcs.anl.gov >>>> *Subject:* Re: [petsc-users] How to activate the modified Gram-Schmidt >>>> orthogonalization process in Fortran? >>>> >>>> Hi Hong, >>>> >>>> According to that very old thread, KSPGMRESSetOrthogonalization was not >>>> implemented in Fortran. I did as you suggested and the compiler will tell >>>> me >>>> >>>> undefined reference to `kspgmressetorthogonalization_' >>>> >>>> I think I will use the -ksp_gmres_modifiedgramschmidt method. Thank you >>>> so much! >>>> >>>> On Thu, Sep 10, 2020 at 7:32 PM Zhang, Hong wrote: >>>> >>>> Zhuo, >>>> Call >>>> KSPSetType(ksp,KSPGMRES); >>>> >>>> KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization); >>>> Hong >>>> >>>> ------------------------------ >>>> *From:* Zhuo Chen >>>> *Sent:* Thursday, September 10, 2020 8:17 PM >>>> *To:* Zhang, Hong >>>> *Cc:* petsc-users at mcs.anl.gov >>>> *Subject:* Re: [petsc-users] How to activate the modified Gram-Schmidt >>>> orthogonalization process in Fortran? >>>> >>>> Hi Hong, >>>> >>>> Thank you very much for your help. >>>> >>>> It seems that if I simply append -ksp_gmres_modifiedgramschmidt the >>>> warning goes away. However >>>> KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization,ierr) >>>> has another issue. >>>> >>>> Error: Symbol ?kspgmresmodifiedgramschmidtorthogonalization? at (1) has >>>> no IMPLICIT type >>>> >>>> Is it because the argument is too long? 
I am using gcc 8.4.0 instead of >>>> ifort >>>> >>>> On Thu, Sep 10, 2020 at 7:08 PM Zhang, Hong wrote: >>>> >>>> Zhuo, >>>> Run your code with option '-ksp_gmres_modifiedgramschmidt'. For >>>> example, >>>> petsc/src/ksp/ksp/tutorials >>>> mpiexec -n 2 ./ex2 -ksp_view -ksp_gmres_modifiedgramschmidt >>>> KSP Object: 2 MPI processes >>>> type: gmres >>>> restart=30, using Modified Gram-Schmidt Orthogonalization >>>> happy breakdown tolerance 1e-30 >>>> maximum iterations=10000, initial guess is zero >>>> tolerances: relative=0.000138889, absolute=1e-50, divergence=10000. >>>> left preconditioning >>>> using PRECONDITIONED norm type for convergence test >>>> PC Object: 2 MPI processes >>>> type: bjacobi >>>> ... >>>> >>>> You can >>>> call KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization) >>>> in your program. >>>> >>>> Hong >>>> >>>> ------------------------------ >>>> *From:* petsc-users on behalf of >>>> Zhuo Chen >>>> *Sent:* Thursday, September 10, 2020 7:52 PM >>>> *To:* petsc-users at mcs.anl.gov >>>> *Subject:* [petsc-users] How to activate the modified Gram-Schmidt >>>> orthogonalization process in Fortran? >>>> >>>> Dear Petsc users, >>>> >>>> I found an ancient thread discussing this problem. >>>> >>>> https://lists.mcs.anl.gov/pipermail/petsc-users/2011-October/010607.html >>>> >>>> However, when I add >>>> >>>> call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) >>>> call >>>> PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) >>>> >>>> the program will tell me >>>> >>>> WARNING! There are options you set that were not used! >>>> WARNING! could be spelling mistake, etc! >>>> There is one unused database option. It is: >>>> Option left: name:-ksp_gmres_modifiedgramschmidt value: 1 >>>> >>>> I would like to know the most correct way to activate the modified >>>> Gram-Schmidt orthogonalization process in Fortran. Thank you very much! >>>> >>>> Best regards. >>>> >>>> >>>> >>>> >>>> -- >>>> Zhuo Chen >>>> Department of Physics >>>> University of Alberta >>>> Edmonton Alberta, Canada T6G 2E1 >>>> http://www.pas.rochester.edu/~zchen25/ >>>> >>>> >>>> >>>> -- >>>> Zhuo Chen >>>> Department of Physics >>>> University of Alberta >>>> Edmonton Alberta, Canada T6G 2E1 >>>> http://www.pas.rochester.edu/~zchen25/ >>>> >>>> >>>> >>>> -- >>>> Zhuo Chen >>>> Department of Physics >>>> University of Alberta >>>> Edmonton Alberta, Canada T6G 2E1 >>>> http://www.pas.rochester.edu/~zchen25/ >>>> >>> >>> >>> -- >>> Zhuo Chen >>> Department of Physics >>> University of Alberta >>> Edmonton Alberta, Canada T6G 2E1 >>> http://www.pas.rochester.edu/~zchen25/ >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > > > -- > Zhuo Chen > Department of Physics > University of Alberta > Edmonton Alberta, Canada T6G 2E1 > http://www.pas.rochester.edu/~zchen25/ > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
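
A note on the -ksp_view check that comes up in this exchange: the same information can be printed from inside the program with KSPView(), which reports the line "using Modified Gram-Schmidt Orthogonalization" once the option has taken effect (otherwise the default classical Gram-Schmidt is reported). The lines below are a minimal free-form Fortran sketch rather than a complete program; they assume a KSP object ksp that has already been created and given its operators, for instance as in the Fortran tutorials under petsc/src/ksp/ksp/tutorials.

    call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr)
    call KSPSetFromOptions(ksp,ierr);CHKERRQ(ierr)
    ! Print the solver configuration; the GMRES section shows which
    ! orthogonalization is in use, exactly like the -ksp_view output above.
    call KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD,ierr);CHKERRQ(ierr)

Run as, e.g., mpiexec -n 2 ./run -ksp_gmres_modifiedgramschmidt to see the modified Gram-Schmidt line appear in the view output.
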
URL: From chenzhuotj at gmail.com Fri Sep 11 15:10:30 2020 From: chenzhuotj at gmail.com (Zhuo Chen) Date: Fri, 11 Sep 2020 14:10:30 -0600 Subject: [petsc-users] How to activate the modified Gram-Schmidt orthogonalization process in Fortran? In-Reply-To: References: Message-ID: Hi Matthew, Yes. These four lines are in a do while loop. On Fri, Sep 11, 2020 at 2:07 PM Matthew Knepley wrote: > On Fri, Sep 11, 2020 at 3:56 PM Zhuo Chen wrote: > >> Hi Matthew, >> >> Yes, if use >> >> call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) >> call KSPSetFromOptions(ksp,ierr);CHKERRQ(ierr) >> call >> PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) >> call >> PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_view','',ierr);CHKERRQ(ierr) >> > > Are you running in a loop? > > Thanks, > > Matt > > >> to see if the modified Gram-Schmidt process is actually active. The >> output is >> >> KSP Object: 4 MPI processes >> type: gmres >> restart=30, using Modified Gram-Schmidt Orthogonalization >> happy breakdown tolerance 1e-30 >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-11, absolute=1e-50, divergence=10000. >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> >> I think that means calling PetscOptionsSetValue() >> after KSPSetFromOptions() works. Correct me if I am wrong. >> >> Best. >> >> On Fri, Sep 11, 2020 at 1:31 PM Matthew Knepley >> wrote: >> >>> On Fri, Sep 11, 2020 at 3:05 PM Zhuo Chen wrote: >>> >>>> Hi Hong, >>>> >>>> Thank you very much for the plan. >>>> >>>> Though it may be obvious to many Petsc gurus. I would like to summarize >>>> my solution to activate the modified Gram-Schmidt orthogonalization process >>>> in Fortran now. It may help some new Petsc users. >>>> >>>> Option 1: append the -ksp_gmres_modifiedgramschmidt at runtime, the >>>> code would look like >>>> >>>> call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) >>>> call KSPSetFromOptions(ksp,ierr);CHKERRQ(ierr) >>>> >>>> and to run the program, use >>>> >>>> mpiexec -np 2 ./run -ksp_gmres_modifiedgramschmidt >>>> >>>> Option 2: use PetscOptionsSetValue() >>>> >>>> call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) >>>> call KSPSetFromOptions(ksp,ierr);CHKERRQ(ierr) >>>> call >>>> PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) >>>> >>> >>> Does it work if you call SetValue() after SetFromOptions()? I would not >>> think that would work. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> and to run the program, use >>>> >>>> mpiexec -np 2 ./run >>>> >>>> Best. >>>> >>>> >>>> >>>> On Fri, Sep 11, 2020 at 9:05 AM Zhang, Hong wrote: >>>> >>>>> Zhuo, >>>>> I'll try to get it done after the incoming release. My hands are full >>>>> with more urgent tasks at moment. I'll let you know after I'm done. >>>>> Thanks for your patience. >>>>> Hong >>>>> >>>>> ------------------------------ >>>>> *From:* Zhuo Chen >>>>> *Sent:* Thursday, September 10, 2020 8:41 PM >>>>> *To:* Zhang, Hong >>>>> *Cc:* petsc-users at mcs.anl.gov >>>>> *Subject:* Re: [petsc-users] How to activate the modified >>>>> Gram-Schmidt orthogonalization process in Fortran? >>>>> >>>>> Hi Hong, >>>>> >>>>> According to that very old thread, KSPGMRESSetOrthogonalization was >>>>> not implemented in Fortran. I did as you suggested and the compiler will >>>>> tell me >>>>> >>>>> undefined reference to `kspgmressetorthogonalization_' >>>>> >>>>> I think I will use the -ksp_gmres_modifiedgramschmidt method. Thank >>>>> you so much! 
>>>>> >>>>> On Thu, Sep 10, 2020 at 7:32 PM Zhang, Hong >>>>> wrote: >>>>> >>>>> Zhuo, >>>>> Call >>>>> KSPSetType(ksp,KSPGMRES); >>>>> >>>>> KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization); >>>>> Hong >>>>> >>>>> ------------------------------ >>>>> *From:* Zhuo Chen >>>>> *Sent:* Thursday, September 10, 2020 8:17 PM >>>>> *To:* Zhang, Hong >>>>> *Cc:* petsc-users at mcs.anl.gov >>>>> *Subject:* Re: [petsc-users] How to activate the modified >>>>> Gram-Schmidt orthogonalization process in Fortran? >>>>> >>>>> Hi Hong, >>>>> >>>>> Thank you very much for your help. >>>>> >>>>> It seems that if I simply append -ksp_gmres_modifiedgramschmidt the >>>>> warning goes away. However >>>>> KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization,ierr) >>>>> has another issue. >>>>> >>>>> Error: Symbol ?kspgmresmodifiedgramschmidtorthogonalization? at (1) >>>>> has no IMPLICIT type >>>>> >>>>> Is it because the argument is too long? I am using gcc 8.4.0 instead >>>>> of ifort >>>>> >>>>> On Thu, Sep 10, 2020 at 7:08 PM Zhang, Hong >>>>> wrote: >>>>> >>>>> Zhuo, >>>>> Run your code with option '-ksp_gmres_modifiedgramschmidt'. For >>>>> example, >>>>> petsc/src/ksp/ksp/tutorials >>>>> mpiexec -n 2 ./ex2 -ksp_view -ksp_gmres_modifiedgramschmidt >>>>> KSP Object: 2 MPI processes >>>>> type: gmres >>>>> restart=30, using Modified Gram-Schmidt Orthogonalization >>>>> happy breakdown tolerance 1e-30 >>>>> maximum iterations=10000, initial guess is zero >>>>> tolerances: relative=0.000138889, absolute=1e-50, divergence=10000. >>>>> left preconditioning >>>>> using PRECONDITIONED norm type for convergence test >>>>> PC Object: 2 MPI processes >>>>> type: bjacobi >>>>> ... >>>>> >>>>> You can >>>>> call KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization) >>>>> in your program. >>>>> >>>>> Hong >>>>> >>>>> ------------------------------ >>>>> *From:* petsc-users on behalf of >>>>> Zhuo Chen >>>>> *Sent:* Thursday, September 10, 2020 7:52 PM >>>>> *To:* petsc-users at mcs.anl.gov >>>>> *Subject:* [petsc-users] How to activate the modified Gram-Schmidt >>>>> orthogonalization process in Fortran? >>>>> >>>>> Dear Petsc users, >>>>> >>>>> I found an ancient thread discussing this problem. >>>>> >>>>> >>>>> https://lists.mcs.anl.gov/pipermail/petsc-users/2011-October/010607.html >>>>> >>>>> However, when I add >>>>> >>>>> call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) >>>>> call >>>>> PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) >>>>> >>>>> the program will tell me >>>>> >>>>> WARNING! There are options you set that were not used! >>>>> WARNING! could be spelling mistake, etc! >>>>> There is one unused database option. It is: >>>>> Option left: name:-ksp_gmres_modifiedgramschmidt value: 1 >>>>> >>>>> I would like to know the most correct way to activate the modified >>>>> Gram-Schmidt orthogonalization process in Fortran. Thank you very much! >>>>> >>>>> Best regards. 
>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Zhuo Chen >>>>> Department of Physics >>>>> University of Alberta >>>>> Edmonton Alberta, Canada T6G 2E1 >>>>> http://www.pas.rochester.edu/~zchen25/ >>>>> >>>>> >>>>> >>>>> -- >>>>> Zhuo Chen >>>>> Department of Physics >>>>> University of Alberta >>>>> Edmonton Alberta, Canada T6G 2E1 >>>>> http://www.pas.rochester.edu/~zchen25/ >>>>> >>>>> >>>>> >>>>> -- >>>>> Zhuo Chen >>>>> Department of Physics >>>>> University of Alberta >>>>> Edmonton Alberta, Canada T6G 2E1 >>>>> http://www.pas.rochester.edu/~zchen25/ >>>>> >>>> >>>> >>>> -- >>>> Zhuo Chen >>>> Department of Physics >>>> University of Alberta >>>> Edmonton Alberta, Canada T6G 2E1 >>>> http://www.pas.rochester.edu/~zchen25/ >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> >> >> -- >> Zhuo Chen >> Department of Physics >> University of Alberta >> Edmonton Alberta, Canada T6G 2E1 >> http://www.pas.rochester.edu/~zchen25/ >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- Zhuo Chen Department of Physics University of Alberta Edmonton Alberta, Canada T6G 2E1 http://www.pas.rochester.edu/~zchen25/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Sep 11 15:16:57 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 11 Sep 2020 16:16:57 -0400 Subject: [petsc-users] How to activate the modified Gram-Schmidt orthogonalization process in Fortran? In-Reply-To: References: Message-ID: On Fri, Sep 11, 2020 at 4:10 PM Zhuo Chen wrote: > Hi Matthew, > > Yes. These four lines are in a do while loop. > The first time through, it will not work :) Thanks, Matt > On Fri, Sep 11, 2020 at 2:07 PM Matthew Knepley wrote: > >> On Fri, Sep 11, 2020 at 3:56 PM Zhuo Chen wrote: >> >>> Hi Matthew, >>> >>> Yes, if use >>> >>> call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) >>> call KSPSetFromOptions(ksp,ierr);CHKERRQ(ierr) >>> call >>> PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) >>> call >>> PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_view','',ierr);CHKERRQ(ierr) >>> >> >> Are you running in a loop? >> >> Thanks, >> >> Matt >> >> >>> to see if the modified Gram-Schmidt process is actually active. The >>> output is >>> >>> KSP Object: 4 MPI processes >>> type: gmres >>> restart=30, using Modified Gram-Schmidt Orthogonalization >>> happy breakdown tolerance 1e-30 >>> maximum iterations=10000, initial guess is zero >>> tolerances: relative=1e-11, absolute=1e-50, divergence=10000. >>> left preconditioning >>> using PRECONDITIONED norm type for convergence test >>> >>> I think that means calling PetscOptionsSetValue() >>> after KSPSetFromOptions() works. Correct me if I am wrong. >>> >>> Best. >>> >>> On Fri, Sep 11, 2020 at 1:31 PM Matthew Knepley >>> wrote: >>> >>>> On Fri, Sep 11, 2020 at 3:05 PM Zhuo Chen wrote: >>>> >>>>> Hi Hong, >>>>> >>>>> Thank you very much for the plan. >>>>> >>>>> Though it may be obvious to many Petsc gurus. I would like to >>>>> summarize my solution to activate the modified Gram-Schmidt >>>>> orthogonalization process in Fortran now. It may help some new Petsc users. 
>>>>> >>>>> Option 1: append the -ksp_gmres_modifiedgramschmidt at runtime, the >>>>> code would look like >>>>> >>>>> call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) >>>>> call KSPSetFromOptions(ksp,ierr);CHKERRQ(ierr) >>>>> >>>>> and to run the program, use >>>>> >>>>> mpiexec -np 2 ./run -ksp_gmres_modifiedgramschmidt >>>>> >>>>> Option 2: use PetscOptionsSetValue() >>>>> >>>>> call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) >>>>> call KSPSetFromOptions(ksp,ierr);CHKERRQ(ierr) >>>>> call >>>>> PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) >>>>> >>>> >>>> Does it work if you call SetValue() after SetFromOptions()? I would not >>>> think that would work. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> and to run the program, use >>>>> >>>>> mpiexec -np 2 ./run >>>>> >>>>> Best. >>>>> >>>>> >>>>> >>>>> On Fri, Sep 11, 2020 at 9:05 AM Zhang, Hong >>>>> wrote: >>>>> >>>>>> Zhuo, >>>>>> I'll try to get it done after the incoming release. My hands are full >>>>>> with more urgent tasks at moment. I'll let you know after I'm done. >>>>>> Thanks for your patience. >>>>>> Hong >>>>>> >>>>>> ------------------------------ >>>>>> *From:* Zhuo Chen >>>>>> *Sent:* Thursday, September 10, 2020 8:41 PM >>>>>> *To:* Zhang, Hong >>>>>> *Cc:* petsc-users at mcs.anl.gov >>>>>> *Subject:* Re: [petsc-users] How to activate the modified >>>>>> Gram-Schmidt orthogonalization process in Fortran? >>>>>> >>>>>> Hi Hong, >>>>>> >>>>>> According to that very old thread, KSPGMRESSetOrthogonalization was >>>>>> not implemented in Fortran. I did as you suggested and the compiler will >>>>>> tell me >>>>>> >>>>>> undefined reference to `kspgmressetorthogonalization_' >>>>>> >>>>>> I think I will use the -ksp_gmres_modifiedgramschmidt method. Thank >>>>>> you so much! >>>>>> >>>>>> On Thu, Sep 10, 2020 at 7:32 PM Zhang, Hong >>>>>> wrote: >>>>>> >>>>>> Zhuo, >>>>>> Call >>>>>> KSPSetType(ksp,KSPGMRES); >>>>>> >>>>>> KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization); >>>>>> Hong >>>>>> >>>>>> ------------------------------ >>>>>> *From:* Zhuo Chen >>>>>> *Sent:* Thursday, September 10, 2020 8:17 PM >>>>>> *To:* Zhang, Hong >>>>>> *Cc:* petsc-users at mcs.anl.gov >>>>>> *Subject:* Re: [petsc-users] How to activate the modified >>>>>> Gram-Schmidt orthogonalization process in Fortran? >>>>>> >>>>>> Hi Hong, >>>>>> >>>>>> Thank you very much for your help. >>>>>> >>>>>> It seems that if I simply append -ksp_gmres_modifiedgramschmidt the >>>>>> warning goes away. However >>>>>> KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization,ierr) >>>>>> has another issue. >>>>>> >>>>>> Error: Symbol ?kspgmresmodifiedgramschmidtorthogonalization? at (1) >>>>>> has no IMPLICIT type >>>>>> >>>>>> Is it because the argument is too long? I am using gcc 8.4.0 instead >>>>>> of ifort >>>>>> >>>>>> On Thu, Sep 10, 2020 at 7:08 PM Zhang, Hong >>>>>> wrote: >>>>>> >>>>>> Zhuo, >>>>>> Run your code with option '-ksp_gmres_modifiedgramschmidt'. For >>>>>> example, >>>>>> petsc/src/ksp/ksp/tutorials >>>>>> mpiexec -n 2 ./ex2 -ksp_view -ksp_gmres_modifiedgramschmidt >>>>>> KSP Object: 2 MPI processes >>>>>> type: gmres >>>>>> restart=30, using Modified Gram-Schmidt Orthogonalization >>>>>> happy breakdown tolerance 1e-30 >>>>>> maximum iterations=10000, initial guess is zero >>>>>> tolerances: relative=0.000138889, absolute=1e-50, divergence=10000. 
>>>>>> left preconditioning >>>>>> using PRECONDITIONED norm type for convergence test >>>>>> PC Object: 2 MPI processes >>>>>> type: bjacobi >>>>>> ... >>>>>> >>>>>> You can >>>>>> call KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization) >>>>>> in your program. >>>>>> >>>>>> Hong >>>>>> >>>>>> ------------------------------ >>>>>> *From:* petsc-users on behalf of >>>>>> Zhuo Chen >>>>>> *Sent:* Thursday, September 10, 2020 7:52 PM >>>>>> *To:* petsc-users at mcs.anl.gov >>>>>> *Subject:* [petsc-users] How to activate the modified Gram-Schmidt >>>>>> orthogonalization process in Fortran? >>>>>> >>>>>> Dear Petsc users, >>>>>> >>>>>> I found an ancient thread discussing this problem. >>>>>> >>>>>> >>>>>> https://lists.mcs.anl.gov/pipermail/petsc-users/2011-October/010607.html >>>>>> >>>>>> However, when I add >>>>>> >>>>>> call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) >>>>>> call >>>>>> PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) >>>>>> >>>>>> the program will tell me >>>>>> >>>>>> WARNING! There are options you set that were not used! >>>>>> WARNING! could be spelling mistake, etc! >>>>>> There is one unused database option. It is: >>>>>> Option left: name:-ksp_gmres_modifiedgramschmidt value: 1 >>>>>> >>>>>> I would like to know the most correct way to activate the modified >>>>>> Gram-Schmidt orthogonalization process in Fortran. Thank you very much! >>>>>> >>>>>> Best regards. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Zhuo Chen >>>>>> Department of Physics >>>>>> University of Alberta >>>>>> Edmonton Alberta, Canada T6G 2E1 >>>>>> http://www.pas.rochester.edu/~zchen25/ >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Zhuo Chen >>>>>> Department of Physics >>>>>> University of Alberta >>>>>> Edmonton Alberta, Canada T6G 2E1 >>>>>> http://www.pas.rochester.edu/~zchen25/ >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Zhuo Chen >>>>>> Department of Physics >>>>>> University of Alberta >>>>>> Edmonton Alberta, Canada T6G 2E1 >>>>>> http://www.pas.rochester.edu/~zchen25/ >>>>>> >>>>> >>>>> >>>>> -- >>>>> Zhuo Chen >>>>> Department of Physics >>>>> University of Alberta >>>>> Edmonton Alberta, Canada T6G 2E1 >>>>> http://www.pas.rochester.edu/~zchen25/ >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >>> >>> -- >>> Zhuo Chen >>> Department of Physics >>> University of Alberta >>> Edmonton Alberta, Canada T6G 2E1 >>> http://www.pas.rochester.edu/~zchen25/ >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > > > -- > Zhuo Chen > Department of Physics > University of Alberta > Edmonton Alberta, Canada T6G 2E1 > http://www.pas.rochester.edu/~zchen25/ > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
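
The point made just above ("the first time through, it will not work") is easiest to see in a sketch. The loop below is hypothetical: step, nsteps, ksp, b and x stand in for whatever the real code declares and creates; only the position of PetscOptionsSetValue() relative to KSPSetFromOptions() matters, since KSPSetFromOptions() reads the options database at the moment it is called.

    ! Ordering from the thread that only appears to work: on the first pass
    ! the option is not yet in the database when KSPSetFromOptions() runs,
    ! so the first solve still uses the default orthogonalization; the
    ! second and later passes pick the option up and look right in -ksp_view.
    do step = 1, nsteps
       call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr)
       call KSPSetFromOptions(ksp,ierr);CHKERRQ(ierr)
       call PetscOptionsSetValue(PETSC_NULL_OPTIONS, &
            '-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr)
       call KSPSolve(ksp,b,x,ierr);CHKERRQ(ierr)
    end do

    ! Ordering recommended in the thread: put the value into the database
    ! once, before the first KSPSetFromOptions(), so every solve, including
    ! the first one, uses modified Gram-Schmidt.
    call PetscOptionsSetValue(PETSC_NULL_OPTIONS, &
         '-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr)
    do step = 1, nsteps
       call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr)
       call KSPSetFromOptions(ksp,ierr);CHKERRQ(ierr)
       call KSPSolve(ksp,b,x,ierr);CHKERRQ(ierr)
    end do
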
URL: From chenzhuotj at gmail.com Fri Sep 11 15:37:57 2020 From: chenzhuotj at gmail.com (Zhuo Chen) Date: Fri, 11 Sep 2020 14:37:57 -0600 Subject: [petsc-users] How to activate the modified Gram-Schmidt orthogonalization process in Fortran? In-Reply-To: References: Message-ID: Hi Matthew, I am sorry if I misunderstood. Do you mean the modified Gram-Schmidt will be working for the first time but not for the subsequent times or the reverse? I have checked the output of a complete loop, and it print out the same lines, i.e., KSP Object: 4 MPI processes type: gmres restart=30, using Modified Gram-Schmidt Orthogonalization happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-08, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test in each loop. On Fri, Sep 11, 2020 at 2:17 PM Matthew Knepley wrote: > On Fri, Sep 11, 2020 at 4:10 PM Zhuo Chen wrote: > >> Hi Matthew, >> >> Yes. These four lines are in a do while loop. >> > > The first time through, it will not work :) > > Thanks, > > Matt > > >> On Fri, Sep 11, 2020 at 2:07 PM Matthew Knepley >> wrote: >> >>> On Fri, Sep 11, 2020 at 3:56 PM Zhuo Chen wrote: >>> >>>> Hi Matthew, >>>> >>>> Yes, if use >>>> >>>> call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) >>>> call KSPSetFromOptions(ksp,ierr);CHKERRQ(ierr) >>>> call >>>> PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) >>>> call >>>> PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_view','',ierr);CHKERRQ(ierr) >>>> >>> >>> Are you running in a loop? >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> to see if the modified Gram-Schmidt process is actually active. The >>>> output is >>>> >>>> KSP Object: 4 MPI processes >>>> type: gmres >>>> restart=30, using Modified Gram-Schmidt Orthogonalization >>>> happy breakdown tolerance 1e-30 >>>> maximum iterations=10000, initial guess is zero >>>> tolerances: relative=1e-11, absolute=1e-50, divergence=10000. >>>> left preconditioning >>>> using PRECONDITIONED norm type for convergence test >>>> >>>> I think that means calling PetscOptionsSetValue() >>>> after KSPSetFromOptions() works. Correct me if I am wrong. >>>> >>>> Best. >>>> >>>> On Fri, Sep 11, 2020 at 1:31 PM Matthew Knepley >>>> wrote: >>>> >>>>> On Fri, Sep 11, 2020 at 3:05 PM Zhuo Chen >>>>> wrote: >>>>> >>>>>> Hi Hong, >>>>>> >>>>>> Thank you very much for the plan. >>>>>> >>>>>> Though it may be obvious to many Petsc gurus. I would like to >>>>>> summarize my solution to activate the modified Gram-Schmidt >>>>>> orthogonalization process in Fortran now. It may help some new Petsc users. >>>>>> >>>>>> Option 1: append the -ksp_gmres_modifiedgramschmidt at runtime, the >>>>>> code would look like >>>>>> >>>>>> call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) >>>>>> call KSPSetFromOptions(ksp,ierr);CHKERRQ(ierr) >>>>>> >>>>>> and to run the program, use >>>>>> >>>>>> mpiexec -np 2 ./run -ksp_gmres_modifiedgramschmidt >>>>>> >>>>>> Option 2: use PetscOptionsSetValue() >>>>>> >>>>>> call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) >>>>>> call KSPSetFromOptions(ksp,ierr);CHKERRQ(ierr) >>>>>> call >>>>>> PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) >>>>>> >>>>> >>>>> Does it work if you call SetValue() after SetFromOptions()? I would >>>>> not think that would work. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> and to run the program, use >>>>>> >>>>>> mpiexec -np 2 ./run >>>>>> >>>>>> Best. 
>>>>>> >>>>>> >>>>>> >>>>>> On Fri, Sep 11, 2020 at 9:05 AM Zhang, Hong >>>>>> wrote: >>>>>> >>>>>>> Zhuo, >>>>>>> I'll try to get it done after the incoming release. My hands are >>>>>>> full with more urgent tasks at moment. I'll let you know after I'm done. >>>>>>> Thanks for your patience. >>>>>>> Hong >>>>>>> >>>>>>> ------------------------------ >>>>>>> *From:* Zhuo Chen >>>>>>> *Sent:* Thursday, September 10, 2020 8:41 PM >>>>>>> *To:* Zhang, Hong >>>>>>> *Cc:* petsc-users at mcs.anl.gov >>>>>>> *Subject:* Re: [petsc-users] How to activate the modified >>>>>>> Gram-Schmidt orthogonalization process in Fortran? >>>>>>> >>>>>>> Hi Hong, >>>>>>> >>>>>>> According to that very old thread, KSPGMRESSetOrthogonalization was >>>>>>> not implemented in Fortran. I did as you suggested and the compiler will >>>>>>> tell me >>>>>>> >>>>>>> undefined reference to `kspgmressetorthogonalization_' >>>>>>> >>>>>>> I think I will use the -ksp_gmres_modifiedgramschmidt method. Thank >>>>>>> you so much! >>>>>>> >>>>>>> On Thu, Sep 10, 2020 at 7:32 PM Zhang, Hong >>>>>>> wrote: >>>>>>> >>>>>>> Zhuo, >>>>>>> Call >>>>>>> KSPSetType(ksp,KSPGMRES); >>>>>>> >>>>>>> KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization); >>>>>>> Hong >>>>>>> >>>>>>> ------------------------------ >>>>>>> *From:* Zhuo Chen >>>>>>> *Sent:* Thursday, September 10, 2020 8:17 PM >>>>>>> *To:* Zhang, Hong >>>>>>> *Cc:* petsc-users at mcs.anl.gov >>>>>>> *Subject:* Re: [petsc-users] How to activate the modified >>>>>>> Gram-Schmidt orthogonalization process in Fortran? >>>>>>> >>>>>>> Hi Hong, >>>>>>> >>>>>>> Thank you very much for your help. >>>>>>> >>>>>>> It seems that if I simply append -ksp_gmres_modifiedgramschmidt the >>>>>>> warning goes away. However >>>>>>> KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization,ierr) >>>>>>> has another issue. >>>>>>> >>>>>>> Error: Symbol ?kspgmresmodifiedgramschmidtorthogonalization? at (1) >>>>>>> has no IMPLICIT type >>>>>>> >>>>>>> Is it because the argument is too long? I am using gcc 8.4.0 instead >>>>>>> of ifort >>>>>>> >>>>>>> On Thu, Sep 10, 2020 at 7:08 PM Zhang, Hong >>>>>>> wrote: >>>>>>> >>>>>>> Zhuo, >>>>>>> Run your code with option '-ksp_gmres_modifiedgramschmidt'. For >>>>>>> example, >>>>>>> petsc/src/ksp/ksp/tutorials >>>>>>> mpiexec -n 2 ./ex2 -ksp_view -ksp_gmres_modifiedgramschmidt >>>>>>> KSP Object: 2 MPI processes >>>>>>> type: gmres >>>>>>> restart=30, using Modified Gram-Schmidt Orthogonalization >>>>>>> happy breakdown tolerance 1e-30 >>>>>>> maximum iterations=10000, initial guess is zero >>>>>>> tolerances: relative=0.000138889, absolute=1e-50, >>>>>>> divergence=10000. >>>>>>> left preconditioning >>>>>>> using PRECONDITIONED norm type for convergence test >>>>>>> PC Object: 2 MPI processes >>>>>>> type: bjacobi >>>>>>> ... >>>>>>> >>>>>>> You can >>>>>>> call KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization) >>>>>>> in your program. >>>>>>> >>>>>>> Hong >>>>>>> >>>>>>> ------------------------------ >>>>>>> *From:* petsc-users on behalf of >>>>>>> Zhuo Chen >>>>>>> *Sent:* Thursday, September 10, 2020 7:52 PM >>>>>>> *To:* petsc-users at mcs.anl.gov >>>>>>> *Subject:* [petsc-users] How to activate the modified Gram-Schmidt >>>>>>> orthogonalization process in Fortran? >>>>>>> >>>>>>> Dear Petsc users, >>>>>>> >>>>>>> I found an ancient thread discussing this problem. 
>>>>>>> >>>>>>> >>>>>>> https://lists.mcs.anl.gov/pipermail/petsc-users/2011-October/010607.html >>>>>>> >>>>>>> However, when I add >>>>>>> >>>>>>> call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) >>>>>>> call >>>>>>> PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) >>>>>>> >>>>>>> the program will tell me >>>>>>> >>>>>>> WARNING! There are options you set that were not used! >>>>>>> WARNING! could be spelling mistake, etc! >>>>>>> There is one unused database option. It is: >>>>>>> Option left: name:-ksp_gmres_modifiedgramschmidt value: 1 >>>>>>> >>>>>>> I would like to know the most correct way to activate the modified >>>>>>> Gram-Schmidt orthogonalization process in Fortran. Thank you very much! >>>>>>> >>>>>>> Best regards. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Zhuo Chen >>>>>>> Department of Physics >>>>>>> University of Alberta >>>>>>> Edmonton Alberta, Canada T6G 2E1 >>>>>>> http://www.pas.rochester.edu/~zchen25/ >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Zhuo Chen >>>>>>> Department of Physics >>>>>>> University of Alberta >>>>>>> Edmonton Alberta, Canada T6G 2E1 >>>>>>> http://www.pas.rochester.edu/~zchen25/ >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Zhuo Chen >>>>>>> Department of Physics >>>>>>> University of Alberta >>>>>>> Edmonton Alberta, Canada T6G 2E1 >>>>>>> http://www.pas.rochester.edu/~zchen25/ >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Zhuo Chen >>>>>> Department of Physics >>>>>> University of Alberta >>>>>> Edmonton Alberta, Canada T6G 2E1 >>>>>> http://www.pas.rochester.edu/~zchen25/ >>>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>> >>>> >>>> -- >>>> Zhuo Chen >>>> Department of Physics >>>> University of Alberta >>>> Edmonton Alberta, Canada T6G 2E1 >>>> http://www.pas.rochester.edu/~zchen25/ >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> >> >> -- >> Zhuo Chen >> Department of Physics >> University of Alberta >> Edmonton Alberta, Canada T6G 2E1 >> http://www.pas.rochester.edu/~zchen25/ >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- Zhuo Chen Department of Physics University of Alberta Edmonton Alberta, Canada T6G 2E1 http://www.pas.rochester.edu/~zchen25/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Sep 11 15:39:44 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 11 Sep 2020 16:39:44 -0400 Subject: [petsc-users] How to activate the modified Gram-Schmidt orthogonalization process in Fortran? In-Reply-To: References: Message-ID: On Fri, Sep 11, 2020 at 4:38 PM Zhuo Chen wrote: > Hi Matthew, > > I am sorry if I misunderstood. Do you mean the modified Gram-Schmidt will > be working for the first time but not for the subsequent times or the > reverse? 
I have checked the output of a complete loop, and it print out the > same lines, i.e., > > KSP Object: 4 MPI processes > type: gmres > restart=30, using Modified Gram-Schmidt Orthogonalization > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-08, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > > in each loop. > If you put the SetValue() call before the SetFromOptions() call, everything will be fine. The first time through your loop, the options will not be in the database, and thus you would not get what you expect. Thanks, Matt > On Fri, Sep 11, 2020 at 2:17 PM Matthew Knepley wrote: > >> On Fri, Sep 11, 2020 at 4:10 PM Zhuo Chen wrote: >> >>> Hi Matthew, >>> >>> Yes. These four lines are in a do while loop. >>> >> >> The first time through, it will not work :) >> >> Thanks, >> >> Matt >> >> >>> On Fri, Sep 11, 2020 at 2:07 PM Matthew Knepley >>> wrote: >>> >>>> On Fri, Sep 11, 2020 at 3:56 PM Zhuo Chen wrote: >>>> >>>>> Hi Matthew, >>>>> >>>>> Yes, if use >>>>> >>>>> call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) >>>>> call KSPSetFromOptions(ksp,ierr);CHKERRQ(ierr) >>>>> call >>>>> PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) >>>>> call >>>>> PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_view','',ierr);CHKERRQ(ierr) >>>>> >>>> >>>> Are you running in a loop? >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> to see if the modified Gram-Schmidt process is actually active. The >>>>> output is >>>>> >>>>> KSP Object: 4 MPI processes >>>>> type: gmres >>>>> restart=30, using Modified Gram-Schmidt Orthogonalization >>>>> happy breakdown tolerance 1e-30 >>>>> maximum iterations=10000, initial guess is zero >>>>> tolerances: relative=1e-11, absolute=1e-50, divergence=10000. >>>>> left preconditioning >>>>> using PRECONDITIONED norm type for convergence test >>>>> >>>>> I think that means calling PetscOptionsSetValue() >>>>> after KSPSetFromOptions() works. Correct me if I am wrong. >>>>> >>>>> Best. >>>>> >>>>> On Fri, Sep 11, 2020 at 1:31 PM Matthew Knepley >>>>> wrote: >>>>> >>>>>> On Fri, Sep 11, 2020 at 3:05 PM Zhuo Chen >>>>>> wrote: >>>>>> >>>>>>> Hi Hong, >>>>>>> >>>>>>> Thank you very much for the plan. >>>>>>> >>>>>>> Though it may be obvious to many Petsc gurus. I would like to >>>>>>> summarize my solution to activate the modified Gram-Schmidt >>>>>>> orthogonalization process in Fortran now. It may help some new Petsc users. >>>>>>> >>>>>>> Option 1: append the -ksp_gmres_modifiedgramschmidt at runtime, the >>>>>>> code would look like >>>>>>> >>>>>>> call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) >>>>>>> call KSPSetFromOptions(ksp,ierr);CHKERRQ(ierr) >>>>>>> >>>>>>> and to run the program, use >>>>>>> >>>>>>> mpiexec -np 2 ./run -ksp_gmres_modifiedgramschmidt >>>>>>> >>>>>>> Option 2: use PetscOptionsSetValue() >>>>>>> >>>>>>> call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) >>>>>>> call KSPSetFromOptions(ksp,ierr);CHKERRQ(ierr) >>>>>>> call >>>>>>> PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) >>>>>>> >>>>>> >>>>>> Does it work if you call SetValue() after SetFromOptions()? I would >>>>>> not think that would work. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> and to run the program, use >>>>>>> >>>>>>> mpiexec -np 2 ./run >>>>>>> >>>>>>> Best. 
>>>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Sep 11, 2020 at 9:05 AM Zhang, Hong >>>>>>> wrote: >>>>>>> >>>>>>>> Zhuo, >>>>>>>> I'll try to get it done after the incoming release. My hands are >>>>>>>> full with more urgent tasks at moment. I'll let you know after I'm done. >>>>>>>> Thanks for your patience. >>>>>>>> Hong >>>>>>>> >>>>>>>> ------------------------------ >>>>>>>> *From:* Zhuo Chen >>>>>>>> *Sent:* Thursday, September 10, 2020 8:41 PM >>>>>>>> *To:* Zhang, Hong >>>>>>>> *Cc:* petsc-users at mcs.anl.gov >>>>>>>> *Subject:* Re: [petsc-users] How to activate the modified >>>>>>>> Gram-Schmidt orthogonalization process in Fortran? >>>>>>>> >>>>>>>> Hi Hong, >>>>>>>> >>>>>>>> According to that very old thread, KSPGMRESSetOrthogonalization was >>>>>>>> not implemented in Fortran. I did as you suggested and the compiler will >>>>>>>> tell me >>>>>>>> >>>>>>>> undefined reference to `kspgmressetorthogonalization_' >>>>>>>> >>>>>>>> I think I will use the -ksp_gmres_modifiedgramschmidt method. Thank >>>>>>>> you so much! >>>>>>>> >>>>>>>> On Thu, Sep 10, 2020 at 7:32 PM Zhang, Hong >>>>>>>> wrote: >>>>>>>> >>>>>>>> Zhuo, >>>>>>>> Call >>>>>>>> KSPSetType(ksp,KSPGMRES); >>>>>>>> >>>>>>>> KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization); >>>>>>>> Hong >>>>>>>> >>>>>>>> ------------------------------ >>>>>>>> *From:* Zhuo Chen >>>>>>>> *Sent:* Thursday, September 10, 2020 8:17 PM >>>>>>>> *To:* Zhang, Hong >>>>>>>> *Cc:* petsc-users at mcs.anl.gov >>>>>>>> *Subject:* Re: [petsc-users] How to activate the modified >>>>>>>> Gram-Schmidt orthogonalization process in Fortran? >>>>>>>> >>>>>>>> Hi Hong, >>>>>>>> >>>>>>>> Thank you very much for your help. >>>>>>>> >>>>>>>> It seems that if I simply append -ksp_gmres_modifiedgramschmidt the >>>>>>>> warning goes away. However >>>>>>>> KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization,ierr) >>>>>>>> has another issue. >>>>>>>> >>>>>>>> Error: Symbol ?kspgmresmodifiedgramschmidtorthogonalization? at (1) >>>>>>>> has no IMPLICIT type >>>>>>>> >>>>>>>> Is it because the argument is too long? I am using gcc 8.4.0 >>>>>>>> instead of ifort >>>>>>>> >>>>>>>> On Thu, Sep 10, 2020 at 7:08 PM Zhang, Hong >>>>>>>> wrote: >>>>>>>> >>>>>>>> Zhuo, >>>>>>>> Run your code with option '-ksp_gmres_modifiedgramschmidt'. For >>>>>>>> example, >>>>>>>> petsc/src/ksp/ksp/tutorials >>>>>>>> mpiexec -n 2 ./ex2 -ksp_view -ksp_gmres_modifiedgramschmidt >>>>>>>> KSP Object: 2 MPI processes >>>>>>>> type: gmres >>>>>>>> restart=30, using Modified Gram-Schmidt Orthogonalization >>>>>>>> happy breakdown tolerance 1e-30 >>>>>>>> maximum iterations=10000, initial guess is zero >>>>>>>> tolerances: relative=0.000138889, absolute=1e-50, >>>>>>>> divergence=10000. >>>>>>>> left preconditioning >>>>>>>> using PRECONDITIONED norm type for convergence test >>>>>>>> PC Object: 2 MPI processes >>>>>>>> type: bjacobi >>>>>>>> ... >>>>>>>> >>>>>>>> You can >>>>>>>> call KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization) >>>>>>>> in your program. >>>>>>>> >>>>>>>> Hong >>>>>>>> >>>>>>>> ------------------------------ >>>>>>>> *From:* petsc-users on behalf of >>>>>>>> Zhuo Chen >>>>>>>> *Sent:* Thursday, September 10, 2020 7:52 PM >>>>>>>> *To:* petsc-users at mcs.anl.gov >>>>>>>> *Subject:* [petsc-users] How to activate the modified Gram-Schmidt >>>>>>>> orthogonalization process in Fortran? >>>>>>>> >>>>>>>> Dear Petsc users, >>>>>>>> >>>>>>>> I found an ancient thread discussing this problem. 
>>>>>>>> >>>>>>>> >>>>>>>> https://lists.mcs.anl.gov/pipermail/petsc-users/2011-October/010607.html >>>>>>>> >>>>>>>> However, when I add >>>>>>>> >>>>>>>> call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) >>>>>>>> call >>>>>>>> PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) >>>>>>>> >>>>>>>> the program will tell me >>>>>>>> >>>>>>>> WARNING! There are options you set that were not used! >>>>>>>> WARNING! could be spelling mistake, etc! >>>>>>>> There is one unused database option. It is: >>>>>>>> Option left: name:-ksp_gmres_modifiedgramschmidt value: 1 >>>>>>>> >>>>>>>> I would like to know the most correct way to activate the modified >>>>>>>> Gram-Schmidt orthogonalization process in Fortran. Thank you very much! >>>>>>>> >>>>>>>> Best regards. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Zhuo Chen >>>>>>>> Department of Physics >>>>>>>> University of Alberta >>>>>>>> Edmonton Alberta, Canada T6G 2E1 >>>>>>>> http://www.pas.rochester.edu/~zchen25/ >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Zhuo Chen >>>>>>>> Department of Physics >>>>>>>> University of Alberta >>>>>>>> Edmonton Alberta, Canada T6G 2E1 >>>>>>>> http://www.pas.rochester.edu/~zchen25/ >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Zhuo Chen >>>>>>>> Department of Physics >>>>>>>> University of Alberta >>>>>>>> Edmonton Alberta, Canada T6G 2E1 >>>>>>>> http://www.pas.rochester.edu/~zchen25/ >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Zhuo Chen >>>>>>> Department of Physics >>>>>>> University of Alberta >>>>>>> Edmonton Alberta, Canada T6G 2E1 >>>>>>> http://www.pas.rochester.edu/~zchen25/ >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> Zhuo Chen >>>>> Department of Physics >>>>> University of Alberta >>>>> Edmonton Alberta, Canada T6G 2E1 >>>>> http://www.pas.rochester.edu/~zchen25/ >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >>> >>> -- >>> Zhuo Chen >>> Department of Physics >>> University of Alberta >>> Edmonton Alberta, Canada T6G 2E1 >>> http://www.pas.rochester.edu/~zchen25/ >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > > > -- > Zhuo Chen > Department of Physics > University of Alberta > Edmonton Alberta, Canada T6G 2E1 > http://www.pas.rochester.edu/~zchen25/ > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From chenzhuotj at gmail.com Fri Sep 11 15:42:44 2020 From: chenzhuotj at gmail.com (Zhuo Chen) Date: Fri, 11 Sep 2020 14:42:44 -0600 Subject: [petsc-users] How to activate the modified Gram-Schmidt orthogonalization process in Fortran? 
In-Reply-To: References: Message-ID: Wow, thank you so much. Indeed, there is one output missing. Yes, should put SetValue() before SetFromOptions() Best regards. On Fri, Sep 11, 2020 at 2:39 PM Matthew Knepley wrote: > On Fri, Sep 11, 2020 at 4:38 PM Zhuo Chen wrote: > >> Hi Matthew, >> >> I am sorry if I misunderstood. Do you mean the modified Gram-Schmidt will >> be working for the first time but not for the subsequent times or the >> reverse? I have checked the output of a complete loop, and it print out the >> same lines, i.e., >> >> KSP Object: 4 MPI processes >> type: gmres >> restart=30, using Modified Gram-Schmidt Orthogonalization >> happy breakdown tolerance 1e-30 >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-08, absolute=1e-50, divergence=10000. >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> >> in each loop. >> > > If you put the SetValue() call before the SetFromOptions() call, > everything will be fine. > > The first time through your loop, the options will not be in the database, > and thus you would not get what you expect. > > Thanks, > > Matt > > >> On Fri, Sep 11, 2020 at 2:17 PM Matthew Knepley >> wrote: >> >>> On Fri, Sep 11, 2020 at 4:10 PM Zhuo Chen wrote: >>> >>>> Hi Matthew, >>>> >>>> Yes. These four lines are in a do while loop. >>>> >>> >>> The first time through, it will not work :) >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> On Fri, Sep 11, 2020 at 2:07 PM Matthew Knepley >>>> wrote: >>>> >>>>> On Fri, Sep 11, 2020 at 3:56 PM Zhuo Chen >>>>> wrote: >>>>> >>>>>> Hi Matthew, >>>>>> >>>>>> Yes, if use >>>>>> >>>>>> call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) >>>>>> call KSPSetFromOptions(ksp,ierr);CHKERRQ(ierr) >>>>>> call >>>>>> PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) >>>>>> call >>>>>> PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_view','',ierr);CHKERRQ(ierr) >>>>>> >>>>> >>>>> Are you running in a loop? >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> to see if the modified Gram-Schmidt process is actually active. The >>>>>> output is >>>>>> >>>>>> KSP Object: 4 MPI processes >>>>>> type: gmres >>>>>> restart=30, using Modified Gram-Schmidt Orthogonalization >>>>>> happy breakdown tolerance 1e-30 >>>>>> maximum iterations=10000, initial guess is zero >>>>>> tolerances: relative=1e-11, absolute=1e-50, divergence=10000. >>>>>> left preconditioning >>>>>> using PRECONDITIONED norm type for convergence test >>>>>> >>>>>> I think that means calling PetscOptionsSetValue() >>>>>> after KSPSetFromOptions() works. Correct me if I am wrong. >>>>>> >>>>>> Best. >>>>>> >>>>>> On Fri, Sep 11, 2020 at 1:31 PM Matthew Knepley >>>>>> wrote: >>>>>> >>>>>>> On Fri, Sep 11, 2020 at 3:05 PM Zhuo Chen >>>>>>> wrote: >>>>>>> >>>>>>>> Hi Hong, >>>>>>>> >>>>>>>> Thank you very much for the plan. >>>>>>>> >>>>>>>> Though it may be obvious to many Petsc gurus. I would like to >>>>>>>> summarize my solution to activate the modified Gram-Schmidt >>>>>>>> orthogonalization process in Fortran now. It may help some new Petsc users. 
>>>>>>>> >>>>>>>> Option 1: append the -ksp_gmres_modifiedgramschmidt at runtime, the >>>>>>>> code would look like >>>>>>>> >>>>>>>> call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) >>>>>>>> call KSPSetFromOptions(ksp,ierr);CHKERRQ(ierr) >>>>>>>> >>>>>>>> and to run the program, use >>>>>>>> >>>>>>>> mpiexec -np 2 ./run -ksp_gmres_modifiedgramschmidt >>>>>>>> >>>>>>>> Option 2: use PetscOptionsSetValue() >>>>>>>> >>>>>>>> call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) >>>>>>>> call KSPSetFromOptions(ksp,ierr);CHKERRQ(ierr) >>>>>>>> call >>>>>>>> PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) >>>>>>>> >>>>>>> >>>>>>> Does it work if you call SetValue() after SetFromOptions()? I would >>>>>>> not think that would work. >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Matt >>>>>>> >>>>>>> >>>>>>>> and to run the program, use >>>>>>>> >>>>>>>> mpiexec -np 2 ./run >>>>>>>> >>>>>>>> Best. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Sep 11, 2020 at 9:05 AM Zhang, Hong >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Zhuo, >>>>>>>>> I'll try to get it done after the incoming release. My hands are >>>>>>>>> full with more urgent tasks at moment. I'll let you know after I'm done. >>>>>>>>> Thanks for your patience. >>>>>>>>> Hong >>>>>>>>> >>>>>>>>> ------------------------------ >>>>>>>>> *From:* Zhuo Chen >>>>>>>>> *Sent:* Thursday, September 10, 2020 8:41 PM >>>>>>>>> *To:* Zhang, Hong >>>>>>>>> *Cc:* petsc-users at mcs.anl.gov >>>>>>>>> *Subject:* Re: [petsc-users] How to activate the modified >>>>>>>>> Gram-Schmidt orthogonalization process in Fortran? >>>>>>>>> >>>>>>>>> Hi Hong, >>>>>>>>> >>>>>>>>> According to that very old thread, KSPGMRESSetOrthogonalization >>>>>>>>> was not implemented in Fortran. I did as you suggested and the compiler >>>>>>>>> will tell me >>>>>>>>> >>>>>>>>> undefined reference to `kspgmressetorthogonalization_' >>>>>>>>> >>>>>>>>> I think I will use the -ksp_gmres_modifiedgramschmidt method. >>>>>>>>> Thank you so much! >>>>>>>>> >>>>>>>>> On Thu, Sep 10, 2020 at 7:32 PM Zhang, Hong >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Zhuo, >>>>>>>>> Call >>>>>>>>> KSPSetType(ksp,KSPGMRES); >>>>>>>>> >>>>>>>>> KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization); >>>>>>>>> Hong >>>>>>>>> >>>>>>>>> ------------------------------ >>>>>>>>> *From:* Zhuo Chen >>>>>>>>> *Sent:* Thursday, September 10, 2020 8:17 PM >>>>>>>>> *To:* Zhang, Hong >>>>>>>>> *Cc:* petsc-users at mcs.anl.gov >>>>>>>>> *Subject:* Re: [petsc-users] How to activate the modified >>>>>>>>> Gram-Schmidt orthogonalization process in Fortran? >>>>>>>>> >>>>>>>>> Hi Hong, >>>>>>>>> >>>>>>>>> Thank you very much for your help. >>>>>>>>> >>>>>>>>> It seems that if I simply append -ksp_gmres_modifiedgramschmidt >>>>>>>>> the warning goes away. However >>>>>>>>> KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization,ierr) >>>>>>>>> has another issue. >>>>>>>>> >>>>>>>>> Error: Symbol ?kspgmresmodifiedgramschmidtorthogonalization? at >>>>>>>>> (1) has no IMPLICIT type >>>>>>>>> >>>>>>>>> Is it because the argument is too long? I am using gcc 8.4.0 >>>>>>>>> instead of ifort >>>>>>>>> >>>>>>>>> On Thu, Sep 10, 2020 at 7:08 PM Zhang, Hong >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Zhuo, >>>>>>>>> Run your code with option '-ksp_gmres_modifiedgramschmidt'. 
For >>>>>>>>> example, >>>>>>>>> petsc/src/ksp/ksp/tutorials >>>>>>>>> mpiexec -n 2 ./ex2 -ksp_view -ksp_gmres_modifiedgramschmidt >>>>>>>>> KSP Object: 2 MPI processes >>>>>>>>> type: gmres >>>>>>>>> restart=30, using Modified Gram-Schmidt Orthogonalization >>>>>>>>> happy breakdown tolerance 1e-30 >>>>>>>>> maximum iterations=10000, initial guess is zero >>>>>>>>> tolerances: relative=0.000138889, absolute=1e-50, >>>>>>>>> divergence=10000. >>>>>>>>> left preconditioning >>>>>>>>> using PRECONDITIONED norm type for convergence test >>>>>>>>> PC Object: 2 MPI processes >>>>>>>>> type: bjacobi >>>>>>>>> ... >>>>>>>>> >>>>>>>>> You can >>>>>>>>> call KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization) >>>>>>>>> in your program. >>>>>>>>> >>>>>>>>> Hong >>>>>>>>> >>>>>>>>> ------------------------------ >>>>>>>>> *From:* petsc-users on behalf >>>>>>>>> of Zhuo Chen >>>>>>>>> *Sent:* Thursday, September 10, 2020 7:52 PM >>>>>>>>> *To:* petsc-users at mcs.anl.gov >>>>>>>>> *Subject:* [petsc-users] How to activate the modified >>>>>>>>> Gram-Schmidt orthogonalization process in Fortran? >>>>>>>>> >>>>>>>>> Dear Petsc users, >>>>>>>>> >>>>>>>>> I found an ancient thread discussing this problem. >>>>>>>>> >>>>>>>>> >>>>>>>>> https://lists.mcs.anl.gov/pipermail/petsc-users/2011-October/010607.html >>>>>>>>> >>>>>>>>> However, when I add >>>>>>>>> >>>>>>>>> call KSPSetType(ksp,KSPGMRES,ierr);CHKERRQ(ierr) >>>>>>>>> call >>>>>>>>> PetscOptionsSetValue(PETSC_NULL_OPTIONS,'-ksp_gmres_modifiedgramschmidt','1',ierr);CHKERRQ(ierr) >>>>>>>>> >>>>>>>>> the program will tell me >>>>>>>>> >>>>>>>>> WARNING! There are options you set that were not used! >>>>>>>>> WARNING! could be spelling mistake, etc! >>>>>>>>> There is one unused database option. It is: >>>>>>>>> Option left: name:-ksp_gmres_modifiedgramschmidt value: 1 >>>>>>>>> >>>>>>>>> I would like to know the most correct way to activate the modified >>>>>>>>> Gram-Schmidt orthogonalization process in Fortran. Thank you very much! >>>>>>>>> >>>>>>>>> Best regards. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Zhuo Chen >>>>>>>>> Department of Physics >>>>>>>>> University of Alberta >>>>>>>>> Edmonton Alberta, Canada T6G 2E1 >>>>>>>>> http://www.pas.rochester.edu/~zchen25/ >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Zhuo Chen >>>>>>>>> Department of Physics >>>>>>>>> University of Alberta >>>>>>>>> Edmonton Alberta, Canada T6G 2E1 >>>>>>>>> http://www.pas.rochester.edu/~zchen25/ >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Zhuo Chen >>>>>>>>> Department of Physics >>>>>>>>> University of Alberta >>>>>>>>> Edmonton Alberta, Canada T6G 2E1 >>>>>>>>> http://www.pas.rochester.edu/~zchen25/ >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Zhuo Chen >>>>>>>> Department of Physics >>>>>>>> University of Alberta >>>>>>>> Edmonton Alberta, Canada T6G 2E1 >>>>>>>> http://www.pas.rochester.edu/~zchen25/ >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their >>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>> experiments lead. 
>>>>>>> -- Norbert Wiener >>>>>>> >>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Zhuo Chen >>>>>> Department of Physics >>>>>> University of Alberta >>>>>> Edmonton Alberta, Canada T6G 2E1 >>>>>> http://www.pas.rochester.edu/~zchen25/ >>>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>> >>>> >>>> -- >>>> Zhuo Chen >>>> Department of Physics >>>> University of Alberta >>>> Edmonton Alberta, Canada T6G 2E1 >>>> http://www.pas.rochester.edu/~zchen25/ >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> >> >> -- >> Zhuo Chen >> Department of Physics >> University of Alberta >> Edmonton Alberta, Canada T6G 2E1 >> http://www.pas.rochester.edu/~zchen25/ >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- Zhuo Chen Department of Physics University of Alberta Edmonton Alberta, Canada T6G 2E1 http://www.pas.rochester.edu/~zchen25/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Sep 11 17:03:29 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 11 Sep 2020 17:03:29 -0500 Subject: [petsc-users] Matrix Free Method questions In-Reply-To: References: <5BDE8465-76BE-4132-BF4E-6784548AADC0@petsc.dev> <3329269A-EB37-41C9-9698-BA4631A1E18A@petsc.dev> <3E68F0AF-2F7D-4394-894A-3099EC80B9BC@petsc.dev> <600E6AA4-9534-4B39-B7E0-0218AB02E19A@petsc.dev> <60260FA5-BDAE-4F18-8310-D0F3C03B318D@petsc.dev> <4BA6D58A-89C3-44E2-AF34-F4AE94211DC4@petsc.dev> Message-ID: <6DEA0A2F-3020-4C3D-8726-7BE6346B86BB@petsc.dev> > On Sep 11, 2020, at 7:45 AM, Blondel, Sophie wrote: > > Thank you Barry, > > Step 3 worked after I moved MatSetOption at the beginning of computeJacobian(). Attached is the updated log which is pretty similar to what I had before. Step 4 still uses many more iterations. > > I checked the Jacobian using -ksp_view_pmat ascii (on a simpler case), I can see the difference between step 3 and 4 is that the contribution from the reactions is not included in the step 4 Jacobian (as expected from the fact that I removed their setting from the code). > > Looking back at one of your previous email, you wrote "This routine should only compute the elements of the Jacobian needed for this reduced matrix Jacobian, so the diagonals and the diffusion/convection terms. ", does it mean that I should still include the contributions from the reactions that affect the pure diagonal terms? Yes, you need to leave in everything that affects the diagonal otherwise the "Jacobi" preconditioner will not reflect the true Jacobi preconditioner and likely perform poorly. 
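In code terms, "leave in everything that affects the diagonal" can be sketched as below for a 1D DMDA with several components per grid point. This is only an illustration of which entries get inserted, not Xolotl code: nDOF, nDiffusing, reactionDiagonal(), diffusionDiagonal() and diffusionOffDiagonal() are hypothetical placeholders, the loop bounds xs/xm are assumed to come from DMDAGetCorners(), and boundary rows are left out. Every component keeps its diagonal entry, including the reaction contribution to it, while only the diffusing components get entries coupling to neighboring grid points:

/* Sketch: full diagonal for all components, neighbor coupling only for the diffusers */
for (xi = xs; xi < xs + xm; ++xi) {
  for (k = 0; k < nDOF; ++k) {
    MatStencil  row = {0}, col = {0};
    PetscScalar v;
    row.i = xi; row.c = k;
    col.i = xi; col.c = k;                            /* diagonal entry for every component */
    v    = reactionDiagonal(xi,k) + diffusionDiagonal(xi,k);
    ierr = MatSetValuesStencil(J,1,&row,1,&col,&v,ADD_VALUES);CHKERRQ(ierr);
    if (k < nDiffusing) {                             /* only the species that actually move */
      v = diffusionOffDiagonal(xi,k);
      col.i = xi - 1; col.c = k;
      ierr = MatSetValuesStencil(J,1,&row,1,&col,&v,ADD_VALUES);CHKERRQ(ierr);
      col.i = xi + 1; col.c = k;
      ierr = MatSetValuesStencil(J,1,&row,1,&col,&v,ADD_VALUES);CHKERRQ(ierr);
    }
  }
}
ierr = MatAssemblyBegin(J,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
ierr = MatAssemblyEnd(J,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

The matrix-vector products still come from the full operator through -snes_mf_operator; only the preconditioner is built from this reduced matrix.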
Barry > > Cheers, > > Sophie > From: Barry Smith > > Sent: Thursday, September 10, 2020 17:04 > To: Blondel, Sophie > > Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > > Subject: Re: [petsc-users] Matrix Free Method questions > > > >> On Sep 10, 2020, at 2:46 PM, Blondel, Sophie > wrote: >> >> Hi Barry, >> >> Going through the different changes again to understand what was going wrong with the last step, I discovered that my changes from 2 to 3 (keeping only the pure diagonal for the reaction Jacobian setup and adding MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) were wrong: the sparsity of the matrix was correct but then the RHSJacobian method was wrong. I updated it > > I'm not sure what you mean here. My hope was that in step 3 you won't need to change RHSJacobian at all (that is just for step 4). > >> but now when I run step 3 again I get the following error: >> >> [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >> [2]PETSC ERROR: Argument out of range >> [2]PETSC ERROR: Inserting a new nonzero at global row/column (310400, 316825) into matrix >> [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >> [2]PETSC ERROR: Petsc Development GIT revision: v3.13.4-885-gf58a62b032 GIT Date: 2020-09-01 13:07:58 -0500 >> [2]PETSC ERROR: Unknown Name on a 20200902 named iguazu by bqo Thu Sep 10 15:38:58 2020 >> [2]PETSC ERROR: Configure options PETSC_DIR=/home2/bqo/libraries/petsc-barry PETSC_ARCH=20200902 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no --with-shared-libraries --download-fblaslapack=1 >> [2]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 606 in /home2/bqo/libraries/petsc-barry/src/mat/impls/aij/mpi/mpiaij.c >> [2]PETSC ERROR: #2 MatSetValues() line 1392 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c >> [2]PETSC ERROR: #3 MatSetValuesLocal() line 2207 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c >> [2]PETSC ERROR: #4 MatSetValuesStencil() line 1595 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c >> PetscSolverExpHandler::computeJacobian: MatSetValuesStencil (reactions) failed. >> >> Because the RHSJacobian method is trying to update the elements corresponding to the reactions. I'm not sure I understood correctly what step 3 was supposed to be. > > In step the three the RHSJacobian was suppose to be unchanged, only the option to ignore the "unneeded" Jacobian entries inside MatSetValues (set with MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) was needed (plus changing the DMDASetBlockFillsXXX argument). > > The error message Inserting a new nonzero at global row/column (310400, 316825) into matrix indicates that somehow the MatOption MAT_NEW_NONZERO_LOCATION_ERR is in control instead of the option MAT_NEW_NONZERO_LOCATIONS, when it is setting values the Jacobian values. > > The MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS_ERR,PETSC_TRUE);) is normally called inside the DMCreateMatrix() so I am not sure how they could be getting called in the wrong order but it seems somehow it is > > When do you call MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) in the code? You can call it at the beginning of computeJacobian(). > > If this still doesn't work and you get the same error you can run in the debugger on one process and put a breakpoint for MatSetOptions() to found out how the MAT_NEW_NONZERO_LOCATIONS_ERR comes in late to upset the apple cart. 
You should see MatSetOption() called at least twice and the last one should have the MAT_NEW_NONZERO_LOCATION flag. > > Barry > > > > >> >> Cheers, >> >> Sophie >> >> >> From: Barry Smith > >> Sent: Friday, September 4, 2020 01:06 >> To: Blondel, Sophie > >> Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >> Subject: Re: [petsc-users] Matrix Free Method questions >> >> >> Sophie, >> >> Thanks. I have started looking through the logs >> >> The change to matrix-free multiple (from 1 to 2) which reduces the accuracy of the multiply to about half the digits is not surprising. >> >> * It roughly doubles the time since doing the matrix-free product requires a function evaluation >> >> * It increases the iteration count, but not significantly since the reduced precision of the multiple induces some additional linear iterations >> >> The change from 2 to 3 (not storing the entire matrix) >> >> * number of nonzeros goes from 49459966 to 1558766 = 3.15 percent so it succeds in not storing the unneeded part of the matrix >> >> * the number of MatMult_MF goes from 2331 to 2418. I don't understand this, I expected it to be identical because it should be using the same preconditioner in 3 as in 2 and thus get the same convergence. Could possibility be due to the variability in convergence due to different runs with the matrix-free preconditioner preconditioner and not related to not storing the entire matrix. >> >> * the KSPSolve() time goes from 3.8774e+0 to 3.7855e+02 a trivial difference which is what I would expect >> >> * the SNESSolve time goes from 5.0047e+02 to 4.3275e+02 about a 14 percent drop which is reasonable because 3 doesn't spend as much time inserting matrix values (it still computes them but doesn't insert the ones we don't want for the preconditioner). >> >> The change from 3 to 4 >> >> * something goes seriously wrong here. The total number of linear solve iterations goes from 2282 to 97403 so something has gone seriously wrong with the preconditioner, but since the preconditioner operations are the same it seems something has gone wrong with the new reduced preconditioner. >> >> I think there is an error in computing the reduced matrix entries, that is the new compute Jacobian code is not computing the entries it needs to correctly. >> >> To debug this you can run case 3 and case 4 for a single time step with -ksp_view_pmat binary This should create a binary file with the initial Jacobian matrices in each. You can use Matlab or Python to do the difference in the matrices and see how possibly the new Jacobian computation code is not producing the correct values in some locations. >> >> Good luck, >> >> Barry >> >> >> >> >>> On Sep 3, 2020, at 12:26 PM, Blondel, Sophie > wrote: >>> >>> Hi Barry, >>> >>> Attached are the log files for the 1D case, for each of the 4 steps. I don't know how I did it yesterday but the differences between steps look better today, except for step 4 that takes many more iterations and smaller time steps. >>> >>> Cheers, >>> >>> Sophie >>> >>> De : Barry Smith > >>> Envoy? : mercredi 2 septembre 2020 15:53 >>> ? : Blondel, Sophie > >>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>> Objet : Re: [petsc-users] Matrix Free Method questions >>> >>> >>> >>>> On Sep 2, 2020, at 1:44 PM, Blondel, Sophie > wrote: >>>> >>>> Thank you Barry, >>>> >>>> The code ran with your branch but it's much slower than running with the full Jacobian and Jacobi PC subtype (around 10 times slower). 
It is using less memory as expected. I tried step 2 as well and it's even slower. >>> >>> Sophie, >>> >>> That is puzzling. It should be using the same matrix in the solver so should be the same speed and the setup time should be a bit better since it does not form the full Jacobian. (We'll get to this later) >>> >>>> The TS iteration for step 1 are the same as with full Jacobian. Let me know what I can look at to check if I've done something wrong. >>> >>> We need to see if the KSP iterations are pretty similar for four approaches (1) original code with Jacobi PC subtype (2) matrix free with Jacobi PC (just add -snes_mf_operator to case 1) (3) the new code with the MatSetOption() to not store the entire Jacobian also with the -snes_mf_operator and (4) the new code that doesn't compute the unneeded part of the Jacobian also with the -snes_mf_operator >>> >>> You could run each case with same 20 timesteps and -ts_monitor -ksp_monitor and -ts_view and send the four output files around. >>> >>> Once we are sure the four cases are behaving as expected then you can get timings for them but let's not do that until we confirm the similar results. There could easily be a flaw in my reasoning or the PETSc code somewhere that affects the correctness so its best to check that first. >>> >>> >>> Barry >>> >>>> >>>> Cheers, >>>> >>>> Sophie >>>> De : Barry Smith > >>>> Envoy? : mardi 1 septembre 2020 14:12 >>>> ? : Blondel, Sophie > >>>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>> Objet : Re: [petsc-users] Matrix Free Method questions >>>> >>>> >>>> Sophie, >>>> >>>> Sorry, looks like an old bug in PETSc that was undetected due to lack of use. The code is trying to use the first of the two matrices to determine the preconditioner which won't work in your case since it is matrix-free. It should be using the second matrix. >>>> >>>> I hope the branch barry/2020-09-01/fix-fieldsplit-mf resolves this issue for you. >>>> >>>> Barry >>>> >>>> >>>>> On Sep 1, 2020, at 12:45 PM, Blondel, Sophie > wrote: >>>>> >>>>> Hi Barry, >>>>> >>>>> I'm working through step 1) but I think I am doing something wrong. I'm using DMDASetBlockFillsSparse to set the non-zeros only for the diffusing clusters (small He clusters here, from size 1 to 7) and all the diagonal entries. Then I added a few lines in the code: >>>>> Mat mat; >>>>> DMCreateMatrix(da, &mat); >>>>> MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); >>>>> >>>>> When I try to run with the following options: -snes_mf_operator -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 2 -snes_force_iteration -pc_fieldsplit_detect_coupling -pc_type fieldsplit -fieldsplit_0_pc_type jacobi -fieldsplit_1_pc_type redundant -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -ts_max_snes_failures -1 -ts_monitor -ts_max_steps 20 >>>>> >>>>> I get an error: >>>>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>>>> [0]PETSC ERROR: No support for this operation for this object type >>>>> [0]PETSC ERROR: Matrix type mffd does not have a find off block diagonal entries defined >>>>> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
>>>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.13.4-851-gde18fec8da GIT Date: 2020-08-28 16:47:50 +0000 >>>>> [0]PETSC ERROR: Unknown Name on a 20200828 named sophie-Precision-5530 by sophie Tue Sep 1 10:58:44 2020 >>>>> [0]PETSC ERROR: Configure options PETSC_DIR=/home/sophie/Code/petsc PETSC_ARCH=20200828 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no --with-shared-libraries >>>>> [0]PETSC ERROR: #1 MatFindOffBlockDiagonalEntries() line 9847 in /home/sophie/Code/petsc/src/mat/interface/matrix.c >>>>> [0]PETSC ERROR: #2 PCFieldSplitSetDefaults() line 504 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c >>>>> [0]PETSC ERROR: #3 PCSetUp_FieldSplit() line 606 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c >>>>> [0]PETSC ERROR: #4 PCSetUp() line 1009 in /home/sophie/Code/petsc/src/ksp/pc/interface/precon.c >>>>> [0]PETSC ERROR: #5 KSPSetUp() line 406 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c >>>>> [0]PETSC ERROR: #6 KSPSolve_Private() line 658 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c >>>>> [0]PETSC ERROR: #7 KSPSolve() line 889 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c >>>>> [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in /home/sophie/Code/petsc/src/snes/impls/ls/ls.c >>>>> [0]PETSC ERROR: #9 SNESSolve() line 4524 in /home/sophie/Code/petsc/src/snes/interface/snes.c >>>>> [0]PETSC ERROR: #10 TSStep_ARKIMEX() line 811 in /home/sophie/Code/petsc/src/ts/impls/arkimex/arkimex.c >>>>> [0]PETSC ERROR: #11 TSStep() line 3731 in /home/sophie/Code/petsc/src/ts/interface/ts.c >>>>> [0]PETSC ERROR: #12 TSSolve() line 4128 in /home/sophie/Code/petsc/src/ts/interface/ts.c >>>>> PetscSolver::solve: TSSolve failed. >>>>> >>>>> Cheers, >>>>> >>>>> Sophie >>>>> De : Barry Smith > >>>>> Envoy? : lundi 31 ao?t 2020 14:50 >>>>> ? : Blondel, Sophie > >>>>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>>> Objet : Re: [petsc-users] Matrix Free Method questions >>>>> >>>>> >>>>> Sophie, >>>>> >>>>> Thanks. >>>>> >>>>> The factor of 4 is lot, the 1.5 not so bad. >>>>> >>>>> You will definitely want to retain the full matrix assembly codes for speed and to verify a reduced matrix version. >>>>> >>>>> It is worth trying a "reduced matrix version" with matrix-free multiply based on these numbers. This reduced matrix Jacobian will only have the diagonals and all the terms connected to the cluster sizes that move. In other words you will be building just the part of the Jacobian needed for the new preconditioner (PC subtype for Jacobi) and doing the matrix-vector product matrix free. (SOR requires all the Jacobian entries). >>>>> >>>>> Fortunately this is hopefully pretty straightforward for this code. You will not have to change the structure of the main code at all. >>>>> >>>>> Step 1) create a new "sparse matrix" that will be passed to DMDASetBlockFillsSparse(). This new "sparse matrix" needs to retain all the diagonal entries and also all the entries that are associated with the variables that diffuse. If I remember correctly these are just the smallest cluster size, plain Helium? >>>>> >>>>> Call MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); >>>>> >>>>> Then you would run the code with -snes_mf_operator and the new PC subtype for Jacobi. >>>>> >>>>> A test that the new reduced Jacobian is correct will be that you get almost the same iterations as the runs you just make using the PC subtype of Jacobi. 
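In sketch form, Step 1 amounts to telling the DMDA which couplings the preconditioner matrix should keep before DMCreateMatrix() is called. The dense-fill variant DMDASetBlockFills() is shown because it is easier to read; for thousands of components per grid point the sparse variant DMDASetBlockFillsSparse() mentioned above is the practical choice, but the idea is the same. nDOF and nDiffusing are hypothetical placeholders for the total number of components and the number that diffuse; da is the existing DMDA:

PetscInt *dfill, *ofill, i;
Mat      J;

ierr = PetscCalloc2(nDOF*nDOF,&dfill,nDOF*nDOF,&ofill);CHKERRQ(ierr);
for (i = 0; i < nDOF; ++i)       dfill[i*nDOF + i] = 1;  /* keep every diagonal entry                        */
for (i = 0; i < nDiffusing; ++i) ofill[i*nDOF + i] = 1;  /* neighbor coupling only for species that diffuse  */
ierr = DMDASetBlockFills(da,dfill,ofill);CHKERRQ(ierr);  /* must precede DMCreateMatrix()                    */
ierr = PetscFree2(dfill,ofill);CHKERRQ(ierr);
ierr = DMCreateMatrix(da,&J);CHKERRQ(ierr);              /* J now carries the reduced sparsity pattern       */
ierr = MatSetOption(J,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);CHKERRQ(ierr);
                                                         /* insertions outside the pattern are dropped       */

With this in place the existing Jacobian routine can be left alone for the first test, since entries that fall outside the reduced pattern are silently ignored; as reported elsewhere in the thread, moving that MatSetOption() call to the top of the Jacobian computation routine is what ultimately made the option stick.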
Hopefully not slower and using a great deal less memory. The iterations will not be identical because of the matrix-free multiple. >>>>> >>>>> Step 2) create a new version of the Jacobian computation routine. This routine should only compute the elements of the Jacobian needed for this reduced matrix Jacobian, so the diagonals and the diffusion/convection terms. >>>>> >>>>> Again run with with -snes_mf_operator and the new PC subtype for Jacobi and you should again get the same convergence history. >>>>> >>>>> I made two steps because it makes it easier to validate and debug to get the same results as before. The first step cheats in that it still computes the full Jacobian but ignores the entries that we don't need to store for the preconditioner. The second step is more efficient because it only computes the Jacobian entries needed for the preconditioner but it requires you going through the Jacobian code and making sure only the needed parts are computed. >>>>> >>>>> >>>>> If you have any questions please let me know. >>>>> >>>>> Barry >>>>> >>>>> >>>>> >>>>> >>>>>> On Aug 31, 2020, at 1:13 PM, Blondel, Sophie > wrote: >>>>>> >>>>>> Hi Barry, >>>>>> >>>>>> I ran the 2 cases to look at the effect of the Jacobi pre-conditionner: >>>>>> 1D with 200 grid points and 7759 DOF per grid point (for the PSI application), for 20 TS: the factor between SOR and Jacobi is ~4 (976 MatMult for SOR and 4162 MatMult for Jacobi) >>>>>> 2D with 63x63 grid points and 4124 DOF per grid point (for the NE application), for 20 TS: the factor is 1.5 (6657 for SOR, 10379 for Jacobi) >>>>>> Cheers, >>>>>> >>>>>> Sophie >>>>>> De : Barry Smith > >>>>>> Envoy? : vendredi 28 ao?t 2020 18:31 >>>>>> ? : Blondel, Sophie > >>>>>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>>>> Objet : Re: [petsc-users] Matrix Free Method questions >>>>>> >>>>>> >>>>>> >>>>>>> On Aug 28, 2020, at 4:11 PM, Blondel, Sophie > wrote: >>>>>>> >>>>>>> Thank you Jed and Barry, >>>>>>> >>>>>>> First, attached are the logs from the benchmark runs I did without (log_std.txt) and with MF method (log_mf.txt). It took me some trouble to get the -log_view to work because I'm using push and pop for the options which means that PETSc is initialized with no argument so the command line argument was not taken into account, but I guess this is for a separate discussion. >>>>>>> >>>>>>> To answer questions about the current per-conditioners: >>>>>>> I used the same pre-conditioner options as listed in my previous email when I added the -snes_mf option; I did try to remove all the PC related options at one point with the MF method but didn't see a change in runtime so I put them back in >>>>>>> this benchmark is for a 1D DMDA using 20 grid points; when running in 2D or 3D I switch the PC options to: -pc_type fieldsplit -fieldsplit_0_pc_type sor -fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type gmres -ksp_type fgmres -fieldsplit_1_pc_gamg_threshold -1 >>>>>>> I haven't tried a Jacobi PC instead of SOR, I will run a set of more realistic runs (1D and 2D) without MF but with Jacobi and report on it next week. When you say "iterations" do you mean what is given by -ksp_monitor? >>>>>> >>>>>> Yes, the number of MatMult is a good enough surrogate. >>>>>> >>>>>> So using matrix-free (which means no preconditioning) has >>>>>> >>>>>> 35846/160 >>>>>> >>>>>> ans = >>>>>> >>>>>> 224.0375 >>>>>> >>>>>> or 224 as many iterations. So even for this modest 1d problem preconditioning is doing a great deal. 
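Spelled out, the test being asked for is just two otherwise identical runs with the subtype of the first split changed, compared by iteration counts (MatMult being the surrogate mentioned above) rather than by wall time. The executable name, process count and the placeholder for the remaining options below are illustrative, not taken from this thread:

mpiexec -n 16 ./xolotl <all the usual options except -fieldsplit_0_pc_type> -fieldsplit_0_pc_type sor -ksp_monitor -log_view
mpiexec -n 16 ./xolotl <all the usual options except -fieldsplit_0_pc_type> -fieldsplit_0_pc_type jacobi -ksp_monitor -log_view

The MatMult count in the -log_view summary (or the iteration counts printed by -ksp_monitor) is the number to compare between the two runs; this is the comparison that produced the factors of roughly 4 in 1D and 1.5 in 2D quoted above.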
>>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> Cheers, >>>>>>> >>>>>>> Sophie >>>>>>> De : Barry Smith > >>>>>>> Envoy? : vendredi 28 ao?t 2020 12:12 >>>>>>> ? : Blondel, Sophie > >>>>>>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>>>>> Objet : Re: [petsc-users] Matrix Free Method questions >>>>>>> >>>>>>> [External Email] >>>>>>> >>>>>>> Sophie, >>>>>>> >>>>>>> This is exactly what i would expect. If you run with -ksp_monitor you will see the -snes_mf run takes many more iterations. >>>>>>> >>>>>>> I am puzzled that the argument -pc_type fieldsplit did not stop the run since this is under normal circumstances not a viable preconditioner with -snes_mf. Did you also remove the -pc_type fieldsplit argument? >>>>>>> >>>>>>> In order to see how one can avoid forming the entire matrix and use matrix-free to do the matrix-vector but still have an effective preconditioner let's look at what the current preconditioner options do. >>>>>>> >>>>>>>> -pc_fieldsplit_detect_coupling >>>>>>> >>>>>>> creates two sub-preconditioners, the first for all the variables and the second for those that are coupled by the matrix to variables in neighboring cells Since only the smallest cluster sizes have diffusion/advection this second set contains only the cluster size one variables. >>>>>>> >>>>>>>> -fieldsplit_0_pc_type sor >>>>>>> >>>>>>> Runs SOR on all the variables; you can think of this as running SOR on the reactions, it is a pretty good preconditioner for the reactions since the reactions are local, per cell. >>>>>>> >>>>>>>> -fieldsplit_1_pc_type redundant >>>>>>> >>>>>>> >>>>>>> This runs the default preconditioner (ILU) on just the variables that diffuse, i.e. the elliptic part. For smallish problems this is fine, for larger problems and 2d and 3d presumably you have also -redundant_pc_type gamg to use algebraic multigrid for the diffusion. This part of the matrix will always need to be formed and used in the preconditioner. It is very important since the diffusion is what brings in most of the ill-conditioning for larger problems into the linear system. Note that it only needs the matrix entries for the cluster size of 1 so it is very small compared to the entire sparse matrix. >>>>>>> >>>>>>> ---- >>>>>>> The first preconditioner SOR requires ALL the matrix entries which are almost all (except for the diffusion terms) the coupling between different size clusters within a cell. Especially each cell has its own sparse matrix of the size of total number of clusters, it is sparse but not super sparse. >>>>>>> >>>>>>> So the to significantly lower memory usage we need to remove the SOR and the storing of all the matrix entries but still have an efficient preconditioner for the "reaction" terms. >>>>>>> >>>>>>> The simplest thing would be to use Jacobi instead of SOR for the first subpreconditioner since it only requires the diagonal entries in the matrix. But Jacobi is a worse preconditioner than SOR (since it totally ignores the matrix coupling) and sometimes can be much worse. >>>>>>> >>>>>>> Before anyone writes additional code we need to know if doing something along these lines does not ruin the convergence that. >>>>>>> >>>>>>> Have you used the same options as before but with -fieldsplit_0_pc_type jacobi ? (Not using any matrix free). We need to get an idea of how many more linear iterations it requires (not time, comparing time won't be helpful for this exercise.) 
We also need this information for realistic size problems in 2 or 3 dimensions that you really want to run; for small problems this approach will work ok and give misleading information about what happens for large problems. >>>>>>> >>>>>>> I suspect the iteration counts will shot up. Can you run some cases and see how the iteration counts change? >>>>>>> >>>>>>> Based on that we can decide if we still retain "good convergence" by changing the SOR to Jacobi and then change the code to make this change efficient (basically by skipping the explicit computation of the reaction Jacobian terms and using matrix-free on the outside of the PCFIELDSPLIT.) >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> On Aug 28, 2020, at 9:49 AM, Blondel, Sophie via petsc-users > wrote: >>>>>>>> >>>>>>>> Hi everyone, >>>>>>>> >>>>>>>> I have been using PETSc for a few years with a fully implicit TS ARKIMEX method and am now exploring the matrix free method option. Here is the list of PETSc options I typically use: -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 5 -snes_force_iteration -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -fieldsplit_0_pc_type sor -ts_max_snes_failures -1 -pc_fieldsplit_detect_coupling -ts_monitor -pc_type fieldsplit -fieldsplit_1_pc_type redundant -ts_max_steps 100 >>>>>>>> >>>>>>>> I started to compare the performance of the code without changing anything of the executable and simply adding "-snes_mf", I see a reduction of memory usage as expected and a benchmark that would usually take ~5min to run now takes ~50min. Reading the documentation I saw that there are a few option to play with the matrix free method like -snes_mf_err, -snes_mf_umin, or switching to -snes_mf_type wp. I used and modified the values of each of these options separately but never saw a sizable change in runtime, is it expected? >>>>>>>> >>>>>>>> And are there other ways to make the matrix free method faster? I saw in the documentation that you can define your own per-conditioner for instance. Let me know if you need additional information about the PETSc setup in the application I use. >>>>>>>> >>>>>>>> Best, >>>>>>>> >>>>>>>> Sophie >>>>>>> >>>>>>> >>> >>> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Sep 11 17:05:54 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 11 Sep 2020 17:05:54 -0500 Subject: [petsc-users] recommended error-checking for PetscInitialize() and PetscFinalize() In-Reply-To: References: Message-ID: > On Sep 11, 2020, at 2:58 PM, Matthew Knepley wrote: > > On Fri, Sep 11, 2020 at 3:57 PM Ed Bueler > wrote: > Dear PETSc -- > > I notice in the users manual that the C examples show > > ierr = PetscInitialize(&argc,&args,(char*)0,help);if (ierr) return ierr; > > at the start of main() and > > ierr = PetscFinalize(); > return ierr; > > at the end of main(). Is this the deliberate, recommended style? > > Yes. > > My understanding of these choices is that if PetscInitialize() fails then CHKERRQ(ierr) may not do the right thing, > > Yes, failure early-on in Initialize() can predate setting up error handling. > > while if PetscFinalize() fails then that should be the result of main() (without any fiddling by CHKERRQ etc.). Is this the correct understanding? > > Yes. 
If there is a failure in PetscFinalize() then PETSc will be in some unknown state where the CHKERRQ macros will no longer work so PETSc error handling should not be called. Barry > > Thanks, > > Matt > > Thanks, > > Ed > > -- > Ed Bueler > Dept of Mathematics and Statistics > University of Alaska Fairbanks > Fairbanks, AK 99775-6660 > 306C Chapman > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From elbueler at alaska.edu Fri Sep 11 17:31:39 2020 From: elbueler at alaska.edu (Ed Bueler) Date: Fri, 11 Sep 2020 14:31:39 -0800 Subject: [petsc-users] recommended error-checking for PetscInitialize() and PetscFinalize() In-Reply-To: References: Message-ID: Thanks Matt and Barry. Clear. Ed On Fri, Sep 11, 2020 at 2:05 PM Barry Smith wrote: > > > On Sep 11, 2020, at 2:58 PM, Matthew Knepley wrote: > > On Fri, Sep 11, 2020 at 3:57 PM Ed Bueler wrote: > >> Dear PETSc -- >> >> I notice in the users manual that the C examples show >> >> ierr = PetscInitialize(&argc,&args,(char*)0,help);if (ierr) return ierr; >> >> at the start of main() and >> >> ierr = PetscFinalize(); >> return ierr; >> >> at the end of main(). Is this the deliberate, recommended style? >> > > Yes. > > >> My understanding of these choices is that if PetscInitialize() fails >> then CHKERRQ(ierr) may not do the right thing, >> > > Yes, failure early-on in Initialize() can predate setting up error > handling. > > >> while if PetscFinalize() fails then that should be the result of main() >> (without any fiddling by CHKERRQ etc.). Is this the correct understanding? >> > > Yes. > > > If there is a failure in PetscFinalize() then PETSc will be in some > unknown state where the CHKERRQ macros will no longer work so PETSc error > handling should not be called. > > Barry > > > Thanks, > > Matt > > >> Thanks, >> >> Ed >> >> -- >> Ed Bueler >> Dept of Mathematics and Statistics >> University of Alaska Fairbanks >> Fairbanks, AK 99775-6660 >> 306C Chapman >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- Ed Bueler Dept of Mathematics and Statistics University of Alaska Fairbanks Fairbanks, AK 99775-6660 306C Chapman -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexprescott at email.arizona.edu Fri Sep 11 19:08:33 2020 From: alexprescott at email.arizona.edu (Alexander B Prescott) Date: Fri, 11 Sep 2020 17:08:33 -0700 Subject: [petsc-users] [EXT]Re: Dynamic SNESVI bounds In-Reply-To: <0038262D-5336-4F68-B71F-8C783249A4BD@petsc.dev> References: <0038262D-5336-4F68-B71F-8C783249A4BD@petsc.dev> Message-ID: Hi Barry, thanks for the help, I've done as you suggested. Now, I get an error that I'm unfamiliar with that goes away if I comment out SNESSetUpdate(). This error pops up after several successful iterations, so I'm not sure what's goin on here. The full message is copied below. 
mpirun -n 1 a -snes_type vinewtonrsls -snes_monitor -snes_mf -snes_converged_reason 0 SNES Function norm 1.319957381248e+02 1 SNES Function norm 3.164228677282e+01 2 SNES Function norm 5.157408019535e+00 3 SNES Function norm 2.290604723696e-01 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Corrupt argument: https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [0]PETSC ERROR: Object already free: Parameter # 2 [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.13.2, unknown [0]PETSC ERROR: a on a arch-linux2-c-debug named alexprescott-ThinkPad-T420s by alexprescott Fri Sep 11 16:49:38 2020 [0]PETSC ERROR: Configure options [0]PETSC ERROR: #1 SNESComputeJacobian() line 2676 in /home/alexprescott/Documents/petsc/src/snes/interface/snes.c [0]PETSC ERROR: #2 SNESSolve_VINEWTONRSLS() line 386 in /home/alexprescott/Documents/petsc/src/snes/impls/vi/rs/virs.c [0]PETSC ERROR: #3 SNESSolve() line 4519 in /home/alexprescott/Documents/petsc/src/snes/interface/snes.c [0]PETSC ERROR: #4 PetscWrapperFcn() line 222 in petscshell_leq9nodes.c [0]PETSC ERROR: #5 main() line 285 in petscshell_leq9nodes.c [0]PETSC ERROR: PETSc Option Table entries: [0]PETSC ERROR: -snes_converged_reason [0]PETSC ERROR: -snes_mf [0]PETSC ERROR: -snes_monitor [0]PETSC ERROR: -snes_type vinewtonrsls [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- Best, Alexander On Thu, Sep 10, 2020 at 4:27 PM Barry Smith wrote: > *External Email* > > Yes, it should be simple to write the code to do this. > > Provide a function that calls SNESVISetVariableBounds() using your > criteria then call SNESSetUpdate() to have that function called on each > iteration of SNES to reset the bounds. > > If this will converge to what you desire I have no clue. But each step > will find a result that satisfies the current bounds you set. > > Barry > > > > > On Sep 10, 2020, at 5:15 PM, Alexander B Prescott < > alexprescott at email.arizona.edu> wrote: > > Hi there, > > I have a quick question (hopefully) that I didn't find addressed in the > documentation or user list archives. Is it possible to change the SNES > Variational Inequality bounds from one solver iteration to the next? > My goal is to update the bounds such that a specific entry in the solution > vector remains the supremum throughout the entire execution. > > Best, > Alexander > > -- > Alexander Prescott > alexprescott at email.arizona.edu > PhD Candidate, The University of Arizona > Department of Geosciences > 1040 E. 4th Street > Tucson, AZ, 85721 > > > -- Alexander Prescott alexprescott at email.arizona.edu PhD Candidate, The University of Arizona Department of Geosciences 1040 E. 4th Street Tucson, AZ, 85721 -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Sep 11 19:46:48 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 11 Sep 2020 20:46:48 -0400 Subject: [petsc-users] [EXT]Re: Dynamic SNESVI bounds In-Reply-To: References: <0038262D-5336-4F68-B71F-8C783249A4BD@petsc.dev> Message-ID: On Fri, Sep 11, 2020 at 8:09 PM Alexander B Prescott < alexprescott at email.arizona.edu> wrote: > Hi Barry, thanks for the help, I've done as you suggested. Now, I get an > error that I'm unfamiliar with that goes away if I comment out > SNESSetUpdate(). 
This error pops up after several successful iterations, so > I'm not sure what's goin on here. The full message is copied below. > It thinks snes>vec_sol has been freed. It seems like something illegal was done in Update(). Can you run under valgrind? Thanks, Matt > mpirun -n 1 a -snes_type vinewtonrsls -snes_monitor -snes_mf > -snes_converged_reason > > 0 SNES Function norm 1.319957381248e+02 > 1 SNES Function norm 3.164228677282e+01 > 2 SNES Function norm 5.157408019535e+00 > 3 SNES Function norm 2.290604723696e-01 > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Corrupt argument: > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [0]PETSC ERROR: Object already free: Parameter # 2 > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.13.2, unknown > [0]PETSC ERROR: a on a arch-linux2-c-debug named > alexprescott-ThinkPad-T420s by alexprescott Fri Sep 11 16:49:38 2020 > [0]PETSC ERROR: Configure options > [0]PETSC ERROR: #1 SNESComputeJacobian() line 2676 in > /home/alexprescott/Documents/petsc/src/snes/interface/snes.c > [0]PETSC ERROR: #2 SNESSolve_VINEWTONRSLS() line 386 in > /home/alexprescott/Documents/petsc/src/snes/impls/vi/rs/virs.c > [0]PETSC ERROR: #3 SNESSolve() line 4519 in > /home/alexprescott/Documents/petsc/src/snes/interface/snes.c > [0]PETSC ERROR: #4 PetscWrapperFcn() line 222 in petscshell_leq9nodes.c > [0]PETSC ERROR: #5 main() line 285 in petscshell_leq9nodes.c > [0]PETSC ERROR: PETSc Option Table entries: > [0]PETSC ERROR: -snes_converged_reason > [0]PETSC ERROR: -snes_mf > [0]PETSC ERROR: -snes_monitor > [0]PETSC ERROR: -snes_type vinewtonrsls > [0]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > > Best, > Alexander > > On Thu, Sep 10, 2020 at 4:27 PM Barry Smith wrote: > >> *External Email* >> >> Yes, it should be simple to write the code to do this. >> >> Provide a function that calls SNESVISetVariableBounds() using your >> criteria then call SNESSetUpdate() to have that function called on each >> iteration of SNES to reset the bounds. >> >> If this will converge to what you desire I have no clue. But each step >> will find a result that satisfies the current bounds you set. >> >> Barry >> >> >> >> >> On Sep 10, 2020, at 5:15 PM, Alexander B Prescott < >> alexprescott at email.arizona.edu> wrote: >> >> Hi there, >> >> I have a quick question (hopefully) that I didn't find addressed in the >> documentation or user list archives. Is it possible to change the SNES >> Variational Inequality bounds from one solver iteration to the next? >> My goal is to update the bounds such that a specific entry in the >> solution vector remains the supremum throughout the entire execution. >> >> Best, >> Alexander >> >> -- >> Alexander Prescott >> alexprescott at email.arizona.edu >> PhD Candidate, The University of Arizona >> Department of Geosciences >> 1040 E. 4th Street >> Tucson, AZ, 85721 >> >> >> > > -- > Alexander Prescott > alexprescott at email.arizona.edu > PhD Candidate, The University of Arizona > Department of Geosciences > 1040 E. 4th Street > Tucson, AZ, 85721 > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexprescott at email.arizona.edu Sat Sep 12 20:57:53 2020 From: alexprescott at email.arizona.edu (Alexander B Prescott) Date: Sat, 12 Sep 2020 18:57:53 -0700 Subject: [petsc-users] [EXT]Re: Dynamic SNESVI bounds In-Reply-To: References: <0038262D-5336-4F68-B71F-8C783249A4BD@petsc.dev> Message-ID: Hi Matt, I was able to find the problem after reading your note. I was calling VecDestroy() at the end of my SNESSetUpdate() function in an attempt to just destroy the local Vec variable, but apparently it does more than that. Removing the call to VecDestroy() removed the error code. Here's a code snippet of the relevant portions of the function for reference. PetscErrorCode FormBounds(SNES snes,PetscInt step) { Vec x; PetscScalar *xx; ierr = SNESGetSolution(snes,&x);CHKERRQ(ierr); ierr = VecGetArray(x,&xx);CHKERRQ(ierr); ..... do stuff that updates the VI bounds ..... ierr = VecRestoreArray(x,&xx); ierr = VecDestroy(&x);CHKERRQ(ierr); } Best, Alexander On Fri, Sep 11, 2020 at 5:47 PM Matthew Knepley wrote: > *External Email* > On Fri, Sep 11, 2020 at 8:09 PM Alexander B Prescott < > alexprescott at email.arizona.edu> wrote: > >> Hi Barry, thanks for the help, I've done as you suggested. Now, I get an >> error that I'm unfamiliar with that goes away if I comment out >> SNESSetUpdate(). This error pops up after several successful iterations, so >> I'm not sure what's goin on here. The full message is copied below. >> > > It thinks snes>vec_sol has been freed. It seems like something illegal was > done in Update(). Can you run under valgrind? > > Thanks, > > Matt > > >> mpirun -n 1 a -snes_type vinewtonrsls -snes_monitor -snes_mf >> -snes_converged_reason >> >> 0 SNES Function norm 1.319957381248e+02 >> 1 SNES Function norm 3.164228677282e+01 >> 2 SNES Function norm 5.157408019535e+00 >> 3 SNES Function norm 2.290604723696e-01 >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: Corrupt argument: >> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >> [0]PETSC ERROR: Object already free: Parameter # 2 >> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html >> for trouble shooting. 
>> [0]PETSC ERROR: Petsc Release Version 3.13.2, unknown >> [0]PETSC ERROR: a on a arch-linux2-c-debug named >> alexprescott-ThinkPad-T420s by alexprescott Fri Sep 11 16:49:38 2020 >> [0]PETSC ERROR: Configure options >> [0]PETSC ERROR: #1 SNESComputeJacobian() line 2676 in >> /home/alexprescott/Documents/petsc/src/snes/interface/snes.c >> [0]PETSC ERROR: #2 SNESSolve_VINEWTONRSLS() line 386 in >> /home/alexprescott/Documents/petsc/src/snes/impls/vi/rs/virs.c >> [0]PETSC ERROR: #3 SNESSolve() line 4519 in >> /home/alexprescott/Documents/petsc/src/snes/interface/snes.c >> [0]PETSC ERROR: #4 PetscWrapperFcn() line 222 in petscshell_leq9nodes.c >> [0]PETSC ERROR: #5 main() line 285 in petscshell_leq9nodes.c >> [0]PETSC ERROR: PETSc Option Table entries: >> [0]PETSC ERROR: -snes_converged_reason >> [0]PETSC ERROR: -snes_mf >> [0]PETSC ERROR: -snes_monitor >> [0]PETSC ERROR: -snes_type vinewtonrsls >> [0]PETSC ERROR: ----------------End of Error Message -------send entire >> error message to petsc-maint at mcs.anl.gov---------- >> >> Best, >> Alexander >> >> On Thu, Sep 10, 2020 at 4:27 PM Barry Smith wrote: >> >>> *External Email* >>> >>> Yes, it should be simple to write the code to do this. >>> >>> Provide a function that calls SNESVISetVariableBounds() using your >>> criteria then call SNESSetUpdate() to have that function called on each >>> iteration of SNES to reset the bounds. >>> >>> If this will converge to what you desire I have no clue. But each step >>> will find a result that satisfies the current bounds you set. >>> >>> Barry >>> >>> >>> >>> >>> On Sep 10, 2020, at 5:15 PM, Alexander B Prescott < >>> alexprescott at email.arizona.edu> wrote: >>> >>> Hi there, >>> >>> I have a quick question (hopefully) that I didn't find addressed in the >>> documentation or user list archives. Is it possible to change the SNES >>> Variational Inequality bounds from one solver iteration to the next? >>> My goal is to update the bounds such that a specific entry in the >>> solution vector remains the supremum throughout the entire execution. >>> >>> Best, >>> Alexander >>> >>> -- >>> Alexander Prescott >>> alexprescott at email.arizona.edu >>> PhD Candidate, The University of Arizona >>> Department of Geosciences >>> 1040 E. 4th Street >>> Tucson, AZ, 85721 >>> >>> >>> >> >> -- >> Alexander Prescott >> alexprescott at email.arizona.edu >> PhD Candidate, The University of Arizona >> Department of Geosciences >> 1040 E. 4th Street >> Tucson, AZ, 85721 >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- Alexander Prescott alexprescott at email.arizona.edu PhD Candidate, The University of Arizona Department of Geosciences 1040 E. 4th Street Tucson, AZ, 85721 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sat Sep 12 21:25:49 2020 From: bsmith at petsc.dev (Barry Smith) Date: Sat, 12 Sep 2020 21:25:49 -0500 Subject: [petsc-users] [EXT]Re: Dynamic SNESVI bounds In-Reply-To: References: <0038262D-5336-4F68-B71F-8C783249A4BD@petsc.dev> Message-ID: <911D3505-331E-4AD1-9008-F5AE05FC5066@petsc.dev> The XXXGetYYY routines generally do not increase the reference count of what they obtain, some are paired with an XXXRestoreYYY routine but the ones without closing partners generally do not need a destroy called. 
The intention is that the gotten object is just used inside the calling routine and not kept in a data structure for later use. Barry > On Sep 12, 2020, at 8:57 PM, Alexander B Prescott wrote: > > Hi Matt, I was able to find the problem after reading your note. I was calling VecDestroy() at the end of my SNESSetUpdate() function in an attempt to just destroy the local Vec variable, but apparently it does more than that. Removing the call to VecDestroy() removed the error code. Here's a code snippet of the relevant portions of the function for reference. > > PetscErrorCode FormBounds(SNES snes,PetscInt step) > { > Vec x; > PetscScalar *xx; > ierr = SNESGetSolution(snes,&x);CHKERRQ(ierr); > ierr = VecGetArray(x,&xx);CHKERRQ(ierr); > > ..... do stuff that updates the VI bounds ..... > > ierr = VecRestoreArray(x,&xx); > ierr = VecDestroy(&x);CHKERRQ(ierr); > } > > Best, > Alexander > > On Fri, Sep 11, 2020 at 5:47 PM Matthew Knepley > wrote: > External Email > > On Fri, Sep 11, 2020 at 8:09 PM Alexander B Prescott > wrote: > Hi Barry, thanks for the help, I've done as you suggested. Now, I get an error that I'm unfamiliar with that goes away if I comment out SNESSetUpdate(). This error pops up after several successful iterations, so I'm not sure what's goin on here. The full message is copied below. > > It thinks snes>vec_sol has been freed. It seems like something illegal was done in Update(). Can you run under valgrind? > > Thanks, > > Matt > > mpirun -n 1 a -snes_type vinewtonrsls -snes_monitor -snes_mf -snes_converged_reason > > 0 SNES Function norm 1.319957381248e+02 > 1 SNES Function norm 3.164228677282e+01 > 2 SNES Function norm 5.157408019535e+00 > 3 SNES Function norm 2.290604723696e-01 > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Corrupt argument: https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [0]PETSC ERROR: Object already free: Parameter # 2 > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.13.2, unknown > [0]PETSC ERROR: a on a arch-linux2-c-debug named alexprescott-ThinkPad-T420s by alexprescott Fri Sep 11 16:49:38 2020 > [0]PETSC ERROR: Configure options > [0]PETSC ERROR: #1 SNESComputeJacobian() line 2676 in /home/alexprescott/Documents/petsc/src/snes/interface/snes.c > [0]PETSC ERROR: #2 SNESSolve_VINEWTONRSLS() line 386 in /home/alexprescott/Documents/petsc/src/snes/impls/vi/rs/virs.c > [0]PETSC ERROR: #3 SNESSolve() line 4519 in /home/alexprescott/Documents/petsc/src/snes/interface/snes.c > [0]PETSC ERROR: #4 PetscWrapperFcn() line 222 in petscshell_leq9nodes.c > [0]PETSC ERROR: #5 main() line 285 in petscshell_leq9nodes.c > [0]PETSC ERROR: PETSc Option Table entries: > [0]PETSC ERROR: -snes_converged_reason > [0]PETSC ERROR: -snes_mf > [0]PETSC ERROR: -snes_monitor > [0]PETSC ERROR: -snes_type vinewtonrsls > [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- > > Best, > Alexander > > On Thu, Sep 10, 2020 at 4:27 PM Barry Smith > wrote: > External Email > > > Yes, it should be simple to write the code to do this. > > Provide a function that calls SNESVISetVariableBounds() using your criteria then call SNESSetUpdate() to have that function called on each iteration of SNES to reset the bounds. > > If this will converge to what you desire I have no clue. 
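Pulling the pieces of this exchange together, a sketch of a bound-updating callback that avoids the destroy problem could look like the following. The AppCtx layout and the registration lines are illustrative assumptions rather than code from this thread, and the actual bound update is application specific:

typedef struct { Vec xl, xu; } AppCtx;               /* hypothetical application context         */

PetscErrorCode FormBounds(SNES snes, PetscInt step)
{
  AppCtx            *user;
  Vec                x;
  const PetscScalar *xx;
  PetscErrorCode     ierr;

  PetscFunctionBeginUser;
  ierr = SNESGetApplicationContext(snes,&user);CHKERRQ(ierr);
  ierr = SNESGetSolution(snes,&x);CHKERRQ(ierr);     /* borrowed reference: no VecDestroy() here */
  ierr = VecGetArrayRead(x,&xx);CHKERRQ(ierr);
  /* ... inspect xx and fill user->xl and user->xu with the new bounds ... */
  ierr = VecRestoreArrayRead(x,&xx);CHKERRQ(ierr);
  ierr = SNESVISetVariableBounds(snes,user->xl,user->xu);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* registered once, before SNESSolve(): */
ierr = SNESSetApplicationContext(snes,&user);CHKERRQ(ierr);
ierr = SNESSetUpdate(snes,FormBounds);CHKERRQ(ierr);

The two points that matter are that SNESGetSolution() hands back a vector still owned by the SNES, so it must not be destroyed, and that the bounds vectors belong to the application, so they can be rewritten freely before each SNESVISetVariableBounds() call.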
But each step will find a result that satisfies the current bounds you set. > > Barry > > > > >> On Sep 10, 2020, at 5:15 PM, Alexander B Prescott > wrote: >> >> Hi there, >> >> I have a quick question (hopefully) that I didn't find addressed in the documentation or user list archives. Is it possible to change the SNES Variational Inequality bounds from one solver iteration to the next? >> My goal is to update the bounds such that a specific entry in the solution vector remains the supremum throughout the entire execution. >> >> Best, >> Alexander >> >> -- >> Alexander Prescott >> alexprescott at email.arizona.edu >> PhD Candidate, The University of Arizona >> Department of Geosciences >> 1040 E. 4th Street >> Tucson, AZ, 85721 > > > > -- > Alexander Prescott > alexprescott at email.arizona.edu > PhD Candidate, The University of Arizona > Department of Geosciences > 1040 E. 4th Street > Tucson, AZ, 85721 > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > Alexander Prescott > alexprescott at email.arizona.edu > PhD Candidate, The University of Arizona > Department of Geosciences > 1040 E. 4th Street > Tucson, AZ, 85721 -------------- next part -------------- An HTML attachment was scrubbed... URL: From teivml at gmail.com Mon Sep 14 07:26:08 2020 From: teivml at gmail.com (teivml at gmail.com) Date: Mon, 14 Sep 2020 21:26:08 +0900 Subject: [petsc-users] What is the right way to use split mode asynchronous reduction? Message-ID: Dear Petsc users, I would like to confirm that the asynchronous calculation of the vector norm is faster than the synchronous calculation with the following code. PetscLogDouble tt1,tt2; ierr = VecSet(c,one); ierr = VecSet(u,one); ierr = VecSet(b,one); ierr = KSPCreate(PETSC_COMM_WORLD,&ksp); CHKERRQ(ierr); ierr = KSP_MatMult(ksp,A,x,Ax); CHKERRQ(ierr); ierr = PetscTime(&tt1);CHKERRQ(ierr); ierr = VecNormBegin(u,NORM_2,&norm1); ierr = PetscCommSplitReductionBegin(PetscObjectComm((PetscObject)Ax)); ierr = KSP_MatMult(ksp,A,c,Ac); ierr = VecNormEnd(u,NORM_2,&norm1); ierr = PetscTime(&tt2);CHKERRQ(ierr); ierr = PetscPrintf(PETSC_COMM_WORLD, "The time used for the asynchronous calculation: %f\n",tt2-tt1); CHKERRQ(ierr); ierr = PetscPrintf(PETSC_COMM_WORLD,"+ |u| = %g\n",(double) norm1); CHKERRQ(ierr); ierr = PetscTime(&tt1);CHKERRQ(ierr); ierr = VecNorm(b,NORM_2,&norm2); CHKERRQ(ierr); ierr = KSP_MatMult(ksp,A,c,Ac); ierr = PetscTime(&tt2);CHKERRQ(ierr); ierr = PetscPrintf(PETSC_COMM_WORLD, "The time used for the synchronous calculation: %f\n",tt2-tt1); CHKERRQ(ierr); ierr = PetscPrintf(PETSC_COMM_WORLD,"+ |b| = %g\n",(double) norm2); CHKERRQ(ierr); This code computes a matrix-vector product and a vector norm asynchronously and synchronously. The calculation is carried out on a single node PC with a Xeon CPU. The result of the code above shows that the synchronous calculation is faster than the asynchronous calculation. The MPI library is MPICH 3.3 and the parallel number is n = 20. The time used for the asynchronous calculation: 0.001622 + |u| = 100. The time used for the synchronous calculation: 0.000062 + |b| = 100. Is there anything I should consider in order to properly take advantage of the Petsc's asynchronous progress? Thank you for any help you can provide. Sincerely, Teiv. -------------- next part -------------- An HTML attachment was scrubbed... 
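For what it is worth, the call ordering in the snippet above is the intended split-reduction pattern; what usually hides any benefit is the measurement itself. Timing a single pass mostly captures the one-time setup cost of the split reduction plus timer resolution, so a fairer sketch repeats the overlapped section many times (nrep, and the reuse of the vectors and matrix from above, are assumptions for illustration):

PetscLogDouble t0, t1;
PetscInt       rep, nrep = 1000;
PetscReal      nrm;

MPI_Barrier(PETSC_COMM_WORLD);                        /* start all ranks together             */
ierr = PetscTime(&t0);CHKERRQ(ierr);
for (rep = 0; rep < nrep; ++rep) {
  ierr = VecNormBegin(u,NORM_2,&nrm);CHKERRQ(ierr);   /* posts the reduction                  */
  ierr = PetscCommSplitReductionBegin(PetscObjectComm((PetscObject)u));CHKERRQ(ierr);
  ierr = MatMult(A,c,Ac);CHKERRQ(ierr);               /* independent work that can overlap it */
  ierr = VecNormEnd(u,NORM_2,&nrm);CHKERRQ(ierr);     /* completes the reduction              */
}
ierr = PetscTime(&t1);CHKERRQ(ierr);
ierr = PetscPrintf(PETSC_COMM_WORLD,"overlapped: %g s per pass, |u| = %g\n",(double)((t1 - t0)/nrep),(double)nrm);CHKERRQ(ierr);

Even measured this way, as the reply below explains, a single shared-memory node is unlikely to show a win: the split form pays off when the latency of the allreduce across many nodes is comparable to the work being overlapped, and it also depends on the MPI implementation making progress on the non-blocking reduction in the background (with MPICH this may require enabling its asynchronous progress support).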
URL: From knepley at gmail.com Mon Sep 14 07:39:45 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 14 Sep 2020 08:39:45 -0400 Subject: [petsc-users] What is the right way to use split mode asynchronous reduction? In-Reply-To: References: Message-ID: On Mon, Sep 14, 2020 at 8:27 AM wrote: > Dear Petsc users, > > I would like to confirm that the asynchronous calculation of the vector > norm is faster than the synchronous calculation with the following code. > > > PetscLogDouble tt1,tt2; > ierr = VecSet(c,one); > ierr = VecSet(u,one); > ierr = VecSet(b,one); > > ierr = KSPCreate(PETSC_COMM_WORLD,&ksp); CHKERRQ(ierr); > ierr = KSP_MatMult(ksp,A,x,Ax); CHKERRQ(ierr); > > > ierr = PetscTime(&tt1);CHKERRQ(ierr); > > ierr = VecNormBegin(u,NORM_2,&norm1); > ierr = PetscCommSplitReductionBegin(PetscObjectComm((PetscObject)Ax)); > ierr = KSP_MatMult(ksp,A,c,Ac); > ierr = VecNormEnd(u,NORM_2,&norm1); > > > ierr = PetscTime(&tt2);CHKERRQ(ierr); > ierr = PetscPrintf(PETSC_COMM_WORLD, "The time used for the asynchronous > calculation: %f\n",tt2-tt1); CHKERRQ(ierr); > ierr = PetscPrintf(PETSC_COMM_WORLD,"+ |u| = %g\n",(double) norm1); > CHKERRQ(ierr); > > > ierr = PetscTime(&tt1);CHKERRQ(ierr); > ierr = VecNorm(b,NORM_2,&norm2); CHKERRQ(ierr); > ierr = KSP_MatMult(ksp,A,c,Ac); > > > ierr = PetscTime(&tt2);CHKERRQ(ierr); > ierr = PetscPrintf(PETSC_COMM_WORLD, "The time used for the synchronous > calculation: %f\n",tt2-tt1); CHKERRQ(ierr); > ierr = PetscPrintf(PETSC_COMM_WORLD,"+ |b| = %g\n",(double) norm2); > CHKERRQ(ierr); > > > This code computes a matrix-vector product and a vector norm > asynchronously and synchronously. > > The calculation is carried out on a single node PC with a Xeon CPU. > The result of the code above shows that the synchronous calculation is > faster than the asynchronous calculation. The MPI library is MPICH 3.3 and > the parallel number is n = 20. > > The time used for the asynchronous calculation: 0.001622 > + |u| = 100. > The time used for the synchronous calculation: 0.000062 > + |b| = 100. > > > Is there anything I should consider in order to properly take advantage of > the Petsc's asynchronous progress? > There is overhead in the asynchronous calculation. In order to see improvement, you would have to be running an example for which communication time was larger (hopefully significantly) than this overhead. Second, if the computation is perfectly load balanced, this also makes it harder to see improvement for reducing synchronizations. A single node is unlikely to benefit from any of this stuff. Thanks, Matt > Thank you for any help you can provide. > Sincerely, > Teiv. > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeremy at seamplex.com Mon Sep 14 17:14:20 2020 From: jeremy at seamplex.com (Jeremy Theler) Date: Mon, 14 Sep 2020 19:14:20 -0300 Subject: [petsc-users] Finding which cell an arbitrary point belongs to in DMPlex Message-ID: Hello all Say I have a fully-interpolated 3D DMPlex and a point with arbitrary coordinates x,y,z. What's the most efficient way to know which cell this point belongs to in parallel? Cells can be either tets or hexes. 
Regards -- jeremy theler www.seamplex.com From knepley at gmail.com Mon Sep 14 19:28:40 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 14 Sep 2020 20:28:40 -0400 Subject: [petsc-users] Finding which cell an arbitrary point belongs to in DMPlex In-Reply-To: References: Message-ID: On Mon, Sep 14, 2020 at 6:15 PM Jeremy Theler wrote: > Hello all > > Say I have a fully-interpolated 3D DMPlex and a point with arbitrary > coordinates x,y,z. What's the most efficient way to know which cell > this point belongs to in parallel? Cells can be either tets or hexes. > I should make a tutorial on this, but have not had time so far. The intention is that you use https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DM/DMLocatePoints.html This will just brute force search unless you also give -dm_plex_hash_location which builds a grid hash to accelerate it. I should probably expose DMPlexLocatePoint_Internal() which handles the single cell queries. If you just had one point, that might make it simpler, although you would still write your own loop. If your intention is to interpolate a field at these locations, I created https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/DMInterpolationCreate.html which no one but me uses so far, but I think it is convenient. Thanks, Matt > Regards > -- > jeremy theler > www.seamplex.com > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sblondel at utk.edu Tue Sep 15 17:12:20 2020 From: sblondel at utk.edu (Blondel, Sophie) Date: Tue, 15 Sep 2020 22:12:20 +0000 Subject: [petsc-users] Matrix Free Method questions In-Reply-To: <6DEA0A2F-3020-4C3D-8726-7BE6346B86BB@petsc.dev> References: <5BDE8465-76BE-4132-BF4E-6784548AADC0@petsc.dev> <3329269A-EB37-41C9-9698-BA4631A1E18A@petsc.dev> <3E68F0AF-2F7D-4394-894A-3099EC80B9BC@petsc.dev> <600E6AA4-9534-4B39-B7E0-0218AB02E19A@petsc.dev> <60260FA5-BDAE-4F18-8310-D0F3C03B318D@petsc.dev> <4BA6D58A-89C3-44E2-AF34-F4AE94211DC4@petsc.dev> , <6DEA0A2F-3020-4C3D-8726-7BE6346B86BB@petsc.dev> Message-ID: Hi Barry, I fixed everything and re-ran the 4 cases in 1D. They took more time than before because I used the Kokkos serial backend on the Xolotl side instead of the CUDA one previously (long story short, I tried to update CUDA and messed up the whole installation). Step 4 looks much better than prevously, I was even able to remove MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE) from the code and it ran without throwing errors. The log files are attached. Cheers, Sophie ________________________________ From: Barry Smith Sent: Friday, September 11, 2020 18:03 To: Blondel, Sophie Cc: petsc-users at mcs.anl.gov ; xolotl-psi-development at lists.sourceforge.net Subject: Re: [petsc-users] Matrix Free Method questions On Sep 11, 2020, at 7:45 AM, Blondel, Sophie > wrote: Thank you Barry, Step 3 worked after I moved MatSetOption at the beginning of computeJacobian(). Attached is the updated log which is pretty similar to what I had before. Step 4 still uses many more iterations. I checked the Jacobian using -ksp_view_pmat ascii (on a simpler case), I can see the difference between step 3 and 4 is that the contribution from the reactions is not included in the step 4 Jacobian (as expected from the fact that I removed their setting from the code). 
Looking back at one of your previous email, you wrote "This routine should only compute the elements of the Jacobian needed for this reduced matrix Jacobian, so the diagonals and the diffusion/convection terms. ", does it mean that I should still include the contributions from the reactions that affect the pure diagonal terms? Yes, you need to leave in everything that affects the diagonal otherwise the "Jacobi" preconditioner will not reflect the true Jacobi preconditioner and likely perform poorly. Barry Cheers, Sophie ________________________________ From: Barry Smith > Sent: Thursday, September 10, 2020 17:04 To: Blondel, Sophie > Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Subject: Re: [petsc-users] Matrix Free Method questions On Sep 10, 2020, at 2:46 PM, Blondel, Sophie > wrote: Hi Barry, Going through the different changes again to understand what was going wrong with the last step, I discovered that my changes from 2 to 3 (keeping only the pure diagonal for the reaction Jacobian setup and adding MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) were wrong: the sparsity of the matrix was correct but then the RHSJacobian method was wrong. I updated it I'm not sure what you mean here. My hope was that in step 3 you won't need to change RHSJacobian at all (that is just for step 4). but now when I run step 3 again I get the following error: [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [2]PETSC ERROR: Argument out of range [2]PETSC ERROR: Inserting a new nonzero at global row/column (310400, 316825) into matrix [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [2]PETSC ERROR: Petsc Development GIT revision: v3.13.4-885-gf58a62b032 GIT Date: 2020-09-01 13:07:58 -0500 [2]PETSC ERROR: Unknown Name on a 20200902 named iguazu by bqo Thu Sep 10 15:38:58 2020 [2]PETSC ERROR: Configure options PETSC_DIR=/home2/bqo/libraries/petsc-barry PETSC_ARCH=20200902 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no --with-shared-libraries --download-fblaslapack=1 [2]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 606 in /home2/bqo/libraries/petsc-barry/src/mat/impls/aij/mpi/mpiaij.c [2]PETSC ERROR: #2 MatSetValues() line 1392 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c [2]PETSC ERROR: #3 MatSetValuesLocal() line 2207 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c [2]PETSC ERROR: #4 MatSetValuesStencil() line 1595 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c PetscSolverExpHandler::computeJacobian: MatSetValuesStencil (reactions) failed. Because the RHSJacobian method is trying to update the elements corresponding to the reactions. I'm not sure I understood correctly what step 3 was supposed to be. In step the three the RHSJacobian was suppose to be unchanged, only the option to ignore the "unneeded" Jacobian entries inside MatSetValues (set with MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) was needed (plus changing the DMDASetBlockFillsXXX argument). The error message Inserting a new nonzero at global row/column (310400, 316825) into matrix indicates that somehow the MatOption MAT_NEW_NONZERO_LOCATION_ERR is in control instead of the option MAT_NEW_NONZERO_LOCATIONS, when it is setting values the Jacobian values. 
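For what it's worth, a minimal sketch of forcing that option from inside the user Jacobian routine, so it takes effect regardless of what DMCreateMatrix() set on the matrix earlier; the routine name, the TS RHSJacobian signature, and the extra call that turns the error flag off are assumptions for illustration, not the actual Xolotl code.

PetscErrorCode computeJacobian(TS ts, PetscReal t, Vec x, Mat J, Mat Jpre, void *ctx)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  /* ask the preconditioner matrix to silently drop any entry that falls outside
     the preallocated (reduced) pattern, instead of erroring or allocating */
  ierr = MatSetOption(Jpre, MAT_NEW_NONZERO_LOCATION_ERR, PETSC_FALSE);CHKERRQ(ierr);
  ierr = MatSetOption(Jpre, MAT_NEW_NONZERO_LOCATIONS, PETSC_FALSE);CHKERRQ(ierr);

  /* ... fill the diagonal and diffusion/convection entries with
     MatSetValuesStencil() exactly as before ... */

  ierr = MatAssemblyBegin(Jpre, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(Jpre, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}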
The MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS_ERR,PETSC_TRUE);) is normally called inside the DMCreateMatrix() so I am not sure how they could be getting called in the wrong order but it seems somehow it is When do you call MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) in the code? You can call it at the beginning of computeJacobian(). If this still doesn't work and you get the same error you can run in the debugger on one process and put a breakpoint for MatSetOptions() to found out how the MAT_NEW_NONZERO_LOCATIONS_ERR comes in late to upset the apple cart. You should see MatSetOption() called at least twice and the last one should have the MAT_NEW_NONZERO_LOCATION flag. Barry Cheers, Sophie ________________________________ From: Barry Smith > Sent: Friday, September 4, 2020 01:06 To: Blondel, Sophie > Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Subject: Re: [petsc-users] Matrix Free Method questions Sophie, Thanks. I have started looking through the logs The change to matrix-free multiple (from 1 to 2) which reduces the accuracy of the multiply to about half the digits is not surprising. * It roughly doubles the time since doing the matrix-free product requires a function evaluation * It increases the iteration count, but not significantly since the reduced precision of the multiple induces some additional linear iterations The change from 2 to 3 (not storing the entire matrix) * number of nonzeros goes from 49459966 to 1558766 = 3.15 percent so it succeds in not storing the unneeded part of the matrix * the number of MatMult_MF goes from 2331 to 2418. I don't understand this, I expected it to be identical because it should be using the same preconditioner in 3 as in 2 and thus get the same convergence. Could possibility be due to the variability in convergence due to different runs with the matrix-free preconditioner preconditioner and not related to not storing the entire matrix. * the KSPSolve() time goes from 3.8774e+0 to 3.7855e+02 a trivial difference which is what I would expect * the SNESSolve time goes from 5.0047e+02 to 4.3275e+02 about a 14 percent drop which is reasonable because 3 doesn't spend as much time inserting matrix values (it still computes them but doesn't insert the ones we don't want for the preconditioner). The change from 3 to 4 * something goes seriously wrong here. The total number of linear solve iterations goes from 2282 to 97403 so something has gone seriously wrong with the preconditioner, but since the preconditioner operations are the same it seems something has gone wrong with the new reduced preconditioner. I think there is an error in computing the reduced matrix entries, that is the new compute Jacobian code is not computing the entries it needs to correctly. To debug this you can run case 3 and case 4 for a single time step with -ksp_view_pmat binary This should create a binary file with the initial Jacobian matrices in each. You can use Matlab or Python to do the difference in the matrices and see how possibly the new Jacobian computation code is not producing the correct values in some locations. Good luck, Barry On Sep 3, 2020, at 12:26 PM, Blondel, Sophie > wrote: Hi Barry, Attached are the log files for the 1D case, for each of the 4 steps. I don't know how I did it yesterday but the differences between steps look better today, except for step 4 that takes many more iterations and smaller time steps. Cheers, Sophie ________________________________ De : Barry Smith > Envoy? 
: mercredi 2 septembre 2020 15:53 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions On Sep 2, 2020, at 1:44 PM, Blondel, Sophie > wrote: Thank you Barry, The code ran with your branch but it's much slower than running with the full Jacobian and Jacobi PC subtype (around 10 times slower). It is using less memory as expected. I tried step 2 as well and it's even slower. Sophie, That is puzzling. It should be using the same matrix in the solver so should be the same speed and the setup time should be a bit better since it does not form the full Jacobian. (We'll get to this later) The TS iteration for step 1 are the same as with full Jacobian. Let me know what I can look at to check if I've done something wrong. We need to see if the KSP iterations are pretty similar for four approaches (1) original code with Jacobi PC subtype (2) matrix free with Jacobi PC (just add -snes_mf_operator to case 1) (3) the new code with the MatSetOption() to not store the entire Jacobian also with the -snes_mf_operator and (4) the new code that doesn't compute the unneeded part of the Jacobian also with the -snes_mf_operator You could run each case with same 20 timesteps and -ts_monitor -ksp_monitor and -ts_view and send the four output files around. Once we are sure the four cases are behaving as expected then you can get timings for them but let's not do that until we confirm the similar results. There could easily be a flaw in my reasoning or the PETSc code somewhere that affects the correctness so its best to check that first. Barry Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : mardi 1 septembre 2020 14:12 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions Sophie, Sorry, looks like an old bug in PETSc that was undetected due to lack of use. The code is trying to use the first of the two matrices to determine the preconditioner which won't work in your case since it is matrix-free. It should be using the second matrix. I hope the branch barry/2020-09-01/fix-fieldsplit-mf resolves this issue for you. Barry On Sep 1, 2020, at 12:45 PM, Blondel, Sophie > wrote: Hi Barry, I'm working through step 1) but I think I am doing something wrong. I'm using DMDASetBlockFillsSparse to set the non-zeros only for the diffusing clusters (small He clusters here, from size 1 to 7) and all the diagonal entries. Then I added a few lines in the code: Mat mat; DMCreateMatrix(da, &mat); MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); When I try to run with the following options: -snes_mf_operator -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 2 -snes_force_iteration -pc_fieldsplit_detect_coupling -pc_type fieldsplit -fieldsplit_0_pc_type jacobi -fieldsplit_1_pc_type redundant -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -ts_max_snes_failures -1 -ts_monitor -ts_max_steps 20 I get an error: [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: No support for this operation for this object type [0]PETSC ERROR: Matrix type mffd does not have a find off block diagonal entries defined [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[0]PETSC ERROR: Petsc Development GIT revision: v3.13.4-851-gde18fec8da GIT Date: 2020-08-28 16:47:50 +0000 [0]PETSC ERROR: Unknown Name on a 20200828 named sophie-Precision-5530 by sophie Tue Sep 1 10:58:44 2020 [0]PETSC ERROR: Configure options PETSC_DIR=/home/sophie/Code/petsc PETSC_ARCH=20200828 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no --with-shared-libraries [0]PETSC ERROR: #1 MatFindOffBlockDiagonalEntries() line 9847 in /home/sophie/Code/petsc/src/mat/interface/matrix.c [0]PETSC ERROR: #2 PCFieldSplitSetDefaults() line 504 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c [0]PETSC ERROR: #3 PCSetUp_FieldSplit() line 606 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c [0]PETSC ERROR: #4 PCSetUp() line 1009 in /home/sophie/Code/petsc/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #5 KSPSetUp() line 406 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #6 KSPSolve_Private() line 658 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #7 KSPSolve() line 889 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in /home/sophie/Code/petsc/src/snes/impls/ls/ls.c [0]PETSC ERROR: #9 SNESSolve() line 4524 in /home/sophie/Code/petsc/src/snes/interface/snes.c [0]PETSC ERROR: #10 TSStep_ARKIMEX() line 811 in /home/sophie/Code/petsc/src/ts/impls/arkimex/arkimex.c [0]PETSC ERROR: #11 TSStep() line 3731 in /home/sophie/Code/petsc/src/ts/interface/ts.c [0]PETSC ERROR: #12 TSSolve() line 4128 in /home/sophie/Code/petsc/src/ts/interface/ts.c PetscSolver::solve: TSSolve failed. Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : lundi 31 ao?t 2020 14:50 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions Sophie, Thanks. The factor of 4 is lot, the 1.5 not so bad. You will definitely want to retain the full matrix assembly codes for speed and to verify a reduced matrix version. It is worth trying a "reduced matrix version" with matrix-free multiply based on these numbers. This reduced matrix Jacobian will only have the diagonals and all the terms connected to the cluster sizes that move. In other words you will be building just the part of the Jacobian needed for the new preconditioner (PC subtype for Jacobi) and doing the matrix-vector product matrix free. (SOR requires all the Jacobian entries). Fortunately this is hopefully pretty straightforward for this code. You will not have to change the structure of the main code at all. Step 1) create a new "sparse matrix" that will be passed to DMDASetBlockFillsSparse(). This new "sparse matrix" needs to retain all the diagonal entries and also all the entries that are associated with the variables that diffuse. If I remember correctly these are just the smallest cluster size, plain Helium? Call MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); Then you would run the code with -snes_mf_operator and the new PC subtype for Jacobi. A test that the new reduced Jacobian is correct will be that you get almost the same iterations as the runs you just make using the PC subtype of Jacobi. Hopefully not slower and using a great deal less memory. The iterations will not be identical because of the matrix-free multiple. Step 2) create a new version of the Jacobian computation routine. 
This routine should only compute the elements of the Jacobian needed for this reduced matrix Jacobian, so the diagonals and the diffusion/convection terms. Again run with with -snes_mf_operator and the new PC subtype for Jacobi and you should again get the same convergence history. I made two steps because it makes it easier to validate and debug to get the same results as before. The first step cheats in that it still computes the full Jacobian but ignores the entries that we don't need to store for the preconditioner. The second step is more efficient because it only computes the Jacobian entries needed for the preconditioner but it requires you going through the Jacobian code and making sure only the needed parts are computed. If you have any questions please let me know. Barry On Aug 31, 2020, at 1:13 PM, Blondel, Sophie > wrote: Hi Barry, I ran the 2 cases to look at the effect of the Jacobi pre-conditionner: * 1D with 200 grid points and 7759 DOF per grid point (for the PSI application), for 20 TS: the factor between SOR and Jacobi is ~4 (976 MatMult for SOR and 4162 MatMult for Jacobi) * 2D with 63x63 grid points and 4124 DOF per grid point (for the NE application), for 20 TS: the factor is 1.5 (6657 for SOR, 10379 for Jacobi) Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : vendredi 28 ao?t 2020 18:31 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions On Aug 28, 2020, at 4:11 PM, Blondel, Sophie > wrote: Thank you Jed and Barry, First, attached are the logs from the benchmark runs I did without (log_std.txt) and with MF method (log_mf.txt). It took me some trouble to get the -log_view to work because I'm using push and pop for the options which means that PETSc is initialized with no argument so the command line argument was not taken into account, but I guess this is for a separate discussion. To answer questions about the current per-conditioners: * I used the same pre-conditioner options as listed in my previous email when I added the -snes_mf option; I did try to remove all the PC related options at one point with the MF method but didn't see a change in runtime so I put them back in * this benchmark is for a 1D DMDA using 20 grid points; when running in 2D or 3D I switch the PC options to: -pc_type fieldsplit -fieldsplit_0_pc_type sor -fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type gmres -ksp_type fgmres -fieldsplit_1_pc_gamg_threshold -1 I haven't tried a Jacobi PC instead of SOR, I will run a set of more realistic runs (1D and 2D) without MF but with Jacobi and report on it next week. When you say "iterations" do you mean what is given by -ksp_monitor? Yes, the number of MatMult is a good enough surrogate. So using matrix-free (which means no preconditioning) has 35846/160 ans = 224.0375 or 224 as many iterations. So even for this modest 1d problem preconditioning is doing a great deal. Barry Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : vendredi 28 ao?t 2020 12:12 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions [External Email] Sophie, This is exactly what i would expect. If you run with -ksp_monitor you will see the -snes_mf run takes many more iterations. 
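A hedged sketch of the kind of run being compared throughout this thread: the existing fieldsplit options with SOR swapped for Jacobi on the first split, and the matrix-free operator turned on via -snes_mf_operator (not the plain -snes_mf used above, which would leave no assembled matrix for the fieldsplit preconditioner). The executable name is a placeholder; the options themselves are the ones already in use in this thread.

./xolotl -snes_mf_operator \
         -pc_type fieldsplit -pc_fieldsplit_detect_coupling \
         -fieldsplit_0_pc_type jacobi -fieldsplit_1_pc_type redundant \
         -ts_monitor -ksp_monitor -ts_view -ts_max_steps 20

The MatMult counts from -log_view (or the iteration counts from -ksp_monitor) are then directly comparable across the variants, which is the surrogate used in the analysis above.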
I am puzzled that the argument -pc_type fieldsplit did not stop the run since this is under normal circumstances not a viable preconditioner with -snes_mf. Did you also remove the -pc_type fieldsplit argument? In order to see how one can avoid forming the entire matrix and use matrix-free to do the matrix-vector but still have an effective preconditioner let's look at what the current preconditioner options do. -pc_fieldsplit_detect_coupling creates two sub-preconditioners, the first for all the variables and the second for those that are coupled by the matrix to variables in neighboring cells Since only the smallest cluster sizes have diffusion/advection this second set contains only the cluster size one variables. -fieldsplit_0_pc_type sor Runs SOR on all the variables; you can think of this as running SOR on the reactions, it is a pretty good preconditioner for the reactions since the reactions are local, per cell. -fieldsplit_1_pc_type redundant This runs the default preconditioner (ILU) on just the variables that diffuse, i.e. the elliptic part. For smallish problems this is fine, for larger problems and 2d and 3d presumably you have also -redundant_pc_type gamg to use algebraic multigrid for the diffusion. This part of the matrix will always need to be formed and used in the preconditioner. It is very important since the diffusion is what brings in most of the ill-conditioning for larger problems into the linear system. Note that it only needs the matrix entries for the cluster size of 1 so it is very small compared to the entire sparse matrix. ---- The first preconditioner SOR requires ALL the matrix entries which are almost all (except for the diffusion terms) the coupling between different size clusters within a cell. Especially each cell has its own sparse matrix of the size of total number of clusters, it is sparse but not super sparse. So the to significantly lower memory usage we need to remove the SOR and the storing of all the matrix entries but still have an efficient preconditioner for the "reaction" terms. The simplest thing would be to use Jacobi instead of SOR for the first subpreconditioner since it only requires the diagonal entries in the matrix. But Jacobi is a worse preconditioner than SOR (since it totally ignores the matrix coupling) and sometimes can be much worse. Before anyone writes additional code we need to know if doing something along these lines does not ruin the convergence that. Have you used the same options as before but with -fieldsplit_0_pc_type jacobi ? (Not using any matrix free). We need to get an idea of how many more linear iterations it requires (not time, comparing time won't be helpful for this exercise.) We also need this information for realistic size problems in 2 or 3 dimensions that you really want to run; for small problems this approach will work ok and give misleading information about what happens for large problems. I suspect the iteration counts will shot up. Can you run some cases and see how the iteration counts change? Based on that we can decide if we still retain "good convergence" by changing the SOR to Jacobi and then change the code to make this change efficient (basically by skipping the explicit computation of the reaction Jacobian terms and using matrix-free on the outside of the PCFIELDSPLIT.) Barry On Aug 28, 2020, at 9:49 AM, Blondel, Sophie via petsc-users > wrote: Hi everyone, I have been using PETSc for a few years with a fully implicit TS ARKIMEX method and am now exploring the matrix free method option. 
Here is the list of PETSc options I typically use: -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 5 -snes_force_iteration -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -fieldsplit_0_pc_type sor -ts_max_snes_failures -1 -pc_fieldsplit_detect_coupling -ts_monitor -pc_type fieldsplit -fieldsplit_1_pc_type redundant -ts_max_steps 100 I started to compare the performance of the code without changing anything of the executable and simply adding "-snes_mf", I see a reduction of memory usage as expected and a benchmark that would usually take ~5min to run now takes ~50min. Reading the documentation I saw that there are a few option to play with the matrix free method like -snes_mf_err, -snes_mf_umin, or switching to -snes_mf_type wp. I used and modified the values of each of these options separately but never saw a sizable change in runtime, is it expected? And are there other ways to make the matrix free method faster? I saw in the documentation that you can define your own per-conditioner for instance. Let me know if you need additional information about the PETSc setup in the application I use. Best, Sophie -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log_1D_1.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log_1D_2.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log_1D_3.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log_1D_4.txt URL: From jeremy at seamplex.com Tue Sep 15 17:17:57 2020 From: jeremy at seamplex.com (Jeremy Theler) Date: Tue, 15 Sep 2020 19:17:57 -0300 Subject: [petsc-users] Finding which cell an arbitrary point belongs to in DMPlex In-Reply-To: References: Message-ID: <70fd5e6fb338d6fc4b2471ec17d7dc7ddaaa9599.camel@seamplex.com> On Mon, 2020-09-14 at 20:28 -0400, Matthew Knepley wrote: > On Mon, Sep 14, 2020 at 6:15 PM Jeremy Theler > wrote: > > Hello all > > > > Say I have a fully-interpolated 3D DMPlex and a point with > > arbitrary > > coordinates x,y,z. What's the most efficient way to know which cell > > this point belongs to in parallel? Cells can be either tets or > > hexes. > > I should make a tutorial on this, but have not had time so far. Thank you very much for this mini-tutorial. > > The intention is that you use > > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DM/DMLocatePoints.html > > This will just brute force search unless you also give > > -dm_plex_hash_location Well, for a 3D DMplex PETSc (and git blame) tells me that you "have only coded this for 2D." :-) > which builds a grid hash to accelerate it. I should probably expose > > DMPlexLocatePoint_Internal() > > which handles the single cell queries. If you just had one point, > that might make it simpler, > although you would still write your own loop. I see that DMLocatePoints() loops over all the cells until it finds the right one. I was thinking about finding first the nearest vertex to the point and then sweeping over all the cells that share this vertex testing for DMPlexLocatePoint_Internal(). The nearest node ought to be found using an octree or similar. Any direction regarding this idea? 
> If your intention is to interpolate a field at these > locations, I created > > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/DMInterpolationCreate.html > > which no one but me uses so far, but I think it is convenient. Any other example apart from src/snes/tutorials/ex63.c? Thank you. > > Thanks, > > Matt > > > Regards > > -- > > jeremy theler > > www.seamplex.com > > > > > > > From bsmith at petsc.dev Tue Sep 15 17:37:19 2020 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 15 Sep 2020 17:37:19 -0500 Subject: [petsc-users] Matrix Free Method questions In-Reply-To: References: <5BDE8465-76BE-4132-BF4E-6784548AADC0@petsc.dev> <3329269A-EB37-41C9-9698-BA4631A1E18A@petsc.dev> <3E68F0AF-2F7D-4394-894A-3099EC80B9BC@petsc.dev> <600E6AA4-9534-4B39-B7E0-0218AB02E19A@petsc.dev> <60260FA5-BDAE-4F18-8310-D0F3C03B318D@petsc.dev> <4BA6D58A-89C3-44E2-AF34-F4AE94211DC4@petsc.dev> <6DEA0A2F-3020-4C3D-8726-7BE6346B86BB@petsc.dev> Message-ID: <56056045-E253-44BE-AE4C-7EFE44D867ED@petsc.dev> Sophie, Great, everything looks good. So the new version takes about 7 times longer, due to the relatively modest increase (about 25 percent) in the number of iterations from the poorer preconditioner convergence and the rest from the much slower matrix-vector product due to using matrix free instead of matrix based precondtioner. Both of these are expected. The matrix is taking about 10% of the memory it used to require, also expected. I noticed in the logging the memory for the vectors Vector 85 85 82303208 0. Matrix 15 15 8744032 0. is substantial/huge, with the much smaller matrix memory the vector memory dominates. It indicates 85 vectors are used. This is a large number, there are some needed for the TS (maybe 5?) and some needed for the KSP solve (maybe about 37) but I am not sure why there are so many. Perhaps this number could be reduced. Are there are lot of vectors created in the Xolotyl code? I would it could run with about 45 vectors. Barry > On Sep 15, 2020, at 5:12 PM, Blondel, Sophie wrote: > > Hi Barry, > > I fixed everything and re-ran the 4 cases in 1D. They took more time than before because I used the Kokkos serial backend on the Xolotl side instead of the CUDA one previously (long story short, I tried to update CUDA and messed up the whole installation). Step 4 looks much better than prevously, I was even able to remove MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE) from the code and it ran without throwing errors. The log files are attached. > > Cheers, > > Sophie > From: Barry Smith > Sent: Friday, September 11, 2020 18:03 > To: Blondel, Sophie > Cc: petsc-users at mcs.anl.gov ; xolotl-psi-development at lists.sourceforge.net > Subject: Re: [petsc-users] Matrix Free Method questions > > > >> On Sep 11, 2020, at 7:45 AM, Blondel, Sophie > wrote: >> >> Thank you Barry, >> >> Step 3 worked after I moved MatSetOption at the beginning of computeJacobian(). Attached is the updated log which is pretty similar to what I had before. Step 4 still uses many more iterations. >> >> I checked the Jacobian using -ksp_view_pmat ascii (on a simpler case), I can see the difference between step 3 and 4 is that the contribution from the reactions is not included in the step 4 Jacobian (as expected from the fact that I removed their setting from the code). >> >> Looking back at one of your previous email, you wrote "This routine should only compute the elements of the Jacobian needed for this reduced matrix Jacobian, so the diagonals and the diffusion/convection terms. 
", does it mean that I should still include the contributions from the reactions that affect the pure diagonal terms? > > Yes, you need to leave in everything that affects the diagonal otherwise the "Jacobi" preconditioner will not reflect the true Jacobi preconditioner and likely perform poorly. > > Barry > >> >> Cheers, >> >> Sophie >> From: Barry Smith > >> Sent: Thursday, September 10, 2020 17:04 >> To: Blondel, Sophie > >> Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >> Subject: Re: [petsc-users] Matrix Free Method questions >> >> >> >>> On Sep 10, 2020, at 2:46 PM, Blondel, Sophie > wrote: >>> >>> Hi Barry, >>> >>> Going through the different changes again to understand what was going wrong with the last step, I discovered that my changes from 2 to 3 (keeping only the pure diagonal for the reaction Jacobian setup and adding MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) were wrong: the sparsity of the matrix was correct but then the RHSJacobian method was wrong. I updated it >> >> I'm not sure what you mean here. My hope was that in step 3 you won't need to change RHSJacobian at all (that is just for step 4). >> >>> but now when I run step 3 again I get the following error: >>> >>> [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>> [2]PETSC ERROR: Argument out of range >>> [2]PETSC ERROR: Inserting a new nonzero at global row/column (310400, 316825) into matrix >>> [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >>> [2]PETSC ERROR: Petsc Development GIT revision: v3.13.4-885-gf58a62b032 GIT Date: 2020-09-01 13:07:58 -0500 >>> [2]PETSC ERROR: Unknown Name on a 20200902 named iguazu by bqo Thu Sep 10 15:38:58 2020 >>> [2]PETSC ERROR: Configure options PETSC_DIR=/home2/bqo/libraries/petsc-barry PETSC_ARCH=20200902 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no --with-shared-libraries --download-fblaslapack=1 >>> [2]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 606 in /home2/bqo/libraries/petsc-barry/src/mat/impls/aij/mpi/mpiaij.c >>> [2]PETSC ERROR: #2 MatSetValues() line 1392 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c >>> [2]PETSC ERROR: #3 MatSetValuesLocal() line 2207 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c >>> [2]PETSC ERROR: #4 MatSetValuesStencil() line 1595 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c >>> PetscSolverExpHandler::computeJacobian: MatSetValuesStencil (reactions) failed. >>> >>> Because the RHSJacobian method is trying to update the elements corresponding to the reactions. I'm not sure I understood correctly what step 3 was supposed to be. >> >> In step the three the RHSJacobian was suppose to be unchanged, only the option to ignore the "unneeded" Jacobian entries inside MatSetValues (set with MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) was needed (plus changing the DMDASetBlockFillsXXX argument). >> >> The error message Inserting a new nonzero at global row/column (310400, 316825) into matrix indicates that somehow the MatOption MAT_NEW_NONZERO_LOCATION_ERR is in control instead of the option MAT_NEW_NONZERO_LOCATIONS, when it is setting values the Jacobian values. 
>> >> The MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS_ERR,PETSC_TRUE);) is normally called inside the DMCreateMatrix() so I am not sure how they could be getting called in the wrong order but it seems somehow it is >> >> When do you call MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) in the code? You can call it at the beginning of computeJacobian(). >> >> If this still doesn't work and you get the same error you can run in the debugger on one process and put a breakpoint for MatSetOptions() to found out how the MAT_NEW_NONZERO_LOCATIONS_ERR comes in late to upset the apple cart. You should see MatSetOption() called at least twice and the last one should have the MAT_NEW_NONZERO_LOCATION flag. >> >> Barry >> >> >> >> >>> >>> Cheers, >>> >>> Sophie >>> >>> >>> From: Barry Smith > >>> Sent: Friday, September 4, 2020 01:06 >>> To: Blondel, Sophie > >>> Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>> Subject: Re: [petsc-users] Matrix Free Method questions >>> >>> >>> Sophie, >>> >>> Thanks. I have started looking through the logs >>> >>> The change to matrix-free multiple (from 1 to 2) which reduces the accuracy of the multiply to about half the digits is not surprising. >>> >>> * It roughly doubles the time since doing the matrix-free product requires a function evaluation >>> >>> * It increases the iteration count, but not significantly since the reduced precision of the multiple induces some additional linear iterations >>> >>> The change from 2 to 3 (not storing the entire matrix) >>> >>> * number of nonzeros goes from 49459966 to 1558766 = 3.15 percent so it succeds in not storing the unneeded part of the matrix >>> >>> * the number of MatMult_MF goes from 2331 to 2418. I don't understand this, I expected it to be identical because it should be using the same preconditioner in 3 as in 2 and thus get the same convergence. Could possibility be due to the variability in convergence due to different runs with the matrix-free preconditioner preconditioner and not related to not storing the entire matrix. >>> >>> * the KSPSolve() time goes from 3.8774e+0 to 3.7855e+02 a trivial difference which is what I would expect >>> >>> * the SNESSolve time goes from 5.0047e+02 to 4.3275e+02 about a 14 percent drop which is reasonable because 3 doesn't spend as much time inserting matrix values (it still computes them but doesn't insert the ones we don't want for the preconditioner). >>> >>> The change from 3 to 4 >>> >>> * something goes seriously wrong here. The total number of linear solve iterations goes from 2282 to 97403 so something has gone seriously wrong with the preconditioner, but since the preconditioner operations are the same it seems something has gone wrong with the new reduced preconditioner. >>> >>> I think there is an error in computing the reduced matrix entries, that is the new compute Jacobian code is not computing the entries it needs to correctly. >>> >>> To debug this you can run case 3 and case 4 for a single time step with -ksp_view_pmat binary This should create a binary file with the initial Jacobian matrices in each. You can use Matlab or Python to do the difference in the matrices and see how possibly the new Jacobian computation code is not producing the correct values in some locations. >>> >>> Good luck, >>> >>> Barry >>> >>> >>> >>> >>>> On Sep 3, 2020, at 12:26 PM, Blondel, Sophie > wrote: >>>> >>>> Hi Barry, >>>> >>>> Attached are the log files for the 1D case, for each of the 4 steps. 
I don't know how I did it yesterday but the differences between steps look better today, except for step 4 that takes many more iterations and smaller time steps. >>>> >>>> Cheers, >>>> >>>> Sophie >>>> >>>> De : Barry Smith > >>>> Envoy? : mercredi 2 septembre 2020 15:53 >>>> ? : Blondel, Sophie > >>>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>> Objet : Re: [petsc-users] Matrix Free Method questions >>>> >>>> >>>> >>>>> On Sep 2, 2020, at 1:44 PM, Blondel, Sophie > wrote: >>>>> >>>>> Thank you Barry, >>>>> >>>>> The code ran with your branch but it's much slower than running with the full Jacobian and Jacobi PC subtype (around 10 times slower). It is using less memory as expected. I tried step 2 as well and it's even slower. >>>> >>>> Sophie, >>>> >>>> That is puzzling. It should be using the same matrix in the solver so should be the same speed and the setup time should be a bit better since it does not form the full Jacobian. (We'll get to this later) >>>> >>>>> The TS iteration for step 1 are the same as with full Jacobian. Let me know what I can look at to check if I've done something wrong. >>>> >>>> We need to see if the KSP iterations are pretty similar for four approaches (1) original code with Jacobi PC subtype (2) matrix free with Jacobi PC (just add -snes_mf_operator to case 1) (3) the new code with the MatSetOption() to not store the entire Jacobian also with the -snes_mf_operator and (4) the new code that doesn't compute the unneeded part of the Jacobian also with the -snes_mf_operator >>>> >>>> You could run each case with same 20 timesteps and -ts_monitor -ksp_monitor and -ts_view and send the four output files around. >>>> >>>> Once we are sure the four cases are behaving as expected then you can get timings for them but let's not do that until we confirm the similar results. There could easily be a flaw in my reasoning or the PETSc code somewhere that affects the correctness so its best to check that first. >>>> >>>> >>>> Barry >>>> >>>>> >>>>> Cheers, >>>>> >>>>> Sophie >>>>> De : Barry Smith > >>>>> Envoy? : mardi 1 septembre 2020 14:12 >>>>> ? : Blondel, Sophie > >>>>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>>> Objet : Re: [petsc-users] Matrix Free Method questions >>>>> >>>>> >>>>> Sophie, >>>>> >>>>> Sorry, looks like an old bug in PETSc that was undetected due to lack of use. The code is trying to use the first of the two matrices to determine the preconditioner which won't work in your case since it is matrix-free. It should be using the second matrix. >>>>> >>>>> I hope the branch barry/2020-09-01/fix-fieldsplit-mf resolves this issue for you. >>>>> >>>>> Barry >>>>> >>>>> >>>>>> On Sep 1, 2020, at 12:45 PM, Blondel, Sophie > wrote: >>>>>> >>>>>> Hi Barry, >>>>>> >>>>>> I'm working through step 1) but I think I am doing something wrong. I'm using DMDASetBlockFillsSparse to set the non-zeros only for the diffusing clusters (small He clusters here, from size 1 to 7) and all the diagonal entries. 
Then I added a few lines in the code: >>>>>> Mat mat; >>>>>> DMCreateMatrix(da, &mat); >>>>>> MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); >>>>>> >>>>>> When I try to run with the following options: -snes_mf_operator -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 2 -snes_force_iteration -pc_fieldsplit_detect_coupling -pc_type fieldsplit -fieldsplit_0_pc_type jacobi -fieldsplit_1_pc_type redundant -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -ts_max_snes_failures -1 -ts_monitor -ts_max_steps 20 >>>>>> >>>>>> I get an error: >>>>>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>>>>> [0]PETSC ERROR: No support for this operation for this object type >>>>>> [0]PETSC ERROR: Matrix type mffd does not have a find off block diagonal entries defined >>>>>> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >>>>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.13.4-851-gde18fec8da GIT Date: 2020-08-28 16:47:50 +0000 >>>>>> [0]PETSC ERROR: Unknown Name on a 20200828 named sophie-Precision-5530 by sophie Tue Sep 1 10:58:44 2020 >>>>>> [0]PETSC ERROR: Configure options PETSC_DIR=/home/sophie/Code/petsc PETSC_ARCH=20200828 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no --with-shared-libraries >>>>>> [0]PETSC ERROR: #1 MatFindOffBlockDiagonalEntries() line 9847 in /home/sophie/Code/petsc/src/mat/interface/matrix.c >>>>>> [0]PETSC ERROR: #2 PCFieldSplitSetDefaults() line 504 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c >>>>>> [0]PETSC ERROR: #3 PCSetUp_FieldSplit() line 606 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c >>>>>> [0]PETSC ERROR: #4 PCSetUp() line 1009 in /home/sophie/Code/petsc/src/ksp/pc/interface/precon.c >>>>>> [0]PETSC ERROR: #5 KSPSetUp() line 406 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c >>>>>> [0]PETSC ERROR: #6 KSPSolve_Private() line 658 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c >>>>>> [0]PETSC ERROR: #7 KSPSolve() line 889 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c >>>>>> [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in /home/sophie/Code/petsc/src/snes/impls/ls/ls.c >>>>>> [0]PETSC ERROR: #9 SNESSolve() line 4524 in /home/sophie/Code/petsc/src/snes/interface/snes.c >>>>>> [0]PETSC ERROR: #10 TSStep_ARKIMEX() line 811 in /home/sophie/Code/petsc/src/ts/impls/arkimex/arkimex.c >>>>>> [0]PETSC ERROR: #11 TSStep() line 3731 in /home/sophie/Code/petsc/src/ts/interface/ts.c >>>>>> [0]PETSC ERROR: #12 TSSolve() line 4128 in /home/sophie/Code/petsc/src/ts/interface/ts.c >>>>>> PetscSolver::solve: TSSolve failed. >>>>>> >>>>>> Cheers, >>>>>> >>>>>> Sophie >>>>>> De : Barry Smith > >>>>>> Envoy? : lundi 31 ao?t 2020 14:50 >>>>>> ? : Blondel, Sophie > >>>>>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>>>> Objet : Re: [petsc-users] Matrix Free Method questions >>>>>> >>>>>> >>>>>> Sophie, >>>>>> >>>>>> Thanks. >>>>>> >>>>>> The factor of 4 is lot, the 1.5 not so bad. >>>>>> >>>>>> You will definitely want to retain the full matrix assembly codes for speed and to verify a reduced matrix version. >>>>>> >>>>>> It is worth trying a "reduced matrix version" with matrix-free multiply based on these numbers. This reduced matrix Jacobian will only have the diagonals and all the terms connected to the cluster sizes that move. 
In other words you will be building just the part of the Jacobian needed for the new preconditioner (PC subtype for Jacobi) and doing the matrix-vector product matrix free. (SOR requires all the Jacobian entries). >>>>>> >>>>>> Fortunately this is hopefully pretty straightforward for this code. You will not have to change the structure of the main code at all. >>>>>> >>>>>> Step 1) create a new "sparse matrix" that will be passed to DMDASetBlockFillsSparse(). This new "sparse matrix" needs to retain all the diagonal entries and also all the entries that are associated with the variables that diffuse. If I remember correctly these are just the smallest cluster size, plain Helium? >>>>>> >>>>>> Call MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); >>>>>> >>>>>> Then you would run the code with -snes_mf_operator and the new PC subtype for Jacobi. >>>>>> >>>>>> A test that the new reduced Jacobian is correct will be that you get almost the same iterations as the runs you just make using the PC subtype of Jacobi. Hopefully not slower and using a great deal less memory. The iterations will not be identical because of the matrix-free multiple. >>>>>> >>>>>> Step 2) create a new version of the Jacobian computation routine. This routine should only compute the elements of the Jacobian needed for this reduced matrix Jacobian, so the diagonals and the diffusion/convection terms. >>>>>> >>>>>> Again run with with -snes_mf_operator and the new PC subtype for Jacobi and you should again get the same convergence history. >>>>>> >>>>>> I made two steps because it makes it easier to validate and debug to get the same results as before. The first step cheats in that it still computes the full Jacobian but ignores the entries that we don't need to store for the preconditioner. The second step is more efficient because it only computes the Jacobian entries needed for the preconditioner but it requires you going through the Jacobian code and making sure only the needed parts are computed. >>>>>> >>>>>> >>>>>> If you have any questions please let me know. >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> On Aug 31, 2020, at 1:13 PM, Blondel, Sophie > wrote: >>>>>>> >>>>>>> Hi Barry, >>>>>>> >>>>>>> I ran the 2 cases to look at the effect of the Jacobi pre-conditionner: >>>>>>> 1D with 200 grid points and 7759 DOF per grid point (for the PSI application), for 20 TS: the factor between SOR and Jacobi is ~4 (976 MatMult for SOR and 4162 MatMult for Jacobi) >>>>>>> 2D with 63x63 grid points and 4124 DOF per grid point (for the NE application), for 20 TS: the factor is 1.5 (6657 for SOR, 10379 for Jacobi) >>>>>>> Cheers, >>>>>>> >>>>>>> Sophie >>>>>>> De : Barry Smith > >>>>>>> Envoy? : vendredi 28 ao?t 2020 18:31 >>>>>>> ? : Blondel, Sophie > >>>>>>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>>>>> Objet : Re: [petsc-users] Matrix Free Method questions >>>>>>> >>>>>>> >>>>>>> >>>>>>>> On Aug 28, 2020, at 4:11 PM, Blondel, Sophie > wrote: >>>>>>>> >>>>>>>> Thank you Jed and Barry, >>>>>>>> >>>>>>>> First, attached are the logs from the benchmark runs I did without (log_std.txt) and with MF method (log_mf.txt). It took me some trouble to get the -log_view to work because I'm using push and pop for the options which means that PETSc is initialized with no argument so the command line argument was not taken into account, but I guess this is for a separate discussion. 
>>>>>>>> >>>>>>>> To answer questions about the current per-conditioners: >>>>>>>> I used the same pre-conditioner options as listed in my previous email when I added the -snes_mf option; I did try to remove all the PC related options at one point with the MF method but didn't see a change in runtime so I put them back in >>>>>>>> this benchmark is for a 1D DMDA using 20 grid points; when running in 2D or 3D I switch the PC options to: -pc_type fieldsplit -fieldsplit_0_pc_type sor -fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type gmres -ksp_type fgmres -fieldsplit_1_pc_gamg_threshold -1 >>>>>>>> I haven't tried a Jacobi PC instead of SOR, I will run a set of more realistic runs (1D and 2D) without MF but with Jacobi and report on it next week. When you say "iterations" do you mean what is given by -ksp_monitor? >>>>>>> >>>>>>> Yes, the number of MatMult is a good enough surrogate. >>>>>>> >>>>>>> So using matrix-free (which means no preconditioning) has >>>>>>> >>>>>>> 35846/160 >>>>>>> >>>>>>> ans = >>>>>>> >>>>>>> 224.0375 >>>>>>> >>>>>>> or 224 as many iterations. So even for this modest 1d problem preconditioning is doing a great deal. >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> Cheers, >>>>>>>> >>>>>>>> Sophie >>>>>>>> De : Barry Smith > >>>>>>>> Envoy? : vendredi 28 ao?t 2020 12:12 >>>>>>>> ? : Blondel, Sophie > >>>>>>>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>>>>>> Objet : Re: [petsc-users] Matrix Free Method questions >>>>>>>> >>>>>>>> [External Email] >>>>>>>> >>>>>>>> Sophie, >>>>>>>> >>>>>>>> This is exactly what i would expect. If you run with -ksp_monitor you will see the -snes_mf run takes many more iterations. >>>>>>>> >>>>>>>> I am puzzled that the argument -pc_type fieldsplit did not stop the run since this is under normal circumstances not a viable preconditioner with -snes_mf. Did you also remove the -pc_type fieldsplit argument? >>>>>>>> >>>>>>>> In order to see how one can avoid forming the entire matrix and use matrix-free to do the matrix-vector but still have an effective preconditioner let's look at what the current preconditioner options do. >>>>>>>> >>>>>>>>> -pc_fieldsplit_detect_coupling >>>>>>>> >>>>>>>> creates two sub-preconditioners, the first for all the variables and the second for those that are coupled by the matrix to variables in neighboring cells Since only the smallest cluster sizes have diffusion/advection this second set contains only the cluster size one variables. >>>>>>>> >>>>>>>>> -fieldsplit_0_pc_type sor >>>>>>>> >>>>>>>> Runs SOR on all the variables; you can think of this as running SOR on the reactions, it is a pretty good preconditioner for the reactions since the reactions are local, per cell. >>>>>>>> >>>>>>>>> -fieldsplit_1_pc_type redundant >>>>>>>> >>>>>>>> >>>>>>>> This runs the default preconditioner (ILU) on just the variables that diffuse, i.e. the elliptic part. For smallish problems this is fine, for larger problems and 2d and 3d presumably you have also -redundant_pc_type gamg to use algebraic multigrid for the diffusion. This part of the matrix will always need to be formed and used in the preconditioner. It is very important since the diffusion is what brings in most of the ill-conditioning for larger problems into the linear system. Note that it only needs the matrix entries for the cluster size of 1 so it is very small compared to the entire sparse matrix. 
>>>>>>>> >>>>>>>> ---- >>>>>>>> The first preconditioner SOR requires ALL the matrix entries which are almost all (except for the diffusion terms) the coupling between different size clusters within a cell. Especially each cell has its own sparse matrix of the size of total number of clusters, it is sparse but not super sparse. >>>>>>>> >>>>>>>> So the to significantly lower memory usage we need to remove the SOR and the storing of all the matrix entries but still have an efficient preconditioner for the "reaction" terms. >>>>>>>> >>>>>>>> The simplest thing would be to use Jacobi instead of SOR for the first subpreconditioner since it only requires the diagonal entries in the matrix. But Jacobi is a worse preconditioner than SOR (since it totally ignores the matrix coupling) and sometimes can be much worse. >>>>>>>> >>>>>>>> Before anyone writes additional code we need to know if doing something along these lines does not ruin the convergence that. >>>>>>>> >>>>>>>> Have you used the same options as before but with -fieldsplit_0_pc_type jacobi ? (Not using any matrix free). We need to get an idea of how many more linear iterations it requires (not time, comparing time won't be helpful for this exercise.) We also need this information for realistic size problems in 2 or 3 dimensions that you really want to run; for small problems this approach will work ok and give misleading information about what happens for large problems. >>>>>>>> >>>>>>>> I suspect the iteration counts will shot up. Can you run some cases and see how the iteration counts change? >>>>>>>> >>>>>>>> Based on that we can decide if we still retain "good convergence" by changing the SOR to Jacobi and then change the code to make this change efficient (basically by skipping the explicit computation of the reaction Jacobian terms and using matrix-free on the outside of the PCFIELDSPLIT.) >>>>>>>> >>>>>>>> Barry >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> On Aug 28, 2020, at 9:49 AM, Blondel, Sophie via petsc-users > wrote: >>>>>>>>> >>>>>>>>> Hi everyone, >>>>>>>>> >>>>>>>>> I have been using PETSc for a few years with a fully implicit TS ARKIMEX method and am now exploring the matrix free method option. Here is the list of PETSc options I typically use: -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 5 -snes_force_iteration -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -fieldsplit_0_pc_type sor -ts_max_snes_failures -1 -pc_fieldsplit_detect_coupling -ts_monitor -pc_type fieldsplit -fieldsplit_1_pc_type redundant -ts_max_steps 100 >>>>>>>>> >>>>>>>>> I started to compare the performance of the code without changing anything of the executable and simply adding "-snes_mf", I see a reduction of memory usage as expected and a benchmark that would usually take ~5min to run now takes ~50min. Reading the documentation I saw that there are a few option to play with the matrix free method like -snes_mf_err, -snes_mf_umin, or switching to -snes_mf_type wp. I used and modified the values of each of these options separately but never saw a sizable change in runtime, is it expected? >>>>>>>>> >>>>>>>>> And are there other ways to make the matrix free method faster? I saw in the documentation that you can define your own per-conditioner for instance. Let me know if you need additional information about the PETSc setup in the application I use. 
>>>>>>>>> >>>>>>>>> Best, >>>>>>>>> >>>>>>>>> Sophie >>>>>>>> >>>>>>>> >>>> >>>> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Sep 15 18:44:57 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 15 Sep 2020 19:44:57 -0400 Subject: [petsc-users] Finding which cell an arbitrary point belongs to in DMPlex In-Reply-To: <70fd5e6fb338d6fc4b2471ec17d7dc7ddaaa9599.camel@seamplex.com> References: <70fd5e6fb338d6fc4b2471ec17d7dc7ddaaa9599.camel@seamplex.com> Message-ID: On Tue, Sep 15, 2020 at 6:18 PM Jeremy Theler wrote: > On Mon, 2020-09-14 at 20:28 -0400, Matthew Knepley wrote: > > On Mon, Sep 14, 2020 at 6:15 PM Jeremy Theler > > wrote: > > > Hello all > > > > > > Say I have a fully-interpolated 3D DMPlex and a point with > > > arbitrary > > > coordinates x,y,z. What's the most efficient way to know which cell > > > this point belongs to in parallel? Cells can be either tets or > > > hexes. > > > > I should make a tutorial on this, but have not had time so far. > > Thank you very much for this mini-tutorial. > > > > > The intention is that you use > > > > > > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DM/DMLocatePoints.html > > > > This will just brute force search unless you also give > > > > -dm_plex_hash_location > > Well, for a 3D DMplex PETSc (and git blame) tells me that you "have > only coded this for 2D." :-) > Crap. I need to do 3D. It's not hard, just work. > > which builds a grid hash to accelerate it. I should probably expose > > > > DMPlexLocatePoint_Internal() > > > > which handles the single cell queries. If you just had one point, > > that might make it simpler, > > although you would still write your own loop. > > I see that DMLocatePoints() loops over all the cells until it finds the > right one. I was thinking about finding first the nearest vertex to the > point and then sweeping over all the cells that share this vertex > testing for DMPlexLocatePoint_Internal(). The nearest node ought to be > found using an octree or similar. Any direction regarding this idea? > So you can imagine both a topological search and a geometric search. Generally, people want geometric. The geometric hash we use is just to bin elements on a regular grid. > > If your intention is to interpolate a field at these > > locations, I created > > > > > > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/DMInterpolationCreate.html > > > > which no one but me uses so far, but I think it is convenient. > > Any other example apart from src/snes/tutorials/ex63.c? > That is the only one in PETSc. The PyLith code uses this to interpolate to seismic stations. Thanks, Matt > Thank you. > > > > > Thanks, > > > > Matt > > > > > Regards > > > -- > > > jeremy theler > > > www.seamplex.com > > > > > > > > > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
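As a point of reference, a minimal sketch (untested, not from the thread) of the single-point query described above with DMLocatePoints(): pack the physical coordinates into a Vec whose block size equals the coordinate dimension, call DMLocatePoints(), and read the owning cell out of the returned PetscSF. The helper name LocateOnePoint is illustrative, a fully set-up 3D DMPlex is assumed, and error checking is omitted.

#include <petscdmplex.h>

/* Sketch: locate one physical point x[3] in a 3D DMPlex; returns the local
   cell number in *cell, or -1 if the point was not found on this rank. */
PetscErrorCode LocateOnePoint(DM dm, const PetscReal x[3], PetscInt *cell)
{
  Vec                coords;
  PetscSF            cellSF = NULL;
  const PetscSFNode *found;
  PetscInt           nFound;
  PetscScalar       *a;

  VecCreateSeq(PETSC_COMM_SELF, 3, &coords);
  VecSetBlockSize(coords, 3);                 /* block size = coordinate dimension */
  VecGetArray(coords, &a);
  a[0] = x[0]; a[1] = x[1]; a[2] = x[2];
  VecRestoreArray(coords, &a);
  DMLocatePoints(dm, coords, DM_POINTLOCATION_NONE, &cellSF);
  PetscSFGetGraph(cellSF, NULL, &nFound, NULL, &found);
  *cell = (nFound > 0) ? found[0].index : -1;
  PetscSFDestroy(&cellSF);
  VecDestroy(&coords);
  return 0;
}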
URL: From vaclav.hapla at erdw.ethz.ch Wed Sep 16 09:29:24 2020 From: vaclav.hapla at erdw.ethz.ch (Hapla Vaclav) Date: Wed, 16 Sep 2020 14:29:24 +0000 Subject: [petsc-users] Finding which cell an arbitrary point belongs to in DMPlex In-Reply-To: References: <70fd5e6fb338d6fc4b2471ec17d7dc7ddaaa9599.camel@seamplex.com> Message-ID: <929B80C1-9066-441C-B8E5-49413C36C640@erdw.ethz.ch> There is also DMPlexFindVertices() which finds the nearest vertex to the given coords in the given radius. You can then get support or its transitive closure for that vertex. I wrote it some time ago mainly for debug purposes. It uses just brute force. I'm not sure it deserves to exist :-) Maybe we should somehow merge these functionalities. Thanks, Vaclav On 16 Sep 2020, at 01:44, Matthew Knepley > wrote: On Tue, Sep 15, 2020 at 6:18 PM Jeremy Theler > wrote: On Mon, 2020-09-14 at 20:28 -0400, Matthew Knepley wrote: > On Mon, Sep 14, 2020 at 6:15 PM Jeremy Theler > > wrote: > > Hello all > > > > Say I have a fully-interpolated 3D DMPlex and a point with > > arbitrary > > coordinates x,y,z. What's the most efficient way to know which cell > > this point belongs to in parallel? Cells can be either tets or > > hexes. > > I should make a tutorial on this, but have not had time so far. Thank you very much for this mini-tutorial. > > The intention is that you use > > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DM/DMLocatePoints.html > > This will just brute force search unless you also give > > -dm_plex_hash_location Well, for a 3D DMplex PETSc (and git blame) tells me that you "have only coded this for 2D." :-) Crap. I need to do 3D. It's not hard, just work. > which builds a grid hash to accelerate it. I should probably expose > > DMPlexLocatePoint_Internal() > > which handles the single cell queries. If you just had one point, > that might make it simpler, > although you would still write your own loop. I see that DMLocatePoints() loops over all the cells until it finds the right one. I was thinking about finding first the nearest vertex to the point and then sweeping over all the cells that share this vertex testing for DMPlexLocatePoint_Internal(). The nearest node ought to be found using an octree or similar. Any direction regarding this idea? So you can imagine both a topological search and a geometric search. Generally, people want geometric. The geometric hash we use is just to bin elements on a regular grid. > If your intention is to interpolate a field at these > locations, I created > > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/DMInterpolationCreate.html > > which no one but me uses so far, but I think it is convenient. Any other example apart from src/snes/tutorials/ex63.c? That is the only one in PETSc. The PyLith code uses this to interpolate to seismic stations. Thanks, Matt Thank you. > > Thanks, > > Matt > > > Regards > > -- > > jeremy theler > > www.seamplex.com > > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
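A rough sketch (untested) of the topological variant discussed in the two messages above: once a nearby vertex v has been found (for example with DMPlexFindVertices(), whose exact calling sequence depends on the PETSc version), the star of v yields the handful of candidate cells to test instead of sweeping the whole mesh. The per-cell containment test itself is the part that is not public (DMPlexLocatePoint_Internal), so it is left as a placeholder; dm, v, and the loop body are assumed to be supplied by the caller.

/* Sketch: candidate cells = cells in the star of vertex v */
PetscInt cStart, cEnd, starSize, s, *star = NULL;

DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd);                      /* height-0 points = cells */
DMPlexGetTransitiveClosure(dm, v, PETSC_FALSE, &starSize, &star);   /* PETSC_FALSE => star of v */
for (s = 0; s < starSize; ++s) {
  const PetscInt point = star[2*s];    /* entries come as (point, orientation) pairs */
  if (point >= cStart && point < cEnd) {
    /* candidate cell: run the point-in-cell test here */
  }
}
DMPlexRestoreTransitiveClosure(dm, v, PETSC_FALSE, &starSize, &star);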
URL: From sblondel at utk.edu Wed Sep 16 13:54:09 2020 From: sblondel at utk.edu (Blondel, Sophie) Date: Wed, 16 Sep 2020 18:54:09 +0000 Subject: [petsc-users] Matrix Free Method questions In-Reply-To: <56056045-E253-44BE-AE4C-7EFE44D867ED@petsc.dev> References: <5BDE8465-76BE-4132-BF4E-6784548AADC0@petsc.dev> <3329269A-EB37-41C9-9698-BA4631A1E18A@petsc.dev> <3E68F0AF-2F7D-4394-894A-3099EC80B9BC@petsc.dev> <600E6AA4-9534-4B39-B7E0-0218AB02E19A@petsc.dev> <60260FA5-BDAE-4F18-8310-D0F3C03B318D@petsc.dev> <4BA6D58A-89C3-44E2-AF34-F4AE94211DC4@petsc.dev> <6DEA0A2F-3020-4C3D-8726-7BE6346B86BB@petsc.dev> , <56056045-E253-44BE-AE4C-7EFE44D867ED@petsc.dev> Message-ID: Hi Barry, I don't think we're explicitly creating many PETSc vectors in Xolotl. There is a global one created for the solution when the TS is set up, and local ones in RHSFunction and RHSJacobian; everywhere else we just get the array from it with DMDAVecGetArrayDOF and DMDAVecRestoreArrayDOF. I tried a few things to see if it changed the number of Vec from 85 (removing monitors, fewer time steps, fewer MPI tasks) but it stayed the same, except when I changed the PC option from "-fieldsplit_1_pc_type redundant" to "-fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type gmres -ksp_type fgmres -fieldsplit_1_pc_gamg_threshold -1" where I got 10567 vectors. Cheers, Sophie ________________________________ From: Barry Smith Sent: Tuesday, September 15, 2020 18:37 To: Blondel, Sophie Cc: petsc-users at mcs.anl.gov ; xolotl-psi-development at lists.sourceforge.net Subject: Re: [petsc-users] Matrix Free Method questions Sophie, Great, everything looks good. So the new version takes about 7 times longer, due to the relatively modest increase (about 25 percent) in the number of iterations from the poorer preconditioner convergence and the rest from the much slower matrix-vector product due to using matrix free instead of matrix based precondtioner. Both of these are expected. The matrix is taking about 10% of the memory it used to require, also expected. I noticed in the logging the memory for the vectors Vector 85 85 82303208 0. Matrix 15 15 8744032 0. is substantial/huge, with the much smaller matrix memory the vector memory dominates. It indicates 85 vectors are used. This is a large number, there are some needed for the TS (maybe 5?) and some needed for the KSP solve (maybe about 37) but I am not sure why there are so many. Perhaps this number could be reduced. Are there are lot of vectors created in the Xolotyl code? I would it could run with about 45 vectors. Barry On Sep 15, 2020, at 5:12 PM, Blondel, Sophie > wrote: Hi Barry, I fixed everything and re-ran the 4 cases in 1D. They took more time than before because I used the Kokkos serial backend on the Xolotl side instead of the CUDA one previously (long story short, I tried to update CUDA and messed up the whole installation). Step 4 looks much better than prevously, I was even able to remove MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE) from the code and it ran without throwing errors. The log files are attached. Cheers, Sophie ________________________________ From: Barry Smith > Sent: Friday, September 11, 2020 18:03 To: Blondel, Sophie > Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Subject: Re: [petsc-users] Matrix Free Method questions On Sep 11, 2020, at 7:45 AM, Blondel, Sophie > wrote: Thank you Barry, Step 3 worked after I moved MatSetOption at the beginning of computeJacobian(). 
Attached is the updated log which is pretty similar to what I had before. Step 4 still uses many more iterations. I checked the Jacobian using -ksp_view_pmat ascii (on a simpler case), I can see the difference between step 3 and 4 is that the contribution from the reactions is not included in the step 4 Jacobian (as expected from the fact that I removed their setting from the code). Looking back at one of your previous email, you wrote "This routine should only compute the elements of the Jacobian needed for this reduced matrix Jacobian, so the diagonals and the diffusion/convection terms. ", does it mean that I should still include the contributions from the reactions that affect the pure diagonal terms? Yes, you need to leave in everything that affects the diagonal otherwise the "Jacobi" preconditioner will not reflect the true Jacobi preconditioner and likely perform poorly. Barry Cheers, Sophie ________________________________ From: Barry Smith > Sent: Thursday, September 10, 2020 17:04 To: Blondel, Sophie > Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Subject: Re: [petsc-users] Matrix Free Method questions On Sep 10, 2020, at 2:46 PM, Blondel, Sophie > wrote: Hi Barry, Going through the different changes again to understand what was going wrong with the last step, I discovered that my changes from 2 to 3 (keeping only the pure diagonal for the reaction Jacobian setup and adding MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) were wrong: the sparsity of the matrix was correct but then the RHSJacobian method was wrong. I updated it I'm not sure what you mean here. My hope was that in step 3 you won't need to change RHSJacobian at all (that is just for step 4). but now when I run step 3 again I get the following error: [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [2]PETSC ERROR: Argument out of range [2]PETSC ERROR: Inserting a new nonzero at global row/column (310400, 316825) into matrix [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [2]PETSC ERROR: Petsc Development GIT revision: v3.13.4-885-gf58a62b032 GIT Date: 2020-09-01 13:07:58 -0500 [2]PETSC ERROR: Unknown Name on a 20200902 named iguazu by bqo Thu Sep 10 15:38:58 2020 [2]PETSC ERROR: Configure options PETSC_DIR=/home2/bqo/libraries/petsc-barry PETSC_ARCH=20200902 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no --with-shared-libraries --download-fblaslapack=1 [2]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 606 in /home2/bqo/libraries/petsc-barry/src/mat/impls/aij/mpi/mpiaij.c [2]PETSC ERROR: #2 MatSetValues() line 1392 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c [2]PETSC ERROR: #3 MatSetValuesLocal() line 2207 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c [2]PETSC ERROR: #4 MatSetValuesStencil() line 1595 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c PetscSolverExpHandler::computeJacobian: MatSetValuesStencil (reactions) failed. Because the RHSJacobian method is trying to update the elements corresponding to the reactions. I'm not sure I understood correctly what step 3 was supposed to be. In step the three the RHSJacobian was suppose to be unchanged, only the option to ignore the "unneeded" Jacobian entries inside MatSetValues (set with MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) was needed (plus changing the DMDASetBlockFillsXXX argument). 
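Concretely, the option referred to just above is a one-line addition at the top of the Jacobian callback, before any MatSetValuesStencil() calls. A sketch using the standard TS RHSJacobian signature (names are illustrative; Jpre is the preconditioning matrix that came from DMCreateMatrix()):

/* Sketch: values falling outside the reduced preallocated pattern are then
   dropped instead of triggering the "new nonzero" error */
PetscErrorCode RHSJacobian(TS ts, PetscReal t, Vec X, Mat J, Mat Jpre, void *ctx)
{
  MatSetOption(Jpre, MAT_NEW_NONZERO_LOCATIONS, PETSC_FALSE);
  /* ... existing assembly of Jpre via MatSetValuesStencil() ... */
  MatAssemblyBegin(Jpre, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(Jpre, MAT_FINAL_ASSEMBLY);
  return 0;
}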
The error message Inserting a new nonzero at global row/column (310400, 316825) into matrix indicates that somehow the MatOption MAT_NEW_NONZERO_LOCATION_ERR is in control instead of the option MAT_NEW_NONZERO_LOCATIONS, when it is setting values the Jacobian values. The MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS_ERR,PETSC_TRUE);) is normally called inside the DMCreateMatrix() so I am not sure how they could be getting called in the wrong order but it seems somehow it is When do you call MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) in the code? You can call it at the beginning of computeJacobian(). If this still doesn't work and you get the same error you can run in the debugger on one process and put a breakpoint for MatSetOptions() to found out how the MAT_NEW_NONZERO_LOCATIONS_ERR comes in late to upset the apple cart. You should see MatSetOption() called at least twice and the last one should have the MAT_NEW_NONZERO_LOCATION flag. Barry Cheers, Sophie ________________________________ From: Barry Smith > Sent: Friday, September 4, 2020 01:06 To: Blondel, Sophie > Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Subject: Re: [petsc-users] Matrix Free Method questions Sophie, Thanks. I have started looking through the logs The change to matrix-free multiple (from 1 to 2) which reduces the accuracy of the multiply to about half the digits is not surprising. * It roughly doubles the time since doing the matrix-free product requires a function evaluation * It increases the iteration count, but not significantly since the reduced precision of the multiple induces some additional linear iterations The change from 2 to 3 (not storing the entire matrix) * number of nonzeros goes from 49459966 to 1558766 = 3.15 percent so it succeds in not storing the unneeded part of the matrix * the number of MatMult_MF goes from 2331 to 2418. I don't understand this, I expected it to be identical because it should be using the same preconditioner in 3 as in 2 and thus get the same convergence. Could possibility be due to the variability in convergence due to different runs with the matrix-free preconditioner preconditioner and not related to not storing the entire matrix. * the KSPSolve() time goes from 3.8774e+0 to 3.7855e+02 a trivial difference which is what I would expect * the SNESSolve time goes from 5.0047e+02 to 4.3275e+02 about a 14 percent drop which is reasonable because 3 doesn't spend as much time inserting matrix values (it still computes them but doesn't insert the ones we don't want for the preconditioner). The change from 3 to 4 * something goes seriously wrong here. The total number of linear solve iterations goes from 2282 to 97403 so something has gone seriously wrong with the preconditioner, but since the preconditioner operations are the same it seems something has gone wrong with the new reduced preconditioner. I think there is an error in computing the reduced matrix entries, that is the new compute Jacobian code is not computing the entries it needs to correctly. To debug this you can run case 3 and case 4 for a single time step with -ksp_view_pmat binary This should create a binary file with the initial Jacobian matrices in each. You can use Matlab or Python to do the difference in the matrices and see how possibly the new Jacobian computation code is not producing the correct values in some locations. 
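Matlab or Python are suggested above; the same check can also stay in PETSc itself. A sketch (untested) that loads the two dumped Jacobians and prints the Frobenius norm of their difference; the file names are placeholders for whatever was given to -ksp_view_pmat binary:<name> in the case-3 and case-4 runs.

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat         A, B;
  PetscViewer v;
  PetscReal   nrm;

  PetscInitialize(&argc, &argv, NULL, NULL);
  MatCreate(PETSC_COMM_WORLD, &A); MatSetType(A, MATAIJ);
  MatCreate(PETSC_COMM_WORLD, &B); MatSetType(B, MATAIJ);
  PetscViewerBinaryOpen(PETSC_COMM_WORLD, "jac_case3.bin", FILE_MODE_READ, &v);
  MatLoad(A, v); PetscViewerDestroy(&v);
  PetscViewerBinaryOpen(PETSC_COMM_WORLD, "jac_case4.bin", FILE_MODE_READ, &v);
  MatLoad(B, v); PetscViewerDestroy(&v);
  MatAXPY(B, -1.0, A, DIFFERENT_NONZERO_PATTERN);   /* B <- B - A */
  MatNorm(B, NORM_FROBENIUS, &nrm);
  PetscPrintf(PETSC_COMM_WORLD, "||J_case4 - J_case3||_F = %g\n", (double)nrm);
  MatDestroy(&A); MatDestroy(&B);
  PetscFinalize();
  return 0;
}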
Good luck, Barry On Sep 3, 2020, at 12:26 PM, Blondel, Sophie > wrote: Hi Barry, Attached are the log files for the 1D case, for each of the 4 steps. I don't know how I did it yesterday but the differences between steps look better today, except for step 4 that takes many more iterations and smaller time steps. Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : mercredi 2 septembre 2020 15:53 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions On Sep 2, 2020, at 1:44 PM, Blondel, Sophie > wrote: Thank you Barry, The code ran with your branch but it's much slower than running with the full Jacobian and Jacobi PC subtype (around 10 times slower). It is using less memory as expected. I tried step 2 as well and it's even slower. Sophie, That is puzzling. It should be using the same matrix in the solver so should be the same speed and the setup time should be a bit better since it does not form the full Jacobian. (We'll get to this later) The TS iteration for step 1 are the same as with full Jacobian. Let me know what I can look at to check if I've done something wrong. We need to see if the KSP iterations are pretty similar for four approaches (1) original code with Jacobi PC subtype (2) matrix free with Jacobi PC (just add -snes_mf_operator to case 1) (3) the new code with the MatSetOption() to not store the entire Jacobian also with the -snes_mf_operator and (4) the new code that doesn't compute the unneeded part of the Jacobian also with the -snes_mf_operator You could run each case with same 20 timesteps and -ts_monitor -ksp_monitor and -ts_view and send the four output files around. Once we are sure the four cases are behaving as expected then you can get timings for them but let's not do that until we confirm the similar results. There could easily be a flaw in my reasoning or the PETSc code somewhere that affects the correctness so its best to check that first. Barry Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : mardi 1 septembre 2020 14:12 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions Sophie, Sorry, looks like an old bug in PETSc that was undetected due to lack of use. The code is trying to use the first of the two matrices to determine the preconditioner which won't work in your case since it is matrix-free. It should be using the second matrix. I hope the branch barry/2020-09-01/fix-fieldsplit-mf resolves this issue for you. Barry On Sep 1, 2020, at 12:45 PM, Blondel, Sophie > wrote: Hi Barry, I'm working through step 1) but I think I am doing something wrong. I'm using DMDASetBlockFillsSparse to set the non-zeros only for the diffusing clusters (small He clusters here, from size 1 to 7) and all the diagonal entries. 
Then I added a few lines in the code: Mat mat; DMCreateMatrix(da, &mat); MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); When I try to run with the following options: -snes_mf_operator -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 2 -snes_force_iteration -pc_fieldsplit_detect_coupling -pc_type fieldsplit -fieldsplit_0_pc_type jacobi -fieldsplit_1_pc_type redundant -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -ts_max_snes_failures -1 -ts_monitor -ts_max_steps 20 I get an error: [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: No support for this operation for this object type [0]PETSC ERROR: Matrix type mffd does not have a find off block diagonal entries defined [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Development GIT revision: v3.13.4-851-gde18fec8da GIT Date: 2020-08-28 16:47:50 +0000 [0]PETSC ERROR: Unknown Name on a 20200828 named sophie-Precision-5530 by sophie Tue Sep 1 10:58:44 2020 [0]PETSC ERROR: Configure options PETSC_DIR=/home/sophie/Code/petsc PETSC_ARCH=20200828 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no --with-shared-libraries [0]PETSC ERROR: #1 MatFindOffBlockDiagonalEntries() line 9847 in /home/sophie/Code/petsc/src/mat/interface/matrix.c [0]PETSC ERROR: #2 PCFieldSplitSetDefaults() line 504 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c [0]PETSC ERROR: #3 PCSetUp_FieldSplit() line 606 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c [0]PETSC ERROR: #4 PCSetUp() line 1009 in /home/sophie/Code/petsc/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #5 KSPSetUp() line 406 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #6 KSPSolve_Private() line 658 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #7 KSPSolve() line 889 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in /home/sophie/Code/petsc/src/snes/impls/ls/ls.c [0]PETSC ERROR: #9 SNESSolve() line 4524 in /home/sophie/Code/petsc/src/snes/interface/snes.c [0]PETSC ERROR: #10 TSStep_ARKIMEX() line 811 in /home/sophie/Code/petsc/src/ts/impls/arkimex/arkimex.c [0]PETSC ERROR: #11 TSStep() line 3731 in /home/sophie/Code/petsc/src/ts/interface/ts.c [0]PETSC ERROR: #12 TSSolve() line 4128 in /home/sophie/Code/petsc/src/ts/interface/ts.c PetscSolver::solve: TSSolve failed. Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : lundi 31 ao?t 2020 14:50 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions Sophie, Thanks. The factor of 4 is lot, the 1.5 not so bad. You will definitely want to retain the full matrix assembly codes for speed and to verify a reduced matrix version. It is worth trying a "reduced matrix version" with matrix-free multiply based on these numbers. This reduced matrix Jacobian will only have the diagonals and all the terms connected to the cluster sizes that move. In other words you will be building just the part of the Jacobian needed for the new preconditioner (PC subtype for Jacobi) and doing the matrix-vector product matrix free. (SOR requires all the Jacobian entries). Fortunately this is hopefully pretty straightforward for this code. 
You will not have to change the structure of the main code at all. Step 1) create a new "sparse matrix" that will be passed to DMDASetBlockFillsSparse(). This new "sparse matrix" needs to retain all the diagonal entries and also all the entries that are associated with the variables that diffuse. If I remember correctly these are just the smallest cluster size, plain Helium? Call MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); Then you would run the code with -snes_mf_operator and the new PC subtype for Jacobi. A test that the new reduced Jacobian is correct will be that you get almost the same iterations as the runs you just make using the PC subtype of Jacobi. Hopefully not slower and using a great deal less memory. The iterations will not be identical because of the matrix-free multiple. Step 2) create a new version of the Jacobian computation routine. This routine should only compute the elements of the Jacobian needed for this reduced matrix Jacobian, so the diagonals and the diffusion/convection terms. Again run with with -snes_mf_operator and the new PC subtype for Jacobi and you should again get the same convergence history. I made two steps because it makes it easier to validate and debug to get the same results as before. The first step cheats in that it still computes the full Jacobian but ignores the entries that we don't need to store for the preconditioner. The second step is more efficient because it only computes the Jacobian entries needed for the preconditioner but it requires you going through the Jacobian code and making sure only the needed parts are computed. If you have any questions please let me know. Barry On Aug 31, 2020, at 1:13 PM, Blondel, Sophie > wrote: Hi Barry, I ran the 2 cases to look at the effect of the Jacobi pre-conditionner: * 1D with 200 grid points and 7759 DOF per grid point (for the PSI application), for 20 TS: the factor between SOR and Jacobi is ~4 (976 MatMult for SOR and 4162 MatMult for Jacobi) * 2D with 63x63 grid points and 4124 DOF per grid point (for the NE application), for 20 TS: the factor is 1.5 (6657 for SOR, 10379 for Jacobi) Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : vendredi 28 ao?t 2020 18:31 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions On Aug 28, 2020, at 4:11 PM, Blondel, Sophie > wrote: Thank you Jed and Barry, First, attached are the logs from the benchmark runs I did without (log_std.txt) and with MF method (log_mf.txt). It took me some trouble to get the -log_view to work because I'm using push and pop for the options which means that PETSc is initialized with no argument so the command line argument was not taken into account, but I guess this is for a separate discussion. 
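On the -log_view side issue mentioned just above: when PETSc has to be initialized without the application's arguments, the command line can still be folded into the options database afterwards. A sketch, assuming argc/argv are reachable at that point in the application:

PetscInitializeNoArguments();
PetscOptionsInsert(NULL, &argc, &argv, NULL);          /* fold the real command line in */
PetscOptionsSetValue(NULL, "-pc_type", "fieldsplit");  /* programmatically pushed options still work */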
To answer questions about the current per-conditioners: * I used the same pre-conditioner options as listed in my previous email when I added the -snes_mf option; I did try to remove all the PC related options at one point with the MF method but didn't see a change in runtime so I put them back in * this benchmark is for a 1D DMDA using 20 grid points; when running in 2D or 3D I switch the PC options to: -pc_type fieldsplit -fieldsplit_0_pc_type sor -fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type gmres -ksp_type fgmres -fieldsplit_1_pc_gamg_threshold -1 I haven't tried a Jacobi PC instead of SOR, I will run a set of more realistic runs (1D and 2D) without MF but with Jacobi and report on it next week. When you say "iterations" do you mean what is given by -ksp_monitor? Yes, the number of MatMult is a good enough surrogate. So using matrix-free (which means no preconditioning) has 35846/160 ans = 224.0375 or 224 as many iterations. So even for this modest 1d problem preconditioning is doing a great deal. Barry Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : vendredi 28 ao?t 2020 12:12 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions [External Email] Sophie, This is exactly what i would expect. If you run with -ksp_monitor you will see the -snes_mf run takes many more iterations. I am puzzled that the argument -pc_type fieldsplit did not stop the run since this is under normal circumstances not a viable preconditioner with -snes_mf. Did you also remove the -pc_type fieldsplit argument? In order to see how one can avoid forming the entire matrix and use matrix-free to do the matrix-vector but still have an effective preconditioner let's look at what the current preconditioner options do. -pc_fieldsplit_detect_coupling creates two sub-preconditioners, the first for all the variables and the second for those that are coupled by the matrix to variables in neighboring cells Since only the smallest cluster sizes have diffusion/advection this second set contains only the cluster size one variables. -fieldsplit_0_pc_type sor Runs SOR on all the variables; you can think of this as running SOR on the reactions, it is a pretty good preconditioner for the reactions since the reactions are local, per cell. -fieldsplit_1_pc_type redundant This runs the default preconditioner (ILU) on just the variables that diffuse, i.e. the elliptic part. For smallish problems this is fine, for larger problems and 2d and 3d presumably you have also -redundant_pc_type gamg to use algebraic multigrid for the diffusion. This part of the matrix will always need to be formed and used in the preconditioner. It is very important since the diffusion is what brings in most of the ill-conditioning for larger problems into the linear system. Note that it only needs the matrix entries for the cluster size of 1 so it is very small compared to the entire sparse matrix. ---- The first preconditioner SOR requires ALL the matrix entries which are almost all (except for the diffusion terms) the coupling between different size clusters within a cell. Especially each cell has its own sparse matrix of the size of total number of clusters, it is sparse but not super sparse. So the to significantly lower memory usage we need to remove the SOR and the storing of all the matrix entries but still have an efficient preconditioner for the "reaction" terms. 
The simplest thing would be to use Jacobi instead of SOR for the first subpreconditioner since it only requires the diagonal entries in the matrix. But Jacobi is a worse preconditioner than SOR (since it totally ignores the matrix coupling) and sometimes can be much worse. Before anyone writes additional code we need to know if doing something along these lines does not ruin the convergence that. Have you used the same options as before but with -fieldsplit_0_pc_type jacobi ? (Not using any matrix free). We need to get an idea of how many more linear iterations it requires (not time, comparing time won't be helpful for this exercise.) We also need this information for realistic size problems in 2 or 3 dimensions that you really want to run; for small problems this approach will work ok and give misleading information about what happens for large problems. I suspect the iteration counts will shot up. Can you run some cases and see how the iteration counts change? Based on that we can decide if we still retain "good convergence" by changing the SOR to Jacobi and then change the code to make this change efficient (basically by skipping the explicit computation of the reaction Jacobian terms and using matrix-free on the outside of the PCFIELDSPLIT.) Barry On Aug 28, 2020, at 9:49 AM, Blondel, Sophie via petsc-users > wrote: Hi everyone, I have been using PETSc for a few years with a fully implicit TS ARKIMEX method and am now exploring the matrix free method option. Here is the list of PETSc options I typically use: -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 5 -snes_force_iteration -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -fieldsplit_0_pc_type sor -ts_max_snes_failures -1 -pc_fieldsplit_detect_coupling -ts_monitor -pc_type fieldsplit -fieldsplit_1_pc_type redundant -ts_max_steps 100 I started to compare the performance of the code without changing anything of the executable and simply adding "-snes_mf", I see a reduction of memory usage as expected and a benchmark that would usually take ~5min to run now takes ~50min. Reading the documentation I saw that there are a few option to play with the matrix free method like -snes_mf_err, -snes_mf_umin, or switching to -snes_mf_type wp. I used and modified the values of each of these options separately but never saw a sizable change in runtime, is it expected? And are there other ways to make the matrix free method faster? I saw in the documentation that you can define your own per-conditioner for instance. Let me know if you need additional information about the PETSc setup in the application I use. Best, Sophie -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Sep 16 15:38:43 2020 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 16 Sep 2020 15:38:43 -0500 Subject: [petsc-users] Matrix Free Method questions In-Reply-To: References: <5BDE8465-76BE-4132-BF4E-6784548AADC0@petsc.dev> <3329269A-EB37-41C9-9698-BA4631A1E18A@petsc.dev> <3E68F0AF-2F7D-4394-894A-3099EC80B9BC@petsc.dev> <600E6AA4-9534-4B39-B7E0-0218AB02E19A@petsc.dev> <60260FA5-BDAE-4F18-8310-D0F3C03B318D@petsc.dev> <4BA6D58A-89C3-44E2-AF34-F4AE94211DC4@petsc.dev> <6DEA0A2F-3020-4C3D-8726-7BE6346B86BB@petsc.dev> <56056045-E253-44BE-AE4C-7EFE44D867ED@petsc.dev> Message-ID: Yikes, GAMG is using a lot of vectors. But many of these are much smaller vectors so not of major concern. 
I think this will just have to be an ongoing issue to see where the vectors are created internally and reuse or eliminate as many extra as possible. The option -log_view_memory causes the PETSc logging summary to print additional columns showing the memory allocated during the different events in PETSc. This can be useful to see "when" the memory is mostly created; it does not tell us "why" it is created but at least tells us were to look. Barry > On Sep 16, 2020, at 1:54 PM, Blondel, Sophie wrote: > > Hi Barry, > > I don't think we're explicitly creating many PETSc vectors in Xolotl. There is a global one created for the solution when the TS is set up, and local ones in RHSFunction and RHSJacobian; everywhere else we just get the array from it with DMDAVecGetArrayDOF and DMDAVecRestoreArrayDOF. I tried a few things to see if it changed the number of Vec from 85 (removing monitors, fewer time steps, fewer MPI tasks) but it stayed the same, except when I changed the PC option from "-fieldsplit_1_pc_type redundant" to "-fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type gmres -ksp_type fgmres -fieldsplit_1_pc_gamg_threshold -1" where I got 10567 vectors. > > Cheers, > > Sophie > From: Barry Smith > > Sent: Tuesday, September 15, 2020 18:37 > To: Blondel, Sophie > > Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > > Subject: Re: [petsc-users] Matrix Free Method questions > > > Sophie, > > Great, everything looks good. > > So the new version takes about 7 times longer, due to the relatively modest increase (about 25 percent) in the number of iterations from the poorer preconditioner convergence and the rest from the much slower matrix-vector product due to using matrix free instead of matrix based precondtioner. Both of these are expected. > > The matrix is taking about 10% of the memory it used to require, also expected. > > I noticed in the logging the memory for the vectors > > Vector 85 85 82303208 0. > Matrix 15 15 8744032 0. > > is substantial/huge, with the much smaller matrix memory the vector memory dominates. > > It indicates 85 vectors are used. This is a large number, there are some needed for the TS (maybe 5?) and some needed for the KSP solve (maybe about 37) but I am not sure why there are so many. Perhaps this number could be reduced. Are there are lot of vectors created in the Xolotyl code? I would it could run with about 45 vectors. > > Barry > > > > >> On Sep 15, 2020, at 5:12 PM, Blondel, Sophie > wrote: >> >> Hi Barry, >> >> I fixed everything and re-ran the 4 cases in 1D. They took more time than before because I used the Kokkos serial backend on the Xolotl side instead of the CUDA one previously (long story short, I tried to update CUDA and messed up the whole installation). Step 4 looks much better than prevously, I was even able to remove MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE) from the code and it ran without throwing errors. The log files are attached. >> >> Cheers, >> >> Sophie >> From: Barry Smith > >> Sent: Friday, September 11, 2020 18:03 >> To: Blondel, Sophie > >> Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >> Subject: Re: [petsc-users] Matrix Free Method questions >> >> >> >>> On Sep 11, 2020, at 7:45 AM, Blondel, Sophie > wrote: >>> >>> Thank you Barry, >>> >>> Step 3 worked after I moved MatSetOption at the beginning of computeJacobian(). Attached is the updated log which is pretty similar to what I had before. Step 4 still uses many more iterations. 
>>> >>> I checked the Jacobian using -ksp_view_pmat ascii (on a simpler case), I can see the difference between step 3 and 4 is that the contribution from the reactions is not included in the step 4 Jacobian (as expected from the fact that I removed their setting from the code). >>> >>> Looking back at one of your previous email, you wrote "This routine should only compute the elements of the Jacobian needed for this reduced matrix Jacobian, so the diagonals and the diffusion/convection terms. ", does it mean that I should still include the contributions from the reactions that affect the pure diagonal terms? >> >> Yes, you need to leave in everything that affects the diagonal otherwise the "Jacobi" preconditioner will not reflect the true Jacobi preconditioner and likely perform poorly. >> >> Barry >> >>> >>> Cheers, >>> >>> Sophie >>> >>> From: Barry Smith > >>> Sent: Thursday, September 10, 2020 17:04 >>> To: Blondel, Sophie > >>> Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>> Subject: Re: [petsc-users] Matrix Free Method questions >>> >>> >>> >>>> On Sep 10, 2020, at 2:46 PM, Blondel, Sophie > wrote: >>>> >>>> Hi Barry, >>>> >>>> Going through the different changes again to understand what was going wrong with the last step, I discovered that my changes from 2 to 3 (keeping only the pure diagonal for the reaction Jacobian setup and adding MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) were wrong: the sparsity of the matrix was correct but then the RHSJacobian method was wrong. I updated it >>> >>> I'm not sure what you mean here. My hope was that in step 3 you won't need to change RHSJacobian at all (that is just for step 4). >>> >>>> but now when I run step 3 again I get the following error: >>>> >>>> [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>>> [2]PETSC ERROR: Argument out of range >>>> [2]PETSC ERROR: Inserting a new nonzero at global row/column (310400, 316825) into matrix >>>> [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >>>> [2]PETSC ERROR: Petsc Development GIT revision: v3.13.4-885-gf58a62b032 GIT Date: 2020-09-01 13:07:58 -0500 >>>> [2]PETSC ERROR: Unknown Name on a 20200902 named iguazu by bqo Thu Sep 10 15:38:58 2020 >>>> [2]PETSC ERROR: Configure options PETSC_DIR=/home2/bqo/libraries/petsc-barry PETSC_ARCH=20200902 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no --with-shared-libraries --download-fblaslapack=1 >>>> [2]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 606 in /home2/bqo/libraries/petsc-barry/src/mat/impls/aij/mpi/mpiaij.c >>>> [2]PETSC ERROR: #2 MatSetValues() line 1392 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c >>>> [2]PETSC ERROR: #3 MatSetValuesLocal() line 2207 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c >>>> [2]PETSC ERROR: #4 MatSetValuesStencil() line 1595 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c >>>> PetscSolverExpHandler::computeJacobian: MatSetValuesStencil (reactions) failed. >>>> >>>> Because the RHSJacobian method is trying to update the elements corresponding to the reactions. I'm not sure I understood correctly what step 3 was supposed to be. 
>>> >>> In step the three the RHSJacobian was suppose to be unchanged, only the option to ignore the "unneeded" Jacobian entries inside MatSetValues (set with MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) was needed (plus changing the DMDASetBlockFillsXXX argument). >>> >>> The error message Inserting a new nonzero at global row/column (310400, 316825) into matrix indicates that somehow the MatOption MAT_NEW_NONZERO_LOCATION_ERR is in control instead of the option MAT_NEW_NONZERO_LOCATIONS, when it is setting values the Jacobian values. >>> >>> The MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS_ERR,PETSC_TRUE);) is normally called inside the DMCreateMatrix() so I am not sure how they could be getting called in the wrong order but it seems somehow it is >>> >>> When do you call MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) in the code? You can call it at the beginning of computeJacobian(). >>> >>> If this still doesn't work and you get the same error you can run in the debugger on one process and put a breakpoint for MatSetOptions() to found out how the MAT_NEW_NONZERO_LOCATIONS_ERR comes in late to upset the apple cart. You should see MatSetOption() called at least twice and the last one should have the MAT_NEW_NONZERO_LOCATION flag. >>> >>> Barry >>> >>> >>> >>> >>>> >>>> Cheers, >>>> >>>> Sophie >>>> >>>> >>>> From: Barry Smith > >>>> Sent: Friday, September 4, 2020 01:06 >>>> To: Blondel, Sophie > >>>> Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>> Subject: Re: [petsc-users] Matrix Free Method questions >>>> >>>> >>>> Sophie, >>>> >>>> Thanks. I have started looking through the logs >>>> >>>> The change to matrix-free multiple (from 1 to 2) which reduces the accuracy of the multiply to about half the digits is not surprising. >>>> >>>> * It roughly doubles the time since doing the matrix-free product requires a function evaluation >>>> >>>> * It increases the iteration count, but not significantly since the reduced precision of the multiple induces some additional linear iterations >>>> >>>> The change from 2 to 3 (not storing the entire matrix) >>>> >>>> * number of nonzeros goes from 49459966 to 1558766 = 3.15 percent so it succeds in not storing the unneeded part of the matrix >>>> >>>> * the number of MatMult_MF goes from 2331 to 2418. I don't understand this, I expected it to be identical because it should be using the same preconditioner in 3 as in 2 and thus get the same convergence. Could possibility be due to the variability in convergence due to different runs with the matrix-free preconditioner preconditioner and not related to not storing the entire matrix. >>>> >>>> * the KSPSolve() time goes from 3.8774e+0 to 3.7855e+02 a trivial difference which is what I would expect >>>> >>>> * the SNESSolve time goes from 5.0047e+02 to 4.3275e+02 about a 14 percent drop which is reasonable because 3 doesn't spend as much time inserting matrix values (it still computes them but doesn't insert the ones we don't want for the preconditioner). >>>> >>>> The change from 3 to 4 >>>> >>>> * something goes seriously wrong here. The total number of linear solve iterations goes from 2282 to 97403 so something has gone seriously wrong with the preconditioner, but since the preconditioner operations are the same it seems something has gone wrong with the new reduced preconditioner. 
>>>> >>>> I think there is an error in computing the reduced matrix entries, that is the new compute Jacobian code is not computing the entries it needs to correctly. >>>> >>>> To debug this you can run case 3 and case 4 for a single time step with -ksp_view_pmat binary This should create a binary file with the initial Jacobian matrices in each. You can use Matlab or Python to do the difference in the matrices and see how possibly the new Jacobian computation code is not producing the correct values in some locations. >>>> >>>> Good luck, >>>> >>>> Barry >>>> >>>> >>>> >>>> >>>>> On Sep 3, 2020, at 12:26 PM, Blondel, Sophie > wrote: >>>>> >>>>> Hi Barry, >>>>> >>>>> Attached are the log files for the 1D case, for each of the 4 steps. I don't know how I did it yesterday but the differences between steps look better today, except for step 4 that takes many more iterations and smaller time steps. >>>>> >>>>> Cheers, >>>>> >>>>> Sophie >>>>> >>>>> De : Barry Smith > >>>>> Envoy? : mercredi 2 septembre 2020 15:53 >>>>> ? : Blondel, Sophie > >>>>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>>> Objet : Re: [petsc-users] Matrix Free Method questions >>>>> >>>>> >>>>> >>>>>> On Sep 2, 2020, at 1:44 PM, Blondel, Sophie > wrote: >>>>>> >>>>>> Thank you Barry, >>>>>> >>>>>> The code ran with your branch but it's much slower than running with the full Jacobian and Jacobi PC subtype (around 10 times slower). It is using less memory as expected. I tried step 2 as well and it's even slower. >>>>> >>>>> Sophie, >>>>> >>>>> That is puzzling. It should be using the same matrix in the solver so should be the same speed and the setup time should be a bit better since it does not form the full Jacobian. (We'll get to this later) >>>>> >>>>>> The TS iteration for step 1 are the same as with full Jacobian. Let me know what I can look at to check if I've done something wrong. >>>>> >>>>> We need to see if the KSP iterations are pretty similar for four approaches (1) original code with Jacobi PC subtype (2) matrix free with Jacobi PC (just add -snes_mf_operator to case 1) (3) the new code with the MatSetOption() to not store the entire Jacobian also with the -snes_mf_operator and (4) the new code that doesn't compute the unneeded part of the Jacobian also with the -snes_mf_operator >>>>> >>>>> You could run each case with same 20 timesteps and -ts_monitor -ksp_monitor and -ts_view and send the four output files around. >>>>> >>>>> Once we are sure the four cases are behaving as expected then you can get timings for them but let's not do that until we confirm the similar results. There could easily be a flaw in my reasoning or the PETSc code somewhere that affects the correctness so its best to check that first. >>>>> >>>>> >>>>> Barry >>>>> >>>>>> >>>>>> Cheers, >>>>>> >>>>>> Sophie >>>>>> De : Barry Smith > >>>>>> Envoy? : mardi 1 septembre 2020 14:12 >>>>>> ? : Blondel, Sophie > >>>>>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>>>> Objet : Re: [petsc-users] Matrix Free Method questions >>>>>> >>>>>> >>>>>> Sophie, >>>>>> >>>>>> Sorry, looks like an old bug in PETSc that was undetected due to lack of use. The code is trying to use the first of the two matrices to determine the preconditioner which won't work in your case since it is matrix-free. It should be using the second matrix. >>>>>> >>>>>> I hope the branch barry/2020-09-01/fix-fieldsplit-mf resolves this issue for you. 
>>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>>> On Sep 1, 2020, at 12:45 PM, Blondel, Sophie > wrote: >>>>>>> >>>>>>> Hi Barry, >>>>>>> >>>>>>> I'm working through step 1) but I think I am doing something wrong. I'm using DMDASetBlockFillsSparse to set the non-zeros only for the diffusing clusters (small He clusters here, from size 1 to 7) and all the diagonal entries. Then I added a few lines in the code: >>>>>>> Mat mat; >>>>>>> DMCreateMatrix(da, &mat); >>>>>>> MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); >>>>>>> >>>>>>> When I try to run with the following options: -snes_mf_operator -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 2 -snes_force_iteration -pc_fieldsplit_detect_coupling -pc_type fieldsplit -fieldsplit_0_pc_type jacobi -fieldsplit_1_pc_type redundant -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -ts_max_snes_failures -1 -ts_monitor -ts_max_steps 20 >>>>>>> >>>>>>> I get an error: >>>>>>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>>>>>> [0]PETSC ERROR: No support for this operation for this object type >>>>>>> [0]PETSC ERROR: Matrix type mffd does not have a find off block diagonal entries defined >>>>>>> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >>>>>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.13.4-851-gde18fec8da GIT Date: 2020-08-28 16:47:50 +0000 >>>>>>> [0]PETSC ERROR: Unknown Name on a 20200828 named sophie-Precision-5530 by sophie Tue Sep 1 10:58:44 2020 >>>>>>> [0]PETSC ERROR: Configure options PETSC_DIR=/home/sophie/Code/petsc PETSC_ARCH=20200828 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no --with-shared-libraries >>>>>>> [0]PETSC ERROR: #1 MatFindOffBlockDiagonalEntries() line 9847 in /home/sophie/Code/petsc/src/mat/interface/matrix.c >>>>>>> [0]PETSC ERROR: #2 PCFieldSplitSetDefaults() line 504 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c >>>>>>> [0]PETSC ERROR: #3 PCSetUp_FieldSplit() line 606 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c >>>>>>> [0]PETSC ERROR: #4 PCSetUp() line 1009 in /home/sophie/Code/petsc/src/ksp/pc/interface/precon.c >>>>>>> [0]PETSC ERROR: #5 KSPSetUp() line 406 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>> [0]PETSC ERROR: #6 KSPSolve_Private() line 658 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>> [0]PETSC ERROR: #7 KSPSolve() line 889 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>> [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in /home/sophie/Code/petsc/src/snes/impls/ls/ls.c >>>>>>> [0]PETSC ERROR: #9 SNESSolve() line 4524 in /home/sophie/Code/petsc/src/snes/interface/snes.c >>>>>>> [0]PETSC ERROR: #10 TSStep_ARKIMEX() line 811 in /home/sophie/Code/petsc/src/ts/impls/arkimex/arkimex.c >>>>>>> [0]PETSC ERROR: #11 TSStep() line 3731 in /home/sophie/Code/petsc/src/ts/interface/ts.c >>>>>>> [0]PETSC ERROR: #12 TSSolve() line 4128 in /home/sophie/Code/petsc/src/ts/interface/ts.c >>>>>>> PetscSolver::solve: TSSolve failed. >>>>>>> >>>>>>> Cheers, >>>>>>> >>>>>>> Sophie >>>>>>> De : Barry Smith > >>>>>>> Envoy? : lundi 31 ao?t 2020 14:50 >>>>>>> ? : Blondel, Sophie > >>>>>>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>>>>> Objet : Re: [petsc-users] Matrix Free Method questions >>>>>>> >>>>>>> >>>>>>> Sophie, >>>>>>> >>>>>>> Thanks. 
>>>>>>> >>>>>>> The factor of 4 is lot, the 1.5 not so bad. >>>>>>> >>>>>>> You will definitely want to retain the full matrix assembly codes for speed and to verify a reduced matrix version. >>>>>>> >>>>>>> It is worth trying a "reduced matrix version" with matrix-free multiply based on these numbers. This reduced matrix Jacobian will only have the diagonals and all the terms connected to the cluster sizes that move. In other words you will be building just the part of the Jacobian needed for the new preconditioner (PC subtype for Jacobi) and doing the matrix-vector product matrix free. (SOR requires all the Jacobian entries). >>>>>>> >>>>>>> Fortunately this is hopefully pretty straightforward for this code. You will not have to change the structure of the main code at all. >>>>>>> >>>>>>> Step 1) create a new "sparse matrix" that will be passed to DMDASetBlockFillsSparse(). This new "sparse matrix" needs to retain all the diagonal entries and also all the entries that are associated with the variables that diffuse. If I remember correctly these are just the smallest cluster size, plain Helium? >>>>>>> >>>>>>> Call MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); >>>>>>> >>>>>>> Then you would run the code with -snes_mf_operator and the new PC subtype for Jacobi. >>>>>>> >>>>>>> A test that the new reduced Jacobian is correct will be that you get almost the same iterations as the runs you just make using the PC subtype of Jacobi. Hopefully not slower and using a great deal less memory. The iterations will not be identical because of the matrix-free multiple. >>>>>>> >>>>>>> Step 2) create a new version of the Jacobian computation routine. This routine should only compute the elements of the Jacobian needed for this reduced matrix Jacobian, so the diagonals and the diffusion/convection terms. >>>>>>> >>>>>>> Again run with with -snes_mf_operator and the new PC subtype for Jacobi and you should again get the same convergence history. >>>>>>> >>>>>>> I made two steps because it makes it easier to validate and debug to get the same results as before. The first step cheats in that it still computes the full Jacobian but ignores the entries that we don't need to store for the preconditioner. The second step is more efficient because it only computes the Jacobian entries needed for the preconditioner but it requires you going through the Jacobian code and making sure only the needed parts are computed. >>>>>>> >>>>>>> >>>>>>> If you have any questions please let me know. >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> On Aug 31, 2020, at 1:13 PM, Blondel, Sophie > wrote: >>>>>>>> >>>>>>>> Hi Barry, >>>>>>>> >>>>>>>> I ran the 2 cases to look at the effect of the Jacobi pre-conditionner: >>>>>>>> 1D with 200 grid points and 7759 DOF per grid point (for the PSI application), for 20 TS: the factor between SOR and Jacobi is ~4 (976 MatMult for SOR and 4162 MatMult for Jacobi) >>>>>>>> 2D with 63x63 grid points and 4124 DOF per grid point (for the NE application), for 20 TS: the factor is 1.5 (6657 for SOR, 10379 for Jacobi) >>>>>>>> Cheers, >>>>>>>> >>>>>>>> Sophie >>>>>>>> De : Barry Smith > >>>>>>>> Envoy? : vendredi 28 ao?t 2020 18:31 >>>>>>>> ? 
: Blondel, Sophie > >>>>>>>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>>>>>> Objet : Re: [petsc-users] Matrix Free Method questions >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> On Aug 28, 2020, at 4:11 PM, Blondel, Sophie > wrote: >>>>>>>>> >>>>>>>>> Thank you Jed and Barry, >>>>>>>>> >>>>>>>>> First, attached are the logs from the benchmark runs I did without (log_std.txt) and with MF method (log_mf.txt). It took me some trouble to get the -log_view to work because I'm using push and pop for the options which means that PETSc is initialized with no argument so the command line argument was not taken into account, but I guess this is for a separate discussion. >>>>>>>>> >>>>>>>>> To answer questions about the current per-conditioners: >>>>>>>>> I used the same pre-conditioner options as listed in my previous email when I added the -snes_mf option; I did try to remove all the PC related options at one point with the MF method but didn't see a change in runtime so I put them back in >>>>>>>>> this benchmark is for a 1D DMDA using 20 grid points; when running in 2D or 3D I switch the PC options to: -pc_type fieldsplit -fieldsplit_0_pc_type sor -fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type gmres -ksp_type fgmres -fieldsplit_1_pc_gamg_threshold -1 >>>>>>>>> I haven't tried a Jacobi PC instead of SOR, I will run a set of more realistic runs (1D and 2D) without MF but with Jacobi and report on it next week. When you say "iterations" do you mean what is given by -ksp_monitor? >>>>>>>> >>>>>>>> Yes, the number of MatMult is a good enough surrogate. >>>>>>>> >>>>>>>> So using matrix-free (which means no preconditioning) has >>>>>>>> >>>>>>>> 35846/160 >>>>>>>> >>>>>>>> ans = >>>>>>>> >>>>>>>> 224.0375 >>>>>>>> >>>>>>>> or 224 as many iterations. So even for this modest 1d problem preconditioning is doing a great deal. >>>>>>>> >>>>>>>> Barry >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> >>>>>>>>> Sophie >>>>>>>>> De : Barry Smith > >>>>>>>>> Envoy? : vendredi 28 ao?t 2020 12:12 >>>>>>>>> ? : Blondel, Sophie > >>>>>>>>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>>>>>>> Objet : Re: [petsc-users] Matrix Free Method questions >>>>>>>>> >>>>>>>>> [External Email] >>>>>>>>> >>>>>>>>> Sophie, >>>>>>>>> >>>>>>>>> This is exactly what i would expect. If you run with -ksp_monitor you will see the -snes_mf run takes many more iterations. >>>>>>>>> >>>>>>>>> I am puzzled that the argument -pc_type fieldsplit did not stop the run since this is under normal circumstances not a viable preconditioner with -snes_mf. Did you also remove the -pc_type fieldsplit argument? >>>>>>>>> >>>>>>>>> In order to see how one can avoid forming the entire matrix and use matrix-free to do the matrix-vector but still have an effective preconditioner let's look at what the current preconditioner options do. >>>>>>>>> >>>>>>>>>> -pc_fieldsplit_detect_coupling >>>>>>>>> >>>>>>>>> creates two sub-preconditioners, the first for all the variables and the second for those that are coupled by the matrix to variables in neighboring cells Since only the smallest cluster sizes have diffusion/advection this second set contains only the cluster size one variables. >>>>>>>>> >>>>>>>>>> -fieldsplit_0_pc_type sor >>>>>>>>> >>>>>>>>> Runs SOR on all the variables; you can think of this as running SOR on the reactions, it is a pretty good preconditioner for the reactions since the reactions are local, per cell. 
>>>>>>>>> >>>>>>>>>> -fieldsplit_1_pc_type redundant >>>>>>>>> >>>>>>>>> >>>>>>>>> This runs the default preconditioner (ILU) on just the variables that diffuse, i.e. the elliptic part. For smallish problems this is fine, for larger problems and 2d and 3d presumably you have also -redundant_pc_type gamg to use algebraic multigrid for the diffusion. This part of the matrix will always need to be formed and used in the preconditioner. It is very important since the diffusion is what brings in most of the ill-conditioning for larger problems into the linear system. Note that it only needs the matrix entries for the cluster size of 1 so it is very small compared to the entire sparse matrix. >>>>>>>>> >>>>>>>>> ---- >>>>>>>>> The first preconditioner SOR requires ALL the matrix entries which are almost all (except for the diffusion terms) the coupling between different size clusters within a cell. Especially each cell has its own sparse matrix of the size of total number of clusters, it is sparse but not super sparse. >>>>>>>>> >>>>>>>>> So the to significantly lower memory usage we need to remove the SOR and the storing of all the matrix entries but still have an efficient preconditioner for the "reaction" terms. >>>>>>>>> >>>>>>>>> The simplest thing would be to use Jacobi instead of SOR for the first subpreconditioner since it only requires the diagonal entries in the matrix. But Jacobi is a worse preconditioner than SOR (since it totally ignores the matrix coupling) and sometimes can be much worse. >>>>>>>>> >>>>>>>>> Before anyone writes additional code we need to know if doing something along these lines does not ruin the convergence that. >>>>>>>>> >>>>>>>>> Have you used the same options as before but with -fieldsplit_0_pc_type jacobi ? (Not using any matrix free). We need to get an idea of how many more linear iterations it requires (not time, comparing time won't be helpful for this exercise.) We also need this information for realistic size problems in 2 or 3 dimensions that you really want to run; for small problems this approach will work ok and give misleading information about what happens for large problems. >>>>>>>>> >>>>>>>>> I suspect the iteration counts will shot up. Can you run some cases and see how the iteration counts change? >>>>>>>>> >>>>>>>>> Based on that we can decide if we still retain "good convergence" by changing the SOR to Jacobi and then change the code to make this change efficient (basically by skipping the explicit computation of the reaction Jacobian terms and using matrix-free on the outside of the PCFIELDSPLIT.) >>>>>>>>> >>>>>>>>> Barry >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Aug 28, 2020, at 9:49 AM, Blondel, Sophie via petsc-users > wrote: >>>>>>>>>> >>>>>>>>>> Hi everyone, >>>>>>>>>> >>>>>>>>>> I have been using PETSc for a few years with a fully implicit TS ARKIMEX method and am now exploring the matrix free method option. 
Here is the list of PETSc options I typically use: -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 5 -snes_force_iteration -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -fieldsplit_0_pc_type sor -ts_max_snes_failures -1 -pc_fieldsplit_detect_coupling -ts_monitor -pc_type fieldsplit -fieldsplit_1_pc_type redundant -ts_max_steps 100 >>>>>>>>>> >>>>>>>>>> I started to compare the performance of the code without changing anything of the executable and simply adding "-snes_mf", I see a reduction of memory usage as expected and a benchmark that would usually take ~5min to run now takes ~50min. Reading the documentation I saw that there are a few option to play with the matrix free method like -snes_mf_err, -snes_mf_umin, or switching to -snes_mf_type wp. I used and modified the values of each of these options separately but never saw a sizable change in runtime, is it expected? >>>>>>>>>> >>>>>>>>>> And are there other ways to make the matrix free method faster? I saw in the documentation that you can define your own per-conditioner for instance. Let me know if you need additional information about the PETSc setup in the application I use. >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> >>>>>>>>>> Sophie >>>>>>>>> >>>>>>>>> >>>>> >>>>> >>> >>> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexprescott at email.arizona.edu Wed Sep 16 18:09:22 2020 From: alexprescott at email.arizona.edu (Alexander B Prescott) Date: Wed, 16 Sep 2020 16:09:22 -0700 Subject: [petsc-users] Is using PETSc appropriate for this problem Message-ID: Hello PETSc listserv, This is an inquiry about code structure and the appropriateness of using SNES for a specific problem. I've found PETSc powerful and quite useful for my other problems, but for this application I'm concerned about computational overhead. Our setup involves many thousands of independent calls to the nonlinear solver on small subproblems, i.e. 2<=d.o.f.<=9. Speed of execution is the primary concern. Now straight to my questions: - does it even make sense to use PETSc for a problem like this? Would it be like using a nuclear reactor to warm a quesadilla? - if it does make sense, is it better to create/destroy the SNES structures with each new subproblem, OR to create the structures once and modify them every time? Best, Alexander -- Alexander Prescott alexprescott at email.arizona.edu PhD Candidate, The University of Arizona Department of Geosciences 1040 E. 4th Street Tucson, AZ, 85721 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Sep 16 20:36:09 2020 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 16 Sep 2020 20:36:09 -0500 Subject: [petsc-users] Is using PETSc appropriate for this problem In-Reply-To: References: Message-ID: Alexander, A few background questions. Do the small solves need to be done sequentially, that is is the input of one needed by the next or can many solves be done "at the same time". Would you be using Newton's method with an analytic Jacobian? For the larger problems 9 unknowns is there a consistent sparsity of the Jacobian (say 20 to 30 nonzeros) or are they essentially dense? Are the problems of varying nonlinearity, that is will some converge with say a couple of Newton iterations while others require more, say 8 or more Newton steps? 
> On Sep 16, 2020, at 6:09 PM, Alexander B Prescott wrote: > > Hello PETSc listserv, > > This is an inquiry about code structure and the appropriateness of using SNES for a specific problem. I've found PETSc powerful and quite useful for my other problems, but for this application I'm concerned about computational overhead. Our setup involves many thousands of independent calls to the nonlinear solver on small subproblems, i.e. 2<=d.o.f.<=9. Speed of execution is the primary concern. Now straight to my questions: > does it even make sense to use PETSc for a problem like this? Would it be like using a nuclear reactor to warm a quesadilla? There is a good deal of overhead for that small a problem size, but much of the overhead is in the initial construction of the PETSc objects, once they are created the extra overhead may be acceptable. There are plenty of tricks to bring down the extra overhead by avoiding the more expensive functions that are beneficial for larger problems but just add overhead for small problems, such as the calls to BLAS (and calls to more expensive linear solvers). The most extreme is to remove the use of the virtual functions and essentially inline everything, some of this might be automatable. > if it does make sense, is it better to create/destroy the SNES structures with each new subproblem, OR to create the structures once and modify them every time? You would definitely benefit from creating a SNES for each size 2 to 9 and reusing that one for all those of the same size. If you have hundreds of thousands that can be done simultaneously (but independently) then GPUs could perform extremely well. > Best, > Alexander > > -- > Alexander Prescott > alexprescott at email.arizona.edu > PhD Candidate, The University of Arizona > Department of Geosciences > 1040 E. 4th Street > Tucson, AZ, 85721 -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexprescott at email.arizona.edu Wed Sep 16 22:28:08 2020 From: alexprescott at email.arizona.edu (Alexander B Prescott) Date: Wed, 16 Sep 2020 20:28:08 -0700 Subject: [petsc-users] [EXT]Re: Is using PETSc appropriate for this problem In-Reply-To: References: Message-ID: Hi Barry, thank you for the thoughtful response, I've answered your questions below. Best, Alexander On Wed, Sep 16, 2020 at 6:36 PM Barry Smith wrote: > *External Email* > > Alexander, > > A few background questions. > > Do the small solves need to be done sequentially, that is is the > input of one needed by the next or can many solves be done "at the same > time". > Sequentially > Would you be using Newton's method with an analytic Jacobian? > Yes > For the larger problems 9 unknowns is there a consistent sparsity of > the Jacobian (say 20 to 30 nonzeros) or are they essentially dense? > Dense > Are the problems of varying nonlinearity, that is will some converge > with say a couple of Newton iterations while others require more, say 8 or > more Newton steps? > The nonlinearity should be pretty similar, the problem setup is the same at every node but the global domain needs to be traversed in a specific order. > > On Sep 16, 2020, at 6:09 PM, Alexander B Prescott < > alexprescott at email.arizona.edu> wrote: > > Hello PETSc listserv, > > This is an inquiry about code structure and the appropriateness of using > SNES for a specific problem. I've found PETSc powerful and quite useful for > my other problems, but for this application I'm concerned about > computational overhead. 
Our setup involves many thousands of independent > calls to the nonlinear solver on small subproblems, i.e. 2<=d.o.f.<=9. > Speed of execution is the primary concern. Now straight to my questions: > > - does it even make sense to use PETSc for a problem like this? Would > it be like using a nuclear reactor to warm a quesadilla? > > > There is a good deal of overhead for that small a problem size, but > much of the overhead is in the initial construction of the PETSc objects, > once they are created the extra overhead may be acceptable. There are > plenty of tricks to bring down the extra overhead by avoiding the more > expensive functions that are beneficial for larger problems but just add > overhead for small problems, such as the calls to BLAS (and calls to more > expensive linear solvers). The most extreme is to remove the use of the > virtual functions and essentially inline everything, some of this might be > automatable. > > > - if it does make sense, is it better to create/destroy the SNES > structures with each new subproblem, OR to create the structures once and > modify them every time? > > You would definitely benefit from creating a SNES for each size 2 to 9 > and reusing that one for all those of the same size. > > If you have hundreds of thousands that can be done simultaneously > (but independently) then GPUs could perform extremely well. > > > Best, > Alexander > > -- > Alexander Prescott > alexprescott at email.arizona.edu > PhD Candidate, The University of Arizona > Department of Geosciences > 1040 E. 4th Street > Tucson, AZ, 85721 > > > -- Alexander Prescott alexprescott at email.arizona.edu PhD Candidate, The University of Arizona Department of Geosciences 1040 E. 4th Street Tucson, AZ, 85721 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Sep 16 23:23:23 2020 From: jed at jedbrown.org (Jed Brown) Date: Wed, 16 Sep 2020 22:23:23 -0600 Subject: [petsc-users] [EXT]Re: Is using PETSc appropriate for this problem In-Reply-To: References: Message-ID: <87v9gduj44.fsf@jedbrown.org> Alexander B Prescott writes: >> Are the problems of varying nonlinearity, that is will some converge >> with say a couple of Newton iterations while others require more, say 8 or >> more Newton steps? >> > The nonlinearity should be pretty similar, the problem setup is the same at > every node but the global domain needs to be traversed in a specific order. It sounds like you may have a Newton solver now for each individual problem? If so, could you make a histogram of number of iterations necessary to solve? Does it have a long tail or does every problem take 3 and 4 iterations (for example). If there is no long tail, then you can batch. If there is a long tail, you really want a solver that does one problem at a time, or a more dynamic system that checks which have completed and shrinks the active problem down. (That complexity has a development and execution time cost.) 
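A minimal sketch of collecting such a histogram, assuming a single SNES is reused for the small subproblems; nproblems, the context update, and MAXITS are placeholders (error checks omitted):

  #define MAXITS 20
  PetscInt hist[MAXITS + 1] = {0};

  for (PetscInt p = 0; p < nproblems; p++) {
    PetscInt            its;
    SNESConvergedReason reason;

    /* update the application context and the initial guess x for subproblem p here */
    SNESSolve(snes, NULL, x);
    SNESGetIterationNumber(snes, &its);       /* Newton iterations actually taken */
    SNESGetConvergedReason(snes, &reason);
    if (reason > 0) hist[PetscMin(its, MAXITS)]++;
  }
  for (PetscInt i = 0; i <= MAXITS; i++) {
    if (hist[i]) PetscPrintf(PETSC_COMM_SELF, "%D Newton iterations: %D subproblems\n", i, hist[i]);
  }

If nearly everything lands at a few iterations, the batching route looks attractive; a long tail argues for handling the problems one at a time, as described above.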
From bsmith at petsc.dev Thu Sep 17 00:56:26 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 17 Sep 2020 00:56:26 -0500 Subject: [petsc-users] [EXT]Re: Is using PETSc appropriate for this problem In-Reply-To: <87v9gduj44.fsf@jedbrown.org> References: <87v9gduj44.fsf@jedbrown.org> Message-ID: <0AF52116-E40F-4D3D-92F1-8E6F5DB9ACF3@petsc.dev> > On Sep 16, 2020, at 11:23 PM, Jed Brown wrote: > > Alexander B Prescott writes: > >>> Are the problems of varying nonlinearity, that is will some converge >>> with say a couple of Newton iterations while others require more, say 8 or >>> more Newton steps? >>> >> The nonlinearity should be pretty similar, the problem setup is the same at >> every node but the global domain needs to be traversed in a specific order. Sounds a bit like a non-smoother (Gauss-Seidel type), speculating based on these few words. > > > It sounds like you may have a Newton solver now for each individual problem? If so, could you make a histogram of number of iterations necessary to solve? Does it have a long tail or does every problem take 3 and 4 iterations (for example). > > If there is no long tail, then you can batch. If there is a long tail, you really want a solver that does one problem at a time, or a more dynamic system that checks which have completed and shrinks the active problem down. (That complexity has a development and execution time cost.) From knepley at gmail.com Thu Sep 17 08:04:41 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 17 Sep 2020 09:04:41 -0400 Subject: [petsc-users] [EXT]Re: Is using PETSc appropriate for this problem In-Reply-To: <87v9gduj44.fsf@jedbrown.org> References: <87v9gduj44.fsf@jedbrown.org> Message-ID: On Thu, Sep 17, 2020 at 12:23 AM Jed Brown wrote: > Alexander B Prescott writes: > > >> Are the problems of varying nonlinearity, that is will some > converge > >> with say a couple of Newton iterations while others require more, say 8 > or > >> more Newton steps? > >> > > The nonlinearity should be pretty similar, the problem setup is the same > at > > every node but the global domain needs to be traversed in a specific > order. > > > It sounds like you may have a Newton solver now for each individual > problem? If so, could you make a histogram of number of iterations > necessary to solve? Does it have a long tail or does every problem take 3 > and 4 iterations (for example). > > If there is no long tail, then you can batch. If there is a long tail, > you really want a solver that does one problem at a time, or a more dynamic > system that checks which have completed and shrinks the active problem > down. (That complexity has a development and execution time cost.) > He cannot batch if the solves are sequential, as he says above. Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From yann.jobic at univ-amu.fr Thu Sep 17 09:48:27 2020 From: yann.jobic at univ-amu.fr (Yann Jobic) Date: Thu, 17 Sep 2020 16:48:27 +0200 Subject: [petsc-users] 1D parallel split of dense matrix Message-ID: Hi all, I want to decompose a matrix in parallel for only one direction, the columns, for a dense matrix, loaded from a binary file. 
I tried to use : MatCreateDense(PETSC_COMM_WORLD,NumberOfRhs,matLocNbCols,NumberOfRhs,matNbCols,NULL,&NRhs); with matLocNbCols the number of local columns of a linear system (matrix A). However i've got an error when i use MatLoad : Read from file failed, read past end of file. Maybe because the number of local rows is equal to the number of global ones (variable NumberOfRhs) ? The sequential version runs fine. Is it possible ? I'm trying to set up a specific rhs matrix in order to solve the same linear system (ksp) with different right hand sides (rhs). In order to do so, i transposed the matrix containing all the rhs in a file, which i would like to load in parallel. The idea is to use MatGetRow, in order to create the rhs Vec, and then call KSPSolve (the inverse matrix of A should have been kept). However, i would like a particular setup (1D parallel split) for the parallel layout of the rhs matrix : The number of columns of A is equal to the number of column of the matrix rhs, and the number of rows of rhs is equal to the number of rhs that i want to solve. I want this rhs to be split in the same way of the linear system, and i want each processors to have the whole global number of rows, that is to say, i want a 1D decomposition of my rhs matrix (corresponding to MatCreateVecs). I then have to copy a row of rhs in a vec, and solve, for all the rows of rhs. Is it a bad way to handle a N rhs problem with KSPSolve ? Thanks, Yann ps : the error : Reading ComplexMatrix.bin Matrix... Matrix ComplexMatrix.bin read global size of A matrix (583335,583335) PROC : 1 Number of local columns of the matrix A : 291667 PROC : 1 Number of local columns of the vector : 291667 global size of NRhs Matrix : (15,583335) PROC : 0 Number of local columns of the matrix A : 291668 PROC : 0 Number of local columns of the vector : 291668 Reading NRhs ComplexRhs.bin Matrix... [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Read from file failed [0]PETSC ERROR: Read past end of file From adantra at gmail.com Thu Sep 17 11:00:32 2020 From: adantra at gmail.com (Adolfo Rodriguez) Date: Thu, 17 Sep 2020 11:00:32 -0500 Subject: [petsc-users] Overhead in snes function calls Message-ID: I am porting a non-linear solution problem implemented explicitly to an snes implementation. Everything seems to be working fine, except for the fact that function calls done from snes to my jacobian and residual construction subroutines are slower than the regular direct call to the same subroutines (I hope this is clear). I wonder if somebody have observed this behavior and found a solution. Regards, Adolfo -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre.jolivet at enseeiht.fr Thu Sep 17 11:08:53 2020 From: pierre.jolivet at enseeiht.fr (Pierre Jolivet) Date: Thu, 17 Sep 2020 18:08:53 +0200 Subject: [petsc-users] 1D parallel split of dense matrix Message-ID: Hello Yann, This is probably not fully answering your question, but the proper way to solve a system with N RHS is _not_ to use KSPSolve(), but instead KSPMatSolve(), cf. https://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/KSP/KSPMatSolve.html . If you are tracking master (from the GitLab repository), it?s available out of the box. If you are using the release tarballs, it will be available in 3.14.0 scheduled to be released in a couple of days. 
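A rough sketch of that path for this use case, assuming the system matrix A and the KSP already exist; Nrhs and the viewer holding the right-hand sides are placeholders, and each right-hand side is stored as a column of a dense matrix whose rows follow the row layout of A (error checks omitted):

  Mat      B, X;                    /* right-hand sides and solutions, one per column */
  PetscInt mloc, M;

  MatGetLocalSize(A, &mloc, NULL);  /* match the row layout of A                      */
  MatGetSize(A, &M, NULL);
  MatCreateDense(PETSC_COMM_WORLD, mloc, PETSC_DECIDE, M, Nrhs, NULL, &B);
  MatCreateDense(PETSC_COMM_WORLD, mloc, PETSC_DECIDE, M, Nrhs, NULL, &X);
  /* fill B here, e.g. MatLoad(B, viewer) if the right-hand sides are stored column-wise on disk */

  KSPSetOperators(ksp, A, A);
  KSPSetFromOptions(ksp);
  KSPMatSolve(ksp, B, X);           /* solves A X = B for all columns together        */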
If you want to know more about the current status of block solvers in PETSc, please feel free to have a look at this preprint: http://jolivet.perso.enseeiht.fr/article.pdf If you are using a specific PC which is not ?multi-RHS ready?, see the list at the top of page 5, please let me know and I?ll tell you how easy to support it. Thanks, Pierre -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Thu Sep 17 11:29:12 2020 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Thu, 17 Sep 2020 18:29:12 +0200 Subject: [petsc-users] 1D parallel split of dense matrix In-Reply-To: References: Message-ID: Yann you want to have the number of columns equal to the number of rhs. Not the number of rows. This is also consistent in terms of logical layouts. Simply create and dump the matrix this way and you can read in parallel splitting yhe rows ( ie the logical size per process of the vectors) Il Gio 17 Set 2020, 18:09 Pierre Jolivet ha scritto: > Hello Yann, > This is probably not fully answering your question, but the proper way to > solve a system with N RHS is _not_ to use KSPSolve(), but instead > KSPMatSolve(), cf. > https://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/KSP/KSPMatSolve.html > . > If you are tracking master (from the GitLab repository), it?s available > out of the box. If you are using the release tarballs, it will be available > in 3.14.0 scheduled to be released in a couple of days. > If you want to know more about the current status of block solvers in > PETSc, please feel free to have a look at this preprint: > http://jolivet.perso.enseeiht.fr/article.pdf > If you are using a specific PC which is not ?multi-RHS ready?, see the > list at the top of page 5, please let me know and I?ll tell you how easy to > support it. > Thanks, > Pierre > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Sep 17 12:05:00 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 17 Sep 2020 13:05:00 -0400 Subject: [petsc-users] Overhead in snes function calls In-Reply-To: References: Message-ID: On Thu, Sep 17, 2020 at 12:00 PM Adolfo Rodriguez wrote: > I am porting a non-linear solution problem implemented explicitly to an > snes implementation. Everything seems to be working fine, except for the > fact that function calls done from snes to my jacobian and residual > construction subroutines are slower than the regular direct call to the > same subroutines (I hope this is clear). I wonder if somebody have observed > this behavior and found a solution. > It is not quite clear. I do not see a way that the calls themselves are slower, but perhaps there are intervening computations? How are you timing things. Thanks, Matt > Regards, > > Adolfo > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From adantra at gmail.com Thu Sep 17 12:55:09 2020 From: adantra at gmail.com (Adolfo Rodriguez) Date: Thu, 17 Sep 2020 12:55:09 -0500 Subject: [petsc-users] Overhead in snes function calls In-Reply-To: References: Message-ID: Sorry I was not clear before. I have to call some functions in order to assemble the Jacobian. 
I created a FormJacobian function according to the specifications and set this function using SNESSetJacobian (snes,J,J,FormJacobian,&ctx); within FormJacobian I need to make some calls to some functions implemented in my code which is written in c++. These functions are passed through the context pointer (ctx) I noticed that when I call these functions within FormJacobian, there is an overhead which in my case is about 0.08 seconds per call and I need to make a few calls like this. I am measuring the time by doing clock_t t0; t0=clock(); function call dtime = (double)(clock()-t0)/CLOCKS_PER_SEC; When I do this call and compute the time anywhere in y code I found dtime = x seconds, while if I make the same call inside FormJacobian I get dtime ~ x + 0.08 seconds. I hope the explanation is clear. Any clue? Adolfo On Thu, Sep 17, 2020 at 12:05 PM Matthew Knepley wrote: > On Thu, Sep 17, 2020 at 12:00 PM Adolfo Rodriguez > wrote: > >> I am porting a non-linear solution problem implemented explicitly to an >> snes implementation. Everything seems to be working fine, except for the >> fact that function calls done from snes to my jacobian and residual >> construction subroutines are slower than the regular direct call to the >> same subroutines (I hope this is clear). I wonder if somebody have observed >> this behavior and found a solution. >> > > It is not quite clear. I do not see a way that the calls themselves are > slower, but perhaps there are intervening computations? How are you timing > things. > > Thanks, > > Matt > > >> Regards, >> >> Adolfo >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Sep 17 13:52:07 2020 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 17 Sep 2020 14:52:07 -0400 Subject: [petsc-users] osx error Message-ID: I rebased over master and started getting this error. I did reinstall MPICH (brew) recently. Any ideas? Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 936555 bytes Desc: not available URL: From alexprescott at email.arizona.edu Thu Sep 17 13:53:48 2020 From: alexprescott at email.arizona.edu (Alexander B Prescott) Date: Thu, 17 Sep 2020 11:53:48 -0700 Subject: [petsc-users] [EXT]Re: Is using PETSc appropriate for this problem In-Reply-To: References: <87v9gduj44.fsf@jedbrown.org> Message-ID: Thank you all for your input. Matt is right, I cannot batch as this formulation must be done sequentially. >> Sounds a bit like a non-smoother (Gauss-Seidel type), speculating based on these few words. Barry, it is similar to a Gauss-Seidel solver in that solution updates from previous solves are used in the most recent Newton solve, though I'm not exactly sure what you mean by "non-smoother". Best, Alexander On Thu, Sep 17, 2020 at 6:06 AM Matthew Knepley wrote: > *External Email* > On Thu, Sep 17, 2020 at 12:23 AM Jed Brown wrote: > >> Alexander B Prescott writes: >> >> >> Are the problems of varying nonlinearity, that is will some >> converge >> >> with say a couple of Newton iterations while others require more, say >> 8 or >> >> more Newton steps? 
>> >> >> > The nonlinearity should be pretty similar, the problem setup is the >> same at >> > every node but the global domain needs to be traversed in a specific >> order. >> >> >> It sounds like you may have a Newton solver now for each individual >> problem? If so, could you make a histogram of number of iterations >> necessary to solve? Does it have a long tail or does every problem take 3 >> and 4 iterations (for example). >> >> If there is no long tail, then you can batch. If there is a long tail, >> you really want a solver that does one problem at a time, or a more dynamic >> system that checks which have completed and shrinks the active problem >> down. (That complexity has a development and execution time cost.) >> > > He cannot batch if the solves are sequential, as he says above. > > Matt > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- Alexander Prescott alexprescott at email.arizona.edu PhD Candidate, The University of Arizona Department of Geosciences 1040 E. 4th Street Tucson, AZ, 85721 -------------- next part -------------- An HTML attachment was scrubbed... URL: From adantra at gmail.com Thu Sep 17 15:30:12 2020 From: adantra at gmail.com (Adolfo Rodriguez) Date: Thu, 17 Sep 2020 15:30:12 -0500 Subject: [petsc-users] Overhead in snes function calls In-Reply-To: References: Message-ID: Disregard this thread. I think the problem has to do with my data structure. Thanks! On Thu, Sep 17, 2020 at 12:55 PM Adolfo Rodriguez wrote: > Sorry I was not clear before. > > I have to call some functions in order to assemble the Jacobian. I created > a FormJacobian function according to the specifications and set this > function using > > SNESSetJacobian (snes,J,J,FormJacobian,&ctx); > > within FormJacobian I need to make some calls to some functions implemented in my code which is written in c++. These functions are passed through the context pointer (ctx) > > I noticed that when I call these functions within FormJacobian, there is an overhead which in my case is about 0.08 seconds per call and I need to make a few calls like this. > > I am measuring the time by doing > > clock_t t0; > > t0=clock(); > > function call > > dtime = (double)(clock()-t0)/CLOCKS_PER_SEC; > > > When I do this call and compute the time anywhere in y code I found dtime = x seconds, while if I make the same call inside FormJacobian I get dtime ~ x + 0.08 seconds. > > > I hope the explanation is clear. > > > Any clue? > > > Adolfo > > > On Thu, Sep 17, 2020 at 12:05 PM Matthew Knepley > wrote: > >> On Thu, Sep 17, 2020 at 12:00 PM Adolfo Rodriguez >> wrote: >> >>> I am porting a non-linear solution problem implemented explicitly to an >>> snes implementation. Everything seems to be working fine, except for the >>> fact that function calls done from snes to my jacobian and residual >>> construction subroutines are slower than the regular direct call to the >>> same subroutines (I hope this is clear). I wonder if somebody have observed >>> this behavior and found a solution. >>> >> >> It is not quite clear. I do not see a way that the calls themselves are >> slower, but perhaps there are intervening computations? How are you timing >> things. 
>> >> Thanks, >> >> Matt >> >> >>> Regards, >>> >>> Adolfo >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Sep 17 16:05:50 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 17 Sep 2020 17:05:50 -0400 Subject: [petsc-users] [EXT]Re: Is using PETSc appropriate for this problem In-Reply-To: References: <87v9gduj44.fsf@jedbrown.org> Message-ID: On Thu, Sep 17, 2020 at 2:54 PM Alexander B Prescott < alexprescott at email.arizona.edu> wrote: > Thank you all for your input. Matt is right, I cannot batch as this > formulation must be done sequentially. > > >> Sounds a bit like a non-smoother (Gauss-Seidel type), speculating > based on these few words. > > Barry, it is similar to a Gauss-Seidel solver in that solution updates > from previous solves are used in the most recent Newton solve, though I'm > not exactly sure what you mean by "non-smoother". > He means a nonlinear smoother. You iterate over your domain solving small nonlinear problems in order to get closer to the solution of the big nonlinear problem. Depending on what you are doing, it might be possible to decouple these, which would likely be much more efficient. Thanks, Matt > Best, > Alexander > > > > On Thu, Sep 17, 2020 at 6:06 AM Matthew Knepley wrote: > >> *External Email* >> On Thu, Sep 17, 2020 at 12:23 AM Jed Brown wrote: >> >>> Alexander B Prescott writes: >>> >>> >> Are the problems of varying nonlinearity, that is will some >>> converge >>> >> with say a couple of Newton iterations while others require more, say >>> 8 or >>> >> more Newton steps? >>> >> >>> > The nonlinearity should be pretty similar, the problem setup is the >>> same at >>> > every node but the global domain needs to be traversed in a specific >>> order. >>> >>> >>> It sounds like you may have a Newton solver now for each individual >>> problem? If so, could you make a histogram of number of iterations >>> necessary to solve? Does it have a long tail or does every problem take 3 >>> and 4 iterations (for example). >>> >>> If there is no long tail, then you can batch. If there is a long tail, >>> you really want a solver that does one problem at a time, or a more dynamic >>> system that checks which have completed and shrinks the active problem >>> down. (That complexity has a development and execution time cost.) >>> >> >> He cannot batch if the solves are sequential, as he says above. >> >> Matt >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > > > -- > Alexander Prescott > alexprescott at email.arizona.edu > PhD Candidate, The University of Arizona > Department of Geosciences > 1040 E. 4th Street > Tucson, AZ, 85721 > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yann.jobic at univ-amu.fr Thu Sep 17 16:26:20 2020 From: yann.jobic at univ-amu.fr (Yann Jobic) Date: Thu, 17 Sep 2020 23:26:20 +0200 Subject: [petsc-users] 1D parallel split of dense matrix In-Reply-To: References: Message-ID: <40fb5134-f43f-eadc-5531-313d6e6db18a@univ-amu.fr> Hello Pierre, I just tested it, it's working just fine ! I thought i could have problems using mumps, as i rapidly read your article, and that i may be using MatMatSolve, but no, it's working. I have correct results. This interface for multi-RHS is indeed very handy. One question : I used KSPSetMatSolveBlockSize(ksp,1); I don't know what it is doing, i saw it on your example. Is it mandatory ? Thanks a lot, Yann Le 9/17/2020 ? 6:08 PM, Pierre Jolivet a ?crit?: > Hello Yann, > This is probably not fully answering your question, but the proper way > to solve a system with N RHS is _not_ to use KSPSolve(), but instead > KSPMatSolve(), cf. > https://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/KSP/KSPMatSolve.html. > If you are tracking master (from the GitLab repository), it?s available > out of the box. If you are using the release tarballs, it will be > available in 3.14.0 scheduled to be released in a couple of days. > If you want to know more about the current status of block solvers in > PETSc, please feel free to have a look at this preprint: > http://jolivet.perso.enseeiht.fr/article.pdf > If you are using a specific PC which is not ?multi-RHS ready?, see the > list at the top of page 5, please let me know and I?ll tell you how easy > to support it. > Thanks, > Pierre From balay at mcs.anl.gov Thu Sep 17 16:31:23 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 17 Sep 2020 16:31:23 -0500 (CDT) Subject: [petsc-users] osx error In-Reply-To: References: Message-ID: On Thu, 17 Sep 2020, Mark Adams wrote: > I rebased over master and started getting this error. > I did reinstall MPICH (brew) recently. > Any ideas? > Thanks, > Mark ------------------------------------------------------------------------------- Exception: Your hostname will not work with MPI, perhaps you have VPN running whose network settings may not play well with MPI or your network is misconfigured ******************************************************************************* Ok - this is a new test that got added to check for broken network that breaks MPI. Here is the check: Executing: ping -c 2 MarksMac-302.local The check says you have broken network settings. [as its not responding to ping.] Does MPI work fine on this box? You can try disabling this check (manually) - and do the build, and run Does MPI run fine? Satish >>>>>>>>>>> diff --git a/config/BuildSystem/config/packages/MPI.py b/config/BuildSystem/config/packages/MPI.py index 2e130fdcfe..8464de6773 100644 --- a/config/BuildSystem/config/packages/MPI.py +++ b/config/BuildSystem/config/packages/MPI.py @@ -267,7 +267,7 @@ shared libraries and run with --known-mpi-shared-libraries=1') if ret != 0: raise RuntimeError(errormessage+" Return code %s\n" % ret) except: - raise RuntimeError("Exception: "+errormessage) + pass else: self.logPrint("Unable to get result from hostname, skipping ping check\n") From yann.jobic at univ-amu.fr Thu Sep 17 16:36:05 2020 From: yann.jobic at univ-amu.fr (Yann Jobic) Date: Thu, 17 Sep 2020 23:36:05 +0200 Subject: [petsc-users] 1D parallel split of dense matrix In-Reply-To: References: Message-ID: Hi Stefano, You're right, i just switched the rows and colomns, and it's working fine. 
However, i've done this in order to have better performance when i access the rows, with MatGetRow. With this order, i need MatGetColumnVector, which is believe is kind of slow, as stated in the comments of the implementation (but i couldn't find it again, maybe i was looking at an old implementation?). I still don't understand why i can not use the transpose of this matrix, as i give the right parallel decomposition. It should be a bijective operation no ? I think i'll be using the KSPMatSolve of Pierre, so i don't have to redevelop this part. Thanks a lot, Yann Le 9/17/2020 ? 6:29 PM, Stefano Zampini a ?crit?: > Yann > > ?you want to have the number of columns equal to the number of rhs. Not > the number of rows. > This is also consistent in terms of logical layouts. Simply create and > dump the matrix this way and you can read in parallel splitting yhe rows > ( ie the logical size per process of the vectors) > > Il Gio 17 Set 2020, 18:09 Pierre Jolivet > ha scritto: > > Hello Yann, > This is probably not fully answering your question, but the proper > way to solve a system with N RHS is _not_ to use KSPSolve(), but > instead KSPMatSolve(), cf. > https://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/KSP/KSPMatSolve.html. > If you are tracking master (from the GitLab repository), it?s > available out of the box. If you are using the release tarballs, it > will be available in 3.14.0 scheduled to be released in a couple of > days. > If you want to know more about the current status of block solvers > in PETSc, please feel free to have a look at this preprint: > http://jolivet.perso.enseeiht.fr/article.pdf > If you are using a specific PC which is not ?multi-RHS ready?, see > the list at the top of page 5, please let me know and I?ll tell you > how easy to support it. > Thanks, > Pierre > From knepley at gmail.com Thu Sep 17 16:47:42 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 17 Sep 2020 17:47:42 -0400 Subject: [petsc-users] osx error In-Reply-To: References: Message-ID: We should have an option to disable the test. Matt On Thu, Sep 17, 2020 at 5:31 PM Satish Balay via petsc-users < petsc-users at mcs.anl.gov> wrote: > On Thu, 17 Sep 2020, Mark Adams wrote: > > > I rebased over master and started getting this error. > > I did reinstall MPICH (brew) recently. > > Any ideas? > > Thanks, > > Mark > > > ------------------------------------------------------------------------------- > Exception: Your hostname will not work with MPI, perhaps you have VPN > running whose network settings may not play well with MPI or your network > is misconfigured > > ******************************************************************************* > > > Ok - this is a new test that got added to check for broken network that > breaks MPI. > > Here is the check: > > Executing: ping -c 2 MarksMac-302.local > > The check says you have broken network settings. [as its not responding to > ping.] > > Does MPI work fine on this box? You can try disabling this check > (manually) - and do the build, and run > > Does MPI run fine? 
> > Satish > > > >>>>>>>>>>> > > diff --git a/config/BuildSystem/config/packages/MPI.py > b/config/BuildSystem/config/packages/MPI.py > index 2e130fdcfe..8464de6773 100644 > --- a/config/BuildSystem/config/packages/MPI.py > +++ b/config/BuildSystem/config/packages/MPI.py > @@ -267,7 +267,7 @@ shared libraries and run with > --known-mpi-shared-libraries=1') > if ret != 0: > raise RuntimeError(errormessage+" Return code %s\n" % ret) > except: > - raise RuntimeError("Exception: "+errormessage) > + pass > else: > self.logPrint("Unable to get result from hostname, skipping > ping check\n") > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Thu Sep 17 16:59:38 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 17 Sep 2020 16:59:38 -0500 (CDT) Subject: [petsc-users] osx error In-Reply-To: References: Message-ID: Here is a fix: echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts Satish On Thu, 17 Sep 2020, Matthew Knepley wrote: > We should have an option to disable the test. > > Matt > > On Thu, Sep 17, 2020 at 5:31 PM Satish Balay via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > On Thu, 17 Sep 2020, Mark Adams wrote: > > > > > I rebased over master and started getting this error. > > > I did reinstall MPICH (brew) recently. > > > Any ideas? > > > Thanks, > > > Mark > > > > > > ------------------------------------------------------------------------------- > > Exception: Your hostname will not work with MPI, perhaps you have VPN > > running whose network settings may not play well with MPI or your network > > is misconfigured > > > > ******************************************************************************* > > > > > > Ok - this is a new test that got added to check for broken network that > > breaks MPI. > > > > Here is the check: > > > > Executing: ping -c 2 MarksMac-302.local > > > > The check says you have broken network settings. [as its not responding to > > ping.] > > > > Does MPI work fine on this box? You can try disabling this check > > (manually) - and do the build, and run > > > > Does MPI run fine? > > > > Satish > > > > > > >>>>>>>>>>> > > > > diff --git a/config/BuildSystem/config/packages/MPI.py > > b/config/BuildSystem/config/packages/MPI.py > > index 2e130fdcfe..8464de6773 100644 > > --- a/config/BuildSystem/config/packages/MPI.py > > +++ b/config/BuildSystem/config/packages/MPI.py > > @@ -267,7 +267,7 @@ shared libraries and run with > > --known-mpi-shared-libraries=1') > > if ret != 0: > > raise RuntimeError(errormessage+" Return code %s\n" % ret) > > except: > > - raise RuntimeError("Exception: "+errormessage) > > + pass > > else: > > self.logPrint("Unable to get result from hostname, skipping > > ping check\n") > > > > > > From balay at mcs.anl.gov Thu Sep 17 17:00:26 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 17 Sep 2020 17:00:26 -0500 (CDT) Subject: [petsc-users] osx error In-Reply-To: References: Message-ID: But would really like to know if the test is a false positive. Satish On Thu, 17 Sep 2020, Satish Balay wrote: > Here is a fix: > > echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts > > Satish > > On Thu, 17 Sep 2020, Matthew Knepley wrote: > > > We should have an option to disable the test. 
> > > > Matt > > > > On Thu, Sep 17, 2020 at 5:31 PM Satish Balay via petsc-users < > > petsc-users at mcs.anl.gov> wrote: > > > > > On Thu, 17 Sep 2020, Mark Adams wrote: > > > > > > > I rebased over master and started getting this error. > > > > I did reinstall MPICH (brew) recently. > > > > Any ideas? > > > > Thanks, > > > > Mark > > > > > > > > > ------------------------------------------------------------------------------- > > > Exception: Your hostname will not work with MPI, perhaps you have VPN > > > running whose network settings may not play well with MPI or your network > > > is misconfigured > > > > > > ******************************************************************************* > > > > > > > > > Ok - this is a new test that got added to check for broken network that > > > breaks MPI. > > > > > > Here is the check: > > > > > > Executing: ping -c 2 MarksMac-302.local > > > > > > The check says you have broken network settings. [as its not responding to > > > ping.] > > > > > > Does MPI work fine on this box? You can try disabling this check > > > (manually) - and do the build, and run > > > > > > Does MPI run fine? > > > > > > Satish > > > > > > > > > >>>>>>>>>>> > > > > > > diff --git a/config/BuildSystem/config/packages/MPI.py > > > b/config/BuildSystem/config/packages/MPI.py > > > index 2e130fdcfe..8464de6773 100644 > > > --- a/config/BuildSystem/config/packages/MPI.py > > > +++ b/config/BuildSystem/config/packages/MPI.py > > > @@ -267,7 +267,7 @@ shared libraries and run with > > > --known-mpi-shared-libraries=1') > > > if ret != 0: > > > raise RuntimeError(errormessage+" Return code %s\n" % ret) > > > except: > > > - raise RuntimeError("Exception: "+errormessage) > > > + pass > > > else: > > > self.logPrint("Unable to get result from hostname, skipping > > > ping check\n") > > > > > > > > > > > > From jacob.fai at gmail.com Thu Sep 17 17:02:21 2020 From: jacob.fai at gmail.com (Jacob Faibussowitsch) Date: Thu, 17 Sep 2020 18:02:21 -0400 Subject: [petsc-users] osx error In-Reply-To: References: Message-ID: <06F0F722-0737-4AE0-A7A9-D36CCC311B43@gmail.com> +1 to this fix, I had similar issues a while back except I was getting the following error during make check (or running any mpi code for that matter) after petsc was built: Fatal error in PMPI_Init_thread: Other MPI error, error stack: MPIR_Init_thread(467)..............: MPID_Init(177).....................: channel initialization failed MPIDI_CH3_Init(70).................: MPID_nem_init(319).................: MPID_nem_tcp_init(171).............: MPID_nem_tcp_get_business_card(418): MPID_nem_tcp_init(377).............: gethostbyname failed, localhost (errno 3) Best regards, Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch) Cell: (312) 694-3391 > On Sep 17, 2020, at 17:59, Satish Balay via petsc-users wrote: > > Here is a fix: > > echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts > > Satish > > On Thu, 17 Sep 2020, Matthew Knepley wrote: > >> We should have an option to disable the test. >> >> Matt >> >> On Thu, Sep 17, 2020 at 5:31 PM Satish Balay via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> >>> On Thu, 17 Sep 2020, Mark Adams wrote: >>> >>>> I rebased over master and started getting this error. >>>> I did reinstall MPICH (brew) recently. >>>> Any ideas? 
>>>> Thanks, >>>> Mark >>> >>> >>> ------------------------------------------------------------------------------- >>> Exception: Your hostname will not work with MPI, perhaps you have VPN >>> running whose network settings may not play well with MPI or your network >>> is misconfigured >>> >>> ******************************************************************************* >>> >>> >>> Ok - this is a new test that got added to check for broken network that >>> breaks MPI. >>> >>> Here is the check: >>> >>> Executing: ping -c 2 MarksMac-302.local >>> >>> The check says you have broken network settings. [as its not responding to >>> ping.] >>> >>> Does MPI work fine on this box? You can try disabling this check >>> (manually) - and do the build, and run >>> >>> Does MPI run fine? >>> >>> Satish >>> >>> >>>>>>>>>>>>>> >>> >>> diff --git a/config/BuildSystem/config/packages/MPI.py >>> b/config/BuildSystem/config/packages/MPI.py >>> index 2e130fdcfe..8464de6773 100644 >>> --- a/config/BuildSystem/config/packages/MPI.py >>> +++ b/config/BuildSystem/config/packages/MPI.py >>> @@ -267,7 +267,7 @@ shared libraries and run with >>> --known-mpi-shared-libraries=1') >>> if ret != 0: >>> raise RuntimeError(errormessage+" Return code %s\n" % ret) >>> except: >>> - raise RuntimeError("Exception: "+errormessage) >>> + pass >>> else: >>> self.logPrint("Unable to get result from hostname, skipping >>> ping check\n") >>> >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Sep 17 19:06:10 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 17 Sep 2020 19:06:10 -0500 Subject: [petsc-users] 1D parallel split of dense matrix In-Reply-To: References: Message-ID: <7E4CE220-BB86-43FF-8E57-D34F7CF183E5@petsc.dev> Yann, I'm sorry I don't understand what you are explaining but I can explain my understanding of multiply right hand sides with linear solves. Usually when people want to solve linear systems in PETSc with multiple right hand sides thye use MatMatSolve(), this efficiently solves for multiple right hand sides "at the same time." MatMatSolve(A,B,X). B is a dense matrix with the same row partitioning as A and one column for each right hand side b and right hand side solution x. You don't do anything with the local column sizes of B, they are meaningless for MatMatSolve(). PETSc does not provide an interface for solving with multiply right hand sides with iterative solvers (using KSP) "all at once). Barry Note you can use MatDenseGetArray() and VecPlace array to share memory between a column of the dense matrix and a vector; both the Mat and Vec need to have the same row layout across the processes. > On Sep 17, 2020, at 9:48 AM, Yann Jobic wrote: > > Hi all, > > I want to decompose a matrix in parallel for only one direction, the columns, for a dense matrix, loaded from a binary file. I tried to use : > MatCreateDense(PETSC_COMM_WORLD,NumberOfRhs,matLocNbCols,NumberOfRhs,matNbCols,NULL,&NRhs); > with matLocNbCols the number of local columns of a linear system (matrix A). > However i've got an error when i use MatLoad : Read from file failed, read past end of file. > Maybe because the number of local rows is equal to the number of global ones (variable NumberOfRhs) ? > The sequential version runs fine. > > Is it possible ? > > I'm trying to set up a specific rhs matrix in order to solve the same linear system (ksp) with different right hand sides (rhs). 
> In order to do so, i transposed the matrix containing all the rhs in a file, which i would like to load in parallel. The idea is to use MatGetRow, in order to create the rhs Vec, and then call KSPSolve (the inverse matrix of A should have been kept). > > However, i would like a particular setup (1D parallel split) for the parallel layout of the rhs matrix : The number of columns of A is equal to the number of column of the matrix rhs, and the number of rows of rhs is equal to the number of rhs that i want to solve. I want this rhs to be split in the same way of the linear system, and i want each processors to have the whole global number of rows, that is to say, i want a 1D decomposition of my rhs matrix (corresponding to MatCreateVecs). > > I then have to copy a row of rhs in a vec, and solve, for all the rows of rhs. > > Is it a bad way to handle a N rhs problem with KSPSolve ? > > Thanks, > > Yann > > ps : the error : > Reading ComplexMatrix.bin Matrix... > Matrix ComplexMatrix.bin read > global size of A matrix (583335,583335) > PROC : 1 Number of local columns of the matrix A : 291667 > PROC : 1 Number of local columns of the vector : 291667 > global size of NRhs Matrix : (15,583335) > PROC : 0 Number of local columns of the matrix A : 291668 > PROC : 0 Number of local columns of the vector : 291668 > Reading NRhs ComplexRhs.bin Matrix... > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Read from file failed > [0]PETSC ERROR: Read past end of file From bsmith at petsc.dev Thu Sep 17 19:19:58 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 17 Sep 2020 19:19:58 -0500 Subject: [petsc-users] [EXT]Re: Is using PETSc appropriate for this problem In-Reply-To: References: <87v9gduj44.fsf@jedbrown.org> Message-ID: <25A4D2DF-654D-4453-9153-CBF38264859A@petsc.dev> Yes, sorry, I meant to write non-linear smoother. What Matt is saying is that just like non-linear Gauss-Seidel has the linear version linear Gauss-Seidel there is also a non-linear Jacobi. Just like in the linear world non-linear Gauss-Seidel generally converges faster than non-linear Jacobi (often much faster) but in the context of the full approximation scheme (nonlinear multigrid) the smoother details are only part of the convergence picture so one can possibly use Jacobi instead of Gauss-Seidel as the smoother. PETSc has SNESFAS for implementing the full approximation scheme as well as accelerators for it like SNESQN, SNESANDERSON, SNESNGMRES which are used in a way similar to the way linear Krylov methods are used to accelerate linear multigrid or linear Jacobi or linear Gauss-Seidel. Because Jacobi (linear or nonlinear) can do all the updates simultaneously they can be "batched" as Jed notes or can run very efficiently (if coded well) on GPUs. The fact that the convergence rate may be a bit smaller than for Gauss-Seidel may be outweighed by the fact that it can much more efficiently utilize the hardware. I would suggest write your non-linear Gauss-Seidel with SNES and then time it in the context of your entire application/simulation, you can always go back and customize the code for each problem size by writing little naked Newton code directly it you need the improvement in speed. Barry We actually struggle ourselves in PETSc with writing efficient smoothers based on patches due to the overhead of the standard SNES/KSP solvers that were not coded specifically for very small problems. 
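A minimal sketch of that reuse pattern (one small sequential SNES per subproblem size, created once and reused for every node of that size), along the lines suggested earlier in the thread; AppCtx, FormFunction, FormJacobian, n and nnodes are placeholders, and the dense Jacobian with a direct solve reflects the "essentially dense" answer above (error checks omitted):

  SNES     snes;
  Vec      x, r;
  Mat      J;
  PetscInt n = 9;                             /* one such SNES per size 2..9              */
  AppCtx   user;                              /* placeholder application context          */

  SNESCreate(PETSC_COMM_SELF, &snes);         /* purely local, no parallel overhead       */
  VecCreateSeq(PETSC_COMM_SELF, n, &x);
  VecDuplicate(x, &r);
  MatCreateSeqDense(PETSC_COMM_SELF, n, n, NULL, &J);
  SNESSetFunction(snes, r, FormFunction, &user);
  SNESSetJacobian(snes, J, J, FormJacobian, &user);
  SNESSetFromOptions(snes);                   /* e.g. -snes_type newtonls -pc_type lu     */

  for (PetscInt node = 0; node < nnodes; node++) {   /* Gauss-Seidel-style sweep, in order */
    /* load this node's data into the context and the current guess into x */
    SNESSolve(snes, NULL, x);
    /* copy x back into the global state so the next node sees the updated values */
  }

  SNESDestroy(&snes); VecDestroy(&x); VecDestroy(&r); MatDestroy(&J);

Creating the objects once and only changing the data between solves keeps the per-solve cost to the solve itself, which is the point made earlier about object construction dominating the overhead for problems this small.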
> On Sep 17, 2020, at 4:05 PM, Matthew Knepley wrote: > > On Thu, Sep 17, 2020 at 2:54 PM Alexander B Prescott > wrote: > Thank you all for your input. Matt is right, I cannot batch as this formulation must be done sequentially. > > >> Sounds a bit like a non-smoother (Gauss-Seidel type), speculating based on these few words. > > Barry, it is similar to a Gauss-Seidel solver in that solution updates from previous solves are used in the most recent Newton solve, though I'm not exactly sure what you mean by "non-smoother". > > He means a nonlinear smoother. You iterate over your domain solving small nonlinear problems in order to get closer to the solution > of the big nonlinear problem. Depending on what you are doing, it might be possible to decouple these, which would likely be much > more efficient. > > Thanks, > > Matt > > Best, > Alexander > > > > On Thu, Sep 17, 2020 at 6:06 AM Matthew Knepley > wrote: > External Email > > On Thu, Sep 17, 2020 at 12:23 AM Jed Brown > wrote: > Alexander B Prescott > writes: > > >> Are the problems of varying nonlinearity, that is will some converge > >> with say a couple of Newton iterations while others require more, say 8 or > >> more Newton steps? > >> > > The nonlinearity should be pretty similar, the problem setup is the same at > > every node but the global domain needs to be traversed in a specific order. > > > It sounds like you may have a Newton solver now for each individual problem? If so, could you make a histogram of number of iterations necessary to solve? Does it have a long tail or does every problem take 3 and 4 iterations (for example). > > If there is no long tail, then you can batch. If there is a long tail, you really want a solver that does one problem at a time, or a more dynamic system that checks which have completed and shrinks the active problem down. (That complexity has a development and execution time cost.) > > He cannot batch if the solves are sequential, as he says above. > > Matt > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > Alexander Prescott > alexprescott at email.arizona.edu > PhD Candidate, The University of Arizona > Department of Geosciences > 1040 E. 4th Street > Tucson, AZ, 85721 > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Sep 17 19:22:31 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 17 Sep 2020 19:22:31 -0500 Subject: [petsc-users] 1D parallel split of dense matrix In-Reply-To: References: Message-ID: <61410C6D-D7CC-47D1-B74C-6EEBB6C01C31@petsc.dev> If you are accessing things in the "matrix of vectors" you want to use MatDenseGetArray() and shift the pointer to any location in the matrix you wish to access. Using MatGetRow() or MatGetColumn() both are much slower than accessing the values directly in a dense array. Barry > On Sep 17, 2020, at 4:36 PM, Yann Jobic wrote: > > Hi Stefano, > You're right, i just switched the rows and colomns, and it's working fine. However, i've done this in order to have better performance when i access the rows, with MatGetRow. 
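As an aside on the access pattern Barry recommends just above (instead of MatGetRow()/MatGetColumnVector()): a rough sketch that views column j of the dense right-hand-side matrix B in place, assuming B shares the row layout of the system matrix A and that Nrhs, ksp and the solution vector x already exist; all names are placeholders (error checks omitted):

  const PetscScalar *barray;
  PetscInt           lda;
  Vec                bj;

  MatCreateVecs(A, NULL, &bj);            /* bj gets the same row layout as A and B */
  MatDenseGetArrayRead(B, &barray);
  MatDenseGetLDA(B, &lda);                /* leading dimension of the local array   */
  for (PetscInt j = 0; j < Nrhs; j++) {
    VecPlaceArray(bj, barray + j*lda);    /* column j of B, no copy                 */
    KSPSolve(ksp, bj, x);
    VecResetArray(bj);
  }
  MatDenseRestoreArrayRead(B, &barray);
  VecDestroy(&bj);

For the multi-RHS solve itself, KSPMatSolve() as suggested by Pierre remains the simpler route.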
With this order, i need MatGetColumnVector, which is believe is kind of slow, as stated in the comments of the implementation (but i couldn't find it again, maybe i was looking at an old implementation?). > I still don't understand why i can not use the transpose of this matrix, as i give the right parallel decomposition. It should be a bijective operation no ? > I think i'll be using the KSPMatSolve of Pierre, so i don't have to redevelop this part. > Thanks a lot, > Yann > > > Le 9/17/2020 ? 6:29 PM, Stefano Zampini a ?crit : >> Yann >> you want to have the number of columns equal to the number of rhs. Not the number of rows. >> This is also consistent in terms of logical layouts. Simply create and dump the matrix this way and you can read in parallel splitting yhe rows ( ie the logical size per process of the vectors) >> Il Gio 17 Set 2020, 18:09 Pierre Jolivet > ha scritto: >> Hello Yann, >> This is probably not fully answering your question, but the proper >> way to solve a system with N RHS is _not_ to use KSPSolve(), but >> instead KSPMatSolve(), cf. >> https://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/KSP/KSPMatSolve.html. >> If you are tracking master (from the GitLab repository), it?s >> available out of the box. If you are using the release tarballs, it >> will be available in 3.14.0 scheduled to be released in a couple of >> days. >> If you want to know more about the current status of block solvers >> in PETSc, please feel free to have a look at this preprint: >> http://jolivet.perso.enseeiht.fr/article.pdf >> If you are using a specific PC which is not ?multi-RHS ready?, see >> the list at the top of page 5, please let me know and I?ll tell you >> how easy to support it. >> Thanks, >> Pierre From bsmith at petsc.dev Thu Sep 17 19:25:13 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 17 Sep 2020 19:25:13 -0500 Subject: [petsc-users] osx error In-Reply-To: References: Message-ID: > On Sep 17, 2020, at 4:47 PM, Matthew Knepley wrote: > > We should have an option to disable the test. No, the test should be an accurate and correct test. I'm sick of people wasting their time and our time because their network configuration is not right for MPICH/OPENMPI and PETSc gets the blame because it is the first place that checks this. > > Matt > > On Thu, Sep 17, 2020 at 5:31 PM Satish Balay via petsc-users > wrote: > On Thu, 17 Sep 2020, Mark Adams wrote: > > > I rebased over master and started getting this error. > > I did reinstall MPICH (brew) recently. > > Any ideas? > > Thanks, > > Mark > > ------------------------------------------------------------------------------- > Exception: Your hostname will not work with MPI, perhaps you have VPN running whose network settings may not play well with MPI or your network is misconfigured > ******************************************************************************* > > > Ok - this is a new test that got added to check for broken network that breaks MPI. > > Here is the check: > > Executing: ping -c 2 MarksMac-302.local > > The check says you have broken network settings. [as its not responding to ping.] > > Does MPI work fine on this box? You can try disabling this check (manually) - and do the build, and run > > Does MPI run fine? 
> > Satish > > > >>>>>>>>>>> > > diff --git a/config/BuildSystem/config/packages/MPI.py b/config/BuildSystem/config/packages/MPI.py > index 2e130fdcfe..8464de6773 100644 > --- a/config/BuildSystem/config/packages/MPI.py > +++ b/config/BuildSystem/config/packages/MPI.py > @@ -267,7 +267,7 @@ shared libraries and run with --known-mpi-shared-libraries=1') > if ret != 0: > raise RuntimeError(errormessage+" Return code %s\n" % ret) > except: > - raise RuntimeError("Exception: "+errormessage) > + pass > else: > self.logPrint("Unable to get result from hostname, skipping ping check\n") > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Sep 17 19:33:19 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 17 Sep 2020 19:33:19 -0500 Subject: [petsc-users] osx error In-Reply-To: References: Message-ID: <53E404F6-3CA4-4C67-B968-EE79D3775929@petsc.dev> > On Sep 17, 2020, at 4:59 PM, Satish Balay via petsc-users wrote: > > Here is a fix: > > echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts Satish, I don't think you want to be doing this on a Mac (on anything?) On a Mac based on the network configuration etc as it boots up and as networks are accessible or not (wi-fi) it determines what hostname should be, one should never being hardwiring it to some value. Barry > Satish > > On Thu, 17 Sep 2020, Matthew Knepley wrote: > >> We should have an option to disable the test. >> >> Matt >> >> On Thu, Sep 17, 2020 at 5:31 PM Satish Balay via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> >>> On Thu, 17 Sep 2020, Mark Adams wrote: >>> >>>> I rebased over master and started getting this error. >>>> I did reinstall MPICH (brew) recently. >>>> Any ideas? >>>> Thanks, >>>> Mark >>> >>> >>> ------------------------------------------------------------------------------- >>> Exception: Your hostname will not work with MPI, perhaps you have VPN >>> running whose network settings may not play well with MPI or your network >>> is misconfigured >>> >>> ******************************************************************************* >>> >>> >>> Ok - this is a new test that got added to check for broken network that >>> breaks MPI. >>> >>> Here is the check: >>> >>> Executing: ping -c 2 MarksMac-302.local >>> >>> The check says you have broken network settings. [as its not responding to >>> ping.] >>> >>> Does MPI work fine on this box? You can try disabling this check >>> (manually) - and do the build, and run >>> >>> Does MPI run fine? 
>>> >>> Satish >>> >>> >>>>>>>>>>>>>> >>> >>> diff --git a/config/BuildSystem/config/packages/MPI.py >>> b/config/BuildSystem/config/packages/MPI.py >>> index 2e130fdcfe..8464de6773 100644 >>> --- a/config/BuildSystem/config/packages/MPI.py >>> +++ b/config/BuildSystem/config/packages/MPI.py >>> @@ -267,7 +267,7 @@ shared libraries and run with >>> --known-mpi-shared-libraries=1') >>> if ret != 0: >>> raise RuntimeError(errormessage+" Return code %s\n" % ret) >>> except: >>> - raise RuntimeError("Exception: "+errormessage) >>> + pass >>> else: >>> self.logPrint("Unable to get result from hostname, skipping >>> ping check\n") >>> >>> >> >> > From knepley at gmail.com Thu Sep 17 19:36:18 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 17 Sep 2020 20:36:18 -0400 Subject: [petsc-users] osx error In-Reply-To: <53E404F6-3CA4-4C67-B968-EE79D3775929@petsc.dev> References: <53E404F6-3CA4-4C67-B968-EE79D3775929@petsc.dev> Message-ID: On Thu, Sep 17, 2020 at 8:33 PM Barry Smith wrote: > > On Sep 17, 2020, at 4:59 PM, Satish Balay via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > Here is a fix: > > > > echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts > > Satish, > > I don't think you want to be doing this on a Mac (on anything?) On a > Mac based on the network configuration etc as it boots up and as networks > are accessible or not (wi-fi) it determines what hostname should be, one > should never being hardwiring it to some value. > Satish is just naming the loopback interface. I did this on all my former Macs. Matt > Barry > > > > Satish > > > > On Thu, 17 Sep 2020, Matthew Knepley wrote: > > > >> We should have an option to disable the test. > >> > >> Matt > >> > >> On Thu, Sep 17, 2020 at 5:31 PM Satish Balay via petsc-users < > >> petsc-users at mcs.anl.gov> wrote: > >> > >>> On Thu, 17 Sep 2020, Mark Adams wrote: > >>> > >>>> I rebased over master and started getting this error. > >>>> I did reinstall MPICH (brew) recently. > >>>> Any ideas? > >>>> Thanks, > >>>> Mark > >>> > >>> > >>> > ------------------------------------------------------------------------------- > >>> Exception: Your hostname will not work with MPI, perhaps you have VPN > >>> running whose network settings may not play well with MPI or your > network > >>> is misconfigured > >>> > >>> > ******************************************************************************* > >>> > >>> > >>> Ok - this is a new test that got added to check for broken network that > >>> breaks MPI. > >>> > >>> Here is the check: > >>> > >>> Executing: ping -c 2 MarksMac-302.local > >>> > >>> The check says you have broken network settings. [as its not > responding to > >>> ping.] > >>> > >>> Does MPI work fine on this box? You can try disabling this check > >>> (manually) - and do the build, and run > >>> > >>> Does MPI run fine? 
> >>> > >>> Satish > >>> > >>> > >>>>>>>>>>>>>> > >>> > >>> diff --git a/config/BuildSystem/config/packages/MPI.py > >>> b/config/BuildSystem/config/packages/MPI.py > >>> index 2e130fdcfe..8464de6773 100644 > >>> --- a/config/BuildSystem/config/packages/MPI.py > >>> +++ b/config/BuildSystem/config/packages/MPI.py > >>> @@ -267,7 +267,7 @@ shared libraries and run with > >>> --known-mpi-shared-libraries=1') > >>> if ret != 0: > >>> raise RuntimeError(errormessage+" Return code %s\n" % > ret) > >>> except: > >>> - raise RuntimeError("Exception: "+errormessage) > >>> + pass > >>> else: > >>> self.logPrint("Unable to get result from hostname, skipping > >>> ping check\n") > >>> > >>> > >> > >> > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Sep 17 20:47:29 2020 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 17 Sep 2020 21:47:29 -0400 Subject: [petsc-users] osx error In-Reply-To: References: Message-ID: > > > > > Does MPI work fine on this box? It has, but I don't use it much here lately. > You can try disabling this check (manually) - and do the build, and run > > Does MPI run fine? > > I will look. I got going by disabling MPI. > Satish > > > >>>>>>>>>>> > > diff --git a/config/BuildSystem/config/packages/MPI.py > b/config/BuildSystem/config/packages/MPI.py > index 2e130fdcfe..8464de6773 100644 > --- a/config/BuildSystem/config/packages/MPI.py > +++ b/config/BuildSystem/config/packages/MPI.py > @@ -267,7 +267,7 @@ shared libraries and run with > --known-mpi-shared-libraries=1') > if ret != 0: > raise RuntimeError(errormessage+" Return code %s\n" % ret) > except: > - raise RuntimeError("Exception: "+errormessage) > + pass > else: > self.logPrint("Unable to get result from hostname, skipping > ping check\n") > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Sep 17 20:51:10 2020 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 17 Sep 2020 21:51:10 -0400 Subject: [petsc-users] osx error In-Reply-To: References: Message-ID: On Thu, Sep 17, 2020 at 6:00 PM Satish Balay wrote: > But would really like to know if the test is a false positive. > Well, I did your fix and am configuring now. > > Satish > > On Thu, 17 Sep 2020, Satish Balay wrote: > > > Here is a fix: > > > > echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts > > > > Satish > > > > On Thu, 17 Sep 2020, Matthew Knepley wrote: > > > > > We should have an option to disable the test. > > > > > > Matt > > > > > > On Thu, Sep 17, 2020 at 5:31 PM Satish Balay via petsc-users < > > > petsc-users at mcs.anl.gov> wrote: > > > > > > > On Thu, 17 Sep 2020, Mark Adams wrote: > > > > > > > > > I rebased over master and started getting this error. > > > > > I did reinstall MPICH (brew) recently. > > > > > Any ideas? 
> > > > > Thanks, > > > > > Mark > > > > > > > > > > > > > ------------------------------------------------------------------------------- > > > > Exception: Your hostname will not work with MPI, perhaps you have VPN > > > > running whose network settings may not play well with MPI or your > network > > > > is misconfigured > > > > > > > > > ******************************************************************************* > > > > > > > > > > > > Ok - this is a new test that got added to check for broken network > that > > > > breaks MPI. > > > > > > > > Here is the check: > > > > > > > > Executing: ping -c 2 MarksMac-302.local > > > > > > > > The check says you have broken network settings. [as its not > responding to > > > > ping.] > > > > > > > > Does MPI work fine on this box? You can try disabling this check > > > > (manually) - and do the build, and run > > > > > > > > Does MPI run fine? > > > > > > > > Satish > > > > > > > > > > > > >>>>>>>>>>> > > > > > > > > diff --git a/config/BuildSystem/config/packages/MPI.py > > > > b/config/BuildSystem/config/packages/MPI.py > > > > index 2e130fdcfe..8464de6773 100644 > > > > --- a/config/BuildSystem/config/packages/MPI.py > > > > +++ b/config/BuildSystem/config/packages/MPI.py > > > > @@ -267,7 +267,7 @@ shared libraries and run with > > > > --known-mpi-shared-libraries=1') > > > > if ret != 0: > > > > raise RuntimeError(errormessage+" Return code %s\n" > % ret) > > > > except: > > > > - raise RuntimeError("Exception: "+errormessage) > > > > + pass > > > > else: > > > > self.logPrint("Unable to get result from hostname, > skipping > > > > ping check\n") > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Sep 17 20:54:01 2020 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 17 Sep 2020 21:54:01 -0400 Subject: [petsc-users] osx error In-Reply-To: References: Message-ID: On Thu, Sep 17, 2020 at 5:59 PM Satish Balay wrote: > Here is a fix: > > echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts > This failed. > > Satish > > On Thu, 17 Sep 2020, Matthew Knepley wrote: > > > We should have an option to disable the test. > > > > Matt > > > > On Thu, Sep 17, 2020 at 5:31 PM Satish Balay via petsc-users < > > petsc-users at mcs.anl.gov> wrote: > > > > > On Thu, 17 Sep 2020, Mark Adams wrote: > > > > > > > I rebased over master and started getting this error. > > > > I did reinstall MPICH (brew) recently. > > > > Any ideas? > > > > Thanks, > > > > Mark > > > > > > > > > > ------------------------------------------------------------------------------- > > > Exception: Your hostname will not work with MPI, perhaps you have VPN > > > running whose network settings may not play well with MPI or your > network > > > is misconfigured > > > > > > > ******************************************************************************* > > > > > > > > > Ok - this is a new test that got added to check for broken network that > > > breaks MPI. > > > > > > Here is the check: > > > > > > Executing: ping -c 2 MarksMac-302.local > > > > > > The check says you have broken network settings. [as its not > responding to > > > ping.] > > > > > > Does MPI work fine on this box? You can try disabling this check > > > (manually) - and do the build, and run > > > > > > Does MPI run fine? 
> > > > > > Satish > > > > > > > > > >>>>>>>>>>> > > > > > > diff --git a/config/BuildSystem/config/packages/MPI.py > > > b/config/BuildSystem/config/packages/MPI.py > > > index 2e130fdcfe..8464de6773 100644 > > > --- a/config/BuildSystem/config/packages/MPI.py > > > +++ b/config/BuildSystem/config/packages/MPI.py > > > @@ -267,7 +267,7 @@ shared libraries and run with > > > --known-mpi-shared-libraries=1') > > > if ret != 0: > > > raise RuntimeError(errormessage+" Return code %s\n" % > ret) > > > except: > > > - raise RuntimeError("Exception: "+errormessage) > > > + pass > > > else: > > > self.logPrint("Unable to get result from hostname, > skipping > > > ping check\n") > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 984543 bytes Desc: not available URL: From mfadams at lbl.gov Thu Sep 17 21:23:02 2020 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 17 Sep 2020 22:23:02 -0400 Subject: [petsc-users] osx error In-Reply-To: References: Message-ID: On Thu, Sep 17, 2020 at 6:00 PM Satish Balay wrote: > But would really like to know if the test is a false positive. > I did the MPI.py fix and that worked. And make check worked. > > Satish > > On Thu, 17 Sep 2020, Satish Balay wrote: > > > Here is a fix: > > > > echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts > > > > Satish > > > > On Thu, 17 Sep 2020, Matthew Knepley wrote: > > > > > We should have an option to disable the test. > > > > > > Matt > > > > > > On Thu, Sep 17, 2020 at 5:31 PM Satish Balay via petsc-users < > > > petsc-users at mcs.anl.gov> wrote: > > > > > > > On Thu, 17 Sep 2020, Mark Adams wrote: > > > > > > > > > I rebased over master and started getting this error. > > > > > I did reinstall MPICH (brew) recently. > > > > > Any ideas? > > > > > Thanks, > > > > > Mark > > > > > > > > > > > > > ------------------------------------------------------------------------------- > > > > Exception: Your hostname will not work with MPI, perhaps you have VPN > > > > running whose network settings may not play well with MPI or your > network > > > > is misconfigured > > > > > > > > > ******************************************************************************* > > > > > > > > > > > > Ok - this is a new test that got added to check for broken network > that > > > > breaks MPI. > > > > > > > > Here is the check: > > > > > > > > Executing: ping -c 2 MarksMac-302.local > > > > > > > > The check says you have broken network settings. [as its not > responding to > > > > ping.] > > > > > > > > Does MPI work fine on this box? You can try disabling this check > > > > (manually) - and do the build, and run > > > > > > > > Does MPI run fine? 
> > > > > > > > Satish > > > > > > > > > > > > >>>>>>>>>>> > > > > > > > > diff --git a/config/BuildSystem/config/packages/MPI.py > > > > b/config/BuildSystem/config/packages/MPI.py > > > > index 2e130fdcfe..8464de6773 100644 > > > > --- a/config/BuildSystem/config/packages/MPI.py > > > > +++ b/config/BuildSystem/config/packages/MPI.py > > > > @@ -267,7 +267,7 @@ shared libraries and run with > > > > --known-mpi-shared-libraries=1') > > > > if ret != 0: > > > > raise RuntimeError(errormessage+" Return code %s\n" > % ret) > > > > except: > > > > - raise RuntimeError("Exception: "+errormessage) > > > > + pass > > > > else: > > > > self.logPrint("Unable to get result from hostname, > skipping > > > > ping check\n") > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Thu Sep 17 22:04:09 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 17 Sep 2020 22:04:09 -0500 (CDT) Subject: [petsc-users] osx error In-Reply-To: References: Message-ID: Mark, Can you elaborate? First you tried the change to /etc/hosts. Did it not work? What do you have for: cat /etc/hosts And after that configure still gave errors for ping test - but by disabling it in MPI.py - now the tests work? eventhough ping test fails? So this is a false positive for you? [i.e the configure tests fails but MPI works?] Satish On Thu, 17 Sep 2020, Mark Adams wrote: > On Thu, Sep 17, 2020 at 6:00 PM Satish Balay wrote: > > > But would really like to know if the test is a false positive. > > > > I did the MPI.py fix and that worked. And make check worked. On Thu, 17 Sep 2020, Mark Adams wrote: > > Well, I did your fix and am configuring now. > > > > > > Satish > > > > On Thu, 17 Sep 2020, Satish Balay wrote: > > > > > Here is a fix: > > > > > > echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts > > > > > > Satish > > > > > > On Thu, 17 Sep 2020, Matthew Knepley wrote: > > > > > > > We should have an option to disable the test. > > > > > > > > Matt > > > > > > > > On Thu, Sep 17, 2020 at 5:31 PM Satish Balay via petsc-users < > > > > petsc-users at mcs.anl.gov> wrote: > > > > > > > > > On Thu, 17 Sep 2020, Mark Adams wrote: > > > > > > > > > > > I rebased over master and started getting this error. > > > > > > I did reinstall MPICH (brew) recently. > > > > > > Any ideas? > > > > > > Thanks, > > > > > > Mark > > > > > > > > > > > > > > > > > ------------------------------------------------------------------------------- > > > > > Exception: Your hostname will not work with MPI, perhaps you have VPN > > > > > running whose network settings may not play well with MPI or your > > network > > > > > is misconfigured > > > > > > > > > > > > ******************************************************************************* > > > > > > > > > > > > > > > Ok - this is a new test that got added to check for broken network > > that > > > > > breaks MPI. > > > > > > > > > > Here is the check: > > > > > > > > > > Executing: ping -c 2 MarksMac-302.local > > > > > > > > > > The check says you have broken network settings. [as its not > > responding to > > > > > ping.] > > > > > > > > > > Does MPI work fine on this box? You can try disabling this check > > > > > (manually) - and do the build, and run > > > > > > > > > > Does MPI run fine? 
> > > > > > > > > > Satish > > > > > > > > > > > > > > > >>>>>>>>>>> > > > > > > > > > > diff --git a/config/BuildSystem/config/packages/MPI.py > > > > > b/config/BuildSystem/config/packages/MPI.py > > > > > index 2e130fdcfe..8464de6773 100644 > > > > > --- a/config/BuildSystem/config/packages/MPI.py > > > > > +++ b/config/BuildSystem/config/packages/MPI.py > > > > > @@ -267,7 +267,7 @@ shared libraries and run with > > > > > --known-mpi-shared-libraries=1') > > > > > if ret != 0: > > > > > raise RuntimeError(errormessage+" Return code %s\n" > > % ret) > > > > > except: > > > > > - raise RuntimeError("Exception: "+errormessage) > > > > > + pass > > > > > else: > > > > > self.logPrint("Unable to get result from hostname, > > skipping > > > > > ping check\n") > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From balay at mcs.anl.gov Thu Sep 17 22:10:39 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 17 Sep 2020 22:10:39 -0500 (CDT) Subject: [petsc-users] osx error In-Reply-To: References: <53E404F6-3CA4-4C67-B968-EE79D3775929@petsc.dev> Message-ID: On Thu, 17 Sep 2020, Matthew Knepley wrote: > On Thu, Sep 17, 2020 at 8:33 PM Barry Smith wrote: > > > > On Sep 17, 2020, at 4:59 PM, Satish Balay via petsc-users < > > petsc-users at mcs.anl.gov> wrote: > > > > > > Here is a fix: > > > > > > echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts > > > > Satish, > > > > I don't think you want to be doing this on a Mac (on anything?) On a > > Mac based on the network configuration etc as it boots up and as networks > > are accessible or not (wi-fi) it determines what hostname should be, one > > should never being hardwiring it to some value. > > > > Satish is just naming the loopback interface. I did this on all my former > Macs. Yes - this doesn't change the hostname. Its just adding an entry for gethostbyname - for current hostname. >>> 127.0.0.1 MarksMac-302.local <<< Sure - its best to not do this when one has a proper IP name [like foo.mcs.anl.gov] - but its useful when one has a hostname like "MarksMac-302.local" -that is not DNS resolvable Even if the machine is moved to a different network with a different name - the current entry won't cause problems [but will need another entry for the new host name - if this new name is also not DNS resolvable] Its likely this file is a generated file on macos - so might get reset on reboot - or some network change? [if this is the case - the change won't be permanent] Satish From pierre.jolivet at enseeiht.fr Fri Sep 18 00:44:01 2020 From: pierre.jolivet at enseeiht.fr (Pierre Jolivet) Date: Fri, 18 Sep 2020 07:44:01 +0200 Subject: [petsc-users] 1D parallel split of dense matrix In-Reply-To: References: Message-ID: <50AE1E30-1086-4939-97B2-0163A1C02D9C@enseeiht.fr> Hello Yann, MatGetColumnVector() is indeed kind of slow and there is a hard copy under the hood. Better to use MatDenseGetColumnVec()/MatDenseRestoreColumn() which is more efficient (https://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Mat/MatDenseGetColumn.html ), or do what Barry suggested. KSPSetMatSolveBlockSize() is a rather advanced option, it?s only used by KSPHPDDM, not KSPPREONLY (I think?), which I?m guessing is what you are using with a full LU. The idea behind this option, borrowed from MUMPS option ICNTL(27), is to solve multiple systems with multiple RHS. So you split AX = B into A[X_0, X_1, ?, X_p] = [B_0, B_1, ?, B_p] where all blocks are of width at most KSPMatSolveBlockSize. 
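A hedged sketch of the KSPMatSolve() path recommended in this thread (it assumes PETSc 3.14 or the then-current master, that the operator A attached to ksp is square, and that the right-hand sides sit in a dense matrix B with the same row layout as A; the names are placeholders):

  Mat            X;   /* block of solutions, same type and layout as the dense RHS block B */
  PetscErrorCode ierr;

  ierr = MatDuplicate(B, MAT_DO_NOT_COPY_VALUES, &X);CHKERRQ(ierr);
  /* Optional: KSPSetMatSolveBlockSize(ksp, PETSC_DECIDE) or a finite value,
     which is the block-size option described next. */
  ierr = KSPMatSolve(ksp, B, X);CHKERRQ(ierr);           /* one call for all columns of B */
  ierr = MatDestroy(&X);CHKERRQ(ierr);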
It's useful when you have a large number of right-hand sides and it's becoming too memory-demanding to fit everything in memory, e.g., when doing block Krylov methods or out-of-core LU factorization. However, if everything fits, it's best to just do KSPSetMatSolveBlockSize(ksp, PETSC_DECIDE), or not call that routine at all, which will default to a single solve with a single thick block. Direct solvers scale really nicely with the number of right-hand sides, thanks to BLAS3, so the higher the number of RHS, the better the efficiency.

Thanks,
Pierre

> On 17 Sep 2020, at 11:36 PM, Yann Jobic wrote:
>
> Hi Stefano,
> You're right, I just switched the rows and columns, and it's working fine. However, I did this in order to have better performance when I access the rows, with MatGetRow. With this order, I need MatGetColumnVector, which I believe is kind of slow, as stated in the comments of the implementation (but I couldn't find it again, maybe I was looking at an old implementation?).
> I still don't understand why I cannot use the transpose of this matrix, as I give the right parallel decomposition. It should be a bijective operation, no?
> I think I'll be using the KSPMatSolve of Pierre, so I don't have to redevelop this part.
> Thanks a lot,
> Yann
>
> On 9/17/2020 at 6:29 PM, Stefano Zampini wrote:
>> Yann
>> You want to have the number of columns equal to the number of RHS, not the number of rows.
>> This is also consistent in terms of logical layouts. Simply create and dump the matrix this way and you can read it in parallel, splitting the rows (i.e. the logical size per process of the vectors).
>> On Thu 17 Sep 2020 at 18:09, Pierre Jolivet wrote:
>> Hello Yann,
>> This is probably not fully answering your question, but the proper
>> way to solve a system with N RHS is _not_ to use KSPSolve(), but
>> instead KSPMatSolve(), cf.
>> https://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/KSP/KSPMatSolve.html.
>> If you are tracking master (from the GitLab repository), it's
>> available out of the box. If you are using the release tarballs, it
>> will be available in 3.14.0, scheduled to be released in a couple of
>> days.
>> If you want to know more about the current status of block solvers
>> in PETSc, please feel free to have a look at this preprint:
>> http://jolivet.perso.enseeiht.fr/article.pdf
>> If you are using a specific PC which is not "multi-RHS ready", see
>> the list at the top of page 5; please let me know and I'll tell you
>> how easy it is to support.
>> Thanks,
>> Pierre
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Pierre.Seize at onera.fr Fri Sep 18 03:01:20 2020
From: Pierre.Seize at onera.fr (Pierre Seize)
Date: Fri, 18 Sep 2020 10:01:20 +0200
Subject: [petsc-users] PetscFV types
Message-ID: <7591884d-2cd1-4c5c-e480-d5e96c1cd9f0@onera.fr>

Hello,

I do not understand what are the two available types for the PetscFV object: "upwind" and "leastsquares", because to me those two properties describe different parts of the Finite Volume formulation. Could someone explain, or give me some references?
Thank you Pierre Seize From knepley at gmail.com Fri Sep 18 05:24:06 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 18 Sep 2020 06:24:06 -0400 Subject: [petsc-users] PetscFV types In-Reply-To: <7591884d-2cd1-4c5c-e480-d5e96c1cd9f0@onera.fr> References: <7591884d-2cd1-4c5c-e480-d5e96c1cd9f0@onera.fr> Message-ID: On Fri, Sep 18, 2020 at 4:01 AM Pierre Seize wrote: > Hello, > > I do not understand what are the two available types for the PetscFV > object : "upwind" and "leastsquares", because to me those two properties > describe different parts of the Finite Volume formulation. Could someone > explain, or give me some references ? > Sure. PetscFV is mostly an exercise for me to determine if the meshing and data layout infrastructure below can support finite volume methods, so the FV methods that it does support are rather rudimentary. My understanding of FV is quite limited. "upwind" is just the naive, first order FV method with pointwise Riemann solves for each local face. I called it upwind since we just update the state with the upwind data. I guess I could have called it "gudonov" as well. The "leastsquares" uses a least-squares reconstruction of the state over cell+neighboring cells (closure of the star of the faces) to try and achieve second-order where possible. I guess I could have called this "reconstructed". Thanks, Matt > Thank you > > > Pierre Seize > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Sep 18 06:41:02 2020 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 18 Sep 2020 07:41:02 -0400 Subject: [petsc-users] osx error In-Reply-To: References: Message-ID: On Thu, Sep 17, 2020 at 11:04 PM Satish Balay wrote: > Mark, > > Can you elaborate? > > First you tried the change to /etc/hosts. Did it not work? What do you > have for: > > cat /etc/hosts > 07:37 1 master *= ~/Codes/petsc$ cat /etc/hosts ## # Host Database # # localhost is used to configure the loopback interface # when the system is booting. Do not change this entry. ## 127.0.0.1 localhost 255.255.255.255 broadcasthost 127.0.0.1 MarksMac-5.local 127.0.0.1 243.124.240.10.in-addr.arpa.private.cam.ac.uk 127.0.0.1 MarksMac-302.local 07:37 master *= ~/Codes/petsc$ > > And after that configure still gave errors for ping test - I never did the ping test (you didn't ask). I did the sudo command, failed configuration, patched MPI.py, succeeded configuration. > but by disabling it in MPI.py - now the tests work? eventhough ping test > fails? > > So this is a false positive for you? [i.e the configure tests fails but > MPI works?] > Let me know if you have more questions. > > Satish > > On Thu, 17 Sep 2020, Mark Adams wrote: > > > On Thu, Sep 17, 2020 at 6:00 PM Satish Balay wrote: > > > > > But would really like to know if the test is a false positive. > > > > > > > I did the MPI.py fix and that worked. And make check worked. > > On Thu, 17 Sep 2020, Mark Adams wrote: > > > > Well, I did your fix and am configuring now. > > > > > > > > > > Satish > > > > > > On Thu, 17 Sep 2020, Satish Balay wrote: > > > > > > > Here is a fix: > > > > > > > > echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts > > > > > > > > Satish > > > > > > > > On Thu, 17 Sep 2020, Matthew Knepley wrote: > > > > > > > > > We should have an option to disable the test. 
> > > > > > > > > > Matt > > > > > > > > > > On Thu, Sep 17, 2020 at 5:31 PM Satish Balay via petsc-users < > > > > > petsc-users at mcs.anl.gov> wrote: > > > > > > > > > > > On Thu, 17 Sep 2020, Mark Adams wrote: > > > > > > > > > > > > > I rebased over master and started getting this error. > > > > > > > I did reinstall MPICH (brew) recently. > > > > > > > Any ideas? > > > > > > > Thanks, > > > > > > > Mark > > > > > > > > > > > > > > > > > > > > > > ------------------------------------------------------------------------------- > > > > > > Exception: Your hostname will not work with MPI, perhaps you > have VPN > > > > > > running whose network settings may not play well with MPI or your > > > network > > > > > > is misconfigured > > > > > > > > > > > > > > > > ******************************************************************************* > > > > > > > > > > > > > > > > > > Ok - this is a new test that got added to check for broken > network > > > that > > > > > > breaks MPI. > > > > > > > > > > > > Here is the check: > > > > > > > > > > > > Executing: ping -c 2 MarksMac-302.local > > > > > > > > > > > > The check says you have broken network settings. [as its not > > > responding to > > > > > > ping.] > > > > > > > > > > > > Does MPI work fine on this box? You can try disabling this check > > > > > > (manually) - and do the build, and run > > > > > > > > > > > > Does MPI run fine? > > > > > > > > > > > > Satish > > > > > > > > > > > > > > > > > > >>>>>>>>>>> > > > > > > > > > > > > diff --git a/config/BuildSystem/config/packages/MPI.py > > > > > > b/config/BuildSystem/config/packages/MPI.py > > > > > > index 2e130fdcfe..8464de6773 100644 > > > > > > --- a/config/BuildSystem/config/packages/MPI.py > > > > > > +++ b/config/BuildSystem/config/packages/MPI.py > > > > > > @@ -267,7 +267,7 @@ shared libraries and run with > > > > > > --known-mpi-shared-libraries=1') > > > > > > if ret != 0: > > > > > > raise RuntimeError(errormessage+" Return code > %s\n" > > > % ret) > > > > > > except: > > > > > > - raise RuntimeError("Exception: "+errormessage) > > > > > > + pass > > > > > > else: > > > > > > self.logPrint("Unable to get result from hostname, > > > skipping > > > > > > ping check\n") > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Sep 18 06:46:05 2020 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 18 Sep 2020 07:46:05 -0400 Subject: [petsc-users] osx error In-Reply-To: References: <53E404F6-3CA4-4C67-B968-EE79D3775929@petsc.dev> Message-ID: Oh you did not change my hostname: 07:37 master *= ~/Codes/petsc$ hostname MarksMac-302.local 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local PING marksmac-302.local (127.0.0.1): 56 data bytes Request timeout for icmp_seq 0 --- marksmac-302.local ping statistics --- 2 packets transmitted, 0 packets received, 100.0% packet loss 07:42 2 master *= ~/Codes/petsc$ BTW, I used to get messages about some network issue and 'changing host name to MarksMac-[x+1].local'. That is, the original hostname was MarksMac.local, then I got a message about changing to MarksMac-1.local, etc. I have not seen these messages for months but apparently this process has continued unabated. 
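For the recurring "Does MPI run fine?" question, a minimal MPI program (nothing PETSc- or machine-specific about it; this is a generic sketch, not something from the thread) can be compiled with "mpicc mpi_hello.c -o mpi_hello" and run with "mpiexec -n 2 ./mpi_hello":

/* mpi_hello.c -- trivial check that mpiexec -n 2 works at all on this box */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int rank, size;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  printf("Hello from rank %d of %d\n", rank, size);
  MPI_Finalize();
  return 0;
}

If this hangs or aborts with two ranks, the problem is in the MPI/network configuration itself rather than in the PETSc configure test.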
On Thu, Sep 17, 2020 at 11:10 PM Satish Balay via petsc-users < petsc-users at mcs.anl.gov> wrote: > On Thu, 17 Sep 2020, Matthew Knepley wrote: > > > On Thu, Sep 17, 2020 at 8:33 PM Barry Smith wrote: > > > > > > On Sep 17, 2020, at 4:59 PM, Satish Balay via petsc-users < > > > petsc-users at mcs.anl.gov> wrote: > > > > > > > > Here is a fix: > > > > > > > > echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts > > > > > > Satish, > > > > > > I don't think you want to be doing this on a Mac (on anything?) On a > > > Mac based on the network configuration etc as it boots up and as > networks > > > are accessible or not (wi-fi) it determines what hostname should be, > one > > > should never being hardwiring it to some value. > > > > > > > Satish is just naming the loopback interface. I did this on all my former > > Macs. > > > Yes - this doesn't change the hostname. Its just adding an entry for > gethostbyname - for current hostname. > > >>> > 127.0.0.1 MarksMac-302.local > <<< > > Sure - its best to not do this when one has a proper IP name [like > foo.mcs.anl.gov] - but its useful when one has a hostname like > "MarksMac-302.local" -that is not DNS resolvable > > Even if the machine is moved to a different network with a different name > - the current entry won't cause problems [but will need another entry for > the new host name - if this new name is also not DNS resolvable] > > Its likely this file is a generated file on macos - so might get reset > on reboot - or some network change? [if this is the case - the change won't > be permanent] > > > Satish > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Sep 18 06:50:58 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 18 Sep 2020 07:50:58 -0400 Subject: [petsc-users] osx error In-Reply-To: References: <53E404F6-3CA4-4C67-B968-EE79D3775929@petsc.dev> Message-ID: On Fri, Sep 18, 2020 at 7:46 AM Mark Adams wrote: > Oh you did not change my hostname: > > 07:37 master *= ~/Codes/petsc$ hostname > MarksMac-302.local > 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local > PING marksmac-302.local (127.0.0.1): 56 data bytes > Request timeout for icmp_seq 0 > > --- marksmac-302.local ping statistics --- > 2 packets transmitted, 0 packets received, 100.0% packet loss > 07:42 2 master *= ~/Codes/petsc$ > This does not make sense to me. You have 127.0.0.1 MarksMac-302.local in /etc/hosts, but you cannot resolve that name? Matt > BTW, I used to get messages about some network issue and 'changing host > name to MarksMac-[x+1].local'. That is, the original hostname > was MarksMac.local, then I got a message about changing > to MarksMac-1.local, etc. I have not seen these messages for months but > apparently this process has continued unabated. > > > > > > > > > > On Thu, Sep 17, 2020 at 11:10 PM Satish Balay via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> On Thu, 17 Sep 2020, Matthew Knepley wrote: >> >> > On Thu, Sep 17, 2020 at 8:33 PM Barry Smith wrote: >> > >> > > > On Sep 17, 2020, at 4:59 PM, Satish Balay via petsc-users < >> > > petsc-users at mcs.anl.gov> wrote: >> > > > >> > > > Here is a fix: >> > > > >> > > > echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts >> > > >> > > Satish, >> > > >> > > I don't think you want to be doing this on a Mac (on anything?) 
On >> a >> > > Mac based on the network configuration etc as it boots up and as >> networks >> > > are accessible or not (wi-fi) it determines what hostname should be, >> one >> > > should never being hardwiring it to some value. >> > > >> > >> > Satish is just naming the loopback interface. I did this on all my >> former >> > Macs. >> >> >> Yes - this doesn't change the hostname. Its just adding an entry for >> gethostbyname - for current hostname. >> >> >>> >> 127.0.0.1 MarksMac-302.local >> <<< >> >> Sure - its best to not do this when one has a proper IP name [like >> foo.mcs.anl.gov] - but its useful when one has a hostname like >> "MarksMac-302.local" -that is not DNS resolvable >> >> Even if the machine is moved to a different network with a different name >> - the current entry won't cause problems [but will need another entry for >> the new host name - if this new name is also not DNS resolvable] >> >> Its likely this file is a generated file on macos - so might get reset >> on reboot - or some network change? [if this is the case - the change won't >> be permanent] >> >> >> Satish >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Sep 18 08:09:30 2020 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 18 Sep 2020 09:09:30 -0400 Subject: [petsc-users] osx error In-Reply-To: References: <53E404F6-3CA4-4C67-B968-EE79D3775929@petsc.dev> Message-ID: On Fri, Sep 18, 2020 at 7:51 AM Matthew Knepley wrote: > On Fri, Sep 18, 2020 at 7:46 AM Mark Adams wrote: > >> Oh you did not change my hostname: >> >> 07:37 master *= ~/Codes/petsc$ hostname >> MarksMac-302.local >> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local >> PING marksmac-302.local (127.0.0.1): 56 data bytes >> Request timeout for icmp_seq 0 >> >> --- marksmac-302.local ping statistics --- >> 2 packets transmitted, 0 packets received, 100.0% packet loss >> 07:42 2 master *= ~/Codes/petsc$ >> > > This does not make sense to me. You have > > 127.0.0.1 MarksMac-302.local > > in /etc/hosts, > 09:07 ~/.ssh$ cat /etc/hosts ## # Host Database # # localhost is used to configure the loopback interface # when the system is booting. Do not change this entry. ## 127.0.0.1 localhost 255.255.255.255 broadcasthost 127.0.0.1 MarksMac-5.local 127.0.0.1 243.124.240.10.in-addr.arpa.private.cam.ac.uk 127.0.0.1 MarksMac-302.local 09:07 ~/.ssh$ > but you cannot resolve that name? > > Matt > > >> BTW, I used to get messages about some network issue and 'changing host >> name to MarksMac-[x+1].local'. That is, the original hostname >> was MarksMac.local, then I got a message about changing >> to MarksMac-1.local, etc. I have not seen these messages for months but >> apparently this process has continued unabated. 
>> >> >> >> >> >> >> >> >> >> On Thu, Sep 17, 2020 at 11:10 PM Satish Balay via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> >>> On Thu, 17 Sep 2020, Matthew Knepley wrote: >>> >>> > On Thu, Sep 17, 2020 at 8:33 PM Barry Smith wrote: >>> > >>> > > > On Sep 17, 2020, at 4:59 PM, Satish Balay via petsc-users < >>> > > petsc-users at mcs.anl.gov> wrote: >>> > > > >>> > > > Here is a fix: >>> > > > >>> > > > echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts >>> > > >>> > > Satish, >>> > > >>> > > I don't think you want to be doing this on a Mac (on anything?) >>> On a >>> > > Mac based on the network configuration etc as it boots up and as >>> networks >>> > > are accessible or not (wi-fi) it determines what hostname should be, >>> one >>> > > should never being hardwiring it to some value. >>> > > >>> > >>> > Satish is just naming the loopback interface. I did this on all my >>> former >>> > Macs. >>> >>> >>> Yes - this doesn't change the hostname. Its just adding an entry for >>> gethostbyname - for current hostname. >>> >>> >>> >>> 127.0.0.1 MarksMac-302.local >>> <<< >>> >>> Sure - its best to not do this when one has a proper IP name [like >>> foo.mcs.anl.gov] - but its useful when one has a hostname like >>> "MarksMac-302.local" -that is not DNS resolvable >>> >>> Even if the machine is moved to a different network with a different >>> name - the current entry won't cause problems [but will need another entry >>> for the new host name - if this new name is also not DNS resolvable] >>> >>> Its likely this file is a generated file on macos - so might get reset >>> on reboot - or some network change? [if this is the case - the change won't >>> be permanent] >>> >>> >>> Satish >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Pierre.Seize at onera.fr Fri Sep 18 08:43:41 2020 From: Pierre.Seize at onera.fr (Pierre Seize) Date: Fri, 18 Sep 2020 15:43:41 +0200 Subject: [petsc-users] PetscFV types In-Reply-To: References: <7591884d-2cd1-4c5c-e480-d5e96c1cd9f0@onera.fr> Message-ID: So the difference between the two types is in DMPlexGetFaceFields : the upwind just takes the cell averaged (or cell centerd) states, and the least-squares use the gradient to reconstruct the states ? Thanks Pierre On 18/09/20 12:24, Matthew Knepley wrote: > On Fri, Sep 18, 2020 at 4:01 AM Pierre Seize > wrote: > > Hello, > > I do not understand what are the two available types for the PetscFV > object : "upwind" and "leastsquares", because to me those two > properties > describe different parts of the Finite Volume formulation. Could > someone > explain, or give me some references ? > > > Sure. PetscFV is mostly an exercise for me to determine if the meshing > and data layout > infrastructure below can support finite volume methods, so the FV > methods that it does support > are rather rudimentary. My understanding of FV is quite limited. > "upwind" is just the naive, first > order FV method with pointwise Riemann solves for each local face. I > called it upwind since we > just update the state with the upwind data. I guess I could have > called it "gudonov" as well. 
The > "leastsquares" uses a least-squares reconstruction of the state over > cell+neighboring cells (closure > of the star of the faces) to try and achieve second-order where > possible. I guess I could have called > this "reconstructed". > > ? Thanks, > > ? ? ?Matt > > Thank you > > > Pierre Seize > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Fri Sep 18 08:53:20 2020 From: jed at jedbrown.org (Jed Brown) Date: Fri, 18 Sep 2020 07:53:20 -0600 Subject: [petsc-users] PetscFV types In-Reply-To: References: <7591884d-2cd1-4c5c-e480-d5e96c1cd9f0@onera.fr> Message-ID: <87a6xnur73.fsf@jedbrown.org> Matthew Knepley writes: > Sure. PetscFV is mostly an exercise for me to determine if the meshing > and data layout infrastructure below can support finite volume > methods, so the FV methods that it does support are rather > rudimentary. My understanding of FV is quite limited. "upwind" is just > the naive, first order FV method with pointwise Riemann solves for > each local face. I called it upwind since we just update the state > with the upwind data. I guess I could have called it "gudonov" as > well. The "leastsquares" uses a least-squares reconstruction of the > state over cell+neighboring cells (closure of the star of the faces) > to try and achieve second-order where possible. I guess I could have > called this "reconstructed". I agree with Pierre that the terms should be revised. Upwinding is what a Riemann solver does (in an appropriate sense for systems). There are multiple reconstruction methods. I would classify by reconstruction (none, least squares with various neighborhoods and limiting types, WENO), and flux methods (approximate Riemann solvers). Many reconstruction schemes can be decoupled to a linear step (least squares) and a nonlinear limiter, but methods like WENO combine these (albeit most popular on structured grids). From bsmith at petsc.dev Fri Sep 18 09:23:10 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 18 Sep 2020 09:23:10 -0500 Subject: [petsc-users] osx error In-Reply-To: References: <53E404F6-3CA4-4C67-B968-EE79D3775929@petsc.dev> Message-ID: <9EC6C790-29B5-4EF9-8A69-6516ECC512D7@petsc.dev> This email thread doesn't seem to have clear communication. Can we start at the beginning again please? Please answer my questions directly in the appropriate lines below in your email response so we know what answer goes with what question. I know you have done some of these things but it is unclear what order you did them and the order is important. Background: In order to decide if the test in MPI.py works, or needs to be modified or removed we need clear information about your system BEFORE you made changes to get things to work. 1) Did you add the 127.0.0.1 MarksMac-5.local to the /etc/hosts yesterday because Satish suggested it, or have you had it there for a long time? (You should not need it) 2) Please run ping -c 2 `hostname` 3) Please remove the line 127.0.0.1 MarksMac-5.local in /etc/hosts and follow the directions in https://stackoverflow.com/questions/37951379/etc-hosts-ignored-in-mac-el-capitan-10-11-5 to flush the DNS cache (note for different versions of MacOS the command is different). 
4) Please run ping -c 2 `hostname` 5) Please run a MPI program (doesn't matter which and I don't care how you installed MPICH or OpenMPI) with mpiexec -n 2 ./programname does it run, hang or ? Based on this information we can decide what needs to be done next. Thanks Barry As a side note on my Mac $ hostname Barrys-MacBook-Pro-3.local ~/Src/petsc (barry/2020-07-07/docs-no-makefiles *>) arch-docs-no-makefiles $ /sbin/ping -c 2 `hostname` PING barrys-macbook-pro-3.local (127.0.0.1): 56 data bytes 64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.077 ms 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.112 ms --- barrys-macbook-pro-3.local ping statistics --- 2 packets transmitted, 2 packets received, 0.0% packet loss round-trip min/avg/max/stddev = 0.077/0.095/0.112/0.018 ms ~/Src/petsc (barry/2020-07-07/docs-no-makefiles *>) arch-docs-no-makefiles $ We are trying to understand if/why your machine is behaving differently. My theory is that if ping -c 2 `hostname` fails then MPICH and OpenMP mpiexec -n 2 will fail. We need to determine if this theory is correct or if you have a counter-example. > On Sep 18, 2020, at 8:09 AM, Mark Adams wrote: > > > > On Fri, Sep 18, 2020 at 7:51 AM Matthew Knepley > wrote: > On Fri, Sep 18, 2020 at 7:46 AM Mark Adams > wrote: > Oh you did not change my hostname: > > 07:37 master *= ~/Codes/petsc$ hostname > MarksMac-302.local > 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local > PING marksmac-302.local (127.0.0.1): 56 data bytes > Request timeout for icmp_seq 0 > > --- marksmac-302.local ping statistics --- > 2 packets transmitted, 0 packets received, 100.0% packet loss > 07:42 2 master *= ~/Codes/petsc$ > > This does not make sense to me. You have > > 127.0.0.1 MarksMac-302.local > > in /etc/hosts, > > 09:07 ~/.ssh$ cat /etc/hosts > ## > # Host Database > # > # localhost is used to configure the loopback interface > # when the system is booting. Do not change this entry. > ## > 127.0.0.1 localhost > 255.255.255.255 broadcasthost > 127.0.0.1 MarksMac-5.local > 127.0.0.1 243.124.240.10.in-addr.arpa.private.cam.ac.uk > 127.0.0.1 MarksMac-302.local > 09:07 ~/.ssh$ > > > > > but you cannot resolve that name? > > Matt > > BTW, I used to get messages about some network issue and 'changing host name to MarksMac-[x+1].local'. That is, the original hostname was MarksMac.local, then I got a message about changing to MarksMac-1.local, etc. I have not seen these messages for months but apparently this process has continued unabated. > > > > > > > > > > On Thu, Sep 17, 2020 at 11:10 PM Satish Balay via petsc-users > wrote: > On Thu, 17 Sep 2020, Matthew Knepley wrote: > > > On Thu, Sep 17, 2020 at 8:33 PM Barry Smith > wrote: > > > > > > On Sep 17, 2020, at 4:59 PM, Satish Balay via petsc-users < > > > petsc-users at mcs.anl.gov > wrote: > > > > > > > > Here is a fix: > > > > > > > > echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts > > > > > > Satish, > > > > > > I don't think you want to be doing this on a Mac (on anything?) On a > > > Mac based on the network configuration etc as it boots up and as networks > > > are accessible or not (wi-fi) it determines what hostname should be, one > > > should never being hardwiring it to some value. > > > > > > > Satish is just naming the loopback interface. I did this on all my former > > Macs. > > > Yes - this doesn't change the hostname. Its just adding an entry for gethostbyname - for current hostname. 
> > >>> > 127.0.0.1 MarksMac-302.local > <<< > > Sure - its best to not do this when one has a proper IP name [like foo.mcs.anl.gov ] - but its useful when one has a hostname like "MarksMac-302.local" -that is not DNS resolvable > > Even if the machine is moved to a different network with a different name - the current entry won't cause problems [but will need another entry for the new host name - if this new name is also not DNS resolvable] > > Its likely this file is a generated file on macos - so might get reset on reboot - or some network change? [if this is the case - the change won't be permanent] > > > Satish > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Fri Sep 18 09:28:47 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 18 Sep 2020 09:28:47 -0500 (CDT) Subject: [petsc-users] osx error In-Reply-To: References: <53E404F6-3CA4-4C67-B968-EE79D3775929@petsc.dev> Message-ID: > >> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local > >> PING marksmac-302.local (127.0.0.1): 56 data bytes So it is resolving MarksMac-302.local as 127.0.0.1 - but ping is not responding? I know some machines don't respond to external ping [and firewalls can block it] but don't really know if they always respond to internal ping or not. If some machines don't respond to internal ping - then we can't use ping test in configure [it will create false negatives - as in this case] Mark, can you remove the line that you added to /etc/hosts - i.e: 127.0.0.1 MarksMac-302.local And now rerun MPI tests. Do they work or fail? [this is to check if this test is a false positive on your machine] Satish On Fri, 18 Sep 2020, Mark Adams wrote: > On Fri, Sep 18, 2020 at 7:51 AM Matthew Knepley wrote: > > > On Fri, Sep 18, 2020 at 7:46 AM Mark Adams wrote: > > > >> Oh you did not change my hostname: > >> > >> 07:37 master *= ~/Codes/petsc$ hostname > >> MarksMac-302.local > >> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local > >> PING marksmac-302.local (127.0.0.1): 56 data bytes > >> Request timeout for icmp_seq 0 > >> > >> --- marksmac-302.local ping statistics --- > >> 2 packets transmitted, 0 packets received, 100.0% packet loss > >> 07:42 2 master *= ~/Codes/petsc$ > >> > > > > This does not make sense to me. You have > > > > 127.0.0.1 MarksMac-302.local > > > > in /etc/hosts, > > > > 09:07 ~/.ssh$ cat /etc/hosts > ## > # Host Database > # > # localhost is used to configure the loopback interface > # when the system is booting. Do not change this entry. > ## > 127.0.0.1 localhost > 255.255.255.255 broadcasthost > 127.0.0.1 MarksMac-5.local > 127.0.0.1 243.124.240.10.in-addr.arpa.private.cam.ac.uk > 127.0.0.1 MarksMac-302.local > 09:07 ~/.ssh$ > > > > > > > but you cannot resolve that name? > > > > Matt > > > > > >> BTW, I used to get messages about some network issue and 'changing host > >> name to MarksMac-[x+1].local'. That is, the original hostname > >> was MarksMac.local, then I got a message about changing > >> to MarksMac-1.local, etc. I have not seen these messages for months but > >> apparently this process has continued unabated. 
> >> > >> > >> > >> > >> > >> > >> > >> > >> > >> On Thu, Sep 17, 2020 at 11:10 PM Satish Balay via petsc-users < > >> petsc-users at mcs.anl.gov> wrote: > >> > >>> On Thu, 17 Sep 2020, Matthew Knepley wrote: > >>> > >>> > On Thu, Sep 17, 2020 at 8:33 PM Barry Smith wrote: > >>> > > >>> > > > On Sep 17, 2020, at 4:59 PM, Satish Balay via petsc-users < > >>> > > petsc-users at mcs.anl.gov> wrote: > >>> > > > > >>> > > > Here is a fix: > >>> > > > > >>> > > > echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts > >>> > > > >>> > > Satish, > >>> > > > >>> > > I don't think you want to be doing this on a Mac (on anything?) > >>> On a > >>> > > Mac based on the network configuration etc as it boots up and as > >>> networks > >>> > > are accessible or not (wi-fi) it determines what hostname should be, > >>> one > >>> > > should never being hardwiring it to some value. > >>> > > > >>> > > >>> > Satish is just naming the loopback interface. I did this on all my > >>> former > >>> > Macs. > >>> > >>> > >>> Yes - this doesn't change the hostname. Its just adding an entry for > >>> gethostbyname - for current hostname. > >>> > >>> >>> > >>> 127.0.0.1 MarksMac-302.local > >>> <<< > >>> > >>> Sure - its best to not do this when one has a proper IP name [like > >>> foo.mcs.anl.gov] - but its useful when one has a hostname like > >>> "MarksMac-302.local" -that is not DNS resolvable > >>> > >>> Even if the machine is moved to a different network with a different > >>> name - the current entry won't cause problems [but will need another entry > >>> for the new host name - if this new name is also not DNS resolvable] > >>> > >>> Its likely this file is a generated file on macos - so might get reset > >>> on reboot - or some network change? [if this is the case - the change won't > >>> be permanent] > >>> > >>> > >>> Satish > >>> > >> > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which their > > experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > From balay at mcs.anl.gov Fri Sep 18 10:04:03 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 18 Sep 2020 10:04:03 -0500 (CDT) Subject: [petsc-users] osx error In-Reply-To: References: <53E404F6-3CA4-4C67-B968-EE79D3775929@petsc.dev> Message-ID: On Fri, 18 Sep 2020, Satish Balay via petsc-users wrote: > > >> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local > > >> PING marksmac-302.local (127.0.0.1): 56 data bytes > > So it is resolving MarksMac-302.local as 127.0.0.1 - but ping is not responding? > > I know some machines don't respond to external ping [and firewalls can block it] but don't really know if they always respond to internal ping or not. > > If some machines don't respond to internal ping - then we can't use ping test in configure [it will create false negatives - as in this case] BTW: To confirm, please try: ping 127.0.0.1 Satish > > > Mark, can you remove the line that you added to /etc/hosts - i.e: > > 127.0.0.1 MarksMac-302.local > > And now rerun MPI tests. Do they work or fail? 
> > [this is to check if this test is a false positive on your machine] > > Satish > > > On Fri, 18 Sep 2020, Mark Adams wrote: > > > On Fri, Sep 18, 2020 at 7:51 AM Matthew Knepley wrote: > > > > > On Fri, Sep 18, 2020 at 7:46 AM Mark Adams wrote: > > > > > >> Oh you did not change my hostname: > > >> > > >> 07:37 master *= ~/Codes/petsc$ hostname > > >> MarksMac-302.local > > >> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local > > >> PING marksmac-302.local (127.0.0.1): 56 data bytes > > >> Request timeout for icmp_seq 0 > > >> > > >> --- marksmac-302.local ping statistics --- > > >> 2 packets transmitted, 0 packets received, 100.0% packet loss > > >> 07:42 2 master *= ~/Codes/petsc$ > > >> > > > > > > This does not make sense to me. You have > > > > > > 127.0.0.1 MarksMac-302.local > > > > > > in /etc/hosts, > > > > > > > 09:07 ~/.ssh$ cat /etc/hosts > > ## > > # Host Database > > # > > # localhost is used to configure the loopback interface > > # when the system is booting. Do not change this entry. > > ## > > 127.0.0.1 localhost > > 255.255.255.255 broadcasthost > > 127.0.0.1 MarksMac-5.local > > 127.0.0.1 243.124.240.10.in-addr.arpa.private.cam.ac.uk > > 127.0.0.1 MarksMac-302.local > > 09:07 ~/.ssh$ > > > > > > > > > > > > > but you cannot resolve that name? > > > > > > Matt > > > > > > > > >> BTW, I used to get messages about some network issue and 'changing host > > >> name to MarksMac-[x+1].local'. That is, the original hostname > > >> was MarksMac.local, then I got a message about changing > > >> to MarksMac-1.local, etc. I have not seen these messages for months but > > >> apparently this process has continued unabated. > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> On Thu, Sep 17, 2020 at 11:10 PM Satish Balay via petsc-users < > > >> petsc-users at mcs.anl.gov> wrote: > > >> > > >>> On Thu, 17 Sep 2020, Matthew Knepley wrote: > > >>> > > >>> > On Thu, Sep 17, 2020 at 8:33 PM Barry Smith wrote: > > >>> > > > >>> > > > On Sep 17, 2020, at 4:59 PM, Satish Balay via petsc-users < > > >>> > > petsc-users at mcs.anl.gov> wrote: > > >>> > > > > > >>> > > > Here is a fix: > > >>> > > > > > >>> > > > echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts > > >>> > > > > >>> > > Satish, > > >>> > > > > >>> > > I don't think you want to be doing this on a Mac (on anything?) > > >>> On a > > >>> > > Mac based on the network configuration etc as it boots up and as > > >>> networks > > >>> > > are accessible or not (wi-fi) it determines what hostname should be, > > >>> one > > >>> > > should never being hardwiring it to some value. > > >>> > > > > >>> > > > >>> > Satish is just naming the loopback interface. I did this on all my > > >>> former > > >>> > Macs. > > >>> > > >>> > > >>> Yes - this doesn't change the hostname. Its just adding an entry for > > >>> gethostbyname - for current hostname. > > >>> > > >>> >>> > > >>> 127.0.0.1 MarksMac-302.local > > >>> <<< > > >>> > > >>> Sure - its best to not do this when one has a proper IP name [like > > >>> foo.mcs.anl.gov] - but its useful when one has a hostname like > > >>> "MarksMac-302.local" -that is not DNS resolvable > > >>> > > >>> Even if the machine is moved to a different network with a different > > >>> name - the current entry won't cause problems [but will need another entry > > >>> for the new host name - if this new name is also not DNS resolvable] > > >>> > > >>> Its likely this file is a generated file on macos - so might get reset > > >>> on reboot - or some network change? 
[if this is the case - the change won't > > >>> be permanent] > > >>> > > >>> > > >>> Satish > > >>> > > >> > > > > > > -- > > > What most experimenters take for granted before they begin their > > > experiments is infinitely more interesting than any results to which their > > > experiments lead. > > > -- Norbert Wiener > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > > From mfadams at lbl.gov Fri Sep 18 10:04:24 2020 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 18 Sep 2020 11:04:24 -0400 Subject: [petsc-users] osx error In-Reply-To: <9EC6C790-29B5-4EF9-8A69-6516ECC512D7@petsc.dev> References: <53E404F6-3CA4-4C67-B968-EE79D3775929@petsc.dev> <9EC6C790-29B5-4EF9-8A69-6516ECC512D7@petsc.dev> Message-ID: On Fri, Sep 18, 2020 at 10:23 AM Barry Smith wrote: > > This email thread doesn't seem to have clear communication. Can we start > at the beginning again please? Please answer my questions directly in the > appropriate lines below in your email response so we know what answer goes > with what question. I know you have done some of these things but it is > unclear what order you did them and the order is important. > > Background: In order to decide if the test in MPI.py works, or needs to > be modified or removed we need clear information about your system BEFORE > you made changes to get things to work. > > 1) Did you add the > > 127.0.0.1 MarksMac-5.local > > to the /etc/hosts yesterday because Satish suggested it, or have you > had it there for a long time? (You should not need it) > Satish suggested MarksMac-302.local. As I said earlier I have seen messages that say something like a network problem, renaming hostname to MarksMac-X.local, where X is +1 the current X. Initially it was MarksMac.local and it made MarksMac-1.local > 2) Please run > > ping -c 2 `hostname` > 09:08 master *= ~/Codes/petsc$ ping -c 2 `hostname` PING marksmac-302.local (127.0.0.1): 56 data bytes Request timeout for icmp_seq 0 --- marksmac-302.local ping statistics --- 2 packets transmitted, 0 packets received, 100.0% packet loss 10:55 2 master *= ~/Codes/petsc$ > > 3) Please remove the line 127.0.0.1 MarksMac-5.local in /etc/hosts > and follow the directions in > > > https://stackoverflow.com/questions/37951379/etc-hosts-ignored-in-mac-el-capitan-10-11-5 > > to flush the DNS cache (note for different versions of MacOS the > command is different). > My takeway here was you need one space between the IP and name. I had a tab here: 127.0.0.1 localhost fixed, but did not help. Its not clear to me what you want me to do. He did two scary (sudo goop) things. One was: sudo launchctl unload -w /System/Library/LaunchDaemons/com.apple.mDNSResponder.plist Do you want me to do that? > > 4) Please run > > ping -c 2 `hostname` > > 5) Please run a MPI program (doesn't matter which and I don't care how > you installed MPICH or OpenMPI) with > > mpiexec -n 2 ./programname > > does it run, hang or ? > > > Based on this information we can decide what needs to be done next. 
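For the cache-flush step, the commands that usually do it on recent macOS releases are the following (version-dependent, so treat this as a sketch rather than the one true sequence; it also avoids unloading mDNSResponder altogether):

    sudo dscacheutil -flushcache
    sudo killall -HUP mDNSResponder

This only tells the daemon to drop its cache, so it is a good deal less scary than a launchctl unload of the mDNSResponder plist.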
> > Thanks > > Barry > > As a side note on my Mac > > $ hostname > Barrys-MacBook-Pro-3.local > ~/Src/petsc* (barry/2020-07-07/docs-no-makefiles *>)* > arch-docs-no-makefiles > $ /sbin/ping -c 2 `hostname` > PING barrys-macbook-pro-3.local (127.0.0.1): 56 data bytes > 64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.077 ms > 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.112 ms > > --- barrys-macbook-pro-3.local ping statistics --- > 2 packets transmitted, 2 packets received, 0.0% packet loss > round-trip min/avg/max/stddev = 0.077/0.095/0.112/0.018 ms > ~/Src/petsc* (barry/2020-07-07/docs-no-makefiles *>)* > arch-docs-no-makefiles > $ > > We are trying to understand if/why your machine is behaving differently. > > My theory is that if ping -c 2 `hostname` fails then MPICH and OpenMP > mpiexec -n 2 will fail. We need to determine if this theory is correct or > if you have a counter-example. > > > On Sep 18, 2020, at 8:09 AM, Mark Adams wrote: > > > > On Fri, Sep 18, 2020 at 7:51 AM Matthew Knepley wrote: > >> On Fri, Sep 18, 2020 at 7:46 AM Mark Adams wrote: >> >>> Oh you did not change my hostname: >>> >>> 07:37 master *= ~/Codes/petsc$ hostname >>> MarksMac-302.local >>> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local >>> PING marksmac-302.local (127.0.0.1): 56 data bytes >>> Request timeout for icmp_seq 0 >>> >>> --- marksmac-302.local ping statistics --- >>> 2 packets transmitted, 0 packets received, 100.0% packet loss >>> 07:42 2 master *= ~/Codes/petsc$ >>> >> >> This does not make sense to me. You have >> >> 127.0.0.1 MarksMac-302.local >> >> in /etc/hosts, >> > > 09:07 ~/.ssh$ cat /etc/hosts > ## > # Host Database > # > # localhost is used to configure the loopback interface > # when the system is booting. Do not change this entry. > ## > 127.0.0.1 localhost > 255.255.255.255 broadcasthost > 127.0.0.1 MarksMac-5.local > 127.0.0.1 243.124.240.10.in-addr.arpa.private.cam.ac.uk > 127.0.0.1 MarksMac-302.local > 09:07 ~/.ssh$ > > > > > >> but you cannot resolve that name? >> >> Matt >> >> >>> BTW, I used to get messages about some network issue and 'changing host >>> name to MarksMac-[x+1].local'. That is, the original hostname >>> was MarksMac.local, then I got a message about changing >>> to MarksMac-1.local, etc. I have not seen these messages for months but >>> apparently this process has continued unabated. >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> On Thu, Sep 17, 2020 at 11:10 PM Satish Balay via petsc-users < >>> petsc-users at mcs.anl.gov> wrote: >>> >>>> On Thu, 17 Sep 2020, Matthew Knepley wrote: >>>> >>>> > On Thu, Sep 17, 2020 at 8:33 PM Barry Smith wrote: >>>> > >>>> > > > On Sep 17, 2020, at 4:59 PM, Satish Balay via petsc-users < >>>> > > petsc-users at mcs.anl.gov> wrote: >>>> > > > >>>> > > > Here is a fix: >>>> > > > >>>> > > > echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts >>>> > > >>>> > > Satish, >>>> > > >>>> > > I don't think you want to be doing this on a Mac (on anything?) >>>> On a >>>> > > Mac based on the network configuration etc as it boots up and as >>>> networks >>>> > > are accessible or not (wi-fi) it determines what hostname should >>>> be, one >>>> > > should never being hardwiring it to some value. >>>> > > >>>> > >>>> > Satish is just naming the loopback interface. I did this on all my >>>> former >>>> > Macs. >>>> >>>> >>>> Yes - this doesn't change the hostname. Its just adding an entry for >>>> gethostbyname - for current hostname. 
>>>> >>>> >>> >>>> 127.0.0.1 MarksMac-302.local >>>> <<< >>>> >>>> Sure - its best to not do this when one has a proper IP name [like >>>> foo.mcs.anl.gov] - but its useful when one has a hostname like >>>> "MarksMac-302.local" -that is not DNS resolvable >>>> >>>> Even if the machine is moved to a different network with a different >>>> name - the current entry won't cause problems [but will need another entry >>>> for the new host name - if this new name is also not DNS resolvable] >>>> >>>> Its likely this file is a generated file on macos - so might get >>>> reset on reboot - or some network change? [if this is the case - the change >>>> won't be permanent] >>>> >>>> >>>> Satish >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Sep 18 10:05:15 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 18 Sep 2020 10:05:15 -0500 Subject: [petsc-users] osx error In-Reply-To: References: <53E404F6-3CA4-4C67-B968-EE79D3775929@petsc.dev> Message-ID: <3481F1FD-EF0C-48EA-A282-563547336A75@petsc.dev> I have turned on my Mac Firewall and now get the same behavior as Mark. I have the Automatically allow builtin software to receive incoming connections checked So /sbin/ping must not be a builtin software I then tried to unblock /sbin/ping from the firewall with sudo /usr/libexec/ApplicationFirewall/socketfilterfw --add /sbin/ping sudo /usr/libexec/ApplicationFirewall/socketfilterfw --unblock /sbin/ping Incoming connection to the application is permitted ping still indicates Request timeout for icmp_seq 0 But $ /usr/sbin/traceroute `hostname` traceroute: Warning: Barrys-MacBook-Pro-3.local has multiple addresses; using 127.0.0.1 traceroute to barrys-macbook-pro-3.local (127.0.0.1), 64 hops max, 52 byte packets 1 localhost (127.0.0.1) 0.539 ms 0.101 ms 0.067 ms works with the Firewall on. In fact traceroute works with the Automatically allow builtin software to receive incoming connections NOT checked. traceroute also fails like ping when vpn is turned on for my Mac. Mark, You need not bother with the lists of tasks I sent you. Thanks Barry > On Sep 18, 2020, at 9:28 AM, Satish Balay via petsc-users wrote: > >>>> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local >>>> PING marksmac-302.local (127.0.0.1): 56 data bytes > > So it is resolving MarksMac-302.local as 127.0.0.1 - but ping is not responding? > > I know some machines don't respond to external ping [and firewalls can block it] but don't really know if they always respond to internal ping or not. > > If some machines don't respond to internal ping - then we can't use ping test in configure [it will create false negatives - as in this case] > > > Mark, can you remove the line that you added to /etc/hosts - i.e: > > 127.0.0.1 MarksMac-302.local > > And now rerun MPI tests. Do they work or fail? 
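For reference, the same socketfilterfw tool can report what the Application Firewall is doing from the command line (flag names are from memory of its usage text, so worth double-checking on the macOS release at hand):

    # is the firewall enabled at all?
    sudo /usr/libexec/ApplicationFirewall/socketfilterfw --getglobalstate

    # stealth mode silently drops ICMP echo requests, which would explain
    # ping timing out on 127.0.0.1 while UDP-based traceroute still works
    sudo /usr/libexec/ApplicationFirewall/socketfilterfw --getstealthmode

    # if stealth mode is on, turning it off should let ping answer again
    sudo /usr/libexec/ApplicationFirewall/socketfilterfw --setstealthmode off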
> > [this is to check if this test is a false positive on your machine] > > Satish > > > On Fri, 18 Sep 2020, Mark Adams wrote: > >> On Fri, Sep 18, 2020 at 7:51 AM Matthew Knepley wrote: >> >>> On Fri, Sep 18, 2020 at 7:46 AM Mark Adams wrote: >>> >>>> Oh you did not change my hostname: >>>> >>>> 07:37 master *= ~/Codes/petsc$ hostname >>>> MarksMac-302.local >>>> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local >>>> PING marksmac-302.local (127.0.0.1): 56 data bytes >>>> Request timeout for icmp_seq 0 >>>> >>>> --- marksmac-302.local ping statistics --- >>>> 2 packets transmitted, 0 packets received, 100.0% packet loss >>>> 07:42 2 master *= ~/Codes/petsc$ >>>> >>> >>> This does not make sense to me. You have >>> >>> 127.0.0.1 MarksMac-302.local >>> >>> in /etc/hosts, >>> >> >> 09:07 ~/.ssh$ cat /etc/hosts >> ## >> # Host Database >> # >> # localhost is used to configure the loopback interface >> # when the system is booting. Do not change this entry. >> ## >> 127.0.0.1 localhost >> 255.255.255.255 broadcasthost >> 127.0.0.1 MarksMac-5.local >> 127.0.0.1 243.124.240.10.in-addr.arpa.private.cam.ac.uk >> 127.0.0.1 MarksMac-302.local >> 09:07 ~/.ssh$ >> >> >> >> >> >>> but you cannot resolve that name? >>> >>> Matt >>> >>> >>>> BTW, I used to get messages about some network issue and 'changing host >>>> name to MarksMac-[x+1].local'. That is, the original hostname >>>> was MarksMac.local, then I got a message about changing >>>> to MarksMac-1.local, etc. I have not seen these messages for months but >>>> apparently this process has continued unabated. >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Thu, Sep 17, 2020 at 11:10 PM Satish Balay via petsc-users < >>>> petsc-users at mcs.anl.gov> wrote: >>>> >>>>> On Thu, 17 Sep 2020, Matthew Knepley wrote: >>>>> >>>>>> On Thu, Sep 17, 2020 at 8:33 PM Barry Smith wrote: >>>>>> >>>>>>>> On Sep 17, 2020, at 4:59 PM, Satish Balay via petsc-users < >>>>>>> petsc-users at mcs.anl.gov> wrote: >>>>>>>> >>>>>>>> Here is a fix: >>>>>>>> >>>>>>>> echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts >>>>>>> >>>>>>> Satish, >>>>>>> >>>>>>> I don't think you want to be doing this on a Mac (on anything?) >>>>> On a >>>>>>> Mac based on the network configuration etc as it boots up and as >>>>> networks >>>>>>> are accessible or not (wi-fi) it determines what hostname should be, >>>>> one >>>>>>> should never being hardwiring it to some value. >>>>>>> >>>>>> >>>>>> Satish is just naming the loopback interface. I did this on all my >>>>> former >>>>>> Macs. >>>>> >>>>> >>>>> Yes - this doesn't change the hostname. Its just adding an entry for >>>>> gethostbyname - for current hostname. >>>>> >>>>>>>> >>>>> 127.0.0.1 MarksMac-302.local >>>>> <<< >>>>> >>>>> Sure - its best to not do this when one has a proper IP name [like >>>>> foo.mcs.anl.gov] - but its useful when one has a hostname like >>>>> "MarksMac-302.local" -that is not DNS resolvable >>>>> >>>>> Even if the machine is moved to a different network with a different >>>>> name - the current entry won't cause problems [but will need another entry >>>>> for the new host name - if this new name is also not DNS resolvable] >>>>> >>>>> Its likely this file is a generated file on macos - so might get reset >>>>> on reboot - or some network change? 
[if this is the case - the change won't >>>>> be permanent] >>>>> >>>>> >>>>> Satish >>>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Sep 18 10:05:36 2020 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 18 Sep 2020 11:05:36 -0400 Subject: [petsc-users] osx error In-Reply-To: References: <53E404F6-3CA4-4C67-B968-EE79D3775929@petsc.dev> Message-ID: On Fri, Sep 18, 2020 at 11:04 AM Satish Balay wrote: > On Fri, 18 Sep 2020, Satish Balay via petsc-users wrote: > > > > >> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local > > > >> PING marksmac-302.local (127.0.0.1): 56 data bytes > > > > So it is resolving MarksMac-302.local as 127.0.0.1 - but ping is not > responding? > > > > I know some machines don't respond to external ping [and firewalls can > block it] but don't really know if they always respond to internal ping or > not. > > > > If some machines don't respond to internal ping - then we can't use > ping test in configure [it will create false negatives - as in this case] > > BTW: To confirm, please try: > > ping 127.0.0.1 > 11:02 master *= ~/Codes/petsc$ sudo vi /etc/hosts 11:02 master *= ~/Codes/petsc$ ping 127.0.0.1 PING 127.0.0.1 (127.0.0.1): 56 data bytes Request timeout for icmp_seq 0 Request timeout for icmp_seq 1 Request timeout for icmp_seq 2 Request timeout for icmp_seq 3 Request timeout for icmp_seq 4 Request timeout for icmp_seq 5 Request timeout for icmp_seq 6 Request timeout for icmp_seq 7 Request timeout for icmp_seq 8 Request timeout for icmp_seq 9 Request timeout for icmp_seq 10 Request timeout for icmp_seq 11 Request timeout for icmp_seq 12 Request timeout for icmp_seq 13 Request timeout for icmp_seq 14 Request timeout for icmp_seq 15 Request timeout for icmp_seq 16 Request timeout for icmp_seq 17 Request timeout for icmp_seq 18 Request timeout for icmp_seq 19 Request timeout for icmp_seq 20 Request timeout for icmp_seq 21 still going ...... > > Satish > > > > > > > Mark, can you remove the line that you added to /etc/hosts - i.e: > > > > 127.0.0.1 MarksMac-302.local > > > > And now rerun MPI tests. Do they work or fail? > > > > [this is to check if this test is a false positive on your machine] > > > > Satish > > > > > > On Fri, 18 Sep 2020, Mark Adams wrote: > > > > > On Fri, Sep 18, 2020 at 7:51 AM Matthew Knepley > wrote: > > > > > > > On Fri, Sep 18, 2020 at 7:46 AM Mark Adams wrote: > > > > > > > >> Oh you did not change my hostname: > > > >> > > > >> 07:37 master *= ~/Codes/petsc$ hostname > > > >> MarksMac-302.local > > > >> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local > > > >> PING marksmac-302.local (127.0.0.1): 56 data bytes > > > >> Request timeout for icmp_seq 0 > > > >> > > > >> --- marksmac-302.local ping statistics --- > > > >> 2 packets transmitted, 0 packets received, 100.0% packet loss > > > >> 07:42 2 master *= ~/Codes/petsc$ > > > >> > > > > > > > > This does not make sense to me. You have > > > > > > > > 127.0.0.1 MarksMac-302.local > > > > > > > > in /etc/hosts, > > > > > > > > > > 09:07 ~/.ssh$ cat /etc/hosts > > > ## > > > # Host Database > > > # > > > # localhost is used to configure the loopback interface > > > # when the system is booting. Do not change this entry. 
> > > ## > > > 127.0.0.1 localhost > > > 255.255.255.255 broadcasthost > > > 127.0.0.1 MarksMac-5.local > > > 127.0.0.1 243.124.240.10.in-addr.arpa.private.cam.ac.uk > > > 127.0.0.1 MarksMac-302.local > > > 09:07 ~/.ssh$ > > > > > > > > > > > > > > > > > > > but you cannot resolve that name? > > > > > > > > Matt > > > > > > > > > > > >> BTW, I used to get messages about some network issue and 'changing > host > > > >> name to MarksMac-[x+1].local'. That is, the original hostname > > > >> was MarksMac.local, then I got a message about changing > > > >> to MarksMac-1.local, etc. I have not seen these messages for months > but > > > >> apparently this process has continued unabated. > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> On Thu, Sep 17, 2020 at 11:10 PM Satish Balay via petsc-users < > > > >> petsc-users at mcs.anl.gov> wrote: > > > >> > > > >>> On Thu, 17 Sep 2020, Matthew Knepley wrote: > > > >>> > > > >>> > On Thu, Sep 17, 2020 at 8:33 PM Barry Smith > wrote: > > > >>> > > > > >>> > > > On Sep 17, 2020, at 4:59 PM, Satish Balay via petsc-users < > > > >>> > > petsc-users at mcs.anl.gov> wrote: > > > >>> > > > > > > >>> > > > Here is a fix: > > > >>> > > > > > > >>> > > > echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts > > > >>> > > > > > >>> > > Satish, > > > >>> > > > > > >>> > > I don't think you want to be doing this on a Mac (on > anything?) > > > >>> On a > > > >>> > > Mac based on the network configuration etc as it boots up and > as > > > >>> networks > > > >>> > > are accessible or not (wi-fi) it determines what hostname > should be, > > > >>> one > > > >>> > > should never being hardwiring it to some value. > > > >>> > > > > > >>> > > > > >>> > Satish is just naming the loopback interface. I did this on all > my > > > >>> former > > > >>> > Macs. > > > >>> > > > >>> > > > >>> Yes - this doesn't change the hostname. Its just adding an entry > for > > > >>> gethostbyname - for current hostname. > > > >>> > > > >>> >>> > > > >>> 127.0.0.1 MarksMac-302.local > > > >>> <<< > > > >>> > > > >>> Sure - its best to not do this when one has a proper IP name [like > > > >>> foo.mcs.anl.gov] - but its useful when one has a hostname like > > > >>> "MarksMac-302.local" -that is not DNS resolvable > > > >>> > > > >>> Even if the machine is moved to a different network with a > different > > > >>> name - the current entry won't cause problems [but will need > another entry > > > >>> for the new host name - if this new name is also not DNS > resolvable] > > > >>> > > > >>> Its likely this file is a generated file on macos - so might get > reset > > > >>> on reboot - or some network change? [if this is the case - the > change won't > > > >>> be permanent] > > > >>> > > > >>> > > > >>> Satish > > > >>> > > > >> > > > > > > > > -- > > > > What most experimenters take for granted before they begin their > > > > experiments is infinitely more interesting than any results to which > their > > > > experiments lead. > > > > -- Norbert Wiener > > > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Sep 18 10:07:08 2020 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 18 Sep 2020 11:07:08 -0400 Subject: [petsc-users] osx error In-Reply-To: References: <53E404F6-3CA4-4C67-B968-EE79D3775929@petsc.dev> Message-ID: Let me know if you want anything else. 
Thanks, Mark On Fri, Sep 18, 2020 at 11:05 AM Mark Adams wrote: > > > On Fri, Sep 18, 2020 at 11:04 AM Satish Balay wrote: > >> On Fri, 18 Sep 2020, Satish Balay via petsc-users wrote: >> >> > > >> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local >> > > >> PING marksmac-302.local (127.0.0.1): 56 data bytes >> > >> > So it is resolving MarksMac-302.local as 127.0.0.1 - but ping is not >> responding? >> > >> > I know some machines don't respond to external ping [and firewalls can >> block it] but don't really know if they always respond to internal ping or >> not. >> > >> > If some machines don't respond to internal ping - then we can't use >> ping test in configure [it will create false negatives - as in this case] >> >> BTW: To confirm, please try: >> >> ping 127.0.0.1 >> > > > 11:02 master *= ~/Codes/petsc$ sudo vi /etc/hosts > 11:02 master *= ~/Codes/petsc$ ping 127.0.0.1 > PING 127.0.0.1 (127.0.0.1): 56 data bytes > Request timeout for icmp_seq 0 > Request timeout for icmp_seq 1 > Request timeout for icmp_seq 2 > Request timeout for icmp_seq 3 > Request timeout for icmp_seq 4 > Request timeout for icmp_seq 5 > Request timeout for icmp_seq 6 > Request timeout for icmp_seq 7 > Request timeout for icmp_seq 8 > Request timeout for icmp_seq 9 > Request timeout for icmp_seq 10 > Request timeout for icmp_seq 11 > Request timeout for icmp_seq 12 > Request timeout for icmp_seq 13 > Request timeout for icmp_seq 14 > Request timeout for icmp_seq 15 > Request timeout for icmp_seq 16 > Request timeout for icmp_seq 17 > Request timeout for icmp_seq 18 > Request timeout for icmp_seq 19 > Request timeout for icmp_seq 20 > Request timeout for icmp_seq 21 > > still going ...... > > >> >> Satish >> >> > >> > >> > Mark, can you remove the line that you added to /etc/hosts - i.e: >> > >> > 127.0.0.1 MarksMac-302.local >> > >> > And now rerun MPI tests. Do they work or fail? >> > >> > [this is to check if this test is a false positive on your machine] >> > >> > Satish >> > >> > >> > On Fri, 18 Sep 2020, Mark Adams wrote: >> > >> > > On Fri, Sep 18, 2020 at 7:51 AM Matthew Knepley >> wrote: >> > > >> > > > On Fri, Sep 18, 2020 at 7:46 AM Mark Adams wrote: >> > > > >> > > >> Oh you did not change my hostname: >> > > >> >> > > >> 07:37 master *= ~/Codes/petsc$ hostname >> > > >> MarksMac-302.local >> > > >> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local >> > > >> PING marksmac-302.local (127.0.0.1): 56 data bytes >> > > >> Request timeout for icmp_seq 0 >> > > >> >> > > >> --- marksmac-302.local ping statistics --- >> > > >> 2 packets transmitted, 0 packets received, 100.0% packet loss >> > > >> 07:42 2 master *= ~/Codes/petsc$ >> > > >> >> > > > >> > > > This does not make sense to me. You have >> > > > >> > > > 127.0.0.1 MarksMac-302.local >> > > > >> > > > in /etc/hosts, >> > > > >> > > >> > > 09:07 ~/.ssh$ cat /etc/hosts >> > > ## >> > > # Host Database >> > > # >> > > # localhost is used to configure the loopback interface >> > > # when the system is booting. Do not change this entry. >> > > ## >> > > 127.0.0.1 localhost >> > > 255.255.255.255 broadcasthost >> > > 127.0.0.1 MarksMac-5.local >> > > 127.0.0.1 243.124.240.10.in-addr.arpa.private.cam.ac.uk >> > > 127.0.0.1 MarksMac-302.local >> > > 09:07 ~/.ssh$ >> > > >> > > >> > > >> > > >> > > >> > > > but you cannot resolve that name? >> > > > >> > > > Matt >> > > > >> > > > >> > > >> BTW, I used to get messages about some network issue and 'changing >> host >> > > >> name to MarksMac-[x+1].local'. 
That is, the original hostname >> > > >> was MarksMac.local, then I got a message about changing >> > > >> to MarksMac-1.local, etc. I have not seen these messages for >> months but >> > > >> apparently this process has continued unabated. >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> On Thu, Sep 17, 2020 at 11:10 PM Satish Balay via petsc-users < >> > > >> petsc-users at mcs.anl.gov> wrote: >> > > >> >> > > >>> On Thu, 17 Sep 2020, Matthew Knepley wrote: >> > > >>> >> > > >>> > On Thu, Sep 17, 2020 at 8:33 PM Barry Smith >> wrote: >> > > >>> > >> > > >>> > > > On Sep 17, 2020, at 4:59 PM, Satish Balay via petsc-users < >> > > >>> > > petsc-users at mcs.anl.gov> wrote: >> > > >>> > > > >> > > >>> > > > Here is a fix: >> > > >>> > > > >> > > >>> > > > echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts >> > > >>> > > >> > > >>> > > Satish, >> > > >>> > > >> > > >>> > > I don't think you want to be doing this on a Mac (on >> anything?) >> > > >>> On a >> > > >>> > > Mac based on the network configuration etc as it boots up and >> as >> > > >>> networks >> > > >>> > > are accessible or not (wi-fi) it determines what hostname >> should be, >> > > >>> one >> > > >>> > > should never being hardwiring it to some value. >> > > >>> > > >> > > >>> > >> > > >>> > Satish is just naming the loopback interface. I did this on all >> my >> > > >>> former >> > > >>> > Macs. >> > > >>> >> > > >>> >> > > >>> Yes - this doesn't change the hostname. Its just adding an entry >> for >> > > >>> gethostbyname - for current hostname. >> > > >>> >> > > >>> >>> >> > > >>> 127.0.0.1 MarksMac-302.local >> > > >>> <<< >> > > >>> >> > > >>> Sure - its best to not do this when one has a proper IP name [like >> > > >>> foo.mcs.anl.gov] - but its useful when one has a hostname like >> > > >>> "MarksMac-302.local" -that is not DNS resolvable >> > > >>> >> > > >>> Even if the machine is moved to a different network with a >> different >> > > >>> name - the current entry won't cause problems [but will need >> another entry >> > > >>> for the new host name - if this new name is also not DNS >> resolvable] >> > > >>> >> > > >>> Its likely this file is a generated file on macos - so might >> get reset >> > > >>> on reboot - or some network change? [if this is the case - the >> change won't >> > > >>> be permanent] >> > > >>> >> > > >>> >> > > >>> Satish >> > > >>> >> > > >> >> > > > >> > > > -- >> > > > What most experimenters take for granted before they begin their >> > > > experiments is infinitely more interesting than any results to >> which their >> > > > experiments lead. >> > > > -- Norbert Wiener >> > > > >> > > > https://www.cse.buffalo.edu/~knepley/ >> > > > >> > > > >> > > >> > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Sep 18 10:08:20 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 18 Sep 2020 10:08:20 -0500 Subject: [petsc-users] osx error In-Reply-To: References: <53E404F6-3CA4-4C67-B968-EE79D3775929@petsc.dev> Message-ID: I get this as well when the firewall is on. I am guessing Apple does not have ping working "correctly" when the firewall is on but options are set to allow builtin software to work correctly. Anyways, not important. 
Barry > On Sep 18, 2020, at 10:05 AM, Mark Adams wrote: > > > > On Fri, Sep 18, 2020 at 11:04 AM Satish Balay > wrote: > On Fri, 18 Sep 2020, Satish Balay via petsc-users wrote: > > > > >> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local > > > >> PING marksmac-302.local (127.0.0.1): 56 data bytes > > > > So it is resolving MarksMac-302.local as 127.0.0.1 - but ping is not responding? > > > > I know some machines don't respond to external ping [and firewalls can block it] but don't really know if they always respond to internal ping or not. > > > > If some machines don't respond to internal ping - then we can't use ping test in configure [it will create false negatives - as in this case] > > BTW: To confirm, please try: > > ping 127.0.0.1 > > > 11:02 master *= ~/Codes/petsc$ sudo vi /etc/hosts > 11:02 master *= ~/Codes/petsc$ ping 127.0.0.1 > PING 127.0.0.1 (127.0.0.1): 56 data bytes > Request timeout for icmp_seq 0 > Request timeout for icmp_seq 1 > Request timeout for icmp_seq 2 > Request timeout for icmp_seq 3 > Request timeout for icmp_seq 4 > Request timeout for icmp_seq 5 > Request timeout for icmp_seq 6 > Request timeout for icmp_seq 7 > Request timeout for icmp_seq 8 > Request timeout for icmp_seq 9 > Request timeout for icmp_seq 10 > Request timeout for icmp_seq 11 > Request timeout for icmp_seq 12 > Request timeout for icmp_seq 13 > Request timeout for icmp_seq 14 > Request timeout for icmp_seq 15 > Request timeout for icmp_seq 16 > Request timeout for icmp_seq 17 > Request timeout for icmp_seq 18 > Request timeout for icmp_seq 19 > Request timeout for icmp_seq 20 > Request timeout for icmp_seq 21 > > still going ...... > > > Satish > > > > > > > Mark, can you remove the line that you added to /etc/hosts - i.e: > > > > 127.0.0.1 MarksMac-302.local > > > > And now rerun MPI tests. Do they work or fail? > > > > [this is to check if this test is a false positive on your machine] > > > > Satish > > > > > > On Fri, 18 Sep 2020, Mark Adams wrote: > > > > > On Fri, Sep 18, 2020 at 7:51 AM Matthew Knepley > wrote: > > > > > > > On Fri, Sep 18, 2020 at 7:46 AM Mark Adams > wrote: > > > > > > > >> Oh you did not change my hostname: > > > >> > > > >> 07:37 master *= ~/Codes/petsc$ hostname > > > >> MarksMac-302.local > > > >> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local > > > >> PING marksmac-302.local (127.0.0.1): 56 data bytes > > > >> Request timeout for icmp_seq 0 > > > >> > > > >> --- marksmac-302.local ping statistics --- > > > >> 2 packets transmitted, 0 packets received, 100.0% packet loss > > > >> 07:42 2 master *= ~/Codes/petsc$ > > > >> > > > > > > > > This does not make sense to me. You have > > > > > > > > 127.0.0.1 MarksMac-302.local > > > > > > > > in /etc/hosts, > > > > > > > > > > 09:07 ~/.ssh$ cat /etc/hosts > > > ## > > > # Host Database > > > # > > > # localhost is used to configure the loopback interface > > > # when the system is booting. Do not change this entry. > > > ## > > > 127.0.0.1 localhost > > > 255.255.255.255 broadcasthost > > > 127.0.0.1 MarksMac-5.local > > > 127.0.0.1 243.124.240.10.in-addr.arpa.private.cam.ac.uk > > > 127.0.0.1 MarksMac-302.local > > > 09:07 ~/.ssh$ > > > > > > > > > > > > > > > > > > > but you cannot resolve that name? > > > > > > > > Matt > > > > > > > > > > > >> BTW, I used to get messages about some network issue and 'changing host > > > >> name to MarksMac-[x+1].local'. That is, the original hostname > > > >> was MarksMac.local, then I got a message about changing > > > >> to MarksMac-1.local, etc. 
I have not seen these messages for months but > > > >> apparently this process has continued unabated. > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> On Thu, Sep 17, 2020 at 11:10 PM Satish Balay via petsc-users < > > > >> petsc-users at mcs.anl.gov > wrote: > > > >> > > > >>> On Thu, 17 Sep 2020, Matthew Knepley wrote: > > > >>> > > > >>> > On Thu, Sep 17, 2020 at 8:33 PM Barry Smith > wrote: > > > >>> > > > > >>> > > > On Sep 17, 2020, at 4:59 PM, Satish Balay via petsc-users < > > > >>> > > petsc-users at mcs.anl.gov > wrote: > > > >>> > > > > > > >>> > > > Here is a fix: > > > >>> > > > > > > >>> > > > echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts > > > >>> > > > > > >>> > > Satish, > > > >>> > > > > > >>> > > I don't think you want to be doing this on a Mac (on anything?) > > > >>> On a > > > >>> > > Mac based on the network configuration etc as it boots up and as > > > >>> networks > > > >>> > > are accessible or not (wi-fi) it determines what hostname should be, > > > >>> one > > > >>> > > should never being hardwiring it to some value. > > > >>> > > > > > >>> > > > > >>> > Satish is just naming the loopback interface. I did this on all my > > > >>> former > > > >>> > Macs. > > > >>> > > > >>> > > > >>> Yes - this doesn't change the hostname. Its just adding an entry for > > > >>> gethostbyname - for current hostname. > > > >>> > > > >>> >>> > > > >>> 127.0.0.1 MarksMac-302.local > > > >>> <<< > > > >>> > > > >>> Sure - its best to not do this when one has a proper IP name [like > > > >>> foo.mcs.anl.gov ] - but its useful when one has a hostname like > > > >>> "MarksMac-302.local" -that is not DNS resolvable > > > >>> > > > >>> Even if the machine is moved to a different network with a different > > > >>> name - the current entry won't cause problems [but will need another entry > > > >>> for the new host name - if this new name is also not DNS resolvable] > > > >>> > > > >>> Its likely this file is a generated file on macos - so might get reset > > > >>> on reboot - or some network change? [if this is the case - the change won't > > > >>> be permanent] > > > >>> > > > >>> > > > >>> Satish > > > >>> > > > >> > > > > > > > > -- > > > > What most experimenters take for granted before they begin their > > > > experiments is infinitely more interesting than any results to which their > > > > experiments lead. > > > > -- Norbert Wiener > > > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Sep 18 10:08:53 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 18 Sep 2020 10:08:53 -0500 Subject: [petsc-users] osx error In-Reply-To: References: <53E404F6-3CA4-4C67-B968-EE79D3775929@petsc.dev> Message-ID: <20219104-0F78-4423-84F7-013F5522D608@petsc.dev> try /usr/sbin/traceroute `hostname` > On Sep 18, 2020, at 10:07 AM, Mark Adams wrote: > > Let me know if you want anything else. > Thanks, > Mark > > On Fri, Sep 18, 2020 at 11:05 AM Mark Adams > wrote: > > > On Fri, Sep 18, 2020 at 11:04 AM Satish Balay > wrote: > On Fri, 18 Sep 2020, Satish Balay via petsc-users wrote: > > > > >> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local > > > >> PING marksmac-302.local (127.0.0.1): 56 data bytes > > > > So it is resolving MarksMac-302.local as 127.0.0.1 - but ping is not responding? 
> > > > I know some machines don't respond to external ping [and firewalls can block it] but don't really know if they always respond to internal ping or not. > > > > If some machines don't respond to internal ping - then we can't use ping test in configure [it will create false negatives - as in this case] > > BTW: To confirm, please try: > > ping 127.0.0.1 > > > 11:02 master *= ~/Codes/petsc$ sudo vi /etc/hosts > 11:02 master *= ~/Codes/petsc$ ping 127.0.0.1 > PING 127.0.0.1 (127.0.0.1): 56 data bytes > Request timeout for icmp_seq 0 > Request timeout for icmp_seq 1 > Request timeout for icmp_seq 2 > Request timeout for icmp_seq 3 > Request timeout for icmp_seq 4 > Request timeout for icmp_seq 5 > Request timeout for icmp_seq 6 > Request timeout for icmp_seq 7 > Request timeout for icmp_seq 8 > Request timeout for icmp_seq 9 > Request timeout for icmp_seq 10 > Request timeout for icmp_seq 11 > Request timeout for icmp_seq 12 > Request timeout for icmp_seq 13 > Request timeout for icmp_seq 14 > Request timeout for icmp_seq 15 > Request timeout for icmp_seq 16 > Request timeout for icmp_seq 17 > Request timeout for icmp_seq 18 > Request timeout for icmp_seq 19 > Request timeout for icmp_seq 20 > Request timeout for icmp_seq 21 > > still going ...... > > > Satish > > > > > > > Mark, can you remove the line that you added to /etc/hosts - i.e: > > > > 127.0.0.1 MarksMac-302.local > > > > And now rerun MPI tests. Do they work or fail? > > > > [this is to check if this test is a false positive on your machine] > > > > Satish > > > > > > On Fri, 18 Sep 2020, Mark Adams wrote: > > > > > On Fri, Sep 18, 2020 at 7:51 AM Matthew Knepley > wrote: > > > > > > > On Fri, Sep 18, 2020 at 7:46 AM Mark Adams > wrote: > > > > > > > >> Oh you did not change my hostname: > > > >> > > > >> 07:37 master *= ~/Codes/petsc$ hostname > > > >> MarksMac-302.local > > > >> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local > > > >> PING marksmac-302.local (127.0.0.1): 56 data bytes > > > >> Request timeout for icmp_seq 0 > > > >> > > > >> --- marksmac-302.local ping statistics --- > > > >> 2 packets transmitted, 0 packets received, 100.0% packet loss > > > >> 07:42 2 master *= ~/Codes/petsc$ > > > >> > > > > > > > > This does not make sense to me. You have > > > > > > > > 127.0.0.1 MarksMac-302.local > > > > > > > > in /etc/hosts, > > > > > > > > > > 09:07 ~/.ssh$ cat /etc/hosts > > > ## > > > # Host Database > > > # > > > # localhost is used to configure the loopback interface > > > # when the system is booting. Do not change this entry. > > > ## > > > 127.0.0.1 localhost > > > 255.255.255.255 broadcasthost > > > 127.0.0.1 MarksMac-5.local > > > 127.0.0.1 243.124.240.10.in-addr.arpa.private.cam.ac.uk > > > 127.0.0.1 MarksMac-302.local > > > 09:07 ~/.ssh$ > > > > > > > > > > > > > > > > > > > but you cannot resolve that name? > > > > > > > > Matt > > > > > > > > > > > >> BTW, I used to get messages about some network issue and 'changing host > > > >> name to MarksMac-[x+1].local'. That is, the original hostname > > > >> was MarksMac.local, then I got a message about changing > > > >> to MarksMac-1.local, etc. I have not seen these messages for months but > > > >> apparently this process has continued unabated. 
> > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> On Thu, Sep 17, 2020 at 11:10 PM Satish Balay via petsc-users < > > > >> petsc-users at mcs.anl.gov > wrote: > > > >> > > > >>> On Thu, 17 Sep 2020, Matthew Knepley wrote: > > > >>> > > > >>> > On Thu, Sep 17, 2020 at 8:33 PM Barry Smith > wrote: > > > >>> > > > > >>> > > > On Sep 17, 2020, at 4:59 PM, Satish Balay via petsc-users < > > > >>> > > petsc-users at mcs.anl.gov > wrote: > > > >>> > > > > > > >>> > > > Here is a fix: > > > >>> > > > > > > >>> > > > echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts > > > >>> > > > > > >>> > > Satish, > > > >>> > > > > > >>> > > I don't think you want to be doing this on a Mac (on anything?) > > > >>> On a > > > >>> > > Mac based on the network configuration etc as it boots up and as > > > >>> networks > > > >>> > > are accessible or not (wi-fi) it determines what hostname should be, > > > >>> one > > > >>> > > should never being hardwiring it to some value. > > > >>> > > > > > >>> > > > > >>> > Satish is just naming the loopback interface. I did this on all my > > > >>> former > > > >>> > Macs. > > > >>> > > > >>> > > > >>> Yes - this doesn't change the hostname. Its just adding an entry for > > > >>> gethostbyname - for current hostname. > > > >>> > > > >>> >>> > > > >>> 127.0.0.1 MarksMac-302.local > > > >>> <<< > > > >>> > > > >>> Sure - its best to not do this when one has a proper IP name [like > > > >>> foo.mcs.anl.gov ] - but its useful when one has a hostname like > > > >>> "MarksMac-302.local" -that is not DNS resolvable > > > >>> > > > >>> Even if the machine is moved to a different network with a different > > > >>> name - the current entry won't cause problems [but will need another entry > > > >>> for the new host name - if this new name is also not DNS resolvable] > > > >>> > > > >>> Its likely this file is a generated file on macos - so might get reset > > > >>> on reboot - or some network change? [if this is the case - the change won't > > > >>> be permanent] > > > >>> > > > >>> > > > >>> Satish > > > >>> > > > >> > > > > > > > > -- > > > > What most experimenters take for granted before they begin their > > > > experiments is infinitely more interesting than any results to which their > > > > experiments lead. > > > > -- Norbert Wiener > > > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Fri Sep 18 10:10:17 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 18 Sep 2020 10:10:17 -0500 (CDT) Subject: [petsc-users] osx error In-Reply-To: References: <53E404F6-3CA4-4C67-B968-EE79D3775929@petsc.dev> Message-ID: Yes - please try this part: > >> > Mark, can you remove the line that you added to /etc/hosts - i.e: > >> > > >> > 127.0.0.1 MarksMac-302.local > >> > > >> > And now rerun MPI tests. Do they work or fail? > >> > > >> > [this is to check if this test is a false positive on your machine] > >> > > >> > Satish Satish On Fri, 18 Sep 2020, Mark Adams wrote: > Let me know if you want anything else. 
> Thanks, > Mark > > On Fri, Sep 18, 2020 at 11:05 AM Mark Adams wrote: > > > > > > > On Fri, Sep 18, 2020 at 11:04 AM Satish Balay wrote: > > > >> On Fri, 18 Sep 2020, Satish Balay via petsc-users wrote: > >> > >> > > >> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local > >> > > >> PING marksmac-302.local (127.0.0.1): 56 data bytes > >> > > >> > So it is resolving MarksMac-302.local as 127.0.0.1 - but ping is not > >> responding? > >> > > >> > I know some machines don't respond to external ping [and firewalls can > >> block it] but don't really know if they always respond to internal ping or > >> not. > >> > > >> > If some machines don't respond to internal ping - then we can't use > >> ping test in configure [it will create false negatives - as in this case] > >> > >> BTW: To confirm, please try: > >> > >> ping 127.0.0.1 > >> > > > > > > 11:02 master *= ~/Codes/petsc$ sudo vi /etc/hosts > > 11:02 master *= ~/Codes/petsc$ ping 127.0.0.1 > > PING 127.0.0.1 (127.0.0.1): 56 data bytes > > Request timeout for icmp_seq 0 > > Request timeout for icmp_seq 1 > > Request timeout for icmp_seq 2 > > Request timeout for icmp_seq 3 > > Request timeout for icmp_seq 4 > > Request timeout for icmp_seq 5 > > Request timeout for icmp_seq 6 > > Request timeout for icmp_seq 7 > > Request timeout for icmp_seq 8 > > Request timeout for icmp_seq 9 > > Request timeout for icmp_seq 10 > > Request timeout for icmp_seq 11 > > Request timeout for icmp_seq 12 > > Request timeout for icmp_seq 13 > > Request timeout for icmp_seq 14 > > Request timeout for icmp_seq 15 > > Request timeout for icmp_seq 16 > > Request timeout for icmp_seq 17 > > Request timeout for icmp_seq 18 > > Request timeout for icmp_seq 19 > > Request timeout for icmp_seq 20 > > Request timeout for icmp_seq 21 > > > > still going ...... > > > > > >> > >> Satish > >> > >> > > >> > > >> > Mark, can you remove the line that you added to /etc/hosts - i.e: > >> > > >> > 127.0.0.1 MarksMac-302.local > >> > > >> > And now rerun MPI tests. Do they work or fail? > >> > > >> > [this is to check if this test is a false positive on your machine] > >> > > >> > Satish > >> > > >> > > >> > On Fri, 18 Sep 2020, Mark Adams wrote: > >> > > >> > > On Fri, Sep 18, 2020 at 7:51 AM Matthew Knepley > >> wrote: > >> > > > >> > > > On Fri, Sep 18, 2020 at 7:46 AM Mark Adams wrote: > >> > > > > >> > > >> Oh you did not change my hostname: > >> > > >> > >> > > >> 07:37 master *= ~/Codes/petsc$ hostname > >> > > >> MarksMac-302.local > >> > > >> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local > >> > > >> PING marksmac-302.local (127.0.0.1): 56 data bytes > >> > > >> Request timeout for icmp_seq 0 > >> > > >> > >> > > >> --- marksmac-302.local ping statistics --- > >> > > >> 2 packets transmitted, 0 packets received, 100.0% packet loss > >> > > >> 07:42 2 master *= ~/Codes/petsc$ > >> > > >> > >> > > > > >> > > > This does not make sense to me. You have > >> > > > > >> > > > 127.0.0.1 MarksMac-302.local > >> > > > > >> > > > in /etc/hosts, > >> > > > > >> > > > >> > > 09:07 ~/.ssh$ cat /etc/hosts > >> > > ## > >> > > # Host Database > >> > > # > >> > > # localhost is used to configure the loopback interface > >> > > # when the system is booting. Do not change this entry. 
> >> > > ## > >> > > 127.0.0.1 localhost > >> > > 255.255.255.255 broadcasthost > >> > > 127.0.0.1 MarksMac-5.local > >> > > 127.0.0.1 243.124.240.10.in-addr.arpa.private.cam.ac.uk > >> > > 127.0.0.1 MarksMac-302.local > >> > > 09:07 ~/.ssh$ > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > but you cannot resolve that name? > >> > > > > >> > > > Matt > >> > > > > >> > > > > >> > > >> BTW, I used to get messages about some network issue and 'changing > >> host > >> > > >> name to MarksMac-[x+1].local'. That is, the original hostname > >> > > >> was MarksMac.local, then I got a message about changing > >> > > >> to MarksMac-1.local, etc. I have not seen these messages for > >> months but > >> > > >> apparently this process has continued unabated. > >> > > >> > >> > > >> > >> > > >> > >> > > >> > >> > > >> > >> > > >> > >> > > >> > >> > > >> > >> > > >> > >> > > >> On Thu, Sep 17, 2020 at 11:10 PM Satish Balay via petsc-users < > >> > > >> petsc-users at mcs.anl.gov> wrote: > >> > > >> > >> > > >>> On Thu, 17 Sep 2020, Matthew Knepley wrote: > >> > > >>> > >> > > >>> > On Thu, Sep 17, 2020 at 8:33 PM Barry Smith > >> wrote: > >> > > >>> > > >> > > >>> > > > On Sep 17, 2020, at 4:59 PM, Satish Balay via petsc-users < > >> > > >>> > > petsc-users at mcs.anl.gov> wrote: > >> > > >>> > > > > >> > > >>> > > > Here is a fix: > >> > > >>> > > > > >> > > >>> > > > echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts > >> > > >>> > > > >> > > >>> > > Satish, > >> > > >>> > > > >> > > >>> > > I don't think you want to be doing this on a Mac (on > >> anything?) > >> > > >>> On a > >> > > >>> > > Mac based on the network configuration etc as it boots up and > >> as > >> > > >>> networks > >> > > >>> > > are accessible or not (wi-fi) it determines what hostname > >> should be, > >> > > >>> one > >> > > >>> > > should never being hardwiring it to some value. > >> > > >>> > > > >> > > >>> > > >> > > >>> > Satish is just naming the loopback interface. I did this on all > >> my > >> > > >>> former > >> > > >>> > Macs. > >> > > >>> > >> > > >>> > >> > > >>> Yes - this doesn't change the hostname. Its just adding an entry > >> for > >> > > >>> gethostbyname - for current hostname. > >> > > >>> > >> > > >>> >>> > >> > > >>> 127.0.0.1 MarksMac-302.local > >> > > >>> <<< > >> > > >>> > >> > > >>> Sure - its best to not do this when one has a proper IP name [like > >> > > >>> foo.mcs.anl.gov] - but its useful when one has a hostname like > >> > > >>> "MarksMac-302.local" -that is not DNS resolvable > >> > > >>> > >> > > >>> Even if the machine is moved to a different network with a > >> different > >> > > >>> name - the current entry won't cause problems [but will need > >> another entry > >> > > >>> for the new host name - if this new name is also not DNS > >> resolvable] > >> > > >>> > >> > > >>> Its likely this file is a generated file on macos - so might > >> get reset > >> > > >>> on reboot - or some network change? [if this is the case - the > >> change won't > >> > > >>> be permanent] > >> > > >>> > >> > > >>> > >> > > >>> Satish > >> > > >>> > >> > > >> > >> > > > > >> > > > -- > >> > > > What most experimenters take for granted before they begin their > >> > > > experiments is infinitely more interesting than any results to > >> which their > >> > > > experiments lead. 
> >> > > > -- Norbert Wiener > >> > > > > >> > > > https://www.cse.buffalo.edu/~knepley/ > >> > > > > >> > > > > >> > > > >> > > >> > >> > From jacob.fai at gmail.com Fri Sep 18 10:13:46 2020 From: jacob.fai at gmail.com (Jacob Faibussowitsch) Date: Fri, 18 Sep 2020 11:13:46 -0400 Subject: [petsc-users] osx error In-Reply-To: <20219104-0F78-4423-84F7-013F5522D608@petsc.dev> References: <53E404F6-3CA4-4C67-B968-EE79D3775929@petsc.dev> <20219104-0F78-4423-84F7-013F5522D608@petsc.dev> Message-ID: Do you have any anti-virus on? This user had McAfee running which had its own firewall active: https://discussions.apple.com/thread/6980819 Do you have your firewall on in stealth mode? System Preferences > Firewall > Firewall Options then look for a button ?enable stealth mode? at the bottom and make sure its unchecked. And not to be that guy, have you restarted your machine? Its always worth a try... Best regards, Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch) Cell: (312) 694-3391 > On Sep 18, 2020, at 11:08, Barry Smith wrote: > > > try > > /usr/sbin/traceroute `hostname` > > >> On Sep 18, 2020, at 10:07 AM, Mark Adams > wrote: >> >> Let me know if you want anything else. >> Thanks, >> Mark >> >> On Fri, Sep 18, 2020 at 11:05 AM Mark Adams > wrote: >> >> >> On Fri, Sep 18, 2020 at 11:04 AM Satish Balay > wrote: >> On Fri, 18 Sep 2020, Satish Balay via petsc-users wrote: >> >> > > >> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local >> > > >> PING marksmac-302.local (127.0.0.1): 56 data bytes >> > >> > So it is resolving MarksMac-302.local as 127.0.0.1 - but ping is not responding? >> > >> > I know some machines don't respond to external ping [and firewalls can block it] but don't really know if they always respond to internal ping or not. >> > >> > If some machines don't respond to internal ping - then we can't use ping test in configure [it will create false negatives - as in this case] >> >> BTW: To confirm, please try: >> >> ping 127.0.0.1 >> >> >> 11:02 master *= ~/Codes/petsc$ sudo vi /etc/hosts >> 11:02 master *= ~/Codes/petsc$ ping 127.0.0.1 >> PING 127.0.0.1 (127.0.0.1): 56 data bytes >> Request timeout for icmp_seq 0 >> Request timeout for icmp_seq 1 >> Request timeout for icmp_seq 2 >> Request timeout for icmp_seq 3 >> Request timeout for icmp_seq 4 >> Request timeout for icmp_seq 5 >> Request timeout for icmp_seq 6 >> Request timeout for icmp_seq 7 >> Request timeout for icmp_seq 8 >> Request timeout for icmp_seq 9 >> Request timeout for icmp_seq 10 >> Request timeout for icmp_seq 11 >> Request timeout for icmp_seq 12 >> Request timeout for icmp_seq 13 >> Request timeout for icmp_seq 14 >> Request timeout for icmp_seq 15 >> Request timeout for icmp_seq 16 >> Request timeout for icmp_seq 17 >> Request timeout for icmp_seq 18 >> Request timeout for icmp_seq 19 >> Request timeout for icmp_seq 20 >> Request timeout for icmp_seq 21 >> >> still going ...... >> >> >> Satish >> >> > >> > >> > Mark, can you remove the line that you added to /etc/hosts - i.e: >> > >> > 127.0.0.1 MarksMac-302.local >> > >> > And now rerun MPI tests. Do they work or fail? 
>> > >> > [this is to check if this test is a false positive on your machine] >> > >> > Satish >> > >> > >> > On Fri, 18 Sep 2020, Mark Adams wrote: >> > >> > > On Fri, Sep 18, 2020 at 7:51 AM Matthew Knepley > wrote: >> > > >> > > > On Fri, Sep 18, 2020 at 7:46 AM Mark Adams > wrote: >> > > > >> > > >> Oh you did not change my hostname: >> > > >> >> > > >> 07:37 master *= ~/Codes/petsc$ hostname >> > > >> MarksMac-302.local >> > > >> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local >> > > >> PING marksmac-302.local (127.0.0.1): 56 data bytes >> > > >> Request timeout for icmp_seq 0 >> > > >> >> > > >> --- marksmac-302.local ping statistics --- >> > > >> 2 packets transmitted, 0 packets received, 100.0% packet loss >> > > >> 07:42 2 master *= ~/Codes/petsc$ >> > > >> >> > > > >> > > > This does not make sense to me. You have >> > > > >> > > > 127.0.0.1 MarksMac-302.local >> > > > >> > > > in /etc/hosts, >> > > > >> > > >> > > 09:07 ~/.ssh$ cat /etc/hosts >> > > ## >> > > # Host Database >> > > # >> > > # localhost is used to configure the loopback interface >> > > # when the system is booting. Do not change this entry. >> > > ## >> > > 127.0.0.1 localhost >> > > 255.255.255.255 broadcasthost >> > > 127.0.0.1 MarksMac-5.local >> > > 127.0.0.1 243.124.240.10.in-addr.arpa.private.cam.ac.uk >> > > 127.0.0.1 MarksMac-302.local >> > > 09:07 ~/.ssh$ >> > > >> > > >> > > >> > > >> > > >> > > > but you cannot resolve that name? >> > > > >> > > > Matt >> > > > >> > > > >> > > >> BTW, I used to get messages about some network issue and 'changing host >> > > >> name to MarksMac-[x+1].local'. That is, the original hostname >> > > >> was MarksMac.local, then I got a message about changing >> > > >> to MarksMac-1.local, etc. I have not seen these messages for months but >> > > >> apparently this process has continued unabated. >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> On Thu, Sep 17, 2020 at 11:10 PM Satish Balay via petsc-users < >> > > >> petsc-users at mcs.anl.gov > wrote: >> > > >> >> > > >>> On Thu, 17 Sep 2020, Matthew Knepley wrote: >> > > >>> >> > > >>> > On Thu, Sep 17, 2020 at 8:33 PM Barry Smith > wrote: >> > > >>> > >> > > >>> > > > On Sep 17, 2020, at 4:59 PM, Satish Balay via petsc-users < >> > > >>> > > petsc-users at mcs.anl.gov > wrote: >> > > >>> > > > >> > > >>> > > > Here is a fix: >> > > >>> > > > >> > > >>> > > > echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts >> > > >>> > > >> > > >>> > > Satish, >> > > >>> > > >> > > >>> > > I don't think you want to be doing this on a Mac (on anything?) >> > > >>> On a >> > > >>> > > Mac based on the network configuration etc as it boots up and as >> > > >>> networks >> > > >>> > > are accessible or not (wi-fi) it determines what hostname should be, >> > > >>> one >> > > >>> > > should never being hardwiring it to some value. >> > > >>> > > >> > > >>> > >> > > >>> > Satish is just naming the loopback interface. I did this on all my >> > > >>> former >> > > >>> > Macs. >> > > >>> >> > > >>> >> > > >>> Yes - this doesn't change the hostname. Its just adding an entry for >> > > >>> gethostbyname - for current hostname. 
>> > > >>> >> > > >>> >>> >> > > >>> 127.0.0.1 MarksMac-302.local >> > > >>> <<< >> > > >>> >> > > >>> Sure - its best to not do this when one has a proper IP name [like >> > > >>> foo.mcs.anl.gov ] - but its useful when one has a hostname like >> > > >>> "MarksMac-302.local" -that is not DNS resolvable >> > > >>> >> > > >>> Even if the machine is moved to a different network with a different >> > > >>> name - the current entry won't cause problems [but will need another entry >> > > >>> for the new host name - if this new name is also not DNS resolvable] >> > > >>> >> > > >>> Its likely this file is a generated file on macos - so might get reset >> > > >>> on reboot - or some network change? [if this is the case - the change won't >> > > >>> be permanent] >> > > >>> >> > > >>> >> > > >>> Satish >> > > >>> >> > > >> >> > > > >> > > > -- >> > > > What most experimenters take for granted before they begin their >> > > > experiments is infinitely more interesting than any results to which their >> > > > experiments lead. >> > > > -- Norbert Wiener >> > > > >> > > > https://www.cse.buffalo.edu/~knepley/ >> > > > > >> > > > >> > > >> > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Fri Sep 18 10:14:37 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 18 Sep 2020 10:14:37 -0500 (CDT) Subject: [petsc-users] osx error In-Reply-To: <20219104-0F78-4423-84F7-013F5522D608@petsc.dev> References: <53E404F6-3CA4-4C67-B968-EE79D3775929@petsc.dev> <20219104-0F78-4423-84F7-013F5522D608@petsc.dev> Message-ID: Its probably better to just run a test with gethostbyname()? The closest thing I can think off is: I don't know if 'traceroute' or 'host' commands are universally available. >>>>>> balay at sb /home/balay $ host `hostname` sb has address 192.168.0.144 balay at sb /home/balay $ echo $? 0 balay at sb /home/balay $ host foobar Host foobar not found: 3(NXDOMAIN) balay at sb /home/balay $ echo $? 1 balay at sb /home/balay $ <<<<<< However - I fear if there are *any* false positives - or false negatives - this test will generate more e-mail than the actual issue [of misbehaving MPI] Satish On Fri, 18 Sep 2020, Barry Smith wrote: > > try > > /usr/sbin/traceroute `hostname` > > > > On Sep 18, 2020, at 10:07 AM, Mark Adams wrote: > > > > Let me know if you want anything else. > > Thanks, > > Mark > > > > On Fri, Sep 18, 2020 at 11:05 AM Mark Adams > wrote: > > > > > > On Fri, Sep 18, 2020 at 11:04 AM Satish Balay > wrote: > > On Fri, 18 Sep 2020, Satish Balay via petsc-users wrote: > > > > > > >> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local > > > > >> PING marksmac-302.local (127.0.0.1): 56 data bytes > > > > > > So it is resolving MarksMac-302.local as 127.0.0.1 - but ping is not responding? > > > > > > I know some machines don't respond to external ping [and firewalls can block it] but don't really know if they always respond to internal ping or not. 
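A gethostbyname()-style check can be mimicked from the shell without touching ICMP at all, for instance through Python, which configure already requires (a sketch of the idea only, not what MPI.py currently does):

    python -c 'import socket; print(socket.gethostbyname(socket.gethostname()))'
    echo $?

Exit status 0 with an address printed means the current hostname resolves (including via /etc/hosts); a non-zero status is the situation configure would want to warn about. This sidesteps ping entirely, though as noted above it still has to avoid false positives and false negatives before it is worth adding.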
> > > > > > If some machines don't respond to internal ping - then we can't use ping test in configure [it will create false negatives - as in this case] > > > > BTW: To confirm, please try: > > > > ping 127.0.0.1 > > > > > > 11:02 master *= ~/Codes/petsc$ sudo vi /etc/hosts > > 11:02 master *= ~/Codes/petsc$ ping 127.0.0.1 > > PING 127.0.0.1 (127.0.0.1): 56 data bytes > > Request timeout for icmp_seq 0 > > Request timeout for icmp_seq 1 > > Request timeout for icmp_seq 2 > > Request timeout for icmp_seq 3 > > Request timeout for icmp_seq 4 > > Request timeout for icmp_seq 5 > > Request timeout for icmp_seq 6 > > Request timeout for icmp_seq 7 > > Request timeout for icmp_seq 8 > > Request timeout for icmp_seq 9 > > Request timeout for icmp_seq 10 > > Request timeout for icmp_seq 11 > > Request timeout for icmp_seq 12 > > Request timeout for icmp_seq 13 > > Request timeout for icmp_seq 14 > > Request timeout for icmp_seq 15 > > Request timeout for icmp_seq 16 > > Request timeout for icmp_seq 17 > > Request timeout for icmp_seq 18 > > Request timeout for icmp_seq 19 > > Request timeout for icmp_seq 20 > > Request timeout for icmp_seq 21 > > > > still going ...... > > > > > > Satish > > > > > > > > > > > Mark, can you remove the line that you added to /etc/hosts - i.e: > > > > > > 127.0.0.1 MarksMac-302.local > > > > > > And now rerun MPI tests. Do they work or fail? > > > > > > [this is to check if this test is a false positive on your machine] > > > > > > Satish > > > > > > > > > On Fri, 18 Sep 2020, Mark Adams wrote: > > > > > > > On Fri, Sep 18, 2020 at 7:51 AM Matthew Knepley > wrote: > > > > > > > > > On Fri, Sep 18, 2020 at 7:46 AM Mark Adams > wrote: > > > > > > > > > >> Oh you did not change my hostname: > > > > >> > > > > >> 07:37 master *= ~/Codes/petsc$ hostname > > > > >> MarksMac-302.local > > > > >> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local > > > > >> PING marksmac-302.local (127.0.0.1): 56 data bytes > > > > >> Request timeout for icmp_seq 0 > > > > >> > > > > >> --- marksmac-302.local ping statistics --- > > > > >> 2 packets transmitted, 0 packets received, 100.0% packet loss > > > > >> 07:42 2 master *= ~/Codes/petsc$ > > > > >> > > > > > > > > > > This does not make sense to me. You have > > > > > > > > > > 127.0.0.1 MarksMac-302.local > > > > > > > > > > in /etc/hosts, > > > > > > > > > > > > > 09:07 ~/.ssh$ cat /etc/hosts > > > > ## > > > > # Host Database > > > > # > > > > # localhost is used to configure the loopback interface > > > > # when the system is booting. Do not change this entry. > > > > ## > > > > 127.0.0.1 localhost > > > > 255.255.255.255 broadcasthost > > > > 127.0.0.1 MarksMac-5.local > > > > 127.0.0.1 243.124.240.10.in-addr.arpa.private.cam.ac.uk > > > > 127.0.0.1 MarksMac-302.local > > > > 09:07 ~/.ssh$ > > > > > > > > > > > > > > > > > > > > > > > > > but you cannot resolve that name? > > > > > > > > > > Matt > > > > > > > > > > > > > > >> BTW, I used to get messages about some network issue and 'changing host > > > > >> name to MarksMac-[x+1].local'. That is, the original hostname > > > > >> was MarksMac.local, then I got a message about changing > > > > >> to MarksMac-1.local, etc. I have not seen these messages for months but > > > > >> apparently this process has continued unabated. 
> > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> On Thu, Sep 17, 2020 at 11:10 PM Satish Balay via petsc-users < > > > > >> petsc-users at mcs.anl.gov > wrote: > > > > >> > > > > >>> On Thu, 17 Sep 2020, Matthew Knepley wrote: > > > > >>> > > > > >>> > On Thu, Sep 17, 2020 at 8:33 PM Barry Smith > wrote: > > > > >>> > > > > > >>> > > > On Sep 17, 2020, at 4:59 PM, Satish Balay via petsc-users < > > > > >>> > > petsc-users at mcs.anl.gov > wrote: > > > > >>> > > > > > > > >>> > > > Here is a fix: > > > > >>> > > > > > > > >>> > > > echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts > > > > >>> > > > > > > >>> > > Satish, > > > > >>> > > > > > > >>> > > I don't think you want to be doing this on a Mac (on anything?) > > > > >>> On a > > > > >>> > > Mac based on the network configuration etc as it boots up and as > > > > >>> networks > > > > >>> > > are accessible or not (wi-fi) it determines what hostname should be, > > > > >>> one > > > > >>> > > should never being hardwiring it to some value. > > > > >>> > > > > > > >>> > > > > > >>> > Satish is just naming the loopback interface. I did this on all my > > > > >>> former > > > > >>> > Macs. > > > > >>> > > > > >>> > > > > >>> Yes - this doesn't change the hostname. Its just adding an entry for > > > > >>> gethostbyname - for current hostname. > > > > >>> > > > > >>> >>> > > > > >>> 127.0.0.1 MarksMac-302.local > > > > >>> <<< > > > > >>> > > > > >>> Sure - its best to not do this when one has a proper IP name [like > > > > >>> foo.mcs.anl.gov ] - but its useful when one has a hostname like > > > > >>> "MarksMac-302.local" -that is not DNS resolvable > > > > >>> > > > > >>> Even if the machine is moved to a different network with a different > > > > >>> name - the current entry won't cause problems [but will need another entry > > > > >>> for the new host name - if this new name is also not DNS resolvable] > > > > >>> > > > > >>> Its likely this file is a generated file on macos - so might get reset > > > > >>> on reboot - or some network change? [if this is the case - the change won't > > > > >>> be permanent] > > > > >>> > > > > >>> > > > > >>> Satish > > > > >>> > > > > >> > > > > > > > > > > -- > > > > > What most experimenters take for granted before they begin their > > > > > experiments is infinitely more interesting than any results to which their > > > > > experiments lead. > > > > > -- Norbert Wiener > > > > > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > > > > > > > > > > > > > > > From jacob.fai at gmail.com Fri Sep 18 10:21:12 2020 From: jacob.fai at gmail.com (Jacob Faibussowitsch) Date: Fri, 18 Sep 2020 11:21:12 -0400 Subject: [petsc-users] osx error In-Reply-To: References: <53E404F6-3CA4-4C67-B968-EE79D3775929@petsc.dev> <20219104-0F78-4423-84F7-013F5522D608@petsc.dev> Message-ID: > System Preferences > Firewall > Firewall Options then look for a button ?enable stealth mode? at the bottom and make sure its unchecked. Whoops should be System Preferences > Security & Privacy > Firewall > Firewall Options Best regards, Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch) Cell: (312) 694-3391 > On Sep 18, 2020, at 11:13, Jacob Faibussowitsch wrote: > > Do you have any anti-virus on? This user had McAfee running which had its own firewall active: https://discussions.apple.com/thread/6980819 > > Do you have your firewall on in stealth mode? System Preferences > Firewall > Firewall Options then look for a button ?enable stealth mode? 
at the bottom and make sure its unchecked. > > And not to be that guy, have you restarted your machine? Its always worth a try... > > Best regards, > > Jacob Faibussowitsch > (Jacob Fai - booss - oh - vitch) > Cell: (312) 694-3391 > >> On Sep 18, 2020, at 11:08, Barry Smith > wrote: >> >> >> try >> >> /usr/sbin/traceroute `hostname` >> >> >>> On Sep 18, 2020, at 10:07 AM, Mark Adams > wrote: >>> >>> Let me know if you want anything else. >>> Thanks, >>> Mark >>> >>> On Fri, Sep 18, 2020 at 11:05 AM Mark Adams > wrote: >>> >>> >>> On Fri, Sep 18, 2020 at 11:04 AM Satish Balay > wrote: >>> On Fri, 18 Sep 2020, Satish Balay via petsc-users wrote: >>> >>> > > >> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local >>> > > >> PING marksmac-302.local (127.0.0.1): 56 data bytes >>> > >>> > So it is resolving MarksMac-302.local as 127.0.0.1 - but ping is not responding? >>> > >>> > I know some machines don't respond to external ping [and firewalls can block it] but don't really know if they always respond to internal ping or not. >>> > >>> > If some machines don't respond to internal ping - then we can't use ping test in configure [it will create false negatives - as in this case] >>> >>> BTW: To confirm, please try: >>> >>> ping 127.0.0.1 >>> >>> >>> 11:02 master *= ~/Codes/petsc$ sudo vi /etc/hosts >>> 11:02 master *= ~/Codes/petsc$ ping 127.0.0.1 >>> PING 127.0.0.1 (127.0.0.1): 56 data bytes >>> Request timeout for icmp_seq 0 >>> Request timeout for icmp_seq 1 >>> Request timeout for icmp_seq 2 >>> Request timeout for icmp_seq 3 >>> Request timeout for icmp_seq 4 >>> Request timeout for icmp_seq 5 >>> Request timeout for icmp_seq 6 >>> Request timeout for icmp_seq 7 >>> Request timeout for icmp_seq 8 >>> Request timeout for icmp_seq 9 >>> Request timeout for icmp_seq 10 >>> Request timeout for icmp_seq 11 >>> Request timeout for icmp_seq 12 >>> Request timeout for icmp_seq 13 >>> Request timeout for icmp_seq 14 >>> Request timeout for icmp_seq 15 >>> Request timeout for icmp_seq 16 >>> Request timeout for icmp_seq 17 >>> Request timeout for icmp_seq 18 >>> Request timeout for icmp_seq 19 >>> Request timeout for icmp_seq 20 >>> Request timeout for icmp_seq 21 >>> >>> still going ...... >>> >>> >>> Satish >>> >>> > >>> > >>> > Mark, can you remove the line that you added to /etc/hosts - i.e: >>> > >>> > 127.0.0.1 MarksMac-302.local >>> > >>> > And now rerun MPI tests. Do they work or fail? >>> > >>> > [this is to check if this test is a false positive on your machine] >>> > >>> > Satish >>> > >>> > >>> > On Fri, 18 Sep 2020, Mark Adams wrote: >>> > >>> > > On Fri, Sep 18, 2020 at 7:51 AM Matthew Knepley > wrote: >>> > > >>> > > > On Fri, Sep 18, 2020 at 7:46 AM Mark Adams > wrote: >>> > > > >>> > > >> Oh you did not change my hostname: >>> > > >> >>> > > >> 07:37 master *= ~/Codes/petsc$ hostname >>> > > >> MarksMac-302.local >>> > > >> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local >>> > > >> PING marksmac-302.local (127.0.0.1): 56 data bytes >>> > > >> Request timeout for icmp_seq 0 >>> > > >> >>> > > >> --- marksmac-302.local ping statistics --- >>> > > >> 2 packets transmitted, 0 packets received, 100.0% packet loss >>> > > >> 07:42 2 master *= ~/Codes/petsc$ >>> > > >> >>> > > > >>> > > > This does not make sense to me. 
You have >>> > > > >>> > > > 127.0.0.1 MarksMac-302.local >>> > > > >>> > > > in /etc/hosts, >>> > > > >>> > > >>> > > 09:07 ~/.ssh$ cat /etc/hosts >>> > > ## >>> > > # Host Database >>> > > # >>> > > # localhost is used to configure the loopback interface >>> > > # when the system is booting. Do not change this entry. >>> > > ## >>> > > 127.0.0.1 localhost >>> > > 255.255.255.255 broadcasthost >>> > > 127.0.0.1 MarksMac-5.local >>> > > 127.0.0.1 243.124.240.10.in-addr.arpa.private.cam.ac.uk >>> > > 127.0.0.1 MarksMac-302.local >>> > > 09:07 ~/.ssh$ >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > > but you cannot resolve that name? >>> > > > >>> > > > Matt >>> > > > >>> > > > >>> > > >> BTW, I used to get messages about some network issue and 'changing host >>> > > >> name to MarksMac-[x+1].local'. That is, the original hostname >>> > > >> was MarksMac.local, then I got a message about changing >>> > > >> to MarksMac-1.local, etc. I have not seen these messages for months but >>> > > >> apparently this process has continued unabated. >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> On Thu, Sep 17, 2020 at 11:10 PM Satish Balay via petsc-users < >>> > > >> petsc-users at mcs.anl.gov > wrote: >>> > > >> >>> > > >>> On Thu, 17 Sep 2020, Matthew Knepley wrote: >>> > > >>> >>> > > >>> > On Thu, Sep 17, 2020 at 8:33 PM Barry Smith > wrote: >>> > > >>> > >>> > > >>> > > > On Sep 17, 2020, at 4:59 PM, Satish Balay via petsc-users < >>> > > >>> > > petsc-users at mcs.anl.gov > wrote: >>> > > >>> > > > >>> > > >>> > > > Here is a fix: >>> > > >>> > > > >>> > > >>> > > > echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts >>> > > >>> > > >>> > > >>> > > Satish, >>> > > >>> > > >>> > > >>> > > I don't think you want to be doing this on a Mac (on anything?) >>> > > >>> On a >>> > > >>> > > Mac based on the network configuration etc as it boots up and as >>> > > >>> networks >>> > > >>> > > are accessible or not (wi-fi) it determines what hostname should be, >>> > > >>> one >>> > > >>> > > should never being hardwiring it to some value. >>> > > >>> > > >>> > > >>> > >>> > > >>> > Satish is just naming the loopback interface. I did this on all my >>> > > >>> former >>> > > >>> > Macs. >>> > > >>> >>> > > >>> >>> > > >>> Yes - this doesn't change the hostname. Its just adding an entry for >>> > > >>> gethostbyname - for current hostname. >>> > > >>> >>> > > >>> >>> >>> > > >>> 127.0.0.1 MarksMac-302.local >>> > > >>> <<< >>> > > >>> >>> > > >>> Sure - its best to not do this when one has a proper IP name [like >>> > > >>> foo.mcs.anl.gov ] - but its useful when one has a hostname like >>> > > >>> "MarksMac-302.local" -that is not DNS resolvable >>> > > >>> >>> > > >>> Even if the machine is moved to a different network with a different >>> > > >>> name - the current entry won't cause problems [but will need another entry >>> > > >>> for the new host name - if this new name is also not DNS resolvable] >>> > > >>> >>> > > >>> Its likely this file is a generated file on macos - so might get reset >>> > > >>> on reboot - or some network change? [if this is the case - the change won't >>> > > >>> be permanent] >>> > > >>> >>> > > >>> >>> > > >>> Satish >>> > > >>> >>> > > >> >>> > > > >>> > > > -- >>> > > > What most experimenters take for granted before they begin their >>> > > > experiments is infinitely more interesting than any results to which their >>> > > > experiments lead. 
>>> > > > -- Norbert Wiener >>> > > > >>> > > > https://www.cse.buffalo.edu/~knepley/ >>> > > > > >>> > > > >>> > > >>> > >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Sep 18 10:42:35 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 18 Sep 2020 10:42:35 -0500 Subject: [petsc-users] osx error In-Reply-To: References: <53E404F6-3CA4-4C67-B968-EE79D3775929@petsc.dev> <20219104-0F78-4423-84F7-013F5522D608@petsc.dev> Message-ID: <1BDE2A5D-2159-48CE-8FED-D16D650AB605@petsc.dev> > On Sep 18, 2020, at 10:14 AM, Satish Balay wrote: > > Its probably better to just run a test with gethostbyname()? I had hoped to avoid building C code and running it. The Apple manual page for gethostbyname() states: The getaddrinfo(3) and getnameinfo(3) functions are preferred over the gethostbyname(), gethostbyname2(), and gethostbyaddr() functions. I do not know what MPICH and OpenMPI use. On the Mac > > The closest thing I can think off is: > > > I don't know if 'traceroute' or 'host' commands are universally available. > >>>>>>> > balay at sb /home/balay > $ host `hostname` > sb has address 192.168.0.144 $ host `hostname` Host Barrys-MacBook-Pro-3.local not found: 3(NXDOMAIN) Also on the Apple `hostname` is associated with multiple addresses and it seems different utilities may use different addresses produced. Some addresses may work, others may not. I will make one more MR adding traceroute first and if any of the tests succeed continue. If that fails for users then we will likely need to drop the test. I don't like just using a mpiexec -n 2 test because that can fail for so many reasons it is difficult to provide diagnostics to the users. Barry > balay at sb /home/balay > $ echo $? > 0 > balay at sb /home/balay > $ host foobar > Host foobar not found: 3(NXDOMAIN) > balay at sb /home/balay > $ echo $? > 1 > balay at sb /home/balay > $ > <<<<<< > > However - I fear if there are *any* false positives - or false negatives - this test will generate more e-mail than the actual issue [of misbehaving MPI] > > Satish > > On Fri, 18 Sep 2020, Barry Smith wrote: > >> >> try >> >> /usr/sbin/traceroute `hostname` >> >> >>> On Sep 18, 2020, at 10:07 AM, Mark Adams wrote: >>> >>> Let me know if you want anything else. >>> Thanks, >>> Mark >>> >>> On Fri, Sep 18, 2020 at 11:05 AM Mark Adams > wrote: >>> >>> >>> On Fri, Sep 18, 2020 at 11:04 AM Satish Balay > wrote: >>> On Fri, 18 Sep 2020, Satish Balay via petsc-users wrote: >>> >>>>>>> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local >>>>>>> PING marksmac-302.local (127.0.0.1): 56 data bytes >>>> >>>> So it is resolving MarksMac-302.local as 127.0.0.1 - but ping is not responding? >>>> >>>> I know some machines don't respond to external ping [and firewalls can block it] but don't really know if they always respond to internal ping or not. 
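Since Barry notes above that Apple's man page prefers getaddrinfo(3)/getnameinfo(3) over gethostbyname(), the modern form of the same check would look roughly like the following sketch (again hypothetical, not code from PETSc or from this thread). A side benefit is that it prints every address the name is bound to, which makes the "hostname is associated with multiple addresses" behaviour Barry mentions visible:

/* check_getaddrinfo.c -- hypothetical sketch using the interfaces Apple
   recommends: getaddrinfo(3) to resolve the local hostname and
   getnameinfo(3) to print each address it is bound to. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>

int main(void)
{
  char             host[256], addr[256];
  struct addrinfo  hints, *res, *p;
  int              err;

  if (gethostname(host, sizeof(host))) { perror("gethostname"); return 1; }
  host[sizeof(host)-1] = '\0';

  memset(&hints, 0, sizeof(hints));
  hints.ai_family   = AF_UNSPEC;     /* report both IPv4 and IPv6 bindings */
  hints.ai_socktype = SOCK_STREAM;

  err = getaddrinfo(host, NULL, &hints, &res);
  if (err) {
    fprintf(stderr, "getaddrinfo(\"%s\") failed: %s\n", host, gai_strerror(err));
    return 1;
  }
  for (p = res; p; p = p->ai_next) {
    if (!getnameinfo(p->ai_addr, p->ai_addrlen, addr, sizeof(addr), NULL, 0, NI_NUMERICHOST))
      printf("%s -> %s\n", host, addr);
  }
  freeaddrinfo(res);
  return 0;
}

This uses the same resolver path as other native macOS processes, unlike the host/dig utilities, so its answer is closer to what an MPI implementation would actually see (which resolver calls MPICH and Open MPI make internally is, as Barry says, not established here).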
>>>> >>>> If some machines don't respond to internal ping - then we can't use ping test in configure [it will create false negatives - as in this case] >>> >>> BTW: To confirm, please try: >>> >>> ping 127.0.0.1 >>> >>> >>> 11:02 master *= ~/Codes/petsc$ sudo vi /etc/hosts >>> 11:02 master *= ~/Codes/petsc$ ping 127.0.0.1 >>> PING 127.0.0.1 (127.0.0.1): 56 data bytes >>> Request timeout for icmp_seq 0 >>> Request timeout for icmp_seq 1 >>> Request timeout for icmp_seq 2 >>> Request timeout for icmp_seq 3 >>> Request timeout for icmp_seq 4 >>> Request timeout for icmp_seq 5 >>> Request timeout for icmp_seq 6 >>> Request timeout for icmp_seq 7 >>> Request timeout for icmp_seq 8 >>> Request timeout for icmp_seq 9 >>> Request timeout for icmp_seq 10 >>> Request timeout for icmp_seq 11 >>> Request timeout for icmp_seq 12 >>> Request timeout for icmp_seq 13 >>> Request timeout for icmp_seq 14 >>> Request timeout for icmp_seq 15 >>> Request timeout for icmp_seq 16 >>> Request timeout for icmp_seq 17 >>> Request timeout for icmp_seq 18 >>> Request timeout for icmp_seq 19 >>> Request timeout for icmp_seq 20 >>> Request timeout for icmp_seq 21 >>> >>> still going ...... >>> >>> >>> Satish >>> >>>> >>>> >>>> Mark, can you remove the line that you added to /etc/hosts - i.e: >>>> >>>> 127.0.0.1 MarksMac-302.local >>>> >>>> And now rerun MPI tests. Do they work or fail? >>>> >>>> [this is to check if this test is a false positive on your machine] >>>> >>>> Satish >>>> >>>> >>>> On Fri, 18 Sep 2020, Mark Adams wrote: >>>> >>>>> On Fri, Sep 18, 2020 at 7:51 AM Matthew Knepley > wrote: >>>>> >>>>>> On Fri, Sep 18, 2020 at 7:46 AM Mark Adams > wrote: >>>>>> >>>>>>> Oh you did not change my hostname: >>>>>>> >>>>>>> 07:37 master *= ~/Codes/petsc$ hostname >>>>>>> MarksMac-302.local >>>>>>> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local >>>>>>> PING marksmac-302.local (127.0.0.1): 56 data bytes >>>>>>> Request timeout for icmp_seq 0 >>>>>>> >>>>>>> --- marksmac-302.local ping statistics --- >>>>>>> 2 packets transmitted, 0 packets received, 100.0% packet loss >>>>>>> 07:42 2 master *= ~/Codes/petsc$ >>>>>>> >>>>>> >>>>>> This does not make sense to me. You have >>>>>> >>>>>> 127.0.0.1 MarksMac-302.local >>>>>> >>>>>> in /etc/hosts, >>>>>> >>>>> >>>>> 09:07 ~/.ssh$ cat /etc/hosts >>>>> ## >>>>> # Host Database >>>>> # >>>>> # localhost is used to configure the loopback interface >>>>> # when the system is booting. Do not change this entry. >>>>> ## >>>>> 127.0.0.1 localhost >>>>> 255.255.255.255 broadcasthost >>>>> 127.0.0.1 MarksMac-5.local >>>>> 127.0.0.1 243.124.240.10.in-addr.arpa.private.cam.ac.uk >>>>> 127.0.0.1 MarksMac-302.local >>>>> 09:07 ~/.ssh$ >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>> but you cannot resolve that name? >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> BTW, I used to get messages about some network issue and 'changing host >>>>>>> name to MarksMac-[x+1].local'. That is, the original hostname >>>>>>> was MarksMac.local, then I got a message about changing >>>>>>> to MarksMac-1.local, etc. I have not seen these messages for months but >>>>>>> apparently this process has continued unabated. 
>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Thu, Sep 17, 2020 at 11:10 PM Satish Balay via petsc-users < >>>>>>> petsc-users at mcs.anl.gov > wrote: >>>>>>> >>>>>>>> On Thu, 17 Sep 2020, Matthew Knepley wrote: >>>>>>>> >>>>>>>>> On Thu, Sep 17, 2020 at 8:33 PM Barry Smith > wrote: >>>>>>>>> >>>>>>>>>>> On Sep 17, 2020, at 4:59 PM, Satish Balay via petsc-users < >>>>>>>>>> petsc-users at mcs.anl.gov > wrote: >>>>>>>>>>> >>>>>>>>>>> Here is a fix: >>>>>>>>>>> >>>>>>>>>>> echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts >>>>>>>>>> >>>>>>>>>> Satish, >>>>>>>>>> >>>>>>>>>> I don't think you want to be doing this on a Mac (on anything?) >>>>>>>> On a >>>>>>>>>> Mac based on the network configuration etc as it boots up and as >>>>>>>> networks >>>>>>>>>> are accessible or not (wi-fi) it determines what hostname should be, >>>>>>>> one >>>>>>>>>> should never being hardwiring it to some value. >>>>>>>>>> >>>>>>>>> >>>>>>>>> Satish is just naming the loopback interface. I did this on all my >>>>>>>> former >>>>>>>>> Macs. >>>>>>>> >>>>>>>> >>>>>>>> Yes - this doesn't change the hostname. Its just adding an entry for >>>>>>>> gethostbyname - for current hostname. >>>>>>>> >>>>>>>>>>> >>>>>>>> 127.0.0.1 MarksMac-302.local >>>>>>>> <<< >>>>>>>> >>>>>>>> Sure - its best to not do this when one has a proper IP name [like >>>>>>>> foo.mcs.anl.gov ] - but its useful when one has a hostname like >>>>>>>> "MarksMac-302.local" -that is not DNS resolvable >>>>>>>> >>>>>>>> Even if the machine is moved to a different network with a different >>>>>>>> name - the current entry won't cause problems [but will need another entry >>>>>>>> for the new host name - if this new name is also not DNS resolvable] >>>>>>>> >>>>>>>> Its likely this file is a generated file on macos - so might get reset >>>>>>>> on reboot - or some network change? [if this is the case - the change won't >>>>>>>> be permanent] >>>>>>>> >>>>>>>> >>>>>>>> Satish >>>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> > >>>>>> >>>>> >>>> >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Fri Sep 18 10:42:40 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 18 Sep 2020 10:42:40 -0500 (CDT) Subject: [petsc-users] osx error In-Reply-To: References: <53E404F6-3CA4-4C67-B968-EE79D3775929@petsc.dev> <20219104-0F78-4423-84F7-013F5522D608@petsc.dev> Message-ID: Windows has neither host nor traceroute. hostname is from cygwin [this is fine]. equivalent of traceroute is tracert And for `host` recommendation is to use nslookup https://stackoverflow.com/questions/21520191/unix-command-host-is-there-windows-equivalent I think its best to: - improve this test [with host - and fallback to traceroute/tracert - if host not found] - make this a warning - instead of error. [as the error case won't prevent any e-mails - it can increase due to false positives and negatives - if any] Satish On Fri, 18 Sep 2020, Satish Balay via petsc-users wrote: > Its probably better to just run a test with gethostbyname()? > > The closest thing I can think off is: > > > I don't know if 'traceroute' or 'host' commands are universally available. 
> > >>>>>> > balay at sb /home/balay > $ host `hostname` > sb has address 192.168.0.144 > balay at sb /home/balay > $ echo $? > 0 > balay at sb /home/balay > $ host foobar > Host foobar not found: 3(NXDOMAIN) > balay at sb /home/balay > $ echo $? > 1 > balay at sb /home/balay > $ > <<<<<< > > However - I fear if there are *any* false positives - or false negatives - this test will generate more e-mail than the actual issue [of misbehaving MPI] > > Satish > > On Fri, 18 Sep 2020, Barry Smith wrote: > > > > > try > > > > /usr/sbin/traceroute `hostname` > > > > > > > On Sep 18, 2020, at 10:07 AM, Mark Adams wrote: > > > > > > Let me know if you want anything else. > > > Thanks, > > > Mark > > > > > > On Fri, Sep 18, 2020 at 11:05 AM Mark Adams > wrote: > > > > > > > > > On Fri, Sep 18, 2020 at 11:04 AM Satish Balay > wrote: > > > On Fri, 18 Sep 2020, Satish Balay via petsc-users wrote: > > > > > > > > >> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local > > > > > >> PING marksmac-302.local (127.0.0.1): 56 data bytes > > > > > > > > So it is resolving MarksMac-302.local as 127.0.0.1 - but ping is not responding? > > > > > > > > I know some machines don't respond to external ping [and firewalls can block it] but don't really know if they always respond to internal ping or not. > > > > > > > > If some machines don't respond to internal ping - then we can't use ping test in configure [it will create false negatives - as in this case] > > > > > > BTW: To confirm, please try: > > > > > > ping 127.0.0.1 > > > > > > > > > 11:02 master *= ~/Codes/petsc$ sudo vi /etc/hosts > > > 11:02 master *= ~/Codes/petsc$ ping 127.0.0.1 > > > PING 127.0.0.1 (127.0.0.1): 56 data bytes > > > Request timeout for icmp_seq 0 > > > Request timeout for icmp_seq 1 > > > Request timeout for icmp_seq 2 > > > Request timeout for icmp_seq 3 > > > Request timeout for icmp_seq 4 > > > Request timeout for icmp_seq 5 > > > Request timeout for icmp_seq 6 > > > Request timeout for icmp_seq 7 > > > Request timeout for icmp_seq 8 > > > Request timeout for icmp_seq 9 > > > Request timeout for icmp_seq 10 > > > Request timeout for icmp_seq 11 > > > Request timeout for icmp_seq 12 > > > Request timeout for icmp_seq 13 > > > Request timeout for icmp_seq 14 > > > Request timeout for icmp_seq 15 > > > Request timeout for icmp_seq 16 > > > Request timeout for icmp_seq 17 > > > Request timeout for icmp_seq 18 > > > Request timeout for icmp_seq 19 > > > Request timeout for icmp_seq 20 > > > Request timeout for icmp_seq 21 > > > > > > still going ...... > > > > > > > > > Satish > > > > > > > > > > > > > > > Mark, can you remove the line that you added to /etc/hosts - i.e: > > > > > > > > 127.0.0.1 MarksMac-302.local > > > > > > > > And now rerun MPI tests. Do they work or fail? 
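As a concrete stand-in for the "rerun MPI tests" step, a two-rank sanity program like the sketch below is enough (a hypothetical file, not one of PETSc's shipped tests). In my understanding, when the local hostname does not resolve, MPICH or Open MPI will often fail or hang inside MPI_Init before any output appears, which is also why Barry is wary of relying on a bare mpiexec -n 2 run as the configure diagnostic:

/* mpi_sanity.c -- hypothetical MPI startup check, not from this thread.
   Build and run with the MPI wrappers:
     mpicc mpi_sanity.c -o mpi_sanity && mpiexec -n 2 ./mpi_sanity */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
  int  rank, size, len, sum = 0;
  char name[MPI_MAX_PROCESSOR_NAME];

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Get_processor_name(name, &len);
  printf("rank %d of %d running on %s\n", rank, size, name);

  /* one trivial collective, so actual communication is exercised */
  MPI_Allreduce(&rank, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
  if (rank == 0) printf("sum of ranks = %d (expected %d)\n", sum, size*(size-1)/2);

  MPI_Finalize();
  return 0;
}

If this prints both ranks and the expected sum with the /etc/hosts line removed, the earlier failures were indeed tied to the loopback entry; if it hangs or errors in startup, the hostname resolution problem is still present.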
> > > > > > > > [this is to check if this test is a false positive on your machine] > > > > > > > > Satish > > > > > > > > > > > > On Fri, 18 Sep 2020, Mark Adams wrote: > > > > > > > > > On Fri, Sep 18, 2020 at 7:51 AM Matthew Knepley > wrote: > > > > > > > > > > > On Fri, Sep 18, 2020 at 7:46 AM Mark Adams > wrote: > > > > > > > > > > > >> Oh you did not change my hostname: > > > > > >> > > > > > >> 07:37 master *= ~/Codes/petsc$ hostname > > > > > >> MarksMac-302.local > > > > > >> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local > > > > > >> PING marksmac-302.local (127.0.0.1): 56 data bytes > > > > > >> Request timeout for icmp_seq 0 > > > > > >> > > > > > >> --- marksmac-302.local ping statistics --- > > > > > >> 2 packets transmitted, 0 packets received, 100.0% packet loss > > > > > >> 07:42 2 master *= ~/Codes/petsc$ > > > > > >> > > > > > > > > > > > > This does not make sense to me. You have > > > > > > > > > > > > 127.0.0.1 MarksMac-302.local > > > > > > > > > > > > in /etc/hosts, > > > > > > > > > > > > > > > > 09:07 ~/.ssh$ cat /etc/hosts > > > > > ## > > > > > # Host Database > > > > > # > > > > > # localhost is used to configure the loopback interface > > > > > # when the system is booting. Do not change this entry. > > > > > ## > > > > > 127.0.0.1 localhost > > > > > 255.255.255.255 broadcasthost > > > > > 127.0.0.1 MarksMac-5.local > > > > > 127.0.0.1 243.124.240.10.in-addr.arpa.private.cam.ac.uk > > > > > 127.0.0.1 MarksMac-302.local > > > > > 09:07 ~/.ssh$ > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > but you cannot resolve that name? > > > > > > > > > > > > Matt > > > > > > > > > > > > > > > > > >> BTW, I used to get messages about some network issue and 'changing host > > > > > >> name to MarksMac-[x+1].local'. That is, the original hostname > > > > > >> was MarksMac.local, then I got a message about changing > > > > > >> to MarksMac-1.local, etc. I have not seen these messages for months but > > > > > >> apparently this process has continued unabated. > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> On Thu, Sep 17, 2020 at 11:10 PM Satish Balay via petsc-users < > > > > > >> petsc-users at mcs.anl.gov > wrote: > > > > > >> > > > > > >>> On Thu, 17 Sep 2020, Matthew Knepley wrote: > > > > > >>> > > > > > >>> > On Thu, Sep 17, 2020 at 8:33 PM Barry Smith > wrote: > > > > > >>> > > > > > > >>> > > > On Sep 17, 2020, at 4:59 PM, Satish Balay via petsc-users < > > > > > >>> > > petsc-users at mcs.anl.gov > wrote: > > > > > >>> > > > > > > > > >>> > > > Here is a fix: > > > > > >>> > > > > > > > > >>> > > > echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts > > > > > >>> > > > > > > > >>> > > Satish, > > > > > >>> > > > > > > > >>> > > I don't think you want to be doing this on a Mac (on anything?) > > > > > >>> On a > > > > > >>> > > Mac based on the network configuration etc as it boots up and as > > > > > >>> networks > > > > > >>> > > are accessible or not (wi-fi) it determines what hostname should be, > > > > > >>> one > > > > > >>> > > should never being hardwiring it to some value. > > > > > >>> > > > > > > > >>> > > > > > > >>> > Satish is just naming the loopback interface. I did this on all my > > > > > >>> former > > > > > >>> > Macs. > > > > > >>> > > > > > >>> > > > > > >>> Yes - this doesn't change the hostname. Its just adding an entry for > > > > > >>> gethostbyname - for current hostname. 
> > > > > >>> > > > > > >>> >>> > > > > > >>> 127.0.0.1 MarksMac-302.local > > > > > >>> <<< > > > > > >>> > > > > > >>> Sure - its best to not do this when one has a proper IP name [like > > > > > >>> foo.mcs.anl.gov ] - but its useful when one has a hostname like > > > > > >>> "MarksMac-302.local" -that is not DNS resolvable > > > > > >>> > > > > > >>> Even if the machine is moved to a different network with a different > > > > > >>> name - the current entry won't cause problems [but will need another entry > > > > > >>> for the new host name - if this new name is also not DNS resolvable] > > > > > >>> > > > > > >>> Its likely this file is a generated file on macos - so might get reset > > > > > >>> on reboot - or some network change? [if this is the case - the change won't > > > > > >>> be permanent] > > > > > >>> > > > > > >>> > > > > > >>> Satish > > > > > >>> > > > > > >> > > > > > > > > > > > > -- > > > > > > What most experimenters take for granted before they begin their > > > > > > experiments is infinitely more interesting than any results to which their > > > > > > experiments lead. > > > > > > -- Norbert Wiener > > > > > > > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From balay at mcs.anl.gov Fri Sep 18 10:48:12 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 18 Sep 2020 10:48:12 -0500 (CDT) Subject: [petsc-users] osx error In-Reply-To: <1BDE2A5D-2159-48CE-8FED-D16D650AB605@petsc.dev> References: <53E404F6-3CA4-4C67-B968-EE79D3775929@petsc.dev> <20219104-0F78-4423-84F7-013F5522D608@petsc.dev> <1BDE2A5D-2159-48CE-8FED-D16D650AB605@petsc.dev> Message-ID: On Fri, 18 Sep 2020, Barry Smith wrote: > > > > On Sep 18, 2020, at 10:14 AM, Satish Balay wrote: > > > > Its probably better to just run a test with gethostbyname()? > > I had hoped to avoid building C code and running it. The Apple manual page for gethostbyname() states: The getaddrinfo(3) and getnameinfo(3) functions are preferred over the gethostbyname(), gethostbyname2(), and > gethostbyaddr() functions. > > > I do not know what MPICH and OpenMPI use. > > On the Mac > > > > > The closest thing I can think off is: > > > > > > I don't know if 'traceroute' or 'host' commands are universally available. > > > >>>>>>> > > balay at sb /home/balay > > $ host `hostname` > > sb has address 192.168.0.144 > > $ host `hostname` > Host Barrys-MacBook-Pro-3.local not found: 3(NXDOMAIN) > > Also on the Apple `hostname` is associated with multiple addresses and it seems different utilities may use different addresses produced. Some addresses may work, others may not. If its bound to multiple adresses nslookup should list all adressed If host doesn't work - how is tracroute able to resolve it? What do you get for: nslookup `hostname` traceroute `hostname` dig `hostname` Satish > > > I will make one more MR adding traceroute first and if any of the tests succeed continue. If that fails for users then we will likely need to drop the test. > > I don't like just using a mpiexec -n 2 test because that can fail for so many reasons it is difficult to provide diagnostics to the users. > > Barry > > > > > balay at sb /home/balay > > $ echo $? > > 0 > > balay at sb /home/balay > > $ host foobar > > Host foobar not found: 3(NXDOMAIN) > > balay at sb /home/balay > > $ echo $? 
> > 1 > > balay at sb /home/balay > > $ > > <<<<<< > > > > However - I fear if there are *any* false positives - or false negatives - this test will generate more e-mail than the actual issue [of misbehaving MPI] > > > > Satish > > > > On Fri, 18 Sep 2020, Barry Smith wrote: > > > >> > >> try > >> > >> /usr/sbin/traceroute `hostname` > >> > >> > >>> On Sep 18, 2020, at 10:07 AM, Mark Adams wrote: > >>> > >>> Let me know if you want anything else. > >>> Thanks, > >>> Mark > >>> > >>> On Fri, Sep 18, 2020 at 11:05 AM Mark Adams > wrote: > >>> > >>> > >>> On Fri, Sep 18, 2020 at 11:04 AM Satish Balay > wrote: > >>> On Fri, 18 Sep 2020, Satish Balay via petsc-users wrote: > >>> > >>>>>>> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local > >>>>>>> PING marksmac-302.local (127.0.0.1): 56 data bytes > >>>> > >>>> So it is resolving MarksMac-302.local as 127.0.0.1 - but ping is not responding? > >>>> > >>>> I know some machines don't respond to external ping [and firewalls can block it] but don't really know if they always respond to internal ping or not. > >>>> > >>>> If some machines don't respond to internal ping - then we can't use ping test in configure [it will create false negatives - as in this case] > >>> > >>> BTW: To confirm, please try: > >>> > >>> ping 127.0.0.1 > >>> > >>> > >>> 11:02 master *= ~/Codes/petsc$ sudo vi /etc/hosts > >>> 11:02 master *= ~/Codes/petsc$ ping 127.0.0.1 > >>> PING 127.0.0.1 (127.0.0.1): 56 data bytes > >>> Request timeout for icmp_seq 0 > >>> Request timeout for icmp_seq 1 > >>> Request timeout for icmp_seq 2 > >>> Request timeout for icmp_seq 3 > >>> Request timeout for icmp_seq 4 > >>> Request timeout for icmp_seq 5 > >>> Request timeout for icmp_seq 6 > >>> Request timeout for icmp_seq 7 > >>> Request timeout for icmp_seq 8 > >>> Request timeout for icmp_seq 9 > >>> Request timeout for icmp_seq 10 > >>> Request timeout for icmp_seq 11 > >>> Request timeout for icmp_seq 12 > >>> Request timeout for icmp_seq 13 > >>> Request timeout for icmp_seq 14 > >>> Request timeout for icmp_seq 15 > >>> Request timeout for icmp_seq 16 > >>> Request timeout for icmp_seq 17 > >>> Request timeout for icmp_seq 18 > >>> Request timeout for icmp_seq 19 > >>> Request timeout for icmp_seq 20 > >>> Request timeout for icmp_seq 21 > >>> > >>> still going ...... > >>> > >>> > >>> Satish > >>> > >>>> > >>>> > >>>> Mark, can you remove the line that you added to /etc/hosts - i.e: > >>>> > >>>> 127.0.0.1 MarksMac-302.local > >>>> > >>>> And now rerun MPI tests. Do they work or fail? > >>>> > >>>> [this is to check if this test is a false positive on your machine] > >>>> > >>>> Satish > >>>> > >>>> > >>>> On Fri, 18 Sep 2020, Mark Adams wrote: > >>>> > >>>>> On Fri, Sep 18, 2020 at 7:51 AM Matthew Knepley > wrote: > >>>>> > >>>>>> On Fri, Sep 18, 2020 at 7:46 AM Mark Adams > wrote: > >>>>>> > >>>>>>> Oh you did not change my hostname: > >>>>>>> > >>>>>>> 07:37 master *= ~/Codes/petsc$ hostname > >>>>>>> MarksMac-302.local > >>>>>>> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local > >>>>>>> PING marksmac-302.local (127.0.0.1): 56 data bytes > >>>>>>> Request timeout for icmp_seq 0 > >>>>>>> > >>>>>>> --- marksmac-302.local ping statistics --- > >>>>>>> 2 packets transmitted, 0 packets received, 100.0% packet loss > >>>>>>> 07:42 2 master *= ~/Codes/petsc$ > >>>>>>> > >>>>>> > >>>>>> This does not make sense to me. 
You have > >>>>>> > >>>>>> 127.0.0.1 MarksMac-302.local > >>>>>> > >>>>>> in /etc/hosts, > >>>>>> > >>>>> > >>>>> 09:07 ~/.ssh$ cat /etc/hosts > >>>>> ## > >>>>> # Host Database > >>>>> # > >>>>> # localhost is used to configure the loopback interface > >>>>> # when the system is booting. Do not change this entry. > >>>>> ## > >>>>> 127.0.0.1 localhost > >>>>> 255.255.255.255 broadcasthost > >>>>> 127.0.0.1 MarksMac-5.local > >>>>> 127.0.0.1 243.124.240.10.in-addr.arpa.private.cam.ac.uk > >>>>> 127.0.0.1 MarksMac-302.local > >>>>> 09:07 ~/.ssh$ > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>>> but you cannot resolve that name? > >>>>>> > >>>>>> Matt > >>>>>> > >>>>>> > >>>>>>> BTW, I used to get messages about some network issue and 'changing host > >>>>>>> name to MarksMac-[x+1].local'. That is, the original hostname > >>>>>>> was MarksMac.local, then I got a message about changing > >>>>>>> to MarksMac-1.local, etc. I have not seen these messages for months but > >>>>>>> apparently this process has continued unabated. > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> On Thu, Sep 17, 2020 at 11:10 PM Satish Balay via petsc-users < > >>>>>>> petsc-users at mcs.anl.gov > wrote: > >>>>>>> > >>>>>>>> On Thu, 17 Sep 2020, Matthew Knepley wrote: > >>>>>>>> > >>>>>>>>> On Thu, Sep 17, 2020 at 8:33 PM Barry Smith > wrote: > >>>>>>>>> > >>>>>>>>>>> On Sep 17, 2020, at 4:59 PM, Satish Balay via petsc-users < > >>>>>>>>>> petsc-users at mcs.anl.gov > wrote: > >>>>>>>>>>> > >>>>>>>>>>> Here is a fix: > >>>>>>>>>>> > >>>>>>>>>>> echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts > >>>>>>>>>> > >>>>>>>>>> Satish, > >>>>>>>>>> > >>>>>>>>>> I don't think you want to be doing this on a Mac (on anything?) > >>>>>>>> On a > >>>>>>>>>> Mac based on the network configuration etc as it boots up and as > >>>>>>>> networks > >>>>>>>>>> are accessible or not (wi-fi) it determines what hostname should be, > >>>>>>>> one > >>>>>>>>>> should never being hardwiring it to some value. > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> Satish is just naming the loopback interface. I did this on all my > >>>>>>>> former > >>>>>>>>> Macs. > >>>>>>>> > >>>>>>>> > >>>>>>>> Yes - this doesn't change the hostname. Its just adding an entry for > >>>>>>>> gethostbyname - for current hostname. > >>>>>>>> > >>>>>>>>>>> > >>>>>>>> 127.0.0.1 MarksMac-302.local > >>>>>>>> <<< > >>>>>>>> > >>>>>>>> Sure - its best to not do this when one has a proper IP name [like > >>>>>>>> foo.mcs.anl.gov ] - but its useful when one has a hostname like > >>>>>>>> "MarksMac-302.local" -that is not DNS resolvable > >>>>>>>> > >>>>>>>> Even if the machine is moved to a different network with a different > >>>>>>>> name - the current entry won't cause problems [but will need another entry > >>>>>>>> for the new host name - if this new name is also not DNS resolvable] > >>>>>>>> > >>>>>>>> Its likely this file is a generated file on macos - so might get reset > >>>>>>>> on reboot - or some network change? [if this is the case - the change won't > >>>>>>>> be permanent] > >>>>>>>> > >>>>>>>> > >>>>>>>> Satish > >>>>>>>> > >>>>>>> > >>>>>> > >>>>>> -- > >>>>>> What most experimenters take for granted before they begin their > >>>>>> experiments is infinitely more interesting than any results to which their > >>>>>> experiments lead. 
> >>>>>> -- Norbert Wiener > >>>>>> > >>>>>> https://www.cse.buffalo.edu/~knepley/ > >>>>>> > > >>>>>> > >>>>> > >>>> > >>> > >> > >> > > > > From mfadams at lbl.gov Fri Sep 18 11:13:20 2020 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 18 Sep 2020 12:13:20 -0400 Subject: [petsc-users] osx error In-Reply-To: <20219104-0F78-4423-84F7-013F5522D608@petsc.dev> References: <53E404F6-3CA4-4C67-B968-EE79D3775929@petsc.dev> <20219104-0F78-4423-84F7-013F5522D608@petsc.dev> Message-ID: On Fri, Sep 18, 2020 at 11:08 AM Barry Smith wrote: > > try > > /usr/sbin/traceroute `hostname` > 12:02 adams/plex-noprealloc-fix= ~/Codes/petsc/src/ts/utils/dmplexlandau/tutorials$ /usr/sbin/traceroute `hostname` traceroute to marksmac-302.local (127.0.0.1), 64 hops max, 52 byte packets 1 localhost (127.0.0.1) 0.322 ms 0.057 ms 0.032 ms 12:12 adams/plex-noprealloc-fix= ~/Codes/petsc/src/ts/utils/dmplexlandau/tutorials$ > > > On Sep 18, 2020, at 10:07 AM, Mark Adams wrote: > > Let me know if you want anything else. > Thanks, > Mark > > On Fri, Sep 18, 2020 at 11:05 AM Mark Adams wrote: > >> >> >> On Fri, Sep 18, 2020 at 11:04 AM Satish Balay wrote: >> >>> On Fri, 18 Sep 2020, Satish Balay via petsc-users wrote: >>> >>> > > >> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local >>> > > >> PING marksmac-302.local (127.0.0.1): 56 data bytes >>> > >>> > So it is resolving MarksMac-302.local as 127.0.0.1 - but ping is not >>> responding? >>> > >>> > I know some machines don't respond to external ping [and firewalls can >>> block it] but don't really know if they always respond to internal ping or >>> not. >>> > >>> > If some machines don't respond to internal ping - then we can't use >>> ping test in configure [it will create false negatives - as in this case] >>> >>> BTW: To confirm, please try: >>> >>> ping 127.0.0.1 >>> >> >> >> 11:02 master *= ~/Codes/petsc$ sudo vi /etc/hosts >> 11:02 master *= ~/Codes/petsc$ ping 127.0.0.1 >> PING 127.0.0.1 (127.0.0.1): 56 data bytes >> Request timeout for icmp_seq 0 >> Request timeout for icmp_seq 1 >> Request timeout for icmp_seq 2 >> Request timeout for icmp_seq 3 >> Request timeout for icmp_seq 4 >> Request timeout for icmp_seq 5 >> Request timeout for icmp_seq 6 >> Request timeout for icmp_seq 7 >> Request timeout for icmp_seq 8 >> Request timeout for icmp_seq 9 >> Request timeout for icmp_seq 10 >> Request timeout for icmp_seq 11 >> Request timeout for icmp_seq 12 >> Request timeout for icmp_seq 13 >> Request timeout for icmp_seq 14 >> Request timeout for icmp_seq 15 >> Request timeout for icmp_seq 16 >> Request timeout for icmp_seq 17 >> Request timeout for icmp_seq 18 >> Request timeout for icmp_seq 19 >> Request timeout for icmp_seq 20 >> Request timeout for icmp_seq 21 >> >> still going ...... >> >> >>> >>> Satish >>> >>> > >>> > >>> > Mark, can you remove the line that you added to /etc/hosts - i.e: >>> > >>> > 127.0.0.1 MarksMac-302.local >>> > >>> > And now rerun MPI tests. Do they work or fail? 
>>> > >>> > [this is to check if this test is a false positive on your machine] >>> > >>> > Satish >>> > >>> > >>> > On Fri, 18 Sep 2020, Mark Adams wrote: >>> > >>> > > On Fri, Sep 18, 2020 at 7:51 AM Matthew Knepley >>> wrote: >>> > > >>> > > > On Fri, Sep 18, 2020 at 7:46 AM Mark Adams >>> wrote: >>> > > > >>> > > >> Oh you did not change my hostname: >>> > > >> >>> > > >> 07:37 master *= ~/Codes/petsc$ hostname >>> > > >> MarksMac-302.local >>> > > >> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local >>> > > >> PING marksmac-302.local (127.0.0.1): 56 data bytes >>> > > >> Request timeout for icmp_seq 0 >>> > > >> >>> > > >> --- marksmac-302.local ping statistics --- >>> > > >> 2 packets transmitted, 0 packets received, 100.0% packet loss >>> > > >> 07:42 2 master *= ~/Codes/petsc$ >>> > > >> >>> > > > >>> > > > This does not make sense to me. You have >>> > > > >>> > > > 127.0.0.1 MarksMac-302.local >>> > > > >>> > > > in /etc/hosts, >>> > > > >>> > > >>> > > 09:07 ~/.ssh$ cat /etc/hosts >>> > > ## >>> > > # Host Database >>> > > # >>> > > # localhost is used to configure the loopback interface >>> > > # when the system is booting. Do not change this entry. >>> > > ## >>> > > 127.0.0.1 localhost >>> > > 255.255.255.255 broadcasthost >>> > > 127.0.0.1 MarksMac-5.local >>> > > 127.0.0.1 243.124.240.10.in-addr.arpa.private.cam.ac.uk >>> > > 127.0.0.1 MarksMac-302.local >>> > > 09:07 ~/.ssh$ >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > > but you cannot resolve that name? >>> > > > >>> > > > Matt >>> > > > >>> > > > >>> > > >> BTW, I used to get messages about some network issue and >>> 'changing host >>> > > >> name to MarksMac-[x+1].local'. That is, the original hostname >>> > > >> was MarksMac.local, then I got a message about changing >>> > > >> to MarksMac-1.local, etc. I have not seen these messages for >>> months but >>> > > >> apparently this process has continued unabated. >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> On Thu, Sep 17, 2020 at 11:10 PM Satish Balay via petsc-users < >>> > > >> petsc-users at mcs.anl.gov> wrote: >>> > > >> >>> > > >>> On Thu, 17 Sep 2020, Matthew Knepley wrote: >>> > > >>> >>> > > >>> > On Thu, Sep 17, 2020 at 8:33 PM Barry Smith >>> wrote: >>> > > >>> > >>> > > >>> > > > On Sep 17, 2020, at 4:59 PM, Satish Balay via petsc-users < >>> > > >>> > > petsc-users at mcs.anl.gov> wrote: >>> > > >>> > > > >>> > > >>> > > > Here is a fix: >>> > > >>> > > > >>> > > >>> > > > echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts >>> > > >>> > > >>> > > >>> > > Satish, >>> > > >>> > > >>> > > >>> > > I don't think you want to be doing this on a Mac (on >>> anything?) >>> > > >>> On a >>> > > >>> > > Mac based on the network configuration etc as it boots up >>> and as >>> > > >>> networks >>> > > >>> > > are accessible or not (wi-fi) it determines what hostname >>> should be, >>> > > >>> one >>> > > >>> > > should never being hardwiring it to some value. >>> > > >>> > > >>> > > >>> > >>> > > >>> > Satish is just naming the loopback interface. I did this on >>> all my >>> > > >>> former >>> > > >>> > Macs. >>> > > >>> >>> > > >>> >>> > > >>> Yes - this doesn't change the hostname. Its just adding an entry >>> for >>> > > >>> gethostbyname - for current hostname. 
>>> > > >>> >>> > > >>> >>> >>> > > >>> 127.0.0.1 MarksMac-302.local >>> > > >>> <<< >>> > > >>> >>> > > >>> Sure - its best to not do this when one has a proper IP name >>> [like >>> > > >>> foo.mcs.anl.gov] - but its useful when one has a hostname like >>> > > >>> "MarksMac-302.local" -that is not DNS resolvable >>> > > >>> >>> > > >>> Even if the machine is moved to a different network with a >>> different >>> > > >>> name - the current entry won't cause problems [but will need >>> another entry >>> > > >>> for the new host name - if this new name is also not DNS >>> resolvable] >>> > > >>> >>> > > >>> Its likely this file is a generated file on macos - so might >>> get reset >>> > > >>> on reboot - or some network change? [if this is the case - the >>> change won't >>> > > >>> be permanent] >>> > > >>> >>> > > >>> >>> > > >>> Satish >>> > > >>> >>> > > >> >>> > > > >>> > > > -- >>> > > > What most experimenters take for granted before they begin their >>> > > > experiments is infinitely more interesting than any results to >>> which their >>> > > > experiments lead. >>> > > > -- Norbert Wiener >>> > > > >>> > > > https://www.cse.buffalo.edu/~knepley/ >>> > > > >>> > > > >>> > > >>> > >>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Sep 18 11:13:56 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 18 Sep 2020 11:13:56 -0500 Subject: [petsc-users] osx error In-Reply-To: References: <53E404F6-3CA4-4C67-B968-EE79D3775929@petsc.dev> <20219104-0F78-4423-84F7-013F5522D608@petsc.dev> <1BDE2A5D-2159-48CE-8FED-D16D650AB605@petsc.dev> Message-ID: <142EA98B-DD44-44BD-A3F2-0174C1BD57BF@petsc.dev> $ host `hostname` Host Barrys-MacBook-Pro-3.local not found: 3(NXDOMAIN) ~/Src/petsc/src/snes/tutorials (barry/2020-07-07/docs-no-makefiles *>) arch-docs-no-makefiles $ nslookup `hostname` Server: 10.0.1.1 Address: 10.0.1.1#53 ** server can't find Barrys-MacBook-Pro-3.local: NXDOMAIN ~/Src/petsc/src/snes/tutorials (barry/2020-07-07/docs-no-makefiles *>) arch-docs-no-makefiles $ host `hostname` Host Barrys-MacBook-Pro-3.local not found: 3(NXDOMAIN) ~/Src/petsc/src/snes/tutorials (barry/2020-07-07/docs-no-makefiles *>) arch-docs-no-makefiles $ traceroute `hostname` traceroute: Warning: Barrys-MacBook-Pro-3.local has multiple addresses; using 127.0.0.1 traceroute to barrys-macbook-pro-3.local (127.0.0.1), 64 hops max, 52 byte packets 1 localhost (127.0.0.1) 0.286 ms 0.039 ms 0.028 ms ~/Src/petsc/src/snes/tutorials (barry/2020-07-07/docs-no-makefiles *>) arch-docs-no-makefiles $ /sbin/ping `hostname` PING barrys-macbook-pro-3.local (127.0.0.1): 56 data bytes 64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.031 ms 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.089 ms ^C --- barrys-macbook-pro-3.local ping statistics --- 2 packets transmitted, 2 packets received, 0.0% packet loss round-trip min/avg/max/stddev = 0.031/0.060/0.089/0.029 ms ~/Src/petsc/src/snes/tutorials (barry/2020-07-07/docs-no-makefiles *>) arch-docs-no-makefiles \ $ dig `hostname` ; <<>> DiG 9.10.6 <<>> Barrys-MacBook-Pro-3.local ;; global options: +cmd ;; Got answer: ;; WARNING: .local is reserved for Multicast DNS ;; You are currently testing what happens when an mDNS query is leaked to DNS ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 4681 ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0 ;; QUESTION SECTION: ;Barrys-MacBook-Pro-3.local. 
IN A ;; Query time: 2 msec ;; SERVER: 10.0.1.1#53(10.0.1.1) ;; WHEN: Fri Sep 18 11:09:08 CDT 2020 ;; MSG SIZE rcvd: 44 ~/Src/petsc/src/snes/tutorials (barry/2020-07-07/docs-no-makefiles *>) arch-docs-no-makefiles $ /sbin/ping 10.0.1.1 PING 10.0.1.1 (10.0.1.1): 56 data bytes 64 bytes from 10.0.1.1: icmp_seq=0 ttl=255 time=4.089 ms 64 bytes from 10.0.1.1: icmp_seq=1 ttl=255 time=1.607 ms 64 bytes from 10.0.1.1: icmp_seq=2 ttl=255 time=6.884 ms ^C --- 10.0.1.1 ping statistics --- 3 packets transmitted, 3 packets received, 0.0% packet loss round-trip min/avg/max/stddev = 1.607/4.193/6.884/2.156 ms >> If host doesn't work - how is tracroute able to resolve it? macOS NOTICE The host command does not use the host name and address resolution or the DNS query routing mechanisms used by other processes running on macOS. The results of name or address queries printed by host may differ from those found by other processes that use the macOS native name and address resolution mechanisms. The results of DNS queries may also differ from queries that use the macOS DNS routing library. > On Sep 18, 2020, at 10:48 AM, Satish Balay wrote: > > On Fri, 18 Sep 2020, Barry Smith wrote: > >> >> >>> On Sep 18, 2020, at 10:14 AM, Satish Balay wrote: >>> >>> Its probably better to just run a test with gethostbyname()? >> >> I had hoped to avoid building C code and running it. The Apple manual page for gethostbyname() states: The getaddrinfo(3) and getnameinfo(3) functions are preferred over the gethostbyname(), gethostbyname2(), and >> gethostbyaddr() functions. >> >> >> I do not know what MPICH and OpenMPI use. >> >> On the Mac >> >>> >>> The closest thing I can think off is: >>> >>> >>> I don't know if 'traceroute' or 'host' commands are universally available. >>> >>>>>>>>> >>> balay at sb /home/balay >>> $ host `hostname` >>> sb has address 192.168.0.144 >> >> $ host `hostname` >> Host Barrys-MacBook-Pro-3.local not found: 3(NXDOMAIN) >> >> Also on the Apple `hostname` is associated with multiple addresses and it seems different utilities may use different addresses produced. Some addresses may work, others may not. > > If its bound to multiple adresses nslookup should list all adressed > > What do you get for: > > nslookup `hostname` > traceroute `hostname` > > dig `hostname` > > Satish > >> >> >> I will make one more MR adding traceroute first and if any of the tests succeed continue. If that fails for users then we will likely need to drop the test. >> >> I don't like just using a mpiexec -n 2 test because that can fail for so many reasons it is difficult to provide diagnostics to the users. >> >> Barry >> >> >> >>> balay at sb /home/balay >>> $ echo $? >>> 0 >>> balay at sb /home/balay >>> $ host foobar >>> Host foobar not found: 3(NXDOMAIN) >>> balay at sb /home/balay >>> $ echo $? >>> 1 >>> balay at sb /home/balay >>> $ >>> <<<<<< >>> >>> However - I fear if there are *any* false positives - or false negatives - this test will generate more e-mail than the actual issue [of misbehaving MPI] >>> >>> Satish >>> >>> On Fri, 18 Sep 2020, Barry Smith wrote: >>> >>>> >>>> try >>>> >>>> /usr/sbin/traceroute `hostname` >>>> >>>> >>>>> On Sep 18, 2020, at 10:07 AM, Mark Adams wrote: >>>>> >>>>> Let me know if you want anything else. 
>>>>> Thanks, >>>>> Mark >>>>> >>>>> On Fri, Sep 18, 2020 at 11:05 AM Mark Adams > wrote: >>>>> >>>>> >>>>> On Fri, Sep 18, 2020 at 11:04 AM Satish Balay > wrote: >>>>> On Fri, 18 Sep 2020, Satish Balay via petsc-users wrote: >>>>> >>>>>>>>> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local >>>>>>>>> PING marksmac-302.local (127.0.0.1): 56 data bytes >>>>>> >>>>>> So it is resolving MarksMac-302.local as 127.0.0.1 - but ping is not responding? >>>>>> >>>>>> I know some machines don't respond to external ping [and firewalls can block it] but don't really know if they always respond to internal ping or not. >>>>>> >>>>>> If some machines don't respond to internal ping - then we can't use ping test in configure [it will create false negatives - as in this case] >>>>> >>>>> BTW: To confirm, please try: >>>>> >>>>> ping 127.0.0.1 >>>>> >>>>> >>>>> 11:02 master *= ~/Codes/petsc$ sudo vi /etc/hosts >>>>> 11:02 master *= ~/Codes/petsc$ ping 127.0.0.1 >>>>> PING 127.0.0.1 (127.0.0.1): 56 data bytes >>>>> Request timeout for icmp_seq 0 >>>>> Request timeout for icmp_seq 1 >>>>> Request timeout for icmp_seq 2 >>>>> Request timeout for icmp_seq 3 >>>>> Request timeout for icmp_seq 4 >>>>> Request timeout for icmp_seq 5 >>>>> Request timeout for icmp_seq 6 >>>>> Request timeout for icmp_seq 7 >>>>> Request timeout for icmp_seq 8 >>>>> Request timeout for icmp_seq 9 >>>>> Request timeout for icmp_seq 10 >>>>> Request timeout for icmp_seq 11 >>>>> Request timeout for icmp_seq 12 >>>>> Request timeout for icmp_seq 13 >>>>> Request timeout for icmp_seq 14 >>>>> Request timeout for icmp_seq 15 >>>>> Request timeout for icmp_seq 16 >>>>> Request timeout for icmp_seq 17 >>>>> Request timeout for icmp_seq 18 >>>>> Request timeout for icmp_seq 19 >>>>> Request timeout for icmp_seq 20 >>>>> Request timeout for icmp_seq 21 >>>>> >>>>> still going ...... >>>>> >>>>> >>>>> Satish >>>>> >>>>>> >>>>>> >>>>>> Mark, can you remove the line that you added to /etc/hosts - i.e: >>>>>> >>>>>> 127.0.0.1 MarksMac-302.local >>>>>> >>>>>> And now rerun MPI tests. Do they work or fail? >>>>>> >>>>>> [this is to check if this test is a false positive on your machine] >>>>>> >>>>>> Satish >>>>>> >>>>>> >>>>>> On Fri, 18 Sep 2020, Mark Adams wrote: >>>>>> >>>>>>> On Fri, Sep 18, 2020 at 7:51 AM Matthew Knepley > wrote: >>>>>>> >>>>>>>> On Fri, Sep 18, 2020 at 7:46 AM Mark Adams > wrote: >>>>>>>> >>>>>>>>> Oh you did not change my hostname: >>>>>>>>> >>>>>>>>> 07:37 master *= ~/Codes/petsc$ hostname >>>>>>>>> MarksMac-302.local >>>>>>>>> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local >>>>>>>>> PING marksmac-302.local (127.0.0.1): 56 data bytes >>>>>>>>> Request timeout for icmp_seq 0 >>>>>>>>> >>>>>>>>> --- marksmac-302.local ping statistics --- >>>>>>>>> 2 packets transmitted, 0 packets received, 100.0% packet loss >>>>>>>>> 07:42 2 master *= ~/Codes/petsc$ >>>>>>>>> >>>>>>>> >>>>>>>> This does not make sense to me. You have >>>>>>>> >>>>>>>> 127.0.0.1 MarksMac-302.local >>>>>>>> >>>>>>>> in /etc/hosts, >>>>>>>> >>>>>>> >>>>>>> 09:07 ~/.ssh$ cat /etc/hosts >>>>>>> ## >>>>>>> # Host Database >>>>>>> # >>>>>>> # localhost is used to configure the loopback interface >>>>>>> # when the system is booting. Do not change this entry. 
>>>>>>> ## >>>>>>> 127.0.0.1 localhost >>>>>>> 255.255.255.255 broadcasthost >>>>>>> 127.0.0.1 MarksMac-5.local >>>>>>> 127.0.0.1 243.124.240.10.in-addr.arpa.private.cam.ac.uk >>>>>>> 127.0.0.1 MarksMac-302.local >>>>>>> 09:07 ~/.ssh$ >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> but you cannot resolve that name? >>>>>>>> >>>>>>>> Matt >>>>>>>> >>>>>>>> >>>>>>>>> BTW, I used to get messages about some network issue and 'changing host >>>>>>>>> name to MarksMac-[x+1].local'. That is, the original hostname >>>>>>>>> was MarksMac.local, then I got a message about changing >>>>>>>>> to MarksMac-1.local, etc. I have not seen these messages for months but >>>>>>>>> apparently this process has continued unabated. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Sep 17, 2020 at 11:10 PM Satish Balay via petsc-users < >>>>>>>>> petsc-users at mcs.anl.gov > wrote: >>>>>>>>> >>>>>>>>>> On Thu, 17 Sep 2020, Matthew Knepley wrote: >>>>>>>>>> >>>>>>>>>>> On Thu, Sep 17, 2020 at 8:33 PM Barry Smith > wrote: >>>>>>>>>>> >>>>>>>>>>>>> On Sep 17, 2020, at 4:59 PM, Satish Balay via petsc-users < >>>>>>>>>>>> petsc-users at mcs.anl.gov > wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Here is a fix: >>>>>>>>>>>>> >>>>>>>>>>>>> echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts >>>>>>>>>>>> >>>>>>>>>>>> Satish, >>>>>>>>>>>> >>>>>>>>>>>> I don't think you want to be doing this on a Mac (on anything?) >>>>>>>>>> On a >>>>>>>>>>>> Mac based on the network configuration etc as it boots up and as >>>>>>>>>> networks >>>>>>>>>>>> are accessible or not (wi-fi) it determines what hostname should be, >>>>>>>>>> one >>>>>>>>>>>> should never being hardwiring it to some value. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Satish is just naming the loopback interface. I did this on all my >>>>>>>>>> former >>>>>>>>>>> Macs. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Yes - this doesn't change the hostname. Its just adding an entry for >>>>>>>>>> gethostbyname - for current hostname. >>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>> 127.0.0.1 MarksMac-302.local >>>>>>>>>> <<< >>>>>>>>>> >>>>>>>>>> Sure - its best to not do this when one has a proper IP name [like >>>>>>>>>> foo.mcs.anl.gov ] - but its useful when one has a hostname like >>>>>>>>>> "MarksMac-302.local" -that is not DNS resolvable >>>>>>>>>> >>>>>>>>>> Even if the machine is moved to a different network with a different >>>>>>>>>> name - the current entry won't cause problems [but will need another entry >>>>>>>>>> for the new host name - if this new name is also not DNS resolvable] >>>>>>>>>> >>>>>>>>>> Its likely this file is a generated file on macos - so might get reset >>>>>>>>>> on reboot - or some network change? [if this is the case - the change won't >>>>>>>>>> be permanent] >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Satish >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> What most experimenters take for granted before they begin their >>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>> experiments lead. >>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>> > >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>>> >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mfadams at lbl.gov Fri Sep 18 11:19:00 2020 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 18 Sep 2020 12:19:00 -0400 Subject: [petsc-users] osx error In-Reply-To: References: <53E404F6-3CA4-4C67-B968-EE79D3775929@petsc.dev> <20219104-0F78-4423-84F7-013F5522D608@petsc.dev> Message-ID: On Fri, Sep 18, 2020 at 11:13 AM Jacob Faibussowitsch wrote: > Do you have any anti-virus on? This user had McAfee running which had its > own firewall active: https://discussions.apple.com/thread/6980819 > > Do you have your firewall on in stealth mode? System Preferences > > Firewall > Firewall Options then look for a button ?enable stealth mode? at > the bottom and make sure its unchecked. > It is checked. I'm going to wait on unchecking it and rebooting. Thanks, > > And not to be that guy, have you restarted your machine? Its always worth > a try... > > Best regards, > > Jacob Faibussowitsch > (Jacob Fai - booss - oh - vitch) > Cell: (312) 694-3391 > > On Sep 18, 2020, at 11:08, Barry Smith wrote: > > > try > > /usr/sbin/traceroute `hostname` > > > On Sep 18, 2020, at 10:07 AM, Mark Adams wrote: > > Let me know if you want anything else. > Thanks, > Mark > > On Fri, Sep 18, 2020 at 11:05 AM Mark Adams wrote: > >> >> >> On Fri, Sep 18, 2020 at 11:04 AM Satish Balay wrote: >> >>> On Fri, 18 Sep 2020, Satish Balay via petsc-users wrote: >>> >>> > > >> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local >>> > > >> PING marksmac-302.local (127.0.0.1): 56 data bytes >>> > >>> > So it is resolving MarksMac-302.local as 127.0.0.1 - but ping is not >>> responding? >>> > >>> > I know some machines don't respond to external ping [and firewalls can >>> block it] but don't really know if they always respond to internal ping or >>> not. >>> > >>> > If some machines don't respond to internal ping - then we can't use >>> ping test in configure [it will create false negatives - as in this case] >>> >>> BTW: To confirm, please try: >>> >>> ping 127.0.0.1 >>> >> >> >> 11:02 master *= ~/Codes/petsc$ sudo vi /etc/hosts >> 11:02 master *= ~/Codes/petsc$ ping 127.0.0.1 >> PING 127.0.0.1 (127.0.0.1): 56 data bytes >> Request timeout for icmp_seq 0 >> Request timeout for icmp_seq 1 >> Request timeout for icmp_seq 2 >> Request timeout for icmp_seq 3 >> Request timeout for icmp_seq 4 >> Request timeout for icmp_seq 5 >> Request timeout for icmp_seq 6 >> Request timeout for icmp_seq 7 >> Request timeout for icmp_seq 8 >> Request timeout for icmp_seq 9 >> Request timeout for icmp_seq 10 >> Request timeout for icmp_seq 11 >> Request timeout for icmp_seq 12 >> Request timeout for icmp_seq 13 >> Request timeout for icmp_seq 14 >> Request timeout for icmp_seq 15 >> Request timeout for icmp_seq 16 >> Request timeout for icmp_seq 17 >> Request timeout for icmp_seq 18 >> Request timeout for icmp_seq 19 >> Request timeout for icmp_seq 20 >> Request timeout for icmp_seq 21 >> >> still going ...... >> >> >>> >>> Satish >>> >>> > >>> > >>> > Mark, can you remove the line that you added to /etc/hosts - i.e: >>> > >>> > 127.0.0.1 MarksMac-302.local >>> > >>> > And now rerun MPI tests. Do they work or fail? 
>>> > >>> > [this is to check if this test is a false positive on your machine] >>> > >>> > Satish >>> > >>> > >>> > On Fri, 18 Sep 2020, Mark Adams wrote: >>> > >>> > > On Fri, Sep 18, 2020 at 7:51 AM Matthew Knepley >>> wrote: >>> > > >>> > > > On Fri, Sep 18, 2020 at 7:46 AM Mark Adams >>> wrote: >>> > > > >>> > > >> Oh you did not change my hostname: >>> > > >> >>> > > >> 07:37 master *= ~/Codes/petsc$ hostname >>> > > >> MarksMac-302.local >>> > > >> 07:41 master *= ~/Codes/petsc$ ping -c 2 MarksMac-302.local >>> > > >> PING marksmac-302.local (127.0.0.1): 56 data bytes >>> > > >> Request timeout for icmp_seq 0 >>> > > >> >>> > > >> --- marksmac-302.local ping statistics --- >>> > > >> 2 packets transmitted, 0 packets received, 100.0% packet loss >>> > > >> 07:42 2 master *= ~/Codes/petsc$ >>> > > >> >>> > > > >>> > > > This does not make sense to me. You have >>> > > > >>> > > > 127.0.0.1 MarksMac-302.local >>> > > > >>> > > > in /etc/hosts, >>> > > > >>> > > >>> > > 09:07 ~/.ssh$ cat /etc/hosts >>> > > ## >>> > > # Host Database >>> > > # >>> > > # localhost is used to configure the loopback interface >>> > > # when the system is booting. Do not change this entry. >>> > > ## >>> > > 127.0.0.1 localhost >>> > > 255.255.255.255 broadcasthost >>> > > 127.0.0.1 MarksMac-5.local >>> > > 127.0.0.1 243.124.240.10.in-addr.arpa.private.cam.ac.uk >>> > > 127.0.0.1 MarksMac-302.local >>> > > 09:07 ~/.ssh$ >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > > but you cannot resolve that name? >>> > > > >>> > > > Matt >>> > > > >>> > > > >>> > > >> BTW, I used to get messages about some network issue and >>> 'changing host >>> > > >> name to MarksMac-[x+1].local'. That is, the original hostname >>> > > >> was MarksMac.local, then I got a message about changing >>> > > >> to MarksMac-1.local, etc. I have not seen these messages for >>> months but >>> > > >> apparently this process has continued unabated. >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> On Thu, Sep 17, 2020 at 11:10 PM Satish Balay via petsc-users < >>> > > >> petsc-users at mcs.anl.gov> wrote: >>> > > >> >>> > > >>> On Thu, 17 Sep 2020, Matthew Knepley wrote: >>> > > >>> >>> > > >>> > On Thu, Sep 17, 2020 at 8:33 PM Barry Smith >>> wrote: >>> > > >>> > >>> > > >>> > > > On Sep 17, 2020, at 4:59 PM, Satish Balay via petsc-users < >>> > > >>> > > petsc-users at mcs.anl.gov> wrote: >>> > > >>> > > > >>> > > >>> > > > Here is a fix: >>> > > >>> > > > >>> > > >>> > > > echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts >>> > > >>> > > >>> > > >>> > > Satish, >>> > > >>> > > >>> > > >>> > > I don't think you want to be doing this on a Mac (on >>> anything?) >>> > > >>> On a >>> > > >>> > > Mac based on the network configuration etc as it boots up >>> and as >>> > > >>> networks >>> > > >>> > > are accessible or not (wi-fi) it determines what hostname >>> should be, >>> > > >>> one >>> > > >>> > > should never being hardwiring it to some value. >>> > > >>> > > >>> > > >>> > >>> > > >>> > Satish is just naming the loopback interface. I did this on >>> all my >>> > > >>> former >>> > > >>> > Macs. >>> > > >>> >>> > > >>> >>> > > >>> Yes - this doesn't change the hostname. Its just adding an entry >>> for >>> > > >>> gethostbyname - for current hostname. 
>>> > > >>> >>> > > >>> >>> >>> > > >>> 127.0.0.1 MarksMac-302.local >>> > > >>> <<< >>> > > >>> >>> > > >>> Sure - its best to not do this when one has a proper IP name >>> [like >>> > > >>> foo.mcs.anl.gov] - but its useful when one has a hostname like >>> > > >>> "MarksMac-302.local" -that is not DNS resolvable >>> > > >>> >>> > > >>> Even if the machine is moved to a different network with a >>> different >>> > > >>> name - the current entry won't cause problems [but will need >>> another entry >>> > > >>> for the new host name - if this new name is also not DNS >>> resolvable] >>> > > >>> >>> > > >>> Its likely this file is a generated file on macos - so might >>> get reset >>> > > >>> on reboot - or some network change? [if this is the case - the >>> change won't >>> > > >>> be permanent] >>> > > >>> >>> > > >>> >>> > > >>> Satish >>> > > >>> >>> > > >> >>> > > > >>> > > > -- >>> > > > What most experimenters take for granted before they begin their >>> > > > experiments is infinitely more interesting than any results to >>> which their >>> > > > experiments lead. >>> > > > -- Norbert Wiener >>> > > > >>> > > > https://www.cse.buffalo.edu/~knepley/ >>> > > > >>> > > > >>> > > >>> > >>> >>> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alexey.V.Kozlov.2 at nd.edu Fri Sep 18 21:36:17 2020 From: Alexey.V.Kozlov.2 at nd.edu (Alexey Kozlov) Date: Fri, 18 Sep 2020 22:36:17 -0400 Subject: [petsc-users] Preconditioner for Helmholtz-like problem Message-ID: Dear all, I am solving a convected wave equation in a frequency domain. This equation is a 3D Helmholtz equation with added first-order derivatives and mixed derivatives, and with complex coefficients. The discretized PDE results in a sparse linear system (about 10^6 equations) which is solved in PETSc. I am having difficulty with the code convergence at high frequency, skewed grid, and high Mach number. I suspect it may be due to the preconditioner I use. I am currently using the ILU preconditioner with the number of fill levels 2 or 3, and BCGS or GMRES solvers. I suspect the state of the art has evolved and there are better preconditioners for Helmholtz-like problems. Could you, please, advise me on a better preconditioner? Thanks, Alexey -- Alexey V. Kozlov Research Scientist Department of Aerospace and Mechanical Engineering University of Notre Dame 117 Hessert Center Notre Dame, IN 46556-5684 Phone: (574) 631-4335 Fax: (574) 631-8355 Email: akozlov at nd.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Fri Sep 18 22:52:03 2020 From: jed at jedbrown.org (Jed Brown) Date: Fri, 18 Sep 2020 21:52:03 -0600 Subject: [petsc-users] Preconditioner for Helmholtz-like problem In-Reply-To: References: Message-ID: <87o8m2tod8.fsf@jedbrown.org> Unfortunately, those are hard problems in which the "good" methods are technical and hard to make black-box. There are "sweeping" methods that solve on 2D "slabs" with PML boundary conditions, H-matrix based methods, and fancy multigrid methods. Attempting to solve with STRUMPACK is probably the easiest thing to try (--download-strumpack). https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATSOLVERSSTRUMPACK.html Is the matrix complex symmetric? Note that you can use a direct solver (MUMPS, STRUMPACK, etc.) for a 3D problem like this if you have enough memory. I'm assuming the memory or time is unacceptable and you want an iterative method with much lower setup costs. 
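For reference, a minimal sketch of what trying the STRUMPACK suggestion could look like at runtime, assuming PETSc was configured with --download-strumpack and that the option names below (taken from the solver-type manual pages) still match the installed PETSc version:

    -ksp_type preonly -pc_type lu -pc_factor_mat_solver_type strumpack

Running with -ksp_view afterwards confirms which solver package was actually used for the factorization.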
Alexey Kozlov writes: > Dear all, > > I am solving a convected wave equation in a frequency domain. This equation > is a 3D Helmholtz equation with added first-order derivatives and mixed > derivatives, and with complex coefficients. The discretized PDE results in > a sparse linear system (about 10^6 equations) which is solved in PETSc. I > am having difficulty with the code convergence at high frequency, skewed > grid, and high Mach number. I suspect it may be due to the preconditioner I > use. I am currently using the ILU preconditioner with the number of fill > levels 2 or 3, and BCGS or GMRES solvers. I suspect the state of the art > has evolved and there are better preconditioners for Helmholtz-like > problems. Could you, please, advise me on a better preconditioner? > > Thanks, > Alexey > > -- > Alexey V. Kozlov > > Research Scientist > Department of Aerospace and Mechanical Engineering > University of Notre Dame > > 117 Hessert Center > Notre Dame, IN 46556-5684 > Phone: (574) 631-4335 > Fax: (574) 631-8355 > Email: akozlov at nd.edu From Alexey.V.Kozlov.2 at nd.edu Fri Sep 18 23:28:10 2020 From: Alexey.V.Kozlov.2 at nd.edu (Alexey Kozlov) Date: Sat, 19 Sep 2020 00:28:10 -0400 Subject: [petsc-users] Preconditioner for Helmholtz-like problem In-Reply-To: <87o8m2tod8.fsf@jedbrown.org> References: <87o8m2tod8.fsf@jedbrown.org> Message-ID: Thanks for the tips! My matrix is complex and unsymmetric. My typical test case has of the order of one million equations. I use a 2nd-order finite-difference scheme with 19-point stencil, so my typical test case uses several GB of RAM. On Fri, Sep 18, 2020 at 11:52 PM Jed Brown wrote: > Unfortunately, those are hard problems in which the "good" methods are > technical and hard to make black-box. There are "sweeping" methods that > solve on 2D "slabs" with PML boundary conditions, H-matrix based methods, > and fancy multigrid methods. Attempting to solve with STRUMPACK is > probably the easiest thing to try (--download-strumpack). > > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATSOLVERSSTRUMPACK.html > > Is the matrix complex symmetric? > > Note that you can use a direct solver (MUMPS, STRUMPACK, etc.) for a 3D > problem like this if you have enough memory. I'm assuming the memory or > time is unacceptable and you want an iterative method with much lower setup > costs. > > Alexey Kozlov writes: > > > Dear all, > > > > I am solving a convected wave equation in a frequency domain. This > equation > > is a 3D Helmholtz equation with added first-order derivatives and mixed > > derivatives, and with complex coefficients. The discretized PDE results > in > > a sparse linear system (about 10^6 equations) which is solved in PETSc. I > > am having difficulty with the code convergence at high frequency, skewed > > grid, and high Mach number. I suspect it may be due to the > preconditioner I > > use. I am currently using the ILU preconditioner with the number of fill > > levels 2 or 3, and BCGS or GMRES solvers. I suspect the state of the art > > has evolved and there are better preconditioners for Helmholtz-like > > problems. Could you, please, advise me on a better preconditioner? > > > > Thanks, > > Alexey > > > > -- > > Alexey V. Kozlov > > > > Research Scientist > > Department of Aerospace and Mechanical Engineering > > University of Notre Dame > > > > 117 Hessert Center > > Notre Dame, IN 46556-5684 > > Phone: (574) 631-4335 > > Fax: (574) 631-8355 > > Email: akozlov at nd.edu > -- Alexey V. 
Kozlov Research Scientist Department of Aerospace and Mechanical Engineering University of Notre Dame 117 Hessert Center Notre Dame, IN 46556-5684 Phone: (574) 631-4335 Fax: (574) 631-8355 Email: akozlov at nd.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sat Sep 19 00:41:51 2020 From: bsmith at petsc.dev (Barry Smith) Date: Sat, 19 Sep 2020 00:41:51 -0500 Subject: [petsc-users] Preconditioner for Helmholtz-like problem In-Reply-To: References: <87o8m2tod8.fsf@jedbrown.org> Message-ID: These are small enough that likely sparse direct solvers are the best use of your time and for general efficiency. PETSc supports 3 parallel direct solvers, SuperLU_DIST, MUMPs and Pastix. I recommend configuring PETSc for all three of them and then comparing them for problems of interest to you. --download-superlu_dist --download-mumps --download-pastix --download-scalapack (used by MUMPS) --download-metis --download-parmetis --download-ptscotch Barry > On Sep 18, 2020, at 11:28 PM, Alexey Kozlov wrote: > > Thanks for the tips! My matrix is complex and unsymmetric. My typical test case has of the order of one million equations. I use a 2nd-order finite-difference scheme with 19-point stencil, so my typical test case uses several GB of RAM. > > On Fri, Sep 18, 2020 at 11:52 PM Jed Brown > wrote: > Unfortunately, those are hard problems in which the "good" methods are technical and hard to make black-box. There are "sweeping" methods that solve on 2D "slabs" with PML boundary conditions, H-matrix based methods, and fancy multigrid methods. Attempting to solve with STRUMPACK is probably the easiest thing to try (--download-strumpack). > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATSOLVERSSTRUMPACK.html > > Is the matrix complex symmetric? > > Note that you can use a direct solver (MUMPS, STRUMPACK, etc.) for a 3D problem like this if you have enough memory. I'm assuming the memory or time is unacceptable and you want an iterative method with much lower setup costs. > > Alexey Kozlov > writes: > > > Dear all, > > > > I am solving a convected wave equation in a frequency domain. This equation > > is a 3D Helmholtz equation with added first-order derivatives and mixed > > derivatives, and with complex coefficients. The discretized PDE results in > > a sparse linear system (about 10^6 equations) which is solved in PETSc. I > > am having difficulty with the code convergence at high frequency, skewed > > grid, and high Mach number. I suspect it may be due to the preconditioner I > > use. I am currently using the ILU preconditioner with the number of fill > > levels 2 or 3, and BCGS or GMRES solvers. I suspect the state of the art > > has evolved and there are better preconditioners for Helmholtz-like > > problems. Could you, please, advise me on a better preconditioner? > > > > Thanks, > > Alexey > > > > -- > > Alexey V. Kozlov > > > > Research Scientist > > Department of Aerospace and Mechanical Engineering > > University of Notre Dame > > > > 117 Hessert Center > > Notre Dame, IN 46556-5684 > > Phone: (574) 631-4335 > > Fax: (574) 631-8355 > > Email: akozlov at nd.edu > > > -- > Alexey V. Kozlov > > Research Scientist > Department of Aerospace and Mechanical Engineering > University of Notre Dame > > 117 Hessert Center > Notre Dame, IN 46556-5684 > Phone: (574) 631-4335 > Fax: (574) 631-8355 > Email: akozlov at nd.edu -------------- next part -------------- An HTML attachment was scrubbed... 
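One way to compare the three packages at runtime, without rebuilding, is to switch the factorization backend from the command line; a sketch, assuming the matrix is a standard AIJ/MPIAIJ matrix and the option names match the installed version:

    -ksp_type preonly -pc_type lu -pc_factor_mat_solver_type mumps
    -ksp_type preonly -pc_type lu -pc_factor_mat_solver_type superlu_dist
    -ksp_type preonly -pc_type lu -pc_factor_mat_solver_type pastix

Adding -log_view (and -ksp_view to verify which package was selected) gives factorization and solve timings to compare.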
URL: From Alexey.V.Kozlov.2 at nd.edu Sat Sep 19 01:11:27 2020 From: Alexey.V.Kozlov.2 at nd.edu (Alexey Kozlov) Date: Sat, 19 Sep 2020 02:11:27 -0400 Subject: [petsc-users] Preconditioner for Helmholtz-like problem In-Reply-To: References: <87o8m2tod8.fsf@jedbrown.org> Message-ID: Thanks a lot! I'll check them out. On Sat, Sep 19, 2020 at 1:41 AM Barry Smith wrote: > > These are small enough that likely sparse direct solvers are the best > use of your time and for general efficiency. > > PETSc supports 3 parallel direct solvers, SuperLU_DIST, MUMPs and > Pastix. I recommend configuring PETSc for all three of them and then > comparing them for problems of interest to you. > > --download-superlu_dist --download-mumps --download-pastix > --download-scalapack (used by MUMPS) --download-metis --download-parmetis > --download-ptscotch > > Barry > > > On Sep 18, 2020, at 11:28 PM, Alexey Kozlov > wrote: > > Thanks for the tips! My matrix is complex and unsymmetric. My typical test > case has of the order of one million equations. I use a 2nd-order > finite-difference scheme with 19-point stencil, so my typical test case > uses several GB of RAM. > > On Fri, Sep 18, 2020 at 11:52 PM Jed Brown wrote: > >> Unfortunately, those are hard problems in which the "good" methods are >> technical and hard to make black-box. There are "sweeping" methods that >> solve on 2D "slabs" with PML boundary conditions, H-matrix based methods, >> and fancy multigrid methods. Attempting to solve with STRUMPACK is >> probably the easiest thing to try (--download-strumpack). >> >> >> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATSOLVERSSTRUMPACK.html >> >> Is the matrix complex symmetric? >> >> Note that you can use a direct solver (MUMPS, STRUMPACK, etc.) for a 3D >> problem like this if you have enough memory. I'm assuming the memory or >> time is unacceptable and you want an iterative method with much lower setup >> costs. >> >> Alexey Kozlov writes: >> >> > Dear all, >> > >> > I am solving a convected wave equation in a frequency domain. This >> equation >> > is a 3D Helmholtz equation with added first-order derivatives and mixed >> > derivatives, and with complex coefficients. The discretized PDE results >> in >> > a sparse linear system (about 10^6 equations) which is solved in PETSc. >> I >> > am having difficulty with the code convergence at high frequency, skewed >> > grid, and high Mach number. I suspect it may be due to the >> preconditioner I >> > use. I am currently using the ILU preconditioner with the number of fill >> > levels 2 or 3, and BCGS or GMRES solvers. I suspect the state of the art >> > has evolved and there are better preconditioners for Helmholtz-like >> > problems. Could you, please, advise me on a better preconditioner? >> > >> > Thanks, >> > Alexey >> > >> > -- >> > Alexey V. Kozlov >> > >> > Research Scientist >> > Department of Aerospace and Mechanical Engineering >> > University of Notre Dame >> > >> > 117 Hessert Center >> > Notre Dame, IN 46556-5684 >> > Phone: (574) 631-4335 >> > Fax: (574) 631-8355 >> > Email: akozlov at nd.edu >> > > > -- > Alexey V. Kozlov > > Research Scientist > Department of Aerospace and Mechanical Engineering > University of Notre Dame > > 117 Hessert Center > Notre Dame, IN 46556-5684 > Phone: (574) 631-4335 > Fax: (574) 631-8355 > Email: akozlov at nd.edu > > > -- Alexey V. 
Kozlov Research Scientist Department of Aerospace and Mechanical Engineering University of Notre Dame 117 Hessert Center Notre Dame, IN 46556-5684 Phone: (574) 631-4335 Fax: (574) 631-8355 Email: akozlov at nd.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Sat Sep 19 06:40:11 2020 From: mfadams at lbl.gov (Mark Adams) Date: Sat, 19 Sep 2020 07:40:11 -0400 Subject: [petsc-users] Preconditioner for Helmholtz-like problem In-Reply-To: References: <87o8m2tod8.fsf@jedbrown.org> Message-ID: As Jed said high frequency is hard. AMG, as-is, can be adapted ( https://link.springer.com/article/10.1007/s00466-006-0047-8) with parameters. AMG for convection: use richardson/sor and not chebyshev smoothers and in smoothed aggregation (gamg) don't smooth (-pc_gamg_agg_nsmooths 0). Mark On Sat, Sep 19, 2020 at 2:11 AM Alexey Kozlov wrote: > Thanks a lot! I'll check them out. > > On Sat, Sep 19, 2020 at 1:41 AM Barry Smith wrote: > >> >> These are small enough that likely sparse direct solvers are the best >> use of your time and for general efficiency. >> >> PETSc supports 3 parallel direct solvers, SuperLU_DIST, MUMPs and >> Pastix. I recommend configuring PETSc for all three of them and then >> comparing them for problems of interest to you. >> >> --download-superlu_dist --download-mumps --download-pastix >> --download-scalapack (used by MUMPS) --download-metis --download-parmetis >> --download-ptscotch >> >> Barry >> >> >> On Sep 18, 2020, at 11:28 PM, Alexey Kozlov >> wrote: >> >> Thanks for the tips! My matrix is complex and unsymmetric. My typical >> test case has of the order of one million equations. I use a 2nd-order >> finite-difference scheme with 19-point stencil, so my typical test case >> uses several GB of RAM. >> >> On Fri, Sep 18, 2020 at 11:52 PM Jed Brown wrote: >> >>> Unfortunately, those are hard problems in which the "good" methods are >>> technical and hard to make black-box. There are "sweeping" methods that >>> solve on 2D "slabs" with PML boundary conditions, H-matrix based methods, >>> and fancy multigrid methods. Attempting to solve with STRUMPACK is >>> probably the easiest thing to try (--download-strumpack). >>> >>> >>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATSOLVERSSTRUMPACK.html >>> >>> Is the matrix complex symmetric? >>> >>> Note that you can use a direct solver (MUMPS, STRUMPACK, etc.) for a 3D >>> problem like this if you have enough memory. I'm assuming the memory or >>> time is unacceptable and you want an iterative method with much lower setup >>> costs. >>> >>> Alexey Kozlov writes: >>> >>> > Dear all, >>> > >>> > I am solving a convected wave equation in a frequency domain. This >>> equation >>> > is a 3D Helmholtz equation with added first-order derivatives and mixed >>> > derivatives, and with complex coefficients. The discretized PDE >>> results in >>> > a sparse linear system (about 10^6 equations) which is solved in >>> PETSc. I >>> > am having difficulty with the code convergence at high frequency, >>> skewed >>> > grid, and high Mach number. I suspect it may be due to the >>> preconditioner I >>> > use. I am currently using the ILU preconditioner with the number of >>> fill >>> > levels 2 or 3, and BCGS or GMRES solvers. I suspect the state of the >>> art >>> > has evolved and there are better preconditioners for Helmholtz-like >>> > problems. Could you, please, advise me on a better preconditioner? >>> > >>> > Thanks, >>> > Alexey >>> > >>> > -- >>> > Alexey V. 
Kozlov >>> > >>> > Research Scientist >>> > Department of Aerospace and Mechanical Engineering >>> > University of Notre Dame >>> > >>> > 117 Hessert Center >>> > Notre Dame, IN 46556-5684 >>> > Phone: (574) 631-4335 >>> > Fax: (574) 631-8355 >>> > Email: akozlov at nd.edu >>> >> >> >> -- >> Alexey V. Kozlov >> >> Research Scientist >> Department of Aerospace and Mechanical Engineering >> University of Notre Dame >> >> 117 Hessert Center >> Notre Dame, IN 46556-5684 >> Phone: (574) 631-4335 >> Fax: (574) 631-8355 >> Email: akozlov at nd.edu >> >> >> > > -- > Alexey V. Kozlov > > Research Scientist > Department of Aerospace and Mechanical Engineering > University of Notre Dame > > 117 Hessert Center > Notre Dame, IN 46556-5684 > Phone: (574) 631-4335 > Fax: (574) 631-8355 > Email: akozlov at nd.edu > -------------- next part -------------- An HTML attachment was scrubbed... URL: From luciano.siqueira at usp.br Mon Sep 21 07:51:20 2020 From: luciano.siqueira at usp.br (Luciano Siqueira) Date: Mon, 21 Sep 2020 09:51:20 -0300 Subject: [petsc-users] Motivation for default KSP solver and PC Message-ID: <62241390-9c3c-bdf3-f6e3-e176a1113c39@usp.br> Hi *, I'm experimenting with different combinations of KSP solvers and PCs and I don't know why GMRES/bjacobi are the default choices for CPU and GMRES/icc are the default choices for GPU. Does anyone know the reason for that? Thanks, Luciano. From knepley at gmail.com Mon Sep 21 08:03:18 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 21 Sep 2020 09:03:18 -0400 Subject: [petsc-users] Motivation for default KSP solver and PC In-Reply-To: <62241390-9c3c-bdf3-f6e3-e176a1113c39@usp.br> References: <62241390-9c3c-bdf3-f6e3-e176a1113c39@usp.br> Message-ID: On Mon, Sep 21, 2020 at 8:51 AM Luciano Siqueira wrote: > Hi *, > > I'm experimenting with different combinations of KSP solvers and PCs and > I don't know why GMRES/bjacobi are the default choices for CPU and > GMRES is chosen because it is monotonic in the 2-norm, and also faster than any other Krylov method in terms of iterates if you do not restart. Block-Jacobi is trivially parallel. I don't think anyone loves ILU(0), but it is a black-box preconditioner without much overhead, so it is the default. > GMRES/icc are the default choices for GPU. Does anyone know the reason for > that? > ICC is only a default if you have a symmetric matrix. I guess you had one on the GPU. Thanks, Matt > Thanks, > Luciano. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From wence at gmx.li Mon Sep 21 09:20:54 2020 From: wence at gmx.li (Lawrence Mitchell) Date: Mon, 21 Sep 2020 15:20:54 +0100 Subject: [petsc-users] Motivation for default KSP solver and PC In-Reply-To: References: <62241390-9c3c-bdf3-f6e3-e176a1113c39@usp.br> Message-ID: <081DADB8-1892-4EF6-A452-2B06DA9D10E9@gmx.li> > On 21 Sep 2020, at 14:03, Matthew Knepley wrote: > > On Mon, Sep 21, 2020 at 8:51 AM Luciano Siqueira wrote: > Hi *, > > I'm experimenting with different combinations of KSP solvers and PCs and > I don't know why GMRES/bjacobi are the default choices for CPU and > > GMRES is chosen because it is monotonic in the 2-norm, and also faster than any other > Krylov method in terms of iterates if you do not restart. 
Counterpoint: https://doi.org/10.1137/0613049 Lawrence From knepley at gmail.com Mon Sep 21 09:27:06 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 21 Sep 2020 10:27:06 -0400 Subject: [petsc-users] Motivation for default KSP solver and PC In-Reply-To: <081DADB8-1892-4EF6-A452-2B06DA9D10E9@gmx.li> References: <62241390-9c3c-bdf3-f6e3-e176a1113c39@usp.br> <081DADB8-1892-4EF6-A452-2B06DA9D10E9@gmx.li> Message-ID: On Mon, Sep 21, 2020 at 10:20 AM Lawrence Mitchell wrote: > > On 21 Sep 2020, at 14:03, Matthew Knepley wrote: > > > > On Mon, Sep 21, 2020 at 8:51 AM Luciano Siqueira < > luciano.siqueira at usp.br> wrote: > > Hi *, > > > > I'm experimenting with different combinations of KSP solvers and PCs and > > I don't know why GMRES/bjacobi are the default choices for CPU and > > > > GMRES is chosen because it is monotonic in the 2-norm, and also faster > than any other > > Krylov method in terms of iterates if you do not restart. > > Counterpoint: https://doi.org/10.1137/0613049 I was trying to be careful with "Krylov". What I meant by that is a method that forms the solution from the Krylov space {r, Ar, A^2r, ..., A^kr} The other methods in the article use A^T as well. I don't think we can generically call them Lanczos, but maybe we need a word for methods going outside of the Krylov space above. Matt > > Lawrence > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Mon Sep 21 10:27:20 2020 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 21 Sep 2020 11:27:20 -0400 Subject: [petsc-users] osx error In-Reply-To: References: <53E404F6-3CA4-4C67-B968-EE79D3775929@petsc.dev> <20219104-0F78-4423-84F7-013F5522D608@petsc.dev> Message-ID: What is the status of this issue? I am now seeing this on SUMMIT. UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): ------------------------------------------------------------------------------- Exception: Your hostname will not work with MPI, perhaps you have VPN running whose network settings may not play well with MPI or your network is misconfigured ******************************************************************************* -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 587148 bytes Desc: not available URL: From bsmith at petsc.dev Mon Sep 21 11:00:46 2020 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 21 Sep 2020 11:00:46 -0500 Subject: [petsc-users] Motivation for default KSP solver and PC In-Reply-To: <62241390-9c3c-bdf3-f6e3-e176a1113c39@usp.br> References: <62241390-9c3c-bdf3-f6e3-e176a1113c39@usp.br> Message-ID: <760282AE-498F-493C-9C5C-98294193A6D3@petsc.dev> > On Sep 21, 2020, at 7:51 AM, Luciano Siqueira wrote: > > Hi *, > > I'm experimenting with different combinations of KSP solvers and PCs and I don't know why GMRES/bjacobi are the default choices for CPU and GMRES/icc are the default choices for GPU. Does anyone know the reason for that? > The intention is that it is the same default, and shouldn't depend on CPU/GPU. 
It is GMRES + block Jacobi (with one block per MPI rank; if there is only 1 MPI rank the block Jacobi drops out) + incomplete Cholesky or LU depending on if the matrix is marked by the users as symmetric. Please let us know if we have any error. Barry > Thanks, > Luciano. From bsmith at petsc.dev Mon Sep 21 11:01:58 2020 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 21 Sep 2020 11:01:58 -0500 Subject: [petsc-users] osx error In-Reply-To: References: <53E404F6-3CA4-4C67-B968-EE79D3775929@petsc.dev> <20219104-0F78-4423-84F7-013F5522D608@petsc.dev> Message-ID: <05C85EBE-65BF-4F51-B04B-1834AB9ABBD5@petsc.dev> Mark, I will fix it today. Barry > On Sep 21, 2020, at 10:27 AM, Mark Adams wrote: > > What is the status of this issue? > I am now seeing this on SUMMIT. > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): > ------------------------------------------------------------------------------- > Exception: Your hostname will not work with MPI, perhaps you have VPN running whose network settings may not play well with MPI or your network is misconfigured > ******************************************************************************* > From mfadams at lbl.gov Tue Sep 22 16:27:39 2020 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 22 Sep 2020 17:27:39 -0400 Subject: [petsc-users] error configuring on SUMMIT Message-ID: I am getting this error on SUMMIT. Junchao said something about using too many threads. The compiler seems to run out of memory. I know about MAKE_NP for make all. Not sure what to do during configuration. Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 1792760 bytes Desc: not available URL: From junchao.zhang at gmail.com Tue Sep 22 16:50:30 2020 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Tue, 22 Sep 2020 16:50:30 -0500 Subject: [petsc-users] error configuring on SUMMIT In-Reply-To: References: Message-ID: configure --with-make-np=8 --Junchao Zhang On Tue, Sep 22, 2020 at 4:27 PM Mark Adams wrote: > I am getting this error on SUMMIT. Junchao said something about using too > many threads. The compiler seems to run out of memory. > > I know about MAKE_NP for make all. Not sure what to do during > configuration. > > Mark > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Sep 22 18:09:16 2020 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 22 Sep 2020 18:09:16 -0500 Subject: [petsc-users] error configuring on SUMMIT In-Reply-To: References: Message-ID: <026379CD-E160-437F-9667-395F68607B7A@petsc.dev> The kokkos kernels package file should somehow pass this value down to the kokkos kernels installation. Barry > On Sep 22, 2020, at 4:50 PM, Junchao Zhang wrote: > > configure --with-make-np=8 > > --Junchao Zhang > > > On Tue, Sep 22, 2020 at 4:27 PM Mark Adams > wrote: > I am getting this error on SUMMIT. Junchao said something about using too many threads. The compiler seems to run out of memory. > > I know about MAKE_NP for make all. Not sure what to do during configuration. > > Mark -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mandhapati.raju at convergecfd.com Tue Sep 22 21:24:12 2020 From: mandhapati.raju at convergecfd.com (Raju Mandhapati) Date: Tue, 22 Sep 2020 21:24:12 -0500 Subject: [petsc-users] AMG for block matrices Message-ID: Hello, Does Petsc have an option to solve block matrices using AMG solver/preconditioner. My block matrix is coming from solving u,v,w,p (coupled Navier stokes) at each cell. thanks Raju. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Sep 22 22:23:15 2020 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 22 Sep 2020 22:23:15 -0500 Subject: [petsc-users] AMG for block matrices In-Reply-To: References: Message-ID: Raju, If you call MatSetBlockSize(mat,4) then GAMG will keep the 4 variables together through all the levels. You might consider using the point block Jacobi smoother with it instead of the default Jacobi smoother -mg_levels_pc_type pbjacobi There is also PCPATCH which supports Vanka smoothing (though it is to well hidden) PCPatchSetConstructType(pc,PC_PATCH_VANKA) -pc_patch_construct_type vanka For coupled Navier stokes you might also consider using PCFIELDSPLIT and GAMG just for the pressure solve inside it. Barry > On Sep 22, 2020, at 9:24 PM, Raju Mandhapati via petsc-users wrote: > > Hello, > > Does Petsc have an option to solve block matrices using AMG solver/preconditioner. My block matrix is coming from solving u,v,w,p (coupled Navier stokes) at each cell. > > thanks > Raju. From mfadams at lbl.gov Wed Sep 23 08:26:40 2020 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 23 Sep 2020 09:26:40 -0400 Subject: [petsc-users] AMG for block matrices In-Reply-To: References: Message-ID: Generally you do want to use FieldSplit but Vanka might work. I'm not sure what to use as the smoother KSP. -mg_levels_ksp_type [gmres | richardson]. If you use richardson you will probably want to fiddle with the damping. On Tue, Sep 22, 2020 at 11:23 PM Barry Smith wrote: > > Raju, > > If you call MatSetBlockSize(mat,4) then GAMG will keep the 4 variables > together through all the levels. > > You might consider using the point block Jacobi smoother with it > instead of the default Jacobi smoother -mg_levels_pc_type pbjacobi > > There is also PCPATCH which supports Vanka smoothing (though it is to > well hidden) PCPatchSetConstructType(pc,PC_PATCH_VANKA) > -pc_patch_construct_type vanka > > > For coupled Navier stokes you might also consider using PCFIELDSPLIT > and GAMG just for the pressure solve inside it. > > Barry > > > On Sep 22, 2020, at 9:24 PM, Raju Mandhapati via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > Hello, > > > > Does Petsc have an option to solve block matrices using AMG > solver/preconditioner. My block matrix is coming from solving u,v,w,p > (coupled Navier stokes) at each cell. > > > > thanks > > Raju. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Sep 23 08:49:14 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 23 Sep 2020 09:49:14 -0400 Subject: [petsc-users] AMG for block matrices In-Reply-To: References: Message-ID: On Wed, Sep 23, 2020 at 9:26 AM Mark Adams wrote: > Generally you do want to use FieldSplit but Vanka might work. > Just to be clear, these suggestions apply to _incompressible_ Navier-Stokes. Thanks, Matt > I'm not sure what to use as the smoother KSP. -mg_levels_ksp_type [gmres | > richardson]. If you use richardson you will probably want to fiddle with > the damping. 
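A rough sketch of how these pieces might fit together for the coupled u,v,w,p system, assuming a standard MPIAIJ matrix A assembled with 4 unknowns per cell and a KSP named ksp (both are placeholder names, not from this thread):

    MatSetBlockSize(A, 4);        /* lets GAMG keep u,v,w,p together on all levels */
    KSPSetOperators(ksp, A, A);
    KSPSetFromOptions(ksp);       /* picks up the smoother/preconditioner choices at runtime */

combined with command-line options along the lines of

    -pc_type gamg -mg_levels_ksp_type richardson -mg_levels_ksp_richardson_scale 0.8 -mg_levels_pc_type pbjacobi

where 0.8 is only a starting value for the damping mentioned above; using -mg_levels_ksp_type gmres instead avoids tuning the scale, at the cost of a somewhat more expensive smoother.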
> > > > On Tue, Sep 22, 2020 at 11:23 PM Barry Smith wrote: > >> >> Raju, >> >> If you call MatSetBlockSize(mat,4) then GAMG will keep the 4 >> variables together through all the levels. >> >> You might consider using the point block Jacobi smoother with it >> instead of the default Jacobi smoother -mg_levels_pc_type pbjacobi >> >> There is also PCPATCH which supports Vanka smoothing (though it is to >> well hidden) PCPatchSetConstructType(pc,PC_PATCH_VANKA) >> -pc_patch_construct_type vanka >> >> >> For coupled Navier stokes you might also consider using PCFIELDSPLIT >> and GAMG just for the pressure solve inside it. >> >> Barry >> >> > On Sep 22, 2020, at 9:24 PM, Raju Mandhapati via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> > >> > Hello, >> > >> > Does Petsc have an option to solve block matrices using AMG >> solver/preconditioner. My block matrix is coming from solving u,v,w,p >> (coupled Navier stokes) at each cell. >> > >> > thanks >> > Raju. >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.jamond at cea.fr Wed Sep 23 11:56:27 2020 From: olivier.jamond at cea.fr (Olivier Jamond) Date: Wed, 23 Sep 2020 18:56:27 +0200 Subject: [petsc-users] Compute C*Ct using MatCreateTranspose for Ct Message-ID: <4c071605-83fe-cb4d-866e-0a4fd1a6c542@cea.fr> Hi all, I have a rectangle MPIAIJ matrix C. I would like to compute C*Ct explicitly to solve some ksp(C*Ct) problems using direct solvers or with petsc preconditioners. If I just create a 'shell' Ct: Mat Ct; MatCreateTranspose(C, &Ct); MatMatMult(C, Ct, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &CCt); Then it does not work. I get: [0]PETSC ERROR: No support for this operation for this object type [0]PETSC ERROR: MatMatTransposeMult not supported for A of type mpiaij But if instead of C it was Ct that I have explicitly, then it would work: Mat Ct, CC; MatTranspose(C, MAT_INITIAL_MATRIX, &Ct); MatCreateTranspose(Ct, &CC); // create a 'shell' C from explicit Ct to test MatMatMult(CC, Ct, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &CCt);// this is OK Is there a way to avoid the explicit transposition of the matrix? Many thanks, Olivier Jamond -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlohry at gmail.com Wed Sep 23 13:04:23 2020 From: mlohry at gmail.com (Mark Lohry) Date: Wed, 23 Sep 2020 14:04:23 -0400 Subject: [petsc-users] error vector for TSARKIMEX Message-ID: Is there a mechanism to get the error estimate with a fully implicit IMEX method without using adaptive time stepping? TSGetTimeError always returns 0, but I guess that's only implemented for GLEE methods. I'm trying to assess time accuracy for constant steps but varying the SDIRK type. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Wed Sep 23 13:23:51 2020 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Wed, 23 Sep 2020 18:23:51 +0000 Subject: [petsc-users] Compute C*Ct using MatCreateTranspose for Ct In-Reply-To: <4c071605-83fe-cb4d-866e-0a4fd1a6c542@cea.fr> References: <4c071605-83fe-cb4d-866e-0a4fd1a6c542@cea.fr> Message-ID: You can use MatMatTransposeMult(). 
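For reference, the suggested call for forming C*Ct explicitly would look roughly like the following, assuming C is the rectangular matrix from the original message (note that, as the follow-ups further down the thread report, this product may not be supported for MPIAIJ matrices in older PETSc versions):

    Mat CCt;
    MatMatTransposeMult(C, C, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &CCt);   /* CCt = C * C^T */

The same matrix is passed twice and the transpose of the second argument is applied internally, so no explicit transposed copy of C is required.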
Hong ________________________________ From: petsc-users on behalf of Olivier Jamond Sent: Wednesday, September 23, 2020 11:56 AM To: PETSc Subject: [petsc-users] Compute C*Ct using MatCreateTranspose for Ct Hi all, I have a rectangle MPIAIJ matrix C. I would like to compute C*Ct explicitly to solve some ksp(C*Ct) problems using direct solvers or with petsc preconditioners. If I just create a 'shell' Ct: Mat Ct; MatCreateTranspose(C, &Ct); MatMatMult(C, Ct, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &CCt); Then it does not work. I get: [0]PETSC ERROR: No support for this operation for this object type [0]PETSC ERROR: MatMatTransposeMult not supported for A of type mpiaij But if instead of C it was Ct that I have explicitly, then it would work: Mat Ct, CC; MatTranspose(C, MAT_INITIAL_MATRIX, &Ct); MatCreateTranspose(Ct, &CC); // create a 'shell' C from explicit Ct to test MatMatMult(CC, Ct, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &CCt); // this is OK Is there a way to avoid the explicit transposition of the matrix? Many thanks, Olivier Jamond -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Sep 23 14:17:04 2020 From: jed at jedbrown.org (Jed Brown) Date: Wed, 23 Sep 2020 13:17:04 -0600 Subject: [petsc-users] error vector for TSARKIMEX In-Reply-To: References: Message-ID: <87eemsz4jz.fsf@jedbrown.org> It is intended that you can implement TSAdapt to change scheme within a family. It'd be typical to use TSEvaluateWLTE or TSEvaluateStep/TSErrorWeightedNorm. Would those do what you want? Mark Lohry writes: > Is there a mechanism to get the error estimate with a fully implicit IMEX > method without using adaptive time stepping? TSGetTimeError always returns > 0, but I guess that's only implemented for GLEE methods. > > I'm trying to assess time accuracy for constant steps but varying the SDIRK > type. From gautham3 at illinois.edu Wed Sep 23 15:11:49 2020 From: gautham3 at illinois.edu (Krishnan, Gautham) Date: Wed, 23 Sep 2020 20:11:49 +0000 Subject: [petsc-users] Regarding help with MatGetSubMatrix parallel use Message-ID: Hello, For a CFD code being developed with FORTRAN and MPI, I am using PETSC matrices and for a particular operation, I require to extract a submatrix(n-1 x n-1) of a matrix created (n x n). However using the petsc MatGetSubMatrix works for serial runs but fails when the domain is split up over PEs- I suspect the indexing changed for parallel runs and hence the global indexing that worked for serial case just shuffles around matrix entries in parallel undesirably. I would like to ask whether anybody could offer some guidance regarding this. I would like to note that the 2D domain is split along both x and y axes for parallel runs on multiple PEs. Regards, Gautham Krishnan, -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Sep 23 15:55:47 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 23 Sep 2020 16:55:47 -0400 Subject: [petsc-users] Regarding help with MatGetSubMatrix parallel use In-Reply-To: References: Message-ID: On Wed, Sep 23, 2020 at 4:12 PM Krishnan, Gautham wrote: > Hello, > > For a CFD code being developed with FORTRAN and MPI, I am using PETSC > matrices and for a particular operation, I require to extract a > submatrix(n-1 x n-1) of a matrix created (n x n). 
However using the petsc > MatGetSubMatrix works for serial runs but fails when the domain is split up > over PEs- I suspect the indexing changed for parallel runs and hence the > global indexing that worked for serial case just shuffles around matrix > entries in parallel undesirably. I would like to ask whether anybody could > offer some guidance regarding this. I would like to note that the 2D > domain is split along both x and y axes for parallel runs on multiple PEs. > In parallel, you pass MatGetSubmatrix() the global indices that you want on your process. Thanks, Matt > Regards, > Gautham Krishnan, > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mandhapati.raju at convergecfd.com Wed Sep 23 18:18:41 2020 From: mandhapati.raju at convergecfd.com (Raju Mandhapati) Date: Wed, 23 Sep 2020 18:18:41 -0500 Subject: [petsc-users] AMG for block matrices In-Reply-To: References: Message-ID: Hello, Before jumping into block AMG, I started testing AMG on the pressure equation. I find that for simple systems, both serial and parallel version of AMG worked well. For a difficult system (variable coefficient laplacian operator), serial worked but AMG struggled when using 4 cores. I was using AMG as preconditioner to BiCGSTAB. Regular ILU0 preconditioner worked well both in serial and parallel. I am not sure why AMG struggled. Are there any tuning parameters for AMG? Does Petsc have SOR/Gauss Seidel/ILU0 smoother options? Or can I provide my own user smoother? Are there options to choose between unsmoothed aggregation vs smoothed aggregation? thanks, Raju. On Tue, Sep 22, 2020 at 10:23 PM Barry Smith wrote: > > Raju, > > If you call MatSetBlockSize(mat,4) then GAMG will keep the 4 variables > together through all the levels. > > You might consider using the point block Jacobi smoother with it > instead of the default Jacobi smoother -mg_levels_pc_type pbjacobi > > There is also PCPATCH which supports Vanka smoothing (though it is to > well hidden) PCPatchSetConstructType(pc,PC_PATCH_VANKA) > -pc_patch_construct_type vanka > > > For coupled Navier stokes you might also consider using PCFIELDSPLIT > and GAMG just for the pressure solve inside it. > > Barry > > > On Sep 22, 2020, at 9:24 PM, Raju Mandhapati via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > Hello, > > > > Does Petsc have an option to solve block matrices using AMG > solver/preconditioner. My block matrix is coming from solving u,v,w,p > (coupled Navier stokes) at each cell. > > > > thanks > > Raju. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Sep 23 18:32:41 2020 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 23 Sep 2020 18:32:41 -0500 Subject: [petsc-users] AMG for block matrices In-Reply-To: References: Message-ID: <37FEC5AF-B11A-42FA-A033-F450AB988BC5@petsc.dev> > On Sep 23, 2020, at 6:18 PM, Raju Mandhapati via petsc-users wrote: > > Hello, > > Before jumping into block AMG, I started testing AMG on the pressure equation. > > I find that for simple systems, both serial and parallel version of AMG worked well. For a difficult system (variable coefficient laplacian operator), serial worked but AMG struggled when using 4 cores. I was using AMG as preconditioner to BiCGSTAB. 
Regular ILU0 preconditioner worked well both in serial and parallel. I am not sure why AMG struggled. > > Are there any tuning parameters for AMG? > > Does Petsc have SOR/Gauss Seidel/ILU0 smoother options? Or can I provide my own user smoother? Run with -ksp_view to see what it is using. Run with -help | grep mg_ and | grep gamg to see the options, there are too many to list here. You can also look at the manual page for PCMG and PCGAMG You can use any preconditioner/Krylov method combination as a smoother including your own (PCCreateShell()). By default the smoother is Chebyshev Jacobi (where it runs CG or GMRES initially to get eigenvalue estimates). We generally use GMRES with GAMG because when it works well it requires only a handful of Krylov iterations and when a handful is used GMRES is probably more efficient than BiCGSTAB. > > Are there options to choose between unsmoothed aggregation vs smoothed aggregation? I think so, check with -help. > > thanks, > Raju. > > > > > > On Tue, Sep 22, 2020 at 10:23 PM Barry Smith > wrote: > > Raju, > > If you call MatSetBlockSize(mat,4) then GAMG will keep the 4 variables together through all the levels. > > You might consider using the point block Jacobi smoother with it instead of the default Jacobi smoother -mg_levels_pc_type pbjacobi > > There is also PCPATCH which supports Vanka smoothing (though it is to well hidden) PCPatchSetConstructType(pc,PC_PATCH_VANKA) -pc_patch_construct_type vanka > > > For coupled Navier stokes you might also consider using PCFIELDSPLIT and GAMG just for the pressure solve inside it. > > Barry > > > On Sep 22, 2020, at 9:24 PM, Raju Mandhapati via petsc-users > wrote: > > > > Hello, > > > > Does Petsc have an option to solve block matrices using AMG solver/preconditioner. My block matrix is coming from solving u,v,w,p (coupled Navier stokes) at each cell. > > > > thanks > > Raju. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlohry at gmail.com Wed Sep 23 19:38:17 2020 From: mlohry at gmail.com (Mark Lohry) Date: Wed, 23 Sep 2020 20:38:17 -0400 Subject: [petsc-users] error vector for TSARKIMEX In-Reply-To: <87eemsz4jz.fsf@jedbrown.org> References: <87eemsz4jz.fsf@jedbrown.org> Message-ID: perfect, thanks! On Wed, Sep 23, 2020 at 3:17 PM Jed Brown wrote: > It is intended that you can implement TSAdapt to change scheme within a > family. > > It'd be typical to use TSEvaluateWLTE or > TSEvaluateStep/TSErrorWeightedNorm. Would those do what you want? > > Mark Lohry writes: > > > Is there a mechanism to get the error estimate with a fully implicit IMEX > > method without using adaptive time stepping? TSGetTimeError always > returns > > 0, but I guess that's only implemented for GLEE methods. > > > > I'm trying to assess time accuracy for constant steps but varying the > SDIRK > > type. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.jamond at cea.fr Thu Sep 24 03:34:04 2020 From: olivier.jamond at cea.fr (Olivier Jamond) Date: Thu, 24 Sep 2020 10:34:04 +0200 Subject: [petsc-users] Compute C*Ct using MatCreateTranspose for Ct In-Reply-To: References: <4c071605-83fe-cb4d-866e-0a4fd1a6c542@cea.fr> Message-ID: <7f54c857-b198-5798-4dde-7d6c9876b4bd@cea.fr> But it seems that MatMatTransposeMult does not work for MPIAIJ matrices: [0]PETSC ERROR: MatMatTransposeMult not supported for A of type mpiaij On 23/09/2020 20:23, Zhang, Hong wrote: > You can use?MatMatTransposeMult(). 
> Hong > ------------------------------------------------------------------------ > *From:* petsc-users on behalf of > Olivier Jamond > *Sent:* Wednesday, September 23, 2020 11:56 AM > *To:* PETSc > *Subject:* [petsc-users] Compute C*Ct using MatCreateTranspose for Ct > > Hi all, > > I have a rectangle MPIAIJ matrix C. I would like to compute C*Ct > explicitly to solve some ksp(C*Ct) problems using direct solvers or > with petsc preconditioners. > > If I just create a 'shell' Ct: > > Mat Ct; > MatCreateTranspose(C, &Ct); > MatMatMult(C, Ct, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &CCt); > > Then it does not work. I get: > > [0]PETSC ERROR: No support for this operation for this object type > > [0]PETSC ERROR: MatMatTransposeMult not supported for A of type mpiaij > > But if instead of C it was Ct that I have explicitly, then it would work: > > Mat Ct, CC; > MatTranspose(C, MAT_INITIAL_MATRIX, &Ct); > MatCreateTranspose(Ct, &CC); // create a 'shell' C from explicit Ct to > test > MatMatMult(CC, Ct, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &CCt);// this is OK > > Is there a way to avoid the explicit transposition of the matrix? > > Many thanks, > Olivier Jamond > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Sep 24 05:47:55 2020 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 24 Sep 2020 06:47:55 -0400 Subject: [petsc-users] Compute C*Ct using MatCreateTranspose for Ct In-Reply-To: <4c071605-83fe-cb4d-866e-0a4fd1a6c542@cea.fr> References: <4c071605-83fe-cb4d-866e-0a4fd1a6c542@cea.fr> Message-ID: > > > Is there a way to avoid the explicit transposition of the matrix? > It does not look like we have A*B^T for mpiaij as the error message says. I am not finding it in the code. Note, MatMatMult with a transpose shell matrix, I suspect that it does an explicit transpose internally, or it could notice that you have C^T*C and we might have that implemented in-place (I doubt it, but it would be legal and fine to do). > Many thanks, > Olivier Jamond > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.jamond at cea.fr Thu Sep 24 05:53:59 2020 From: olivier.jamond at cea.fr (Olivier Jamond) Date: Thu, 24 Sep 2020 12:53:59 +0200 Subject: [petsc-users] Compute C*Ct using MatCreateTranspose for Ct In-Reply-To: References: <4c071605-83fe-cb4d-866e-0a4fd1a6c542@cea.fr> Message-ID: <205c177a-0acb-7b41-4cec-5996495d1eab@cea.fr> Ahah, maybe I am using a too old version. I use the 3.12.3. I will try with a more recent one. On 24/09/2020 12:47, Mark Adams wrote: > > > Is there a way to avoid the explicit transposition of the matrix? > > > It does not look like we have A*B^T for mpiaij as the?error message > says. I am not finding it in the code. > > Note, MatMatMult with a transpose shell matrix, I suspect that it does > an explicit transpose internally, or it could?notice that you have > C^T*C and we might have that implemented in-place (I doubt it, but it > would be legal and fine to do). > > Many thanks, > Olivier Jamond > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Sep 24 06:12:06 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 24 Sep 2020 07:12:06 -0400 Subject: [petsc-users] Compute C*Ct using MatCreateTranspose for Ct In-Reply-To: References: <4c071605-83fe-cb4d-866e-0a4fd1a6c542@cea.fr> Message-ID: On Thu, Sep 24, 2020 at 6:48 AM Mark Adams wrote: > >> Is there a way to avoid the explicit transposition of the matrix? 
>> > > It does not look like we have A*B^T for mpiaij as the error message says. > I am not finding it in the code. > > Note, MatMatMult with a transpose shell matrix, I suspect that it does an > explicit transpose internally, or it could notice that you have C^T*C and > we might have that implemented in-place (I doubt it, but it would be legal > and fine to do). > We definitely have https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatPtAP.html For now, you can put the identity in for A. It would be nice it we assumed that when A = NULL. Patrick, the implementation strategy is broken for the MatProduct mechanism that was just introduced, so we cannot see which things are implemented in the documentation. How would I go about fixing it? Thanks, Matt > Many thanks, >> Olivier Jamond >> >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris at resfrac.com Thu Sep 24 08:46:43 2020 From: chris at resfrac.com (Chris Hewson) Date: Thu, 24 Sep 2020 07:46:43 -0600 Subject: [petsc-users] Tough to reproduce petsctablefind error In-Reply-To: References: <0AC37384-BC37-4A6C-857D-41CD507F84C2@petsc.dev> Message-ID: After about a month of not having this issue pop up, it has come up again We have been struggling with a similar PETSc Error for awhile now, the error message is as follows: [7]PETSC ERROR: PetscTableFind() line 132 in /home/chewson/petsc-3.13.3/include/petscctable.h key 24443 is greater than largest key allowed 23988 It is a particularly nasty bug as it doesn't reproduce itself when debugging and doesn't happen all the time with the same inputs either. The problem occurs after a long runtime of the code (12-40 hours) and we are using a ksp solve with KSPBCGS. The PETSc compilation options that are used are: --download-metis --download-mpich --download-mumps --download-parmetis --download-ptscotch --download-scalapack --download-suitesparse --prefix=/opt/anl/petsc-3.13.3 --with-debugging=0 --with-mpi=1 COPTFLAGS=-O3 -march=haswell -mtune=haswell CXXOPTFLAGS=-O3 -march=haswell -mtune=haswell FOPTFLAGS=-O3 -march=haswell -mtune=haswell This is being run across 8 processes and is failing consistently on the rank 7 process. We also use openmp outside of PETSC and the linear solve portion of the code. The rank 0 process is always using compute, during this the slave processes use an MPI_Wait call to wait for the collective parts of the code to be called again. All PETSC calls are done across all of the processes. We are using mpich version 3.3.2, downloaded with the petsc 3.13.3 package. At every PETSC call we are checking the error return from the function collectively to ensure that no errors have been returned from PETSC. Some possible causes that I can think of are as follows: 1. Memory leak causing a corruption either in our program or in petsc or with one of the petsc objects. This seems unlikely as we have checked runs with the option -malloc_dump for PETSc and using valgrind. 2. Optimization flags set for petsc compilation are causing variables that go out of scope to be optimized out. 3. We are giving the wrong number of elements for a process or the value is changing during the simulation. This seems unlikely as there is nothing overly unique about these simulations and it's not reproducing itself. 4. 
An MPI channel or socket error causing an error in the collective values for PETSc. Any input on this issue would be greatly appreciated. *Chris Hewson* Senior Reservoir Simulation Engineer ResFrac +1.587.575.9792 On Thu, Aug 13, 2020 at 4:21 PM Junchao Zhang wrote: > That is a great idea. I'll figure it out. > --Junchao Zhang > > > On Thu, Aug 13, 2020 at 4:31 PM Barry Smith wrote: > >> >> Junchao, >> >> Any way in the PETSc configure to warn that MPICH version is "bad" or >> "untrustworthy" or even the vague "we have suspicians about this version >> and recommend avoiding it"? A lot of time could be saved if others don't >> deal with the same problem. >> >> Maybe add arrays of suspect_versions for OpenMPI, MPICH, etc and >> always check against that list and print a boxed warning at configure time? >> Better you could somehow generalize it and put it in package.py for use by >> all packages, then any package can included lists of "suspect" versions. >> (There are definitely HDF5 versions that should be avoided :-)). >> >> Barry >> >> >> On Aug 13, 2020, at 12:14 PM, Junchao Zhang >> wrote: >> >> Thanks for the update. Let's assume it is a bug in MPI :) >> --Junchao Zhang >> >> >> On Thu, Aug 13, 2020 at 11:15 AM Chris Hewson wrote: >> >>> Just as an update to this, I can confirm that using the mpich version >>> (3.3.2) downloaded with the petsc download solved this issue on my end. >>> >>> *Chris Hewson* >>> Senior Reservoir Simulation Engineer >>> ResFrac >>> +1.587.575.9792 >>> >>> >>> On Thu, Jul 23, 2020 at 5:58 PM Junchao Zhang >>> wrote: >>> >>>> On Mon, Jul 20, 2020 at 7:05 AM Barry Smith wrote: >>>> >>>>> >>>>> Is there a comprehensive MPI test suite (perhaps from MPICH)? Is >>>>> there any way to run this full test suite under the problematic MPI and see >>>>> if it detects any problems? >>>>> >>>>> Is so, could someone add it to the FAQ in the debugging section? >>>>> >>>> MPICH does have a test suite. It is at the subdir test/mpi of >>>> downloaded mpich >>>> . It >>>> annoyed me since it is not user-friendly. It might be helpful in catching >>>> bugs at very small scale. But say if I want to test allreduce on 1024 ranks >>>> on 100 doubles, I have to hack the test suite. >>>> Anyway, the instructions are here. >>>> >>>> For the purpose of petsc, under test/mpi one can configure it with >>>> $./configure CC=mpicc CXX=mpicxx FC=mpifort --enable-strictmpi >>>> --enable-threads=funneled --enable-fortran=f77,f90 --enable-fast >>>> --disable-spawn --disable-cxx --disable-ft-tests // It is weird I disabled >>>> cxx but I had to set CXX! >>>> $make -k -j8 // -k is to keep going and ignore compilation errors, >>>> e.g., when building tests for MPICH extensions not in MPI standard, but >>>> your MPI is OpenMPI. >>>> $ // edit testlist, remove lines mpi_t, rma, f77, impls. Those are >>>> sub-dirs containing tests for MPI routines Petsc does not rely on. >>>> $ make testings or directly './runtests -tests=testlist' >>>> >>>> On a batch system, >>>> $export MPITEST_BATCHDIR=`pwd`/btest // specify a batch dir, say >>>> btest, >>>> $./runtests -batch -mpiexec=mpirun -np=1024 -tests=testlist // Use >>>> 1024 ranks if a test does no specify the number of processes. >>>> $ // It copies test binaries to the batch dir and generates a >>>> script runtests.batch there. Edit the script to fit your batch system and >>>> then submit a job and wait for its finish. 
>>>> $ cd btest && ../checktests --ignorebogus >>>> >>>> >>>> PS: Fande, changing an MPI fixed your problem does not necessarily mean >>>> the old MPI has bugs. It is complicated. It could be a petsc bug. You need >>>> to provide us a code to reproduce your error. It does not matter if the >>>> code is big. >>>> >>>> >>>>> Thanks >>>>> >>>>> Barry >>>>> >>>>> >>>>> On Jul 20, 2020, at 12:16 AM, Fande Kong wrote: >>>>> >>>>> Trace could look like this: >>>>> >>>>> [640]PETSC ERROR: --------------------- Error Message >>>>> -------------------------------------------------------------- >>>>> [640]PETSC ERROR: Argument out of range >>>>> [640]PETSC ERROR: key 45226154 is greater than largest key allowed >>>>> 740521 >>>>> [640]PETSC ERROR: See >>>>> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>>> shooting. >>>>> [640]PETSC ERROR: Petsc Release Version 3.13.3, unknown >>>>> [640]PETSC ERROR: ../../griffin-opt on a arch-moose named r6i5n18 by >>>>> wangy2 Sun Jul 19 17:14:28 2020 >>>>> [640]PETSC ERROR: Configure options --download-hypre=1 >>>>> --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1 >>>>> --download-metis=1 --download-ptscotch=1 --download-parmetis=1 >>>>> --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 >>>>> --download-slepc=1 --with-mpi=1 --with-cxx-dialect=C++11 >>>>> --with-fortran-bindings=0 --with-sowing=0 --with-64-bit-indices >>>>> --download-mumps=0 >>>>> [640]PETSC ERROR: #1 PetscTableFind() line 132 in >>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/include/petscctable.h >>>>> [640]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() line 33 in >>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mmaij.c >>>>> [640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in >>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>> [640]PETSC ERROR: #4 MatAssemblyEnd() line 5347 in >>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>> [640]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIXAIJ_allatonce() line >>>>> 901 in >>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiptap.c >>>>> [640]PETSC ERROR: #6 MatPtAPNumeric_MPIAIJ_MPIMAIJ_allatonce() line >>>>> 3180 in >>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/maij/maij.c >>>>> [640]PETSC ERROR: #7 MatProductNumeric_PtAP() line 704 in >>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>>> [640]PETSC ERROR: #8 MatProductNumeric() line 759 in >>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>>> [640]PETSC ERROR: #9 MatPtAP() line 9199 in >>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>> [640]PETSC ERROR: #10 MatGalerkin() line 10236 in >>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>> [640]PETSC ERROR: #11 PCSetUp_MG() line 745 in >>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/mg/mg.c >>>>> [640]PETSC ERROR: #12 PCSetUp_HMG() line 220 in >>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/hmg/hmg.c >>>>> [640]PETSC ERROR: #13 PCSetUp() line 898 in >>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/interface/precon.c >>>>> [640]PETSC ERROR: #14 KSPSetUp() line 376 in >>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>> [640]PETSC ERROR: #15 KSPSolve_Private() line 633 in >>>>> 
/home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>> [640]PETSC ERROR: #16 KSPSolve() line 853 in >>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>> [640]PETSC ERROR: #17 SNESSolve_NEWTONLS() line 225 in >>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/impls/ls/ls.c >>>>> [640]PETSC ERROR: #18 SNESSolve() line 4519 in >>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/interface/snes.c >>>>> >>>>> On Sun, Jul 19, 2020 at 11:13 PM Fande Kong >>>>> wrote: >>>>> >>>>>> I am not entirely sure what is happening, but we encountered similar >>>>>> issues recently. It was not reproducible. It might occur at different >>>>>> stages, and errors could be weird other than "ctable stuff." Our code was >>>>>> Valgrind clean since every PR in moose needs to go through rigorous >>>>>> Valgrind checks before it reaches the devel branch. The errors happened >>>>>> when we used mvapich. >>>>>> >>>>>> We changed to use HPE-MPT (a vendor stalled MPI), then everything was >>>>>> smooth. May you try a different MPI? It is better to try a system carried >>>>>> one. >>>>>> >>>>>> We did not get the bottom of this problem yet, but we at least know >>>>>> this is kind of MPI-related. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Fande, >>>>>> >>>>>> >>>>>> On Sun, Jul 19, 2020 at 3:28 PM Chris Hewson >>>>>> wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I am having a bug that is occurring in PETSC with the return string: >>>>>>> >>>>>>> [7]PETSC ERROR: PetscTableFind() line 132 in >>>>>>> /home/chewson/petsc-3.13.2/include/petscctable.h key 7556 is greater than >>>>>>> largest key allowed 5693 >>>>>>> >>>>>>> This is using petsc-3.13.2, compiled and running using mpich with >>>>>>> -O3 and debugging turned off tuned to the haswell architecture and >>>>>>> occurring either before or during a KSPBCGS solve/setup or during a MUMPS >>>>>>> factorization solve (I haven't been able to replicate this issue with the >>>>>>> same set of instructions etc.). >>>>>>> >>>>>>> This is a terrible way to ask a question, I know, and not very >>>>>>> helpful from your side, but this is what I have from a user's run and can't >>>>>>> reproduce on my end (either with the optimization compilation or with >>>>>>> debugging turned on). This happens when the code has run for quite some >>>>>>> time and is happening somewhat rarely. >>>>>>> >>>>>>> More than likely I am using a static variable (code is written in >>>>>>> c++) that I'm not updating when the matrix size is changing or something >>>>>>> silly like that, but any help or guidance on this would be appreciated. >>>>>>> >>>>>>> *Chris Hewson* >>>>>>> Senior Reservoir Simulation Engineer >>>>>>> ResFrac >>>>>>> +1.587.575.9792 >>>>>>> >>>>>> >>>>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Thu Sep 24 08:56:07 2020 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Thu, 24 Sep 2020 13:56:07 +0000 Subject: [petsc-users] Compute C*Ct using MatCreateTranspose for Ct In-Reply-To: References: <4c071605-83fe-cb4d-866e-0a4fd1a6c542@cea.fr> , Message-ID: Indeed, we do not have MatCreateTranspose for mpaij matrix. I can adding such support. How soon do you need it? 
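(Once CCt is available explicitly, for instance via the MatTranspose()/MatMatMult() sketch earlier
in the thread, the ksp(C*Ct) solves mentioned at the top can be set up in the usual way. A sketch,
in which b and x stand for the right-hand side and solution vectors and the MUMPS options are only
one possible direct-solver choice:

    KSP ksp;
    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, CCt, CCt);
    KSPSetFromOptions(ksp);   /* e.g. -ksp_type preonly -pc_type lu -pc_factor_mat_solver_type mumps */
    KSPSolve(ksp, b, x);

Any of the PETSc preconditioners can be selected the same way through the options database.)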
Hong ________________________________ From: petsc-users on behalf of Matthew Knepley Sent: Thursday, September 24, 2020 6:12 AM To: Mark Adams Cc: PETSc Subject: Re: [petsc-users] Compute C*Ct using MatCreateTranspose for Ct On Thu, Sep 24, 2020 at 6:48 AM Mark Adams > wrote: Is there a way to avoid the explicit transposition of the matrix? It does not look like we have A*B^T for mpiaij as the error message says. I am not finding it in the code. Note, MatMatMult with a transpose shell matrix, I suspect that it does an explicit transpose internally, or it could notice that you have C^T*C and we might have that implemented in-place (I doubt it, but it would be legal and fine to do). We definitely have https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatPtAP.html For now, you can put the identity in for A. It would be nice it we assumed that when A = NULL. Patrick, the implementation strategy is broken for the MatProduct mechanism that was just introduced, so we cannot see which things are implemented in the documentation. How would I go about fixing it? Thanks, Matt Many thanks, Olivier Jamond -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Thu Sep 24 09:16:46 2020 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Thu, 24 Sep 2020 14:16:46 +0000 Subject: [petsc-users] Compute C*Ct using MatCreateTranspose for Ct In-Reply-To: References: <4c071605-83fe-cb4d-866e-0a4fd1a6c542@cea.fr> , , Message-ID: Olivier and Matt, MatPtAP with A=I gives Pt*P, not P*Pt. We have sequential MatRARt and MatMatTransposeMult(), but no support for mpiaij matrices. The problem is that we do not have a way to implement C*Ct without explicitly transpose C in parallel. We support MatTransposeMatMult (A*Bt) for mpiaij. Can you use this instead? Hong ________________________________ From: petsc-users on behalf of Zhang, Hong via petsc-users Sent: Thursday, September 24, 2020 8:56 AM To: Matthew Knepley ; Mark Adams Cc: PETSc Subject: Re: [petsc-users] Compute C*Ct using MatCreateTranspose for Ct Indeed, we do not have MatCreateTranspose for mpaij matrix. I can adding such support. How soon do you need it? Hong ________________________________ From: petsc-users on behalf of Matthew Knepley Sent: Thursday, September 24, 2020 6:12 AM To: Mark Adams Cc: PETSc Subject: Re: [petsc-users] Compute C*Ct using MatCreateTranspose for Ct On Thu, Sep 24, 2020 at 6:48 AM Mark Adams > wrote: Is there a way to avoid the explicit transposition of the matrix? It does not look like we have A*B^T for mpiaij as the error message says. I am not finding it in the code. Note, MatMatMult with a transpose shell matrix, I suspect that it does an explicit transpose internally, or it could notice that you have C^T*C and we might have that implemented in-place (I doubt it, but it would be legal and fine to do). We definitely have https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatPtAP.html For now, you can put the identity in for A. It would be nice it we assumed that when A = NULL. Patrick, the implementation strategy is broken for the MatProduct mechanism that was just introduced, so we cannot see which things are implemented in the documentation. How would I go about fixing it? 
Thanks, Matt Many thanks, Olivier Jamond -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From zhangc20 at rpi.edu Thu Sep 24 10:08:30 2020 From: zhangc20 at rpi.edu (Zhang, Chonglin) Date: Thu, 24 Sep 2020 15:08:30 +0000 Subject: [petsc-users] Proper GPU usage in PETSc Message-ID: Dear PETSc Users, I have some questions regarding the proper GPU usage. I would like to know the proper way to: (1) solve linear equation in SNES, using GPU in PETSc; what syntax/arguments should I be using; (2) how to avoid/reduce the ?CpuToGpu count? and ?GpuToCpu count? data transfer showed in PETSc log file, when using CUDA aware MPI. Details of what I am doing now and my observations are below: System and compilers used: (1) RPI?s AiMOS computer (node wise, it is the same as Summit); (2) using GCC 7.4.0 and Spectrum-MPI 10.3. I am doing the followings to solve the linear Poisson equation with SNES interface, under DMPlex: (1) using DMPlex to set up the unstructured mesh; (2) using DM to create vector and matrix; (3) using SNES interface to solve the linear Poisson equation, with ?-snes_type ksponly?; (4) using ?dm_vec_type cuda?, ?dm_mat_type aijcusparse ? to use GPU vector and matrix, as suggested in this webpage: https://www.mcs.anl.gov/petsc/features/gpus.html (5) using ?use_gpu_aware_mpi? with PETSc, and using `mpirun -gpu` to enable GPU-Direct ( similar as "srun --smpiargs=?-gpu?" for Summit): https://secure.cci.rpi.edu/wiki/Slurm/#gpu-direct; https://www.olcf.ornl.gov/wp-content/uploads/2018/11/multi-gpu-workshop.pdf (6) using ?-options_left? to check and make sure all the arguments are accepted and used by PETSc. (7) After problem setup, I am running the ?SNESSolve()? multiple times to solve the linear problem and observe the log file with ?-log_view" I noticed that if I run ?SNESSolve()? 500 times, instead of 50 times, the ?CpuToGpu count? and/or ?GpuToCpu count? increased roughly 10 times for some of the operations: SNESSolve, MatSOR, VecMDot, VecCUDACopyTo, VecCUDACopyFrom, MatCUSPARSCopyTo. 
See below for a truncated log corresponding to running SNESSolve() 500 times: Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F --------------------------------------------------------------------------------------------------------------------------------------------------------------- --- Event Stage 0: Main Stage BuildTwoSided 510 1.0 4.9205e-03 1.1 0.00e+00 0.0 3.5e+01 4.0e+00 1.0e+03 0 0 0 0 0 0 0 21 0 0 0 0 0 0.00e+00 0 0.00e+00 0 BuildTwoSidedF 501 1.0 1.0199e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+03 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 SNESSolve 500 1.0 3.2570e+02 1.0 1.18e+10 1.0 0.0e+00 0.0e+00 8.7e+05100100 0 0100 100100 0 0100 144 202 31947 7.82e+02 63363 1.44e+03 82 SNESSetUp 1 1.0 6.0082e-04 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 SNESFunctionEval 500 1.0 3.9826e+01 1.0 3.60e+08 1.0 0.0e+00 0.0e+00 5.0e+02 12 3 0 0 0 12 3 0 0 0 36 13 0 0.00e+00 1000 2.48e+01 0 SNESJacobianEval 500 1.0 4.8200e+01 1.0 5.97e+08 1.0 0.0e+00 0.0e+00 2.0e+03 15 5 0 0 0 15 5 0 0 0 50 0 1000 7.77e+01 500 1.24e+01 0 DMPlexResidualFE 500 1.0 3.6923e+01 1.1 3.56e+08 1.0 0.0e+00 0.0e+00 0.0e+00 10 3 0 0 0 10 3 0 0 0 39 0 0 0.00e+00 500 1.24e+01 0 DMPlexJacobianFE 500 1.0 4.6013e+01 1.0 5.95e+08 1.0 0.0e+00 0.0e+00 2.0e+03 14 5 0 0 0 14 5 0 0 0 52 0 1000 7.77e+01 0 0.00e+00 0 MatSOR 30947 1.0 3.1254e+00 1.1 1.21e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 10 0 0 0 1 10 0 0 0 1542 0 0 0.00e+00 61863 1.41e+03 0 MatAssemblyBegin 511 1.0 5.3428e+00256.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+03 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 MatAssemblyEnd 511 1.0 4.3440e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.1e+01 0 0 0 0 0 0 0 0 0 0 0 0 1002 7.80e+01 0 0.00e+00 0 MatCUSPARSCopyTo 1002 1.0 3.6557e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 1002 7.80e+01 0 0.00e+00 0 VecMDot 29930 1.0 3.7843e+01 1.0 2.62e+09 1.0 0.0e+00 0.0e+00 6.0e+04 12 22 0 0 7 12 22 0 0 7 277 3236 29930 6.81e+02 0 0.00e+00 100 VecNorm 31447 1.0 2.1164e+01 1.4 1.79e+08 1.0 0.0e+00 0.0e+00 6.3e+04 5 2 0 0 7 5 2 0 0 7 34 55 1017 2.31e+01 0 0.00e+00 100 VecNormalize 30947 1.0 2.3957e+01 1.1 2.65e+08 1.0 0.0e+00 0.0e+00 6.2e+04 7 2 0 0 7 7 2 0 0 7 44 51 1017 2.31e+01 0 0.00e+00 100 VecCUDACopyTo 30947 1.0 7.8866e+00 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 0 30947 7.04e+02 0 0.00e+00 0 VecCUDACopyFrom 63363 1.0 1.0873e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 63363 1.44e+03 0 KSPSetUp 500 1.0 2.2737e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 KSPSolve 500 1.0 2.3687e+02 1.0 1.08e+10 1.0 0.0e+00 0.0e+00 8.6e+05 72 92 0 0 99 73 92 0 0 99 182 202 30947 7.04e+02 61863 1.41e+03 89 KSPGMRESOrthog 29930 1.0 1.8920e+02 1.0 7.87e+09 1.0 0.0e+00 0.0e+00 6.4e+05 58 67 0 0 74 58 67 0 0 74 166 209 29930 6.81e+02 0 0.00e+00 100 PCApply 30947 1.0 3.1555e+00 1.1 1.21e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 10 0 0 0 1 10 0 0 0 1527 0 0 0.00e+00 61863 1.41e+03 0 Thanks! Chonglin -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From junchao.zhang at gmail.com Thu Sep 24 11:03:28 2020 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 24 Sep 2020 11:03:28 -0500 Subject: [petsc-users] Tough to reproduce petsctablefind error In-Reply-To: References: <0AC37384-BC37-4A6C-857D-41CD507F84C2@petsc.dev> Message-ID: On Thu, Sep 24, 2020 at 8:47 AM Chris Hewson wrote: > After about a month of not having this issue pop up, it has come up again > > We have been struggling with a similar PETSc Error for awhile now, the > error message is as follows: > > [7]PETSC ERROR: PetscTableFind() line 132 in > /home/chewson/petsc-3.13.3/include/petscctable.h key 24443 is greater than > largest key allowed 23988 > > It is a particularly nasty bug as it doesn't reproduce itself when > debugging and doesn't happen all the time with the same inputs either. The > problem occurs after a long runtime of the code (12-40 hours) and we are > using a ksp solve with KSPBCGS. > By "when debugging", did you mean when configure petsc --with-debugging=1 COPTFLAGS=-O0 -g etc or when you attached a debugger? > > The PETSc compilation options that are used are: > > --download-metis > --download-mpich > --download-mumps > --download-parmetis > --download-ptscotch > --download-scalapack > --download-suitesparse > --prefix=/opt/anl/petsc-3.13.3 > --with-debugging=0 > --with-mpi=1 > COPTFLAGS=-O3 -march=haswell -mtune=haswell > CXXOPTFLAGS=-O3 -march=haswell -mtune=haswell > FOPTFLAGS=-O3 -march=haswell -mtune=haswell > > This is being run across 8 processes and is failing consistently on the > rank 7 process. We also use openmp outside of PETSC and the linear solve > portion of the code. The rank 0 process is always using compute, during > this the slave processes use an MPI_Wait call to wait for the collective > parts of the code to be called again. All PETSC calls are done across all > of the processes. > > We are using mpich version 3.3.2, downloaded with the petsc 3.13.3 package. > > At every PETSC call we are checking the error return from the function > collectively to ensure that no errors have been returned from PETSC. > > Some possible causes that I can think of are as follows: > 1. Memory leak causing a corruption either in our program or in petsc or > with one of the petsc objects. This seems unlikely as we have checked runs > with the option -malloc_dump for PETSc and using valgrind. > > 2. Optimization flags set for petsc compilation are causing variables that > go out of scope to be optimized out. > > 3. We are giving the wrong number of elements for a process or the value > is changing during the simulation. This seems unlikely as there is nothing > overly unique about these simulations and it's not reproducing itself. > > 4. An MPI channel or socket error causing an error in the collective > values for PETSc. > > Any input on this issue would be greatly appreciated. > 1) Try OpenMPI (probably won't help, but worth trying) 2) Find which part of the simulation makes it non-deterministic. Is it the mesh partitioning (parmetis)? Then try to make it deterministic. 3) Dump matrices, vectors, etc and see when it fails, you can quickly reproduce the error by reading in the intermediate data. > > *Chris Hewson* > Senior Reservoir Simulation Engineer > ResFrac > +1.587.575.9792 > > > On Thu, Aug 13, 2020 at 4:21 PM Junchao Zhang > wrote: > >> That is a great idea. I'll figure it out. 
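(As a concrete version of suggestion 3) above, a sketch: just before the solve that sometimes
fails, dump the assembled operator and right-hand side to a PETSc binary file so the exact state
can be replayed in a small standalone driver. Here A and b stand for the matrix and vector handed
to the solver, and the file name is only a placeholder:

    PetscViewer viewer;
    PetscViewerBinaryOpen(PETSC_COMM_WORLD, "failing_solve.bin", FILE_MODE_WRITE, &viewer);
    MatView(A, viewer);          /* operator given to KSPSetOperators() */
    VecView(b, viewer);          /* right-hand side                     */
    PetscViewerDestroy(&viewer);

and in the reproducer:

    PetscViewerBinaryOpen(PETSC_COMM_WORLD, "failing_solve.bin", FILE_MODE_READ, &viewer);
    MatCreate(PETSC_COMM_WORLD, &A);
    MatLoad(A, viewer);
    VecCreate(PETSC_COMM_WORLD, &b);
    VecLoad(b, viewer);
    PetscViewerDestroy(&viewer);

Reloading with the same number of ranks makes it much faster to check whether the bad key comes
from the assembled matrix itself or from something earlier in the run.)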
>> --Junchao Zhang >> >> >> On Thu, Aug 13, 2020 at 4:31 PM Barry Smith wrote: >> >>> >>> Junchao, >>> >>> Any way in the PETSc configure to warn that MPICH version is "bad" >>> or "untrustworthy" or even the vague "we have suspicians about this version >>> and recommend avoiding it"? A lot of time could be saved if others don't >>> deal with the same problem. >>> >>> Maybe add arrays of suspect_versions for OpenMPI, MPICH, etc and >>> always check against that list and print a boxed warning at configure time? >>> Better you could somehow generalize it and put it in package.py for use by >>> all packages, then any package can included lists of "suspect" versions. >>> (There are definitely HDF5 versions that should be avoided :-)). >>> >>> Barry >>> >>> >>> On Aug 13, 2020, at 12:14 PM, Junchao Zhang >>> wrote: >>> >>> Thanks for the update. Let's assume it is a bug in MPI :) >>> --Junchao Zhang >>> >>> >>> On Thu, Aug 13, 2020 at 11:15 AM Chris Hewson wrote: >>> >>>> Just as an update to this, I can confirm that using the mpich version >>>> (3.3.2) downloaded with the petsc download solved this issue on my end. >>>> >>>> *Chris Hewson* >>>> Senior Reservoir Simulation Engineer >>>> ResFrac >>>> +1.587.575.9792 >>>> >>>> >>>> On Thu, Jul 23, 2020 at 5:58 PM Junchao Zhang >>>> wrote: >>>> >>>>> On Mon, Jul 20, 2020 at 7:05 AM Barry Smith wrote: >>>>> >>>>>> >>>>>> Is there a comprehensive MPI test suite (perhaps from MPICH)? Is >>>>>> there any way to run this full test suite under the problematic MPI and see >>>>>> if it detects any problems? >>>>>> >>>>>> Is so, could someone add it to the FAQ in the debugging section? >>>>>> >>>>> MPICH does have a test suite. It is at the subdir test/mpi of >>>>> downloaded mpich >>>>> . It >>>>> annoyed me since it is not user-friendly. It might be helpful in catching >>>>> bugs at very small scale. But say if I want to test allreduce on 1024 ranks >>>>> on 100 doubles, I have to hack the test suite. >>>>> Anyway, the instructions are here. >>>>> >>>>> For the purpose of petsc, under test/mpi one can configure it with >>>>> $./configure CC=mpicc CXX=mpicxx FC=mpifort --enable-strictmpi >>>>> --enable-threads=funneled --enable-fortran=f77,f90 --enable-fast >>>>> --disable-spawn --disable-cxx --disable-ft-tests // It is weird I disabled >>>>> cxx but I had to set CXX! >>>>> $make -k -j8 // -k is to keep going and ignore compilation errors, >>>>> e.g., when building tests for MPICH extensions not in MPI standard, but >>>>> your MPI is OpenMPI. >>>>> $ // edit testlist, remove lines mpi_t, rma, f77, impls. Those are >>>>> sub-dirs containing tests for MPI routines Petsc does not rely on. >>>>> $ make testings or directly './runtests -tests=testlist' >>>>> >>>>> On a batch system, >>>>> $export MPITEST_BATCHDIR=`pwd`/btest // specify a batch dir, say >>>>> btest, >>>>> $./runtests -batch -mpiexec=mpirun -np=1024 -tests=testlist // Use >>>>> 1024 ranks if a test does no specify the number of processes. >>>>> $ // It copies test binaries to the batch dir and generates a >>>>> script runtests.batch there. Edit the script to fit your batch system and >>>>> then submit a job and wait for its finish. >>>>> $ cd btest && ../checktests --ignorebogus >>>>> >>>>> >>>>> PS: Fande, changing an MPI fixed your problem does not >>>>> necessarily mean the old MPI has bugs. It is complicated. It could be a >>>>> petsc bug. You need to provide us a code to reproduce your error. It does >>>>> not matter if the code is big. 
>>>>> >>>>> >>>>>> Thanks >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> On Jul 20, 2020, at 12:16 AM, Fande Kong wrote: >>>>>> >>>>>> Trace could look like this: >>>>>> >>>>>> [640]PETSC ERROR: --------------------- Error Message >>>>>> -------------------------------------------------------------- >>>>>> [640]PETSC ERROR: Argument out of range >>>>>> [640]PETSC ERROR: key 45226154 is greater than largest key allowed >>>>>> 740521 >>>>>> [640]PETSC ERROR: See >>>>>> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>>>> shooting. >>>>>> [640]PETSC ERROR: Petsc Release Version 3.13.3, unknown >>>>>> [640]PETSC ERROR: ../../griffin-opt on a arch-moose named r6i5n18 by >>>>>> wangy2 Sun Jul 19 17:14:28 2020 >>>>>> [640]PETSC ERROR: Configure options --download-hypre=1 >>>>>> --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1 >>>>>> --download-metis=1 --download-ptscotch=1 --download-parmetis=1 >>>>>> --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 >>>>>> --download-slepc=1 --with-mpi=1 --with-cxx-dialect=C++11 >>>>>> --with-fortran-bindings=0 --with-sowing=0 --with-64-bit-indices >>>>>> --download-mumps=0 >>>>>> [640]PETSC ERROR: #1 PetscTableFind() line 132 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/include/petscctable.h >>>>>> [640]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() line 33 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mmaij.c >>>>>> [640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>>> [640]PETSC ERROR: #4 MatAssemblyEnd() line 5347 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>> [640]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIXAIJ_allatonce() line >>>>>> 901 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiptap.c >>>>>> [640]PETSC ERROR: #6 MatPtAPNumeric_MPIAIJ_MPIMAIJ_allatonce() line >>>>>> 3180 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/maij/maij.c >>>>>> [640]PETSC ERROR: #7 MatProductNumeric_PtAP() line 704 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>>>> [640]PETSC ERROR: #8 MatProductNumeric() line 759 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>>>> [640]PETSC ERROR: #9 MatPtAP() line 9199 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>> [640]PETSC ERROR: #10 MatGalerkin() line 10236 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>> [640]PETSC ERROR: #11 PCSetUp_MG() line 745 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/mg/mg.c >>>>>> [640]PETSC ERROR: #12 PCSetUp_HMG() line 220 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/hmg/hmg.c >>>>>> [640]PETSC ERROR: #13 PCSetUp() line 898 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/interface/precon.c >>>>>> [640]PETSC ERROR: #14 KSPSetUp() line 376 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>> [640]PETSC ERROR: #15 KSPSolve_Private() line 633 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>> [640]PETSC ERROR: #16 KSPSolve() line 853 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>> [640]PETSC ERROR: #17 SNESSolve_NEWTONLS() line 225 in 
>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/impls/ls/ls.c >>>>>> [640]PETSC ERROR: #18 SNESSolve() line 4519 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/interface/snes.c >>>>>> >>>>>> On Sun, Jul 19, 2020 at 11:13 PM Fande Kong >>>>>> wrote: >>>>>> >>>>>>> I am not entirely sure what is happening, but we encountered similar >>>>>>> issues recently. It was not reproducible. It might occur at different >>>>>>> stages, and errors could be weird other than "ctable stuff." Our code was >>>>>>> Valgrind clean since every PR in moose needs to go through rigorous >>>>>>> Valgrind checks before it reaches the devel branch. The errors happened >>>>>>> when we used mvapich. >>>>>>> >>>>>>> We changed to use HPE-MPT (a vendor stalled MPI), then everything >>>>>>> was smooth. May you try a different MPI? It is better to try a system >>>>>>> carried one. >>>>>>> >>>>>>> We did not get the bottom of this problem yet, but we at least know >>>>>>> this is kind of MPI-related. >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Fande, >>>>>>> >>>>>>> >>>>>>> On Sun, Jul 19, 2020 at 3:28 PM Chris Hewson >>>>>>> wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I am having a bug that is occurring in PETSC with the return string: >>>>>>>> >>>>>>>> [7]PETSC ERROR: PetscTableFind() line 132 in >>>>>>>> /home/chewson/petsc-3.13.2/include/petscctable.h key 7556 is greater than >>>>>>>> largest key allowed 5693 >>>>>>>> >>>>>>>> This is using petsc-3.13.2, compiled and running using mpich with >>>>>>>> -O3 and debugging turned off tuned to the haswell architecture and >>>>>>>> occurring either before or during a KSPBCGS solve/setup or during a MUMPS >>>>>>>> factorization solve (I haven't been able to replicate this issue with the >>>>>>>> same set of instructions etc.). >>>>>>>> >>>>>>>> This is a terrible way to ask a question, I know, and not very >>>>>>>> helpful from your side, but this is what I have from a user's run and can't >>>>>>>> reproduce on my end (either with the optimization compilation or with >>>>>>>> debugging turned on). This happens when the code has run for quite some >>>>>>>> time and is happening somewhat rarely. >>>>>>>> >>>>>>>> More than likely I am using a static variable (code is written in >>>>>>>> c++) that I'm not updating when the matrix size is changing or something >>>>>>>> silly like that, but any help or guidance on this would be appreciated. >>>>>>>> >>>>>>>> *Chris Hewson* >>>>>>>> Senior Reservoir Simulation Engineer >>>>>>>> ResFrac >>>>>>>> +1.587.575.9792 >>>>>>>> >>>>>>> >>>>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Sep 24 11:17:24 2020 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 24 Sep 2020 12:17:24 -0400 Subject: [petsc-users] Proper GPU usage in PETSc In-Reply-To: References: Message-ID: This communication is all in PCApply. What -pc_type are you using? It looks like -pc_type ssor (or is it sor). That is not implemented on the GPU. You can use 'jacobi' On Thu, Sep 24, 2020 at 11:08 AM Zhang, Chonglin wrote: > Dear PETSc Users, > > I have some questions regarding the proper GPU usage. I would like to know > the proper way to: > (1) solve linear equation in SNES, using GPU in PETSc; what > syntax/arguments should I be using; > (2) how to avoid/reduce the ?CpuToGpu count? and ?GpuToCpu count? data > transfer showed in PETSc log file, when using CUDA aware MPI. 
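(As an aside, the -dm_vec_type cuda / -dm_mat_type aijcusparse settings discussed in this thread
can equivalently be applied in code on the DM before the solution vector and Jacobian are created.
A sketch, assuming dm is the DMPlex in question:

    Vec u;
    Mat J;
    DMSetVecType(dm, VECCUDA);
    DMSetMatType(dm, MATAIJCUSPARSE);
    DMCreateGlobalVector(dm, &u);   /* created with the CUDA vector type     */
    DMCreateMatrix(dm, &J);         /* created with the cusparse matrix type */

Either route gives the same GPU vector and matrix types.)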
> > > Details of what I am doing now and my observations are below: > > System and compilers used: > (1) RPI?s AiMOS computer (node wise, it is the same as Summit); > (2) using GCC 7.4.0 and Spectrum-MPI 10.3. > > I am doing the followings to solve the linear Poisson equation with SNES > interface, under DMPlex: > (1) using DMPlex to set up the unstructured mesh; > (2) using DM to create vector and matrix; > (3) using SNES interface to solve the linear Poisson equation, with > ?-snes_type ksponly?; > (4) using ?dm_vec_type cuda?, ?dm_mat_type aijcusparse ? to use GPU vector > and matrix, as suggested in this webpage: > https://www.mcs.anl.gov/petsc/features/gpus.html > (5) using ?use_gpu_aware_mpi? with PETSc, and using `mpirun -gpu` to > enable GPU-Direct ( similar as "srun --smpiargs=?-gpu?" for Summit): > https://secure.cci.rpi.edu/wiki/Slurm/#gpu-direct; > https://www.olcf.ornl.gov/wp-content/uploads/2018/11/multi-gpu-workshop.pdf > (6) using ?-options_left? to check and make sure all the arguments are > accepted and used by PETSc. > (7) After problem setup, I am running the ?SNESSolve()? multiple times to > solve the linear problem and observe the log file with ?-log_view" > > I noticed that if I run ?SNESSolve()? 500 times, instead of 50 times, the > ?CpuToGpu count? and/or ?GpuToCpu count? increased roughly 10 times for > some of the operations: SNESSolve, MatSOR, VecMDot, VecCUDACopyTo, > VecCUDACopyFrom, MatCUSPARSCopyTo. See below for a truncated log > corresponding to running SNESSolve() 500 times: > > > Event Count Time (sec) Flop > --- Global --- --- Stage ---- Total GPU - CpuToGpu - - > GpuToCpu - GPU > Max Ratio Max Ratio Max Ratio Mess AvgLen > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count > Size %F > > --------------------------------------------------------------------------------------------------------------------------------------------------------------- > > --- Event Stage 0: Main Stage > > BuildTwoSided 510 1.0 4.9205e-03 1.1 0.00e+00 0.0 3.5e+01 4.0e+00 > 1.0e+03 0 0 0 0 0 0 0 21 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0 > BuildTwoSidedF 501 1.0 1.0199e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.0e+03 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0 > SNESSolve 500 1.0 3.2570e+02 1.0 1.18e+10 1.0 0.0e+00 0.0e+00 > 8.7e+05100100 0 0100 100100 0 0100 144 202 31947 7.82e+02 63363 > 1.44e+03 82 > SNESSetUp 1 1.0 6.0082e-04 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0 > SNESFunctionEval 500 1.0 3.9826e+01 1.0 3.60e+08 1.0 0.0e+00 0.0e+00 > 5.0e+02 12 3 0 0 0 12 3 0 0 0 36 13 0 0.00e+00 1000 > 2.48e+01 0 > SNESJacobianEval 500 1.0 4.8200e+01 1.0 5.97e+08 1.0 0.0e+00 0.0e+00 > 2.0e+03 15 5 0 0 0 15 5 0 0 0 50 0 1000 7.77e+01 500 > 1.24e+01 0 > DMPlexResidualFE 500 1.0 3.6923e+01 1.1 3.56e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 10 3 0 0 0 10 3 0 0 0 39 0 0 0.00e+00 500 > 1.24e+01 0 > DMPlexJacobianFE 500 1.0 4.6013e+01 1.0 5.95e+08 1.0 0.0e+00 0.0e+00 > 2.0e+03 14 5 0 0 0 14 5 0 0 0 52 0 1000 7.77e+01 0 > 0.00e+00 0 > MatSOR 30947 1.0 3.1254e+00 1.1 1.21e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 10 0 0 0 1 10 0 0 0 1542 0 0 0.00e+00 61863 > 1.41e+03 0 > MatAssemblyBegin 511 1.0 5.3428e+00256.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.0e+03 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0 > MatAssemblyEnd 511 1.0 4.3440e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.1e+01 0 0 0 0 0 0 0 0 0 0 0 0 1002 7.80e+01 0 > 0.00e+00 0 > MatCUSPARSCopyTo 1002 1.0 3.6557e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 1002 
7.80e+01 0 > 0.00e+00 0 > VecMDot 29930 1.0 3.7843e+01 1.0 2.62e+09 1.0 0.0e+00 0.0e+00 > 6.0e+04 12 22 0 0 7 12 22 0 0 7 277 3236 29930 6.81e+02 0 > 0.00e+00 100 > VecNorm 31447 1.0 2.1164e+01 1.4 1.79e+08 1.0 0.0e+00 0.0e+00 > 6.3e+04 5 2 0 0 7 5 2 0 0 7 34 55 1017 2.31e+01 0 > 0.00e+00 100 > VecNormalize 30947 1.0 2.3957e+01 1.1 2.65e+08 1.0 0.0e+00 0.0e+00 > 6.2e+04 7 2 0 0 7 7 2 0 0 7 44 51 1017 2.31e+01 0 > 0.00e+00 100 > VecCUDACopyTo 30947 1.0 7.8866e+00 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 0 30947 7.04e+02 0 > 0.00e+00 0 > VecCUDACopyFrom 63363 1.0 1.0873e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 63363 > 1.44e+03 0 > KSPSetUp 500 1.0 2.2737e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 5.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0 > KSPSolve 500 1.0 2.3687e+02 1.0 1.08e+10 1.0 0.0e+00 0.0e+00 > 8.6e+05 72 92 0 0 99 73 92 0 0 99 182 202 30947 7.04e+02 61863 > 1.41e+03 89 > KSPGMRESOrthog 29930 1.0 1.8920e+02 1.0 7.87e+09 1.0 0.0e+00 0.0e+00 > 6.4e+05 58 67 0 0 74 58 67 0 0 74 166 209 29930 6.81e+02 0 > 0.00e+00 100 > PCApply 30947 1.0 3.1555e+00 1.1 1.21e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 10 0 0 0 1 10 0 0 0 1527 0 0 0.00e+00 61863 > 1.41e+03 0 > > > Thanks! > Chonglin > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Sep 24 11:17:51 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 24 Sep 2020 11:17:51 -0500 Subject: [petsc-users] Proper GPU usage in PETSc In-Reply-To: References: Message-ID: MatSOR() runs on the CPU, this causes copy to CPU for each application of MatSOR() and then a copy to GPU for the next step. You can try, for example -pc_type jacobi better yet use PCGAMG if it amenable for your problem. Also the problem is way to small for a GPU. There will be copies between the GPU/CPU for each SNES iteration since the DMPLEX code does not run on GPUs. Barry > On Sep 24, 2020, at 10:08 AM, Zhang, Chonglin wrote: > > Dear PETSc Users, > > I have some questions regarding the proper GPU usage. I would like to know the proper way to: > (1) solve linear equation in SNES, using GPU in PETSc; what syntax/arguments should I be using; > (2) how to avoid/reduce the ?CpuToGpu count? and ?GpuToCpu count? data transfer showed in PETSc log file, when using CUDA aware MPI. > > > Details of what I am doing now and my observations are below: > > System and compilers used: > (1) RPI?s AiMOS computer (node wise, it is the same as Summit); > (2) using GCC 7.4.0 and Spectrum-MPI 10.3. > > I am doing the followings to solve the linear Poisson equation with SNES interface, under DMPlex: > (1) using DMPlex to set up the unstructured mesh; > (2) using DM to create vector and matrix; > (3) using SNES interface to solve the linear Poisson equation, with ?-snes_type ksponly?; > (4) using ?dm_vec_type cuda?, ?dm_mat_type aijcusparse ? to use GPU vector and matrix, as suggested in this webpage: https://www.mcs.anl.gov/petsc/features/gpus.html > (5) using ?use_gpu_aware_mpi? with PETSc, and using `mpirun -gpu` to enable GPU-Direct ( similar as "srun --smpiargs=?-gpu?" for Summit): https://secure.cci.rpi.edu/wiki/Slurm/#gpu-direct ; https://www.olcf.ornl.gov/wp-content/uploads/2018/11/multi-gpu-workshop.pdf > (6) using ?-options_left? to check and make sure all the arguments are accepted and used by PETSc. > (7) After problem setup, I am running the ?SNESSolve()? 
multiple times to solve the linear problem and observe the log file with ?-log_view" > > I noticed that if I run ?SNESSolve()? 500 times, instead of 50 times, the ?CpuToGpu count? and/or ?GpuToCpu count? increased roughly 10 times for some of the operations: SNESSolve, MatSOR, VecMDot, VecCUDACopyTo, VecCUDACopyFrom, MatCUSPARSCopyTo. See below for a truncated log corresponding to running SNESSolve() 500 times: > > > Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU > Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F > --------------------------------------------------------------------------------------------------------------------------------------------------------------- > > --- Event Stage 0: Main Stage > > BuildTwoSided 510 1.0 4.9205e-03 1.1 0.00e+00 0.0 3.5e+01 4.0e+00 1.0e+03 0 0 0 0 0 0 0 21 0 0 0 0 0 0.00e+00 0 0.00e+00 0 > BuildTwoSidedF 501 1.0 1.0199e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+03 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 > SNESSolve 500 1.0 3.2570e+02 1.0 1.18e+10 1.0 0.0e+00 0.0e+00 8.7e+05100100 0 0100 100100 0 0100 144 202 31947 7.82e+02 63363 1.44e+03 82 > SNESSetUp 1 1.0 6.0082e-04 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 > SNESFunctionEval 500 1.0 3.9826e+01 1.0 3.60e+08 1.0 0.0e+00 0.0e+00 5.0e+02 12 3 0 0 0 12 3 0 0 0 36 13 0 0.00e+00 1000 2.48e+01 0 > SNESJacobianEval 500 1.0 4.8200e+01 1.0 5.97e+08 1.0 0.0e+00 0.0e+00 2.0e+03 15 5 0 0 0 15 5 0 0 0 50 0 1000 7.77e+01 500 1.24e+01 0 > DMPlexResidualFE 500 1.0 3.6923e+01 1.1 3.56e+08 1.0 0.0e+00 0.0e+00 0.0e+00 10 3 0 0 0 10 3 0 0 0 39 0 0 0.00e+00 500 1.24e+01 0 > DMPlexJacobianFE 500 1.0 4.6013e+01 1.0 5.95e+08 1.0 0.0e+00 0.0e+00 2.0e+03 14 5 0 0 0 14 5 0 0 0 52 0 1000 7.77e+01 0 0.00e+00 0 > MatSOR 30947 1.0 3.1254e+00 1.1 1.21e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 10 0 0 0 1 10 0 0 0 1542 0 0 0.00e+00 61863 1.41e+03 0 > MatAssemblyBegin 511 1.0 5.3428e+00256.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+03 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 > MatAssemblyEnd 511 1.0 4.3440e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.1e+01 0 0 0 0 0 0 0 0 0 0 0 0 1002 7.80e+01 0 0.00e+00 0 > MatCUSPARSCopyTo 1002 1.0 3.6557e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 1002 7.80e+01 0 0.00e+00 0 > VecMDot 29930 1.0 3.7843e+01 1.0 2.62e+09 1.0 0.0e+00 0.0e+00 6.0e+04 12 22 0 0 7 12 22 0 0 7 277 3236 29930 6.81e+02 0 0.00e+00 100 > VecNorm 31447 1.0 2.1164e+01 1.4 1.79e+08 1.0 0.0e+00 0.0e+00 6.3e+04 5 2 0 0 7 5 2 0 0 7 34 55 1017 2.31e+01 0 0.00e+00 100 > VecNormalize 30947 1.0 2.3957e+01 1.1 2.65e+08 1.0 0.0e+00 0.0e+00 6.2e+04 7 2 0 0 7 7 2 0 0 7 44 51 1017 2.31e+01 0 0.00e+00 100 > VecCUDACopyTo 30947 1.0 7.8866e+00 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 0 30947 7.04e+02 0 0.00e+00 0 > VecCUDACopyFrom 63363 1.0 1.0873e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 63363 1.44e+03 0 > KSPSetUp 500 1.0 2.2737e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 > KSPSolve 500 1.0 2.3687e+02 1.0 1.08e+10 1.0 0.0e+00 0.0e+00 8.6e+05 72 92 0 0 99 73 92 0 0 99 182 202 30947 7.04e+02 61863 1.41e+03 89 > KSPGMRESOrthog 29930 1.0 1.8920e+02 1.0 7.87e+09 1.0 0.0e+00 0.0e+00 6.4e+05 58 67 0 0 74 58 67 0 0 74 166 209 29930 6.81e+02 0 0.00e+00 100 > PCApply 30947 1.0 3.1555e+00 1.1 1.21e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 10 0 0 0 1 10 0 0 0 1527 0 0 
0.00e+00 61863 1.41e+03 0 > > > Thanks! > Chonglin -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Sep 24 11:32:30 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 24 Sep 2020 12:32:30 -0400 Subject: [petsc-users] Tough to reproduce petsctablefind error In-Reply-To: References: <0AC37384-BC37-4A6C-857D-41CD507F84C2@petsc.dev> Message-ID: On Thu, Sep 24, 2020 at 9:47 AM Chris Hewson wrote: > After about a month of not having this issue pop up, it has come up again > > We have been struggling with a similar PETSc Error for awhile now, the > error message is as follows: > > [7]PETSC ERROR: PetscTableFind() line 132 in > /home/chewson/petsc-3.13.3/include/petscctable.h key 24443 is greater than > largest key allowed 23988 > The full stack would be really useful here. I am guessing this happens on MatMult(), but I do not know. Thanks, Matt > It is a particularly nasty bug as it doesn't reproduce itself when > debugging and doesn't happen all the time with the same inputs either. The > problem occurs after a long runtime of the code (12-40 hours) and we are > using a ksp solve with KSPBCGS. > > The PETSc compilation options that are used are: > > --download-metis > --download-mpich > --download-mumps > --download-parmetis > --download-ptscotch > --download-scalapack > --download-suitesparse > --prefix=/opt/anl/petsc-3.13.3 > --with-debugging=0 > --with-mpi=1 > COPTFLAGS=-O3 -march=haswell -mtune=haswell > CXXOPTFLAGS=-O3 -march=haswell -mtune=haswell > FOPTFLAGS=-O3 -march=haswell -mtune=haswell > > This is being run across 8 processes and is failing consistently on the > rank 7 process. We also use openmp outside of PETSC and the linear solve > portion of the code. The rank 0 process is always using compute, during > this the slave processes use an MPI_Wait call to wait for the collective > parts of the code to be called again. All PETSC calls are done across all > of the processes. > > We are using mpich version 3.3.2, downloaded with the petsc 3.13.3 package. > > At every PETSC call we are checking the error return from the function > collectively to ensure that no errors have been returned from PETSC. > > Some possible causes that I can think of are as follows: > 1. Memory leak causing a corruption either in our program or in petsc or > with one of the petsc objects. This seems unlikely as we have checked runs > with the option -malloc_dump for PETSc and using valgrind. > > 2. Optimization flags set for petsc compilation are causing variables that > go out of scope to be optimized out. > > 3. We are giving the wrong number of elements for a process or the value > is changing during the simulation. This seems unlikely as there is nothing > overly unique about these simulations and it's not reproducing itself. > > 4. An MPI channel or socket error causing an error in the collective > values for PETSc. > > Any input on this issue would be greatly appreciated. > > *Chris Hewson* > Senior Reservoir Simulation Engineer > ResFrac > +1.587.575.9792 > > > On Thu, Aug 13, 2020 at 4:21 PM Junchao Zhang > wrote: > >> That is a great idea. I'll figure it out. >> --Junchao Zhang >> >> >> On Thu, Aug 13, 2020 at 4:31 PM Barry Smith wrote: >> >>> >>> Junchao, >>> >>> Any way in the PETSc configure to warn that MPICH version is "bad" >>> or "untrustworthy" or even the vague "we have suspicians about this version >>> and recommend avoiding it"? A lot of time could be saved if others don't >>> deal with the same problem. 
>>> >>> Maybe add arrays of suspect_versions for OpenMPI, MPICH, etc and >>> always check against that list and print a boxed warning at configure time? >>> Better you could somehow generalize it and put it in package.py for use by >>> all packages, then any package can included lists of "suspect" versions. >>> (There are definitely HDF5 versions that should be avoided :-)). >>> >>> Barry >>> >>> >>> On Aug 13, 2020, at 12:14 PM, Junchao Zhang >>> wrote: >>> >>> Thanks for the update. Let's assume it is a bug in MPI :) >>> --Junchao Zhang >>> >>> >>> On Thu, Aug 13, 2020 at 11:15 AM Chris Hewson wrote: >>> >>>> Just as an update to this, I can confirm that using the mpich version >>>> (3.3.2) downloaded with the petsc download solved this issue on my end. >>>> >>>> *Chris Hewson* >>>> Senior Reservoir Simulation Engineer >>>> ResFrac >>>> +1.587.575.9792 >>>> >>>> >>>> On Thu, Jul 23, 2020 at 5:58 PM Junchao Zhang >>>> wrote: >>>> >>>>> On Mon, Jul 20, 2020 at 7:05 AM Barry Smith wrote: >>>>> >>>>>> >>>>>> Is there a comprehensive MPI test suite (perhaps from MPICH)? Is >>>>>> there any way to run this full test suite under the problematic MPI and see >>>>>> if it detects any problems? >>>>>> >>>>>> Is so, could someone add it to the FAQ in the debugging section? >>>>>> >>>>> MPICH does have a test suite. It is at the subdir test/mpi of >>>>> downloaded mpich >>>>> . It >>>>> annoyed me since it is not user-friendly. It might be helpful in catching >>>>> bugs at very small scale. But say if I want to test allreduce on 1024 ranks >>>>> on 100 doubles, I have to hack the test suite. >>>>> Anyway, the instructions are here. >>>>> >>>>> For the purpose of petsc, under test/mpi one can configure it with >>>>> $./configure CC=mpicc CXX=mpicxx FC=mpifort --enable-strictmpi >>>>> --enable-threads=funneled --enable-fortran=f77,f90 --enable-fast >>>>> --disable-spawn --disable-cxx --disable-ft-tests // It is weird I disabled >>>>> cxx but I had to set CXX! >>>>> $make -k -j8 // -k is to keep going and ignore compilation errors, >>>>> e.g., when building tests for MPICH extensions not in MPI standard, but >>>>> your MPI is OpenMPI. >>>>> $ // edit testlist, remove lines mpi_t, rma, f77, impls. Those are >>>>> sub-dirs containing tests for MPI routines Petsc does not rely on. >>>>> $ make testings or directly './runtests -tests=testlist' >>>>> >>>>> On a batch system, >>>>> $export MPITEST_BATCHDIR=`pwd`/btest // specify a batch dir, say >>>>> btest, >>>>> $./runtests -batch -mpiexec=mpirun -np=1024 -tests=testlist // Use >>>>> 1024 ranks if a test does no specify the number of processes. >>>>> $ // It copies test binaries to the batch dir and generates a >>>>> script runtests.batch there. Edit the script to fit your batch system and >>>>> then submit a job and wait for its finish. >>>>> $ cd btest && ../checktests --ignorebogus >>>>> >>>>> >>>>> PS: Fande, changing an MPI fixed your problem does not >>>>> necessarily mean the old MPI has bugs. It is complicated. It could be a >>>>> petsc bug. You need to provide us a code to reproduce your error. It does >>>>> not matter if the code is big. 
>>>>> >>>>> >>>>>> Thanks >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> On Jul 20, 2020, at 12:16 AM, Fande Kong wrote: >>>>>> >>>>>> Trace could look like this: >>>>>> >>>>>> [640]PETSC ERROR: --------------------- Error Message >>>>>> -------------------------------------------------------------- >>>>>> [640]PETSC ERROR: Argument out of range >>>>>> [640]PETSC ERROR: key 45226154 is greater than largest key allowed >>>>>> 740521 >>>>>> [640]PETSC ERROR: See >>>>>> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>>>> shooting. >>>>>> [640]PETSC ERROR: Petsc Release Version 3.13.3, unknown >>>>>> [640]PETSC ERROR: ../../griffin-opt on a arch-moose named r6i5n18 by >>>>>> wangy2 Sun Jul 19 17:14:28 2020 >>>>>> [640]PETSC ERROR: Configure options --download-hypre=1 >>>>>> --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1 >>>>>> --download-metis=1 --download-ptscotch=1 --download-parmetis=1 >>>>>> --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 >>>>>> --download-slepc=1 --with-mpi=1 --with-cxx-dialect=C++11 >>>>>> --with-fortran-bindings=0 --with-sowing=0 --with-64-bit-indices >>>>>> --download-mumps=0 >>>>>> [640]PETSC ERROR: #1 PetscTableFind() line 132 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/include/petscctable.h >>>>>> [640]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() line 33 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mmaij.c >>>>>> [640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>>> [640]PETSC ERROR: #4 MatAssemblyEnd() line 5347 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>> [640]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIXAIJ_allatonce() line >>>>>> 901 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiptap.c >>>>>> [640]PETSC ERROR: #6 MatPtAPNumeric_MPIAIJ_MPIMAIJ_allatonce() line >>>>>> 3180 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/maij/maij.c >>>>>> [640]PETSC ERROR: #7 MatProductNumeric_PtAP() line 704 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>>>> [640]PETSC ERROR: #8 MatProductNumeric() line 759 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>>>> [640]PETSC ERROR: #9 MatPtAP() line 9199 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>> [640]PETSC ERROR: #10 MatGalerkin() line 10236 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>> [640]PETSC ERROR: #11 PCSetUp_MG() line 745 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/mg/mg.c >>>>>> [640]PETSC ERROR: #12 PCSetUp_HMG() line 220 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/hmg/hmg.c >>>>>> [640]PETSC ERROR: #13 PCSetUp() line 898 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/interface/precon.c >>>>>> [640]PETSC ERROR: #14 KSPSetUp() line 376 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>> [640]PETSC ERROR: #15 KSPSolve_Private() line 633 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>> [640]PETSC ERROR: #16 KSPSolve() line 853 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>> [640]PETSC ERROR: #17 SNESSolve_NEWTONLS() line 225 in 
>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/impls/ls/ls.c >>>>>> [640]PETSC ERROR: #18 SNESSolve() line 4519 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/interface/snes.c >>>>>> >>>>>> On Sun, Jul 19, 2020 at 11:13 PM Fande Kong >>>>>> wrote: >>>>>> >>>>>>> I am not entirely sure what is happening, but we encountered similar >>>>>>> issues recently. It was not reproducible. It might occur at different >>>>>>> stages, and errors could be weird other than "ctable stuff." Our code was >>>>>>> Valgrind clean since every PR in moose needs to go through rigorous >>>>>>> Valgrind checks before it reaches the devel branch. The errors happened >>>>>>> when we used mvapich. >>>>>>> >>>>>>> We changed to use HPE-MPT (a vendor stalled MPI), then everything >>>>>>> was smooth. May you try a different MPI? It is better to try a system >>>>>>> carried one. >>>>>>> >>>>>>> We did not get the bottom of this problem yet, but we at least know >>>>>>> this is kind of MPI-related. >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Fande, >>>>>>> >>>>>>> >>>>>>> On Sun, Jul 19, 2020 at 3:28 PM Chris Hewson >>>>>>> wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I am having a bug that is occurring in PETSC with the return string: >>>>>>>> >>>>>>>> [7]PETSC ERROR: PetscTableFind() line 132 in >>>>>>>> /home/chewson/petsc-3.13.2/include/petscctable.h key 7556 is greater than >>>>>>>> largest key allowed 5693 >>>>>>>> >>>>>>>> This is using petsc-3.13.2, compiled and running using mpich with >>>>>>>> -O3 and debugging turned off tuned to the haswell architecture and >>>>>>>> occurring either before or during a KSPBCGS solve/setup or during a MUMPS >>>>>>>> factorization solve (I haven't been able to replicate this issue with the >>>>>>>> same set of instructions etc.). >>>>>>>> >>>>>>>> This is a terrible way to ask a question, I know, and not very >>>>>>>> helpful from your side, but this is what I have from a user's run and can't >>>>>>>> reproduce on my end (either with the optimization compilation or with >>>>>>>> debugging turned on). This happens when the code has run for quite some >>>>>>>> time and is happening somewhat rarely. >>>>>>>> >>>>>>>> More than likely I am using a static variable (code is written in >>>>>>>> c++) that I'm not updating when the matrix size is changing or something >>>>>>>> silly like that, but any help or guidance on this would be appreciated. >>>>>>>> >>>>>>>> *Chris Hewson* >>>>>>>> Senior Reservoir Simulation Engineer >>>>>>>> ResFrac >>>>>>>> +1.587.575.9792 >>>>>>>> >>>>>>> >>>>>> >>> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From zhangc20 at rpi.edu Thu Sep 24 11:48:07 2020 From: zhangc20 at rpi.edu (Zhang, Chonglin) Date: Thu, 24 Sep 2020 16:48:07 +0000 Subject: [petsc-users] Proper GPU usage in PETSc In-Reply-To: References: Message-ID: Thanks Mark and Barry! A quick try of using ?-pc_type jacobi? did reduce the number of count for ?CpuToGpu? and ?GpuToCpu?, although using ?-pc_type gamg? (the counts did not decrease in this case) solves the problem faster (may not be of any meaning since the problem size is too small; the function ?DMPlexCreateFromCellListParallelPetsc()" is slow for large problem size preventing running larger problems, separate issue). Would this ?CpuToGpu? and ?GpuToCpu? 
data transfer contribute a significant amount of time for a realistic sized problem, say for example a linear problem with ~1-2 million DOFs? Also, is there any plan to have the SNES and DMPlex code run on GPU? Thanks! Chonglin On Sep 24, 2020, at 12:17 PM, Barry Smith > wrote: MatSOR() runs on the CPU, this causes copy to CPU for each application of MatSOR() and then a copy to GPU for the next step. You can try, for example -pc_type jacobi better yet use PCGAMG if it amenable for your problem. Also the problem is way to small for a GPU. There will be copies between the GPU/CPU for each SNES iteration since the DMPLEX code does not run on GPUs. Barry On Sep 24, 2020, at 10:08 AM, Zhang, Chonglin > wrote: Dear PETSc Users, I have some questions regarding the proper GPU usage. I would like to know the proper way to: (1) solve linear equation in SNES, using GPU in PETSc; what syntax/arguments should I be using; (2) how to avoid/reduce the ?CpuToGpu count? and ?GpuToCpu count? data transfer showed in PETSc log file, when using CUDA aware MPI. Details of what I am doing now and my observations are below: System and compilers used: (1) RPI?s AiMOS computer (node wise, it is the same as Summit); (2) using GCC 7.4.0 and Spectrum-MPI 10.3. I am doing the followings to solve the linear Poisson equation with SNES interface, under DMPlex: (1) using DMPlex to set up the unstructured mesh; (2) using DM to create vector and matrix; (3) using SNES interface to solve the linear Poisson equation, with ?-snes_type ksponly?; (4) using ?dm_vec_type cuda?, ?dm_mat_type aijcusparse ? to use GPU vector and matrix, as suggested in this webpage: https://www.mcs.anl.gov/petsc/features/gpus.html (5) using ?use_gpu_aware_mpi? with PETSc, and using `mpirun -gpu` to enable GPU-Direct ( similar as "srun --smpiargs=?-gpu?" for Summit): https://secure.cci.rpi.edu/wiki/Slurm/#gpu-direct; https://www.olcf.ornl.gov/wp-content/uploads/2018/11/multi-gpu-workshop.pdf (6) using ?-options_left? to check and make sure all the arguments are accepted and used by PETSc. (7) After problem setup, I am running the ?SNESSolve()? multiple times to solve the linear problem and observe the log file with ?-log_view" I noticed that if I run ?SNESSolve()? 500 times, instead of 50 times, the ?CpuToGpu count? and/or ?GpuToCpu count? increased roughly 10 times for some of the operations: SNESSolve, MatSOR, VecMDot, VecCUDACopyTo, VecCUDACopyFrom, MatCUSPARSCopyTo. 
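For reference, a minimal sketch of the Vec/Mat type selection described in steps (1)-(4) above. It assumes a DMPlex "dm" that already has its residual and Jacobian callbacks attached; mesh creation and the FE setup are omitted, and the helper name SolveOnGPU is made up for illustration. The same effect comes from the command line with -dm_vec_type cuda -dm_mat_type aijcusparse -snes_type ksponly.

#include <petscsnes.h>

/* Hedged sketch: put the DM-created vectors and the Jacobian on the GPU, then
   let SNES pick up -snes_type ksponly etc. from the options database.        */
static PetscErrorCode SolveOnGPU(DM dm)
{
  PetscErrorCode ierr;
  SNES           snes;
  Vec            u;

  PetscFunctionBeginUser;
  ierr = DMSetVecType(dm, VECCUDA);CHKERRQ(ierr);         /* vectors created by the DM live on the GPU */
  ierr = DMSetMatType(dm, MATAIJCUSPARSE);CHKERRQ(ierr);  /* Jacobian stored in cuSPARSE format        */
  ierr = SNESCreate(PetscObjectComm((PetscObject)dm), &snes);CHKERRQ(ierr);
  ierr = SNESSetDM(snes, dm);CHKERRQ(ierr);
  ierr = SNESSetFromOptions(snes);CHKERRQ(ierr);
  ierr = DMCreateGlobalVector(dm, &u);CHKERRQ(ierr);
  ierr = SNESSolve(snes, NULL, u);CHKERRQ(ierr);          /* call repeatedly to mimic the 50/500-solve runs */
  ierr = VecDestroy(&u);CHKERRQ(ierr);
  ierr = SNESDestroy(&snes);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}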
See below for a truncated log corresponding to running SNESSolve() 500 times: Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F --------------------------------------------------------------------------------------------------------------------------------------------------------------- --- Event Stage 0: Main Stage BuildTwoSided 510 1.0 4.9205e-03 1.1 0.00e+00 0.0 3.5e+01 4.0e+00 1.0e+03 0 0 0 0 0 0 0 21 0 0 0 0 0 0.00e+00 0 0.00e+00 0 BuildTwoSidedF 501 1.0 1.0199e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+03 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 SNESSolve 500 1.0 3.2570e+02 1.0 1.18e+10 1.0 0.0e+00 0.0e+00 8.7e+05100100 0 0100 100100 0 0100 144 202 31947 7.82e+02 63363 1.44e+03 82 SNESSetUp 1 1.0 6.0082e-04 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 SNESFunctionEval 500 1.0 3.9826e+01 1.0 3.60e+08 1.0 0.0e+00 0.0e+00 5.0e+02 12 3 0 0 0 12 3 0 0 0 36 13 0 0.00e+00 1000 2.48e+01 0 SNESJacobianEval 500 1.0 4.8200e+01 1.0 5.97e+08 1.0 0.0e+00 0.0e+00 2.0e+03 15 5 0 0 0 15 5 0 0 0 50 0 1000 7.77e+01 500 1.24e+01 0 DMPlexResidualFE 500 1.0 3.6923e+01 1.1 3.56e+08 1.0 0.0e+00 0.0e+00 0.0e+00 10 3 0 0 0 10 3 0 0 0 39 0 0 0.00e+00 500 1.24e+01 0 DMPlexJacobianFE 500 1.0 4.6013e+01 1.0 5.95e+08 1.0 0.0e+00 0.0e+00 2.0e+03 14 5 0 0 0 14 5 0 0 0 52 0 1000 7.77e+01 0 0.00e+00 0 MatSOR 30947 1.0 3.1254e+00 1.1 1.21e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 10 0 0 0 1 10 0 0 0 1542 0 0 0.00e+00 61863 1.41e+03 0 MatAssemblyBegin 511 1.0 5.3428e+00256.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+03 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 MatAssemblyEnd 511 1.0 4.3440e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.1e+01 0 0 0 0 0 0 0 0 0 0 0 0 1002 7.80e+01 0 0.00e+00 0 MatCUSPARSCopyTo 1002 1.0 3.6557e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 1002 7.80e+01 0 0.00e+00 0 VecMDot 29930 1.0 3.7843e+01 1.0 2.62e+09 1.0 0.0e+00 0.0e+00 6.0e+04 12 22 0 0 7 12 22 0 0 7 277 3236 29930 6.81e+02 0 0.00e+00 100 VecNorm 31447 1.0 2.1164e+01 1.4 1.79e+08 1.0 0.0e+00 0.0e+00 6.3e+04 5 2 0 0 7 5 2 0 0 7 34 55 1017 2.31e+01 0 0.00e+00 100 VecNormalize 30947 1.0 2.3957e+01 1.1 2.65e+08 1.0 0.0e+00 0.0e+00 6.2e+04 7 2 0 0 7 7 2 0 0 7 44 51 1017 2.31e+01 0 0.00e+00 100 VecCUDACopyTo 30947 1.0 7.8866e+00 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 0 30947 7.04e+02 0 0.00e+00 0 VecCUDACopyFrom 63363 1.0 1.0873e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 63363 1.44e+03 0 KSPSetUp 500 1.0 2.2737e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 KSPSolve 500 1.0 2.3687e+02 1.0 1.08e+10 1.0 0.0e+00 0.0e+00 8.6e+05 72 92 0 0 99 73 92 0 0 99 182 202 30947 7.04e+02 61863 1.41e+03 89 KSPGMRESOrthog 29930 1.0 1.8920e+02 1.0 7.87e+09 1.0 0.0e+00 0.0e+00 6.4e+05 58 67 0 0 74 58 67 0 0 74 166 209 29930 6.81e+02 0 0.00e+00 100 PCApply 30947 1.0 3.1555e+00 1.1 1.21e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 10 0 0 0 1 10 0 0 0 1527 0 0 0.00e+00 61863 1.41e+03 0 Thanks! Chonglin -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Thu Sep 24 12:05:48 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 24 Sep 2020 12:05:48 -0500 Subject: [petsc-users] Tough to reproduce petsctablefind error In-Reply-To: References: <0AC37384-BC37-4A6C-857D-41CD507F84C2@petsc.dev> Message-ID: <8952CCCF-14F2-4102-91B4-921A54689813@petsc.dev> Chris, We realize how frustrating this type of problem is to deal with. Here is the code: ierr = PetscTableCreate(aij->B->rmap->n,mat->cmap->N+1,&gid1_lid1);CHKERRQ(ierr); for (i=0; iB->rmap->n; i++) { for (j=0; jilen[i]; j++) { PetscInt data,gid1 = aj[B->i[i] + j] + 1; ierr = PetscTableFind(gid1_lid1,gid1,&data);CHKERRQ(ierr); if (!data) { /* one based table */ ierr = PetscTableAdd(gid1_lid1,gid1,++ec,INSERT_VALUES);CHKERRQ(ierr); } } } It is simply looping over the rows of the sparse matrix putting the columns it finds into the hash table. aj[B->i[i] + j] are the column entries, the number of columns in the matrix is mat->cmap->N so the column entries should always be less than the number of columns. The code is crashing when column entry 24443 which is larger than the number of columns 23988. So either the aj[B->i[i] + j] + 1 are incorrect or the mat->cmap->N is incorrect. >>> 640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c if (!mat->was_assembled && mode == MAT_FINAL_ASSEMBLY) { ierr = MatSetUpMultiply_MPIAIJ(mat);CHKERRQ(ierr); } Seems to indicate it is setting up a new multiple because it is either the first time into the algorithm or the nonzero structure changed on some rank requiring a new assembly process. Is the nonzero structure of your matrices changing or is it fixed for the entire simulation? Since the code has been running for a very long time already I have to conclude that this is not the first time through and so something has changed in the matrix? I think we have to put more diagnostics into the library to provide more information before or at the time of the error detection. Does this particular run always crash at the same place? Similar place? Doesn't always crash? Barry > On Sep 24, 2020, at 8:46 AM, Chris Hewson wrote: > > After about a month of not having this issue pop up, it has come up again > > We have been struggling with a similar PETSc Error for awhile now, the error message is as follows: > > [7]PETSC ERROR: PetscTableFind() line 132 in /home/chewson/petsc-3.13.3/include/petscctable.h key 24443 is greater than largest key allowed 23988 > > It is a particularly nasty bug as it doesn't reproduce itself when debugging and doesn't happen all the time with the same inputs either. The problem occurs after a long runtime of the code (12-40 hours) and we are using a ksp solve with KSPBCGS. > > The PETSc compilation options that are used are: > > --download-metis > --download-mpich > --download-mumps > --download-parmetis > --download-ptscotch > --download-scalapack > --download-suitesparse > --prefix=/opt/anl/petsc-3.13.3 > --with-debugging=0 > --with-mpi=1 > COPTFLAGS=-O3 -march=haswell -mtune=haswell > CXXOPTFLAGS=-O3 -march=haswell -mtune=haswell > FOPTFLAGS=-O3 -march=haswell -mtune=haswell > > This is being run across 8 processes and is failing consistently on the rank 7 process. We also use openmp outside of PETSC and the linear solve portion of the code. The rank 0 process is always using compute, during this the slave processes use an MPI_Wait call to wait for the collective parts of the code to be called again. 
All PETSC calls are done across all of the processes. > > We are using mpich version 3.3.2, downloaded with the petsc 3.13.3 package. > > At every PETSC call we are checking the error return from the function collectively to ensure that no errors have been returned from PETSC. > > Some possible causes that I can think of are as follows: > 1. Memory leak causing a corruption either in our program or in petsc or with one of the petsc objects. This seems unlikely as we have checked runs with the option -malloc_dump for PETSc and using valgrind. > > 2. Optimization flags set for petsc compilation are causing variables that go out of scope to be optimized out. > > 3. We are giving the wrong number of elements for a process or the value is changing during the simulation. This seems unlikely as there is nothing overly unique about these simulations and it's not reproducing itself. > > 4. An MPI channel or socket error causing an error in the collective values for PETSc. > > Any input on this issue would be greatly appreciated. > > Chris Hewson > Senior Reservoir Simulation Engineer > ResFrac > +1.587.575.9792 > > > On Thu, Aug 13, 2020 at 4:21 PM Junchao Zhang > wrote: > That is a great idea. I'll figure it out. > --Junchao Zhang > > > On Thu, Aug 13, 2020 at 4:31 PM Barry Smith > wrote: > > Junchao, > > Any way in the PETSc configure to warn that MPICH version is "bad" or "untrustworthy" or even the vague "we have suspicians about this version and recommend avoiding it"? A lot of time could be saved if others don't deal with the same problem. > > Maybe add arrays of suspect_versions for OpenMPI, MPICH, etc and always check against that list and print a boxed warning at configure time? Better you could somehow generalize it and put it in package.py for use by all packages, then any package can included lists of "suspect" versions. (There are definitely HDF5 versions that should be avoided :-)). > > Barry > > >> On Aug 13, 2020, at 12:14 PM, Junchao Zhang > wrote: >> >> Thanks for the update. Let's assume it is a bug in MPI :) >> --Junchao Zhang >> >> >> On Thu, Aug 13, 2020 at 11:15 AM Chris Hewson > wrote: >> Just as an update to this, I can confirm that using the mpich version (3.3.2) downloaded with the petsc download solved this issue on my end. >> >> Chris Hewson >> Senior Reservoir Simulation Engineer >> ResFrac >> +1.587.575.9792 >> >> >> On Thu, Jul 23, 2020 at 5:58 PM Junchao Zhang > wrote: >> On Mon, Jul 20, 2020 at 7:05 AM Barry Smith > wrote: >> >> Is there a comprehensive MPI test suite (perhaps from MPICH)? Is there any way to run this full test suite under the problematic MPI and see if it detects any problems? >> >> Is so, could someone add it to the FAQ in the debugging section? >> MPICH does have a test suite. It is at the subdir test/mpi of downloaded mpich . It annoyed me since it is not user-friendly. It might be helpful in catching bugs at very small scale. But say if I want to test allreduce on 1024 ranks on 100 doubles, I have to hack the test suite. >> Anyway, the instructions are here. >> For the purpose of petsc, under test/mpi one can configure it with >> $./configure CC=mpicc CXX=mpicxx FC=mpifort --enable-strictmpi --enable-threads=funneled --enable-fortran=f77,f90 --enable-fast --disable-spawn --disable-cxx --disable-ft-tests // It is weird I disabled cxx but I had to set CXX! >> $make -k -j8 // -k is to keep going and ignore compilation errors, e.g., when building tests for MPICH extensions not in MPI standard, but your MPI is OpenMPI. 
>> $ // edit testlist, remove lines mpi_t, rma, f77, impls. Those are sub-dirs containing tests for MPI routines Petsc does not rely on. >> $ make testings or directly './runtests -tests=testlist' >> >> On a batch system, >> $export MPITEST_BATCHDIR=`pwd`/btest // specify a batch dir, say btest, >> $./runtests -batch -mpiexec=mpirun -np=1024 -tests=testlist // Use 1024 ranks if a test does no specify the number of processes. >> $ // It copies test binaries to the batch dir and generates a script runtests.batch there. Edit the script to fit your batch system and then submit a job and wait for its finish. >> $ cd btest && ../checktests --ignorebogus >> >> PS: Fande, changing an MPI fixed your problem does not necessarily mean the old MPI has bugs. It is complicated. It could be a petsc bug. You need to provide us a code to reproduce your error. It does not matter if the code is big. >> >> >> Thanks >> >> Barry >> >> >>> On Jul 20, 2020, at 12:16 AM, Fande Kong > wrote: >>> >>> Trace could look like this: >>> >>> [640]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>> [640]PETSC ERROR: Argument out of range >>> [640]PETSC ERROR: key 45226154 is greater than largest key allowed 740521 >>> [640]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >>> [640]PETSC ERROR: Petsc Release Version 3.13.3, unknown >>> [640]PETSC ERROR: ../../griffin-opt on a arch-moose named r6i5n18 by wangy2 Sun Jul 19 17:14:28 2020 >>> [640]PETSC ERROR: Configure options --download-hypre=1 --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1 --download-metis=1 --download-ptscotch=1 --download-parmetis=1 --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 --download-slepc=1 --with-mpi=1 --with-cxx-dialect=C++11 --with-fortran-bindings=0 --with-sowing=0 --with-64-bit-indices --download-mumps=0 >>> [640]PETSC ERROR: #1 PetscTableFind() line 132 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/include/petscctable.h >>> [640]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() line 33 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mmaij.c >>> [640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >>> [640]PETSC ERROR: #4 MatAssemblyEnd() line 5347 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>> [640]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIXAIJ_allatonce() line 901 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiptap.c >>> [640]PETSC ERROR: #6 MatPtAPNumeric_MPIAIJ_MPIMAIJ_allatonce() line 3180 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/maij/maij.c >>> [640]PETSC ERROR: #7 MatProductNumeric_PtAP() line 704 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>> [640]PETSC ERROR: #8 MatProductNumeric() line 759 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>> [640]PETSC ERROR: #9 MatPtAP() line 9199 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>> [640]PETSC ERROR: #10 MatGalerkin() line 10236 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>> [640]PETSC ERROR: #11 PCSetUp_MG() line 745 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/mg/mg.c >>> [640]PETSC ERROR: #12 PCSetUp_HMG() line 220 in 
/home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/hmg/hmg.c >>> [640]PETSC ERROR: #13 PCSetUp() line 898 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/interface/precon.c >>> [640]PETSC ERROR: #14 KSPSetUp() line 376 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>> [640]PETSC ERROR: #15 KSPSolve_Private() line 633 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>> [640]PETSC ERROR: #16 KSPSolve() line 853 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>> [640]PETSC ERROR: #17 SNESSolve_NEWTONLS() line 225 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/impls/ls/ls.c >>> [640]PETSC ERROR: #18 SNESSolve() line 4519 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/interface/snes.c >>> >>> On Sun, Jul 19, 2020 at 11:13 PM Fande Kong > wrote: >>> I am not entirely sure what is happening, but we encountered similar issues recently. It was not reproducible. It might occur at different stages, and errors could be weird other than "ctable stuff." Our code was Valgrind clean since every PR in moose needs to go through rigorous Valgrind checks before it reaches the devel branch. The errors happened when we used mvapich. >>> >>> We changed to use HPE-MPT (a vendor stalled MPI), then everything was smooth. May you try a different MPI? It is better to try a system carried one. >>> >>> We did not get the bottom of this problem yet, but we at least know this is kind of MPI-related. >>> >>> Thanks, >>> >>> Fande, >>> >>> >>> On Sun, Jul 19, 2020 at 3:28 PM Chris Hewson > wrote: >>> Hi, >>> >>> I am having a bug that is occurring in PETSC with the return string: >>> >>> [7]PETSC ERROR: PetscTableFind() line 132 in /home/chewson/petsc-3.13.2/include/petscctable.h key 7556 is greater than largest key allowed 5693 >>> >>> This is using petsc-3.13.2, compiled and running using mpich with -O3 and debugging turned off tuned to the haswell architecture and occurring either before or during a KSPBCGS solve/setup or during a MUMPS factorization solve (I haven't been able to replicate this issue with the same set of instructions etc.). >>> >>> This is a terrible way to ask a question, I know, and not very helpful from your side, but this is what I have from a user's run and can't reproduce on my end (either with the optimization compilation or with debugging turned on). This happens when the code has run for quite some time and is happening somewhat rarely. >>> >>> More than likely I am using a static variable (code is written in c++) that I'm not updating when the matrix size is changing or something silly like that, but any help or guidance on this would be appreciated. >>> >>> Chris Hewson >>> Senior Reservoir Simulation Engineer >>> ResFrac >>> +1.587.575.9792 >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Sep 24 12:11:58 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 24 Sep 2020 12:11:58 -0500 Subject: [petsc-users] Proper GPU usage in PETSc In-Reply-To: References: Message-ID: <3F063CDD-B80E-433C-80B9-74C4236328C5@petsc.dev> > On Sep 24, 2020, at 11:48 AM, Zhang, Chonglin wrote: > > Thanks Mark and Barry! > > A quick try of using ?-pc_type jacobi? did reduce the number of count for ?CpuToGpu? and ?GpuToCpu?, although using ?-pc_type gamg? 
(the counts did not decrease in this case) solves the problem faster (may not be of any meaning since the problem size is too small; the function ?DMPlexCreateFromCellListParallelPetsc()" is slow for large problem size preventing running larger problems, separate issue). > > Would this ?CpuToGpu? and ?GpuToCpu? data transfer contribute a significant amount of time for a realistic sized problem, say for example a linear problem with ~1-2 million DOFs? It depends on how often the copies are done. With GAMG once the preconditioner is built the entire linear solve can run on the GPU and Mark has some good speed ups of the liner solve using GAMG on the GPU instead of the CPU on Summit. The speedup of the entire simulation will depend on the relative cost of the finite element matrix assembly vs the linear solver time and Amdahl's law kicks in so, for example, if the finite element assembly takes 50 percent of the time even if the linear solve takes 0 time one cannot only get a speedup of two which is not much. > > Also, is there any plan to have the SNES and DMPlex code run on GPU? Basically the finite element computation for the nonlinear function and its Jacobian need to run on the GPU, this is a big project that we've barely begun thinking about. If this is something you are interested in it would be fantastic if you could take a look at that. Barry > > Thanks! > Chonglin > >> On Sep 24, 2020, at 12:17 PM, Barry Smith > wrote: >> >> >> MatSOR() runs on the CPU, this causes copy to CPU for each application of MatSOR() and then a copy to GPU for the next step. >> >> You can try, for example -pc_type jacobi better yet use PCGAMG if it amenable for your problem. >> >> Also the problem is way to small for a GPU. >> >> There will be copies between the GPU/CPU for each SNES iteration since the DMPLEX code does not run on GPUs. >> >> Barry >> >> >> >>> On Sep 24, 2020, at 10:08 AM, Zhang, Chonglin > wrote: >>> >>> Dear PETSc Users, >>> >>> I have some questions regarding the proper GPU usage. I would like to know the proper way to: >>> (1) solve linear equation in SNES, using GPU in PETSc; what syntax/arguments should I be using; >>> (2) how to avoid/reduce the ?CpuToGpu count? and ?GpuToCpu count? data transfer showed in PETSc log file, when using CUDA aware MPI. >>> >>> >>> Details of what I am doing now and my observations are below: >>> >>> System and compilers used: >>> (1) RPI?s AiMOS computer (node wise, it is the same as Summit); >>> (2) using GCC 7.4.0 and Spectrum-MPI 10.3. >>> >>> I am doing the followings to solve the linear Poisson equation with SNES interface, under DMPlex: >>> (1) using DMPlex to set up the unstructured mesh; >>> (2) using DM to create vector and matrix; >>> (3) using SNES interface to solve the linear Poisson equation, with ?-snes_type ksponly?; >>> (4) using ?dm_vec_type cuda?, ?dm_mat_type aijcusparse ? to use GPU vector and matrix, as suggested in this webpage: https://www.mcs.anl.gov/petsc/features/gpus.html >>> (5) using ?use_gpu_aware_mpi? with PETSc, and using `mpirun -gpu` to enable GPU-Direct ( similar as "srun --smpiargs=?-gpu?" for Summit): https://secure.cci.rpi.edu/wiki/Slurm/#gpu-direct ; https://www.olcf.ornl.gov/wp-content/uploads/2018/11/multi-gpu-workshop.pdf >>> (6) using ?-options_left? to check and make sure all the arguments are accepted and used by PETSc. >>> (7) After problem setup, I am running the ?SNESSolve()? multiple times to solve the linear problem and observe the log file with ?-log_view" >>> >>> I noticed that if I run ?SNESSolve()? 
500 times, instead of 50 times, the ?CpuToGpu count? and/or ?GpuToCpu count? increased roughly 10 times for some of the operations: SNESSolve, MatSOR, VecMDot, VecCUDACopyTo, VecCUDACopyFrom, MatCUSPARSCopyTo. See below for a truncated log corresponding to running SNESSolve() 500 times: >>> >>> >>> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU >>> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F >>> --------------------------------------------------------------------------------------------------------------------------------------------------------------- >>> >>> --- Event Stage 0: Main Stage >>> >>> BuildTwoSided 510 1.0 4.9205e-03 1.1 0.00e+00 0.0 3.5e+01 4.0e+00 1.0e+03 0 0 0 0 0 0 0 21 0 0 0 0 0 0.00e+00 0 0.00e+00 0 >>> BuildTwoSidedF 501 1.0 1.0199e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+03 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 >>> SNESSolve 500 1.0 3.2570e+02 1.0 1.18e+10 1.0 0.0e+00 0.0e+00 8.7e+05100100 0 0100 100100 0 0100 144 202 31947 7.82e+02 63363 1.44e+03 82 >>> SNESSetUp 1 1.0 6.0082e-04 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 >>> SNESFunctionEval 500 1.0 3.9826e+01 1.0 3.60e+08 1.0 0.0e+00 0.0e+00 5.0e+02 12 3 0 0 0 12 3 0 0 0 36 13 0 0.00e+00 1000 2.48e+01 0 >>> SNESJacobianEval 500 1.0 4.8200e+01 1.0 5.97e+08 1.0 0.0e+00 0.0e+00 2.0e+03 15 5 0 0 0 15 5 0 0 0 50 0 1000 7.77e+01 500 1.24e+01 0 >>> DMPlexResidualFE 500 1.0 3.6923e+01 1.1 3.56e+08 1.0 0.0e+00 0.0e+00 0.0e+00 10 3 0 0 0 10 3 0 0 0 39 0 0 0.00e+00 500 1.24e+01 0 >>> DMPlexJacobianFE 500 1.0 4.6013e+01 1.0 5.95e+08 1.0 0.0e+00 0.0e+00 2.0e+03 14 5 0 0 0 14 5 0 0 0 52 0 1000 7.77e+01 0 0.00e+00 0 >>> MatSOR 30947 1.0 3.1254e+00 1.1 1.21e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 10 0 0 0 1 10 0 0 0 1542 0 0 0.00e+00 61863 1.41e+03 0 >>> MatAssemblyBegin 511 1.0 5.3428e+00256.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+03 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 >>> MatAssemblyEnd 511 1.0 4.3440e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.1e+01 0 0 0 0 0 0 0 0 0 0 0 0 1002 7.80e+01 0 0.00e+00 0 >>> MatCUSPARSCopyTo 1002 1.0 3.6557e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 1002 7.80e+01 0 0.00e+00 0 >>> VecMDot 29930 1.0 3.7843e+01 1.0 2.62e+09 1.0 0.0e+00 0.0e+00 6.0e+04 12 22 0 0 7 12 22 0 0 7 277 3236 29930 6.81e+02 0 0.00e+00 100 >>> VecNorm 31447 1.0 2.1164e+01 1.4 1.79e+08 1.0 0.0e+00 0.0e+00 6.3e+04 5 2 0 0 7 5 2 0 0 7 34 55 1017 2.31e+01 0 0.00e+00 100 >>> VecNormalize 30947 1.0 2.3957e+01 1.1 2.65e+08 1.0 0.0e+00 0.0e+00 6.2e+04 7 2 0 0 7 7 2 0 0 7 44 51 1017 2.31e+01 0 0.00e+00 100 >>> VecCUDACopyTo 30947 1.0 7.8866e+00 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 0 30947 7.04e+02 0 0.00e+00 0 >>> VecCUDACopyFrom 63363 1.0 1.0873e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 63363 1.44e+03 0 >>> KSPSetUp 500 1.0 2.2737e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 >>> KSPSolve 500 1.0 2.3687e+02 1.0 1.08e+10 1.0 0.0e+00 0.0e+00 8.6e+05 72 92 0 0 99 73 92 0 0 99 182 202 30947 7.04e+02 61863 1.41e+03 89 >>> KSPGMRESOrthog 29930 1.0 1.8920e+02 1.0 7.87e+09 1.0 0.0e+00 0.0e+00 6.4e+05 58 67 0 0 74 58 67 0 0 74 166 209 29930 6.81e+02 0 0.00e+00 100 >>> PCApply 30947 1.0 3.1555e+00 1.1 1.21e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 10 0 0 0 1 10 0 0 0 1527 0 0 0.00e+00 61863 1.41e+03 0 >>> >>> >>> Thanks! 
>>> Chonglin >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris at resfrac.com Thu Sep 24 12:22:31 2020 From: chris at resfrac.com (Chris Hewson) Date: Thu, 24 Sep 2020 11:22:31 -0600 Subject: [petsc-users] Tough to reproduce petsctablefind error In-Reply-To: <8952CCCF-14F2-4102-91B4-921A54689813@petsc.dev> References: <0AC37384-BC37-4A6C-857D-41CD507F84C2@petsc.dev> <8952CCCF-14F2-4102-91B4-921A54689813@petsc.dev> Message-ID: Hi Guys, Thanks for all of the prompt responses, very helpful and appreciated. By "when debugging", did you mean when configure petsc --with-debugging=1 COPTFLAGS=-O0 -g etc or when you attached a debugger? - Both, I have run with a debugger attached and detached, all compiled with the following flags "--with-debugging=1 COPTFLAGS=-O0 -g" 1) Try OpenMPI (probably won't help, but worth trying) - Worth a try for sure 2) Find which part of the simulation makes it non-deterministic. Is it the mesh partitioning (parmetis)? Then try to make it deterministic. - Good tip, it is the mesh partitioning and along the lines of a question from Barry, the matrix size is changing. I will make this deterministic and give it a try 3) Dump matrices, vectors, etc and see when it fails, you can quickly reproduce the error by reading in the intermediate data. - Also a great suggestion, will give it a try The full stack would be really useful here. I am guessing this happens on MatMult(), but I do not know. - Agreed, I am currently running it so that the full stack will be produced, but waiting for it to fail, had compiled with all available optimizations on, but downside is of course if there is a failure. As a general question, roughly what's the performance impact on petsc with -o1/-o2/-o0 as opposed to -o3? Performance impact of --with-debugging = 1? Obviously problem/machine dependant, wondering on guidance more for this than anything Is the nonzero structure of your matrices changing or is it fixed for the entire simulation? The non-zero structure is changing, although the structures are reformed when this happens and this happens thousands of time before the failure has occured. Does this particular run always crash at the same place? Similar place? Doesn't always crash? Doesn't always crash, but other similar runs have crashed in different spots, which makes it difficult to resolve. I am going to try out a few of the strategies suggested above and will let you know what comes of that. *Chris Hewson* Senior Reservoir Simulation Engineer ResFrac +1.587.575.9792 On Thu, Sep 24, 2020 at 11:05 AM Barry Smith wrote: > Chris, > > We realize how frustrating this type of problem is to deal with. > > Here is the code: > > ierr = > PetscTableCreate(aij->B->rmap->n,mat->cmap->N+1,&gid1_lid1);CHKERRQ(ierr); > for (i=0; iB->rmap->n; i++) { > for (j=0; jilen[i]; j++) { > PetscInt data,gid1 = aj[B->i[i] + j] + 1; > ierr = PetscTableFind(gid1_lid1,gid1,&data);CHKERRQ(ierr); > if (!data) { > /* one based table */ > ierr = > PetscTableAdd(gid1_lid1,gid1,++ec,INSERT_VALUES);CHKERRQ(ierr); > } > } > } > > It is simply looping over the rows of the sparse matrix putting the > columns it finds into the hash table. > > aj[B->i[i] + j] are the column entries, the number of columns in the > matrix is mat->cmap->N so the column entries should always be > less than the number of columns. The code is crashing when column entry > 24443 which is larger than the number of columns 23988. 
> > So either the aj[B->i[i] + j] + 1 are incorrect or the mat->cmap->N is > incorrect. > > 640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>>> >>>>>> > if (!mat->was_assembled && mode == MAT_FINAL_ASSEMBLY) { > ierr = MatSetUpMultiply_MPIAIJ(mat);CHKERRQ(ierr); > } > > Seems to indicate it is setting up a new multiple because it is either the > first time into the algorithm or the nonzero structure changed on some rank > requiring a new assembly process. > > Is the nonzero structure of your matrices changing or is it fixed for > the entire simulation? > > Since the code has been running for a very long time already I have to > conclude that this is not the first time through and so something has > changed in the matrix? > > I think we have to put more diagnostics into the library to provide more > information before or at the time of the error detection. > > Does this particular run always crash at the same place? Similar > place? Doesn't always crash? > > Barry > > > > > On Sep 24, 2020, at 8:46 AM, Chris Hewson wrote: > > After about a month of not having this issue pop up, it has come up again > > We have been struggling with a similar PETSc Error for awhile now, the > error message is as follows: > > [7]PETSC ERROR: PetscTableFind() line 132 in > /home/chewson/petsc-3.13.3/include/petscctable.h key 24443 is greater than > largest key allowed 23988 > > It is a particularly nasty bug as it doesn't reproduce itself when > debugging and doesn't happen all the time with the same inputs either. The > problem occurs after a long runtime of the code (12-40 hours) and we are > using a ksp solve with KSPBCGS. > > The PETSc compilation options that are used are: > > --download-metis > --download-mpich > --download-mumps > --download-parmetis > --download-ptscotch > --download-scalapack > --download-suitesparse > --prefix=/opt/anl/petsc-3.13.3 > --with-debugging=0 > --with-mpi=1 > COPTFLAGS=-O3 -march=haswell -mtune=haswell > CXXOPTFLAGS=-O3 -march=haswell -mtune=haswell > FOPTFLAGS=-O3 -march=haswell -mtune=haswell > > This is being run across 8 processes and is failing consistently on the > rank 7 process. We also use openmp outside of PETSC and the linear solve > portion of the code. The rank 0 process is always using compute, during > this the slave processes use an MPI_Wait call to wait for the collective > parts of the code to be called again. All PETSC calls are done across all > of the processes. > > We are using mpich version 3.3.2, downloaded with the petsc 3.13.3 package. > > At every PETSC call we are checking the error return from the function > collectively to ensure that no errors have been returned from PETSC. > > Some possible causes that I can think of are as follows: > 1. Memory leak causing a corruption either in our program or in petsc or > with one of the petsc objects. This seems unlikely as we have checked runs > with the option -malloc_dump for PETSc and using valgrind. > > 2. Optimization flags set for petsc compilation are causing variables that > go out of scope to be optimized out. > > 3. We are giving the wrong number of elements for a process or the value > is changing during the simulation. This seems unlikely as there is nothing > overly unique about these simulations and it's not reproducing itself. > > 4. An MPI channel or socket error causing an error in the collective > values for PETSc. > > Any input on this issue would be greatly appreciated. 
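A cheap application-side diagnostic along the lines Barry suggests above — catch an out-of-range global column index before it ever reaches the matrix — could look like the sketch below. The helper name CheckColumnIndices is made up; it assumes it is handed the same global column indices that are about to be passed to MatSetValues().

#include <petscmat.h>

static PetscErrorCode CheckColumnIndices(Mat A, PetscInt ncols, const PetscInt cols[])
{
  PetscErrorCode ierr;
  PetscInt       N, j;

  PetscFunctionBeginUser;
  ierr = MatGetSize(A, NULL, &N);CHKERRQ(ierr);   /* global number of columns */
  for (j = 0; j < ncols; j++) {
    if (cols[j] < 0 || cols[j] >= N) SETERRQ3(PETSC_COMM_SELF, PETSC_ERR_ARG_OUTOFRANGE,
      "Column index %D at position %D is outside [0,%D)", cols[j], j, N);
  }
  PetscFunctionReturn(0);
}

Called just before each MatSetValues(), this would turn the delayed PetscTableFind() failure into an immediate, located error on the offending rank.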
> > *Chris Hewson* > Senior Reservoir Simulation Engineer > ResFrac > +1.587.575.9792 > > > On Thu, Aug 13, 2020 at 4:21 PM Junchao Zhang > wrote: > >> That is a great idea. I'll figure it out. >> --Junchao Zhang >> >> >> On Thu, Aug 13, 2020 at 4:31 PM Barry Smith wrote: >> >>> >>> Junchao, >>> >>> Any way in the PETSc configure to warn that MPICH version is "bad" >>> or "untrustworthy" or even the vague "we have suspicians about this version >>> and recommend avoiding it"? A lot of time could be saved if others don't >>> deal with the same problem. >>> >>> Maybe add arrays of suspect_versions for OpenMPI, MPICH, etc and >>> always check against that list and print a boxed warning at configure time? >>> Better you could somehow generalize it and put it in package.py for use by >>> all packages, then any package can included lists of "suspect" versions. >>> (There are definitely HDF5 versions that should be avoided :-)). >>> >>> Barry >>> >>> >>> On Aug 13, 2020, at 12:14 PM, Junchao Zhang >>> wrote: >>> >>> Thanks for the update. Let's assume it is a bug in MPI :) >>> --Junchao Zhang >>> >>> >>> On Thu, Aug 13, 2020 at 11:15 AM Chris Hewson wrote: >>> >>>> Just as an update to this, I can confirm that using the mpich version >>>> (3.3.2) downloaded with the petsc download solved this issue on my end. >>>> >>>> *Chris Hewson* >>>> Senior Reservoir Simulation Engineer >>>> ResFrac >>>> +1.587.575.9792 >>>> >>>> >>>> On Thu, Jul 23, 2020 at 5:58 PM Junchao Zhang >>>> wrote: >>>> >>>>> On Mon, Jul 20, 2020 at 7:05 AM Barry Smith wrote: >>>>> >>>>>> >>>>>> Is there a comprehensive MPI test suite (perhaps from MPICH)? Is >>>>>> there any way to run this full test suite under the problematic MPI and see >>>>>> if it detects any problems? >>>>>> >>>>>> Is so, could someone add it to the FAQ in the debugging section? >>>>>> >>>>> MPICH does have a test suite. It is at the subdir test/mpi of >>>>> downloaded mpich >>>>> . It >>>>> annoyed me since it is not user-friendly. It might be helpful in catching >>>>> bugs at very small scale. But say if I want to test allreduce on 1024 ranks >>>>> on 100 doubles, I have to hack the test suite. >>>>> Anyway, the instructions are here. >>>>> >>>>> For the purpose of petsc, under test/mpi one can configure it with >>>>> $./configure CC=mpicc CXX=mpicxx FC=mpifort --enable-strictmpi >>>>> --enable-threads=funneled --enable-fortran=f77,f90 --enable-fast >>>>> --disable-spawn --disable-cxx --disable-ft-tests // It is weird I disabled >>>>> cxx but I had to set CXX! >>>>> $make -k -j8 // -k is to keep going and ignore compilation errors, >>>>> e.g., when building tests for MPICH extensions not in MPI standard, but >>>>> your MPI is OpenMPI. >>>>> $ // edit testlist, remove lines mpi_t, rma, f77, impls. Those are >>>>> sub-dirs containing tests for MPI routines Petsc does not rely on. >>>>> $ make testings or directly './runtests -tests=testlist' >>>>> >>>>> On a batch system, >>>>> $export MPITEST_BATCHDIR=`pwd`/btest // specify a batch dir, say >>>>> btest, >>>>> $./runtests -batch -mpiexec=mpirun -np=1024 -tests=testlist // Use >>>>> 1024 ranks if a test does no specify the number of processes. >>>>> $ // It copies test binaries to the batch dir and generates a >>>>> script runtests.batch there. Edit the script to fit your batch system and >>>>> then submit a job and wait for its finish. >>>>> $ cd btest && ../checktests --ignorebogus >>>>> >>>>> >>>>> PS: Fande, changing an MPI fixed your problem does not >>>>> necessarily mean the old MPI has bugs. 
It is complicated. It could be a >>>>> petsc bug. You need to provide us a code to reproduce your error. It does >>>>> not matter if the code is big. >>>>> >>>>> >>>>>> Thanks >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> On Jul 20, 2020, at 12:16 AM, Fande Kong wrote: >>>>>> >>>>>> Trace could look like this: >>>>>> >>>>>> [640]PETSC ERROR: --------------------- Error Message >>>>>> -------------------------------------------------------------- >>>>>> [640]PETSC ERROR: Argument out of range >>>>>> [640]PETSC ERROR: key 45226154 is greater than largest key allowed >>>>>> 740521 >>>>>> [640]PETSC ERROR: See >>>>>> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>>>> shooting. >>>>>> [640]PETSC ERROR: Petsc Release Version 3.13.3, unknown >>>>>> [640]PETSC ERROR: ../../griffin-opt on a arch-moose named r6i5n18 by >>>>>> wangy2 Sun Jul 19 17:14:28 2020 >>>>>> [640]PETSC ERROR: Configure options --download-hypre=1 >>>>>> --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1 >>>>>> --download-metis=1 --download-ptscotch=1 --download-parmetis=1 >>>>>> --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 >>>>>> --download-slepc=1 --with-mpi=1 --with-cxx-dialect=C++11 >>>>>> --with-fortran-bindings=0 --with-sowing=0 --with-64-bit-indices >>>>>> --download-mumps=0 >>>>>> [640]PETSC ERROR: #1 PetscTableFind() line 132 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/include/petscctable.h >>>>>> [640]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() line 33 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mmaij.c >>>>>> [640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>>> [640]PETSC ERROR: #4 MatAssemblyEnd() line 5347 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>> [640]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIXAIJ_allatonce() line >>>>>> 901 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiptap.c >>>>>> [640]PETSC ERROR: #6 MatPtAPNumeric_MPIAIJ_MPIMAIJ_allatonce() line >>>>>> 3180 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/maij/maij.c >>>>>> [640]PETSC ERROR: #7 MatProductNumeric_PtAP() line 704 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>>>> [640]PETSC ERROR: #8 MatProductNumeric() line 759 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>>>> [640]PETSC ERROR: #9 MatPtAP() line 9199 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>> [640]PETSC ERROR: #10 MatGalerkin() line 10236 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>> [640]PETSC ERROR: #11 PCSetUp_MG() line 745 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/mg/mg.c >>>>>> [640]PETSC ERROR: #12 PCSetUp_HMG() line 220 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/hmg/hmg.c >>>>>> [640]PETSC ERROR: #13 PCSetUp() line 898 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/interface/precon.c >>>>>> [640]PETSC ERROR: #14 KSPSetUp() line 376 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>> [640]PETSC ERROR: #15 KSPSolve_Private() line 633 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>> [640]PETSC ERROR: #16 KSPSolve() line 853 in 
>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>> [640]PETSC ERROR: #17 SNESSolve_NEWTONLS() line 225 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/impls/ls/ls.c >>>>>> [640]PETSC ERROR: #18 SNESSolve() line 4519 in >>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/interface/snes.c >>>>>> >>>>>> On Sun, Jul 19, 2020 at 11:13 PM Fande Kong >>>>>> wrote: >>>>>> >>>>>>> I am not entirely sure what is happening, but we encountered similar >>>>>>> issues recently. It was not reproducible. It might occur at different >>>>>>> stages, and errors could be weird other than "ctable stuff." Our code was >>>>>>> Valgrind clean since every PR in moose needs to go through rigorous >>>>>>> Valgrind checks before it reaches the devel branch. The errors happened >>>>>>> when we used mvapich. >>>>>>> >>>>>>> We changed to use HPE-MPT (a vendor stalled MPI), then everything >>>>>>> was smooth. May you try a different MPI? It is better to try a system >>>>>>> carried one. >>>>>>> >>>>>>> We did not get the bottom of this problem yet, but we at least know >>>>>>> this is kind of MPI-related. >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Fande, >>>>>>> >>>>>>> >>>>>>> On Sun, Jul 19, 2020 at 3:28 PM Chris Hewson >>>>>>> wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I am having a bug that is occurring in PETSC with the return string: >>>>>>>> >>>>>>>> [7]PETSC ERROR: PetscTableFind() line 132 in >>>>>>>> /home/chewson/petsc-3.13.2/include/petscctable.h key 7556 is greater than >>>>>>>> largest key allowed 5693 >>>>>>>> >>>>>>>> This is using petsc-3.13.2, compiled and running using mpich with >>>>>>>> -O3 and debugging turned off tuned to the haswell architecture and >>>>>>>> occurring either before or during a KSPBCGS solve/setup or during a MUMPS >>>>>>>> factorization solve (I haven't been able to replicate this issue with the >>>>>>>> same set of instructions etc.). >>>>>>>> >>>>>>>> This is a terrible way to ask a question, I know, and not very >>>>>>>> helpful from your side, but this is what I have from a user's run and can't >>>>>>>> reproduce on my end (either with the optimization compilation or with >>>>>>>> debugging turned on). This happens when the code has run for quite some >>>>>>>> time and is happening somewhat rarely. >>>>>>>> >>>>>>>> More than likely I am using a static variable (code is written in >>>>>>>> c++) that I'm not updating when the matrix size is changing or something >>>>>>>> silly like that, but any help or guidance on this would be appreciated. >>>>>>>> >>>>>>>> *Chris Hewson* >>>>>>>> Senior Reservoir Simulation Engineer >>>>>>>> ResFrac >>>>>>>> +1.587.575.9792 >>>>>>>> >>>>>>> >>>>>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zhangc20 at rpi.edu Thu Sep 24 12:25:49 2020 From: zhangc20 at rpi.edu (Zhang, Chonglin) Date: Thu, 24 Sep 2020 17:25:49 +0000 Subject: [petsc-users] Proper GPU usage in PETSc In-Reply-To: <3F063CDD-B80E-433C-80B9-74C4236328C5@petsc.dev> References: <3F063CDD-B80E-433C-80B9-74C4236328C5@petsc.dev> Message-ID: <5CBBDCE7-285D-402A-9C02-2C0CD2872A04@rpi.edu> On Sep 24, 2020, at 1:11 PM, Barry Smith > wrote: On Sep 24, 2020, at 11:48 AM, Zhang, Chonglin > wrote: Thanks Mark and Barry! A quick try of using ?-pc_type jacobi? did reduce the number of count for ?CpuToGpu? and ?GpuToCpu?, although using ?-pc_type gamg? 
(the counts did not decrease in this case) solves the problem faster (may not be of any meaning since the problem size is too small; the function ?DMPlexCreateFromCellListParallelPetsc()" is slow for large problem size preventing running larger problems, separate issue). Would this ?CpuToGpu? and ?GpuToCpu? data transfer contribute a significant amount of time for a realistic sized problem, say for example a linear problem with ~1-2 million DOFs? It depends on how often the copies are done. With GAMG once the preconditioner is built the entire linear solve can run on the GPU and Mark has some good speed ups of the liner solve using GAMG on the GPU instead of the CPU on Summit. The speedup of the entire simulation will depend on the relative cost of the finite element matrix assembly vs the linear solver time and Amdahl's law kicks in so, for example, if the finite element assembly takes 50 percent of the time even if the linear solve takes 0 time one cannot only get a speedup of two which is not much. Thanks for the detailed explanation Barry! Mark: could you share the results of GAMG on GPU vs CPU on Summit, or pointing to me where I could see them. (Actual code how you are doing this would be even better as a learning opportunity for me). Thanks! Also, is there any plan to have the SNES and DMPlex code run on GPU? Basically the finite element computation for the nonlinear function and its Jacobian need to run on the GPU, this is a big project that we've barely begun thinking about. If this is something you are interested in it would be fantastic if you could take a look at that. I see. I will think about this, discuss internally and get back to you if I can! Thanks! Chonglin Barry Thanks! Chonglin On Sep 24, 2020, at 12:17 PM, Barry Smith > wrote: MatSOR() runs on the CPU, this causes copy to CPU for each application of MatSOR() and then a copy to GPU for the next step. You can try, for example -pc_type jacobi better yet use PCGAMG if it amenable for your problem. Also the problem is way to small for a GPU. There will be copies between the GPU/CPU for each SNES iteration since the DMPLEX code does not run on GPUs. Barry On Sep 24, 2020, at 10:08 AM, Zhang, Chonglin > wrote: Dear PETSc Users, I have some questions regarding the proper GPU usage. I would like to know the proper way to: (1) solve linear equation in SNES, using GPU in PETSc; what syntax/arguments should I be using; (2) how to avoid/reduce the ?CpuToGpu count? and ?GpuToCpu count? data transfer showed in PETSc log file, when using CUDA aware MPI. Details of what I am doing now and my observations are below: System and compilers used: (1) RPI?s AiMOS computer (node wise, it is the same as Summit); (2) using GCC 7.4.0 and Spectrum-MPI 10.3. I am doing the followings to solve the linear Poisson equation with SNES interface, under DMPlex: (1) using DMPlex to set up the unstructured mesh; (2) using DM to create vector and matrix; (3) using SNES interface to solve the linear Poisson equation, with ?-snes_type ksponly?; (4) using ?dm_vec_type cuda?, ?dm_mat_type aijcusparse ? to use GPU vector and matrix, as suggested in this webpage: https://www.mcs.anl.gov/petsc/features/gpus.html (5) using ?use_gpu_aware_mpi? with PETSc, and using `mpirun -gpu` to enable GPU-Direct ( similar as "srun --smpiargs=?-gpu?" for Summit): https://secure.cci.rpi.edu/wiki/Slurm/#gpu-direct; https://www.olcf.ornl.gov/wp-content/uploads/2018/11/multi-gpu-workshop.pdf (6) using ?-options_left? 
to check and make sure all the arguments are accepted and used by PETSc. (7) After problem setup, I am running the ?SNESSolve()? multiple times to solve the linear problem and observe the log file with ?-log_view" I noticed that if I run ?SNESSolve()? 500 times, instead of 50 times, the ?CpuToGpu count? and/or ?GpuToCpu count? increased roughly 10 times for some of the operations: SNESSolve, MatSOR, VecMDot, VecCUDACopyTo, VecCUDACopyFrom, MatCUSPARSCopyTo. See below for a truncated log corresponding to running SNESSolve() 500 times: Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F --------------------------------------------------------------------------------------------------------------------------------------------------------------- --- Event Stage 0: Main Stage BuildTwoSided 510 1.0 4.9205e-03 1.1 0.00e+00 0.0 3.5e+01 4.0e+00 1.0e+03 0 0 0 0 0 0 0 21 0 0 0 0 0 0.00e+00 0 0.00e+00 0 BuildTwoSidedF 501 1.0 1.0199e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+03 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 SNESSolve 500 1.0 3.2570e+02 1.0 1.18e+10 1.0 0.0e+00 0.0e+00 8.7e+05100100 0 0100 100100 0 0100 144 202 31947 7.82e+02 63363 1.44e+03 82 SNESSetUp 1 1.0 6.0082e-04 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 SNESFunctionEval 500 1.0 3.9826e+01 1.0 3.60e+08 1.0 0.0e+00 0.0e+00 5.0e+02 12 3 0 0 0 12 3 0 0 0 36 13 0 0.00e+00 1000 2.48e+01 0 SNESJacobianEval 500 1.0 4.8200e+01 1.0 5.97e+08 1.0 0.0e+00 0.0e+00 2.0e+03 15 5 0 0 0 15 5 0 0 0 50 0 1000 7.77e+01 500 1.24e+01 0 DMPlexResidualFE 500 1.0 3.6923e+01 1.1 3.56e+08 1.0 0.0e+00 0.0e+00 0.0e+00 10 3 0 0 0 10 3 0 0 0 39 0 0 0.00e+00 500 1.24e+01 0 DMPlexJacobianFE 500 1.0 4.6013e+01 1.0 5.95e+08 1.0 0.0e+00 0.0e+00 2.0e+03 14 5 0 0 0 14 5 0 0 0 52 0 1000 7.77e+01 0 0.00e+00 0 MatSOR 30947 1.0 3.1254e+00 1.1 1.21e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 10 0 0 0 1 10 0 0 0 1542 0 0 0.00e+00 61863 1.41e+03 0 MatAssemblyBegin 511 1.0 5.3428e+00256.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+03 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 MatAssemblyEnd 511 1.0 4.3440e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.1e+01 0 0 0 0 0 0 0 0 0 0 0 0 1002 7.80e+01 0 0.00e+00 0 MatCUSPARSCopyTo 1002 1.0 3.6557e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 1002 7.80e+01 0 0.00e+00 0 VecMDot 29930 1.0 3.7843e+01 1.0 2.62e+09 1.0 0.0e+00 0.0e+00 6.0e+04 12 22 0 0 7 12 22 0 0 7 277 3236 29930 6.81e+02 0 0.00e+00 100 VecNorm 31447 1.0 2.1164e+01 1.4 1.79e+08 1.0 0.0e+00 0.0e+00 6.3e+04 5 2 0 0 7 5 2 0 0 7 34 55 1017 2.31e+01 0 0.00e+00 100 VecNormalize 30947 1.0 2.3957e+01 1.1 2.65e+08 1.0 0.0e+00 0.0e+00 6.2e+04 7 2 0 0 7 7 2 0 0 7 44 51 1017 2.31e+01 0 0.00e+00 100 VecCUDACopyTo 30947 1.0 7.8866e+00 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 0 30947 7.04e+02 0 0.00e+00 0 VecCUDACopyFrom 63363 1.0 1.0873e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 63363 1.44e+03 0 KSPSetUp 500 1.0 2.2737e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 KSPSolve 500 1.0 2.3687e+02 1.0 1.08e+10 1.0 0.0e+00 0.0e+00 8.6e+05 72 92 0 0 99 73 92 0 0 99 182 202 30947 7.04e+02 61863 1.41e+03 89 KSPGMRESOrthog 29930 1.0 1.8920e+02 1.0 7.87e+09 1.0 0.0e+00 0.0e+00 6.4e+05 58 67 0 0 74 58 67 0 0 74 166 209 29930 6.81e+02 0 0.00e+00 100 PCApply 30947 1.0 3.1555e+00 1.1 
1.21e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 10 0 0 0 1 10 0 0 0 1527 0 0 0.00e+00 61863 1.41e+03 0 Thanks! Chonglin -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Sep 24 12:35:45 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 24 Sep 2020 13:35:45 -0400 Subject: [petsc-users] Tough to reproduce petsctablefind error In-Reply-To: References: <0AC37384-BC37-4A6C-857D-41CD507F84C2@petsc.dev> <8952CCCF-14F2-4102-91B4-921A54689813@petsc.dev> Message-ID: On Thu, Sep 24, 2020 at 1:22 PM Chris Hewson wrote: > Hi Guys, > > Thanks for all of the prompt responses, very helpful and appreciated. > > By "when debugging", did you mean when configure petsc --with-debugging=1 > COPTFLAGS=-O0 -g etc or when you attached a debugger? > - Both, I have run with a debugger attached and detached, all compiled > with the following flags "--with-debugging=1 COPTFLAGS=-O0 -g" > > 1) Try OpenMPI (probably won't help, but worth trying) > - Worth a try for sure > > 2) Find which part of the simulation makes it non-deterministic. Is it the > mesh partitioning (parmetis)? Then try to make it deterministic. > - Good tip, it is the mesh partitioning and along the lines of a question > from Barry, the matrix size is changing. I will make this deterministic and > give it a try > > 3) Dump matrices, vectors, etc and see when it fails, you can quickly > reproduce the error by reading in the intermediate data. > - Also a great suggestion, will give it a try > > The full stack would be really useful here. I am guessing this happens on > MatMult(), but I do not know. > - Agreed, I am currently running it so that the full stack will be > produced, but waiting for it to fail, had compiled with all available > optimizations on, but downside is of course if there is a failure. > As a general question, roughly what's the performance impact on petsc with > -o1/-o2/-o0 as opposed to -o3? Performance impact of --with-debugging = 1? > Obviously problem/machine dependant, wondering on guidance more for this > than anything > > Is the nonzero structure of your matrices changing or is it fixed for the > entire simulation? > The non-zero structure is changing, although the structures are reformed > when this happens and this happens thousands of time before the failure has > occured. > Okay, this is the most likely spot for a bug. How are you changing the matrix? It should be impossible to put in an invalid column index when using MatSetValues() because we check all the inputs. However, I do not think we check when you just yank out the arrays. Thanks, Matt > Does this particular run always crash at the same place? Similar place? > Doesn't always crash? > Doesn't always crash, but other similar runs have crashed in different > spots, which makes it difficult to resolve. I am going to try out a few of > the strategies suggested above and will let you know what comes of that. > > *Chris Hewson* > Senior Reservoir Simulation Engineer > ResFrac > +1.587.575.9792 > > > On Thu, Sep 24, 2020 at 11:05 AM Barry Smith wrote: > >> Chris, >> >> We realize how frustrating this type of problem is to deal with. 
>> >> Here is the code: >> >> ierr = >> PetscTableCreate(aij->B->rmap->n,mat->cmap->N+1,&gid1_lid1);CHKERRQ(ierr); >> for (i=0; iB->rmap->n; i++) { >> for (j=0; jilen[i]; j++) { >> PetscInt data,gid1 = aj[B->i[i] + j] + 1; >> ierr = PetscTableFind(gid1_lid1,gid1,&data);CHKERRQ(ierr); >> if (!data) { >> /* one based table */ >> ierr = >> PetscTableAdd(gid1_lid1,gid1,++ec,INSERT_VALUES);CHKERRQ(ierr); >> } >> } >> } >> >> It is simply looping over the rows of the sparse matrix putting the >> columns it finds into the hash table. >> >> aj[B->i[i] + j] are the column entries, the number of columns in the >> matrix is mat->cmap->N so the column entries should always be >> less than the number of columns. The code is crashing when column entry >> 24443 which is larger than the number of columns 23988. >> >> So either the aj[B->i[i] + j] + 1 are incorrect or the mat->cmap->N is >> incorrect. >> >> 640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in >>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>>>> >>>>>>> >> if (!mat->was_assembled && mode == MAT_FINAL_ASSEMBLY) { >> ierr = MatSetUpMultiply_MPIAIJ(mat);CHKERRQ(ierr); >> } >> >> Seems to indicate it is setting up a new multiple because it is either >> the first time into the algorithm or the nonzero structure changed on some >> rank requiring a new assembly process. >> >> Is the nonzero structure of your matrices changing or is it fixed for >> the entire simulation? >> >> Since the code has been running for a very long time already I have to >> conclude that this is not the first time through and so something has >> changed in the matrix? >> >> I think we have to put more diagnostics into the library to provide more >> information before or at the time of the error detection. >> >> Does this particular run always crash at the same place? Similar >> place? Doesn't always crash? >> >> Barry >> >> >> >> >> On Sep 24, 2020, at 8:46 AM, Chris Hewson wrote: >> >> After about a month of not having this issue pop up, it has come up again >> >> We have been struggling with a similar PETSc Error for awhile now, the >> error message is as follows: >> >> [7]PETSC ERROR: PetscTableFind() line 132 in >> /home/chewson/petsc-3.13.3/include/petscctable.h key 24443 is greater than >> largest key allowed 23988 >> >> It is a particularly nasty bug as it doesn't reproduce itself when >> debugging and doesn't happen all the time with the same inputs either. The >> problem occurs after a long runtime of the code (12-40 hours) and we are >> using a ksp solve with KSPBCGS. >> >> The PETSc compilation options that are used are: >> >> --download-metis >> --download-mpich >> --download-mumps >> --download-parmetis >> --download-ptscotch >> --download-scalapack >> --download-suitesparse >> --prefix=/opt/anl/petsc-3.13.3 >> --with-debugging=0 >> --with-mpi=1 >> COPTFLAGS=-O3 -march=haswell -mtune=haswell >> CXXOPTFLAGS=-O3 -march=haswell -mtune=haswell >> FOPTFLAGS=-O3 -march=haswell -mtune=haswell >> >> This is being run across 8 processes and is failing consistently on the >> rank 7 process. We also use openmp outside of PETSC and the linear solve >> portion of the code. The rank 0 process is always using compute, during >> this the slave processes use an MPI_Wait call to wait for the collective >> parts of the code to be called again. All PETSC calls are done across all >> of the processes. >> >> We are using mpich version 3.3.2, downloaded with the petsc 3.13.3 >> package. 
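As an aside, suggestion (3) earlier in the thread — dump the operators when a failure is suspected and replay them offline — can be done with the binary viewer. A hedged sketch; the helper name DumpOperators and the file name passed to it are placeholders:

#include <petscmat.h>
#include <petscviewer.h>

static PetscErrorCode DumpOperators(Mat A, Vec b, const char *fname)
{
  PetscErrorCode ierr;
  PetscViewer    viewer;

  PetscFunctionBeginUser;
  ierr = PetscViewerBinaryOpen(PetscObjectComm((PetscObject)A), fname, FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
  ierr = MatView(A, viewer);CHKERRQ(ierr);   /* matrix first ...             */
  ierr = VecView(b, viewer);CHKERRQ(ierr);   /* ... then the right-hand side */
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Reloading the file later with MatCreate()/MatLoad() and VecCreate()/VecLoad() in FILE_MODE_READ, on the same number of ranks, reproduces the exact operator that triggered the error, which makes an intermittent failure much easier to chase.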
>> >> At every PETSC call we are checking the error return from the function >> collectively to ensure that no errors have been returned from PETSC. >> >> Some possible causes that I can think of are as follows: >> 1. Memory leak causing a corruption either in our program or in petsc or >> with one of the petsc objects. This seems unlikely as we have checked runs >> with the option -malloc_dump for PETSc and using valgrind. >> >> 2. Optimization flags set for petsc compilation are causing variables >> that go out of scope to be optimized out. >> >> 3. We are giving the wrong number of elements for a process or the value >> is changing during the simulation. This seems unlikely as there is nothing >> overly unique about these simulations and it's not reproducing itself. >> >> 4. An MPI channel or socket error causing an error in the collective >> values for PETSc. >> >> Any input on this issue would be greatly appreciated. >> >> *Chris Hewson* >> Senior Reservoir Simulation Engineer >> ResFrac >> +1.587.575.9792 >> >> >> On Thu, Aug 13, 2020 at 4:21 PM Junchao Zhang >> wrote: >> >>> That is a great idea. I'll figure it out. >>> --Junchao Zhang >>> >>> >>> On Thu, Aug 13, 2020 at 4:31 PM Barry Smith wrote: >>> >>>> >>>> Junchao, >>>> >>>> Any way in the PETSc configure to warn that MPICH version is "bad" >>>> or "untrustworthy" or even the vague "we have suspicians about this version >>>> and recommend avoiding it"? A lot of time could be saved if others don't >>>> deal with the same problem. >>>> >>>> Maybe add arrays of suspect_versions for OpenMPI, MPICH, etc and >>>> always check against that list and print a boxed warning at configure time? >>>> Better you could somehow generalize it and put it in package.py for use by >>>> all packages, then any package can included lists of "suspect" versions. >>>> (There are definitely HDF5 versions that should be avoided :-)). >>>> >>>> Barry >>>> >>>> >>>> On Aug 13, 2020, at 12:14 PM, Junchao Zhang >>>> wrote: >>>> >>>> Thanks for the update. Let's assume it is a bug in MPI :) >>>> --Junchao Zhang >>>> >>>> >>>> On Thu, Aug 13, 2020 at 11:15 AM Chris Hewson >>>> wrote: >>>> >>>>> Just as an update to this, I can confirm that using the mpich version >>>>> (3.3.2) downloaded with the petsc download solved this issue on my end. >>>>> >>>>> *Chris Hewson* >>>>> Senior Reservoir Simulation Engineer >>>>> ResFrac >>>>> +1.587.575.9792 >>>>> >>>>> >>>>> On Thu, Jul 23, 2020 at 5:58 PM Junchao Zhang >>>>> wrote: >>>>> >>>>>> On Mon, Jul 20, 2020 at 7:05 AM Barry Smith wrote: >>>>>> >>>>>>> >>>>>>> Is there a comprehensive MPI test suite (perhaps from MPICH)? >>>>>>> Is there any way to run this full test suite under the problematic MPI and >>>>>>> see if it detects any problems? >>>>>>> >>>>>>> Is so, could someone add it to the FAQ in the debugging section? >>>>>>> >>>>>> MPICH does have a test suite. It is at the subdir test/mpi of >>>>>> downloaded mpich >>>>>> . >>>>>> It annoyed me since it is not user-friendly. It might be helpful in >>>>>> catching bugs at very small scale. But say if I want to test allreduce on >>>>>> 1024 ranks on 100 doubles, I have to hack the test suite. >>>>>> Anyway, the instructions are here. >>>>>> >>>>>> For the purpose of petsc, under test/mpi one can configure it with >>>>>> $./configure CC=mpicc CXX=mpicxx FC=mpifort --enable-strictmpi >>>>>> --enable-threads=funneled --enable-fortran=f77,f90 --enable-fast >>>>>> --disable-spawn --disable-cxx --disable-ft-tests // It is weird I disabled >>>>>> cxx but I had to set CXX! 
>>>>>> $make -k -j8 // -k is to keep going and ignore compilation errors, >>>>>> e.g., when building tests for MPICH extensions not in MPI standard, but >>>>>> your MPI is OpenMPI. >>>>>> $ // edit testlist, remove lines mpi_t, rma, f77, impls. Those are >>>>>> sub-dirs containing tests for MPI routines Petsc does not rely on. >>>>>> $ make testings or directly './runtests -tests=testlist' >>>>>> >>>>>> On a batch system, >>>>>> $export MPITEST_BATCHDIR=`pwd`/btest // specify a batch dir, >>>>>> say btest, >>>>>> $./runtests -batch -mpiexec=mpirun -np=1024 -tests=testlist // Use >>>>>> 1024 ranks if a test does no specify the number of processes. >>>>>> $ // It copies test binaries to the batch dir and generates a >>>>>> script runtests.batch there. Edit the script to fit your batch system and >>>>>> then submit a job and wait for its finish. >>>>>> $ cd btest && ../checktests --ignorebogus >>>>>> >>>>>> >>>>>> PS: Fande, changing an MPI fixed your problem does not >>>>>> necessarily mean the old MPI has bugs. It is complicated. It could be a >>>>>> petsc bug. You need to provide us a code to reproduce your error. It does >>>>>> not matter if the code is big. >>>>>> >>>>>> >>>>>>> Thanks >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> >>>>>>> On Jul 20, 2020, at 12:16 AM, Fande Kong >>>>>>> wrote: >>>>>>> >>>>>>> Trace could look like this: >>>>>>> >>>>>>> [640]PETSC ERROR: --------------------- Error Message >>>>>>> -------------------------------------------------------------- >>>>>>> [640]PETSC ERROR: Argument out of range >>>>>>> [640]PETSC ERROR: key 45226154 is greater than largest key allowed >>>>>>> 740521 >>>>>>> [640]PETSC ERROR: See >>>>>>> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>>>>> shooting. >>>>>>> [640]PETSC ERROR: Petsc Release Version 3.13.3, unknown >>>>>>> [640]PETSC ERROR: ../../griffin-opt on a arch-moose named r6i5n18 by >>>>>>> wangy2 Sun Jul 19 17:14:28 2020 >>>>>>> [640]PETSC ERROR: Configure options --download-hypre=1 >>>>>>> --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1 >>>>>>> --download-metis=1 --download-ptscotch=1 --download-parmetis=1 >>>>>>> --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 >>>>>>> --download-slepc=1 --with-mpi=1 --with-cxx-dialect=C++11 >>>>>>> --with-fortran-bindings=0 --with-sowing=0 --with-64-bit-indices >>>>>>> --download-mumps=0 >>>>>>> [640]PETSC ERROR: #1 PetscTableFind() line 132 in >>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/include/petscctable.h >>>>>>> [640]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() line 33 in >>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mmaij.c >>>>>>> [640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in >>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>>>> [640]PETSC ERROR: #4 MatAssemblyEnd() line 5347 in >>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>>> [640]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIXAIJ_allatonce() line >>>>>>> 901 in >>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiptap.c >>>>>>> [640]PETSC ERROR: #6 MatPtAPNumeric_MPIAIJ_MPIMAIJ_allatonce() line >>>>>>> 3180 in >>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/maij/maij.c >>>>>>> [640]PETSC ERROR: #7 MatProductNumeric_PtAP() line 704 in >>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>>>>> [640]PETSC ERROR: #8 MatProductNumeric() line 759 in >>>>>>> 
/home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>>>>> [640]PETSC ERROR: #9 MatPtAP() line 9199 in >>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>>> [640]PETSC ERROR: #10 MatGalerkin() line 10236 in >>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>>> [640]PETSC ERROR: #11 PCSetUp_MG() line 745 in >>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/mg/mg.c >>>>>>> [640]PETSC ERROR: #12 PCSetUp_HMG() line 220 in >>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/hmg/hmg.c >>>>>>> [640]PETSC ERROR: #13 PCSetUp() line 898 in >>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/interface/precon.c >>>>>>> [640]PETSC ERROR: #14 KSPSetUp() line 376 in >>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>> [640]PETSC ERROR: #15 KSPSolve_Private() line 633 in >>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>> [640]PETSC ERROR: #16 KSPSolve() line 853 in >>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>> [640]PETSC ERROR: #17 SNESSolve_NEWTONLS() line 225 in >>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/impls/ls/ls.c >>>>>>> [640]PETSC ERROR: #18 SNESSolve() line 4519 in >>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/interface/snes.c >>>>>>> >>>>>>> On Sun, Jul 19, 2020 at 11:13 PM Fande Kong >>>>>>> wrote: >>>>>>> >>>>>>>> I am not entirely sure what is happening, but we encountered >>>>>>>> similar issues recently. It was not reproducible. It might occur at >>>>>>>> different stages, and errors could be weird other than "ctable stuff." Our >>>>>>>> code was Valgrind clean since every PR in moose needs to go through >>>>>>>> rigorous Valgrind checks before it reaches the devel branch. The errors >>>>>>>> happened when we used mvapich. >>>>>>>> >>>>>>>> We changed to use HPE-MPT (a vendor stalled MPI), then everything >>>>>>>> was smooth. May you try a different MPI? It is better to try a system >>>>>>>> carried one. >>>>>>>> >>>>>>>> We did not get the bottom of this problem yet, but we at least know >>>>>>>> this is kind of MPI-related. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Fande, >>>>>>>> >>>>>>>> >>>>>>>> On Sun, Jul 19, 2020 at 3:28 PM Chris Hewson >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I am having a bug that is occurring in PETSC with the return >>>>>>>>> string: >>>>>>>>> >>>>>>>>> [7]PETSC ERROR: PetscTableFind() line 132 in >>>>>>>>> /home/chewson/petsc-3.13.2/include/petscctable.h key 7556 is greater than >>>>>>>>> largest key allowed 5693 >>>>>>>>> >>>>>>>>> This is using petsc-3.13.2, compiled and running using mpich with >>>>>>>>> -O3 and debugging turned off tuned to the haswell architecture and >>>>>>>>> occurring either before or during a KSPBCGS solve/setup or during a MUMPS >>>>>>>>> factorization solve (I haven't been able to replicate this issue with the >>>>>>>>> same set of instructions etc.). >>>>>>>>> >>>>>>>>> This is a terrible way to ask a question, I know, and not very >>>>>>>>> helpful from your side, but this is what I have from a user's run and can't >>>>>>>>> reproduce on my end (either with the optimization compilation or with >>>>>>>>> debugging turned on). This happens when the code has run for quite some >>>>>>>>> time and is happening somewhat rarely. 
>>>>>>>>> >>>>>>>>> More than likely I am using a static variable (code is written in >>>>>>>>> c++) that I'm not updating when the matrix size is changing or something >>>>>>>>> silly like that, but any help or guidance on this would be appreciated. >>>>>>>>> >>>>>>>>> *Chris Hewson* >>>>>>>>> Senior Reservoir Simulation Engineer >>>>>>>>> ResFrac >>>>>>>>> +1.587.575.9792 >>>>>>>>> >>>>>>>> >>>>>>> >>>> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Sep 24 12:37:41 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 24 Sep 2020 13:37:41 -0400 Subject: [petsc-users] Proper GPU usage in PETSc In-Reply-To: References: Message-ID: On Thu, Sep 24, 2020 at 12:48 PM Zhang, Chonglin wrote: > Thanks Mark and Barry! > > A quick try of using ?-pc_type jacobi? did reduce the number of count for > ?CpuToGpu? and ?GpuToCpu?, although using ?-pc_type gamg? (the counts did > not decrease in this case) solves the problem faster (may not be of any > meaning since the problem size is too small; the function > ?DMPlexCreateFromCellListParallelPetsc()" is slow for large problem size > preventing running larger problems, separate issue). > It sounds like something is wrong then, or I do not understand what you mean by slow. Thanks, Matt > Would this ?CpuToGpu? and ?GpuToCpu? data transfer contribute a > significant amount of time for a realistic sized problem, say for example a > linear problem with ~1-2 million DOFs? > > Also, is there any plan to have the SNES and DMPlex code run on GPU? > > Thanks! > Chonglin > > On Sep 24, 2020, at 12:17 PM, Barry Smith wrote: > > > MatSOR() runs on the CPU, this causes copy to CPU for each application > of MatSOR() and then a copy to GPU for the next step. > > You can try, for example -pc_type jacobi better yet use PCGAMG if it > amenable for your problem. > > Also the problem is way to small for a GPU. > > There will be copies between the GPU/CPU for each SNES iteration since > the DMPLEX code does not run on GPUs. > > Barry > > > > On Sep 24, 2020, at 10:08 AM, Zhang, Chonglin wrote: > > Dear PETSc Users, > > I have some questions regarding the proper GPU usage. I would like to know > the proper way to: > (1) solve linear equation in SNES, using GPU in PETSc; what > syntax/arguments should I be using; > (2) how to avoid/reduce the ?CpuToGpu count? and ?GpuToCpu count? data > transfer showed in PETSc log file, when using CUDA aware MPI. > > > Details of what I am doing now and my observations are below: > > System and compilers used: > (1) RPI?s AiMOS computer (node wise, it is the same as Summit); > (2) using GCC 7.4.0 and Spectrum-MPI 10.3. > > I am doing the followings to solve the linear Poisson equation with SNES > interface, under DMPlex: > (1) using DMPlex to set up the unstructured mesh; > (2) using DM to create vector and matrix; > (3) using SNES interface to solve the linear Poisson equation, with > ?-snes_type ksponly?; > (4) using ?dm_vec_type cuda?, ?dm_mat_type aijcusparse ? to use GPU vector > and matrix, as suggested in this webpage: > https://www.mcs.anl.gov/petsc/features/gpus.html > (5) using ?use_gpu_aware_mpi? with PETSc, and using `mpirun -gpu` to > enable GPU-Direct ( similar as "srun --smpiargs=?-gpu?" 
for Summit): > https://secure.cci.rpi.edu/wiki/Slurm/#gpu-direct; > https://www.olcf.ornl.gov/wp-content/uploads/2018/11/multi-gpu-workshop.pdf > (6) using ?-options_left? to check and make sure all the arguments are > accepted and used by PETSc. > (7) After problem setup, I am running the ?SNESSolve()? multiple times to > solve the linear problem and observe the log file with ?-log_view" > > I noticed that if I run ?SNESSolve()? 500 times, instead of 50 times, the > ?CpuToGpu count? and/or ?GpuToCpu count? increased roughly 10 times for > some of the operations: SNESSolve, MatSOR, VecMDot, VecCUDACopyTo, > VecCUDACopyFrom, MatCUSPARSCopyTo. See below for a truncated log > corresponding to running SNESSolve() 500 times: > > > Event Count Time (sec) Flop > --- Global --- --- Stage ---- Total GPU - CpuToGpu - - > GpuToCpu - GPU > Max Ratio Max Ratio Max Ratio Mess AvgLen > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count > Size %F > > --------------------------------------------------------------------------------------------------------------------------------------------------------------- > > --- Event Stage 0: Main Stage > > BuildTwoSided 510 1.0 4.9205e-03 1.1 0.00e+00 0.0 3.5e+01 4.0e+00 > 1.0e+03 0 0 0 0 0 0 0 21 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0 > BuildTwoSidedF 501 1.0 1.0199e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.0e+03 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0 > SNESSolve 500 1.0 3.2570e+02 1.0 1.18e+10 1.0 0.0e+00 0.0e+00 > 8.7e+05100100 0 0100 100100 0 0100 144 202 31947 7.82e+02 63363 > 1.44e+03 82 > SNESSetUp 1 1.0 6.0082e-04 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0 > SNESFunctionEval 500 1.0 3.9826e+01 1.0 3.60e+08 1.0 0.0e+00 0.0e+00 > 5.0e+02 12 3 0 0 0 12 3 0 0 0 36 13 0 0.00e+00 1000 > 2.48e+01 0 > SNESJacobianEval 500 1.0 4.8200e+01 1.0 5.97e+08 1.0 0.0e+00 0.0e+00 > 2.0e+03 15 5 0 0 0 15 5 0 0 0 50 0 1000 7.77e+01 500 > 1.24e+01 0 > DMPlexResidualFE 500 1.0 3.6923e+01 1.1 3.56e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 10 3 0 0 0 10 3 0 0 0 39 0 0 0.00e+00 500 > 1.24e+01 0 > DMPlexJacobianFE 500 1.0 4.6013e+01 1.0 5.95e+08 1.0 0.0e+00 0.0e+00 > 2.0e+03 14 5 0 0 0 14 5 0 0 0 52 0 1000 7.77e+01 0 > 0.00e+00 0 > MatSOR 30947 1.0 3.1254e+00 1.1 1.21e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 10 0 0 0 1 10 0 0 0 1542 0 0 0.00e+00 61863 > 1.41e+03 0 > MatAssemblyBegin 511 1.0 5.3428e+00256.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.0e+03 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0 > MatAssemblyEnd 511 1.0 4.3440e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.1e+01 0 0 0 0 0 0 0 0 0 0 0 0 1002 7.80e+01 0 > 0.00e+00 0 > MatCUSPARSCopyTo 1002 1.0 3.6557e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 1002 7.80e+01 0 > 0.00e+00 0 > VecMDot 29930 1.0 3.7843e+01 1.0 2.62e+09 1.0 0.0e+00 0.0e+00 > 6.0e+04 12 22 0 0 7 12 22 0 0 7 277 3236 29930 6.81e+02 0 > 0.00e+00 100 > VecNorm 31447 1.0 2.1164e+01 1.4 1.79e+08 1.0 0.0e+00 0.0e+00 > 6.3e+04 5 2 0 0 7 5 2 0 0 7 34 55 1017 2.31e+01 0 > 0.00e+00 100 > VecNormalize 30947 1.0 2.3957e+01 1.1 2.65e+08 1.0 0.0e+00 0.0e+00 > 6.2e+04 7 2 0 0 7 7 2 0 0 7 44 51 1017 2.31e+01 0 > 0.00e+00 100 > VecCUDACopyTo 30947 1.0 7.8866e+00 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 0 30947 7.04e+02 0 > 0.00e+00 0 > VecCUDACopyFrom 63363 1.0 1.0873e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 63363 > 1.44e+03 0 > KSPSetUp 500 1.0 2.2737e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 5.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 
> 0.00e+00 0 > KSPSolve 500 1.0 2.3687e+02 1.0 1.08e+10 1.0 0.0e+00 0.0e+00 > 8.6e+05 72 92 0 0 99 73 92 0 0 99 182 202 30947 7.04e+02 61863 > 1.41e+03 89 > KSPGMRESOrthog 29930 1.0 1.8920e+02 1.0 7.87e+09 1.0 0.0e+00 0.0e+00 > 6.4e+05 58 67 0 0 74 58 67 0 0 74 166 209 29930 6.81e+02 0 > 0.00e+00 100 > PCApply 30947 1.0 3.1555e+00 1.1 1.21e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 10 0 0 0 1 10 0 0 0 1527 0 0 0.00e+00 61863 > 1.41e+03 0 > > > Thanks! > Chonglin > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Sep 24 13:04:27 2020 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 24 Sep 2020 14:04:27 -0400 Subject: [petsc-users] Proper GPU usage in PETSc In-Reply-To: References: Message-ID: On Thu, Sep 24, 2020 at 1:38 PM Matthew Knepley wrote: > On Thu, Sep 24, 2020 at 12:48 PM Zhang, Chonglin wrote: > >> Thanks Mark and Barry! >> >> A quick try of using ?-pc_type jacobi? did reduce the number of count for >> ?CpuToGpu? and ?GpuToCpu?, although using ?-pc_type gamg? (the counts did >> not decrease in this case) solves the problem faster (may not be of any >> meaning since the problem size is too small; the function >> ?DMPlexCreateFromCellListParallelPetsc()" is slow for large problem size >> preventing running larger problems, separate issue). >> > > It sounds like something is wrong then, or I do not understand what you > mean by slow. > sor may be the default so you need to set the -mg_level_ksp[pc]_type chebyshev[jacobi]. chebyshev is the ksp default. > > Thanks, > > Matt > > >> Would this ?CpuToGpu? and ?GpuToCpu? data transfer contribute a >> significant amount of time for a realistic sized problem, say for example a >> linear problem with ~1-2 million DOFs? >> >> Also, is there any plan to have the SNES and DMPlex code run on GPU? >> >> Thanks! >> Chonglin >> >> On Sep 24, 2020, at 12:17 PM, Barry Smith wrote: >> >> >> MatSOR() runs on the CPU, this causes copy to CPU for each application >> of MatSOR() and then a copy to GPU for the next step. >> >> You can try, for example -pc_type jacobi better yet use PCGAMG if it >> amenable for your problem. >> >> Also the problem is way to small for a GPU. >> >> There will be copies between the GPU/CPU for each SNES iteration since >> the DMPLEX code does not run on GPUs. >> >> Barry >> >> >> >> On Sep 24, 2020, at 10:08 AM, Zhang, Chonglin wrote: >> >> Dear PETSc Users, >> >> I have some questions regarding the proper GPU usage. I would like to >> know the proper way to: >> (1) solve linear equation in SNES, using GPU in PETSc; what >> syntax/arguments should I be using; >> (2) how to avoid/reduce the ?CpuToGpu count? and ?GpuToCpu count? data >> transfer showed in PETSc log file, when using CUDA aware MPI. >> >> >> Details of what I am doing now and my observations are below: >> >> System and compilers used: >> (1) RPI?s AiMOS computer (node wise, it is the same as Summit); >> (2) using GCC 7.4.0 and Spectrum-MPI 10.3. 
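Putting Barry's and Mark's suggestions together, one illustrative combination of options that avoids the CPU-only MatSOR smoother and keeps the Krylov solve on the GPU is the one used in the full runs later in this thread; written out here only as a sketch, not a tuned recommendation:

  -dm_vec_type cuda -dm_mat_type aijcusparse \
  -pc_type gamg -mg_levels_ksp_type chebyshev -mg_levels_pc_type jacobi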
>> >> I am doing the followings to solve the linear Poisson equation with SNES >> interface, under DMPlex: >> (1) using DMPlex to set up the unstructured mesh; >> (2) using DM to create vector and matrix; >> (3) using SNES interface to solve the linear Poisson equation, with >> ?-snes_type ksponly?; >> (4) using ?dm_vec_type cuda?, ?dm_mat_type aijcusparse ? to use GPU >> vector and matrix, as suggested in this webpage: >> https://www.mcs.anl.gov/petsc/features/gpus.html >> (5) using ?use_gpu_aware_mpi? with PETSc, and using `mpirun -gpu` to >> enable GPU-Direct ( similar as "srun --smpiargs=?-gpu?" for Summit): >> https://secure.cci.rpi.edu/wiki/Slurm/#gpu-direct; >> https://www.olcf.ornl.gov/wp-content/uploads/2018/11/multi-gpu-workshop.pdf >> (6) using ?-options_left? to check and make sure all the arguments are >> accepted and used by PETSc. >> (7) After problem setup, I am running the ?SNESSolve()? multiple times to >> solve the linear problem and observe the log file with ?-log_view" >> >> I noticed that if I run ?SNESSolve()? 500 times, instead of 50 times, the >> ?CpuToGpu count? and/or ?GpuToCpu count? increased roughly 10 times for >> some of the operations: SNESSolve, MatSOR, VecMDot, VecCUDACopyTo, >> VecCUDACopyFrom, MatCUSPARSCopyTo. See below for a truncated log >> corresponding to running SNESSolve() 500 times: >> >> >> Event Count Time (sec) Flop >> --- Global --- --- Stage ---- Total GPU - CpuToGpu - - >> GpuToCpu - GPU >> Max Ratio Max Ratio Max Ratio Mess AvgLen >> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count >> Size %F >> >> --------------------------------------------------------------------------------------------------------------------------------------------------------------- >> >> --- Event Stage 0: Main Stage >> >> BuildTwoSided 510 1.0 4.9205e-03 1.1 0.00e+00 0.0 3.5e+01 4.0e+00 >> 1.0e+03 0 0 0 0 0 0 0 21 0 0 0 0 0 0.00e+00 0 >> 0.00e+00 0 >> BuildTwoSidedF 501 1.0 1.0199e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.0e+03 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 >> 0.00e+00 0 >> SNESSolve 500 1.0 3.2570e+02 1.0 1.18e+10 1.0 0.0e+00 0.0e+00 >> 8.7e+05100100 0 0100 100100 0 0100 144 202 31947 7.82e+02 63363 >> 1.44e+03 82 >> SNESSetUp 1 1.0 6.0082e-04 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 >> 0.00e+00 0 >> SNESFunctionEval 500 1.0 3.9826e+01 1.0 3.60e+08 1.0 0.0e+00 0.0e+00 >> 5.0e+02 12 3 0 0 0 12 3 0 0 0 36 13 0 0.00e+00 1000 >> 2.48e+01 0 >> SNESJacobianEval 500 1.0 4.8200e+01 1.0 5.97e+08 1.0 0.0e+00 0.0e+00 >> 2.0e+03 15 5 0 0 0 15 5 0 0 0 50 0 1000 7.77e+01 500 >> 1.24e+01 0 >> DMPlexResidualFE 500 1.0 3.6923e+01 1.1 3.56e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 10 3 0 0 0 10 3 0 0 0 39 0 0 0.00e+00 500 >> 1.24e+01 0 >> DMPlexJacobianFE 500 1.0 4.6013e+01 1.0 5.95e+08 1.0 0.0e+00 0.0e+00 >> 2.0e+03 14 5 0 0 0 14 5 0 0 0 52 0 1000 7.77e+01 0 >> 0.00e+00 0 >> MatSOR 30947 1.0 3.1254e+00 1.1 1.21e+09 1.0 0.0e+00 0.0e+00 >> 0.0e+00 1 10 0 0 0 1 10 0 0 0 1542 0 0 0.00e+00 61863 >> 1.41e+03 0 >> MatAssemblyBegin 511 1.0 5.3428e+00256.4 0.00e+00 0.0 0.0e+00 0.0e+00 >> 2.0e+03 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 >> 0.00e+00 0 >> MatAssemblyEnd 511 1.0 4.3440e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 >> 2.1e+01 0 0 0 0 0 0 0 0 0 0 0 0 1002 7.80e+01 0 >> 0.00e+00 0 >> MatCUSPARSCopyTo 1002 1.0 3.6557e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 1002 7.80e+01 0 >> 0.00e+00 0 >> VecMDot 29930 1.0 3.7843e+01 1.0 2.62e+09 1.0 0.0e+00 0.0e+00 >> 6.0e+04 12 22 0 0 7 12 22 0 0 7 277 3236 29930 
6.81e+02 0 >> 0.00e+00 100 >> VecNorm 31447 1.0 2.1164e+01 1.4 1.79e+08 1.0 0.0e+00 0.0e+00 >> 6.3e+04 5 2 0 0 7 5 2 0 0 7 34 55 1017 2.31e+01 0 >> 0.00e+00 100 >> VecNormalize 30947 1.0 2.3957e+01 1.1 2.65e+08 1.0 0.0e+00 0.0e+00 >> 6.2e+04 7 2 0 0 7 7 2 0 0 7 44 51 1017 2.31e+01 0 >> 0.00e+00 100 >> VecCUDACopyTo 30947 1.0 7.8866e+00 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 0 30947 7.04e+02 0 >> 0.00e+00 0 >> VecCUDACopyFrom 63363 1.0 1.0873e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 63363 >> 1.44e+03 0 >> KSPSetUp 500 1.0 2.2737e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 >> 5.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 >> 0.00e+00 0 >> KSPSolve 500 1.0 2.3687e+02 1.0 1.08e+10 1.0 0.0e+00 0.0e+00 >> 8.6e+05 72 92 0 0 99 73 92 0 0 99 182 202 30947 7.04e+02 61863 >> 1.41e+03 89 >> KSPGMRESOrthog 29930 1.0 1.8920e+02 1.0 7.87e+09 1.0 0.0e+00 0.0e+00 >> 6.4e+05 58 67 0 0 74 58 67 0 0 74 166 209 29930 6.81e+02 0 >> 0.00e+00 100 >> PCApply 30947 1.0 3.1555e+00 1.1 1.21e+09 1.0 0.0e+00 0.0e+00 >> 0.0e+00 1 10 0 0 0 1 10 0 0 0 1527 0 0 0.00e+00 61863 >> 1.41e+03 0 >> >> >> Thanks! >> Chonglin >> >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Sep 24 13:06:29 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 24 Sep 2020 14:06:29 -0400 Subject: [petsc-users] Proper GPU usage in PETSc In-Reply-To: References: Message-ID: On Thu, Sep 24, 2020 at 2:04 PM Mark Adams wrote: > On Thu, Sep 24, 2020 at 1:38 PM Matthew Knepley wrote: > >> On Thu, Sep 24, 2020 at 12:48 PM Zhang, Chonglin >> wrote: >> >>> Thanks Mark and Barry! >>> >>> A quick try of using ?-pc_type jacobi? did reduce the number of count >>> for ?CpuToGpu? and ?GpuToCpu?, although using ?-pc_type gamg? (the counts >>> did not decrease in this case) solves the problem faster (may not be of any >>> meaning since the problem size is too small; the function >>> ?DMPlexCreateFromCellListParallelPetsc()" is slow for large problem size >>> preventing running larger problems, separate issue). >>> >> >> It sounds like something is wrong then, or I do not understand what you >> mean by slow. >> > > sor may be the default so you need to set the -mg_level_ksp[pc]_type > chebyshev[jacobi]. chebyshev is the ksp default. > I meant for the mesh creation. Thanks, Matt > Thanks, >> >> Matt >> >> >>> Would this ?CpuToGpu? and ?GpuToCpu? data transfer contribute a >>> significant amount of time for a realistic sized problem, say for example a >>> linear problem with ~1-2 million DOFs? >>> >>> Also, is there any plan to have the SNES and DMPlex code run on GPU? >>> >>> Thanks! >>> Chonglin >>> >>> On Sep 24, 2020, at 12:17 PM, Barry Smith wrote: >>> >>> >>> MatSOR() runs on the CPU, this causes copy to CPU for each >>> application of MatSOR() and then a copy to GPU for the next step. >>> >>> You can try, for example -pc_type jacobi better yet use PCGAMG if it >>> amenable for your problem. >>> >>> Also the problem is way to small for a GPU. >>> >>> There will be copies between the GPU/CPU for each SNES iteration since >>> the DMPLEX code does not run on GPUs. 
>>> >>> Barry >>> >>> >>> >>> On Sep 24, 2020, at 10:08 AM, Zhang, Chonglin wrote: >>> >>> Dear PETSc Users, >>> >>> I have some questions regarding the proper GPU usage. I would like to >>> know the proper way to: >>> (1) solve linear equation in SNES, using GPU in PETSc; what >>> syntax/arguments should I be using; >>> (2) how to avoid/reduce the ?CpuToGpu count? and ?GpuToCpu count? data >>> transfer showed in PETSc log file, when using CUDA aware MPI. >>> >>> >>> Details of what I am doing now and my observations are below: >>> >>> System and compilers used: >>> (1) RPI?s AiMOS computer (node wise, it is the same as Summit); >>> (2) using GCC 7.4.0 and Spectrum-MPI 10.3. >>> >>> I am doing the followings to solve the linear Poisson equation with SNES >>> interface, under DMPlex: >>> (1) using DMPlex to set up the unstructured mesh; >>> (2) using DM to create vector and matrix; >>> (3) using SNES interface to solve the linear Poisson equation, with >>> ?-snes_type ksponly?; >>> (4) using ?dm_vec_type cuda?, ?dm_mat_type aijcusparse ? to use GPU >>> vector and matrix, as suggested in this webpage: >>> https://www.mcs.anl.gov/petsc/features/gpus.html >>> (5) using ?use_gpu_aware_mpi? with PETSc, and using `mpirun -gpu` to >>> enable GPU-Direct ( similar as "srun --smpiargs=?-gpu?" for Summit): >>> https://secure.cci.rpi.edu/wiki/Slurm/#gpu-direct; >>> https://www.olcf.ornl.gov/wp-content/uploads/2018/11/multi-gpu-workshop.pdf >>> (6) using ?-options_left? to check and make sure all the arguments are >>> accepted and used by PETSc. >>> (7) After problem setup, I am running the ?SNESSolve()? multiple times >>> to solve the linear problem and observe the log file with ?-log_view" >>> >>> I noticed that if I run ?SNESSolve()? 500 times, instead of 50 times, >>> the ?CpuToGpu count? and/or ?GpuToCpu count? increased roughly 10 times for >>> some of the operations: SNESSolve, MatSOR, VecMDot, VecCUDACopyTo, >>> VecCUDACopyFrom, MatCUSPARSCopyTo. 
See below for a truncated log >>> corresponding to running SNESSolve() 500 times: >>> >>> >>> Event Count Time (sec) Flop >>> --- Global --- --- Stage ---- Total GPU - CpuToGpu - - >>> GpuToCpu - GPU >>> Max Ratio Max Ratio Max Ratio Mess AvgLen >>> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count >>> Size %F >>> >>> --------------------------------------------------------------------------------------------------------------------------------------------------------------- >>> >>> --- Event Stage 0: Main Stage >>> >>> BuildTwoSided 510 1.0 4.9205e-03 1.1 0.00e+00 0.0 3.5e+01 4.0e+00 >>> 1.0e+03 0 0 0 0 0 0 0 21 0 0 0 0 0 0.00e+00 0 >>> 0.00e+00 0 >>> BuildTwoSidedF 501 1.0 1.0199e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 1.0e+03 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 >>> 0.00e+00 0 >>> SNESSolve 500 1.0 3.2570e+02 1.0 1.18e+10 1.0 0.0e+00 0.0e+00 >>> 8.7e+05100100 0 0100 100100 0 0100 144 202 31947 7.82e+02 63363 >>> 1.44e+03 82 >>> SNESSetUp 1 1.0 6.0082e-04 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 >>> 0.00e+00 0 >>> SNESFunctionEval 500 1.0 3.9826e+01 1.0 3.60e+08 1.0 0.0e+00 0.0e+00 >>> 5.0e+02 12 3 0 0 0 12 3 0 0 0 36 13 0 0.00e+00 1000 >>> 2.48e+01 0 >>> SNESJacobianEval 500 1.0 4.8200e+01 1.0 5.97e+08 1.0 0.0e+00 0.0e+00 >>> 2.0e+03 15 5 0 0 0 15 5 0 0 0 50 0 1000 7.77e+01 500 >>> 1.24e+01 0 >>> DMPlexResidualFE 500 1.0 3.6923e+01 1.1 3.56e+08 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 10 3 0 0 0 10 3 0 0 0 39 0 0 0.00e+00 500 >>> 1.24e+01 0 >>> DMPlexJacobianFE 500 1.0 4.6013e+01 1.0 5.95e+08 1.0 0.0e+00 0.0e+00 >>> 2.0e+03 14 5 0 0 0 14 5 0 0 0 52 0 1000 7.77e+01 0 >>> 0.00e+00 0 >>> MatSOR 30947 1.0 3.1254e+00 1.1 1.21e+09 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 1 10 0 0 0 1 10 0 0 0 1542 0 0 0.00e+00 61863 >>> 1.41e+03 0 >>> MatAssemblyBegin 511 1.0 5.3428e+00256.4 0.00e+00 0.0 0.0e+00 >>> 0.0e+00 2.0e+03 1 0 0 0 0 1 0 0 0 0 0 0 0 >>> 0.00e+00 0 0.00e+00 0 >>> MatAssemblyEnd 511 1.0 4.3440e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 2.1e+01 0 0 0 0 0 0 0 0 0 0 0 0 1002 7.80e+01 0 >>> 0.00e+00 0 >>> MatCUSPARSCopyTo 1002 1.0 3.6557e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 1002 7.80e+01 0 >>> 0.00e+00 0 >>> VecMDot 29930 1.0 3.7843e+01 1.0 2.62e+09 1.0 0.0e+00 0.0e+00 >>> 6.0e+04 12 22 0 0 7 12 22 0 0 7 277 3236 29930 6.81e+02 0 >>> 0.00e+00 100 >>> VecNorm 31447 1.0 2.1164e+01 1.4 1.79e+08 1.0 0.0e+00 0.0e+00 >>> 6.3e+04 5 2 0 0 7 5 2 0 0 7 34 55 1017 2.31e+01 0 >>> 0.00e+00 100 >>> VecNormalize 30947 1.0 2.3957e+01 1.1 2.65e+08 1.0 0.0e+00 0.0e+00 >>> 6.2e+04 7 2 0 0 7 7 2 0 0 7 44 51 1017 2.31e+01 0 >>> 0.00e+00 100 >>> VecCUDACopyTo 30947 1.0 7.8866e+00 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 0 30947 7.04e+02 0 >>> 0.00e+00 0 >>> VecCUDACopyFrom 63363 1.0 1.0873e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 63363 >>> 1.44e+03 0 >>> KSPSetUp 500 1.0 2.2737e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 5.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 >>> 0.00e+00 0 >>> KSPSolve 500 1.0 2.3687e+02 1.0 1.08e+10 1.0 0.0e+00 0.0e+00 >>> 8.6e+05 72 92 0 0 99 73 92 0 0 99 182 202 30947 7.04e+02 61863 >>> 1.41e+03 89 >>> KSPGMRESOrthog 29930 1.0 1.8920e+02 1.0 7.87e+09 1.0 0.0e+00 0.0e+00 >>> 6.4e+05 58 67 0 0 74 58 67 0 0 74 166 209 29930 6.81e+02 0 >>> 0.00e+00 100 >>> PCApply 30947 1.0 3.1555e+00 1.1 1.21e+09 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 1 10 0 0 0 1 10 0 0 0 1527 0 0 0.00e+00 61863 >>> 1.41e+03 0 >>> >>> >>> Thanks! 
>>> Chonglin >>> >>> >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Sep 24 13:15:19 2020 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 24 Sep 2020 14:15:19 -0400 Subject: [petsc-users] Proper GPU usage in PETSc In-Reply-To: <5CBBDCE7-285D-402A-9C02-2C0CD2872A04@rpi.edu> References: <3F063CDD-B80E-433C-80B9-74C4236328C5@petsc.dev> <5CBBDCE7-285D-402A-9C02-2C0CD2872A04@rpi.edu> Message-ID: On Thu, Sep 24, 2020 at 1:26 PM Zhang, Chonglin wrote: > > > On Sep 24, 2020, at 1:11 PM, Barry Smith wrote: > > > > On Sep 24, 2020, at 11:48 AM, Zhang, Chonglin wrote: > > Thanks Mark and Barry! > > A quick try of using ?-pc_type jacobi? did reduce the number of count for > ?CpuToGpu? and ?GpuToCpu?, although using ?-pc_type gamg? (the counts did > not decrease in this case) solves the problem faster (may not be of any > meaning since the problem size is too small; the function > ?DMPlexCreateFromCellListParallelPetsc()" is slow for large problem size > preventing running larger problems, separate issue). > > Would this ?CpuToGpu? and ?GpuToCpu? data transfer contribute a > significant amount of time for a realistic sized problem, say for example a > linear problem with ~1-2 million DOFs? > > > It depends on how often the copies are done. > > With GAMG once the preconditioner is built the entire linear solve can > run on the GPU and Mark has some good speed ups of the liner solve using > GAMG on the GPU instead of the CPU on Summit. > > The speedup of the entire simulation will depend on the relative cost > of the finite element matrix assembly vs the linear solver time and > Amdahl's law kicks in so, for example, if the finite element assembly takes > 50 percent of the time even if the linear solve takes 0 time one cannot > only get a speedup of two which is not much. > > > Thanks for the detailed explanation Barry! > > Mark: could you share the results of GAMG on GPU vs CPU on Summit, or > pointing to me where I could see them. (Actual code how you are doing this > would be even better as a learning opportunity for me). Thanks! > Here are a few plots. These use snes/ex13. 
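Barry's Amdahl's-law remark above, written out: if a fraction p of the run time is spent in the part that gets accelerated (here the linear solve) and that part is sped up by a factor s, the whole-run speedup is

  speedup = 1 / ((1 - p) + p/s)

so with his illustrative 50/50 split between assembly and solve, p = 0.5, even s -> infinity only gives 1/(1 - 0.5) = 2.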
I am actually testing that now after many months and its not quite right, but here is how its run on SUMMIT: 12:46 master= ~/petsc/src/snes/tutorials$ make PETSC_DIR=/ccs/home/adams/petsc PETSC_ARCH=arch-summit-opt-gnu-cuda-omp -f mymake run NP=4 EXTRA='-dm_refine 2 -dm_view' jsrun -n 1 -a 4 -c 4 -g 1 -r 1 --smpiargs "-gpu" ./ex13 -dm_plex_box_dim 3 -dm_plex_box_simplex 0 -potential_petscspace_degree 1 -dm_refine 1 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type unpreconditioned -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 1000 -pc_gamg_square_graph 1 -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -mg_levels_ksp_type chebyshev -mg_levels_ksp_max_it 1 -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type jacobi -ksp_monitor -dm_refine 2 -dm_view -vec_type cuda -mat_type aijcusparse DM Object: Mesh 4 MPI processes type: plex Mesh in 3 dimensions: 0-cells: 125 0 0 0 1-cells: 300 0 0 0 2-cells: 240 0 0 0 3-cells: 64 0 0 0 Labels: celltype: 4 strata with value/size (0 (125), 1 (300), 4 (240), 7 (64)) depth: 4 strata with value/size (0 (125), 1 (300), 2 (240), 3 (64)) marker: 1 strata with value/size (1 (378)) Face Sets: 6 strata with value/size (6 (49), 5 (49), 3 (49), 4 (49), 1 (49), 2 (49)) 0 KSP Residual norm 4.381403663299e+00 1 KSP Residual norm 5.547944212592e-01 2 KSP Residual norm 8.282166658954e-02 3 KSP Residual norm 6.363083491776e-03 4 KSP Residual norm 1.177626983807e-03 5 KSP Residual norm 2.448983144610e-04 6 KSP Residual norm 4.343194953221e-05 7 KSP Residual norm 2.341208781337e-18 > > > Also, is there any plan to have the SNES and DMPlex code run on GPU? > > > Basically the finite element computation for the nonlinear function and > its Jacobian need to run on the GPU, this is a big project that we've > barely begun thinking about. If this is something you are interested in it > would be fantastic if you could take a look at that. > > > I see. I will think about this, discuss internally and get back to you if > I can! > > Thanks! > Chonglin > > > Barry > > > > > Thanks! > Chonglin > > On Sep 24, 2020, at 12:17 PM, Barry Smith wrote: > > > MatSOR() runs on the CPU, this causes copy to CPU for each application > of MatSOR() and then a copy to GPU for the next step. > > You can try, for example -pc_type jacobi better yet use PCGAMG if it > amenable for your problem. > > Also the problem is way to small for a GPU. > > There will be copies between the GPU/CPU for each SNES iteration since > the DMPLEX code does not run on GPUs. > > Barry > > > > On Sep 24, 2020, at 10:08 AM, Zhang, Chonglin wrote: > > Dear PETSc Users, > > I have some questions regarding the proper GPU usage. I would like to know > the proper way to: > (1) solve linear equation in SNES, using GPU in PETSc; what > syntax/arguments should I be using; > (2) how to avoid/reduce the ?CpuToGpu count? and ?GpuToCpu count? data > transfer showed in PETSc log file, when using CUDA aware MPI. > > > Details of what I am doing now and my observations are below: > > System and compilers used: > (1) RPI?s AiMOS computer (node wise, it is the same as Summit); > (2) using GCC 7.4.0 and Spectrum-MPI 10.3. 
> > I am doing the followings to solve the linear Poisson equation with SNES > interface, under DMPlex: > (1) using DMPlex to set up the unstructured mesh; > (2) using DM to create vector and matrix; > (3) using SNES interface to solve the linear Poisson equation, with > ?-snes_type ksponly?; > (4) using ?dm_vec_type cuda?, ?dm_mat_type aijcusparse ? to use GPU vector > and matrix, as suggested in this webpage: > https://www.mcs.anl.gov/petsc/features/gpus.html > (5) using ?use_gpu_aware_mpi? with PETSc, and using `mpirun -gpu` to > enable GPU-Direct ( similar as "srun --smpiargs=?-gpu?" for Summit): > https://secure.cci.rpi.edu/wiki/Slurm/#gpu-direct; > https://www.olcf.ornl.gov/wp-content/uploads/2018/11/multi-gpu-workshop.pdf > (6) using ?-options_left? to check and make sure all the arguments are > accepted and used by PETSc. > (7) After problem setup, I am running the ?SNESSolve()? multiple times to > solve the linear problem and observe the log file with ?-log_view" > > I noticed that if I run ?SNESSolve()? 500 times, instead of 50 times, the > ?CpuToGpu count? and/or ?GpuToCpu count? increased roughly 10 times for > some of the operations: SNESSolve, MatSOR, VecMDot, VecCUDACopyTo, > VecCUDACopyFrom, MatCUSPARSCopyTo. See below for a truncated log > corresponding to running SNESSolve() 500 times: > > > Event Count Time (sec) Flop > --- Global --- --- Stage ---- Total GPU - CpuToGpu - - > GpuToCpu - GPU > Max Ratio Max Ratio Max Ratio Mess AvgLen > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count > Size %F > > --------------------------------------------------------------------------------------------------------------------------------------------------------------- > > --- Event Stage 0: Main Stage > > BuildTwoSided 510 1.0 4.9205e-03 1.1 0.00e+00 0.0 3.5e+01 4.0e+00 > 1.0e+03 0 0 0 0 0 0 0 21 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0 > BuildTwoSidedF 501 1.0 1.0199e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.0e+03 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0 > SNESSolve 500 1.0 3.2570e+02 1.0 1.18e+10 1.0 0.0e+00 0.0e+00 > 8.7e+05100100 0 0100 100100 0 0100 144 202 31947 7.82e+02 63363 > 1.44e+03 82 > SNESSetUp 1 1.0 6.0082e-04 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0 > SNESFunctionEval 500 1.0 3.9826e+01 1.0 3.60e+08 1.0 0.0e+00 0.0e+00 > 5.0e+02 12 3 0 0 0 12 3 0 0 0 36 13 0 0.00e+00 1000 > 2.48e+01 0 > SNESJacobianEval 500 1.0 4.8200e+01 1.0 5.97e+08 1.0 0.0e+00 0.0e+00 > 2.0e+03 15 5 0 0 0 15 5 0 0 0 50 0 1000 7.77e+01 500 > 1.24e+01 0 > DMPlexResidualFE 500 1.0 3.6923e+01 1.1 3.56e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 10 3 0 0 0 10 3 0 0 0 39 0 0 0.00e+00 500 > 1.24e+01 0 > DMPlexJacobianFE 500 1.0 4.6013e+01 1.0 5.95e+08 1.0 0.0e+00 0.0e+00 > 2.0e+03 14 5 0 0 0 14 5 0 0 0 52 0 1000 7.77e+01 0 > 0.00e+00 0 > MatSOR 30947 1.0 3.1254e+00 1.1 1.21e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 10 0 0 0 1 10 0 0 0 1542 0 0 0.00e+00 61863 > 1.41e+03 0 > MatAssemblyBegin 511 1.0 5.3428e+00256.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.0e+03 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0 > MatAssemblyEnd 511 1.0 4.3440e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.1e+01 0 0 0 0 0 0 0 0 0 0 0 0 1002 7.80e+01 0 > 0.00e+00 0 > MatCUSPARSCopyTo 1002 1.0 3.6557e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 1002 7.80e+01 0 > 0.00e+00 0 > VecMDot 29930 1.0 3.7843e+01 1.0 2.62e+09 1.0 0.0e+00 0.0e+00 > 6.0e+04 12 22 0 0 7 12 22 0 0 7 277 3236 29930 6.81e+02 0 > 0.00e+00 100 > VecNorm 31447 1.0 2.1164e+01 1.4 1.79e+08 1.0 
0.0e+00 0.0e+00 > 6.3e+04 5 2 0 0 7 5 2 0 0 7 34 55 1017 2.31e+01 0 > 0.00e+00 100 > VecNormalize 30947 1.0 2.3957e+01 1.1 2.65e+08 1.0 0.0e+00 0.0e+00 > 6.2e+04 7 2 0 0 7 7 2 0 0 7 44 51 1017 2.31e+01 0 > 0.00e+00 100 > VecCUDACopyTo 30947 1.0 7.8866e+00 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 0 30947 7.04e+02 0 > 0.00e+00 0 > VecCUDACopyFrom 63363 1.0 1.0873e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 63363 > 1.44e+03 0 > KSPSetUp 500 1.0 2.2737e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 5.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0 > KSPSolve 500 1.0 2.3687e+02 1.0 1.08e+10 1.0 0.0e+00 0.0e+00 > 8.6e+05 72 92 0 0 99 73 92 0 0 99 182 202 30947 7.04e+02 61863 > 1.41e+03 89 > KSPGMRESOrthog 29930 1.0 1.8920e+02 1.0 7.87e+09 1.0 0.0e+00 0.0e+00 > 6.4e+05 58 67 0 0 74 58 67 0 0 74 166 209 29930 6.81e+02 0 > 0.00e+00 100 > PCApply 30947 1.0 3.1555e+00 1.1 1.21e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 10 0 0 0 1 10 0 0 0 1527 0 0 0.00e+00 61863 > 1.41e+03 0 > > > Thanks! > Chonglin > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: weak_scaling_cpu_a12g3r2.png Type: image/png Size: 67697 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: weak_scaling_gpu_a12g3r2.png Type: image/png Size: 66173 bytes Desc: not available URL: From mfadams at lbl.gov Thu Sep 24 13:26:53 2020 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 24 Sep 2020 14:26:53 -0400 Subject: [petsc-users] Proper GPU usage in PETSc In-Reply-To: References: <3F063CDD-B80E-433C-80B9-74C4236328C5@petsc.dev> <5CBBDCE7-285D-402A-9C02-2C0CD2872A04@rpi.edu> Message-ID: > > > > 12:46 master= ~/petsc/src/snes/tutorials$ make > PETSC_DIR=/ccs/home/adams/petsc PETSC_ARCH=arch-summit-opt-gnu-cuda-omp -f > mymake run NP=4 EXTRA='-dm_refine 2 -dm_view' > jsrun -n 1 -a 4 -c 4 -g 1 -r 1 --smpiargs "-gpu" ./ex13 -dm_plex_box_dim 3 > -dm_plex_box_simplex 0 -potential_petscspace_degree 1 -dm_refine 1 > -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type unpreconditioned -pc_type gamg > -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 1000 > -pc_gamg_square_graph 1 -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 > -mg_levels_ksp_type chebyshev -mg_levels_ksp_max_it 1 > -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 > -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type jacobi > -ksp_monitor -dm_refine 2 -dm_view -vec_type cuda -mat_type aijcusparse > DM Object: Mesh 4 MPI processes > type: plex > Mesh in 3 dimensions: > > > > * 0-cells: 125 0 0 0 1-cells: 300 0 0 0 2-cells: 240 0 0 0 3-cells: 64 > 0 0 0* > Labels: > celltype: 4 strata with value/size (0 (125), 1 (300), 4 (240), 7 (64)) > depth: 4 strata with value/size (0 (125), 1 (300), 2 (240), 3 (64)) > marker: 1 strata with value/size (1 (378)) > Face Sets: 6 strata with value/size (6 (49), 5 (49), 3 (49), 4 (49), 1 > (49), 2 (49) > This is not getting distributed. 
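The corrected run below adds -dm_distribute; for anyone driving this from the API rather than the options database, the equivalent step is roughly the following sketch, where dm is whatever DMPlex has just been created:

  DM dmDist = NULL;
  /* overlap 0, discard the migration SF */
  ierr = DMPlexDistribute(dm, 0, NULL, &dmDist);CHKERRQ(ierr);
  if (dmDist) {ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmDist;}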
14:24 master= ~/petsc/src/snes/tutorials$ make PETSC_DIR=/ccs/home/adams/petsc PETSC_ARCH=arch-summit-opt-gnu-cuda-omp -f mymake run NP=4 EXTRA='-dm_plex_box_faces 4,4,4 -dm_distribute -dm_view' jsrun -n 1 -a 4 -c 4 -g 1 -r 1 --smpiargs "-gpu" ./ex13 -dm_plex_box_dim 3 -dm_plex_box_simplex 0 -potential_petscspace_degree 1 -dm_refine 1 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type unpreconditioned -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 1000 -pc_gamg_square_graph 1 -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -mg_levels_ksp_type chebyshev -mg_levels_ksp_max_it 1 -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type jacobi -ksp_monitor *-dm_plex_box_faces 4,4,4 -dm_distribute *-dm_view -vec_type cuda -mat_type aijcusparse DM Object: Mesh 4 MPI processes type: plex Mesh in 3 dimensions: * 0-cells: 225 225 225 225 1-cells: 560 560 560 560 2-cells: 464 464 464 464 3-cells: 128 128 128 128* Labels: celltype: 4 strata with value/size (0 (225), 1 (560), 4 (464), 7 (128)) depth: 4 strata with value/size (0 (225), 1 (560), 2 (464), 3 (128)) marker: 1 strata with value/size (1 (384)) Face Sets: 4 strata with value/size (1 (72), 3 (72), 5 (36), 6 (36)) 0 KSP Residual norm 3.405958130078e+00 1 KSP Residual norm 3.963373201766e-01 2 KSP Residual norm 1.282320967427e-01 3 KSP Residual norm 2.824310691528e-02 4 KSP Residual norm 7.599375477471e-03 5 KSP Residual norm 2.083060123161e-03 6 KSP Residual norm 3.808511520034e-04 7 KSP Residual norm 8.175282902639e-05 8 KSP Residual norm 2.175564241206e-05 9 KSP Residual norm 5.595617772296e-06 10 KSP Residual norm 1.577629043326e-06 11 KSP Residual norm 4.401090414293e-07 12 KSP Residual norm 9.285166456648e-08 13 KSP Residual norm 1.606071620113e-08 14 KSP Residual norm 3.371039707655e-09 15 KSP Residual norm 7.260259461122e-10 16 KSP Residual norm 1.511146527931e-10 17 KSP Residual norm 3.804907486855e-11 18 KSP Residual norm 1.062507039176e-11 -------------- next part -------------- An HTML attachment was scrubbed... URL: From zhangc20 at rpi.edu Thu Sep 24 14:08:56 2020 From: zhangc20 at rpi.edu (Zhang, Chonglin) Date: Thu, 24 Sep 2020 19:08:56 +0000 Subject: [petsc-users] Proper GPU usage in PETSc In-Reply-To: References: Message-ID: <09E01C0C-808B-489E-85AF-E9EF568B3F01@rpi.edu> Hi Matt, I will quickly summarize what I found with ?CreateMesh" for running ex12 here: https://gitlab.com/petsc/petsc/-/blob/master/src/snes/tutorials/ex12.c. If this is not a proper threads to discuss this, I can open a new one. Commands used (relevant to mesh creation) to run ex12 (quad core desktop computer with CPU only, 4 MPI ranks): mpirun -np 4 -cells 100, 100, 0 -options_left -log_view I built PETSc commit: 2bbfc05, dated Sep 23, 2020, with debug=no. Mesh size CreateMesh (seconds) DMPlexDistribute (seconds) 100 *100 0.14 0.081 500 *500 2.28 1.33 1000*1000 10.1 5.10 2000*1000 24.6 10.96 2000*2000 73.7 22.72 Is the performance reasonable for the ?CreateMesh? functionality? Anything I am not doing correctly with DMPlex running ex12? Thanks! Chonglin On Sep 24, 2020, at 2:06 PM, Matthew Knepley > wrote: On Thu, Sep 24, 2020 at 2:04 PM Mark Adams > wrote: On Thu, Sep 24, 2020 at 1:38 PM Matthew Knepley > wrote: On Thu, Sep 24, 2020 at 12:48 PM Zhang, Chonglin > wrote: Thanks Mark and Barry! A quick try of using ?-pc_type jacobi? did reduce the number of count for ?CpuToGpu? and ?GpuToCpu?, although using ?-pc_type gamg? 
(the counts did not decrease in this case) solves the problem faster (may not be of any meaning since the problem size is too small; the function ?DMPlexCreateFromCellListParallelPetsc()" is slow for large problem size preventing running larger problems, separate issue). It sounds like something is wrong then, or I do not understand what you mean by slow. sor may be the default so you need to set the -mg_level_ksp[pc]_type chebyshev[jacobi]. chebyshev is the ksp default. I meant for the mesh creation. Thanks, Matt Thanks, Matt Would this ?CpuToGpu? and ?GpuToCpu? data transfer contribute a significant amount of time for a realistic sized problem, say for example a linear problem with ~1-2 million DOFs? Also, is there any plan to have the SNES and DMPlex code run on GPU? Thanks! Chonglin On Sep 24, 2020, at 12:17 PM, Barry Smith > wrote: MatSOR() runs on the CPU, this causes copy to CPU for each application of MatSOR() and then a copy to GPU for the next step. You can try, for example -pc_type jacobi better yet use PCGAMG if it amenable for your problem. Also the problem is way to small for a GPU. There will be copies between the GPU/CPU for each SNES iteration since the DMPLEX code does not run on GPUs. Barry On Sep 24, 2020, at 10:08 AM, Zhang, Chonglin > wrote: Dear PETSc Users, I have some questions regarding the proper GPU usage. I would like to know the proper way to: (1) solve linear equation in SNES, using GPU in PETSc; what syntax/arguments should I be using; (2) how to avoid/reduce the ?CpuToGpu count? and ?GpuToCpu count? data transfer showed in PETSc log file, when using CUDA aware MPI. Details of what I am doing now and my observations are below: System and compilers used: (1) RPI?s AiMOS computer (node wise, it is the same as Summit); (2) using GCC 7.4.0 and Spectrum-MPI 10.3. I am doing the followings to solve the linear Poisson equation with SNES interface, under DMPlex: (1) using DMPlex to set up the unstructured mesh; (2) using DM to create vector and matrix; (3) using SNES interface to solve the linear Poisson equation, with ?-snes_type ksponly?; (4) using ?dm_vec_type cuda?, ?dm_mat_type aijcusparse ? to use GPU vector and matrix, as suggested in this webpage: https://www.mcs.anl.gov/petsc/features/gpus.html (5) using ?use_gpu_aware_mpi? with PETSc, and using `mpirun -gpu` to enable GPU-Direct ( similar as "srun --smpiargs=?-gpu?" for Summit): https://secure.cci.rpi.edu/wiki/Slurm/#gpu-direct; https://www.olcf.ornl.gov/wp-content/uploads/2018/11/multi-gpu-workshop.pdf (6) using ?-options_left? to check and make sure all the arguments are accepted and used by PETSc. (7) After problem setup, I am running the ?SNESSolve()? multiple times to solve the linear problem and observe the log file with ?-log_view" I noticed that if I run ?SNESSolve()? 500 times, instead of 50 times, the ?CpuToGpu count? and/or ?GpuToCpu count? increased roughly 10 times for some of the operations: SNESSolve, MatSOR, VecMDot, VecCUDACopyTo, VecCUDACopyFrom, MatCUSPARSCopyTo. 
See below for a truncated log corresponding to running SNESSolve() 500 times: Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F --------------------------------------------------------------------------------------------------------------------------------------------------------------- --- Event Stage 0: Main Stage BuildTwoSided 510 1.0 4.9205e-03 1.1 0.00e+00 0.0 3.5e+01 4.0e+00 1.0e+03 0 0 0 0 0 0 0 21 0 0 0 0 0 0.00e+00 0 0.00e+00 0 BuildTwoSidedF 501 1.0 1.0199e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+03 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 SNESSolve 500 1.0 3.2570e+02 1.0 1.18e+10 1.0 0.0e+00 0.0e+00 8.7e+05100100 0 0100 100100 0 0100 144 202 31947 7.82e+02 63363 1.44e+03 82 SNESSetUp 1 1.0 6.0082e-04 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 SNESFunctionEval 500 1.0 3.9826e+01 1.0 3.60e+08 1.0 0.0e+00 0.0e+00 5.0e+02 12 3 0 0 0 12 3 0 0 0 36 13 0 0.00e+00 1000 2.48e+01 0 SNESJacobianEval 500 1.0 4.8200e+01 1.0 5.97e+08 1.0 0.0e+00 0.0e+00 2.0e+03 15 5 0 0 0 15 5 0 0 0 50 0 1000 7.77e+01 500 1.24e+01 0 DMPlexResidualFE 500 1.0 3.6923e+01 1.1 3.56e+08 1.0 0.0e+00 0.0e+00 0.0e+00 10 3 0 0 0 10 3 0 0 0 39 0 0 0.00e+00 500 1.24e+01 0 DMPlexJacobianFE 500 1.0 4.6013e+01 1.0 5.95e+08 1.0 0.0e+00 0.0e+00 2.0e+03 14 5 0 0 0 14 5 0 0 0 52 0 1000 7.77e+01 0 0.00e+00 0 MatSOR 30947 1.0 3.1254e+00 1.1 1.21e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 10 0 0 0 1 10 0 0 0 1542 0 0 0.00e+00 61863 1.41e+03 0 MatAssemblyBegin 511 1.0 5.3428e+00256.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+03 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 MatAssemblyEnd 511 1.0 4.3440e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.1e+01 0 0 0 0 0 0 0 0 0 0 0 0 1002 7.80e+01 0 0.00e+00 0 MatCUSPARSCopyTo 1002 1.0 3.6557e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 1002 7.80e+01 0 0.00e+00 0 VecMDot 29930 1.0 3.7843e+01 1.0 2.62e+09 1.0 0.0e+00 0.0e+00 6.0e+04 12 22 0 0 7 12 22 0 0 7 277 3236 29930 6.81e+02 0 0.00e+00 100 VecNorm 31447 1.0 2.1164e+01 1.4 1.79e+08 1.0 0.0e+00 0.0e+00 6.3e+04 5 2 0 0 7 5 2 0 0 7 34 55 1017 2.31e+01 0 0.00e+00 100 VecNormalize 30947 1.0 2.3957e+01 1.1 2.65e+08 1.0 0.0e+00 0.0e+00 6.2e+04 7 2 0 0 7 7 2 0 0 7 44 51 1017 2.31e+01 0 0.00e+00 100 VecCUDACopyTo 30947 1.0 7.8866e+00 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 0 30947 7.04e+02 0 0.00e+00 0 VecCUDACopyFrom 63363 1.0 1.0873e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 63363 1.44e+03 0 KSPSetUp 500 1.0 2.2737e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 KSPSolve 500 1.0 2.3687e+02 1.0 1.08e+10 1.0 0.0e+00 0.0e+00 8.6e+05 72 92 0 0 99 73 92 0 0 99 182 202 30947 7.04e+02 61863 1.41e+03 89 KSPGMRESOrthog 29930 1.0 1.8920e+02 1.0 7.87e+09 1.0 0.0e+00 0.0e+00 6.4e+05 58 67 0 0 74 58 67 0 0 74 166 209 29930 6.81e+02 0 0.00e+00 100 PCApply 30947 1.0 3.1555e+00 1.1 1.21e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 10 0 0 0 1 10 0 0 0 1527 0 0 0.00e+00 61863 1.41e+03 0 Thanks! Chonglin -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Sep 24 14:26:21 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 24 Sep 2020 15:26:21 -0400 Subject: [petsc-users] Proper GPU usage in PETSc In-Reply-To: <09E01C0C-808B-489E-85AF-E9EF568B3F01@rpi.edu> References: <09E01C0C-808B-489E-85AF-E9EF568B3F01@rpi.edu> Message-ID: On Thu, Sep 24, 2020 at 3:08 PM Zhang, Chonglin wrote: > Hi Matt, > > I will quickly summarize what I found with ?CreateMesh" for running ex12 > here: > https://gitlab.com/petsc/petsc/-/blob/master/src/snes/tutorials/ex12.c. > If this is not a proper threads to discuss this, I can open a new one. > > Commands used (relevant to mesh creation) to run ex12 (quad core desktop > computer with CPU only, 4 MPI ranks): > mpirun -np 4 -cells 100, 100, 0 -options_left -log_view > I built PETSc commit: 2bbfc05, dated Sep 23, 2020, with debug=no. > > Mesh size CreateMesh (seconds) DMPlexDistribute (seconds) > 100 *100 0.14 0.081 > 500 *500 2.28 1.33 > 1000*1000 10.1 5.10 > 2000*1000 24.6 10.96 > 2000*2000 73.7 22.72 > > Is the performance reasonable for the ?CreateMesh? functionality? > > Anything I am not doing correctly with DMPlex running ex12? > ex12 is a little old. I have been meaning to update it. ex13 does the same thing in a more modern way. Above looks reasonable I think. The CreateMesh time includes generating the mesh using Triangle, since simplex is the default. In example 12, you could use -simplex 0 or in ex13 -dm_plex_box_simplex 0 to get hexes, which do not use a generator. Second, you are interpolating all on process 0, which is probably the bulk of the time. I do that because I do not care about parallel performance in the examples and it is simpler. You can also refine the mesh after distribution, which is faster, and cuts down on the distribution time. So if you want the whole thing, you could use DM odm, dm; /* Create a cell-vertex box mesh */ ierr = DMPlexCreateBoxMesh(comm, 2, PETSC_TRUE, NULL, NULL, NULL, NULL, PETSC_FALSE, &odm);CHKERRQ(ierr); ierr = PetscObjectSetOptionsPrefix((PetscObject) dm, "orig_");CHKERRQ(ierr); /* Distributes the mesh here */ ierr = DMSetFromOptions(odm);CHKERRQ(ierr); /* Interpolate the mesh */ ierr = DMPlexInterpolate(odm, &dm);CHKERRQ(ierr); ierr = DMDestroy(&odm);CHKERRQ(ierr); /* Refine the mesh */ ierr = DMSetFromOptions(dm);CHKERRQ(ierr); and run with -dm_plex_box_simplex 0 -dm_plex_box_faces 100,100 -orig_dm_distribute -dm_refine 3 Thanks, Matt Thanks! > Chonglin > > On Sep 24, 2020, at 2:06 PM, Matthew Knepley wrote: > > On Thu, Sep 24, 2020 at 2:04 PM Mark Adams wrote: > >> On Thu, Sep 24, 2020 at 1:38 PM Matthew Knepley >> wrote: >> >>> On Thu, Sep 24, 2020 at 12:48 PM Zhang, Chonglin >>> wrote: >>> >>>> Thanks Mark and Barry! >>>> >>>> A quick try of using ?-pc_type jacobi? did reduce the number of count >>>> for ?CpuToGpu? and ?GpuToCpu?, although using ?-pc_type gamg? (the counts >>>> did not decrease in this case) solves the problem faster (may not be of any >>>> meaning since the problem size is too small; the function >>>> ?DMPlexCreateFromCellListParallelPetsc()" is slow for large problem size >>>> preventing running larger problems, separate issue). 
>>>> >>> >>> It sounds like something is wrong then, or I do not understand what you >>> mean by slow. >>> >> >> sor may be the default so you need to set the -mg_level_ksp[pc]_type >> chebyshev[jacobi]. chebyshev is the ksp default. >> > > I meant for the mesh creation. > > Thanks, > > Matt > > >> Thanks, >>> >>> Matt >>> >>> >>>> Would this ?CpuToGpu? and ?GpuToCpu? data transfer contribute a >>>> significant amount of time for a realistic sized problem, say for example a >>>> linear problem with ~1-2 million DOFs? >>>> >>>> Also, is there any plan to have the SNES and DMPlex code run on GPU? >>>> >>>> Thanks! >>>> Chonglin >>>> >>>> On Sep 24, 2020, at 12:17 PM, Barry Smith wrote: >>>> >>>> >>>> MatSOR() runs on the CPU, this causes copy to CPU for each >>>> application of MatSOR() and then a copy to GPU for the next step. >>>> >>>> You can try, for example -pc_type jacobi better yet use PCGAMG if >>>> it amenable for your problem. >>>> >>>> Also the problem is way to small for a GPU. >>>> >>>> There will be copies between the GPU/CPU for each SNES iteration >>>> since the DMPLEX code does not run on GPUs. >>>> >>>> Barry >>>> >>>> >>>> >>>> On Sep 24, 2020, at 10:08 AM, Zhang, Chonglin wrote: >>>> >>>> Dear PETSc Users, >>>> >>>> I have some questions regarding the proper GPU usage. I would like to >>>> know the proper way to: >>>> (1) solve linear equation in SNES, using GPU in PETSc; what >>>> syntax/arguments should I be using; >>>> (2) how to avoid/reduce the ?CpuToGpu count? and ?GpuToCpu count? data >>>> transfer showed in PETSc log file, when using CUDA aware MPI. >>>> >>>> >>>> Details of what I am doing now and my observations are below: >>>> >>>> System and compilers used: >>>> (1) RPI?s AiMOS computer (node wise, it is the same as Summit); >>>> (2) using GCC 7.4.0 and Spectrum-MPI 10.3. >>>> >>>> I am doing the followings to solve the linear Poisson equation with >>>> SNES interface, under DMPlex: >>>> (1) using DMPlex to set up the unstructured mesh; >>>> (2) using DM to create vector and matrix; >>>> (3) using SNES interface to solve the linear Poisson equation, with >>>> ?-snes_type ksponly?; >>>> (4) using ?dm_vec_type cuda?, ?dm_mat_type aijcusparse ? to use GPU >>>> vector and matrix, as suggested in this webpage: >>>> https://www.mcs.anl.gov/petsc/features/gpus.html >>>> (5) using ?use_gpu_aware_mpi? with PETSc, and using `mpirun -gpu` to >>>> enable GPU-Direct ( similar as "srun --smpiargs=?-gpu?" for Summit): >>>> https://secure.cci.rpi.edu/wiki/Slurm/#gpu-direct; >>>> https://www.olcf.ornl.gov/wp-content/uploads/2018/11/multi-gpu-workshop.pdf >>>> (6) using ?-options_left? to check and make sure all the arguments are >>>> accepted and used by PETSc. >>>> (7) After problem setup, I am running the ?SNESSolve()? multiple times >>>> to solve the linear problem and observe the log file with ?-log_view" >>>> >>>> I noticed that if I run ?SNESSolve()? 500 times, instead of 50 times, >>>> the ?CpuToGpu count? and/or ?GpuToCpu count? increased roughly 10 times for >>>> some of the operations: SNESSolve, MatSOR, VecMDot, VecCUDACopyTo, >>>> VecCUDACopyFrom, MatCUSPARSCopyTo. 
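Putting the suggestions in this thread together, a GPU run that avoids the MatSOR copies might be launched along these lines (an illustrative command only: the executable name is a placeholder, the mesh and problem options are omitted, and the option names should be checked against the installed PETSc version):

mpirun -np 2 ./my_app -snes_type ksponly -dm_vec_type cuda -dm_mat_type aijcusparse -pc_type gamg -mg_levels_ksp_type chebyshev -mg_levels_pc_type jacobi -log_view

Replacing -pc_type gamg and the two -mg_levels_* options with -pc_type jacobi is the simpler variant mentioned above.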
See below for a truncated log >>>> corresponding to running SNESSolve() 500 times: >>>> >>>> >>>> Event Count Time (sec) Flop >>>> --- Global --- --- Stage ---- Total GPU - CpuToGpu - - >>>> GpuToCpu - GPU >>>> Max Ratio Max Ratio Max Ratio Mess >>>> AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size >>>> Count Size %F >>>> >>>> --------------------------------------------------------------------------------------------------------------------------------------------------------------- >>>> >>>> --- Event Stage 0: Main Stage >>>> >>>> BuildTwoSided 510 1.0 4.9205e-03 1.1 0.00e+00 0.0 3.5e+01 >>>> 4.0e+00 1.0e+03 0 0 0 0 0 0 0 21 0 0 0 0 0 >>>> 0.00e+00 0 0.00e+00 0 >>>> BuildTwoSidedF 501 1.0 1.0199e-02 1.4 0.00e+00 0.0 0.0e+00 >>>> 0.0e+00 1.0e+03 0 0 0 0 0 0 0 0 0 0 0 0 0 >>>> 0.00e+00 0 0.00e+00 0 >>>> SNESSolve 500 1.0 3.2570e+02 1.0 1.18e+10 1.0 0.0e+00 >>>> 0.0e+00 8.7e+05100100 0 0100 100100 0 0100 144 202 31947 >>>> 7.82e+02 63363 1.44e+03 82 >>>> SNESSetUp 1 1.0 6.0082e-04 1.7 0.00e+00 0.0 0.0e+00 >>>> 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 >>>> 0.00e+00 0 0.00e+00 0 >>>> SNESFunctionEval 500 1.0 3.9826e+01 1.0 3.60e+08 1.0 0.0e+00 >>>> 0.0e+00 5.0e+02 12 3 0 0 0 12 3 0 0 0 36 13 0 >>>> 0.00e+00 1000 2.48e+01 0 >>>> SNESJacobianEval 500 1.0 4.8200e+01 1.0 5.97e+08 1.0 0.0e+00 >>>> 0.0e+00 2.0e+03 15 5 0 0 0 15 5 0 0 0 50 0 1000 >>>> 7.77e+01 500 1.24e+01 0 >>>> DMPlexResidualFE 500 1.0 3.6923e+01 1.1 3.56e+08 1.0 0.0e+00 >>>> 0.0e+00 0.0e+00 10 3 0 0 0 10 3 0 0 0 39 0 0 >>>> 0.00e+00 500 1.24e+01 0 >>>> DMPlexJacobianFE 500 1.0 4.6013e+01 1.0 5.95e+08 1.0 0.0e+00 >>>> 0.0e+00 2.0e+03 14 5 0 0 0 14 5 0 0 0 52 0 1000 >>>> 7.77e+01 0 0.00e+00 0 >>>> MatSOR 30947 1.0 3.1254e+00 1.1 1.21e+09 1.0 0.0e+00 >>>> 0.0e+00 0.0e+00 1 10 0 0 0 1 10 0 0 0 1542 0 0 >>>> 0.00e+00 61863 1.41e+03 0 >>>> MatAssemblyBegin 511 1.0 5.3428e+00256.4 0.00e+00 0.0 0.0e+00 >>>> 0.0e+00 2.0e+03 1 0 0 0 0 1 0 0 0 0 0 0 0 >>>> 0.00e+00 0 0.00e+00 0 >>>> MatAssemblyEnd 511 1.0 4.3440e-02 1.2 0.00e+00 0.0 0.0e+00 >>>> 0.0e+00 2.1e+01 0 0 0 0 0 0 0 0 0 0 0 0 1002 >>>> 7.80e+01 0 0.00e+00 0 >>>> MatCUSPARSCopyTo 1002 1.0 3.6557e-02 1.2 0.00e+00 0.0 0.0e+00 >>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 1002 >>>> 7.80e+01 0 0.00e+00 0 >>>> VecMDot 29930 1.0 3.7843e+01 1.0 2.62e+09 1.0 0.0e+00 >>>> 0.0e+00 6.0e+04 12 22 0 0 7 12 22 0 0 7 277 3236 29930 >>>> 6.81e+02 0 0.00e+00 100 >>>> VecNorm 31447 1.0 2.1164e+01 1.4 1.79e+08 1.0 0.0e+00 >>>> 0.0e+00 6.3e+04 5 2 0 0 7 5 2 0 0 7 34 55 1017 >>>> 2.31e+01 0 0.00e+00 100 >>>> VecNormalize 30947 1.0 2.3957e+01 1.1 2.65e+08 1.0 0.0e+00 >>>> 0.0e+00 6.2e+04 7 2 0 0 7 7 2 0 0 7 44 51 1017 >>>> 2.31e+01 0 0.00e+00 100 >>>> VecCUDACopyTo 30947 1.0 7.8866e+00 3.4 0.00e+00 0.0 0.0e+00 >>>> 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 0 30947 >>>> 7.04e+02 0 0.00e+00 0 >>>> VecCUDACopyFrom 63363 1.0 1.0873e+00 1.1 0.00e+00 0.0 0.0e+00 >>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 >>>> 0.00e+00 63363 1.44e+03 0 >>>> KSPSetUp 500 1.0 2.2737e-03 1.1 0.00e+00 0.0 0.0e+00 >>>> 0.0e+00 5.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 >>>> 0.00e+00 0 0.00e+00 0 >>>> KSPSolve 500 1.0 2.3687e+02 1.0 1.08e+10 1.0 0.0e+00 >>>> 0.0e+00 8.6e+05 72 92 0 0 99 73 92 0 0 99 182 202 30947 >>>> 7.04e+02 61863 1.41e+03 89 >>>> KSPGMRESOrthog 29930 1.0 1.8920e+02 1.0 7.87e+09 1.0 0.0e+00 >>>> 0.0e+00 6.4e+05 58 67 0 0 74 58 67 0 0 74 166 209 29930 >>>> 6.81e+02 0 0.00e+00 100 >>>> PCApply 30947 1.0 3.1555e+00 1.1 1.21e+09 1.0 0.0e+00 >>>> 0.0e+00 0.0e+00 1 10 0 0 0 1 10 0 0 0 1527 0 0 >>>> 0.00e+00 
61863 1.41e+03 0 >>>> >>>> >>>> Thanks! >>>> Chonglin >>>> >>>> >>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Sep 24 14:42:00 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 24 Sep 2020 14:42:00 -0500 Subject: [petsc-users] Tough to reproduce petsctablefind error In-Reply-To: References: <0AC37384-BC37-4A6C-857D-41CD507F84C2@petsc.dev> <8952CCCF-14F2-4102-91B4-921A54689813@petsc.dev> Message-ID: <1A56D208-6BF6-49BF-B653-FD9D15BB3BDD@petsc.dev> The stack is listed below. It crashes inside MatPtAP(). It is possible there is some subtle bug in the rather complex PETSc code for MatPtAP() but I am included to blame MPI again. I think we should add some simple low-overhead always on communication error detecting code to PetscSF where some check sums are also communicated at the highest level of PetscSF(). I don't know how but perhaps when the data is packed per destination rank a checksum is computed and when unpacked the checksum is compared using extra space at the end of the communicated packed array to store and send the checksum. Yes, it is kind of odd for a high level library like PETSc to not trust the communication channel but MPI implementations have proven themselves to not be trustworthy and adding this to PetscSF is not intrusive to the PETSc API or user. Plus it gives a definitive yes or no as to the problem being from an error in the communication. Barry > On Sep 24, 2020, at 12:35 PM, Matthew Knepley wrote: > > On Thu, Sep 24, 2020 at 1:22 PM Chris Hewson > wrote: > Hi Guys, > > Thanks for all of the prompt responses, very helpful and appreciated. > > By "when debugging", did you mean when configure petsc --with-debugging=1 COPTFLAGS=-O0 -g etc or when you attached a debugger? > - Both, I have run with a debugger attached and detached, all compiled with the following flags "--with-debugging=1 COPTFLAGS=-O0 -g" > > 1) Try OpenMPI (probably won't help, but worth trying) > - Worth a try for sure > > 2) Find which part of the simulation makes it non-deterministic. Is it the mesh partitioning (parmetis)? Then try to make it deterministic. > - Good tip, it is the mesh partitioning and along the lines of a question from Barry, the matrix size is changing. I will make this deterministic and give it a try > > 3) Dump matrices, vectors, etc and see when it fails, you can quickly reproduce the error by reading in the intermediate data. > - Also a great suggestion, will give it a try > > The full stack would be really useful here. I am guessing this happens on MatMult(), but I do not know. > - Agreed, I am currently running it so that the full stack will be produced, but waiting for it to fail, had compiled with all available optimizations on, but downside is of course if there is a failure. 
> As a general question, roughly what's the performance impact on petsc with -o1/-o2/-o0 as opposed to -o3? Performance impact of --with-debugging = 1? > Obviously problem/machine dependant, wondering on guidance more for this than anything > > Is the nonzero structure of your matrices changing or is it fixed for the entire simulation? > The non-zero structure is changing, although the structures are reformed when this happens and this happens thousands of time before the failure has occured. > > Okay, this is the most likely spot for a bug. How are you changing the matrix? It should be impossible to put in an invalid column index when using MatSetValues() > because we check all the inputs. However, I do not think we check when you just yank out the arrays. > > Thanks, > > Matt > > Does this particular run always crash at the same place? Similar place? Doesn't always crash? > Doesn't always crash, but other similar runs have crashed in different spots, which makes it difficult to resolve. I am going to try out a few of the strategies suggested above and will let you know what comes of that. > > Chris Hewson > Senior Reservoir Simulation Engineer > ResFrac > +1.587.575.9792 > > > On Thu, Sep 24, 2020 at 11:05 AM Barry Smith > wrote: > Chris, > > We realize how frustrating this type of problem is to deal with. > > Here is the code: > > ierr = PetscTableCreate(aij->B->rmap->n,mat->cmap->N+1,&gid1_lid1);CHKERRQ(ierr); > for (i=0; iB->rmap->n; i++) { > for (j=0; jilen[i]; j++) { > PetscInt data,gid1 = aj[B->i[i] + j] + 1; > ierr = PetscTableFind(gid1_lid1,gid1,&data);CHKERRQ(ierr); > if (!data) { > /* one based table */ > ierr = PetscTableAdd(gid1_lid1,gid1,++ec,INSERT_VALUES);CHKERRQ(ierr); > } > } > } > > It is simply looping over the rows of the sparse matrix putting the columns it finds into the hash table. > > aj[B->i[i] + j] are the column entries, the number of columns in the matrix is mat->cmap->N so the column entries should always be > less than the number of columns. The code is crashing when column entry 24443 which is larger than the number of columns 23988. > > So either the aj[B->i[i] + j] + 1 are incorrect or the mat->cmap->N is incorrect. > >>>> 640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c > > if (!mat->was_assembled && mode == MAT_FINAL_ASSEMBLY) { > ierr = MatSetUpMultiply_MPIAIJ(mat);CHKERRQ(ierr); > } > > Seems to indicate it is setting up a new multiple because it is either the first time into the algorithm or the nonzero structure changed on some rank requiring a new assembly process. > > Is the nonzero structure of your matrices changing or is it fixed for the entire simulation? > > Since the code has been running for a very long time already I have to conclude that this is not the first time through and so something has changed in the matrix? > > I think we have to put more diagnostics into the library to provide more information before or at the time of the error detection. > > Does this particular run always crash at the same place? Similar place? Doesn't always crash? 
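To make that concrete, a tiny standalone program in the spirit of the extra diagnostic suggested above (a simplified sketch with made-up CSR arrays, not the actual PETSc source) could look like:

#include <stdio.h>

/* Return nonzero if any column index lies outside [0, ncols); report the
   offending row and value so the failure is caught where it originates. */
static int check_offdiag_columns(const int *ai, const int *aj, int nrows, int ncols)
{
  int i, k;
  for (i = 0; i < nrows; i++) {
    for (k = ai[i]; k < ai[i + 1]; k++) {
      if (aj[k] < 0 || aj[k] >= ncols) {
        fprintf(stderr, "row %d: column index %d outside [0, %d)\n", i, aj[k], ncols);
        return 1;
      }
    }
  }
  return 0;
}

int main(void)
{
  /* Toy data mimicking the report: one index (24443) is >= ncols (23988). */
  int ai[] = {0, 2, 4};
  int aj[] = {10, 57, 24443, 100};
  return check_offdiag_columns(ai, aj, 2, 23988);
}

Running a check like this on each rank right before the table is built would show whether the bad index is already present in the assembled off-diagonal block.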
> > Barry > > > > >> On Sep 24, 2020, at 8:46 AM, Chris Hewson > wrote: >> >> After about a month of not having this issue pop up, it has come up again >> >> We have been struggling with a similar PETSc Error for awhile now, the error message is as follows: >> >> [7]PETSC ERROR: PetscTableFind() line 132 in /home/chewson/petsc-3.13.3/include/petscctable.h key 24443 is greater than largest key allowed 23988 >> >> It is a particularly nasty bug as it doesn't reproduce itself when debugging and doesn't happen all the time with the same inputs either. The problem occurs after a long runtime of the code (12-40 hours) and we are using a ksp solve with KSPBCGS. >> >> The PETSc compilation options that are used are: >> >> --download-metis >> --download-mpich >> --download-mumps >> --download-parmetis >> --download-ptscotch >> --download-scalapack >> --download-suitesparse >> --prefix=/opt/anl/petsc-3.13.3 >> --with-debugging=0 >> --with-mpi=1 >> COPTFLAGS=-O3 -march=haswell -mtune=haswell >> CXXOPTFLAGS=-O3 -march=haswell -mtune=haswell >> FOPTFLAGS=-O3 -march=haswell -mtune=haswell >> >> This is being run across 8 processes and is failing consistently on the rank 7 process. We also use openmp outside of PETSC and the linear solve portion of the code. The rank 0 process is always using compute, during this the slave processes use an MPI_Wait call to wait for the collective parts of the code to be called again. All PETSC calls are done across all of the processes. >> >> We are using mpich version 3.3.2, downloaded with the petsc 3.13.3 package. >> >> At every PETSC call we are checking the error return from the function collectively to ensure that no errors have been returned from PETSC. >> >> Some possible causes that I can think of are as follows: >> 1. Memory leak causing a corruption either in our program or in petsc or with one of the petsc objects. This seems unlikely as we have checked runs with the option -malloc_dump for PETSc and using valgrind. >> >> 2. Optimization flags set for petsc compilation are causing variables that go out of scope to be optimized out. >> >> 3. We are giving the wrong number of elements for a process or the value is changing during the simulation. This seems unlikely as there is nothing overly unique about these simulations and it's not reproducing itself. >> >> 4. An MPI channel or socket error causing an error in the collective values for PETSc. >> >> Any input on this issue would be greatly appreciated. >> >> Chris Hewson >> Senior Reservoir Simulation Engineer >> ResFrac >> +1.587.575.9792 >> >> >> On Thu, Aug 13, 2020 at 4:21 PM Junchao Zhang > wrote: >> That is a great idea. I'll figure it out. >> --Junchao Zhang >> >> >> On Thu, Aug 13, 2020 at 4:31 PM Barry Smith > wrote: >> >> Junchao, >> >> Any way in the PETSc configure to warn that MPICH version is "bad" or "untrustworthy" or even the vague "we have suspicians about this version and recommend avoiding it"? A lot of time could be saved if others don't deal with the same problem. >> >> Maybe add arrays of suspect_versions for OpenMPI, MPICH, etc and always check against that list and print a boxed warning at configure time? Better you could somehow generalize it and put it in package.py for use by all packages, then any package can included lists of "suspect" versions. (There are definitely HDF5 versions that should be avoided :-)). >> >> Barry >> >> >>> On Aug 13, 2020, at 12:14 PM, Junchao Zhang > wrote: >>> >>> Thanks for the update. 
Let's assume it is a bug in MPI :) >>> --Junchao Zhang >>> >>> >>> On Thu, Aug 13, 2020 at 11:15 AM Chris Hewson > wrote: >>> Just as an update to this, I can confirm that using the mpich version (3.3.2) downloaded with the petsc download solved this issue on my end. >>> >>> Chris Hewson >>> Senior Reservoir Simulation Engineer >>> ResFrac >>> +1.587.575.9792 >>> >>> >>> On Thu, Jul 23, 2020 at 5:58 PM Junchao Zhang > wrote: >>> On Mon, Jul 20, 2020 at 7:05 AM Barry Smith > wrote: >>> >>> Is there a comprehensive MPI test suite (perhaps from MPICH)? Is there any way to run this full test suite under the problematic MPI and see if it detects any problems? >>> >>> Is so, could someone add it to the FAQ in the debugging section? >>> MPICH does have a test suite. It is at the subdir test/mpi of downloaded mpich . It annoyed me since it is not user-friendly. It might be helpful in catching bugs at very small scale. But say if I want to test allreduce on 1024 ranks on 100 doubles, I have to hack the test suite. >>> Anyway, the instructions are here. >>> For the purpose of petsc, under test/mpi one can configure it with >>> $./configure CC=mpicc CXX=mpicxx FC=mpifort --enable-strictmpi --enable-threads=funneled --enable-fortran=f77,f90 --enable-fast --disable-spawn --disable-cxx --disable-ft-tests // It is weird I disabled cxx but I had to set CXX! >>> $make -k -j8 // -k is to keep going and ignore compilation errors, e.g., when building tests for MPICH extensions not in MPI standard, but your MPI is OpenMPI. >>> $ // edit testlist, remove lines mpi_t, rma, f77, impls. Those are sub-dirs containing tests for MPI routines Petsc does not rely on. >>> $ make testings or directly './runtests -tests=testlist' >>> >>> On a batch system, >>> $export MPITEST_BATCHDIR=`pwd`/btest // specify a batch dir, say btest, >>> $./runtests -batch -mpiexec=mpirun -np=1024 -tests=testlist // Use 1024 ranks if a test does no specify the number of processes. >>> $ // It copies test binaries to the batch dir and generates a script runtests.batch there. Edit the script to fit your batch system and then submit a job and wait for its finish. >>> $ cd btest && ../checktests --ignorebogus >>> >>> PS: Fande, changing an MPI fixed your problem does not necessarily mean the old MPI has bugs. It is complicated. It could be a petsc bug. You need to provide us a code to reproduce your error. It does not matter if the code is big. >>> >>> >>> Thanks >>> >>> Barry >>> >>> >>>> On Jul 20, 2020, at 12:16 AM, Fande Kong > wrote: >>>> >>>> Trace could look like this: >>>> >>>> [640]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>>> [640]PETSC ERROR: Argument out of range >>>> [640]PETSC ERROR: key 45226154 is greater than largest key allowed 740521 >>>> [640]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
>>>> [640]PETSC ERROR: Petsc Release Version 3.13.3, unknown >>>> [640]PETSC ERROR: ../../griffin-opt on a arch-moose named r6i5n18 by wangy2 Sun Jul 19 17:14:28 2020 >>>> [640]PETSC ERROR: Configure options --download-hypre=1 --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1 --download-metis=1 --download-ptscotch=1 --download-parmetis=1 --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 --download-slepc=1 --with-mpi=1 --with-cxx-dialect=C++11 --with-fortran-bindings=0 --with-sowing=0 --with-64-bit-indices --download-mumps=0 >>>> [640]PETSC ERROR: #1 PetscTableFind() line 132 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/include/petscctable.h >>>> [640]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() line 33 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mmaij.c >>>> [640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>> [640]PETSC ERROR: #4 MatAssemblyEnd() line 5347 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>> [640]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIXAIJ_allatonce() line 901 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiptap.c >>>> [640]PETSC ERROR: #6 MatPtAPNumeric_MPIAIJ_MPIMAIJ_allatonce() line 3180 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/maij/maij.c >>>> [640]PETSC ERROR: #7 MatProductNumeric_PtAP() line 704 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>> [640]PETSC ERROR: #8 MatProductNumeric() line 759 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>> [640]PETSC ERROR: #9 MatPtAP() line 9199 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>> [640]PETSC ERROR: #10 MatGalerkin() line 10236 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>> [640]PETSC ERROR: #11 PCSetUp_MG() line 745 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/mg/mg.c >>>> [640]PETSC ERROR: #12 PCSetUp_HMG() line 220 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/hmg/hmg.c >>>> [640]PETSC ERROR: #13 PCSetUp() line 898 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/interface/precon.c >>>> [640]PETSC ERROR: #14 KSPSetUp() line 376 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>> [640]PETSC ERROR: #15 KSPSolve_Private() line 633 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>> [640]PETSC ERROR: #16 KSPSolve() line 853 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>> [640]PETSC ERROR: #17 SNESSolve_NEWTONLS() line 225 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/impls/ls/ls.c >>>> [640]PETSC ERROR: #18 SNESSolve() line 4519 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/interface/snes.c >>>> >>>> On Sun, Jul 19, 2020 at 11:13 PM Fande Kong > wrote: >>>> I am not entirely sure what is happening, but we encountered similar issues recently. It was not reproducible. It might occur at different stages, and errors could be weird other than "ctable stuff." Our code was Valgrind clean since every PR in moose needs to go through rigorous Valgrind checks before it reaches the devel branch. The errors happened when we used mvapich. >>>> >>>> We changed to use HPE-MPT (a vendor stalled MPI), then everything was smooth. May you try a different MPI? 
It is better to try a system carried one. >>>> >>>> We did not get the bottom of this problem yet, but we at least know this is kind of MPI-related. >>>> >>>> Thanks, >>>> >>>> Fande, >>>> >>>> >>>> On Sun, Jul 19, 2020 at 3:28 PM Chris Hewson > wrote: >>>> Hi, >>>> >>>> I am having a bug that is occurring in PETSC with the return string: >>>> >>>> [7]PETSC ERROR: PetscTableFind() line 132 in /home/chewson/petsc-3.13.2/include/petscctable.h key 7556 is greater than largest key allowed 5693 >>>> >>>> This is using petsc-3.13.2, compiled and running using mpich with -O3 and debugging turned off tuned to the haswell architecture and occurring either before or during a KSPBCGS solve/setup or during a MUMPS factorization solve (I haven't been able to replicate this issue with the same set of instructions etc.). >>>> >>>> This is a terrible way to ask a question, I know, and not very helpful from your side, but this is what I have from a user's run and can't reproduce on my end (either with the optimization compilation or with debugging turned on). This happens when the code has run for quite some time and is happening somewhat rarely. >>>> >>>> More than likely I am using a static variable (code is written in c++) that I'm not updating when the matrix size is changing or something silly like that, but any help or guidance on this would be appreciated. >>>> >>>> Chris Hewson >>>> Senior Reservoir Simulation Engineer >>>> ResFrac >>>> +1.587.575.9792 >>> >> > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Sep 24 14:47:27 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 24 Sep 2020 15:47:27 -0400 Subject: [petsc-users] Tough to reproduce petsctablefind error In-Reply-To: <1A56D208-6BF6-49BF-B653-FD9D15BB3BDD@petsc.dev> References: <0AC37384-BC37-4A6C-857D-41CD507F84C2@petsc.dev> <8952CCCF-14F2-4102-91B4-921A54689813@petsc.dev> <1A56D208-6BF6-49BF-B653-FD9D15BB3BDD@petsc.dev> Message-ID: On Thu, Sep 24, 2020 at 3:42 PM Barry Smith wrote: > > The stack is listed below. It crashes inside MatPtAP(). > What about just checking that the column indices that PtAP receives are valid? Are we not doing that? Matt > It is possible there is some subtle bug in the rather complex PETSc code > for MatPtAP() but I am included to blame MPI again. > > I think we should add some simple low-overhead always on communication > error detecting code to PetscSF where some check sums are also communicated > at the highest level of PetscSF(). > > I don't know how but perhaps when the data is packed per destination > rank a checksum is computed and when unpacked the checksum is compared > using extra space at the end of the communicated packed array to store and > send the checksum. Yes, it is kind of odd for a high level library like > PETSc to not trust the communication channel but MPI implementations have > proven themselves to not be trustworthy and adding this to PetscSF is not > intrusive to the PETSc API or user. Plus it gives a definitive yes or no as > to the problem being from an error in the communication. 
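As a rough picture of that idea, here is a toy MPI program (not PetscSF itself; the packing scheme and checksum choice are arbitrary assumptions, and it needs at least two ranks) that appends one extra checksum entry to a packed buffer before the send and verifies it after the receive:

#include <mpi.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* FNV-1a hash over the raw bytes of the payload; any cheap checksum would do. */
static uint64_t buf_checksum(const double *buf, int n)
{
  const unsigned char *p = (const unsigned char *)buf;
  uint64_t h = 1469598103934665603ULL;
  size_t i;
  for (i = 0; i < (size_t)n * sizeof(double); i++) { h ^= p[i]; h *= 1099511628211ULL; }
  return h;
}

int main(int argc, char **argv)
{
  int rank, i, n = 4;
  double *buf;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  buf = (double *)malloc((n + 1) * sizeof(double)); /* last slot carries the checksum */
  if (rank == 0) {
    uint64_t sum;
    for (i = 0; i < n; i++) buf[i] = i + 0.5;
    sum = buf_checksum(buf, n);
    memcpy(&buf[n], &sum, sizeof(sum)); /* assumes sizeof(uint64_t) == sizeof(double) */
    MPI_Send(buf, n + 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
  } else if (rank == 1) {
    uint64_t got;
    MPI_Recv(buf, n + 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    memcpy(&got, &buf[n], sizeof(got));
    if (got != buf_checksum(buf, n)) fprintf(stderr, "[1] checksum mismatch: message was corrupted in transit\n");
  }
  free(buf);
  MPI_Finalize();
  return 0;
}

In a library setting the same trick would live in the pack/unpack routines, with the extra word reserved at the end of each per-destination buffer, so user code never sees it.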
> > Barry > > On Sep 24, 2020, at 12:35 PM, Matthew Knepley wrote: > > On Thu, Sep 24, 2020 at 1:22 PM Chris Hewson wrote: > >> Hi Guys, >> >> Thanks for all of the prompt responses, very helpful and appreciated. >> >> By "when debugging", did you mean when configure petsc --with-debugging=1 >> COPTFLAGS=-O0 -g etc or when you attached a debugger? >> - Both, I have run with a debugger attached and detached, all compiled >> with the following flags "--with-debugging=1 COPTFLAGS=-O0 -g" >> >> 1) Try OpenMPI (probably won't help, but worth trying) >> - Worth a try for sure >> >> 2) Find which part of the simulation makes it non-deterministic. Is it >> the mesh partitioning (parmetis)? Then try to make it deterministic. >> - Good tip, it is the mesh partitioning and along the lines of a question >> from Barry, the matrix size is changing. I will make this deterministic and >> give it a try >> >> 3) Dump matrices, vectors, etc and see when it fails, you can quickly >> reproduce the error by reading in the intermediate data. >> - Also a great suggestion, will give it a try >> >> The full stack would be really useful here. I am guessing this happens on >> MatMult(), but I do not know. >> - Agreed, I am currently running it so that the full stack will be >> produced, but waiting for it to fail, had compiled with all available >> optimizations on, but downside is of course if there is a failure. >> As a general question, roughly what's the performance impact on petsc >> with -o1/-o2/-o0 as opposed to -o3? Performance impact of --with-debugging >> = 1? >> Obviously problem/machine dependant, wondering on guidance more for this >> than anything >> >> Is the nonzero structure of your matrices changing or is it fixed for the >> entire simulation? >> The non-zero structure is changing, although the structures are reformed >> when this happens and this happens thousands of time before the failure has >> occured. >> > > Okay, this is the most likely spot for a bug. How are you changing the > matrix? It should be impossible to put in an invalid column index when > using MatSetValues() > because we check all the inputs. However, I do not think we check when you > just yank out the arrays. > > Thanks, > > Matt > > >> Does this particular run always crash at the same place? Similar place? >> Doesn't always crash? >> Doesn't always crash, but other similar runs have crashed in different >> spots, which makes it difficult to resolve. I am going to try out a few of >> the strategies suggested above and will let you know what comes of that. >> >> *Chris Hewson* >> Senior Reservoir Simulation Engineer >> ResFrac >> +1.587.575.9792 >> >> >> On Thu, Sep 24, 2020 at 11:05 AM Barry Smith wrote: >> >>> Chris, >>> >>> We realize how frustrating this type of problem is to deal with. >>> >>> Here is the code: >>> >>> ierr = >>> PetscTableCreate(aij->B->rmap->n,mat->cmap->N+1,&gid1_lid1);CHKERRQ(ierr); >>> for (i=0; iB->rmap->n; i++) { >>> for (j=0; jilen[i]; j++) { >>> PetscInt data,gid1 = aj[B->i[i] + j] + 1; >>> ierr = PetscTableFind(gid1_lid1,gid1,&data);CHKERRQ(ierr); >>> if (!data) { >>> /* one based table */ >>> ierr = >>> PetscTableAdd(gid1_lid1,gid1,++ec,INSERT_VALUES);CHKERRQ(ierr); >>> } >>> } >>> } >>> >>> It is simply looping over the rows of the sparse matrix putting the >>> columns it finds into the hash table. >>> >>> aj[B->i[i] + j] are the column entries, the number of columns in the >>> matrix is mat->cmap->N so the column entries should always be >>> less than the number of columns. 
The code is crashing when column entry >>> 24443 which is larger than the number of columns 23988. >>> >>> So either the aj[B->i[i] + j] + 1 are incorrect or the mat->cmap->N is >>> incorrect. >>> >>> 640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>> >>>>>>>> >>> if (!mat->was_assembled && mode == MAT_FINAL_ASSEMBLY) { >>> ierr = MatSetUpMultiply_MPIAIJ(mat);CHKERRQ(ierr); >>> } >>> >>> Seems to indicate it is setting up a new multiple because it is either >>> the first time into the algorithm or the nonzero structure changed on some >>> rank requiring a new assembly process. >>> >>> Is the nonzero structure of your matrices changing or is it fixed >>> for the entire simulation? >>> >>> Since the code has been running for a very long time already I have to >>> conclude that this is not the first time through and so something has >>> changed in the matrix? >>> >>> I think we have to put more diagnostics into the library to provide more >>> information before or at the time of the error detection. >>> >>> Does this particular run always crash at the same place? Similar >>> place? Doesn't always crash? >>> >>> Barry >>> >>> >>> >>> >>> On Sep 24, 2020, at 8:46 AM, Chris Hewson wrote: >>> >>> After about a month of not having this issue pop up, it has come up again >>> >>> We have been struggling with a similar PETSc Error for awhile now, the >>> error message is as follows: >>> >>> [7]PETSC ERROR: PetscTableFind() line 132 in >>> /home/chewson/petsc-3.13.3/include/petscctable.h key 24443 is greater than >>> largest key allowed 23988 >>> >>> It is a particularly nasty bug as it doesn't reproduce itself when >>> debugging and doesn't happen all the time with the same inputs either. The >>> problem occurs after a long runtime of the code (12-40 hours) and we are >>> using a ksp solve with KSPBCGS. >>> >>> The PETSc compilation options that are used are: >>> >>> --download-metis >>> --download-mpich >>> --download-mumps >>> --download-parmetis >>> --download-ptscotch >>> --download-scalapack >>> --download-suitesparse >>> --prefix=/opt/anl/petsc-3.13.3 >>> --with-debugging=0 >>> --with-mpi=1 >>> COPTFLAGS=-O3 -march=haswell -mtune=haswell >>> CXXOPTFLAGS=-O3 -march=haswell -mtune=haswell >>> FOPTFLAGS=-O3 -march=haswell -mtune=haswell >>> >>> This is being run across 8 processes and is failing consistently on the >>> rank 7 process. We also use openmp outside of PETSC and the linear solve >>> portion of the code. The rank 0 process is always using compute, during >>> this the slave processes use an MPI_Wait call to wait for the collective >>> parts of the code to be called again. All PETSC calls are done across all >>> of the processes. >>> >>> We are using mpich version 3.3.2, downloaded with the petsc 3.13.3 >>> package. >>> >>> At every PETSC call we are checking the error return from the function >>> collectively to ensure that no errors have been returned from PETSC. >>> >>> Some possible causes that I can think of are as follows: >>> 1. Memory leak causing a corruption either in our program or in petsc or >>> with one of the petsc objects. This seems unlikely as we have checked runs >>> with the option -malloc_dump for PETSc and using valgrind. >>> >>> 2. Optimization flags set for petsc compilation are causing variables >>> that go out of scope to be optimized out. >>> >>> 3. We are giving the wrong number of elements for a process or the value >>> is changing during the simulation. 
This seems unlikely as there is nothing >>> overly unique about these simulations and it's not reproducing itself. >>> >>> 4. An MPI channel or socket error causing an error in the collective >>> values for PETSc. >>> >>> Any input on this issue would be greatly appreciated. >>> >>> *Chris Hewson* >>> Senior Reservoir Simulation Engineer >>> ResFrac >>> +1.587.575.9792 >>> >>> >>> On Thu, Aug 13, 2020 at 4:21 PM Junchao Zhang >>> wrote: >>> >>>> That is a great idea. I'll figure it out. >>>> --Junchao Zhang >>>> >>>> >>>> On Thu, Aug 13, 2020 at 4:31 PM Barry Smith wrote: >>>> >>>>> >>>>> Junchao, >>>>> >>>>> Any way in the PETSc configure to warn that MPICH version is "bad" >>>>> or "untrustworthy" or even the vague "we have suspicians about this version >>>>> and recommend avoiding it"? A lot of time could be saved if others don't >>>>> deal with the same problem. >>>>> >>>>> Maybe add arrays of suspect_versions for OpenMPI, MPICH, etc and >>>>> always check against that list and print a boxed warning at configure time? >>>>> Better you could somehow generalize it and put it in package.py for use by >>>>> all packages, then any package can included lists of "suspect" versions. >>>>> (There are definitely HDF5 versions that should be avoided :-)). >>>>> >>>>> Barry >>>>> >>>>> >>>>> On Aug 13, 2020, at 12:14 PM, Junchao Zhang >>>>> wrote: >>>>> >>>>> Thanks for the update. Let's assume it is a bug in MPI :) >>>>> --Junchao Zhang >>>>> >>>>> >>>>> On Thu, Aug 13, 2020 at 11:15 AM Chris Hewson >>>>> wrote: >>>>> >>>>>> Just as an update to this, I can confirm that using the mpich version >>>>>> (3.3.2) downloaded with the petsc download solved this issue on my end. >>>>>> >>>>>> *Chris Hewson* >>>>>> Senior Reservoir Simulation Engineer >>>>>> ResFrac >>>>>> +1.587.575.9792 >>>>>> >>>>>> >>>>>> On Thu, Jul 23, 2020 at 5:58 PM Junchao Zhang < >>>>>> junchao.zhang at gmail.com> wrote: >>>>>> >>>>>>> On Mon, Jul 20, 2020 at 7:05 AM Barry Smith >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> Is there a comprehensive MPI test suite (perhaps from MPICH)? >>>>>>>> Is there any way to run this full test suite under the problematic MPI and >>>>>>>> see if it detects any problems? >>>>>>>> >>>>>>>> Is so, could someone add it to the FAQ in the debugging >>>>>>>> section? >>>>>>>> >>>>>>> MPICH does have a test suite. It is at the subdir test/mpi of >>>>>>> downloaded mpich >>>>>>> . >>>>>>> It annoyed me since it is not user-friendly. It might be helpful in >>>>>>> catching bugs at very small scale. But say if I want to test allreduce on >>>>>>> 1024 ranks on 100 doubles, I have to hack the test suite. >>>>>>> Anyway, the instructions are here. >>>>>>> >>>>>>> For the purpose of petsc, under test/mpi one can configure it with >>>>>>> $./configure CC=mpicc CXX=mpicxx FC=mpifort --enable-strictmpi >>>>>>> --enable-threads=funneled --enable-fortran=f77,f90 --enable-fast >>>>>>> --disable-spawn --disable-cxx --disable-ft-tests // It is weird I disabled >>>>>>> cxx but I had to set CXX! >>>>>>> $make -k -j8 // -k is to keep going and ignore compilation errors, >>>>>>> e.g., when building tests for MPICH extensions not in MPI standard, but >>>>>>> your MPI is OpenMPI. >>>>>>> $ // edit testlist, remove lines mpi_t, rma, f77, impls. Those are >>>>>>> sub-dirs containing tests for MPI routines Petsc does not rely on. 
>>>>>>> $ make testings or directly './runtests -tests=testlist' >>>>>>> >>>>>>> On a batch system, >>>>>>> $export MPITEST_BATCHDIR=`pwd`/btest // specify a batch dir, >>>>>>> say btest, >>>>>>> $./runtests -batch -mpiexec=mpirun -np=1024 -tests=testlist // Use >>>>>>> 1024 ranks if a test does no specify the number of processes. >>>>>>> $ // It copies test binaries to the batch dir and generates a >>>>>>> script runtests.batch there. Edit the script to fit your batch system and >>>>>>> then submit a job and wait for its finish. >>>>>>> $ cd btest && ../checktests --ignorebogus >>>>>>> >>>>>>> >>>>>>> PS: Fande, changing an MPI fixed your problem does not >>>>>>> necessarily mean the old MPI has bugs. It is complicated. It could be a >>>>>>> petsc bug. You need to provide us a code to reproduce your error. It does >>>>>>> not matter if the code is big. >>>>>>> >>>>>>> >>>>>>>> Thanks >>>>>>>> >>>>>>>> Barry >>>>>>>> >>>>>>>> >>>>>>>> On Jul 20, 2020, at 12:16 AM, Fande Kong >>>>>>>> wrote: >>>>>>>> >>>>>>>> Trace could look like this: >>>>>>>> >>>>>>>> [640]PETSC ERROR: --------------------- Error Message >>>>>>>> -------------------------------------------------------------- >>>>>>>> [640]PETSC ERROR: Argument out of range >>>>>>>> [640]PETSC ERROR: key 45226154 is greater than largest key allowed >>>>>>>> 740521 >>>>>>>> [640]PETSC ERROR: See >>>>>>>> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>>>>>> shooting. >>>>>>>> [640]PETSC ERROR: Petsc Release Version 3.13.3, unknown >>>>>>>> [640]PETSC ERROR: ../../griffin-opt on a arch-moose named r6i5n18 >>>>>>>> by wangy2 Sun Jul 19 17:14:28 2020 >>>>>>>> [640]PETSC ERROR: Configure options --download-hypre=1 >>>>>>>> --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1 >>>>>>>> --download-metis=1 --download-ptscotch=1 --download-parmetis=1 >>>>>>>> --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 >>>>>>>> --download-slepc=1 --with-mpi=1 --with-cxx-dialect=C++11 >>>>>>>> --with-fortran-bindings=0 --with-sowing=0 --with-64-bit-indices >>>>>>>> --download-mumps=0 >>>>>>>> [640]PETSC ERROR: #1 PetscTableFind() line 132 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/include/petscctable.h >>>>>>>> [640]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() line 33 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mmaij.c >>>>>>>> [640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>> [640]PETSC ERROR: #4 MatAssemblyEnd() line 5347 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>>>> [640]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIXAIJ_allatonce() line >>>>>>>> 901 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiptap.c >>>>>>>> [640]PETSC ERROR: #6 MatPtAPNumeric_MPIAIJ_MPIMAIJ_allatonce() line >>>>>>>> 3180 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/maij/maij.c >>>>>>>> [640]PETSC ERROR: #7 MatProductNumeric_PtAP() line 704 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>>>>>> [640]PETSC ERROR: #8 MatProductNumeric() line 759 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>>>>>> [640]PETSC ERROR: #9 MatPtAP() line 9199 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>>>> [640]PETSC ERROR: #10 MatGalerkin() line 10236 in 
>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>>>> [640]PETSC ERROR: #11 PCSetUp_MG() line 745 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/mg/mg.c >>>>>>>> [640]PETSC ERROR: #12 PCSetUp_HMG() line 220 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/hmg/hmg.c >>>>>>>> [640]PETSC ERROR: #13 PCSetUp() line 898 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/interface/precon.c >>>>>>>> [640]PETSC ERROR: #14 KSPSetUp() line 376 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>> [640]PETSC ERROR: #15 KSPSolve_Private() line 633 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>> [640]PETSC ERROR: #16 KSPSolve() line 853 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>> [640]PETSC ERROR: #17 SNESSolve_NEWTONLS() line 225 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/impls/ls/ls.c >>>>>>>> [640]PETSC ERROR: #18 SNESSolve() line 4519 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/interface/snes.c >>>>>>>> >>>>>>>> On Sun, Jul 19, 2020 at 11:13 PM Fande Kong >>>>>>>> wrote: >>>>>>>> >>>>>>>>> I am not entirely sure what is happening, but we encountered >>>>>>>>> similar issues recently. It was not reproducible. It might occur at >>>>>>>>> different stages, and errors could be weird other than "ctable stuff." Our >>>>>>>>> code was Valgrind clean since every PR in moose needs to go through >>>>>>>>> rigorous Valgrind checks before it reaches the devel branch. The errors >>>>>>>>> happened when we used mvapich. >>>>>>>>> >>>>>>>>> We changed to use HPE-MPT (a vendor stalled MPI), then everything >>>>>>>>> was smooth. May you try a different MPI? It is better to try a system >>>>>>>>> carried one. >>>>>>>>> >>>>>>>>> We did not get the bottom of this problem yet, but we at least >>>>>>>>> know this is kind of MPI-related. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Fande, >>>>>>>>> >>>>>>>>> >>>>>>>>> On Sun, Jul 19, 2020 at 3:28 PM Chris Hewson >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I am having a bug that is occurring in PETSC with the return >>>>>>>>>> string: >>>>>>>>>> >>>>>>>>>> [7]PETSC ERROR: PetscTableFind() line 132 in >>>>>>>>>> /home/chewson/petsc-3.13.2/include/petscctable.h key 7556 is greater than >>>>>>>>>> largest key allowed 5693 >>>>>>>>>> >>>>>>>>>> This is using petsc-3.13.2, compiled and running using mpich with >>>>>>>>>> -O3 and debugging turned off tuned to the haswell architecture and >>>>>>>>>> occurring either before or during a KSPBCGS solve/setup or during a MUMPS >>>>>>>>>> factorization solve (I haven't been able to replicate this issue with the >>>>>>>>>> same set of instructions etc.). >>>>>>>>>> >>>>>>>>>> This is a terrible way to ask a question, I know, and not very >>>>>>>>>> helpful from your side, but this is what I have from a user's run and can't >>>>>>>>>> reproduce on my end (either with the optimization compilation or with >>>>>>>>>> debugging turned on). This happens when the code has run for quite some >>>>>>>>>> time and is happening somewhat rarely. >>>>>>>>>> >>>>>>>>>> More than likely I am using a static variable (code is written in >>>>>>>>>> c++) that I'm not updating when the matrix size is changing or something >>>>>>>>>> silly like that, but any help or guidance on this would be appreciated. 
>>>>>>>>>> >>>>>>>>>> *Chris Hewson* >>>>>>>>>> Senior Reservoir Simulation Engineer >>>>>>>>>> ResFrac >>>>>>>>>> +1.587.575.9792 >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>> >>> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From zhangc20 at rpi.edu Thu Sep 24 14:48:51 2020 From: zhangc20 at rpi.edu (Zhang, Chonglin) Date: Thu, 24 Sep 2020 19:48:51 +0000 Subject: [petsc-users] Proper GPU usage in PETSc In-Reply-To: References: <09E01C0C-808B-489E-85AF-E9EF568B3F01@rpi.edu> Message-ID: <65CDCC48-C177-4439-A214-DAE4671839C5@rpi.edu> Hi Matt, Thanks for the comments and the nice example code. Right now our objective is to use XGC unstructured flux-surface-following mesh (fixed in size), I will keep your comment on mesh refinement in mind. Thanks! Chonglin On Sep 24, 2020, at 3:26 PM, Matthew Knepley > wrote: On Thu, Sep 24, 2020 at 3:08 PM Zhang, Chonglin > wrote: Hi Matt, I will quickly summarize what I found with ?CreateMesh" for running ex12 here: https://gitlab.com/petsc/petsc/-/blob/master/src/snes/tutorials/ex12.c. If this is not a proper threads to discuss this, I can open a new one. Commands used (relevant to mesh creation) to run ex12 (quad core desktop computer with CPU only, 4 MPI ranks): mpirun -np 4 -cells 100, 100, 0 -options_left -log_view I built PETSc commit: 2bbfc05, dated Sep 23, 2020, with debug=no. Mesh size CreateMesh (seconds) DMPlexDistribute (seconds) 100 *100 0.14 0.081 500 *500 2.28 1.33 1000*1000 10.1 5.10 2000*1000 24.6 10.96 2000*2000 73.7 22.72 Is the performance reasonable for the ?CreateMesh? functionality? Anything I am not doing correctly with DMPlex running ex12? ex12 is a little old. I have been meaning to update it. ex13 does the same thing in a more modern way. Above looks reasonable I think. The CreateMesh time includes generating the mesh using Triangle, since simplex is the default. In example 12, you could use -simplex 0 or in ex13 -dm_plex_box_simplex 0 to get hexes, which do not use a generator. Second, you are interpolating all on process 0, which is probably the bulk of the time. I do that because I do not care about parallel performance in the examples and it is simpler. You can also refine the mesh after distribution, which is faster, and cuts down on the distribution time. So if you want the whole thing, you could use DM odm, dm; /* Create a cell-vertex box mesh */ ierr = DMPlexCreateBoxMesh(comm, 2, PETSC_TRUE, NULL, NULL, NULL, NULL, PETSC_FALSE, &odm);CHKERRQ(ierr); ierr = PetscObjectSetOptionsPrefix((PetscObject) dm, "orig_");CHKERRQ(ierr); /* Distributes the mesh here */ ierr = DMSetFromOptions(odm);CHKERRQ(ierr); /* Interpolate the mesh */ ierr = DMPlexInterpolate(odm, &dm);CHKERRQ(ierr); ierr = DMDestroy(&odm);CHKERRQ(ierr); /* Refine the mesh */ ierr = DMSetFromOptions(dm);CHKERRQ(ierr); and run with -dm_plex_box_simplex 0 -dm_plex_box_faces 100,100 -orig_dm_distribute -dm_refine 3 Thanks, Matt Thanks! 
Chonglin On Sep 24, 2020, at 2:06 PM, Matthew Knepley > wrote: On Thu, Sep 24, 2020 at 2:04 PM Mark Adams > wrote: On Thu, Sep 24, 2020 at 1:38 PM Matthew Knepley > wrote: On Thu, Sep 24, 2020 at 12:48 PM Zhang, Chonglin > wrote: Thanks Mark and Barry! A quick try of using ?-pc_type jacobi? did reduce the number of count for ?CpuToGpu? and ?GpuToCpu?, although using ?-pc_type gamg? (the counts did not decrease in this case) solves the problem faster (may not be of any meaning since the problem size is too small; the function ?DMPlexCreateFromCellListParallelPetsc()" is slow for large problem size preventing running larger problems, separate issue). It sounds like something is wrong then, or I do not understand what you mean by slow. sor may be the default so you need to set the -mg_level_ksp[pc]_type chebyshev[jacobi]. chebyshev is the ksp default. I meant for the mesh creation. Thanks, Matt Thanks, Matt Would this ?CpuToGpu? and ?GpuToCpu? data transfer contribute a significant amount of time for a realistic sized problem, say for example a linear problem with ~1-2 million DOFs? Also, is there any plan to have the SNES and DMPlex code run on GPU? Thanks! Chonglin On Sep 24, 2020, at 12:17 PM, Barry Smith > wrote: MatSOR() runs on the CPU, this causes copy to CPU for each application of MatSOR() and then a copy to GPU for the next step. You can try, for example -pc_type jacobi better yet use PCGAMG if it amenable for your problem. Also the problem is way to small for a GPU. There will be copies between the GPU/CPU for each SNES iteration since the DMPLEX code does not run on GPUs. Barry On Sep 24, 2020, at 10:08 AM, Zhang, Chonglin > wrote: Dear PETSc Users, I have some questions regarding the proper GPU usage. I would like to know the proper way to: (1) solve linear equation in SNES, using GPU in PETSc; what syntax/arguments should I be using; (2) how to avoid/reduce the ?CpuToGpu count? and ?GpuToCpu count? data transfer showed in PETSc log file, when using CUDA aware MPI. Details of what I am doing now and my observations are below: System and compilers used: (1) RPI?s AiMOS computer (node wise, it is the same as Summit); (2) using GCC 7.4.0 and Spectrum-MPI 10.3. I am doing the followings to solve the linear Poisson equation with SNES interface, under DMPlex: (1) using DMPlex to set up the unstructured mesh; (2) using DM to create vector and matrix; (3) using SNES interface to solve the linear Poisson equation, with ?-snes_type ksponly?; (4) using ?dm_vec_type cuda?, ?dm_mat_type aijcusparse ? to use GPU vector and matrix, as suggested in this webpage: https://www.mcs.anl.gov/petsc/features/gpus.html (5) using ?use_gpu_aware_mpi? with PETSc, and using `mpirun -gpu` to enable GPU-Direct ( similar as "srun --smpiargs=?-gpu?" for Summit): https://secure.cci.rpi.edu/wiki/Slurm/#gpu-direct; https://www.olcf.ornl.gov/wp-content/uploads/2018/11/multi-gpu-workshop.pdf (6) using ?-options_left? to check and make sure all the arguments are accepted and used by PETSc. (7) After problem setup, I am running the ?SNESSolve()? multiple times to solve the linear problem and observe the log file with ?-log_view" I noticed that if I run ?SNESSolve()? 500 times, instead of 50 times, the ?CpuToGpu count? and/or ?GpuToCpu count? increased roughly 10 times for some of the operations: SNESSolve, MatSOR, VecMDot, VecCUDACopyTo, VecCUDACopyFrom, MatCUSPARSCopyTo. 
See below for a truncated log corresponding to running SNESSolve() 500 times: Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F --------------------------------------------------------------------------------------------------------------------------------------------------------------- --- Event Stage 0: Main Stage BuildTwoSided 510 1.0 4.9205e-03 1.1 0.00e+00 0.0 3.5e+01 4.0e+00 1.0e+03 0 0 0 0 0 0 0 21 0 0 0 0 0 0.00e+00 0 0.00e+00 0 BuildTwoSidedF 501 1.0 1.0199e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+03 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 SNESSolve 500 1.0 3.2570e+02 1.0 1.18e+10 1.0 0.0e+00 0.0e+00 8.7e+05100100 0 0100 100100 0 0100 144 202 31947 7.82e+02 63363 1.44e+03 82 SNESSetUp 1 1.0 6.0082e-04 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 SNESFunctionEval 500 1.0 3.9826e+01 1.0 3.60e+08 1.0 0.0e+00 0.0e+00 5.0e+02 12 3 0 0 0 12 3 0 0 0 36 13 0 0.00e+00 1000 2.48e+01 0 SNESJacobianEval 500 1.0 4.8200e+01 1.0 5.97e+08 1.0 0.0e+00 0.0e+00 2.0e+03 15 5 0 0 0 15 5 0 0 0 50 0 1000 7.77e+01 500 1.24e+01 0 DMPlexResidualFE 500 1.0 3.6923e+01 1.1 3.56e+08 1.0 0.0e+00 0.0e+00 0.0e+00 10 3 0 0 0 10 3 0 0 0 39 0 0 0.00e+00 500 1.24e+01 0 DMPlexJacobianFE 500 1.0 4.6013e+01 1.0 5.95e+08 1.0 0.0e+00 0.0e+00 2.0e+03 14 5 0 0 0 14 5 0 0 0 52 0 1000 7.77e+01 0 0.00e+00 0 MatSOR 30947 1.0 3.1254e+00 1.1 1.21e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 10 0 0 0 1 10 0 0 0 1542 0 0 0.00e+00 61863 1.41e+03 0 MatAssemblyBegin 511 1.0 5.3428e+00256.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+03 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 MatAssemblyEnd 511 1.0 4.3440e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.1e+01 0 0 0 0 0 0 0 0 0 0 0 0 1002 7.80e+01 0 0.00e+00 0 MatCUSPARSCopyTo 1002 1.0 3.6557e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 1002 7.80e+01 0 0.00e+00 0 VecMDot 29930 1.0 3.7843e+01 1.0 2.62e+09 1.0 0.0e+00 0.0e+00 6.0e+04 12 22 0 0 7 12 22 0 0 7 277 3236 29930 6.81e+02 0 0.00e+00 100 VecNorm 31447 1.0 2.1164e+01 1.4 1.79e+08 1.0 0.0e+00 0.0e+00 6.3e+04 5 2 0 0 7 5 2 0 0 7 34 55 1017 2.31e+01 0 0.00e+00 100 VecNormalize 30947 1.0 2.3957e+01 1.1 2.65e+08 1.0 0.0e+00 0.0e+00 6.2e+04 7 2 0 0 7 7 2 0 0 7 44 51 1017 2.31e+01 0 0.00e+00 100 VecCUDACopyTo 30947 1.0 7.8866e+00 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 0 30947 7.04e+02 0 0.00e+00 0 VecCUDACopyFrom 63363 1.0 1.0873e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 63363 1.44e+03 0 KSPSetUp 500 1.0 2.2737e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 KSPSolve 500 1.0 2.3687e+02 1.0 1.08e+10 1.0 0.0e+00 0.0e+00 8.6e+05 72 92 0 0 99 73 92 0 0 99 182 202 30947 7.04e+02 61863 1.41e+03 89 KSPGMRESOrthog 29930 1.0 1.8920e+02 1.0 7.87e+09 1.0 0.0e+00 0.0e+00 6.4e+05 58 67 0 0 74 58 67 0 0 74 166 209 29930 6.81e+02 0 0.00e+00 100 PCApply 30947 1.0 3.1555e+00 1.1 1.21e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 10 0 0 0 1 10 0 0 0 1527 0 0 0.00e+00 61863 1.41e+03 0 Thanks! Chonglin -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Sep 24 15:01:37 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 24 Sep 2020 16:01:37 -0400 Subject: [petsc-users] Proper GPU usage in PETSc In-Reply-To: <65CDCC48-C177-4439-A214-DAE4671839C5@rpi.edu> References: <09E01C0C-808B-489E-85AF-E9EF568B3F01@rpi.edu> <65CDCC48-C177-4439-A214-DAE4671839C5@rpi.edu> Message-ID: On Thu, Sep 24, 2020 at 3:48 PM Zhang, Chonglin wrote: > Hi Matt, > > Thanks for the comments and the nice example code. Right now our objective > is to use XGC unstructured flux-surface-following mesh (fixed in size), I > will keep your comment on mesh refinement in mind. > Okay. If you load the mesh in parallel, then interpolation is done in parallel too, so that is fine. The main time in parallel load is then the redistribution of vertices. Vaclav wrote a nice paper about it: https://arxiv.org/abs/2004.08729 Thanks, Matt > Thanks! > Chonglin > > On Sep 24, 2020, at 3:26 PM, Matthew Knepley wrote: > > On Thu, Sep 24, 2020 at 3:08 PM Zhang, Chonglin wrote: > >> Hi Matt, >> >> I will quickly summarize what I found with ?CreateMesh" for running ex12 >> here: >> https://gitlab.com/petsc/petsc/-/blob/master/src/snes/tutorials/ex12.c. >> If this is not a proper threads to discuss this, I can open a new one. >> >> Commands used (relevant to mesh creation) to run ex12 (quad core desktop >> computer with CPU only, 4 MPI ranks): >> mpirun -np 4 -cells 100, 100, 0 -options_left -log_view >> I built PETSc commit: 2bbfc05, dated Sep 23, 2020, with debug=no. >> >> Mesh size CreateMesh (seconds) DMPlexDistribute (seconds) >> 100 *100 0.14 0.081 >> 500 *500 2.28 1.33 >> 1000*1000 10.1 5.10 >> 2000*1000 24.6 10.96 >> 2000*2000 73.7 22.72 >> >> Is the performance reasonable for the ?CreateMesh? functionality? >> >> Anything I am not doing correctly with DMPlex running ex12? >> > > ex12 is a little old. I have been meaning to update it. ex13 does the same > thing in a more modern way. > > Above looks reasonable I think. The CreateMesh time includes generating > the mesh using Triangle, since simplex is the > default. In example 12, you could use > > -simplex 0 > > or in ex13 > > -dm_plex_box_simplex 0 > > to get hexes, which do not use a generator. Second, you are interpolating > all on process 0, which is probably > the bulk of the time. I do that because I do not care about parallel > performance in the examples and it is simpler. > You can also refine the mesh after distribution, which is faster, and cuts > down on the distribution time. 
So if you > want the whole thing, you could use > > DM odm, dm; > > /* Create a cell-vertex box mesh */ > ierr = DMPlexCreateBoxMesh(comm, 2, PETSC_TRUE, NULL, NULL, NULL, NULL, > PETSC_FALSE, &odm);CHKERRQ(ierr); > ierr = PetscObjectSetOptionsPrefix((PetscObject) dm, > "orig_");CHKERRQ(ierr); > /* Distributes the mesh here */ > ierr = DMSetFromOptions(odm);CHKERRQ(ierr); > /* Interpolate the mesh */ > ierr = DMPlexInterpolate(odm, &dm);CHKERRQ(ierr); > ierr = DMDestroy(&odm);CHKERRQ(ierr); > /* Refine the mesh */ > ierr = DMSetFromOptions(dm);CHKERRQ(ierr); > > and run with > > -dm_plex_box_simplex 0 -dm_plex_box_faces 100,100 -orig_dm_distribute > -dm_refine 3 > > Thanks, > > Matt > > Thanks! >> Chonglin >> >> On Sep 24, 2020, at 2:06 PM, Matthew Knepley wrote: >> >> On Thu, Sep 24, 2020 at 2:04 PM Mark Adams wrote: >> >>> On Thu, Sep 24, 2020 at 1:38 PM Matthew Knepley >>> wrote: >>> >>>> On Thu, Sep 24, 2020 at 12:48 PM Zhang, Chonglin >>>> wrote: >>>> >>>>> Thanks Mark and Barry! >>>>> >>>>> A quick try of using ?-pc_type jacobi? did reduce the number of count >>>>> for ?CpuToGpu? and ?GpuToCpu?, although using ?-pc_type gamg? (the counts >>>>> did not decrease in this case) solves the problem faster (may not be of any >>>>> meaning since the problem size is too small; the function >>>>> ?DMPlexCreateFromCellListParallelPetsc()" is slow for large problem size >>>>> preventing running larger problems, separate issue). >>>>> >>>> >>>> It sounds like something is wrong then, or I do not understand what you >>>> mean by slow. >>>> >>> >>> sor may be the default so you need to set the -mg_level_ksp[pc]_type >>> chebyshev[jacobi]. chebyshev is the ksp default. >>> >> >> I meant for the mesh creation. >> >> Thanks, >> >> Matt >> >> >>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Would this ?CpuToGpu? and ?GpuToCpu? data transfer contribute a >>>>> significant amount of time for a realistic sized problem, say for example a >>>>> linear problem with ~1-2 million DOFs? >>>>> >>>>> Also, is there any plan to have the SNES and DMPlex code run on GPU? >>>>> >>>>> Thanks! >>>>> Chonglin >>>>> >>>>> On Sep 24, 2020, at 12:17 PM, Barry Smith wrote: >>>>> >>>>> >>>>> MatSOR() runs on the CPU, this causes copy to CPU for each >>>>> application of MatSOR() and then a copy to GPU for the next step. >>>>> >>>>> You can try, for example -pc_type jacobi better yet use PCGAMG if >>>>> it amenable for your problem. >>>>> >>>>> Also the problem is way to small for a GPU. >>>>> >>>>> There will be copies between the GPU/CPU for each SNES iteration >>>>> since the DMPLEX code does not run on GPUs. >>>>> >>>>> Barry >>>>> >>>>> >>>>> >>>>> On Sep 24, 2020, at 10:08 AM, Zhang, Chonglin >>>>> wrote: >>>>> >>>>> Dear PETSc Users, >>>>> >>>>> I have some questions regarding the proper GPU usage. I would like to >>>>> know the proper way to: >>>>> (1) solve linear equation in SNES, using GPU in PETSc; what >>>>> syntax/arguments should I be using; >>>>> (2) how to avoid/reduce the ?CpuToGpu count? and ?GpuToCpu count? data >>>>> transfer showed in PETSc log file, when using CUDA aware MPI. >>>>> >>>>> >>>>> Details of what I am doing now and my observations are below: >>>>> >>>>> System and compilers used: >>>>> (1) RPI?s AiMOS computer (node wise, it is the same as Summit); >>>>> (2) using GCC 7.4.0 and Spectrum-MPI 10.3. 
>>>>> >>>>> I am doing the followings to solve the linear Poisson equation with >>>>> SNES interface, under DMPlex: >>>>> (1) using DMPlex to set up the unstructured mesh; >>>>> (2) using DM to create vector and matrix; >>>>> (3) using SNES interface to solve the linear Poisson equation, with >>>>> ?-snes_type ksponly?; >>>>> (4) using ?dm_vec_type cuda?, ?dm_mat_type aijcusparse ? to use GPU >>>>> vector and matrix, as suggested in this webpage: >>>>> https://www.mcs.anl.gov/petsc/features/gpus.html >>>>> (5) using ?use_gpu_aware_mpi? with PETSc, and using `mpirun -gpu` to >>>>> enable GPU-Direct ( similar as "srun --smpiargs=?-gpu?" for Summit): >>>>> https://secure.cci.rpi.edu/wiki/Slurm/#gpu-direct; >>>>> https://www.olcf.ornl.gov/wp-content/uploads/2018/11/multi-gpu-workshop.pdf >>>>> (6) using ?-options_left? to check and make sure all the arguments are >>>>> accepted and used by PETSc. >>>>> (7) After problem setup, I am running the ?SNESSolve()? multiple times >>>>> to solve the linear problem and observe the log file with ?-log_view" >>>>> >>>>> I noticed that if I run ?SNESSolve()? 500 times, instead of 50 times, >>>>> the ?CpuToGpu count? and/or ?GpuToCpu count? increased roughly 10 times for >>>>> some of the operations: SNESSolve, MatSOR, VecMDot, VecCUDACopyTo, >>>>> VecCUDACopyFrom, MatCUSPARSCopyTo. See below for a truncated log >>>>> corresponding to running SNESSolve() 500 times: >>>>> >>>>> >>>>> Event Count Time (sec) Flop >>>>> --- Global --- --- Stage ---- Total GPU - CpuToGpu - - >>>>> GpuToCpu - GPU >>>>> Max Ratio Max Ratio Max Ratio Mess >>>>> AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size >>>>> Count Size %F >>>>> >>>>> --------------------------------------------------------------------------------------------------------------------------------------------------------------- >>>>> >>>>> --- Event Stage 0: Main Stage >>>>> >>>>> BuildTwoSided 510 1.0 4.9205e-03 1.1 0.00e+00 0.0 3.5e+01 >>>>> 4.0e+00 1.0e+03 0 0 0 0 0 0 0 21 0 0 0 0 0 >>>>> 0.00e+00 0 0.00e+00 0 >>>>> BuildTwoSidedF 501 1.0 1.0199e-02 1.4 0.00e+00 0.0 0.0e+00 >>>>> 0.0e+00 1.0e+03 0 0 0 0 0 0 0 0 0 0 0 0 0 >>>>> 0.00e+00 0 0.00e+00 0 >>>>> SNESSolve 500 1.0 3.2570e+02 1.0 1.18e+10 1.0 0.0e+00 >>>>> 0.0e+00 8.7e+05100100 0 0100 100100 0 0100 144 202 31947 >>>>> 7.82e+02 63363 1.44e+03 82 >>>>> SNESSetUp 1 1.0 6.0082e-04 1.7 0.00e+00 0.0 0.0e+00 >>>>> 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 >>>>> 0.00e+00 0 0.00e+00 0 >>>>> SNESFunctionEval 500 1.0 3.9826e+01 1.0 3.60e+08 1.0 0.0e+00 >>>>> 0.0e+00 5.0e+02 12 3 0 0 0 12 3 0 0 0 36 13 0 >>>>> 0.00e+00 1000 2.48e+01 0 >>>>> SNESJacobianEval 500 1.0 4.8200e+01 1.0 5.97e+08 1.0 0.0e+00 >>>>> 0.0e+00 2.0e+03 15 5 0 0 0 15 5 0 0 0 50 0 1000 >>>>> 7.77e+01 500 1.24e+01 0 >>>>> DMPlexResidualFE 500 1.0 3.6923e+01 1.1 3.56e+08 1.0 0.0e+00 >>>>> 0.0e+00 0.0e+00 10 3 0 0 0 10 3 0 0 0 39 0 0 >>>>> 0.00e+00 500 1.24e+01 0 >>>>> DMPlexJacobianFE 500 1.0 4.6013e+01 1.0 5.95e+08 1.0 0.0e+00 >>>>> 0.0e+00 2.0e+03 14 5 0 0 0 14 5 0 0 0 52 0 1000 >>>>> 7.77e+01 0 0.00e+00 0 >>>>> MatSOR 30947 1.0 3.1254e+00 1.1 1.21e+09 1.0 0.0e+00 >>>>> 0.0e+00 0.0e+00 1 10 0 0 0 1 10 0 0 0 1542 0 0 >>>>> 0.00e+00 61863 1.41e+03 0 >>>>> MatAssemblyBegin 511 1.0 5.3428e+00256.4 0.00e+00 0.0 0.0e+00 >>>>> 0.0e+00 2.0e+03 1 0 0 0 0 1 0 0 0 0 0 0 0 >>>>> 0.00e+00 0 0.00e+00 0 >>>>> MatAssemblyEnd 511 1.0 4.3440e-02 1.2 0.00e+00 0.0 0.0e+00 >>>>> 0.0e+00 2.1e+01 0 0 0 0 0 0 0 0 0 0 0 0 1002 >>>>> 7.80e+01 0 0.00e+00 0 >>>>> MatCUSPARSCopyTo 1002 1.0 3.6557e-02 
1.2 0.00e+00 0.0 0.0e+00 >>>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 1002 >>>>> 7.80e+01 0 0.00e+00 0 >>>>> VecMDot 29930 1.0 3.7843e+01 1.0 2.62e+09 1.0 0.0e+00 >>>>> 0.0e+00 6.0e+04 12 22 0 0 7 12 22 0 0 7 277 3236 29930 >>>>> 6.81e+02 0 0.00e+00 100 >>>>> VecNorm 31447 1.0 2.1164e+01 1.4 1.79e+08 1.0 0.0e+00 >>>>> 0.0e+00 6.3e+04 5 2 0 0 7 5 2 0 0 7 34 55 1017 >>>>> 2.31e+01 0 0.00e+00 100 >>>>> VecNormalize 30947 1.0 2.3957e+01 1.1 2.65e+08 1.0 0.0e+00 >>>>> 0.0e+00 6.2e+04 7 2 0 0 7 7 2 0 0 7 44 51 1017 >>>>> 2.31e+01 0 0.00e+00 100 >>>>> VecCUDACopyTo 30947 1.0 7.8866e+00 3.4 0.00e+00 0.0 0.0e+00 >>>>> 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 0 30947 >>>>> 7.04e+02 0 0.00e+00 0 >>>>> VecCUDACopyFrom 63363 1.0 1.0873e+00 1.1 0.00e+00 0.0 0.0e+00 >>>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 >>>>> 0.00e+00 63363 1.44e+03 0 >>>>> KSPSetUp 500 1.0 2.2737e-03 1.1 0.00e+00 0.0 0.0e+00 >>>>> 0.0e+00 5.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 >>>>> 0.00e+00 0 0.00e+00 0 >>>>> KSPSolve 500 1.0 2.3687e+02 1.0 1.08e+10 1.0 0.0e+00 >>>>> 0.0e+00 8.6e+05 72 92 0 0 99 73 92 0 0 99 182 202 30947 >>>>> 7.04e+02 61863 1.41e+03 89 >>>>> KSPGMRESOrthog 29930 1.0 1.8920e+02 1.0 7.87e+09 1.0 0.0e+00 >>>>> 0.0e+00 6.4e+05 58 67 0 0 74 58 67 0 0 74 166 209 29930 >>>>> 6.81e+02 0 0.00e+00 100 >>>>> PCApply 30947 1.0 3.1555e+00 1.1 1.21e+09 1.0 0.0e+00 >>>>> 0.0e+00 0.0e+00 1 10 0 0 0 1 10 0 0 0 1527 0 0 >>>>> 0.00e+00 61863 1.41e+03 0 >>>>> >>>>> >>>>> Thanks! >>>>> Chonglin >>>>> >>>>> >>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sblondel at utk.edu Thu Sep 24 15:02:59 2020 From: sblondel at utk.edu (Blondel, Sophie) Date: Thu, 24 Sep 2020 20:02:59 +0000 Subject: [petsc-users] Matrix Free Method questions In-Reply-To: References: <5BDE8465-76BE-4132-BF4E-6784548AADC0@petsc.dev> <3329269A-EB37-41C9-9698-BA4631A1E18A@petsc.dev> <3E68F0AF-2F7D-4394-894A-3099EC80B9BC@petsc.dev> <600E6AA4-9534-4B39-B7E0-0218AB02E19A@petsc.dev> <60260FA5-BDAE-4F18-8310-D0F3C03B318D@petsc.dev> <4BA6D58A-89C3-44E2-AF34-F4AE94211DC4@petsc.dev> <6DEA0A2F-3020-4C3D-8726-7BE6346B86BB@petsc.dev> <56056045-E253-44BE-AE4C-7EFE44D867ED@petsc.dev> , Message-ID: Hi Barry, I probably should have sent this output before (with -log_view_memory to get an idea of where the vectors are created). I looked at it but it doesn't help me much... 
Cheers, Sophie ________________________________ From: Barry Smith Sent: Wednesday, September 16, 2020 16:38 To: Blondel, Sophie Cc: petsc-users at mcs.anl.gov ; xolotl-psi-development at lists.sourceforge.net Subject: Re: [petsc-users] Matrix Free Method questions Yikes, GAMG is using a lot of vectors. But many of these are much smaller vectors so not of major concern. I think this will just have to be an ongoing issue to see where the vectors are created internally and reuse or eliminate as many extra as possible. The option -log_view_memory causes the PETSc logging summary to print additional columns showing the memory allocated during the different events in PETSc. This can be useful to see "when" the memory is mostly created; it does not tell us "why" it is created but at least tells us were to look. Barry On Sep 16, 2020, at 1:54 PM, Blondel, Sophie > wrote: Hi Barry, I don't think we're explicitly creating many PETSc vectors in Xolotl. There is a global one created for the solution when the TS is set up, and local ones in RHSFunction and RHSJacobian; everywhere else we just get the array from it with DMDAVecGetArrayDOF and DMDAVecRestoreArrayDOF. I tried a few things to see if it changed the number of Vec from 85 (removing monitors, fewer time steps, fewer MPI tasks) but it stayed the same, except when I changed the PC option from "-fieldsplit_1_pc_type redundant" to "-fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type gmres -ksp_type fgmres -fieldsplit_1_pc_gamg_threshold -1" where I got 10567 vectors. Cheers, Sophie ________________________________ From: Barry Smith > Sent: Tuesday, September 15, 2020 18:37 To: Blondel, Sophie > Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Subject: Re: [petsc-users] Matrix Free Method questions Sophie, Great, everything looks good. So the new version takes about 7 times longer, due to the relatively modest increase (about 25 percent) in the number of iterations from the poorer preconditioner convergence and the rest from the much slower matrix-vector product due to using matrix free instead of matrix based precondtioner. Both of these are expected. The matrix is taking about 10% of the memory it used to require, also expected. I noticed in the logging the memory for the vectors Vector 85 85 82303208 0. Matrix 15 15 8744032 0. is substantial/huge, with the much smaller matrix memory the vector memory dominates. It indicates 85 vectors are used. This is a large number, there are some needed for the TS (maybe 5?) and some needed for the KSP solve (maybe about 37) but I am not sure why there are so many. Perhaps this number could be reduced. Are there are lot of vectors created in the Xolotyl code? I would it could run with about 45 vectors. Barry On Sep 15, 2020, at 5:12 PM, Blondel, Sophie > wrote: Hi Barry, I fixed everything and re-ran the 4 cases in 1D. They took more time than before because I used the Kokkos serial backend on the Xolotl side instead of the CUDA one previously (long story short, I tried to update CUDA and messed up the whole installation). Step 4 looks much better than prevously, I was even able to remove MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE) from the code and it ran without throwing errors. The log files are attached. 
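As an aside on the vector-count discussion above: the access pattern described earlier in this thread (DMDAVecGetArrayDOF / DMDAVecRestoreArrayDOF on borrowed work vectors) looks roughly like the sketch below. This is only an illustration, not Xolotl's actual code; the names da, C, and localC are placeholders, and the example assumes a 1D DMDA with several DOF per grid point. Reusing work vectors through DMGetLocalVector()/DMRestoreLocalVector(), rather than creating new Vecs, is one way to keep the vector count down.

  Vec            localC;
  PetscScalar  **c;      /* [i][dof] for a 1D DMDA; add one more * per extra dimension */
  PetscErrorCode ierr;

  /* Borrow a ghosted work vector from the DM; it is returned to a pool, not destroyed. */
  ierr = DMGetLocalVector(da, &localC);CHKERRQ(ierr);
  ierr = DMGlobalToLocalBegin(da, C, INSERT_VALUES, localC);CHKERRQ(ierr);
  ierr = DMGlobalToLocalEnd(da, C, INSERT_VALUES, localC);CHKERRQ(ierr);
  /* View the local vector as an array indexed by grid point and component. */
  ierr = DMDAVecGetArrayDOF(da, localC, &c);CHKERRQ(ierr);
  /* ... read or write c[i][k] for grid point i, component k ... */
  ierr = DMDAVecRestoreArrayDOF(da, localC, &c);CHKERRQ(ierr);
  ierr = DMRestoreLocalVector(da, &localC);CHKERRQ(ierr);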
Cheers, Sophie ________________________________ From: Barry Smith > Sent: Friday, September 11, 2020 18:03 To: Blondel, Sophie > Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Subject: Re: [petsc-users] Matrix Free Method questions On Sep 11, 2020, at 7:45 AM, Blondel, Sophie > wrote: Thank you Barry, Step 3 worked after I moved MatSetOption at the beginning of computeJacobian(). Attached is the updated log which is pretty similar to what I had before. Step 4 still uses many more iterations. I checked the Jacobian using -ksp_view_pmat ascii (on a simpler case), I can see the difference between step 3 and 4 is that the contribution from the reactions is not included in the step 4 Jacobian (as expected from the fact that I removed their setting from the code). Looking back at one of your previous email, you wrote "This routine should only compute the elements of the Jacobian needed for this reduced matrix Jacobian, so the diagonals and the diffusion/convection terms. ", does it mean that I should still include the contributions from the reactions that affect the pure diagonal terms? Yes, you need to leave in everything that affects the diagonal otherwise the "Jacobi" preconditioner will not reflect the true Jacobi preconditioner and likely perform poorly. Barry Cheers, Sophie ________________________________ From: Barry Smith > Sent: Thursday, September 10, 2020 17:04 To: Blondel, Sophie > Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Subject: Re: [petsc-users] Matrix Free Method questions On Sep 10, 2020, at 2:46 PM, Blondel, Sophie > wrote: Hi Barry, Going through the different changes again to understand what was going wrong with the last step, I discovered that my changes from 2 to 3 (keeping only the pure diagonal for the reaction Jacobian setup and adding MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) were wrong: the sparsity of the matrix was correct but then the RHSJacobian method was wrong. I updated it I'm not sure what you mean here. My hope was that in step 3 you won't need to change RHSJacobian at all (that is just for step 4). but now when I run step 3 again I get the following error: [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [2]PETSC ERROR: Argument out of range [2]PETSC ERROR: Inserting a new nonzero at global row/column (310400, 316825) into matrix [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [2]PETSC ERROR: Petsc Development GIT revision: v3.13.4-885-gf58a62b032 GIT Date: 2020-09-01 13:07:58 -0500 [2]PETSC ERROR: Unknown Name on a 20200902 named iguazu by bqo Thu Sep 10 15:38:58 2020 [2]PETSC ERROR: Configure options PETSC_DIR=/home2/bqo/libraries/petsc-barry PETSC_ARCH=20200902 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no --with-shared-libraries --download-fblaslapack=1 [2]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 606 in /home2/bqo/libraries/petsc-barry/src/mat/impls/aij/mpi/mpiaij.c [2]PETSC ERROR: #2 MatSetValues() line 1392 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c [2]PETSC ERROR: #3 MatSetValuesLocal() line 2207 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c [2]PETSC ERROR: #4 MatSetValuesStencil() line 1595 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c PetscSolverExpHandler::computeJacobian: MatSetValuesStencil (reactions) failed. 
Because the RHSJacobian method is trying to update the elements corresponding to the reactions. I'm not sure I understood correctly what step 3 was supposed to be. In step the three the RHSJacobian was suppose to be unchanged, only the option to ignore the "unneeded" Jacobian entries inside MatSetValues (set with MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) was needed (plus changing the DMDASetBlockFillsXXX argument). The error message Inserting a new nonzero at global row/column (310400, 316825) into matrix indicates that somehow the MatOption MAT_NEW_NONZERO_LOCATION_ERR is in control instead of the option MAT_NEW_NONZERO_LOCATIONS, when it is setting values the Jacobian values. The MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS_ERR,PETSC_TRUE);) is normally called inside the DMCreateMatrix() so I am not sure how they could be getting called in the wrong order but it seems somehow it is When do you call MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) in the code? You can call it at the beginning of computeJacobian(). If this still doesn't work and you get the same error you can run in the debugger on one process and put a breakpoint for MatSetOptions() to found out how the MAT_NEW_NONZERO_LOCATIONS_ERR comes in late to upset the apple cart. You should see MatSetOption() called at least twice and the last one should have the MAT_NEW_NONZERO_LOCATION flag. Barry Cheers, Sophie ________________________________ From: Barry Smith > Sent: Friday, September 4, 2020 01:06 To: Blondel, Sophie > Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Subject: Re: [petsc-users] Matrix Free Method questions Sophie, Thanks. I have started looking through the logs The change to matrix-free multiple (from 1 to 2) which reduces the accuracy of the multiply to about half the digits is not surprising. * It roughly doubles the time since doing the matrix-free product requires a function evaluation * It increases the iteration count, but not significantly since the reduced precision of the multiple induces some additional linear iterations The change from 2 to 3 (not storing the entire matrix) * number of nonzeros goes from 49459966 to 1558766 = 3.15 percent so it succeds in not storing the unneeded part of the matrix * the number of MatMult_MF goes from 2331 to 2418. I don't understand this, I expected it to be identical because it should be using the same preconditioner in 3 as in 2 and thus get the same convergence. Could possibility be due to the variability in convergence due to different runs with the matrix-free preconditioner preconditioner and not related to not storing the entire matrix. * the KSPSolve() time goes from 3.8774e+0 to 3.7855e+02 a trivial difference which is what I would expect * the SNESSolve time goes from 5.0047e+02 to 4.3275e+02 about a 14 percent drop which is reasonable because 3 doesn't spend as much time inserting matrix values (it still computes them but doesn't insert the ones we don't want for the preconditioner). The change from 3 to 4 * something goes seriously wrong here. The total number of linear solve iterations goes from 2282 to 97403 so something has gone seriously wrong with the preconditioner, but since the preconditioner operations are the same it seems something has gone wrong with the new reduced preconditioner. I think there is an error in computing the reduced matrix entries, that is the new compute Jacobian code is not computing the entries it needs to correctly. 
To debug this you can run case 3 and case 4 for a single time step with -ksp_view_pmat binary This should create a binary file with the initial Jacobian matrices in each. You can use Matlab or Python to do the difference in the matrices and see how possibly the new Jacobian computation code is not producing the correct values in some locations. Good luck, Barry On Sep 3, 2020, at 12:26 PM, Blondel, Sophie > wrote: Hi Barry, Attached are the log files for the 1D case, for each of the 4 steps. I don't know how I did it yesterday but the differences between steps look better today, except for step 4 that takes many more iterations and smaller time steps. Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : mercredi 2 septembre 2020 15:53 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions On Sep 2, 2020, at 1:44 PM, Blondel, Sophie > wrote: Thank you Barry, The code ran with your branch but it's much slower than running with the full Jacobian and Jacobi PC subtype (around 10 times slower). It is using less memory as expected. I tried step 2 as well and it's even slower. Sophie, That is puzzling. It should be using the same matrix in the solver so should be the same speed and the setup time should be a bit better since it does not form the full Jacobian. (We'll get to this later) The TS iteration for step 1 are the same as with full Jacobian. Let me know what I can look at to check if I've done something wrong. We need to see if the KSP iterations are pretty similar for four approaches (1) original code with Jacobi PC subtype (2) matrix free with Jacobi PC (just add -snes_mf_operator to case 1) (3) the new code with the MatSetOption() to not store the entire Jacobian also with the -snes_mf_operator and (4) the new code that doesn't compute the unneeded part of the Jacobian also with the -snes_mf_operator You could run each case with same 20 timesteps and -ts_monitor -ksp_monitor and -ts_view and send the four output files around. Once we are sure the four cases are behaving as expected then you can get timings for them but let's not do that until we confirm the similar results. There could easily be a flaw in my reasoning or the PETSc code somewhere that affects the correctness so its best to check that first. Barry Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : mardi 1 septembre 2020 14:12 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions Sophie, Sorry, looks like an old bug in PETSc that was undetected due to lack of use. The code is trying to use the first of the two matrices to determine the preconditioner which won't work in your case since it is matrix-free. It should be using the second matrix. I hope the branch barry/2020-09-01/fix-fieldsplit-mf resolves this issue for you. Barry On Sep 1, 2020, at 12:45 PM, Blondel, Sophie > wrote: Hi Barry, I'm working through step 1) but I think I am doing something wrong. I'm using DMDASetBlockFillsSparse to set the non-zeros only for the diffusing clusters (small He clusters here, from size 1 to 7) and all the diagonal entries. 
Then I added a few lines in the code: Mat mat; DMCreateMatrix(da, &mat); MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); When I try to run with the following options: -snes_mf_operator -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 2 -snes_force_iteration -pc_fieldsplit_detect_coupling -pc_type fieldsplit -fieldsplit_0_pc_type jacobi -fieldsplit_1_pc_type redundant -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -ts_max_snes_failures -1 -ts_monitor -ts_max_steps 20 I get an error: [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: No support for this operation for this object type [0]PETSC ERROR: Matrix type mffd does not have a find off block diagonal entries defined [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Development GIT revision: v3.13.4-851-gde18fec8da GIT Date: 2020-08-28 16:47:50 +0000 [0]PETSC ERROR: Unknown Name on a 20200828 named sophie-Precision-5530 by sophie Tue Sep 1 10:58:44 2020 [0]PETSC ERROR: Configure options PETSC_DIR=/home/sophie/Code/petsc PETSC_ARCH=20200828 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no --with-shared-libraries [0]PETSC ERROR: #1 MatFindOffBlockDiagonalEntries() line 9847 in /home/sophie/Code/petsc/src/mat/interface/matrix.c [0]PETSC ERROR: #2 PCFieldSplitSetDefaults() line 504 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c [0]PETSC ERROR: #3 PCSetUp_FieldSplit() line 606 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c [0]PETSC ERROR: #4 PCSetUp() line 1009 in /home/sophie/Code/petsc/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #5 KSPSetUp() line 406 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #6 KSPSolve_Private() line 658 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #7 KSPSolve() line 889 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in /home/sophie/Code/petsc/src/snes/impls/ls/ls.c [0]PETSC ERROR: #9 SNESSolve() line 4524 in /home/sophie/Code/petsc/src/snes/interface/snes.c [0]PETSC ERROR: #10 TSStep_ARKIMEX() line 811 in /home/sophie/Code/petsc/src/ts/impls/arkimex/arkimex.c [0]PETSC ERROR: #11 TSStep() line 3731 in /home/sophie/Code/petsc/src/ts/interface/ts.c [0]PETSC ERROR: #12 TSSolve() line 4128 in /home/sophie/Code/petsc/src/ts/interface/ts.c PetscSolver::solve: TSSolve failed. Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : lundi 31 ao?t 2020 14:50 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions Sophie, Thanks. The factor of 4 is lot, the 1.5 not so bad. You will definitely want to retain the full matrix assembly codes for speed and to verify a reduced matrix version. It is worth trying a "reduced matrix version" with matrix-free multiply based on these numbers. This reduced matrix Jacobian will only have the diagonals and all the terms connected to the cluster sizes that move. In other words you will be building just the part of the Jacobian needed for the new preconditioner (PC subtype for Jacobi) and doing the matrix-vector product matrix free. (SOR requires all the Jacobian entries). Fortunately this is hopefully pretty straightforward for this code. 
You will not have to change the structure of the main code at all. Step 1) create a new "sparse matrix" that will be passed to DMDASetBlockFillsSparse(). This new "sparse matrix" needs to retain all the diagonal entries and also all the entries that are associated with the variables that diffuse. If I remember correctly these are just the smallest cluster size, plain Helium? Call MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); Then you would run the code with -snes_mf_operator and the new PC subtype for Jacobi. A test that the new reduced Jacobian is correct will be that you get almost the same iterations as the runs you just make using the PC subtype of Jacobi. Hopefully not slower and using a great deal less memory. The iterations will not be identical because of the matrix-free multiple. Step 2) create a new version of the Jacobian computation routine. This routine should only compute the elements of the Jacobian needed for this reduced matrix Jacobian, so the diagonals and the diffusion/convection terms. Again run with with -snes_mf_operator and the new PC subtype for Jacobi and you should again get the same convergence history. I made two steps because it makes it easier to validate and debug to get the same results as before. The first step cheats in that it still computes the full Jacobian but ignores the entries that we don't need to store for the preconditioner. The second step is more efficient because it only computes the Jacobian entries needed for the preconditioner but it requires you going through the Jacobian code and making sure only the needed parts are computed. If you have any questions please let me know. Barry On Aug 31, 2020, at 1:13 PM, Blondel, Sophie > wrote: Hi Barry, I ran the 2 cases to look at the effect of the Jacobi pre-conditionner: * 1D with 200 grid points and 7759 DOF per grid point (for the PSI application), for 20 TS: the factor between SOR and Jacobi is ~4 (976 MatMult for SOR and 4162 MatMult for Jacobi) * 2D with 63x63 grid points and 4124 DOF per grid point (for the NE application), for 20 TS: the factor is 1.5 (6657 for SOR, 10379 for Jacobi) Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : vendredi 28 ao?t 2020 18:31 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions On Aug 28, 2020, at 4:11 PM, Blondel, Sophie > wrote: Thank you Jed and Barry, First, attached are the logs from the benchmark runs I did without (log_std.txt) and with MF method (log_mf.txt). It took me some trouble to get the -log_view to work because I'm using push and pop for the options which means that PETSc is initialized with no argument so the command line argument was not taken into account, but I guess this is for a separate discussion. 
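Condensing Step 1 into code for reference (a sketch only, not Xolotl's actual code: da and Jpre are placeholder names, and the reduced fill pattern is assumed to have already been registered with DMDASetBlockFillsSparse() as described above):

  Mat            Jpre;
  PetscErrorCode ierr;

  /* The DM preallocates only the reduced pattern: the diagonal blocks plus the
     couplings of the diffusing (smallest) clusters. */
  ierr = DMCreateMatrix(da, &Jpre);CHKERRQ(ierr);
  /* Ignore, rather than store, any value computed outside that pattern, so the
     existing full Jacobian routine can be reused unchanged for Step 1. */
  ierr = MatSetOption(Jpre, MAT_NEW_NONZERO_LOCATIONS, PETSC_FALSE);CHKERRQ(ierr);

The run then adds -snes_mf_operator so the matrix-vector products are applied matrix-free while Jpre is used only to build the Jacobi preconditioner.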
To answer questions about the current per-conditioners: * I used the same pre-conditioner options as listed in my previous email when I added the -snes_mf option; I did try to remove all the PC related options at one point with the MF method but didn't see a change in runtime so I put them back in * this benchmark is for a 1D DMDA using 20 grid points; when running in 2D or 3D I switch the PC options to: -pc_type fieldsplit -fieldsplit_0_pc_type sor -fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type gmres -ksp_type fgmres -fieldsplit_1_pc_gamg_threshold -1 I haven't tried a Jacobi PC instead of SOR, I will run a set of more realistic runs (1D and 2D) without MF but with Jacobi and report on it next week. When you say "iterations" do you mean what is given by -ksp_monitor? Yes, the number of MatMult is a good enough surrogate. So using matrix-free (which means no preconditioning) has 35846/160 ans = 224.0375 or 224 as many iterations. So even for this modest 1d problem preconditioning is doing a great deal. Barry Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : vendredi 28 ao?t 2020 12:12 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions [External Email] Sophie, This is exactly what i would expect. If you run with -ksp_monitor you will see the -snes_mf run takes many more iterations. I am puzzled that the argument -pc_type fieldsplit did not stop the run since this is under normal circumstances not a viable preconditioner with -snes_mf. Did you also remove the -pc_type fieldsplit argument? In order to see how one can avoid forming the entire matrix and use matrix-free to do the matrix-vector but still have an effective preconditioner let's look at what the current preconditioner options do. -pc_fieldsplit_detect_coupling creates two sub-preconditioners, the first for all the variables and the second for those that are coupled by the matrix to variables in neighboring cells Since only the smallest cluster sizes have diffusion/advection this second set contains only the cluster size one variables. -fieldsplit_0_pc_type sor Runs SOR on all the variables; you can think of this as running SOR on the reactions, it is a pretty good preconditioner for the reactions since the reactions are local, per cell. -fieldsplit_1_pc_type redundant This runs the default preconditioner (ILU) on just the variables that diffuse, i.e. the elliptic part. For smallish problems this is fine, for larger problems and 2d and 3d presumably you have also -redundant_pc_type gamg to use algebraic multigrid for the diffusion. This part of the matrix will always need to be formed and used in the preconditioner. It is very important since the diffusion is what brings in most of the ill-conditioning for larger problems into the linear system. Note that it only needs the matrix entries for the cluster size of 1 so it is very small compared to the entire sparse matrix. ---- The first preconditioner SOR requires ALL the matrix entries which are almost all (except for the diffusion terms) the coupling between different size clusters within a cell. Especially each cell has its own sparse matrix of the size of total number of clusters, it is sparse but not super sparse. So the to significantly lower memory usage we need to remove the SOR and the storing of all the matrix entries but still have an efficient preconditioner for the "reaction" terms. 
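On that side issue: if PETSc never sees the command line, one possible workaround (a sketch only; nothing in this thread confirms where such a call would live in Xolotl) is to inject the logging options programmatically right after PETSc is initialized and to start the default logging explicitly:

  PetscErrorCode ierr;

  /* Behaves as if the options had been given on the command line. */
  ierr = PetscOptionsInsertString(NULL, "-log_view -log_view_memory");CHKERRQ(ierr);
  /* -log_view only prints at PetscFinalize(); this makes sure the logging data is
     actually being collected even though the option was not seen at initialization. */
  ierr = PetscLogDefaultBegin();CHKERRQ(ierr);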
The simplest thing would be to use Jacobi instead of SOR for the first subpreconditioner since it only requires the diagonal entries in the matrix. But Jacobi is a worse preconditioner than SOR (since it totally ignores the matrix coupling) and sometimes can be much worse. Before anyone writes additional code we need to know if doing something along these lines does not ruin the convergence that. Have you used the same options as before but with -fieldsplit_0_pc_type jacobi ? (Not using any matrix free). We need to get an idea of how many more linear iterations it requires (not time, comparing time won't be helpful for this exercise.) We also need this information for realistic size problems in 2 or 3 dimensions that you really want to run; for small problems this approach will work ok and give misleading information about what happens for large problems. I suspect the iteration counts will shot up. Can you run some cases and see how the iteration counts change? Based on that we can decide if we still retain "good convergence" by changing the SOR to Jacobi and then change the code to make this change efficient (basically by skipping the explicit computation of the reaction Jacobian terms and using matrix-free on the outside of the PCFIELDSPLIT.) Barry On Aug 28, 2020, at 9:49 AM, Blondel, Sophie via petsc-users > wrote: Hi everyone, I have been using PETSc for a few years with a fully implicit TS ARKIMEX method and am now exploring the matrix free method option. Here is the list of PETSc options I typically use: -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 5 -snes_force_iteration -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -fieldsplit_0_pc_type sor -ts_max_snes_failures -1 -pc_fieldsplit_detect_coupling -ts_monitor -pc_type fieldsplit -fieldsplit_1_pc_type redundant -ts_max_steps 100 I started to compare the performance of the code without changing anything of the executable and simply adding "-snes_mf", I see a reduction of memory usage as expected and a benchmark that would usually take ~5min to run now takes ~50min. Reading the documentation I saw that there are a few option to play with the matrix free method like -snes_mf_err, -snes_mf_umin, or switching to -snes_mf_type wp. I used and modified the values of each of these options separately but never saw a sizable change in runtime, is it expected? And are there other ways to make the matrix free method faster? I saw in the documentation that you can define your own per-conditioner for instance. Let me know if you need additional information about the PETSc setup in the application I use. Best, Sophie -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: log_1D_4_mem.txt URL: From knepley at gmail.com Thu Sep 24 15:08:13 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 24 Sep 2020 16:08:13 -0400 Subject: [petsc-users] Matrix Free Method questions In-Reply-To: References: <5BDE8465-76BE-4132-BF4E-6784548AADC0@petsc.dev> <3329269A-EB37-41C9-9698-BA4631A1E18A@petsc.dev> <3E68F0AF-2F7D-4394-894A-3099EC80B9BC@petsc.dev> <600E6AA4-9534-4B39-B7E0-0218AB02E19A@petsc.dev> <60260FA5-BDAE-4F18-8310-D0F3C03B318D@petsc.dev> <4BA6D58A-89C3-44E2-AF34-F4AE94211DC4@petsc.dev> <6DEA0A2F-3020-4C3D-8726-7BE6346B86BB@petsc.dev> <56056045-E253-44BE-AE4C-7EFE44D867ED@petsc.dev> Message-ID: On Thu, Sep 24, 2020 at 4:03 PM Blondel, Sophie via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi Barry, > > I probably should have sent this output before (with -log_view_memory to > get an idea of where the vectors are created). I looked at it but it > doesn't help me much... > Just quickie, there is 82M in 85 vectors, but your system has 1.5M unknowns, so a single vector is about 12M. Thus, there are probably 5 or 6 systems vectors, and a bunch of small ones. Thanks, Matt > Cheers, > > Sophie > ------------------------------ > *From:* Barry Smith > *Sent:* Wednesday, September 16, 2020 16:38 > *To:* Blondel, Sophie > *Cc:* petsc-users at mcs.anl.gov ; > xolotl-psi-development at lists.sourceforge.net < > xolotl-psi-development at lists.sourceforge.net> > *Subject:* Re: [petsc-users] Matrix Free Method questions > > > Yikes, GAMG is using a lot of vectors. But many of these are much > smaller vectors so not of major concern. > > I think this will just have to be an ongoing issue to see where the > vectors are created internally and reuse or eliminate as many extra as > possible. > > The option -log_view_memory causes the PETSc logging summary to print > additional columns showing the memory allocated during the different events > in PETSc. This can be useful to see "when" the memory is mostly created; it > does not tell us "why" it is created but at least tells us were to look. > > Barry > > > On Sep 16, 2020, at 1:54 PM, Blondel, Sophie wrote: > > Hi Barry, > > I don't think we're explicitly creating many PETSc vectors in Xolotl. > There is a global one created for the solution when the TS is set up, and > local ones in RHSFunction and RHSJacobian; everywhere else we just get the > array from it with DMDAVecGetArrayDOF and DMDAVecRestoreArrayDOF. I tried a > few things to see if it changed the number of Vec from 85 (removing > monitors, fewer time steps, fewer MPI tasks) but it stayed the same, except > when I changed the PC option from "-fieldsplit_1_pc_type redundant" to > "-fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type gmres -ksp_type fgmres > -fieldsplit_1_pc_gamg_threshold -1" where I got 10567 vectors. > > Cheers, > > Sophie > ------------------------------ > *From:* Barry Smith > *Sent:* Tuesday, September 15, 2020 18:37 > *To:* Blondel, Sophie > *Cc:* petsc-users at mcs.anl.gov ; > xolotl-psi-development at lists.sourceforge.net < > xolotl-psi-development at lists.sourceforge.net> > *Subject:* Re: [petsc-users] Matrix Free Method questions > > > Sophie, > > Great, everything looks good. > > So the new version takes about 7 times longer, due to the relatively > modest increase (about 25 percent) in the number of iterations from the > poorer preconditioner convergence and the rest from the much slower > matrix-vector product due to using matrix free instead of matrix based > precondtioner. Both of these are expected. 
> > The matrix is taking about 10% of the memory it used to require, also > expected. > > I noticed in the logging the memory for the vectors > > Vector 85 85 82303208 0. > Matrix 15 15 8744032 0. > > is substantial/huge, with the much smaller matrix memory the vector > memory dominates. > > It indicates 85 vectors are used. This is a large number, there are > some needed for the TS (maybe 5?) and some needed for the KSP solve (maybe > about 37) but I am not sure why there are so many. Perhaps this number > could be reduced. Are there are lot of vectors created in the Xolotyl code? > I would it could run with about 45 vectors. > > Barry > > > > > On Sep 15, 2020, at 5:12 PM, Blondel, Sophie wrote: > > Hi Barry, > > I fixed everything and re-ran the 4 cases in 1D. They took more time than > before because I used the Kokkos serial backend on the Xolotl side instead > of the CUDA one previously (long story short, I tried to update CUDA and > messed up the whole installation). Step 4 looks much better than prevously, > I was even able to remove > MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE) from the code and > it ran without throwing errors. The log files are attached. > > Cheers, > > Sophie > ------------------------------ > *From:* Barry Smith > *Sent:* Friday, September 11, 2020 18:03 > *To:* Blondel, Sophie > *Cc:* petsc-users at mcs.anl.gov ; > xolotl-psi-development at lists.sourceforge.net < > xolotl-psi-development at lists.sourceforge.net> > *Subject:* Re: [petsc-users] Matrix Free Method questions > > > > On Sep 11, 2020, at 7:45 AM, Blondel, Sophie wrote: > > Thank you Barry, > > Step 3 worked after I moved MatSetOption at the beginning of > computeJacobian(). Attached is the updated log which is pretty similar to > what I had before. Step 4 still uses many more iterations. > > I checked the Jacobian using -ksp_view_pmat ascii (on a simpler case), I > can see the difference between step 3 and 4 is that the contribution from > the reactions is not included in the step 4 Jacobian (as expected from the > fact that I removed their setting from the code). > > Looking back at one of your previous email, you wrote "This routine should > only compute the elements of the Jacobian needed for this reduced matrix > Jacobian, so the diagonals and the diffusion/convection terms. ", does it > mean that I should still include the contributions from the reactions that > affect the pure diagonal terms? > > > Yes, you need to leave in everything that affects the diagonal otherwise > the "Jacobi" preconditioner will not reflect the true Jacobi preconditioner > and likely perform poorly. > > Barry > > > Cheers, > > Sophie > ------------------------------ > > *From:* Barry Smith > *Sent:* Thursday, September 10, 2020 17:04 > *To:* Blondel, Sophie > *Cc:* petsc-users at mcs.anl.gov ; > xolotl-psi-development at lists.sourceforge.net < > xolotl-psi-development at lists.sourceforge.net> > *Subject:* Re: [petsc-users] Matrix Free Method questions > > > > On Sep 10, 2020, at 2:46 PM, Blondel, Sophie wrote: > > Hi Barry, > > Going through the different changes again to understand what was going > wrong with the last step, I discovered that my changes from 2 to 3 > (keeping only the pure diagonal for the reaction Jacobian setup and adding > MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) were wrong: the > sparsity of the matrix was correct but then the RHSJacobian method was > wrong. I updated it > > > I'm not sure what you mean here. 
My hope was that in step 3 you won't > need to change RHSJacobian at all (that is just for step 4). > > but now when I run step 3 again I get the following error: > > [2]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [2]PETSC ERROR: Argument out of range > [2]PETSC ERROR: Inserting a new nonzero at global row/column (310400, > 316825) into matrix > [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for > trouble shooting. > [2]PETSC ERROR: Petsc Development GIT revision: v3.13.4-885-gf58a62b032 > GIT Date: 2020-09-01 13:07:58 -0500 > [2]PETSC ERROR: Unknown Name on a 20200902 named iguazu by bqo Thu Sep 10 > 15:38:58 2020 > [2]PETSC ERROR: Configure options > PETSC_DIR=/home2/bqo/libraries/petsc-barry PETSC_ARCH=20200902 > --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no > --with-shared-libraries --download-fblaslapack=1 > [2]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 606 in > /home2/bqo/libraries/petsc-barry/src/mat/impls/aij/mpi/mpiaij.c > [2]PETSC ERROR: #2 MatSetValues() line 1392 in > /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c > [2]PETSC ERROR: #3 MatSetValuesLocal() line 2207 in > /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c > [2]PETSC ERROR: #4 MatSetValuesStencil() line 1595 in > /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c > PetscSolverExpHandler::computeJacobian: MatSetValuesStencil (reactions) > failed. > > Because the RHSJacobian method is trying to update the elements > corresponding to the reactions. I'm not sure I understood correctly what > step 3 was supposed to be. > > > In step the three the RHSJacobian was suppose to be unchanged, only the > option to ignore the "unneeded" Jacobian entries inside MatSetValues (set > with MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) was needed > (plus changing the DMDASetBlockFillsXXX argument). > > The error message Inserting a new nonzero at global row/column > (310400, 316825) into matrix indicates that somehow the MatOption MAT_NEW_NONZERO_LOCATION_ERR > is in control instead of the option MAT_NEW_NONZERO_LOCATIONS, when it is > setting values the Jacobian values. > > The MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS_ERR,PETSC_TRUE);) is > normally called inside the DMCreateMatrix() so I am not sure how they could > be getting called in the wrong order but it seems somehow it is > > When do you call MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) > in the code? You can call it at the beginning of computeJacobian(). > > If this still doesn't work and you get the same error you can run in the > debugger on one process and put a breakpoint for MatSetOptions() to found > out how the MAT_NEW_NONZERO_LOCATIONS_ERR comes in late to upset the > apple cart. You should see MatSetOption() called at least twice and the > last one should have the MAT_NEW_NONZERO_LOCATION flag. > > Barry > > > > > > Cheers, > > Sophie > > > ------------------------------ > *From:* Barry Smith > *Sent:* Friday, September 4, 2020 01:06 > *To:* Blondel, Sophie > *Cc:* petsc-users at mcs.anl.gov ; > xolotl-psi-development at lists.sourceforge.net < > xolotl-psi-development at lists.sourceforge.net> > *Subject:* Re: [petsc-users] Matrix Free Method questions > > > Sophie, > > Thanks. I have started looking through the logs > > The change to matrix-free multiple (from 1 to 2) which reduces the > accuracy of the multiply to about half the digits is not surprising. 
> > * It roughly doubles the time since doing the matrix-free product > requires a function evaluation > > * It increases the iteration count, but not significantly since the > reduced precision of the multiple induces some additional linear iterations > > The change from 2 to 3 (not storing the entire matrix) > > * number of nonzeros goes from 49459966 to 1558766 = 3.15 percent so > it succeds in not storing the unneeded part of the matrix > > * the number of MatMult_MF goes from 2331 to 2418. I don't understand > this, I expected it to be identical because it should be using the same > preconditioner in 3 as in 2 and thus get the same convergence. Could > possibility be due to the variability in convergence due to different runs > with the matrix-free preconditioner preconditioner and not related to not > storing the entire matrix. > > * the KSPSolve() time goes from 3.8774e+0 to 3.7855e+02 a trivial > difference which is what I would expect > > * the SNESSolve time goes from 5.0047e+02 to 4.3275e+02 about a 14 > percent drop which is reasonable because 3 doesn't spend as much time > inserting matrix values (it still computes them but doesn't insert the ones > we don't want for the preconditioner). > > The change from 3 to 4 > > * something goes seriously wrong here. The total number of linear > solve iterations goes from 2282 to 97403 so something has gone seriously > wrong with the preconditioner, but since the preconditioner operations are > the same it seems something has gone wrong with the new reduced > preconditioner. > > I think there is an error in computing the reduced matrix entries, that > is the new compute Jacobian code is not computing the entries it needs to > correctly. > > To debug this you can run case 3 and case 4 for a single time step with > -ksp_view_pmat binary This should create a binary file with the initial > Jacobian matrices in each. You can use Matlab or Python to do the > difference in the matrices and see how possibly the new Jacobian > computation code is not producing the correct values in some locations. > > Good luck, > > Barry > > > > > On Sep 3, 2020, at 12:26 PM, Blondel, Sophie wrote: > > Hi Barry, > > Attached are the log files for the 1D case, for each of the 4 steps. I > don't know how I did it yesterday but the differences between steps look > better today, except for step 4 that takes many more iterations and smaller > time steps. > > Cheers, > > Sophie > ------------------------------ > > *De :* Barry Smith > *Envoy? :* mercredi 2 septembre 2020 15:53 > *? :* Blondel, Sophie > *Cc :* petsc-users at mcs.anl.gov ; > xolotl-psi-development at lists.sourceforge.net < > xolotl-psi-development at lists.sourceforge.net> > *Objet :* Re: [petsc-users] Matrix Free Method questions > > > > On Sep 2, 2020, at 1:44 PM, Blondel, Sophie wrote: > > Thank you Barry, > > The code ran with your branch but it's much slower than running with the > full Jacobian and Jacobi PC subtype (around 10 times slower). It is using > less memory as expected. I tried step 2 as well and it's even slower. > > > Sophie, > > That is puzzling. It should be using the same matrix in the solver so > should be the same speed and the setup time should be a bit better since it > does not form the full Jacobian. (We'll get to this later) > > The TS iteration for step 1 are the same as with full Jacobian. Let me > know what I can look at to check if I've done something wrong. 
> > > We need to see if the KSP iterations are pretty similar for four > approaches (1) original code with Jacobi PC subtype (2) matrix free with > Jacobi PC (just add -snes_mf_operator to case 1) (3) the new code with the > MatSetOption() to not store the entire Jacobian also with > the -snes_mf_operator and (4) the new code that doesn't compute the > unneeded part of the Jacobian also with the -snes_mf_operator > > You could run each case with same 20 timesteps and -ts_monitor > -ksp_monitor and -ts_view and send the four output files around. > > Once we are sure the four cases are behaving as expected then you can get > timings for them but let's not do that until we confirm the similar > results. There could easily be a flaw in my reasoning or the PETSc code > somewhere that affects the correctness so its best to check that first. > > > Barry > > > Cheers, > > Sophie > ------------------------------ > *De :* Barry Smith > *Envoy? :* mardi 1 septembre 2020 14:12 > *? :* Blondel, Sophie > *Cc :* petsc-users at mcs.anl.gov ; > xolotl-psi-development at lists.sourceforge.net < > xolotl-psi-development at lists.sourceforge.net> > *Objet :* Re: [petsc-users] Matrix Free Method questions > > > Sophie, > > Sorry, looks like an old bug in PETSc that was undetected due to lack > of use. The code is trying to use the first of the two matrices to > determine the preconditioner which won't work in your case since it is > matrix-free. It should be using the second matrix. > > I hope the branch barry/2020-09-01/fix-fieldsplit-mf resolves this > issue for you. > > Barry > > > On Sep 1, 2020, at 12:45 PM, Blondel, Sophie wrote: > > Hi Barry, > > I'm working through step 1) but I think I am doing something wrong. I'm > using DMDASetBlockFillsSparse to set the non-zeros only for the diffusing > clusters (small He clusters here, from size 1 to 7) and all the diagonal > entries. Then I added a few lines in the code: > Mat mat; > DMCreateMatrix(da, &mat); > MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); > > When I try to run with the following options: -snes_mf_operator -ts_dt > 1.0e-12 -ts_adapt_time_step_increase_delay 2 -snes_force_iteration > -pc_fieldsplit_detect_coupling -pc_type fieldsplit -fieldsplit_0_pc_type > jacobi -fieldsplit_1_pc_type redundant -ts_max_time 1000.0 -ts_adapt_dt_max > 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover > -ts_max_snes_failures -1 -ts_monitor -ts_max_steps 20 > > I get an error: > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: No support for this operation for this object type > [0]PETSC ERROR: Matrix type mffd does not have a find off block diagonal > entries defined > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for > trouble shooting. 
> [0]PETSC ERROR: Petsc Development GIT revision: v3.13.4-851-gde18fec8da > GIT Date: 2020-08-28 16:47:50 +0000 > [0]PETSC ERROR: Unknown Name on a 20200828 named sophie-Precision-5530 by > sophie Tue Sep 1 10:58:44 2020 > [0]PETSC ERROR: Configure options PETSC_DIR=/home/sophie/Code/petsc > PETSC_ARCH=20200828 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 > --with-debugging=no --with-shared-libraries > [0]PETSC ERROR: #1 MatFindOffBlockDiagonalEntries() line 9847 in > /home/sophie/Code/petsc/src/mat/interface/matrix.c > [0]PETSC ERROR: #2 PCFieldSplitSetDefaults() line 504 in > /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c > [0]PETSC ERROR: #3 PCSetUp_FieldSplit() line 606 in > /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c > [0]PETSC ERROR: #4 PCSetUp() line 1009 in > /home/sophie/Code/petsc/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: #5 KSPSetUp() line 406 in > /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #6 KSPSolve_Private() line 658 in > /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #7 KSPSolve() line 889 in > /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in > /home/sophie/Code/petsc/src/snes/impls/ls/ls.c > [0]PETSC ERROR: #9 SNESSolve() line 4524 in > /home/sophie/Code/petsc/src/snes/interface/snes.c > [0]PETSC ERROR: #10 TSStep_ARKIMEX() line 811 in > /home/sophie/Code/petsc/src/ts/impls/arkimex/arkimex.c > [0]PETSC ERROR: #11 TSStep() line 3731 in > /home/sophie/Code/petsc/src/ts/interface/ts.c > [0]PETSC ERROR: #12 TSSolve() line 4128 in > /home/sophie/Code/petsc/src/ts/interface/ts.c > PetscSolver::solve: TSSolve failed. > > Cheers, > > Sophie > ------------------------------ > *De :* Barry Smith > *Envoy? :* lundi 31 ao?t 2020 14:50 > *? :* Blondel, Sophie > *Cc :* petsc-users at mcs.anl.gov ; > xolotl-psi-development at lists.sourceforge.net < > xolotl-psi-development at lists.sourceforge.net> > *Objet :* Re: [petsc-users] Matrix Free Method questions > > > Sophie, > > Thanks. > > The factor of 4 is lot, the 1.5 not so bad. > > You will definitely want to retain the full matrix assembly codes for > speed and to verify a reduced matrix version. > > It is worth trying a "reduced matrix version" with matrix-free multiply > based on these numbers. This reduced matrix Jacobian will only have the > diagonals and all the terms connected to the cluster sizes that move. In > other words you will be building just the part of the Jacobian needed for > the new preconditioner (PC subtype for Jacobi) and doing the matrix-vector > product matrix free. (SOR requires all the Jacobian entries). > > Fortunately this is hopefully pretty straightforward for this code. You > will not have to change the structure of the main code at all. > > Step 1) create a new "sparse matrix" that will be passed to > DMDASetBlockFillsSparse(). This new "sparse matrix" needs to retain all the > diagonal entries and also all the entries that are associated with the > variables that diffuse. If I remember correctly these are just the smallest > cluster size, plain Helium? > > Call MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); > > Then you would run the code with -snes_mf_operator and the new PC subtype > for Jacobi. > > A test that the new reduced Jacobian is correct will be that you get > almost the same iterations as the runs you just make using the PC subtype > of Jacobi. Hopefully not slower and using a great deal less memory. 
The > iterations will not be identical because of the matrix-free multiply. > > Step 2) create a new version of the Jacobian computation routine. This > routine should only compute the elements of the Jacobian needed for this > reduced matrix Jacobian, so the diagonals and the diffusion/convection > terms. > > Again run with -snes_mf_operator and the new PC subtype for Jacobi > and you should again get the same convergence history. > > I made two steps because it makes it easier to validate and debug to > get the same results as before. The first step cheats in that it still > computes the full Jacobian but ignores the entries that we don't need to > store for the preconditioner. The second step is more efficient because it > only computes the Jacobian entries needed for the preconditioner but it > requires going through the Jacobian code and making sure only the > needed parts are computed. > > > If you have any questions please let me know. > > Barry > > > > > On Aug 31, 2020, at 1:13 PM, Blondel, Sophie wrote: > > Hi Barry, > > I ran the 2 cases to look at the effect of the Jacobi preconditioner: > > - 1D with 200 grid points and 7759 DOF per grid point (for the PSI > application), for 20 TS: the factor between SOR and Jacobi is ~4 (976 > MatMult for SOR and 4162 MatMult for Jacobi) > - 2D with 63x63 grid points and 4124 DOF per grid point (for the NE > application), for 20 TS: the factor is 1.5 (6657 for SOR, 10379 for Jacobi) > > Cheers, > > Sophie > ------------------------------ > *De :* Barry Smith > *Envoyé :* vendredi 28 août 2020 18:31 > *À :* Blondel, Sophie > *Cc :* petsc-users at mcs.anl.gov ; > xolotl-psi-development at lists.sourceforge.net < > xolotl-psi-development at lists.sourceforge.net> > *Objet :* Re: [petsc-users] Matrix Free Method questions > > > > On Aug 28, 2020, at 4:11 PM, Blondel, Sophie wrote: > > Thank you Jed and Barry, > > First, attached are the logs from the benchmark runs I did without > (log_std.txt) and with MF method (log_mf.txt). It took me some trouble to > get the -log_view to work because I'm using push and pop for the options > which means that PETSc is initialized with no argument so the command line > argument was not taken into account, but I guess this is for a separate > discussion. > > To answer questions about the current preconditioners: > > - I used the same preconditioner options as listed in my previous > email when I added the -snes_mf option; I did try to remove all the PC > related options at one point with the MF method but didn't see a change in > runtime so I put them back in > - this benchmark is for a 1D DMDA using 20 grid points; when running > in 2D or 3D I switch the PC options to: -pc_type fieldsplit > -fieldsplit_0_pc_type sor -fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type > gmres -ksp_type fgmres -fieldsplit_1_pc_gamg_threshold -1 > > I haven't tried a Jacobi PC instead of SOR, I will run a set of more > realistic runs (1D and 2D) without MF but with Jacobi and report on it next > week. When you say "iterations" do you mean what is given by -ksp_monitor? > > > Yes, the number of MatMult is a good enough surrogate. > > So using matrix-free (which means no preconditioning) has > > 35846/160 > > ans = > > 224.0375 > > or 224 times as many iterations. So even for this modest 1d problem > preconditioning is doing a great deal. > > Barry > > > > > Cheers, > > Sophie > ------------------------------ > *De :* Barry Smith > *Envoyé :* vendredi 28 août 2020 12:12 > *À :* Blondel, Sophie > *Cc :* petsc-users at mcs.anl.gov ; > xolotl-psi-development at lists.sourceforge.net < > xolotl-psi-development at lists.sourceforge.net> > *Objet :* Re: [petsc-users] Matrix Free Method questions > > *[External Email]* > > Sophie, > > This is exactly what I would expect. If you run with -ksp_monitor you > will see the -snes_mf run takes many more iterations. > > I am puzzled that the argument -pc_type fieldsplit did not stop the > run since this is under normal circumstances not a viable preconditioner > with -snes_mf. Did you also remove the -pc_type fieldsplit argument? > > In order to see how one can avoid forming the entire matrix and use > matrix-free to do the matrix-vector product but still have an effective > preconditioner let's look at what the current preconditioner options do. > > -pc_fieldsplit_detect_coupling > > > creates two sub-preconditioners, the first for all the variables and the > second for those that are coupled by the matrix to variables in neighboring > cells. Since only the smallest cluster sizes have diffusion/advection this > second set contains only the cluster size one variables. > > -fieldsplit_0_pc_type sor > > > Runs SOR on all the variables; you can think of this as running SOR on the > reactions. It is a pretty good preconditioner for the reactions since the > reactions are local, per cell. > > -fieldsplit_1_pc_type redundant > > > This runs the default preconditioner (ILU) on just the variables that > diffuse, i.e. the elliptic part. For smallish problems this is fine, for > larger problems and 2d and 3d presumably you have also -redundant_pc_type > gamg to use algebraic multigrid for the diffusion. This part of the matrix > will always need to be formed and used in the preconditioner. It is very > important since the diffusion is what brings in most of the > ill-conditioning for larger problems into the linear system. Note that it > only needs the matrix entries for the cluster size of 1 so it is very small > compared to the entire sparse matrix. > > ---- > The first preconditioner SOR requires ALL the matrix entries which are > almost all (except for the diffusion terms) the coupling between different > size clusters within a cell. Essentially each cell has its own sparse matrix > of the size of the total number of clusters; it is sparse but not super sparse. > > So to significantly lower memory usage we need to remove the SOR and > the storing of all the matrix entries but still have an efficient > preconditioner for the "reaction" terms. > > The simplest thing would be to use Jacobi instead of SOR for the first > subpreconditioner since it only requires the diagonal entries in the > matrix. But Jacobi is a worse preconditioner than SOR (since it totally > ignores the matrix coupling) and sometimes can be much worse. > > Before anyone writes additional code we need to know if doing something > along these lines does not ruin the convergence. > > Have you used the same options as before but with -fieldsplit_0_pc_type > jacobi? (Not using any matrix free). We need to get an idea of how many > more linear iterations it requires (not time, comparing time won't be > helpful for this exercise.) We also need this information for realistic > size problems in 2 or 3 dimensions that you really want to run; for small > problems this approach will work ok and give misleading information about > what happens for large problems. > > I suspect the iteration counts will shoot up. Can you run some cases and > see how the iteration counts change?
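One low-effort way to collect the iteration counts being asked for here, sketched under the assumption of a TS-based driver like the one in this thread (ts and x stand for the application's existing TS and solution Vec):

    SNES     snes;
    KSP      ksp;
    PetscInt totalits;
    ierr = TSGetSNES(ts, &snes);CHKERRQ(ierr);
    ierr = SNESGetKSP(snes, &ksp);CHKERRQ(ierr);
    ierr = TSSolve(ts, x);CHKERRQ(ierr);
    /* total linear iterations this KSP has performed over the whole run */
    ierr = KSPGetTotalIterations(ksp, &totalits);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_WORLD, "total KSP iterations: %D\n", totalits);CHKERRQ(ierr);

Printing the same number from the SOR run and from the Jacobi run gives the comparison directly; as noted above, the MatMult count reported by -log_view is an equally good surrogate.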
> > Based on that we can decide if we still retain "good convergence" by > changing the SOR to Jacobi and then change the code to make this change > efficient (basically by skipping the explicit computation of the reaction > Jacobian terms and using matrix-free on the outside of the PCFIELDSPLIT.) > > Barry > > > > > > > > > > On Aug 28, 2020, at 9:49 AM, Blondel, Sophie via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hi everyone, > > I have been using PETSc for a few years with a fully implicit TS ARKIMEX > method and am now exploring the matrix free method option. Here is the list > of PETSc options I typically use: -ts_dt 1.0e-12 > -ts_adapt_time_step_increase_delay 5 -snes_force_iteration -ts_max_time > 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY > -ts_exact_final_time stepover -fieldsplit_0_pc_type sor > -ts_max_snes_failures -1 -pc_fieldsplit_detect_coupling -ts_monitor > -pc_type fieldsplit -fieldsplit_1_pc_type redundant -ts_max_steps 100 > > I started to compare the performance of the code without changing anything > of the executable and simply adding "-snes_mf", I see a reduction of memory > usage as expected and a benchmark that would usually take ~5min to run now > takes ~50min. Reading the documentation I saw that there are a few option > to play with the matrix free method like -snes_mf_err, -snes_mf_umin, or > switching to -snes_mf_type wp. I used and modified the values of each of > these options separately but never saw a sizable change in runtime, is it > expected? > > And are there other ways to make the matrix free method faster? I saw in > the documentation that you can define your own per-conditioner for > instance. Let me know if you need additional information about the PETSc > setup in the application I use. > > Best, > > Sophie > > > > > > > > > > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Sep 24 15:17:50 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 24 Sep 2020 15:17:50 -0500 Subject: [petsc-users] Tough to reproduce petsctablefind error In-Reply-To: References: <0AC37384-BC37-4A6C-857D-41CD507F84C2@petsc.dev> <8952CCCF-14F2-4102-91B4-921A54689813@petsc.dev> <1A56D208-6BF6-49BF-B653-FD9D15BB3BDD@petsc.dev> Message-ID: <4E113614-C8C0-44F5-AD95-DC37A9D4B5F5@petsc.dev> > On Sep 24, 2020, at 2:47 PM, Matthew Knepley wrote: > > On Thu, Sep 24, 2020 at 3:42 PM Barry Smith > wrote: > > The stack is listed below. It crashes inside MatPtAP(). > > What about just checking that the column indices that PtAP receives are valid? Are we not doing that? The code that checks for column too large in MatSetValues_MPIXXAIJ() is turned off for optimized builds, I am making a MR to always have it on. But I doubt this is the problem, other, more harsh crashes, are likely if the column index is too large. This is difficult to debug because all we get is a stack trace. It would be good if we produce some information about the current state of the objects when the error is detected. We should think about what light-weight stuff we could report when errors are detected. Barry > > Matt > > It is possible there is some subtle bug in the rather complex PETSc code for MatPtAP() but I am included to blame MPI again. 
> > I think we should add some simple low-overhead always on communication error detecting code to PetscSF where some check sums are also communicated at the highest level of PetscSF(). > > I don't know how but perhaps when the data is packed per destination rank a checksum is computed and when unpacked the checksum is compared using extra space at the end of the communicated packed array to store and send the checksum. Yes, it is kind of odd for a high level library like PETSc to not trust the communication channel but MPI implementations have proven themselves to not be trustworthy and adding this to PetscSF is not intrusive to the PETSc API or user. Plus it gives a definitive yes or no as to the problem being from an error in the communication. > > Barry > >> On Sep 24, 2020, at 12:35 PM, Matthew Knepley > wrote: >> >> On Thu, Sep 24, 2020 at 1:22 PM Chris Hewson > wrote: >> Hi Guys, >> >> Thanks for all of the prompt responses, very helpful and appreciated. >> >> By "when debugging", did you mean when configure petsc --with-debugging=1 COPTFLAGS=-O0 -g etc or when you attached a debugger? >> - Both, I have run with a debugger attached and detached, all compiled with the following flags "--with-debugging=1 COPTFLAGS=-O0 -g" >> >> 1) Try OpenMPI (probably won't help, but worth trying) >> - Worth a try for sure >> >> 2) Find which part of the simulation makes it non-deterministic. Is it the mesh partitioning (parmetis)? Then try to make it deterministic. >> - Good tip, it is the mesh partitioning and along the lines of a question from Barry, the matrix size is changing. I will make this deterministic and give it a try >> >> 3) Dump matrices, vectors, etc and see when it fails, you can quickly reproduce the error by reading in the intermediate data. >> - Also a great suggestion, will give it a try >> >> The full stack would be really useful here. I am guessing this happens on MatMult(), but I do not know. >> - Agreed, I am currently running it so that the full stack will be produced, but waiting for it to fail, had compiled with all available optimizations on, but downside is of course if there is a failure. >> As a general question, roughly what's the performance impact on petsc with -o1/-o2/-o0 as opposed to -o3? Performance impact of --with-debugging = 1? >> Obviously problem/machine dependant, wondering on guidance more for this than anything >> >> Is the nonzero structure of your matrices changing or is it fixed for the entire simulation? >> The non-zero structure is changing, although the structures are reformed when this happens and this happens thousands of time before the failure has occured. >> >> Okay, this is the most likely spot for a bug. How are you changing the matrix? It should be impossible to put in an invalid column index when using MatSetValues() >> because we check all the inputs. However, I do not think we check when you just yank out the arrays. >> >> Thanks, >> >> Matt >> >> Does this particular run always crash at the same place? Similar place? Doesn't always crash? >> Doesn't always crash, but other similar runs have crashed in different spots, which makes it difficult to resolve. I am going to try out a few of the strategies suggested above and will let you know what comes of that. >> >> Chris Hewson >> Senior Reservoir Simulation Engineer >> ResFrac >> +1.587.575.9792 >> >> >> On Thu, Sep 24, 2020 at 11:05 AM Barry Smith > wrote: >> Chris, >> >> We realize how frustrating this type of problem is to deal with. 
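As a purely illustrative sketch of the per-destination checksum idea described just above (nothing here is PetscSF or MPI API; every name is invented for the example), the pack/unpack bookkeeping might look roughly like this, assuming each packed buffer is allocated with sizeof(uint64_t) spare bytes at the end:

    #include <stdint.h>
    #include <string.h>

    /* FNV-1a style accumulation over the bytes packed for one destination rank */
    static uint64_t BufChecksum(const unsigned char *buf, size_t nbytes)
    {
      uint64_t sum = 0xcbf29ce484222325ULL;
      for (size_t i = 0; i < nbytes; i++) { sum ^= buf[i]; sum *= 0x100000001b3ULL; }
      return sum;
    }

    /* sender side: store the checksum in the spare bytes just before the send */
    static void PackChecksum(unsigned char *buf, size_t nbytes)
    {
      uint64_t sum = BufChecksum(buf, nbytes);
      memcpy(buf + nbytes, &sum, sizeof(sum));
    }

    /* receiver side: recompute and compare after the receive; a mismatch means
       the bytes were corrupted somewhere between pack and unpack */
    static int ChecksumOK(const unsigned char *buf, size_t nbytes)
    {
      uint64_t expected;
      memcpy(&expected, buf + nbytes, sizeof(expected));
      return BufChecksum(buf, nbytes) == expected;
    }

The quoted exchange below then turns to the PETSc code where the out-of-range key is actually detected.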
>> >> Here is the code: >> >> ierr = PetscTableCreate(aij->B->rmap->n,mat->cmap->N+1,&gid1_lid1);CHKERRQ(ierr); >> for (i=0; iB->rmap->n; i++) { >> for (j=0; jilen[i]; j++) { >> PetscInt data,gid1 = aj[B->i[i] + j] + 1; >> ierr = PetscTableFind(gid1_lid1,gid1,&data);CHKERRQ(ierr); >> if (!data) { >> /* one based table */ >> ierr = PetscTableAdd(gid1_lid1,gid1,++ec,INSERT_VALUES);CHKERRQ(ierr); >> } >> } >> } >> >> It is simply looping over the rows of the sparse matrix putting the columns it finds into the hash table. >> >> aj[B->i[i] + j] are the column entries, the number of columns in the matrix is mat->cmap->N so the column entries should always be >> less than the number of columns. The code is crashing when column entry 24443 which is larger than the number of columns 23988. >> >> So either the aj[B->i[i] + j] + 1 are incorrect or the mat->cmap->N is incorrect. >> >>>>> 640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >> >> if (!mat->was_assembled && mode == MAT_FINAL_ASSEMBLY) { >> ierr = MatSetUpMultiply_MPIAIJ(mat);CHKERRQ(ierr); >> } >> >> Seems to indicate it is setting up a new multiple because it is either the first time into the algorithm or the nonzero structure changed on some rank requiring a new assembly process. >> >> Is the nonzero structure of your matrices changing or is it fixed for the entire simulation? >> >> Since the code has been running for a very long time already I have to conclude that this is not the first time through and so something has changed in the matrix? >> >> I think we have to put more diagnostics into the library to provide more information before or at the time of the error detection. >> >> Does this particular run always crash at the same place? Similar place? Doesn't always crash? >> >> Barry >> >> >> >> >>> On Sep 24, 2020, at 8:46 AM, Chris Hewson > wrote: >>> >>> After about a month of not having this issue pop up, it has come up again >>> >>> We have been struggling with a similar PETSc Error for awhile now, the error message is as follows: >>> >>> [7]PETSC ERROR: PetscTableFind() line 132 in /home/chewson/petsc-3.13.3/include/petscctable.h key 24443 is greater than largest key allowed 23988 >>> >>> It is a particularly nasty bug as it doesn't reproduce itself when debugging and doesn't happen all the time with the same inputs either. The problem occurs after a long runtime of the code (12-40 hours) and we are using a ksp solve with KSPBCGS. >>> >>> The PETSc compilation options that are used are: >>> >>> --download-metis >>> --download-mpich >>> --download-mumps >>> --download-parmetis >>> --download-ptscotch >>> --download-scalapack >>> --download-suitesparse >>> --prefix=/opt/anl/petsc-3.13.3 >>> --with-debugging=0 >>> --with-mpi=1 >>> COPTFLAGS=-O3 -march=haswell -mtune=haswell >>> CXXOPTFLAGS=-O3 -march=haswell -mtune=haswell >>> FOPTFLAGS=-O3 -march=haswell -mtune=haswell >>> >>> This is being run across 8 processes and is failing consistently on the rank 7 process. We also use openmp outside of PETSC and the linear solve portion of the code. The rank 0 process is always using compute, during this the slave processes use an MPI_Wait call to wait for the collective parts of the code to be called again. All PETSC calls are done across all of the processes. >>> >>> We are using mpich version 3.3.2, downloaded with the petsc 3.13.3 package. 
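Given that the failure reduces to a column index at or above mat->cmap->N, a cheap guard in the application's own assembly loop can flag the bad value at the point of insertion. This is only a sketch: A, row, ncols, cols, and vals stand in for whatever the application actually passes to MatSetValues(), and ADD_VALUES is just one possible insert mode.

    PetscInt M, N, j;
    ierr = MatGetSize(A, &M, &N);CHKERRQ(ierr);
    for (j = 0; j < ncols; j++) {
      if (cols[j] < 0 || cols[j] >= N) {
        ierr = PetscPrintf(PETSC_COMM_SELF, "bad column %D in row %D, valid range is [0,%D)\n", cols[j], row, N);CHKERRQ(ierr);
      }
    }
    ierr = MatSetValues(A, 1, &row, ncols, cols, vals, ADD_VALUES);CHKERRQ(ierr);

A debug build, or the merge request mentioned above that keeps the column check in MatSetValues_MPIXXAIJ() on in optimized builds, performs the same check inside PETSc itself.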
>>> >>> At every PETSC call we are checking the error return from the function collectively to ensure that no errors have been returned from PETSC. >>> >>> Some possible causes that I can think of are as follows: >>> 1. Memory leak causing a corruption either in our program or in petsc or with one of the petsc objects. This seems unlikely as we have checked runs with the option -malloc_dump for PETSc and using valgrind. >>> >>> 2. Optimization flags set for petsc compilation are causing variables that go out of scope to be optimized out. >>> >>> 3. We are giving the wrong number of elements for a process or the value is changing during the simulation. This seems unlikely as there is nothing overly unique about these simulations and it's not reproducing itself. >>> >>> 4. An MPI channel or socket error causing an error in the collective values for PETSc. >>> >>> Any input on this issue would be greatly appreciated. >>> >>> Chris Hewson >>> Senior Reservoir Simulation Engineer >>> ResFrac >>> +1.587.575.9792 >>> >>> >>> On Thu, Aug 13, 2020 at 4:21 PM Junchao Zhang > wrote: >>> That is a great idea. I'll figure it out. >>> --Junchao Zhang >>> >>> >>> On Thu, Aug 13, 2020 at 4:31 PM Barry Smith > wrote: >>> >>> Junchao, >>> >>> Any way in the PETSc configure to warn that MPICH version is "bad" or "untrustworthy" or even the vague "we have suspicians about this version and recommend avoiding it"? A lot of time could be saved if others don't deal with the same problem. >>> >>> Maybe add arrays of suspect_versions for OpenMPI, MPICH, etc and always check against that list and print a boxed warning at configure time? Better you could somehow generalize it and put it in package.py for use by all packages, then any package can included lists of "suspect" versions. (There are definitely HDF5 versions that should be avoided :-)). >>> >>> Barry >>> >>> >>>> On Aug 13, 2020, at 12:14 PM, Junchao Zhang > wrote: >>>> >>>> Thanks for the update. Let's assume it is a bug in MPI :) >>>> --Junchao Zhang >>>> >>>> >>>> On Thu, Aug 13, 2020 at 11:15 AM Chris Hewson > wrote: >>>> Just as an update to this, I can confirm that using the mpich version (3.3.2) downloaded with the petsc download solved this issue on my end. >>>> >>>> Chris Hewson >>>> Senior Reservoir Simulation Engineer >>>> ResFrac >>>> +1.587.575.9792 >>>> >>>> >>>> On Thu, Jul 23, 2020 at 5:58 PM Junchao Zhang > wrote: >>>> On Mon, Jul 20, 2020 at 7:05 AM Barry Smith > wrote: >>>> >>>> Is there a comprehensive MPI test suite (perhaps from MPICH)? Is there any way to run this full test suite under the problematic MPI and see if it detects any problems? >>>> >>>> Is so, could someone add it to the FAQ in the debugging section? >>>> MPICH does have a test suite. It is at the subdir test/mpi of downloaded mpich . It annoyed me since it is not user-friendly. It might be helpful in catching bugs at very small scale. But say if I want to test allreduce on 1024 ranks on 100 doubles, I have to hack the test suite. >>>> Anyway, the instructions are here. >>>> For the purpose of petsc, under test/mpi one can configure it with >>>> $./configure CC=mpicc CXX=mpicxx FC=mpifort --enable-strictmpi --enable-threads=funneled --enable-fortran=f77,f90 --enable-fast --disable-spawn --disable-cxx --disable-ft-tests // It is weird I disabled cxx but I had to set CXX! >>>> $make -k -j8 // -k is to keep going and ignore compilation errors, e.g., when building tests for MPICH extensions not in MPI standard, but your MPI is OpenMPI. 
>>>> $ // edit testlist, remove lines mpi_t, rma, f77, impls. Those are sub-dirs containing tests for MPI routines Petsc does not rely on. >>>> $ make testings or directly './runtests -tests=testlist' >>>> >>>> On a batch system, >>>> $export MPITEST_BATCHDIR=`pwd`/btest // specify a batch dir, say btest, >>>> $./runtests -batch -mpiexec=mpirun -np=1024 -tests=testlist // Use 1024 ranks if a test does no specify the number of processes. >>>> $ // It copies test binaries to the batch dir and generates a script runtests.batch there. Edit the script to fit your batch system and then submit a job and wait for its finish. >>>> $ cd btest && ../checktests --ignorebogus >>>> >>>> PS: Fande, changing an MPI fixed your problem does not necessarily mean the old MPI has bugs. It is complicated. It could be a petsc bug. You need to provide us a code to reproduce your error. It does not matter if the code is big. >>>> >>>> >>>> Thanks >>>> >>>> Barry >>>> >>>> >>>>> On Jul 20, 2020, at 12:16 AM, Fande Kong > wrote: >>>>> >>>>> Trace could look like this: >>>>> >>>>> [640]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>>>> [640]PETSC ERROR: Argument out of range >>>>> [640]PETSC ERROR: key 45226154 is greater than largest key allowed 740521 >>>>> [640]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >>>>> [640]PETSC ERROR: Petsc Release Version 3.13.3, unknown >>>>> [640]PETSC ERROR: ../../griffin-opt on a arch-moose named r6i5n18 by wangy2 Sun Jul 19 17:14:28 2020 >>>>> [640]PETSC ERROR: Configure options --download-hypre=1 --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1 --download-metis=1 --download-ptscotch=1 --download-parmetis=1 --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 --download-slepc=1 --with-mpi=1 --with-cxx-dialect=C++11 --with-fortran-bindings=0 --with-sowing=0 --with-64-bit-indices --download-mumps=0 >>>>> [640]PETSC ERROR: #1 PetscTableFind() line 132 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/include/petscctable.h >>>>> [640]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() line 33 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mmaij.c >>>>> [640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>> [640]PETSC ERROR: #4 MatAssemblyEnd() line 5347 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>> [640]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIXAIJ_allatonce() line 901 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiptap.c >>>>> [640]PETSC ERROR: #6 MatPtAPNumeric_MPIAIJ_MPIMAIJ_allatonce() line 3180 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/maij/maij.c >>>>> [640]PETSC ERROR: #7 MatProductNumeric_PtAP() line 704 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>>> [640]PETSC ERROR: #8 MatProductNumeric() line 759 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>>> [640]PETSC ERROR: #9 MatPtAP() line 9199 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>> [640]PETSC ERROR: #10 MatGalerkin() line 10236 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>> [640]PETSC ERROR: #11 PCSetUp_MG() line 745 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/mg/mg.c >>>>> [640]PETSC ERROR: #12 PCSetUp_HMG() 
line 220 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/hmg/hmg.c >>>>> [640]PETSC ERROR: #13 PCSetUp() line 898 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/interface/precon.c >>>>> [640]PETSC ERROR: #14 KSPSetUp() line 376 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>> [640]PETSC ERROR: #15 KSPSolve_Private() line 633 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>> [640]PETSC ERROR: #16 KSPSolve() line 853 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>> [640]PETSC ERROR: #17 SNESSolve_NEWTONLS() line 225 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/impls/ls/ls.c >>>>> [640]PETSC ERROR: #18 SNESSolve() line 4519 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/interface/snes.c >>>>> >>>>> On Sun, Jul 19, 2020 at 11:13 PM Fande Kong > wrote: >>>>> I am not entirely sure what is happening, but we encountered similar issues recently. It was not reproducible. It might occur at different stages, and errors could be weird other than "ctable stuff." Our code was Valgrind clean since every PR in moose needs to go through rigorous Valgrind checks before it reaches the devel branch. The errors happened when we used mvapich. >>>>> >>>>> We changed to use HPE-MPT (a vendor stalled MPI), then everything was smooth. May you try a different MPI? It is better to try a system carried one. >>>>> >>>>> We did not get the bottom of this problem yet, but we at least know this is kind of MPI-related. >>>>> >>>>> Thanks, >>>>> >>>>> Fande, >>>>> >>>>> >>>>> On Sun, Jul 19, 2020 at 3:28 PM Chris Hewson > wrote: >>>>> Hi, >>>>> >>>>> I am having a bug that is occurring in PETSC with the return string: >>>>> >>>>> [7]PETSC ERROR: PetscTableFind() line 132 in /home/chewson/petsc-3.13.2/include/petscctable.h key 7556 is greater than largest key allowed 5693 >>>>> >>>>> This is using petsc-3.13.2, compiled and running using mpich with -O3 and debugging turned off tuned to the haswell architecture and occurring either before or during a KSPBCGS solve/setup or during a MUMPS factorization solve (I haven't been able to replicate this issue with the same set of instructions etc.). >>>>> >>>>> This is a terrible way to ask a question, I know, and not very helpful from your side, but this is what I have from a user's run and can't reproduce on my end (either with the optimization compilation or with debugging turned on). This happens when the code has run for quite some time and is happening somewhat rarely. >>>>> >>>>> More than likely I am using a static variable (code is written in c++) that I'm not updating when the matrix size is changing or something silly like that, but any help or guidance on this would be appreciated. >>>>> >>>>> Chris Hewson >>>>> Senior Reservoir Simulation Engineer >>>>> ResFrac >>>>> +1.587.575.9792 >>>> >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From junchao.zhang at gmail.com Thu Sep 24 15:31:06 2020 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 24 Sep 2020 15:31:06 -0500 Subject: [petsc-users] Tough to reproduce petsctablefind error In-Reply-To: <1A56D208-6BF6-49BF-B653-FD9D15BB3BDD@petsc.dev> References: <0AC37384-BC37-4A6C-857D-41CD507F84C2@petsc.dev> <8952CCCF-14F2-4102-91B4-921A54689813@petsc.dev> <1A56D208-6BF6-49BF-B653-FD9D15BB3BDD@petsc.dev> Message-ID: That error stack was from Fande Kong. We should wait for Chris's. --Junchao Zhang On Thu, Sep 24, 2020 at 2:42 PM Barry Smith wrote: > > The stack is listed below. It crashes inside MatPtAP(). > > It is possible there is some subtle bug in the rather complex PETSc code > for MatPtAP() but I am included to blame MPI again. > > I think we should add some simple low-overhead always on communication > error detecting code to PetscSF where some check sums are also communicated > at the highest level of PetscSF(). > > I don't know how but perhaps when the data is packed per destination > rank a checksum is computed and when unpacked the checksum is compared > using extra space at the end of the communicated packed array to store and > send the checksum. Yes, it is kind of odd for a high level library like > PETSc to not trust the communication channel but MPI implementations have > proven themselves to not be trustworthy and adding this to PetscSF is not > intrusive to the PETSc API or user. Plus it gives a definitive yes or no as > to the problem being from an error in the communication. > > Barry > > On Sep 24, 2020, at 12:35 PM, Matthew Knepley wrote: > > On Thu, Sep 24, 2020 at 1:22 PM Chris Hewson wrote: > >> Hi Guys, >> >> Thanks for all of the prompt responses, very helpful and appreciated. >> >> By "when debugging", did you mean when configure petsc --with-debugging=1 >> COPTFLAGS=-O0 -g etc or when you attached a debugger? >> - Both, I have run with a debugger attached and detached, all compiled >> with the following flags "--with-debugging=1 COPTFLAGS=-O0 -g" >> >> 1) Try OpenMPI (probably won't help, but worth trying) >> - Worth a try for sure >> >> 2) Find which part of the simulation makes it non-deterministic. Is it >> the mesh partitioning (parmetis)? Then try to make it deterministic. >> - Good tip, it is the mesh partitioning and along the lines of a question >> from Barry, the matrix size is changing. I will make this deterministic and >> give it a try >> >> 3) Dump matrices, vectors, etc and see when it fails, you can quickly >> reproduce the error by reading in the intermediate data. >> - Also a great suggestion, will give it a try >> >> The full stack would be really useful here. I am guessing this happens on >> MatMult(), but I do not know. >> - Agreed, I am currently running it so that the full stack will be >> produced, but waiting for it to fail, had compiled with all available >> optimizations on, but downside is of course if there is a failure. >> As a general question, roughly what's the performance impact on petsc >> with -o1/-o2/-o0 as opposed to -o3? Performance impact of --with-debugging >> = 1? >> Obviously problem/machine dependant, wondering on guidance more for this >> than anything >> >> Is the nonzero structure of your matrices changing or is it fixed for the >> entire simulation? >> The non-zero structure is changing, although the structures are reformed >> when this happens and this happens thousands of time before the failure has >> occured. >> > > Okay, this is the most likely spot for a bug. 
How are you changing the > matrix? It should be impossible to put in an invalid column index when > using MatSetValues() > because we check all the inputs. However, I do not think we check when you > just yank out the arrays. > > Thanks, > > Matt > > >> Does this particular run always crash at the same place? Similar place? >> Doesn't always crash? >> Doesn't always crash, but other similar runs have crashed in different >> spots, which makes it difficult to resolve. I am going to try out a few of >> the strategies suggested above and will let you know what comes of that. >> >> *Chris Hewson* >> Senior Reservoir Simulation Engineer >> ResFrac >> +1.587.575.9792 >> >> >> On Thu, Sep 24, 2020 at 11:05 AM Barry Smith wrote: >> >>> Chris, >>> >>> We realize how frustrating this type of problem is to deal with. >>> >>> Here is the code: >>> >>> ierr = >>> PetscTableCreate(aij->B->rmap->n,mat->cmap->N+1,&gid1_lid1);CHKERRQ(ierr); >>> for (i=0; iB->rmap->n; i++) { >>> for (j=0; jilen[i]; j++) { >>> PetscInt data,gid1 = aj[B->i[i] + j] + 1; >>> ierr = PetscTableFind(gid1_lid1,gid1,&data);CHKERRQ(ierr); >>> if (!data) { >>> /* one based table */ >>> ierr = >>> PetscTableAdd(gid1_lid1,gid1,++ec,INSERT_VALUES);CHKERRQ(ierr); >>> } >>> } >>> } >>> >>> It is simply looping over the rows of the sparse matrix putting the >>> columns it finds into the hash table. >>> >>> aj[B->i[i] + j] are the column entries, the number of columns in the >>> matrix is mat->cmap->N so the column entries should always be >>> less than the number of columns. The code is crashing when column entry >>> 24443 which is larger than the number of columns 23988. >>> >>> So either the aj[B->i[i] + j] + 1 are incorrect or the mat->cmap->N is >>> incorrect. >>> >>> 640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>> >>>>>>>> >>> if (!mat->was_assembled && mode == MAT_FINAL_ASSEMBLY) { >>> ierr = MatSetUpMultiply_MPIAIJ(mat);CHKERRQ(ierr); >>> } >>> >>> Seems to indicate it is setting up a new multiple because it is either >>> the first time into the algorithm or the nonzero structure changed on some >>> rank requiring a new assembly process. >>> >>> Is the nonzero structure of your matrices changing or is it fixed >>> for the entire simulation? >>> >>> Since the code has been running for a very long time already I have to >>> conclude that this is not the first time through and so something has >>> changed in the matrix? >>> >>> I think we have to put more diagnostics into the library to provide more >>> information before or at the time of the error detection. >>> >>> Does this particular run always crash at the same place? Similar >>> place? Doesn't always crash? >>> >>> Barry >>> >>> >>> >>> >>> On Sep 24, 2020, at 8:46 AM, Chris Hewson wrote: >>> >>> After about a month of not having this issue pop up, it has come up again >>> >>> We have been struggling with a similar PETSc Error for awhile now, the >>> error message is as follows: >>> >>> [7]PETSC ERROR: PetscTableFind() line 132 in >>> /home/chewson/petsc-3.13.3/include/petscctable.h key 24443 is greater than >>> largest key allowed 23988 >>> >>> It is a particularly nasty bug as it doesn't reproduce itself when >>> debugging and doesn't happen all the time with the same inputs either. The >>> problem occurs after a long runtime of the code (12-40 hours) and we are >>> using a ksp solve with KSPBCGS. 
>>> >>> The PETSc compilation options that are used are: >>> >>> --download-metis >>> --download-mpich >>> --download-mumps >>> --download-parmetis >>> --download-ptscotch >>> --download-scalapack >>> --download-suitesparse >>> --prefix=/opt/anl/petsc-3.13.3 >>> --with-debugging=0 >>> --with-mpi=1 >>> COPTFLAGS=-O3 -march=haswell -mtune=haswell >>> CXXOPTFLAGS=-O3 -march=haswell -mtune=haswell >>> FOPTFLAGS=-O3 -march=haswell -mtune=haswell >>> >>> This is being run across 8 processes and is failing consistently on the >>> rank 7 process. We also use openmp outside of PETSC and the linear solve >>> portion of the code. The rank 0 process is always using compute, during >>> this the slave processes use an MPI_Wait call to wait for the collective >>> parts of the code to be called again. All PETSC calls are done across all >>> of the processes. >>> >>> We are using mpich version 3.3.2, downloaded with the petsc 3.13.3 >>> package. >>> >>> At every PETSC call we are checking the error return from the function >>> collectively to ensure that no errors have been returned from PETSC. >>> >>> Some possible causes that I can think of are as follows: >>> 1. Memory leak causing a corruption either in our program or in petsc or >>> with one of the petsc objects. This seems unlikely as we have checked runs >>> with the option -malloc_dump for PETSc and using valgrind. >>> >>> 2. Optimization flags set for petsc compilation are causing variables >>> that go out of scope to be optimized out. >>> >>> 3. We are giving the wrong number of elements for a process or the value >>> is changing during the simulation. This seems unlikely as there is nothing >>> overly unique about these simulations and it's not reproducing itself. >>> >>> 4. An MPI channel or socket error causing an error in the collective >>> values for PETSc. >>> >>> Any input on this issue would be greatly appreciated. >>> >>> *Chris Hewson* >>> Senior Reservoir Simulation Engineer >>> ResFrac >>> +1.587.575.9792 >>> >>> >>> On Thu, Aug 13, 2020 at 4:21 PM Junchao Zhang >>> wrote: >>> >>>> That is a great idea. I'll figure it out. >>>> --Junchao Zhang >>>> >>>> >>>> On Thu, Aug 13, 2020 at 4:31 PM Barry Smith wrote: >>>> >>>>> >>>>> Junchao, >>>>> >>>>> Any way in the PETSc configure to warn that MPICH version is "bad" >>>>> or "untrustworthy" or even the vague "we have suspicians about this version >>>>> and recommend avoiding it"? A lot of time could be saved if others don't >>>>> deal with the same problem. >>>>> >>>>> Maybe add arrays of suspect_versions for OpenMPI, MPICH, etc and >>>>> always check against that list and print a boxed warning at configure time? >>>>> Better you could somehow generalize it and put it in package.py for use by >>>>> all packages, then any package can included lists of "suspect" versions. >>>>> (There are definitely HDF5 versions that should be avoided :-)). >>>>> >>>>> Barry >>>>> >>>>> >>>>> On Aug 13, 2020, at 12:14 PM, Junchao Zhang >>>>> wrote: >>>>> >>>>> Thanks for the update. Let's assume it is a bug in MPI :) >>>>> --Junchao Zhang >>>>> >>>>> >>>>> On Thu, Aug 13, 2020 at 11:15 AM Chris Hewson >>>>> wrote: >>>>> >>>>>> Just as an update to this, I can confirm that using the mpich version >>>>>> (3.3.2) downloaded with the petsc download solved this issue on my end. 
>>>>>> >>>>>> *Chris Hewson* >>>>>> Senior Reservoir Simulation Engineer >>>>>> ResFrac >>>>>> +1.587.575.9792 >>>>>> >>>>>> >>>>>> On Thu, Jul 23, 2020 at 5:58 PM Junchao Zhang < >>>>>> junchao.zhang at gmail.com> wrote: >>>>>> >>>>>>> On Mon, Jul 20, 2020 at 7:05 AM Barry Smith >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> Is there a comprehensive MPI test suite (perhaps from MPICH)? >>>>>>>> Is there any way to run this full test suite under the problematic MPI and >>>>>>>> see if it detects any problems? >>>>>>>> >>>>>>>> Is so, could someone add it to the FAQ in the debugging >>>>>>>> section? >>>>>>>> >>>>>>> MPICH does have a test suite. It is at the subdir test/mpi of >>>>>>> downloaded mpich >>>>>>> . >>>>>>> It annoyed me since it is not user-friendly. It might be helpful in >>>>>>> catching bugs at very small scale. But say if I want to test allreduce on >>>>>>> 1024 ranks on 100 doubles, I have to hack the test suite. >>>>>>> Anyway, the instructions are here. >>>>>>> >>>>>>> For the purpose of petsc, under test/mpi one can configure it with >>>>>>> $./configure CC=mpicc CXX=mpicxx FC=mpifort --enable-strictmpi >>>>>>> --enable-threads=funneled --enable-fortran=f77,f90 --enable-fast >>>>>>> --disable-spawn --disable-cxx --disable-ft-tests // It is weird I disabled >>>>>>> cxx but I had to set CXX! >>>>>>> $make -k -j8 // -k is to keep going and ignore compilation errors, >>>>>>> e.g., when building tests for MPICH extensions not in MPI standard, but >>>>>>> your MPI is OpenMPI. >>>>>>> $ // edit testlist, remove lines mpi_t, rma, f77, impls. Those are >>>>>>> sub-dirs containing tests for MPI routines Petsc does not rely on. >>>>>>> $ make testings or directly './runtests -tests=testlist' >>>>>>> >>>>>>> On a batch system, >>>>>>> $export MPITEST_BATCHDIR=`pwd`/btest // specify a batch dir, >>>>>>> say btest, >>>>>>> $./runtests -batch -mpiexec=mpirun -np=1024 -tests=testlist // Use >>>>>>> 1024 ranks if a test does no specify the number of processes. >>>>>>> $ // It copies test binaries to the batch dir and generates a >>>>>>> script runtests.batch there. Edit the script to fit your batch system and >>>>>>> then submit a job and wait for its finish. >>>>>>> $ cd btest && ../checktests --ignorebogus >>>>>>> >>>>>>> >>>>>>> PS: Fande, changing an MPI fixed your problem does not >>>>>>> necessarily mean the old MPI has bugs. It is complicated. It could be a >>>>>>> petsc bug. You need to provide us a code to reproduce your error. It does >>>>>>> not matter if the code is big. >>>>>>> >>>>>>> >>>>>>>> Thanks >>>>>>>> >>>>>>>> Barry >>>>>>>> >>>>>>>> >>>>>>>> On Jul 20, 2020, at 12:16 AM, Fande Kong >>>>>>>> wrote: >>>>>>>> >>>>>>>> Trace could look like this: >>>>>>>> >>>>>>>> [640]PETSC ERROR: --------------------- Error Message >>>>>>>> -------------------------------------------------------------- >>>>>>>> [640]PETSC ERROR: Argument out of range >>>>>>>> [640]PETSC ERROR: key 45226154 is greater than largest key allowed >>>>>>>> 740521 >>>>>>>> [640]PETSC ERROR: See >>>>>>>> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>>>>>> shooting. 
>>>>>>>> [640]PETSC ERROR: Petsc Release Version 3.13.3, unknown >>>>>>>> [640]PETSC ERROR: ../../griffin-opt on a arch-moose named r6i5n18 >>>>>>>> by wangy2 Sun Jul 19 17:14:28 2020 >>>>>>>> [640]PETSC ERROR: Configure options --download-hypre=1 >>>>>>>> --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1 >>>>>>>> --download-metis=1 --download-ptscotch=1 --download-parmetis=1 >>>>>>>> --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 >>>>>>>> --download-slepc=1 --with-mpi=1 --with-cxx-dialect=C++11 >>>>>>>> --with-fortran-bindings=0 --with-sowing=0 --with-64-bit-indices >>>>>>>> --download-mumps=0 >>>>>>>> [640]PETSC ERROR: #1 PetscTableFind() line 132 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/include/petscctable.h >>>>>>>> [640]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() line 33 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mmaij.c >>>>>>>> [640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>> [640]PETSC ERROR: #4 MatAssemblyEnd() line 5347 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>>>> [640]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIXAIJ_allatonce() line >>>>>>>> 901 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiptap.c >>>>>>>> [640]PETSC ERROR: #6 MatPtAPNumeric_MPIAIJ_MPIMAIJ_allatonce() line >>>>>>>> 3180 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/maij/maij.c >>>>>>>> [640]PETSC ERROR: #7 MatProductNumeric_PtAP() line 704 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>>>>>> [640]PETSC ERROR: #8 MatProductNumeric() line 759 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>>>>>> [640]PETSC ERROR: #9 MatPtAP() line 9199 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>>>> [640]PETSC ERROR: #10 MatGalerkin() line 10236 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>>>> [640]PETSC ERROR: #11 PCSetUp_MG() line 745 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/mg/mg.c >>>>>>>> [640]PETSC ERROR: #12 PCSetUp_HMG() line 220 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/hmg/hmg.c >>>>>>>> [640]PETSC ERROR: #13 PCSetUp() line 898 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/interface/precon.c >>>>>>>> [640]PETSC ERROR: #14 KSPSetUp() line 376 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>> [640]PETSC ERROR: #15 KSPSolve_Private() line 633 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>> [640]PETSC ERROR: #16 KSPSolve() line 853 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>> [640]PETSC ERROR: #17 SNESSolve_NEWTONLS() line 225 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/impls/ls/ls.c >>>>>>>> [640]PETSC ERROR: #18 SNESSolve() line 4519 in >>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/interface/snes.c >>>>>>>> >>>>>>>> On Sun, Jul 19, 2020 at 11:13 PM Fande Kong >>>>>>>> wrote: >>>>>>>> >>>>>>>>> I am not entirely sure what is happening, but we encountered >>>>>>>>> similar issues recently. It was not reproducible. 
It might occur at >>>>>>>>> different stages, and errors could be weird other than "ctable stuff." Our >>>>>>>>> code was Valgrind clean since every PR in moose needs to go through >>>>>>>>> rigorous Valgrind checks before it reaches the devel branch. The errors >>>>>>>>> happened when we used mvapich. >>>>>>>>> >>>>>>>>> We changed to use HPE-MPT (a vendor stalled MPI), then everything >>>>>>>>> was smooth. May you try a different MPI? It is better to try a system >>>>>>>>> carried one. >>>>>>>>> >>>>>>>>> We did not get the bottom of this problem yet, but we at least >>>>>>>>> know this is kind of MPI-related. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Fande, >>>>>>>>> >>>>>>>>> >>>>>>>>> On Sun, Jul 19, 2020 at 3:28 PM Chris Hewson >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I am having a bug that is occurring in PETSC with the return >>>>>>>>>> string: >>>>>>>>>> >>>>>>>>>> [7]PETSC ERROR: PetscTableFind() line 132 in >>>>>>>>>> /home/chewson/petsc-3.13.2/include/petscctable.h key 7556 is greater than >>>>>>>>>> largest key allowed 5693 >>>>>>>>>> >>>>>>>>>> This is using petsc-3.13.2, compiled and running using mpich with >>>>>>>>>> -O3 and debugging turned off tuned to the haswell architecture and >>>>>>>>>> occurring either before or during a KSPBCGS solve/setup or during a MUMPS >>>>>>>>>> factorization solve (I haven't been able to replicate this issue with the >>>>>>>>>> same set of instructions etc.). >>>>>>>>>> >>>>>>>>>> This is a terrible way to ask a question, I know, and not very >>>>>>>>>> helpful from your side, but this is what I have from a user's run and can't >>>>>>>>>> reproduce on my end (either with the optimization compilation or with >>>>>>>>>> debugging turned on). This happens when the code has run for quite some >>>>>>>>>> time and is happening somewhat rarely. >>>>>>>>>> >>>>>>>>>> More than likely I am using a static variable (code is written in >>>>>>>>>> c++) that I'm not updating when the matrix size is changing or something >>>>>>>>>> silly like that, but any help or guidance on this would be appreciated. >>>>>>>>>> >>>>>>>>>> *Chris Hewson* >>>>>>>>>> Senior Reservoir Simulation Engineer >>>>>>>>>> ResFrac >>>>>>>>>> +1.587.575.9792 >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>> >>> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Sep 24 15:38:50 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 24 Sep 2020 15:38:50 -0500 Subject: [petsc-users] Tough to reproduce petsctablefind error In-Reply-To: References: <0AC37384-BC37-4A6C-857D-41CD507F84C2@petsc.dev> <8952CCCF-14F2-4102-91B4-921A54689813@petsc.dev> <1A56D208-6BF6-49BF-B653-FD9D15BB3BDD@petsc.dev> Message-ID: <2C16B3CA-A3CE-4B74-9E7D-F90B9EBD6384@petsc.dev> Oh, sorry, my mistake. > On Sep 24, 2020, at 3:31 PM, Junchao Zhang wrote: > > That error stack was from Fande Kong. We should wait for Chris's. > > --Junchao Zhang > > > On Thu, Sep 24, 2020 at 2:42 PM Barry Smith > wrote: > > The stack is listed below. It crashes inside MatPtAP(). > > It is possible there is some subtle bug in the rather complex PETSc code for MatPtAP() but I am included to blame MPI again. 
> > I think we should add some simple low-overhead always on communication error detecting code to PetscSF where some check sums are also communicated at the highest level of PetscSF(). > > I don't know how but perhaps when the data is packed per destination rank a checksum is computed and when unpacked the checksum is compared using extra space at the end of the communicated packed array to store and send the checksum. Yes, it is kind of odd for a high level library like PETSc to not trust the communication channel but MPI implementations have proven themselves to not be trustworthy and adding this to PetscSF is not intrusive to the PETSc API or user. Plus it gives a definitive yes or no as to the problem being from an error in the communication. > > Barry > >> On Sep 24, 2020, at 12:35 PM, Matthew Knepley > wrote: >> >> On Thu, Sep 24, 2020 at 1:22 PM Chris Hewson > wrote: >> Hi Guys, >> >> Thanks for all of the prompt responses, very helpful and appreciated. >> >> By "when debugging", did you mean when configure petsc --with-debugging=1 COPTFLAGS=-O0 -g etc or when you attached a debugger? >> - Both, I have run with a debugger attached and detached, all compiled with the following flags "--with-debugging=1 COPTFLAGS=-O0 -g" >> >> 1) Try OpenMPI (probably won't help, but worth trying) >> - Worth a try for sure >> >> 2) Find which part of the simulation makes it non-deterministic. Is it the mesh partitioning (parmetis)? Then try to make it deterministic. >> - Good tip, it is the mesh partitioning and along the lines of a question from Barry, the matrix size is changing. I will make this deterministic and give it a try >> >> 3) Dump matrices, vectors, etc and see when it fails, you can quickly reproduce the error by reading in the intermediate data. >> - Also a great suggestion, will give it a try >> >> The full stack would be really useful here. I am guessing this happens on MatMult(), but I do not know. >> - Agreed, I am currently running it so that the full stack will be produced, but waiting for it to fail, had compiled with all available optimizations on, but downside is of course if there is a failure. >> As a general question, roughly what's the performance impact on petsc with -o1/-o2/-o0 as opposed to -o3? Performance impact of --with-debugging = 1? >> Obviously problem/machine dependant, wondering on guidance more for this than anything >> >> Is the nonzero structure of your matrices changing or is it fixed for the entire simulation? >> The non-zero structure is changing, although the structures are reformed when this happens and this happens thousands of time before the failure has occured. >> >> Okay, this is the most likely spot for a bug. How are you changing the matrix? It should be impossible to put in an invalid column index when using MatSetValues() >> because we check all the inputs. However, I do not think we check when you just yank out the arrays. >> >> Thanks, >> >> Matt >> >> Does this particular run always crash at the same place? Similar place? Doesn't always crash? >> Doesn't always crash, but other similar runs have crashed in different spots, which makes it difficult to resolve. I am going to try out a few of the strategies suggested above and will let you know what comes of that. >> >> Chris Hewson >> Senior Reservoir Simulation Engineer >> ResFrac >> +1.587.575.9792 >> >> >> On Thu, Sep 24, 2020 at 11:05 AM Barry Smith > wrote: >> Chris, >> >> We realize how frustrating this type of problem is to deal with. 
>> >> Here is the code: >> >> ierr = PetscTableCreate(aij->B->rmap->n,mat->cmap->N+1,&gid1_lid1);CHKERRQ(ierr); >> for (i=0; iB->rmap->n; i++) { >> for (j=0; jilen[i]; j++) { >> PetscInt data,gid1 = aj[B->i[i] + j] + 1; >> ierr = PetscTableFind(gid1_lid1,gid1,&data);CHKERRQ(ierr); >> if (!data) { >> /* one based table */ >> ierr = PetscTableAdd(gid1_lid1,gid1,++ec,INSERT_VALUES);CHKERRQ(ierr); >> } >> } >> } >> >> It is simply looping over the rows of the sparse matrix putting the columns it finds into the hash table. >> >> aj[B->i[i] + j] are the column entries, the number of columns in the matrix is mat->cmap->N so the column entries should always be >> less than the number of columns. The code is crashing when column entry 24443 which is larger than the number of columns 23988. >> >> So either the aj[B->i[i] + j] + 1 are incorrect or the mat->cmap->N is incorrect. >> >>>>> 640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >> >> if (!mat->was_assembled && mode == MAT_FINAL_ASSEMBLY) { >> ierr = MatSetUpMultiply_MPIAIJ(mat);CHKERRQ(ierr); >> } >> >> Seems to indicate it is setting up a new multiple because it is either the first time into the algorithm or the nonzero structure changed on some rank requiring a new assembly process. >> >> Is the nonzero structure of your matrices changing or is it fixed for the entire simulation? >> >> Since the code has been running for a very long time already I have to conclude that this is not the first time through and so something has changed in the matrix? >> >> I think we have to put more diagnostics into the library to provide more information before or at the time of the error detection. >> >> Does this particular run always crash at the same place? Similar place? Doesn't always crash? >> >> Barry >> >> >> >> >>> On Sep 24, 2020, at 8:46 AM, Chris Hewson > wrote: >>> >>> After about a month of not having this issue pop up, it has come up again >>> >>> We have been struggling with a similar PETSc Error for awhile now, the error message is as follows: >>> >>> [7]PETSC ERROR: PetscTableFind() line 132 in /home/chewson/petsc-3.13.3/include/petscctable.h key 24443 is greater than largest key allowed 23988 >>> >>> It is a particularly nasty bug as it doesn't reproduce itself when debugging and doesn't happen all the time with the same inputs either. The problem occurs after a long runtime of the code (12-40 hours) and we are using a ksp solve with KSPBCGS. >>> >>> The PETSc compilation options that are used are: >>> >>> --download-metis >>> --download-mpich >>> --download-mumps >>> --download-parmetis >>> --download-ptscotch >>> --download-scalapack >>> --download-suitesparse >>> --prefix=/opt/anl/petsc-3.13.3 >>> --with-debugging=0 >>> --with-mpi=1 >>> COPTFLAGS=-O3 -march=haswell -mtune=haswell >>> CXXOPTFLAGS=-O3 -march=haswell -mtune=haswell >>> FOPTFLAGS=-O3 -march=haswell -mtune=haswell >>> >>> This is being run across 8 processes and is failing consistently on the rank 7 process. We also use openmp outside of PETSC and the linear solve portion of the code. The rank 0 process is always using compute, during this the slave processes use an MPI_Wait call to wait for the collective parts of the code to be called again. All PETSC calls are done across all of the processes. >>> >>> We are using mpich version 3.3.2, downloaded with the petsc 3.13.3 package. 
>>> >>> At every PETSC call we are checking the error return from the function collectively to ensure that no errors have been returned from PETSC. >>> >>> Some possible causes that I can think of are as follows: >>> 1. Memory leak causing a corruption either in our program or in petsc or with one of the petsc objects. This seems unlikely as we have checked runs with the option -malloc_dump for PETSc and using valgrind. >>> >>> 2. Optimization flags set for petsc compilation are causing variables that go out of scope to be optimized out. >>> >>> 3. We are giving the wrong number of elements for a process or the value is changing during the simulation. This seems unlikely as there is nothing overly unique about these simulations and it's not reproducing itself. >>> >>> 4. An MPI channel or socket error causing an error in the collective values for PETSc. >>> >>> Any input on this issue would be greatly appreciated. >>> >>> Chris Hewson >>> Senior Reservoir Simulation Engineer >>> ResFrac >>> +1.587.575.9792 >>> >>> >>> On Thu, Aug 13, 2020 at 4:21 PM Junchao Zhang > wrote: >>> That is a great idea. I'll figure it out. >>> --Junchao Zhang >>> >>> >>> On Thu, Aug 13, 2020 at 4:31 PM Barry Smith > wrote: >>> >>> Junchao, >>> >>> Any way in the PETSc configure to warn that MPICH version is "bad" or "untrustworthy" or even the vague "we have suspicians about this version and recommend avoiding it"? A lot of time could be saved if others don't deal with the same problem. >>> >>> Maybe add arrays of suspect_versions for OpenMPI, MPICH, etc and always check against that list and print a boxed warning at configure time? Better you could somehow generalize it and put it in package.py for use by all packages, then any package can included lists of "suspect" versions. (There are definitely HDF5 versions that should be avoided :-)). >>> >>> Barry >>> >>> >>>> On Aug 13, 2020, at 12:14 PM, Junchao Zhang > wrote: >>>> >>>> Thanks for the update. Let's assume it is a bug in MPI :) >>>> --Junchao Zhang >>>> >>>> >>>> On Thu, Aug 13, 2020 at 11:15 AM Chris Hewson > wrote: >>>> Just as an update to this, I can confirm that using the mpich version (3.3.2) downloaded with the petsc download solved this issue on my end. >>>> >>>> Chris Hewson >>>> Senior Reservoir Simulation Engineer >>>> ResFrac >>>> +1.587.575.9792 >>>> >>>> >>>> On Thu, Jul 23, 2020 at 5:58 PM Junchao Zhang > wrote: >>>> On Mon, Jul 20, 2020 at 7:05 AM Barry Smith > wrote: >>>> >>>> Is there a comprehensive MPI test suite (perhaps from MPICH)? Is there any way to run this full test suite under the problematic MPI and see if it detects any problems? >>>> >>>> Is so, could someone add it to the FAQ in the debugging section? >>>> MPICH does have a test suite. It is at the subdir test/mpi of downloaded mpich . It annoyed me since it is not user-friendly. It might be helpful in catching bugs at very small scale. But say if I want to test allreduce on 1024 ranks on 100 doubles, I have to hack the test suite. >>>> Anyway, the instructions are here. >>>> For the purpose of petsc, under test/mpi one can configure it with >>>> $./configure CC=mpicc CXX=mpicxx FC=mpifort --enable-strictmpi --enable-threads=funneled --enable-fortran=f77,f90 --enable-fast --disable-spawn --disable-cxx --disable-ft-tests // It is weird I disabled cxx but I had to set CXX! >>>> $make -k -j8 // -k is to keep going and ignore compilation errors, e.g., when building tests for MPICH extensions not in MPI standard, but your MPI is OpenMPI. 
>>>> $ // edit testlist, remove lines mpi_t, rma, f77, impls. Those are sub-dirs containing tests for MPI routines Petsc does not rely on. >>>> $ make testings or directly './runtests -tests=testlist' >>>> >>>> On a batch system, >>>> $export MPITEST_BATCHDIR=`pwd`/btest // specify a batch dir, say btest, >>>> $./runtests -batch -mpiexec=mpirun -np=1024 -tests=testlist // Use 1024 ranks if a test does no specify the number of processes. >>>> $ // It copies test binaries to the batch dir and generates a script runtests.batch there. Edit the script to fit your batch system and then submit a job and wait for its finish. >>>> $ cd btest && ../checktests --ignorebogus >>>> >>>> PS: Fande, changing an MPI fixed your problem does not necessarily mean the old MPI has bugs. It is complicated. It could be a petsc bug. You need to provide us a code to reproduce your error. It does not matter if the code is big. >>>> >>>> >>>> Thanks >>>> >>>> Barry >>>> >>>> >>>>> On Jul 20, 2020, at 12:16 AM, Fande Kong > wrote: >>>>> >>>>> Trace could look like this: >>>>> >>>>> [640]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>>>> [640]PETSC ERROR: Argument out of range >>>>> [640]PETSC ERROR: key 45226154 is greater than largest key allowed 740521 >>>>> [640]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >>>>> [640]PETSC ERROR: Petsc Release Version 3.13.3, unknown >>>>> [640]PETSC ERROR: ../../griffin-opt on a arch-moose named r6i5n18 by wangy2 Sun Jul 19 17:14:28 2020 >>>>> [640]PETSC ERROR: Configure options --download-hypre=1 --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1 --download-metis=1 --download-ptscotch=1 --download-parmetis=1 --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 --download-slepc=1 --with-mpi=1 --with-cxx-dialect=C++11 --with-fortran-bindings=0 --with-sowing=0 --with-64-bit-indices --download-mumps=0 >>>>> [640]PETSC ERROR: #1 PetscTableFind() line 132 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/include/petscctable.h >>>>> [640]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() line 33 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mmaij.c >>>>> [640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>> [640]PETSC ERROR: #4 MatAssemblyEnd() line 5347 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>> [640]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIXAIJ_allatonce() line 901 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiptap.c >>>>> [640]PETSC ERROR: #6 MatPtAPNumeric_MPIAIJ_MPIMAIJ_allatonce() line 3180 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/maij/maij.c >>>>> [640]PETSC ERROR: #7 MatProductNumeric_PtAP() line 704 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>>> [640]PETSC ERROR: #8 MatProductNumeric() line 759 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>>> [640]PETSC ERROR: #9 MatPtAP() line 9199 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>> [640]PETSC ERROR: #10 MatGalerkin() line 10236 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>> [640]PETSC ERROR: #11 PCSetUp_MG() line 745 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/mg/mg.c >>>>> [640]PETSC ERROR: #12 PCSetUp_HMG() 
line 220 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/hmg/hmg.c >>>>> [640]PETSC ERROR: #13 PCSetUp() line 898 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/interface/precon.c >>>>> [640]PETSC ERROR: #14 KSPSetUp() line 376 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>> [640]PETSC ERROR: #15 KSPSolve_Private() line 633 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>> [640]PETSC ERROR: #16 KSPSolve() line 853 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>> [640]PETSC ERROR: #17 SNESSolve_NEWTONLS() line 225 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/impls/ls/ls.c >>>>> [640]PETSC ERROR: #18 SNESSolve() line 4519 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/interface/snes.c >>>>> >>>>> On Sun, Jul 19, 2020 at 11:13 PM Fande Kong > wrote: >>>>> I am not entirely sure what is happening, but we encountered similar issues recently. It was not reproducible. It might occur at different stages, and errors could be weird other than "ctable stuff." Our code was Valgrind clean since every PR in moose needs to go through rigorous Valgrind checks before it reaches the devel branch. The errors happened when we used mvapich. >>>>> >>>>> We changed to use HPE-MPT (a vendor stalled MPI), then everything was smooth. May you try a different MPI? It is better to try a system carried one. >>>>> >>>>> We did not get the bottom of this problem yet, but we at least know this is kind of MPI-related. >>>>> >>>>> Thanks, >>>>> >>>>> Fande, >>>>> >>>>> >>>>> On Sun, Jul 19, 2020 at 3:28 PM Chris Hewson > wrote: >>>>> Hi, >>>>> >>>>> I am having a bug that is occurring in PETSC with the return string: >>>>> >>>>> [7]PETSC ERROR: PetscTableFind() line 132 in /home/chewson/petsc-3.13.2/include/petscctable.h key 7556 is greater than largest key allowed 5693 >>>>> >>>>> This is using petsc-3.13.2, compiled and running using mpich with -O3 and debugging turned off tuned to the haswell architecture and occurring either before or during a KSPBCGS solve/setup or during a MUMPS factorization solve (I haven't been able to replicate this issue with the same set of instructions etc.). >>>>> >>>>> This is a terrible way to ask a question, I know, and not very helpful from your side, but this is what I have from a user's run and can't reproduce on my end (either with the optimization compilation or with debugging turned on). This happens when the code has run for quite some time and is happening somewhat rarely. >>>>> >>>>> More than likely I am using a static variable (code is written in c++) that I'm not updating when the matrix size is changing or something silly like that, but any help or guidance on this would be appreciated. >>>>> >>>>> Chris Hewson >>>>> Senior Reservoir Simulation Engineer >>>>> ResFrac >>>>> +1.587.575.9792 >>>> >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mfadams at lbl.gov Thu Sep 24 15:41:15 2020 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 24 Sep 2020 16:41:15 -0400 Subject: [petsc-users] Tough to reproduce petsctablefind error In-Reply-To: <4E113614-C8C0-44F5-AD95-DC37A9D4B5F5@petsc.dev> References: <0AC37384-BC37-4A6C-857D-41CD507F84C2@petsc.dev> <8952CCCF-14F2-4102-91B4-921A54689813@petsc.dev> <1A56D208-6BF6-49BF-B653-FD9D15BB3BDD@petsc.dev> <4E113614-C8C0-44F5-AD95-DC37A9D4B5F5@petsc.dev> Message-ID: You might add code here like: if (ierr) { for (; iB->rmap->n; i++) { for ( jilen[i]; j++) { PetscInt data,gid1 = aj[B->i[i] + j] + 1; // don't use ierr print rank, gid1; } CHKERRQ(ierr); I am guessing that somehow you have a table that is bad and too small. It failed on an index not much bigger than the largest key allowed. Maybe just compute the max and see if it goes much larger than the largest key allowed. If your mesh just changed to you know if it got bigger or smaller... Anyway just some thoughts, Mark On Thu, Sep 24, 2020 at 4:18 PM Barry Smith wrote: > > > On Sep 24, 2020, at 2:47 PM, Matthew Knepley wrote: > > On Thu, Sep 24, 2020 at 3:42 PM Barry Smith wrote: > >> >> The stack is listed below. It crashes inside MatPtAP(). >> > > What about just checking that the column indices that PtAP receives are > valid? Are we not doing that? > > > The code that checks for column too large in MatSetValues_MPIXXAIJ() is > turned off for optimized builds, I am making a MR to always have it on. But > I doubt this is the problem, other, more harsh crashes, are likely if the > column index is too large. > > This is difficult to debug because all we get is a stack trace. It would > be good if we produce some information about the current state of the > objects when the error is detected. We should think about what light-weight > stuff we could report when errors are detected. > > > Barry > > > Matt > > >> It is possible there is some subtle bug in the rather complex PETSc >> code for MatPtAP() but I am included to blame MPI again. >> >> I think we should add some simple low-overhead always on communication >> error detecting code to PetscSF where some check sums are also communicated >> at the highest level of PetscSF(). >> >> I don't know how but perhaps when the data is packed per destination >> rank a checksum is computed and when unpacked the checksum is compared >> using extra space at the end of the communicated packed array to store and >> send the checksum. Yes, it is kind of odd for a high level library like >> PETSc to not trust the communication channel but MPI implementations have >> proven themselves to not be trustworthy and adding this to PetscSF is not >> intrusive to the PETSc API or user. Plus it gives a definitive yes or no as >> to the problem being from an error in the communication. >> >> Barry >> >> On Sep 24, 2020, at 12:35 PM, Matthew Knepley wrote: >> >> On Thu, Sep 24, 2020 at 1:22 PM Chris Hewson wrote: >> >>> Hi Guys, >>> >>> Thanks for all of the prompt responses, very helpful and appreciated. >>> >>> By "when debugging", did you mean when configure >>> petsc --with-debugging=1 COPTFLAGS=-O0 -g etc or when you attached a >>> debugger? >>> - Both, I have run with a debugger attached and detached, all compiled >>> with the following flags "--with-debugging=1 COPTFLAGS=-O0 -g" >>> >>> 1) Try OpenMPI (probably won't help, but worth trying) >>> - Worth a try for sure >>> >>> 2) Find which part of the simulation makes it non-deterministic. Is it >>> the mesh partitioning (parmetis)? 
Then try to make it deterministic. >>> - Good tip, it is the mesh partitioning and along the lines of a >>> question from Barry, the matrix size is changing. I will make this >>> deterministic and give it a try >>> >>> 3) Dump matrices, vectors, etc and see when it fails, you can quickly >>> reproduce the error by reading in the intermediate data. >>> - Also a great suggestion, will give it a try >>> >>> The full stack would be really useful here. I am guessing this happens >>> on MatMult(), but I do not know. >>> - Agreed, I am currently running it so that the full stack will be >>> produced, but waiting for it to fail, had compiled with all available >>> optimizations on, but downside is of course if there is a failure. >>> As a general question, roughly what's the performance impact on petsc >>> with -o1/-o2/-o0 as opposed to -o3? Performance impact of --with-debugging >>> = 1? >>> Obviously problem/machine dependant, wondering on guidance more for this >>> than anything >>> >>> Is the nonzero structure of your matrices changing or is it fixed for >>> the entire simulation? >>> The non-zero structure is changing, although the structures are reformed >>> when this happens and this happens thousands of time before the failure has >>> occured. >>> >> >> Okay, this is the most likely spot for a bug. How are you changing the >> matrix? It should be impossible to put in an invalid column index when >> using MatSetValues() >> because we check all the inputs. However, I do not think we check when >> you just yank out the arrays. >> >> Thanks, >> >> Matt >> >> >>> Does this particular run always crash at the same place? Similar place? >>> Doesn't always crash? >>> Doesn't always crash, but other similar runs have crashed in different >>> spots, which makes it difficult to resolve. I am going to try out a few of >>> the strategies suggested above and will let you know what comes of that. >>> >>> *Chris Hewson* >>> Senior Reservoir Simulation Engineer >>> ResFrac >>> +1.587.575.9792 >>> >>> >>> On Thu, Sep 24, 2020 at 11:05 AM Barry Smith wrote: >>> >>>> Chris, >>>> >>>> We realize how frustrating this type of problem is to deal with. >>>> >>>> Here is the code: >>>> >>>> ierr = >>>> PetscTableCreate(aij->B->rmap->n,mat->cmap->N+1,&gid1_lid1);CHKERRQ(ierr); >>>> for (i=0; iB->rmap->n; i++) { >>>> for (j=0; jilen[i]; j++) { >>>> PetscInt data,gid1 = aj[B->i[i] + j] + 1; >>>> ierr = PetscTableFind(gid1_lid1,gid1,&data);CHKERRQ(ierr); >>>> if (!data) { >>>> /* one based table */ >>>> ierr = >>>> PetscTableAdd(gid1_lid1,gid1,++ec,INSERT_VALUES);CHKERRQ(ierr); >>>> } >>>> } >>>> } >>>> >>>> It is simply looping over the rows of the sparse matrix putting the >>>> columns it finds into the hash table. >>>> >>>> aj[B->i[i] + j] are the column entries, the number of columns in >>>> the matrix is mat->cmap->N so the column entries should always be >>>> less than the number of columns. The code is crashing when column entry >>>> 24443 which is larger than the number of columns 23988. >>>> >>>> So either the aj[B->i[i] + j] + 1 are incorrect or the mat->cmap->N is >>>> incorrect. 
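The loop above, like Mark's suggested instrumentation earlier in the thread, lost its "<" comparison operators when the HTML mail was flattened to text. Below is a sketch of the MatSetUpMultiply_MPIAIJ() loop with the operators restored and with the sort of per-rank diagnostic Mark and Barry describe added. The printouts and the maxcol bookkeeping are illustrative additions, not existing PETSc code; they reuse the variables (mat, aij, B, aj, ec, gid1_lid1, i, j, ierr) already in scope in that routine.

    PetscInt    maxcol = -1;
    PetscMPIInt rank;

    ierr = MPI_Comm_rank(PetscObjectComm((PetscObject)mat),&rank);CHKERRQ(ierr);
    ierr = PetscTableCreate(aij->B->rmap->n,mat->cmap->N+1,&gid1_lid1);CHKERRQ(ierr);
    for (i=0; i<aij->B->rmap->n; i++) {
      for (j=0; j<B->ilen[i]; j++) {
        PetscInt data,col = aj[B->i[i] + j],gid1 = col + 1;

        maxcol = PetscMax(maxcol,col);   /* track the largest off-diagonal column index seen */
        if (col < 0 || col >= mat->cmap->N) {
          /* this is the entry that would make PetscTableFind() error out below,
             so report it (with the rank) before that happens */
          ierr = PetscPrintf(PETSC_COMM_SELF,"[%d] bad column: local row %D entry %D col %D (mat->cmap->N %D)\n",rank,i,j,col,mat->cmap->N);CHKERRQ(ierr);
        }
        ierr = PetscTableFind(gid1_lid1,gid1,&data);CHKERRQ(ierr);
        if (!data) {
          /* one based table */
          ierr = PetscTableAdd(gid1_lid1,gid1,++ec,INSERT_VALUES);CHKERRQ(ierr);
        }
      }
    }
    ierr = PetscPrintf(PETSC_COMM_SELF,"[%d] largest column index found %D of %D columns\n",rank,maxcol,mat->cmap->N);CHKERRQ(ierr);

If the largest column index reported is wildly beyond mat->cmap->N (like the key 45226154 versus 740521 in the trace quoted earlier), that points at corrupted column indices rather than a wrong column count.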
>>>> >>>> 640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in >>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>> >>>>>>>>> >>>> if (!mat->was_assembled && mode == MAT_FINAL_ASSEMBLY) { >>>> ierr = MatSetUpMultiply_MPIAIJ(mat);CHKERRQ(ierr); >>>> } >>>> >>>> Seems to indicate it is setting up a new multiple because it is either >>>> the first time into the algorithm or the nonzero structure changed on some >>>> rank requiring a new assembly process. >>>> >>>> Is the nonzero structure of your matrices changing or is it fixed >>>> for the entire simulation? >>>> >>>> Since the code has been running for a very long time already I have to >>>> conclude that this is not the first time through and so something has >>>> changed in the matrix? >>>> >>>> I think we have to put more diagnostics into the library to provide >>>> more information before or at the time of the error detection. >>>> >>>> Does this particular run always crash at the same place? Similar >>>> place? Doesn't always crash? >>>> >>>> Barry >>>> >>>> >>>> >>>> >>>> On Sep 24, 2020, at 8:46 AM, Chris Hewson wrote: >>>> >>>> After about a month of not having this issue pop up, it has come up >>>> again >>>> >>>> We have been struggling with a similar PETSc Error for awhile now, the >>>> error message is as follows: >>>> >>>> [7]PETSC ERROR: PetscTableFind() line 132 in >>>> /home/chewson/petsc-3.13.3/include/petscctable.h key 24443 is greater than >>>> largest key allowed 23988 >>>> >>>> It is a particularly nasty bug as it doesn't reproduce itself when >>>> debugging and doesn't happen all the time with the same inputs either. The >>>> problem occurs after a long runtime of the code (12-40 hours) and we are >>>> using a ksp solve with KSPBCGS. >>>> >>>> The PETSc compilation options that are used are: >>>> >>>> --download-metis >>>> --download-mpich >>>> --download-mumps >>>> --download-parmetis >>>> --download-ptscotch >>>> --download-scalapack >>>> --download-suitesparse >>>> --prefix=/opt/anl/petsc-3.13.3 >>>> --with-debugging=0 >>>> --with-mpi=1 >>>> COPTFLAGS=-O3 -march=haswell -mtune=haswell >>>> CXXOPTFLAGS=-O3 -march=haswell -mtune=haswell >>>> FOPTFLAGS=-O3 -march=haswell -mtune=haswell >>>> >>>> This is being run across 8 processes and is failing consistently on the >>>> rank 7 process. We also use openmp outside of PETSC and the linear solve >>>> portion of the code. The rank 0 process is always using compute, during >>>> this the slave processes use an MPI_Wait call to wait for the collective >>>> parts of the code to be called again. All PETSC calls are done across all >>>> of the processes. >>>> >>>> We are using mpich version 3.3.2, downloaded with the petsc 3.13.3 >>>> package. >>>> >>>> At every PETSC call we are checking the error return from the function >>>> collectively to ensure that no errors have been returned from PETSC. >>>> >>>> Some possible causes that I can think of are as follows: >>>> 1. Memory leak causing a corruption either in our program or in petsc >>>> or with one of the petsc objects. This seems unlikely as we have checked >>>> runs with the option -malloc_dump for PETSc and using valgrind. >>>> >>>> 2. Optimization flags set for petsc compilation are causing variables >>>> that go out of scope to be optimized out. >>>> >>>> 3. We are giving the wrong number of elements for a process or the >>>> value is changing during the simulation. 
This seems unlikely as there is >>>> nothing overly unique about these simulations and it's not reproducing >>>> itself. >>>> >>>> 4. An MPI channel or socket error causing an error in the collective >>>> values for PETSc. >>>> >>>> Any input on this issue would be greatly appreciated. >>>> >>>> *Chris Hewson* >>>> Senior Reservoir Simulation Engineer >>>> ResFrac >>>> +1.587.575.9792 >>>> >>>> >>>> On Thu, Aug 13, 2020 at 4:21 PM Junchao Zhang >>>> wrote: >>>> >>>>> That is a great idea. I'll figure it out. >>>>> --Junchao Zhang >>>>> >>>>> >>>>> On Thu, Aug 13, 2020 at 4:31 PM Barry Smith wrote: >>>>> >>>>>> >>>>>> Junchao, >>>>>> >>>>>> Any way in the PETSc configure to warn that MPICH version is >>>>>> "bad" or "untrustworthy" or even the vague "we have suspicians about this >>>>>> version and recommend avoiding it"? A lot of time could be saved if others >>>>>> don't deal with the same problem. >>>>>> >>>>>> Maybe add arrays of suspect_versions for OpenMPI, MPICH, etc and >>>>>> always check against that list and print a boxed warning at configure time? >>>>>> Better you could somehow generalize it and put it in package.py for use by >>>>>> all packages, then any package can included lists of "suspect" versions. >>>>>> (There are definitely HDF5 versions that should be avoided :-)). >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> On Aug 13, 2020, at 12:14 PM, Junchao Zhang >>>>>> wrote: >>>>>> >>>>>> Thanks for the update. Let's assume it is a bug in MPI :) >>>>>> --Junchao Zhang >>>>>> >>>>>> >>>>>> On Thu, Aug 13, 2020 at 11:15 AM Chris Hewson >>>>>> wrote: >>>>>> >>>>>>> Just as an update to this, I can confirm that using the mpich >>>>>>> version (3.3.2) downloaded with the petsc download solved this issue on my >>>>>>> end. >>>>>>> >>>>>>> *Chris Hewson* >>>>>>> Senior Reservoir Simulation Engineer >>>>>>> ResFrac >>>>>>> +1.587.575.9792 >>>>>>> >>>>>>> >>>>>>> On Thu, Jul 23, 2020 at 5:58 PM Junchao Zhang < >>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>> >>>>>>>> On Mon, Jul 20, 2020 at 7:05 AM Barry Smith >>>>>>>> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> Is there a comprehensive MPI test suite (perhaps from MPICH)? >>>>>>>>> Is there any way to run this full test suite under the problematic MPI and >>>>>>>>> see if it detects any problems? >>>>>>>>> >>>>>>>>> Is so, could someone add it to the FAQ in the debugging >>>>>>>>> section? >>>>>>>>> >>>>>>>> MPICH does have a test suite. It is at the subdir test/mpi of >>>>>>>> downloaded mpich >>>>>>>> . >>>>>>>> It annoyed me since it is not user-friendly. It might be helpful in >>>>>>>> catching bugs at very small scale. But say if I want to test allreduce on >>>>>>>> 1024 ranks on 100 doubles, I have to hack the test suite. >>>>>>>> Anyway, the instructions are here. >>>>>>>> >>>>>>>> For the purpose of petsc, under test/mpi one can configure it with >>>>>>>> $./configure CC=mpicc CXX=mpicxx FC=mpifort --enable-strictmpi >>>>>>>> --enable-threads=funneled --enable-fortran=f77,f90 --enable-fast >>>>>>>> --disable-spawn --disable-cxx --disable-ft-tests // It is weird I disabled >>>>>>>> cxx but I had to set CXX! >>>>>>>> $make -k -j8 // -k is to keep going and ignore compilation errors, >>>>>>>> e.g., when building tests for MPICH extensions not in MPI standard, but >>>>>>>> your MPI is OpenMPI. >>>>>>>> $ // edit testlist, remove lines mpi_t, rma, f77, impls. Those are >>>>>>>> sub-dirs containing tests for MPI routines Petsc does not rely on. 
>>>>>>>> $ make testings or directly './runtests -tests=testlist' >>>>>>>> >>>>>>>> On a batch system, >>>>>>>> $export MPITEST_BATCHDIR=`pwd`/btest // specify a batch dir, >>>>>>>> say btest, >>>>>>>> $./runtests -batch -mpiexec=mpirun -np=1024 -tests=testlist // >>>>>>>> Use 1024 ranks if a test does no specify the number of processes. >>>>>>>> $ // It copies test binaries to the batch dir and generates a >>>>>>>> script runtests.batch there. Edit the script to fit your batch system and >>>>>>>> then submit a job and wait for its finish. >>>>>>>> $ cd btest && ../checktests --ignorebogus >>>>>>>> >>>>>>>> >>>>>>>> PS: Fande, changing an MPI fixed your problem does not >>>>>>>> necessarily mean the old MPI has bugs. It is complicated. It could be a >>>>>>>> petsc bug. You need to provide us a code to reproduce your error. It does >>>>>>>> not matter if the code is big. >>>>>>>> >>>>>>>> >>>>>>>>> Thanks >>>>>>>>> >>>>>>>>> Barry >>>>>>>>> >>>>>>>>> >>>>>>>>> On Jul 20, 2020, at 12:16 AM, Fande Kong >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Trace could look like this: >>>>>>>>> >>>>>>>>> [640]PETSC ERROR: --------------------- Error Message >>>>>>>>> -------------------------------------------------------------- >>>>>>>>> [640]PETSC ERROR: Argument out of range >>>>>>>>> [640]PETSC ERROR: key 45226154 is greater than largest key allowed >>>>>>>>> 740521 >>>>>>>>> [640]PETSC ERROR: See >>>>>>>>> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>>>>>>> shooting. >>>>>>>>> [640]PETSC ERROR: Petsc Release Version 3.13.3, unknown >>>>>>>>> [640]PETSC ERROR: ../../griffin-opt on a arch-moose named r6i5n18 >>>>>>>>> by wangy2 Sun Jul 19 17:14:28 2020 >>>>>>>>> [640]PETSC ERROR: Configure options --download-hypre=1 >>>>>>>>> --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1 >>>>>>>>> --download-metis=1 --download-ptscotch=1 --download-parmetis=1 >>>>>>>>> --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 >>>>>>>>> --download-slepc=1 --with-mpi=1 --with-cxx-dialect=C++11 >>>>>>>>> --with-fortran-bindings=0 --with-sowing=0 --with-64-bit-indices >>>>>>>>> --download-mumps=0 >>>>>>>>> [640]PETSC ERROR: #1 PetscTableFind() line 132 in >>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/include/petscctable.h >>>>>>>>> [640]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() line 33 in >>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mmaij.c >>>>>>>>> [640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in >>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>> [640]PETSC ERROR: #4 MatAssemblyEnd() line 5347 in >>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>>>>> [640]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIXAIJ_allatonce() >>>>>>>>> line 901 in >>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiptap.c >>>>>>>>> [640]PETSC ERROR: #6 MatPtAPNumeric_MPIAIJ_MPIMAIJ_allatonce() >>>>>>>>> line 3180 in >>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/maij/maij.c >>>>>>>>> [640]PETSC ERROR: #7 MatProductNumeric_PtAP() line 704 in >>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>>>>>>> [640]PETSC ERROR: #8 MatProductNumeric() line 759 in >>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>>>>>>> [640]PETSC ERROR: #9 MatPtAP() line 9199 in >>>>>>>>> 
/home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>>>>> [640]PETSC ERROR: #10 MatGalerkin() line 10236 in >>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>>>>> [640]PETSC ERROR: #11 PCSetUp_MG() line 745 in >>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/mg/mg.c >>>>>>>>> [640]PETSC ERROR: #12 PCSetUp_HMG() line 220 in >>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/hmg/hmg.c >>>>>>>>> [640]PETSC ERROR: #13 PCSetUp() line 898 in >>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/interface/precon.c >>>>>>>>> [640]PETSC ERROR: #14 KSPSetUp() line 376 in >>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>>> [640]PETSC ERROR: #15 KSPSolve_Private() line 633 in >>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>>> [640]PETSC ERROR: #16 KSPSolve() line 853 in >>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>>> [640]PETSC ERROR: #17 SNESSolve_NEWTONLS() line 225 in >>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/impls/ls/ls.c >>>>>>>>> [640]PETSC ERROR: #18 SNESSolve() line 4519 in >>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/interface/snes.c >>>>>>>>> >>>>>>>>> On Sun, Jul 19, 2020 at 11:13 PM Fande Kong >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> I am not entirely sure what is happening, but we encountered >>>>>>>>>> similar issues recently. It was not reproducible. It might occur at >>>>>>>>>> different stages, and errors could be weird other than "ctable stuff." Our >>>>>>>>>> code was Valgrind clean since every PR in moose needs to go through >>>>>>>>>> rigorous Valgrind checks before it reaches the devel branch. The errors >>>>>>>>>> happened when we used mvapich. >>>>>>>>>> >>>>>>>>>> We changed to use HPE-MPT (a vendor stalled MPI), then everything >>>>>>>>>> was smooth. May you try a different MPI? It is better to try a system >>>>>>>>>> carried one. >>>>>>>>>> >>>>>>>>>> We did not get the bottom of this problem yet, but we at least >>>>>>>>>> know this is kind of MPI-related. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Fande, >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Sun, Jul 19, 2020 at 3:28 PM Chris Hewson >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> I am having a bug that is occurring in PETSC with the return >>>>>>>>>>> string: >>>>>>>>>>> >>>>>>>>>>> [7]PETSC ERROR: PetscTableFind() line 132 in >>>>>>>>>>> /home/chewson/petsc-3.13.2/include/petscctable.h key 7556 is greater than >>>>>>>>>>> largest key allowed 5693 >>>>>>>>>>> >>>>>>>>>>> This is using petsc-3.13.2, compiled and running using mpich >>>>>>>>>>> with -O3 and debugging turned off tuned to the haswell architecture and >>>>>>>>>>> occurring either before or during a KSPBCGS solve/setup or during a MUMPS >>>>>>>>>>> factorization solve (I haven't been able to replicate this issue with the >>>>>>>>>>> same set of instructions etc.). >>>>>>>>>>> >>>>>>>>>>> This is a terrible way to ask a question, I know, and not very >>>>>>>>>>> helpful from your side, but this is what I have from a user's run and can't >>>>>>>>>>> reproduce on my end (either with the optimization compilation or with >>>>>>>>>>> debugging turned on). This happens when the code has run for quite some >>>>>>>>>>> time and is happening somewhat rarely. 
>>>>>>>>>>> >>>>>>>>>>> More than likely I am using a static variable (code is written >>>>>>>>>>> in c++) that I'm not updating when the matrix size is changing or something >>>>>>>>>>> silly like that, but any help or guidance on this would be appreciated. >>>>>>>>>>> >>>>>>>>>>> *Chris Hewson* >>>>>>>>>>> Senior Reservoir Simulation Engineer >>>>>>>>>>> ResFrac >>>>>>>>>>> +1.587.575.9792 >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>> >>>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Sep 24 15:44:35 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 24 Sep 2020 15:44:35 -0500 Subject: [petsc-users] Matrix Free Method questions In-Reply-To: References: <5BDE8465-76BE-4132-BF4E-6784548AADC0@petsc.dev> <3329269A-EB37-41C9-9698-BA4631A1E18A@petsc.dev> <3E68F0AF-2F7D-4394-894A-3099EC80B9BC@petsc.dev> <600E6AA4-9534-4B39-B7E0-0218AB02E19A@petsc.dev> <60260FA5-BDAE-4F18-8310-D0F3C03B318D@petsc.dev> <4BA6D58A-89C3-44E2-AF34-F4AE94211DC4@petsc.dev> <6DEA0A2F-3020-4C3D-8726-7BE6346B86BB@petsc.dev> <56056045-E253-44BE-AE4C-7EFE44D867ED@petsc.dev> Message-ID: <5731FF4D-68B8-490D-B2B4-ED689290D55C@petsc.dev> > On Sep 24, 2020, at 3:08 PM, Matthew Knepley wrote: > > On Thu, Sep 24, 2020 at 4:03 PM Blondel, Sophie via petsc-users > wrote: > Hi Barry, > > I probably should have sent this output before (with -log_view_memory to get an idea of where the vectors are created). I looked at it but it doesn't help me much... > > Just quickie, there is 82M in 85 vectors, but your system has 1.5M unknowns, so a single vector is about 12M. Thus, there are probably 5 or 6 systems vectors, and a bunch of small ones. Matt is right, maybe the vectors are as expected. But there is something totally off about the memory used for the matrix. type: mpiaij rows=1552000, cols=1552000, bs=7760 total: nonzeros=1558766, allocated nonzeros=1558766 12*1558766 = 18,705,192 but Matrix 15 15 8,744,032 0. somehow it is logging much less memory for the matrix than it must contain. A quick look at the code seems ok. PetscErrorCode MatSeqAIJSetPreallocation_SeqAIJ(Mat B,PetscInt nz,const PetscInt *nnz) PetscLogObjectMemory((PetscObject)B,(B->rmap->n+1)*sizeof(PetscInt)+nz*(sizeof(PetscScalar)+sizeof(PetscInt)));CHKERRQ(ierr); Well, probably not important. So the memory usage is probably fine. Barry > > Thanks, > > Matt > > Cheers, > > Sophie > From: Barry Smith > > Sent: Wednesday, September 16, 2020 16:38 > To: Blondel, Sophie > > Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > > Subject: Re: [petsc-users] Matrix Free Method questions > > > Yikes, GAMG is using a lot of vectors. But many of these are much smaller vectors so not of major concern. > > I think this will just have to be an ongoing issue to see where the vectors are created internally and reuse or eliminate as many extra as possible. > > The option -log_view_memory causes the PETSc logging summary to print additional columns showing the memory allocated during the different events in PETSc. 
This can be useful to see "when" the memory is mostly created; it does not tell us "why" it is created but at least tells us were to look. > > Barry > > >> On Sep 16, 2020, at 1:54 PM, Blondel, Sophie > wrote: >> >> Hi Barry, >> >> I don't think we're explicitly creating many PETSc vectors in Xolotl. There is a global one created for the solution when the TS is set up, and local ones in RHSFunction and RHSJacobian; everywhere else we just get the array from it with DMDAVecGetArrayDOF and DMDAVecRestoreArrayDOF. I tried a few things to see if it changed the number of Vec from 85 (removing monitors, fewer time steps, fewer MPI tasks) but it stayed the same, except when I changed the PC option from "-fieldsplit_1_pc_type redundant" to "-fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type gmres -ksp_type fgmres -fieldsplit_1_pc_gamg_threshold -1" where I got 10567 vectors. >> >> Cheers, >> >> Sophie >> From: Barry Smith > >> Sent: Tuesday, September 15, 2020 18:37 >> To: Blondel, Sophie > >> Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >> Subject: Re: [petsc-users] Matrix Free Method questions >> >> >> Sophie, >> >> Great, everything looks good. >> >> So the new version takes about 7 times longer, due to the relatively modest increase (about 25 percent) in the number of iterations from the poorer preconditioner convergence and the rest from the much slower matrix-vector product due to using matrix free instead of matrix based precondtioner. Both of these are expected. >> >> The matrix is taking about 10% of the memory it used to require, also expected. >> >> I noticed in the logging the memory for the vectors >> >> Vector 85 85 82303208 0. >> Matrix 15 15 8744032 0. >> >> is substantial/huge, with the much smaller matrix memory the vector memory dominates. >> >> It indicates 85 vectors are used. This is a large number, there are some needed for the TS (maybe 5?) and some needed for the KSP solve (maybe about 37) but I am not sure why there are so many. Perhaps this number could be reduced. Are there are lot of vectors created in the Xolotyl code? I would it could run with about 45 vectors. >> >> Barry >> >> >> >> >>> On Sep 15, 2020, at 5:12 PM, Blondel, Sophie > wrote: >>> >>> Hi Barry, >>> >>> I fixed everything and re-ran the 4 cases in 1D. They took more time than before because I used the Kokkos serial backend on the Xolotl side instead of the CUDA one previously (long story short, I tried to update CUDA and messed up the whole installation). Step 4 looks much better than prevously, I was even able to remove MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE) from the code and it ran without throwing errors. The log files are attached. >>> >>> Cheers, >>> >>> Sophie >>> From: Barry Smith > >>> Sent: Friday, September 11, 2020 18:03 >>> To: Blondel, Sophie > >>> Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>> Subject: Re: [petsc-users] Matrix Free Method questions >>> >>> >>> >>>> On Sep 11, 2020, at 7:45 AM, Blondel, Sophie > wrote: >>>> >>>> Thank you Barry, >>>> >>>> Step 3 worked after I moved MatSetOption at the beginning of computeJacobian(). Attached is the updated log which is pretty similar to what I had before. Step 4 still uses many more iterations. 
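For readers following the step numbering in this thread, the pattern being discussed looks roughly like the sketch below. This is not Xolotl's actual computeJacobian(); the function name, the zero placeholder value, and the 1D loop are illustrative. In step 3 the body of the routine stays unchanged and only the MatSetOption() call at the top matters; in step 4 the routine additionally computes nothing beyond the diagonal and the diffusion/advection couplings, which is what the loop stands in for.

    PetscErrorCode computeJacobianReduced(TS ts,PetscReal t,Vec C,Mat Amat,Mat Pmat,void *ctx)
    {
      PetscErrorCode ierr;
      DM             da;
      DMDALocalInfo  info;

      PetscFunctionBeginUser;
      /* entries outside the reduced pattern preallocated through DMDASetBlockFillsSparse()
         are silently dropped instead of triggering a "new nonzero" error */
      ierr = MatSetOption(Pmat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);CHKERRQ(ierr);
      ierr = TSGetDM(ts,&da);CHKERRQ(ierr);
      ierr = DMDAGetLocalInfo(da,&info);CHKERRQ(ierr);
      for (PetscInt xi = info.xs; xi < info.xs + info.xm; xi++) {
        for (PetscInt c = 0; c < info.dof; c++) {
          MatStencil  row = {0};
          PetscScalar v   = 0.0; /* placeholder for the diagonal df_c/du_c at this grid point */

          row.i = xi; row.c = c;
          ierr = MatSetValuesStencil(Pmat,1,&row,1,&row,&v,ADD_VALUES);CHKERRQ(ierr);
          /* the diffusion/advection coupling of the small mobile clusters to the
             xi-1 and xi+1 grid points would be added here the same way */
        }
      }
      ierr = MatAssemblyBegin(Pmat,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(Pmat,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }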
>>>> >>>> I checked the Jacobian using -ksp_view_pmat ascii (on a simpler case), I can see the difference between step 3 and 4 is that the contribution from the reactions is not included in the step 4 Jacobian (as expected from the fact that I removed their setting from the code). >>>> >>>> Looking back at one of your previous email, you wrote "This routine should only compute the elements of the Jacobian needed for this reduced matrix Jacobian, so the diagonals and the diffusion/convection terms. ", does it mean that I should still include the contributions from the reactions that affect the pure diagonal terms? >>> >>> Yes, you need to leave in everything that affects the diagonal otherwise the "Jacobi" preconditioner will not reflect the true Jacobi preconditioner and likely perform poorly. >>> >>> Barry >>> >>>> >>>> Cheers, >>>> >>>> Sophie >>>> >>>> From: Barry Smith > >>>> Sent: Thursday, September 10, 2020 17:04 >>>> To: Blondel, Sophie > >>>> Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>> Subject: Re: [petsc-users] Matrix Free Method questions >>>> >>>> >>>> >>>>> On Sep 10, 2020, at 2:46 PM, Blondel, Sophie > wrote: >>>>> >>>>> Hi Barry, >>>>> >>>>> Going through the different changes again to understand what was going wrong with the last step, I discovered that my changes from 2 to 3 (keeping only the pure diagonal for the reaction Jacobian setup and adding MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) were wrong: the sparsity of the matrix was correct but then the RHSJacobian method was wrong. I updated it >>>> >>>> I'm not sure what you mean here. My hope was that in step 3 you won't need to change RHSJacobian at all (that is just for step 4). >>>> >>>>> but now when I run step 3 again I get the following error: >>>>> >>>>> [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>>>> [2]PETSC ERROR: Argument out of range >>>>> [2]PETSC ERROR: Inserting a new nonzero at global row/column (310400, 316825) into matrix >>>>> [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >>>>> [2]PETSC ERROR: Petsc Development GIT revision: v3.13.4-885-gf58a62b032 GIT Date: 2020-09-01 13:07:58 -0500 >>>>> [2]PETSC ERROR: Unknown Name on a 20200902 named iguazu by bqo Thu Sep 10 15:38:58 2020 >>>>> [2]PETSC ERROR: Configure options PETSC_DIR=/home2/bqo/libraries/petsc-barry PETSC_ARCH=20200902 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no --with-shared-libraries --download-fblaslapack=1 >>>>> [2]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 606 in /home2/bqo/libraries/petsc-barry/src/mat/impls/aij/mpi/mpiaij.c >>>>> [2]PETSC ERROR: #2 MatSetValues() line 1392 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c >>>>> [2]PETSC ERROR: #3 MatSetValuesLocal() line 2207 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c >>>>> [2]PETSC ERROR: #4 MatSetValuesStencil() line 1595 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c >>>>> PetscSolverExpHandler::computeJacobian: MatSetValuesStencil (reactions) failed. >>>>> >>>>> Because the RHSJacobian method is trying to update the elements corresponding to the reactions. I'm not sure I understood correctly what step 3 was supposed to be. 
>>>> >>>> In step the three the RHSJacobian was suppose to be unchanged, only the option to ignore the "unneeded" Jacobian entries inside MatSetValues (set with MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) was needed (plus changing the DMDASetBlockFillsXXX argument). >>>> >>>> The error message Inserting a new nonzero at global row/column (310400, 316825) into matrix indicates that somehow the MatOption MAT_NEW_NONZERO_LOCATION_ERR is in control instead of the option MAT_NEW_NONZERO_LOCATIONS, when it is setting values the Jacobian values. >>>> >>>> The MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS_ERR,PETSC_TRUE);) is normally called inside the DMCreateMatrix() so I am not sure how they could be getting called in the wrong order but it seems somehow it is >>>> >>>> When do you call MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) in the code? You can call it at the beginning of computeJacobian(). >>>> >>>> If this still doesn't work and you get the same error you can run in the debugger on one process and put a breakpoint for MatSetOptions() to found out how the MAT_NEW_NONZERO_LOCATIONS_ERR comes in late to upset the apple cart. You should see MatSetOption() called at least twice and the last one should have the MAT_NEW_NONZERO_LOCATION flag. >>>> >>>> Barry >>>> >>>> >>>> >>>> >>>>> >>>>> Cheers, >>>>> >>>>> Sophie >>>>> >>>>> >>>>> From: Barry Smith > >>>>> Sent: Friday, September 4, 2020 01:06 >>>>> To: Blondel, Sophie > >>>>> Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>>> Subject: Re: [petsc-users] Matrix Free Method questions >>>>> >>>>> >>>>> Sophie, >>>>> >>>>> Thanks. I have started looking through the logs >>>>> >>>>> The change to matrix-free multiple (from 1 to 2) which reduces the accuracy of the multiply to about half the digits is not surprising. >>>>> >>>>> * It roughly doubles the time since doing the matrix-free product requires a function evaluation >>>>> >>>>> * It increases the iteration count, but not significantly since the reduced precision of the multiple induces some additional linear iterations >>>>> >>>>> The change from 2 to 3 (not storing the entire matrix) >>>>> >>>>> * number of nonzeros goes from 49459966 to 1558766 = 3.15 percent so it succeds in not storing the unneeded part of the matrix >>>>> >>>>> * the number of MatMult_MF goes from 2331 to 2418. I don't understand this, I expected it to be identical because it should be using the same preconditioner in 3 as in 2 and thus get the same convergence. Could possibility be due to the variability in convergence due to different runs with the matrix-free preconditioner preconditioner and not related to not storing the entire matrix. >>>>> >>>>> * the KSPSolve() time goes from 3.8774e+0 to 3.7855e+02 a trivial difference which is what I would expect >>>>> >>>>> * the SNESSolve time goes from 5.0047e+02 to 4.3275e+02 about a 14 percent drop which is reasonable because 3 doesn't spend as much time inserting matrix values (it still computes them but doesn't insert the ones we don't want for the preconditioner). >>>>> >>>>> The change from 3 to 4 >>>>> >>>>> * something goes seriously wrong here. The total number of linear solve iterations goes from 2282 to 97403 so something has gone seriously wrong with the preconditioner, but since the preconditioner operations are the same it seems something has gone wrong with the new reduced preconditioner. 
>>>>> >>>>> I think there is an error in computing the reduced matrix entries, that is the new compute Jacobian code is not computing the entries it needs to correctly. >>>>> >>>>> To debug this you can run case 3 and case 4 for a single time step with -ksp_view_pmat binary This should create a binary file with the initial Jacobian matrices in each. You can use Matlab or Python to do the difference in the matrices and see how possibly the new Jacobian computation code is not producing the correct values in some locations. >>>>> >>>>> Good luck, >>>>> >>>>> Barry >>>>> >>>>> >>>>> >>>>> >>>>>> On Sep 3, 2020, at 12:26 PM, Blondel, Sophie > wrote: >>>>>> >>>>>> Hi Barry, >>>>>> >>>>>> Attached are the log files for the 1D case, for each of the 4 steps. I don't know how I did it yesterday but the differences between steps look better today, except for step 4 that takes many more iterations and smaller time steps. >>>>>> >>>>>> Cheers, >>>>>> >>>>>> Sophie >>>>>> >>>>>> De : Barry Smith > >>>>>> Envoy? : mercredi 2 septembre 2020 15:53 >>>>>> ? : Blondel, Sophie > >>>>>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>>>> Objet : Re: [petsc-users] Matrix Free Method questions >>>>>> >>>>>> >>>>>> >>>>>>> On Sep 2, 2020, at 1:44 PM, Blondel, Sophie > wrote: >>>>>>> >>>>>>> Thank you Barry, >>>>>>> >>>>>>> The code ran with your branch but it's much slower than running with the full Jacobian and Jacobi PC subtype (around 10 times slower). It is using less memory as expected. I tried step 2 as well and it's even slower. >>>>>> >>>>>> Sophie, >>>>>> >>>>>> That is puzzling. It should be using the same matrix in the solver so should be the same speed and the setup time should be a bit better since it does not form the full Jacobian. (We'll get to this later) >>>>>> >>>>>>> The TS iteration for step 1 are the same as with full Jacobian. Let me know what I can look at to check if I've done something wrong. >>>>>> >>>>>> We need to see if the KSP iterations are pretty similar for four approaches (1) original code with Jacobi PC subtype (2) matrix free with Jacobi PC (just add -snes_mf_operator to case 1) (3) the new code with the MatSetOption() to not store the entire Jacobian also with the -snes_mf_operator and (4) the new code that doesn't compute the unneeded part of the Jacobian also with the -snes_mf_operator >>>>>> >>>>>> You could run each case with same 20 timesteps and -ts_monitor -ksp_monitor and -ts_view and send the four output files around. >>>>>> >>>>>> Once we are sure the four cases are behaving as expected then you can get timings for them but let's not do that until we confirm the similar results. There could easily be a flaw in my reasoning or the PETSc code somewhere that affects the correctness so its best to check that first. >>>>>> >>>>>> >>>>>> Barry >>>>>> >>>>>>> >>>>>>> Cheers, >>>>>>> >>>>>>> Sophie >>>>>>> De : Barry Smith > >>>>>>> Envoy? : mardi 1 septembre 2020 14:12 >>>>>>> ? : Blondel, Sophie > >>>>>>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>>>>> Objet : Re: [petsc-users] Matrix Free Method questions >>>>>>> >>>>>>> >>>>>>> Sophie, >>>>>>> >>>>>>> Sorry, looks like an old bug in PETSc that was undetected due to lack of use. The code is trying to use the first of the two matrices to determine the preconditioner which won't work in your case since it is matrix-free. It should be using the second matrix. 
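Barry's suggestion above (dump the case 3 and case 4 Jacobians for a single time step with -ksp_view_pmat binary and take the difference) can be done in Matlab or Python as he says, or with a few lines of PETSc itself. A minimal sketch follows; the file names are placeholders for wherever the two binary dumps end up, and calling MatView() on the difference instead of taking a norm would show which entries disagree rather than just how large the disagreement is.

    #include <petscmat.h>

    int main(int argc,char **argv)
    {
      Mat            A,B;
      PetscViewer    v;
      PetscReal      nrm;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
      /* load the Jacobian saved from the case 3 run */
      ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,"case3.bin",FILE_MODE_READ,&v);CHKERRQ(ierr);
      ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
      ierr = MatLoad(A,v);CHKERRQ(ierr);
      ierr = PetscViewerDestroy(&v);CHKERRQ(ierr);
      /* load the Jacobian saved from the case 4 run */
      ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,"case4.bin",FILE_MODE_READ,&v);CHKERRQ(ierr);
      ierr = MatCreate(PETSC_COMM_WORLD,&B);CHKERRQ(ierr);
      ierr = MatLoad(B,v);CHKERRQ(ierr);
      ierr = PetscViewerDestroy(&v);CHKERRQ(ierr);
      /* B <- B - A, then report how big the difference is */
      ierr = MatAXPY(B,-1.0,A,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr);
      ierr = MatNorm(B,NORM_FROBENIUS,&nrm);CHKERRQ(ierr);
      ierr = PetscPrintf(PETSC_COMM_WORLD,"||J_case4 - J_case3||_F = %g\n",(double)nrm);CHKERRQ(ierr);
      ierr = MatDestroy(&A);CHKERRQ(ierr);
      ierr = MatDestroy(&B);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }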
>>>>>>> >>>>>>> I hope the branch barry/2020-09-01/fix-fieldsplit-mf resolves this issue for you. >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> >>>>>>>> On Sep 1, 2020, at 12:45 PM, Blondel, Sophie > wrote: >>>>>>>> >>>>>>>> Hi Barry, >>>>>>>> >>>>>>>> I'm working through step 1) but I think I am doing something wrong. I'm using DMDASetBlockFillsSparse to set the non-zeros only for the diffusing clusters (small He clusters here, from size 1 to 7) and all the diagonal entries. Then I added a few lines in the code: >>>>>>>> Mat mat; >>>>>>>> DMCreateMatrix(da, &mat); >>>>>>>> MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); >>>>>>>> >>>>>>>> When I try to run with the following options: -snes_mf_operator -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 2 -snes_force_iteration -pc_fieldsplit_detect_coupling -pc_type fieldsplit -fieldsplit_0_pc_type jacobi -fieldsplit_1_pc_type redundant -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -ts_max_snes_failures -1 -ts_monitor -ts_max_steps 20 >>>>>>>> >>>>>>>> I get an error: >>>>>>>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>>>>>>> [0]PETSC ERROR: No support for this operation for this object type >>>>>>>> [0]PETSC ERROR: Matrix type mffd does not have a find off block diagonal entries defined >>>>>>>> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >>>>>>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.13.4-851-gde18fec8da GIT Date: 2020-08-28 16:47:50 +0000 >>>>>>>> [0]PETSC ERROR: Unknown Name on a 20200828 named sophie-Precision-5530 by sophie Tue Sep 1 10:58:44 2020 >>>>>>>> [0]PETSC ERROR: Configure options PETSC_DIR=/home/sophie/Code/petsc PETSC_ARCH=20200828 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no --with-shared-libraries >>>>>>>> [0]PETSC ERROR: #1 MatFindOffBlockDiagonalEntries() line 9847 in /home/sophie/Code/petsc/src/mat/interface/matrix.c >>>>>>>> [0]PETSC ERROR: #2 PCFieldSplitSetDefaults() line 504 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c >>>>>>>> [0]PETSC ERROR: #3 PCSetUp_FieldSplit() line 606 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c >>>>>>>> [0]PETSC ERROR: #4 PCSetUp() line 1009 in /home/sophie/Code/petsc/src/ksp/pc/interface/precon.c >>>>>>>> [0]PETSC ERROR: #5 KSPSetUp() line 406 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>> [0]PETSC ERROR: #6 KSPSolve_Private() line 658 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>> [0]PETSC ERROR: #7 KSPSolve() line 889 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>> [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in /home/sophie/Code/petsc/src/snes/impls/ls/ls.c >>>>>>>> [0]PETSC ERROR: #9 SNESSolve() line 4524 in /home/sophie/Code/petsc/src/snes/interface/snes.c >>>>>>>> [0]PETSC ERROR: #10 TSStep_ARKIMEX() line 811 in /home/sophie/Code/petsc/src/ts/impls/arkimex/arkimex.c >>>>>>>> [0]PETSC ERROR: #11 TSStep() line 3731 in /home/sophie/Code/petsc/src/ts/interface/ts.c >>>>>>>> [0]PETSC ERROR: #12 TSSolve() line 4128 in /home/sophie/Code/petsc/src/ts/interface/ts.c >>>>>>>> PetscSolver::solve: TSSolve failed. >>>>>>>> >>>>>>>> Cheers, >>>>>>>> >>>>>>>> Sophie >>>>>>>> De : Barry Smith > >>>>>>>> Envoy? : lundi 31 ao?t 2020 14:50 >>>>>>>> ? 
: Blondel, Sophie > >>>>>>>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>>>>>> Objet : Re: [petsc-users] Matrix Free Method questions >>>>>>>> >>>>>>>> >>>>>>>> Sophie, >>>>>>>> >>>>>>>> Thanks. >>>>>>>> >>>>>>>> The factor of 4 is lot, the 1.5 not so bad. >>>>>>>> >>>>>>>> You will definitely want to retain the full matrix assembly codes for speed and to verify a reduced matrix version. >>>>>>>> >>>>>>>> It is worth trying a "reduced matrix version" with matrix-free multiply based on these numbers. This reduced matrix Jacobian will only have the diagonals and all the terms connected to the cluster sizes that move. In other words you will be building just the part of the Jacobian needed for the new preconditioner (PC subtype for Jacobi) and doing the matrix-vector product matrix free. (SOR requires all the Jacobian entries). >>>>>>>> >>>>>>>> Fortunately this is hopefully pretty straightforward for this code. You will not have to change the structure of the main code at all. >>>>>>>> >>>>>>>> Step 1) create a new "sparse matrix" that will be passed to DMDASetBlockFillsSparse(). This new "sparse matrix" needs to retain all the diagonal entries and also all the entries that are associated with the variables that diffuse. If I remember correctly these are just the smallest cluster size, plain Helium? >>>>>>>> >>>>>>>> Call MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); >>>>>>>> >>>>>>>> Then you would run the code with -snes_mf_operator and the new PC subtype for Jacobi. >>>>>>>> >>>>>>>> A test that the new reduced Jacobian is correct will be that you get almost the same iterations as the runs you just make using the PC subtype of Jacobi. Hopefully not slower and using a great deal less memory. The iterations will not be identical because of the matrix-free multiple. >>>>>>>> >>>>>>>> Step 2) create a new version of the Jacobian computation routine. This routine should only compute the elements of the Jacobian needed for this reduced matrix Jacobian, so the diagonals and the diffusion/convection terms. >>>>>>>> >>>>>>>> Again run with with -snes_mf_operator and the new PC subtype for Jacobi and you should again get the same convergence history. >>>>>>>> >>>>>>>> I made two steps because it makes it easier to validate and debug to get the same results as before. The first step cheats in that it still computes the full Jacobian but ignores the entries that we don't need to store for the preconditioner. The second step is more efficient because it only computes the Jacobian entries needed for the preconditioner but it requires you going through the Jacobian code and making sure only the needed parts are computed. >>>>>>>> >>>>>>>> >>>>>>>> If you have any questions please let me know. >>>>>>>> >>>>>>>> Barry >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> On Aug 31, 2020, at 1:13 PM, Blondel, Sophie > wrote: >>>>>>>>> >>>>>>>>> Hi Barry, >>>>>>>>> >>>>>>>>> I ran the 2 cases to look at the effect of the Jacobi pre-conditionner: >>>>>>>>> 1D with 200 grid points and 7759 DOF per grid point (for the PSI application), for 20 TS: the factor between SOR and Jacobi is ~4 (976 MatMult for SOR and 4162 MatMult for Jacobi) >>>>>>>>> 2D with 63x63 grid points and 4124 DOF per grid point (for the NE application), for 20 TS: the factor is 1.5 (6657 for SOR, 10379 for Jacobi) >>>>>>>>> Cheers, >>>>>>>>> >>>>>>>>> Sophie >>>>>>>>> De : Barry Smith > >>>>>>>>> Envoy? : vendredi 28 ao?t 2020 18:31 >>>>>>>>> ? 
: Blondel, Sophie > >>>>>>>>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>>>>>>> Objet : Re: [petsc-users] Matrix Free Method questions >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Aug 28, 2020, at 4:11 PM, Blondel, Sophie > wrote: >>>>>>>>>> >>>>>>>>>> Thank you Jed and Barry, >>>>>>>>>> >>>>>>>>>> First, attached are the logs from the benchmark runs I did without (log_std.txt) and with MF method (log_mf.txt). It took me some trouble to get the -log_view to work because I'm using push and pop for the options which means that PETSc is initialized with no argument so the command line argument was not taken into account, but I guess this is for a separate discussion. >>>>>>>>>> >>>>>>>>>> To answer questions about the current per-conditioners: >>>>>>>>>> I used the same pre-conditioner options as listed in my previous email when I added the -snes_mf option; I did try to remove all the PC related options at one point with the MF method but didn't see a change in runtime so I put them back in >>>>>>>>>> this benchmark is for a 1D DMDA using 20 grid points; when running in 2D or 3D I switch the PC options to: -pc_type fieldsplit -fieldsplit_0_pc_type sor -fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type gmres -ksp_type fgmres -fieldsplit_1_pc_gamg_threshold -1 >>>>>>>>>> I haven't tried a Jacobi PC instead of SOR, I will run a set of more realistic runs (1D and 2D) without MF but with Jacobi and report on it next week. When you say "iterations" do you mean what is given by -ksp_monitor? >>>>>>>>> >>>>>>>>> Yes, the number of MatMult is a good enough surrogate. >>>>>>>>> >>>>>>>>> So using matrix-free (which means no preconditioning) has >>>>>>>>> >>>>>>>>> 35846/160 >>>>>>>>> >>>>>>>>> ans = >>>>>>>>> >>>>>>>>> 224.0375 >>>>>>>>> >>>>>>>>> or 224 as many iterations. So even for this modest 1d problem preconditioning is doing a great deal. >>>>>>>>> >>>>>>>>> Barry >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Cheers, >>>>>>>>>> >>>>>>>>>> Sophie >>>>>>>>>> De : Barry Smith > >>>>>>>>>> Envoy? : vendredi 28 ao?t 2020 12:12 >>>>>>>>>> ? : Blondel, Sophie > >>>>>>>>>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>>>>>>>> Objet : Re: [petsc-users] Matrix Free Method questions >>>>>>>>>> >>>>>>>>>> [External Email] >>>>>>>>>> >>>>>>>>>> Sophie, >>>>>>>>>> >>>>>>>>>> This is exactly what i would expect. If you run with -ksp_monitor you will see the -snes_mf run takes many more iterations. >>>>>>>>>> >>>>>>>>>> I am puzzled that the argument -pc_type fieldsplit did not stop the run since this is under normal circumstances not a viable preconditioner with -snes_mf. Did you also remove the -pc_type fieldsplit argument? >>>>>>>>>> >>>>>>>>>> In order to see how one can avoid forming the entire matrix and use matrix-free to do the matrix-vector but still have an effective preconditioner let's look at what the current preconditioner options do. >>>>>>>>>> >>>>>>>>>>> -pc_fieldsplit_detect_coupling >>>>>>>>>> >>>>>>>>>> creates two sub-preconditioners, the first for all the variables and the second for those that are coupled by the matrix to variables in neighboring cells Since only the smallest cluster sizes have diffusion/advection this second set contains only the cluster size one variables. 
>>>>>>>>>> >>>>>>>>>>> -fieldsplit_0_pc_type sor >>>>>>>>>> >>>>>>>>>> Runs SOR on all the variables; you can think of this as running SOR on the reactions, it is a pretty good preconditioner for the reactions since the reactions are local, per cell. >>>>>>>>>> >>>>>>>>>>> -fieldsplit_1_pc_type redundant >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> This runs the default preconditioner (ILU) on just the variables that diffuse, i.e. the elliptic part. For smallish problems this is fine, for larger problems and 2d and 3d presumably you have also -redundant_pc_type gamg to use algebraic multigrid for the diffusion. This part of the matrix will always need to be formed and used in the preconditioner. It is very important since the diffusion is what brings in most of the ill-conditioning for larger problems into the linear system. Note that it only needs the matrix entries for the cluster size of 1 so it is very small compared to the entire sparse matrix. >>>>>>>>>> >>>>>>>>>> ---- >>>>>>>>>> The first preconditioner SOR requires ALL the matrix entries which are almost all (except for the diffusion terms) the coupling between different size clusters within a cell. Especially each cell has its own sparse matrix of the size of total number of clusters, it is sparse but not super sparse. >>>>>>>>>> >>>>>>>>>> So the to significantly lower memory usage we need to remove the SOR and the storing of all the matrix entries but still have an efficient preconditioner for the "reaction" terms. >>>>>>>>>> >>>>>>>>>> The simplest thing would be to use Jacobi instead of SOR for the first subpreconditioner since it only requires the diagonal entries in the matrix. But Jacobi is a worse preconditioner than SOR (since it totally ignores the matrix coupling) and sometimes can be much worse. >>>>>>>>>> >>>>>>>>>> Before anyone writes additional code we need to know if doing something along these lines does not ruin the convergence that. >>>>>>>>>> >>>>>>>>>> Have you used the same options as before but with -fieldsplit_0_pc_type jacobi ? (Not using any matrix free). We need to get an idea of how many more linear iterations it requires (not time, comparing time won't be helpful for this exercise.) We also need this information for realistic size problems in 2 or 3 dimensions that you really want to run; for small problems this approach will work ok and give misleading information about what happens for large problems. >>>>>>>>>> >>>>>>>>>> I suspect the iteration counts will shot up. Can you run some cases and see how the iteration counts change? >>>>>>>>>> >>>>>>>>>> Based on that we can decide if we still retain "good convergence" by changing the SOR to Jacobi and then change the code to make this change efficient (basically by skipping the explicit computation of the reaction Jacobian terms and using matrix-free on the outside of the PCFIELDSPLIT.) >>>>>>>>>> >>>>>>>>>> Barry >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On Aug 28, 2020, at 9:49 AM, Blondel, Sophie via petsc-users > wrote: >>>>>>>>>>> >>>>>>>>>>> Hi everyone, >>>>>>>>>>> >>>>>>>>>>> I have been using PETSc for a few years with a fully implicit TS ARKIMEX method and am now exploring the matrix free method option. 
Here is the list of PETSc options I typically use: -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 5 -snes_force_iteration -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -fieldsplit_0_pc_type sor -ts_max_snes_failures -1 -pc_fieldsplit_detect_coupling -ts_monitor -pc_type fieldsplit -fieldsplit_1_pc_type redundant -ts_max_steps 100 >>>>>>>>>>> >>>>>>>>>>> I started to compare the performance of the code without changing anything of the executable and simply adding "-snes_mf", I see a reduction of memory usage as expected and a benchmark that would usually take ~5min to run now takes ~50min. Reading the documentation I saw that there are a few option to play with the matrix free method like -snes_mf_err, -snes_mf_umin, or switching to -snes_mf_type wp. I used and modified the values of each of these options separately but never saw a sizable change in runtime, is it expected? >>>>>>>>>>> >>>>>>>>>>> And are there other ways to make the matrix free method faster? I saw in the documentation that you can define your own per-conditioner for instance. Let me know if you need additional information about the PETSc setup in the application I use. >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> >>>>>>>>>>> Sophie >>>>>>>>>> >>>>>>>>>> >>>>>> >>>>>> >>>> >>>> >>> >>> > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeremy at seamplex.com Thu Sep 24 17:09:42 2020 From: jeremy at seamplex.com (Jeremy Theler) Date: Thu, 24 Sep 2020 19:09:42 -0300 Subject: [petsc-users] Finding which cell an arbitrary point belongs to in DMPlex In-Reply-To: <929B80C1-9066-441C-B8E5-49413C36C640@erdw.ethz.ch> References: <70fd5e6fb338d6fc4b2471ec17d7dc7ddaaa9599.camel@seamplex.com> <929B80C1-9066-441C-B8E5-49413C36C640@erdw.ethz.ch> Message-ID: On Wed, 2020-09-16 at 14:29 +0000, Hapla Vaclav wrote: > There is also DMPlexFindVertices() which finds the nearest vertex to > the given coords in the given radius. At first I had understood that this function performed a nearest- neighbor search but after a closer look it sweeps the local DM and marks whether the sought points coincide with a mesh node within eps or not. Neat. > You can then get support or its transitive closure for that vertex. Not directly because in general the sought points will not coincide with a mesh node, but a combination of this function and DMLocatePoints() seems to do the trick. > I wrote it some time ago mainly for debug purposes. It uses just > brute force. I'm not sure it deserves to exist :-) Maybe we should > somehow merge these functionalities. It works, although a kd-tree-based search would be far more efficient than a full sweep over the DM. Thanks -- jeremy > > Thanks, > > Vaclav > > > On 16 Sep 2020, at 01:44, Matthew Knepley > > wrote: > > > > On Tue, Sep 15, 2020 at 6:18 PM Jeremy Theler > > wrote: > > > On Mon, 2020-09-14 at 20:28 -0400, Matthew Knepley wrote: > > > > On Mon, Sep 14, 2020 at 6:15 PM Jeremy Theler < > > > jeremy at seamplex.com> > > > > wrote: > > > > > Hello all > > > > > > > > > > Say I have a fully-interpolated 3D DMPlex and a point with > > > > > arbitrary > > > > > coordinates x,y,z. What's the most efficient way to know > > > which cell > > > > > this point belongs to in parallel? 
Cells can be either tets > > > or > > > > > hexes. > > > > > > > > I should make a tutorial on this, but have not had time so far. > > > > > > Thank you very much for this mini-tutorial. > > > > > > > > > > > The intention is that you use > > > > > > > > > > > > > > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DM/DMLocatePoints.html > > > > > > > > This will just brute force search unless you also give > > > > > > > > -dm_plex_hash_location > > > > > > Well, for a 3D DMplex PETSc (and git blame) tells me that you > > > "have > > > only coded this for 2D." :-) > > > > > > > Crap. I need to do 3D. It's not hard, just work. > > > > > > which builds a grid hash to accelerate it. I should probably > > > expose > > > > > > > > DMPlexLocatePoint_Internal() > > > > > > > > which handles the single cell queries. If you just had one > > > point, > > > > that might make it simpler, > > > > although you would still write your own loop. > > > > > > I see that DMLocatePoints() loops over all the cells until it > > > finds the > > > right one. I was thinking about finding first the nearest vertex > > > to the > > > point and then sweeping over all the cells that share this vertex > > > testing for DMPlexLocatePoint_Internal(). The nearest node ought > > > to be > > > found using an octree or similar. Any direction regarding this > > > idea? > > > > > > > So you can imagine both a topological search and a geometric > > search. Generally, people want geometric. > > The geometric hash we use is just to bin elements on a regular > > grid. > > > > > > If your intention is to interpolate a field at these > > > > locations, I created > > > > > > > > > > > > > > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/DMInterpolationCreate.html > > > > > > > > which no one but me uses so far, but I think it is convenient. > > > > > > Any other example apart from src/snes/tutorials/ex63.c? > > > > > > > That is the only one in PETSc. The PyLith code uses this to > > interpolate to seismic stations. > > > > Thanks, > > > > Matt > > > > > Thank you. > > > > > > > > > > > Thanks, > > > > > > > > Matt > > > > > > > > > Regards > > > > > -- > > > > > jeremy theler > > > > > www.seamplex.com > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to > > which their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ From bsmith at petsc.dev Thu Sep 24 17:23:54 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 24 Sep 2020 17:23:54 -0500 Subject: [petsc-users] Matrix Free Method questions In-Reply-To: References: <5BDE8465-76BE-4132-BF4E-6784548AADC0@petsc.dev> <3329269A-EB37-41C9-9698-BA4631A1E18A@petsc.dev> <3E68F0AF-2F7D-4394-894A-3099EC80B9BC@petsc.dev> <600E6AA4-9534-4B39-B7E0-0218AB02E19A@petsc.dev> <60260FA5-BDAE-4F18-8310-D0F3C03B318D@petsc.dev> <4BA6D58A-89C3-44E2-AF34-F4AE94211DC4@petsc.dev> <6DEA0A2F-3020-4C3D-8726-7BE6346B86BB@petsc.dev> <56056045-E253-44BE-AE4C-7EFE44D867ED@petsc.dev> Message-ID: <7F9574E6-DB15-4BA7-B933-0DABA516507A@petsc.dev> > On Sep 24, 2020, at 3:08 PM, Matthew Knepley wrote: > > On Thu, Sep 24, 2020 at 4:03 PM Blondel, Sophie via petsc-users > wrote: > Hi Barry, > > I probably should have sent this output before (with -log_view_memory to get an idea of where the vectors are created). I looked at it but it doesn't help me much... 
> > Just quickie, there is 82M in 85 vectors, but your system has 1.5M unknowns, so a single vector is about 12M. Thus, there are probably 5 or 6 systems vectors, and a bunch of small ones. Matt is right, maybe the vectors are as expected. But there is something totally off about the memory used for the matrix. type: mpiaij rows=1552000, cols=1552000, bs=7760 total: nonzeros=1558766, allocated nonzeros=1558766 lower bound should be 12*1558766 = 18,705,192 but it prints Matrix 15 15 8744032 0. Not sure why it is not logging correctly. Anyways probably not important. > > Thanks, > > Matt > > Cheers, > > Sophie > From: Barry Smith > > Sent: Wednesday, September 16, 2020 16:38 > To: Blondel, Sophie > > Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > > Subject: Re: [petsc-users] Matrix Free Method questions > > > Yikes, GAMG is using a lot of vectors. But many of these are much smaller vectors so not of major concern. > > I think this will just have to be an ongoing issue to see where the vectors are created internally and reuse or eliminate as many extra as possible. > > The option -log_view_memory causes the PETSc logging summary to print additional columns showing the memory allocated during the different events in PETSc. This can be useful to see "when" the memory is mostly created; it does not tell us "why" it is created but at least tells us were to look. > > Barry > > >> On Sep 16, 2020, at 1:54 PM, Blondel, Sophie > wrote: >> >> Hi Barry, >> >> I don't think we're explicitly creating many PETSc vectors in Xolotl. There is a global one created for the solution when the TS is set up, and local ones in RHSFunction and RHSJacobian; everywhere else we just get the array from it with DMDAVecGetArrayDOF and DMDAVecRestoreArrayDOF. I tried a few things to see if it changed the number of Vec from 85 (removing monitors, fewer time steps, fewer MPI tasks) but it stayed the same, except when I changed the PC option from "-fieldsplit_1_pc_type redundant" to "-fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type gmres -ksp_type fgmres -fieldsplit_1_pc_gamg_threshold -1" where I got 10567 vectors. >> >> Cheers, >> >> Sophie >> From: Barry Smith > >> Sent: Tuesday, September 15, 2020 18:37 >> To: Blondel, Sophie > >> Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >> Subject: Re: [petsc-users] Matrix Free Method questions >> >> >> Sophie, >> >> Great, everything looks good. >> >> So the new version takes about 7 times longer, due to the relatively modest increase (about 25 percent) in the number of iterations from the poorer preconditioner convergence and the rest from the much slower matrix-vector product due to using matrix free instead of matrix based precondtioner. Both of these are expected. >> >> The matrix is taking about 10% of the memory it used to require, also expected. >> >> I noticed in the logging the memory for the vectors >> >> Vector 85 85 82303208 0. >> Matrix 15 15 8744032 0. >> >> is substantial/huge, with the much smaller matrix memory the vector memory dominates. >> >> It indicates 85 vectors are used. This is a large number, there are some needed for the TS (maybe 5?) and some needed for the KSP solve (maybe about 37) but I am not sure why there are so many. Perhaps this number could be reduced. Are there are lot of vectors created in the Xolotyl code? I would it could run with about 45 vectors. 
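For scale, a back-of-the-envelope estimate (assuming double-precision scalars): a full solution vector here has 1,552,000 entries, i.e. about 1552000 x 8 bytes ~ 12.4 MB, so the 82,303,208 bytes reported for the 85 vectors is only about 6-7 full-size vectors' worth of storage; most of the 85 must be much smaller work vectors.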
>> >> Barry >> >> >> >> >>> On Sep 15, 2020, at 5:12 PM, Blondel, Sophie > wrote: >>> >>> Hi Barry, >>> >>> I fixed everything and re-ran the 4 cases in 1D. They took more time than before because I used the Kokkos serial backend on the Xolotl side instead of the CUDA one previously (long story short, I tried to update CUDA and messed up the whole installation). Step 4 looks much better than prevously, I was even able to remove MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE) from the code and it ran without throwing errors. The log files are attached. >>> >>> Cheers, >>> >>> Sophie >>> From: Barry Smith > >>> Sent: Friday, September 11, 2020 18:03 >>> To: Blondel, Sophie > >>> Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>> Subject: Re: [petsc-users] Matrix Free Method questions >>> >>> >>> >>>> On Sep 11, 2020, at 7:45 AM, Blondel, Sophie > wrote: >>>> >>>> Thank you Barry, >>>> >>>> Step 3 worked after I moved MatSetOption at the beginning of computeJacobian(). Attached is the updated log which is pretty similar to what I had before. Step 4 still uses many more iterations. >>>> >>>> I checked the Jacobian using -ksp_view_pmat ascii (on a simpler case), I can see the difference between step 3 and 4 is that the contribution from the reactions is not included in the step 4 Jacobian (as expected from the fact that I removed their setting from the code). >>>> >>>> Looking back at one of your previous email, you wrote "This routine should only compute the elements of the Jacobian needed for this reduced matrix Jacobian, so the diagonals and the diffusion/convection terms. ", does it mean that I should still include the contributions from the reactions that affect the pure diagonal terms? >>> >>> Yes, you need to leave in everything that affects the diagonal otherwise the "Jacobi" preconditioner will not reflect the true Jacobi preconditioner and likely perform poorly. >>> >>> Barry >>> >>>> >>>> Cheers, >>>> >>>> Sophie >>>> >>>> From: Barry Smith > >>>> Sent: Thursday, September 10, 2020 17:04 >>>> To: Blondel, Sophie > >>>> Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>> Subject: Re: [petsc-users] Matrix Free Method questions >>>> >>>> >>>> >>>>> On Sep 10, 2020, at 2:46 PM, Blondel, Sophie > wrote: >>>>> >>>>> Hi Barry, >>>>> >>>>> Going through the different changes again to understand what was going wrong with the last step, I discovered that my changes from 2 to 3 (keeping only the pure diagonal for the reaction Jacobian setup and adding MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) were wrong: the sparsity of the matrix was correct but then the RHSJacobian method was wrong. I updated it >>>> >>>> I'm not sure what you mean here. My hope was that in step 3 you won't need to change RHSJacobian at all (that is just for step 4). >>>> >>>>> but now when I run step 3 again I get the following error: >>>>> >>>>> [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>>>> [2]PETSC ERROR: Argument out of range >>>>> [2]PETSC ERROR: Inserting a new nonzero at global row/column (310400, 316825) into matrix >>>>> [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
>>>>> [2]PETSC ERROR: Petsc Development GIT revision: v3.13.4-885-gf58a62b032 GIT Date: 2020-09-01 13:07:58 -0500 >>>>> [2]PETSC ERROR: Unknown Name on a 20200902 named iguazu by bqo Thu Sep 10 15:38:58 2020 >>>>> [2]PETSC ERROR: Configure options PETSC_DIR=/home2/bqo/libraries/petsc-barry PETSC_ARCH=20200902 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no --with-shared-libraries --download-fblaslapack=1 >>>>> [2]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 606 in /home2/bqo/libraries/petsc-barry/src/mat/impls/aij/mpi/mpiaij.c >>>>> [2]PETSC ERROR: #2 MatSetValues() line 1392 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c >>>>> [2]PETSC ERROR: #3 MatSetValuesLocal() line 2207 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c >>>>> [2]PETSC ERROR: #4 MatSetValuesStencil() line 1595 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c >>>>> PetscSolverExpHandler::computeJacobian: MatSetValuesStencil (reactions) failed. >>>>> >>>>> Because the RHSJacobian method is trying to update the elements corresponding to the reactions. I'm not sure I understood correctly what step 3 was supposed to be. >>>> >>>> In step the three the RHSJacobian was suppose to be unchanged, only the option to ignore the "unneeded" Jacobian entries inside MatSetValues (set with MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) was needed (plus changing the DMDASetBlockFillsXXX argument). >>>> >>>> The error message Inserting a new nonzero at global row/column (310400, 316825) into matrix indicates that somehow the MatOption MAT_NEW_NONZERO_LOCATION_ERR is in control instead of the option MAT_NEW_NONZERO_LOCATIONS, when it is setting values the Jacobian values. >>>> >>>> The MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS_ERR,PETSC_TRUE);) is normally called inside the DMCreateMatrix() so I am not sure how they could be getting called in the wrong order but it seems somehow it is >>>> >>>> When do you call MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) in the code? You can call it at the beginning of computeJacobian(). >>>> >>>> If this still doesn't work and you get the same error you can run in the debugger on one process and put a breakpoint for MatSetOptions() to found out how the MAT_NEW_NONZERO_LOCATIONS_ERR comes in late to upset the apple cart. You should see MatSetOption() called at least twice and the last one should have the MAT_NEW_NONZERO_LOCATION flag. >>>> >>>> Barry >>>> >>>> >>>> >>>> >>>>> >>>>> Cheers, >>>>> >>>>> Sophie >>>>> >>>>> >>>>> From: Barry Smith > >>>>> Sent: Friday, September 4, 2020 01:06 >>>>> To: Blondel, Sophie > >>>>> Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>>> Subject: Re: [petsc-users] Matrix Free Method questions >>>>> >>>>> >>>>> Sophie, >>>>> >>>>> Thanks. I have started looking through the logs >>>>> >>>>> The change to matrix-free multiple (from 1 to 2) which reduces the accuracy of the multiply to about half the digits is not surprising. 
>>>>> >>>>> * It roughly doubles the time since doing the matrix-free product requires a function evaluation >>>>> >>>>> * It increases the iteration count, but not significantly since the reduced precision of the multiple induces some additional linear iterations >>>>> >>>>> The change from 2 to 3 (not storing the entire matrix) >>>>> >>>>> * number of nonzeros goes from 49459966 to 1558766 = 3.15 percent so it succeds in not storing the unneeded part of the matrix >>>>> >>>>> * the number of MatMult_MF goes from 2331 to 2418. I don't understand this, I expected it to be identical because it should be using the same preconditioner in 3 as in 2 and thus get the same convergence. Could possibility be due to the variability in convergence due to different runs with the matrix-free preconditioner preconditioner and not related to not storing the entire matrix. >>>>> >>>>> * the KSPSolve() time goes from 3.8774e+0 to 3.7855e+02 a trivial difference which is what I would expect >>>>> >>>>> * the SNESSolve time goes from 5.0047e+02 to 4.3275e+02 about a 14 percent drop which is reasonable because 3 doesn't spend as much time inserting matrix values (it still computes them but doesn't insert the ones we don't want for the preconditioner). >>>>> >>>>> The change from 3 to 4 >>>>> >>>>> * something goes seriously wrong here. The total number of linear solve iterations goes from 2282 to 97403 so something has gone seriously wrong with the preconditioner, but since the preconditioner operations are the same it seems something has gone wrong with the new reduced preconditioner. >>>>> >>>>> I think there is an error in computing the reduced matrix entries, that is the new compute Jacobian code is not computing the entries it needs to correctly. >>>>> >>>>> To debug this you can run case 3 and case 4 for a single time step with -ksp_view_pmat binary This should create a binary file with the initial Jacobian matrices in each. You can use Matlab or Python to do the difference in the matrices and see how possibly the new Jacobian computation code is not producing the correct values in some locations. >>>>> >>>>> Good luck, >>>>> >>>>> Barry >>>>> >>>>> >>>>> >>>>> >>>>>> On Sep 3, 2020, at 12:26 PM, Blondel, Sophie > wrote: >>>>>> >>>>>> Hi Barry, >>>>>> >>>>>> Attached are the log files for the 1D case, for each of the 4 steps. I don't know how I did it yesterday but the differences between steps look better today, except for step 4 that takes many more iterations and smaller time steps. >>>>>> >>>>>> Cheers, >>>>>> >>>>>> Sophie >>>>>> >>>>>> De : Barry Smith > >>>>>> Envoy? : mercredi 2 septembre 2020 15:53 >>>>>> ? : Blondel, Sophie > >>>>>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>>>> Objet : Re: [petsc-users] Matrix Free Method questions >>>>>> >>>>>> >>>>>> >>>>>>> On Sep 2, 2020, at 1:44 PM, Blondel, Sophie > wrote: >>>>>>> >>>>>>> Thank you Barry, >>>>>>> >>>>>>> The code ran with your branch but it's much slower than running with the full Jacobian and Jacobi PC subtype (around 10 times slower). It is using less memory as expected. I tried step 2 as well and it's even slower. >>>>>> >>>>>> Sophie, >>>>>> >>>>>> That is puzzling. It should be using the same matrix in the solver so should be the same speed and the setup time should be a bit better since it does not form the full Jacobian. (We'll get to this later) >>>>>> >>>>>>> The TS iteration for step 1 are the same as with full Jacobian. 
Let me know what I can look at to check if I've done something wrong. >>>>>> >>>>>> We need to see if the KSP iterations are pretty similar for four approaches (1) original code with Jacobi PC subtype (2) matrix free with Jacobi PC (just add -snes_mf_operator to case 1) (3) the new code with the MatSetOption() to not store the entire Jacobian also with the -snes_mf_operator and (4) the new code that doesn't compute the unneeded part of the Jacobian also with the -snes_mf_operator >>>>>> >>>>>> You could run each case with same 20 timesteps and -ts_monitor -ksp_monitor and -ts_view and send the four output files around. >>>>>> >>>>>> Once we are sure the four cases are behaving as expected then you can get timings for them but let's not do that until we confirm the similar results. There could easily be a flaw in my reasoning or the PETSc code somewhere that affects the correctness so its best to check that first. >>>>>> >>>>>> >>>>>> Barry >>>>>> >>>>>>> >>>>>>> Cheers, >>>>>>> >>>>>>> Sophie >>>>>>> De : Barry Smith > >>>>>>> Envoy? : mardi 1 septembre 2020 14:12 >>>>>>> ? : Blondel, Sophie > >>>>>>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>>>>> Objet : Re: [petsc-users] Matrix Free Method questions >>>>>>> >>>>>>> >>>>>>> Sophie, >>>>>>> >>>>>>> Sorry, looks like an old bug in PETSc that was undetected due to lack of use. The code is trying to use the first of the two matrices to determine the preconditioner which won't work in your case since it is matrix-free. It should be using the second matrix. >>>>>>> >>>>>>> I hope the branch barry/2020-09-01/fix-fieldsplit-mf resolves this issue for you. >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> >>>>>>>> On Sep 1, 2020, at 12:45 PM, Blondel, Sophie > wrote: >>>>>>>> >>>>>>>> Hi Barry, >>>>>>>> >>>>>>>> I'm working through step 1) but I think I am doing something wrong. I'm using DMDASetBlockFillsSparse to set the non-zeros only for the diffusing clusters (small He clusters here, from size 1 to 7) and all the diagonal entries. Then I added a few lines in the code: >>>>>>>> Mat mat; >>>>>>>> DMCreateMatrix(da, &mat); >>>>>>>> MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); >>>>>>>> >>>>>>>> When I try to run with the following options: -snes_mf_operator -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 2 -snes_force_iteration -pc_fieldsplit_detect_coupling -pc_type fieldsplit -fieldsplit_0_pc_type jacobi -fieldsplit_1_pc_type redundant -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -ts_max_snes_failures -1 -ts_monitor -ts_max_steps 20 >>>>>>>> >>>>>>>> I get an error: >>>>>>>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>>>>>>> [0]PETSC ERROR: No support for this operation for this object type >>>>>>>> [0]PETSC ERROR: Matrix type mffd does not have a find off block diagonal entries defined >>>>>>>> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
>>>>>>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.13.4-851-gde18fec8da GIT Date: 2020-08-28 16:47:50 +0000 >>>>>>>> [0]PETSC ERROR: Unknown Name on a 20200828 named sophie-Precision-5530 by sophie Tue Sep 1 10:58:44 2020 >>>>>>>> [0]PETSC ERROR: Configure options PETSC_DIR=/home/sophie/Code/petsc PETSC_ARCH=20200828 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no --with-shared-libraries >>>>>>>> [0]PETSC ERROR: #1 MatFindOffBlockDiagonalEntries() line 9847 in /home/sophie/Code/petsc/src/mat/interface/matrix.c >>>>>>>> [0]PETSC ERROR: #2 PCFieldSplitSetDefaults() line 504 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c >>>>>>>> [0]PETSC ERROR: #3 PCSetUp_FieldSplit() line 606 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c >>>>>>>> [0]PETSC ERROR: #4 PCSetUp() line 1009 in /home/sophie/Code/petsc/src/ksp/pc/interface/precon.c >>>>>>>> [0]PETSC ERROR: #5 KSPSetUp() line 406 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>> [0]PETSC ERROR: #6 KSPSolve_Private() line 658 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>> [0]PETSC ERROR: #7 KSPSolve() line 889 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>> [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in /home/sophie/Code/petsc/src/snes/impls/ls/ls.c >>>>>>>> [0]PETSC ERROR: #9 SNESSolve() line 4524 in /home/sophie/Code/petsc/src/snes/interface/snes.c >>>>>>>> [0]PETSC ERROR: #10 TSStep_ARKIMEX() line 811 in /home/sophie/Code/petsc/src/ts/impls/arkimex/arkimex.c >>>>>>>> [0]PETSC ERROR: #11 TSStep() line 3731 in /home/sophie/Code/petsc/src/ts/interface/ts.c >>>>>>>> [0]PETSC ERROR: #12 TSSolve() line 4128 in /home/sophie/Code/petsc/src/ts/interface/ts.c >>>>>>>> PetscSolver::solve: TSSolve failed. >>>>>>>> >>>>>>>> Cheers, >>>>>>>> >>>>>>>> Sophie >>>>>>>> De : Barry Smith > >>>>>>>> Envoy? : lundi 31 ao?t 2020 14:50 >>>>>>>> ? : Blondel, Sophie > >>>>>>>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>>>>>> Objet : Re: [petsc-users] Matrix Free Method questions >>>>>>>> >>>>>>>> >>>>>>>> Sophie, >>>>>>>> >>>>>>>> Thanks. >>>>>>>> >>>>>>>> The factor of 4 is lot, the 1.5 not so bad. >>>>>>>> >>>>>>>> You will definitely want to retain the full matrix assembly codes for speed and to verify a reduced matrix version. >>>>>>>> >>>>>>>> It is worth trying a "reduced matrix version" with matrix-free multiply based on these numbers. This reduced matrix Jacobian will only have the diagonals and all the terms connected to the cluster sizes that move. In other words you will be building just the part of the Jacobian needed for the new preconditioner (PC subtype for Jacobi) and doing the matrix-vector product matrix free. (SOR requires all the Jacobian entries). >>>>>>>> >>>>>>>> Fortunately this is hopefully pretty straightforward for this code. You will not have to change the structure of the main code at all. >>>>>>>> >>>>>>>> Step 1) create a new "sparse matrix" that will be passed to DMDASetBlockFillsSparse(). This new "sparse matrix" needs to retain all the diagonal entries and also all the entries that are associated with the variables that diffuse. If I remember correctly these are just the smallest cluster size, plain Helium? >>>>>>>> >>>>>>>> Call MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); >>>>>>>> >>>>>>>> Then you would run the code with -snes_mf_operator and the new PC subtype for Jacobi. 
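For concreteness, a minimal sketch of what Step 1 amounts to (sizes and names are illustrative, not taken from Xolotl; the thread itself uses DMDASetBlockFillsSparse, which takes a compressed fill description, while the dense variant DMDASetBlockFills shown here takes plain dof-by-dof 0/1 arrays):

  PetscInt  dof = 54, ndiff = 7;   /* assumed: clusters per grid point, and how many of them diffuse */
  PetscInt *dfill, *ofill;
  Mat       J;
  PetscCalloc2(dof*dof, &dfill, dof*dof, &ofill);
  for (PetscInt i = 0; i < dof; i++)   dfill[i*dof + i] = 1;   /* keep every diagonal entry of the block */
  for (PetscInt i = 0; i < ndiff; i++) ofill[i*dof + i] = 1;   /* keep the neighbor-cell coupling for the diffusers */
  DMDASetBlockFills(da, dfill, ofill);
  PetscFree2(dfill, ofill);
  DMCreateMatrix(da, &J);
  MatSetOption(J, MAT_NEW_NONZERO_LOCATIONS, PETSC_FALSE);     /* silently drop entries outside this pattern */

With the matrix created this way the existing Jacobian routine can keep calling MatSetValuesStencil() as before: contributions that fall outside the retained pattern are ignored rather than triggering a new-nonzero error, and the run is then made with -snes_mf_operator and the Jacobi subtype as described.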
>>>>>>>> >>>>>>>> A test that the new reduced Jacobian is correct will be that you get almost the same iterations as the runs you just make using the PC subtype of Jacobi. Hopefully not slower and using a great deal less memory. The iterations will not be identical because of the matrix-free multiple. >>>>>>>> >>>>>>>> Step 2) create a new version of the Jacobian computation routine. This routine should only compute the elements of the Jacobian needed for this reduced matrix Jacobian, so the diagonals and the diffusion/convection terms. >>>>>>>> >>>>>>>> Again run with with -snes_mf_operator and the new PC subtype for Jacobi and you should again get the same convergence history. >>>>>>>> >>>>>>>> I made two steps because it makes it easier to validate and debug to get the same results as before. The first step cheats in that it still computes the full Jacobian but ignores the entries that we don't need to store for the preconditioner. The second step is more efficient because it only computes the Jacobian entries needed for the preconditioner but it requires you going through the Jacobian code and making sure only the needed parts are computed. >>>>>>>> >>>>>>>> >>>>>>>> If you have any questions please let me know. >>>>>>>> >>>>>>>> Barry >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> On Aug 31, 2020, at 1:13 PM, Blondel, Sophie > wrote: >>>>>>>>> >>>>>>>>> Hi Barry, >>>>>>>>> >>>>>>>>> I ran the 2 cases to look at the effect of the Jacobi pre-conditionner: >>>>>>>>> 1D with 200 grid points and 7759 DOF per grid point (for the PSI application), for 20 TS: the factor between SOR and Jacobi is ~4 (976 MatMult for SOR and 4162 MatMult for Jacobi) >>>>>>>>> 2D with 63x63 grid points and 4124 DOF per grid point (for the NE application), for 20 TS: the factor is 1.5 (6657 for SOR, 10379 for Jacobi) >>>>>>>>> Cheers, >>>>>>>>> >>>>>>>>> Sophie >>>>>>>>> De : Barry Smith > >>>>>>>>> Envoy? : vendredi 28 ao?t 2020 18:31 >>>>>>>>> ? : Blondel, Sophie > >>>>>>>>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>>>>>>> Objet : Re: [petsc-users] Matrix Free Method questions >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Aug 28, 2020, at 4:11 PM, Blondel, Sophie > wrote: >>>>>>>>>> >>>>>>>>>> Thank you Jed and Barry, >>>>>>>>>> >>>>>>>>>> First, attached are the logs from the benchmark runs I did without (log_std.txt) and with MF method (log_mf.txt). It took me some trouble to get the -log_view to work because I'm using push and pop for the options which means that PETSc is initialized with no argument so the command line argument was not taken into account, but I guess this is for a separate discussion. >>>>>>>>>> >>>>>>>>>> To answer questions about the current per-conditioners: >>>>>>>>>> I used the same pre-conditioner options as listed in my previous email when I added the -snes_mf option; I did try to remove all the PC related options at one point with the MF method but didn't see a change in runtime so I put them back in >>>>>>>>>> this benchmark is for a 1D DMDA using 20 grid points; when running in 2D or 3D I switch the PC options to: -pc_type fieldsplit -fieldsplit_0_pc_type sor -fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type gmres -ksp_type fgmres -fieldsplit_1_pc_gamg_threshold -1 >>>>>>>>>> I haven't tried a Jacobi PC instead of SOR, I will run a set of more realistic runs (1D and 2D) without MF but with Jacobi and report on it next week. When you say "iterations" do you mean what is given by -ksp_monitor? 
>>>>>>>>> >>>>>>>>> Yes, the number of MatMult is a good enough surrogate. >>>>>>>>> >>>>>>>>> So using matrix-free (which means no preconditioning) has >>>>>>>>> >>>>>>>>> 35846/160 >>>>>>>>> >>>>>>>>> ans = >>>>>>>>> >>>>>>>>> 224.0375 >>>>>>>>> >>>>>>>>> or 224 as many iterations. So even for this modest 1d problem preconditioning is doing a great deal. >>>>>>>>> >>>>>>>>> Barry >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Cheers, >>>>>>>>>> >>>>>>>>>> Sophie >>>>>>>>>> De : Barry Smith > >>>>>>>>>> Envoy? : vendredi 28 ao?t 2020 12:12 >>>>>>>>>> ? : Blondel, Sophie > >>>>>>>>>> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >>>>>>>>>> Objet : Re: [petsc-users] Matrix Free Method questions >>>>>>>>>> >>>>>>>>>> [External Email] >>>>>>>>>> >>>>>>>>>> Sophie, >>>>>>>>>> >>>>>>>>>> This is exactly what i would expect. If you run with -ksp_monitor you will see the -snes_mf run takes many more iterations. >>>>>>>>>> >>>>>>>>>> I am puzzled that the argument -pc_type fieldsplit did not stop the run since this is under normal circumstances not a viable preconditioner with -snes_mf. Did you also remove the -pc_type fieldsplit argument? >>>>>>>>>> >>>>>>>>>> In order to see how one can avoid forming the entire matrix and use matrix-free to do the matrix-vector but still have an effective preconditioner let's look at what the current preconditioner options do. >>>>>>>>>> >>>>>>>>>>> -pc_fieldsplit_detect_coupling >>>>>>>>>> >>>>>>>>>> creates two sub-preconditioners, the first for all the variables and the second for those that are coupled by the matrix to variables in neighboring cells Since only the smallest cluster sizes have diffusion/advection this second set contains only the cluster size one variables. >>>>>>>>>> >>>>>>>>>>> -fieldsplit_0_pc_type sor >>>>>>>>>> >>>>>>>>>> Runs SOR on all the variables; you can think of this as running SOR on the reactions, it is a pretty good preconditioner for the reactions since the reactions are local, per cell. >>>>>>>>>> >>>>>>>>>>> -fieldsplit_1_pc_type redundant >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> This runs the default preconditioner (ILU) on just the variables that diffuse, i.e. the elliptic part. For smallish problems this is fine, for larger problems and 2d and 3d presumably you have also -redundant_pc_type gamg to use algebraic multigrid for the diffusion. This part of the matrix will always need to be formed and used in the preconditioner. It is very important since the diffusion is what brings in most of the ill-conditioning for larger problems into the linear system. Note that it only needs the matrix entries for the cluster size of 1 so it is very small compared to the entire sparse matrix. >>>>>>>>>> >>>>>>>>>> ---- >>>>>>>>>> The first preconditioner SOR requires ALL the matrix entries which are almost all (except for the diffusion terms) the coupling between different size clusters within a cell. Especially each cell has its own sparse matrix of the size of total number of clusters, it is sparse but not super sparse. >>>>>>>>>> >>>>>>>>>> So the to significantly lower memory usage we need to remove the SOR and the storing of all the matrix entries but still have an efficient preconditioner for the "reaction" terms. >>>>>>>>>> >>>>>>>>>> The simplest thing would be to use Jacobi instead of SOR for the first subpreconditioner since it only requires the diagonal entries in the matrix. 
But Jacobi is a worse preconditioner than SOR (since it totally ignores the matrix coupling) and sometimes can be much worse. >>>>>>>>>> >>>>>>>>>> Before anyone writes additional code we need to know if doing something along these lines does not ruin the convergence that. >>>>>>>>>> >>>>>>>>>> Have you used the same options as before but with -fieldsplit_0_pc_type jacobi ? (Not using any matrix free). We need to get an idea of how many more linear iterations it requires (not time, comparing time won't be helpful for this exercise.) We also need this information for realistic size problems in 2 or 3 dimensions that you really want to run; for small problems this approach will work ok and give misleading information about what happens for large problems. >>>>>>>>>> >>>>>>>>>> I suspect the iteration counts will shot up. Can you run some cases and see how the iteration counts change? >>>>>>>>>> >>>>>>>>>> Based on that we can decide if we still retain "good convergence" by changing the SOR to Jacobi and then change the code to make this change efficient (basically by skipping the explicit computation of the reaction Jacobian terms and using matrix-free on the outside of the PCFIELDSPLIT.) >>>>>>>>>> >>>>>>>>>> Barry >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On Aug 28, 2020, at 9:49 AM, Blondel, Sophie via petsc-users > wrote: >>>>>>>>>>> >>>>>>>>>>> Hi everyone, >>>>>>>>>>> >>>>>>>>>>> I have been using PETSc for a few years with a fully implicit TS ARKIMEX method and am now exploring the matrix free method option. Here is the list of PETSc options I typically use: -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 5 -snes_force_iteration -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -fieldsplit_0_pc_type sor -ts_max_snes_failures -1 -pc_fieldsplit_detect_coupling -ts_monitor -pc_type fieldsplit -fieldsplit_1_pc_type redundant -ts_max_steps 100 >>>>>>>>>>> >>>>>>>>>>> I started to compare the performance of the code without changing anything of the executable and simply adding "-snes_mf", I see a reduction of memory usage as expected and a benchmark that would usually take ~5min to run now takes ~50min. Reading the documentation I saw that there are a few option to play with the matrix free method like -snes_mf_err, -snes_mf_umin, or switching to -snes_mf_type wp. I used and modified the values of each of these options separately but never saw a sizable change in runtime, is it expected? >>>>>>>>>>> >>>>>>>>>>> And are there other ways to make the matrix free method faster? I saw in the documentation that you can define your own per-conditioner for instance. Let me know if you need additional information about the PETSc setup in the application I use. >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> >>>>>>>>>>> Sophie >>>>>>>>>> >>>>>>>>>> >>>>>> >>>>>> >>>> >>>> >>> >>> > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
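Pulling together the options this thread converges on, a reduced-preconditioner, matrix-free run is driven by roughly the following flags (all taken from the messages above, with the usual TS controls omitted; note that combining -snes_mf_operator with fieldsplit needed the fix in the branch barry/2020-09-01/fix-fieldsplit-mf):

  -snes_mf_operator -pc_type fieldsplit -pc_fieldsplit_detect_coupling -fieldsplit_0_pc_type jacobi -fieldsplit_1_pc_type redundant -ts_monitor -ksp_monitor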
URL: From knepley at gmail.com Thu Sep 24 17:30:48 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 24 Sep 2020 18:30:48 -0400 Subject: [petsc-users] Finding which cell an arbitrary point belongs to in DMPlex In-Reply-To: References: <70fd5e6fb338d6fc4b2471ec17d7dc7ddaaa9599.camel@seamplex.com> <929B80C1-9066-441C-B8E5-49413C36C640@erdw.ethz.ch> Message-ID: On Thu, Sep 24, 2020 at 6:09 PM Jeremy Theler wrote: > On Wed, 2020-09-16 at 14:29 +0000, Hapla Vaclav wrote: > > There is also DMPlexFindVertices() which finds the nearest vertex to > > the given coords in the given radius. > > At first I had understood that this function performed a nearest- > neighbor search but after a closer look it sweeps the local DM and > marks whether the sought points coincide with a mesh node within eps or > not. Neat. > > > > You can then get support or its transitive closure for that vertex. > > Not directly because in general the sought points will not coincide > with a mesh node, but a combination of this function and > DMLocatePoints() seems to do the trick. > > > I wrote it some time ago mainly for debug purposes. It uses just > > brute force. I'm not sure it deserves to exist :-) Maybe we should > > somehow merge these functionalities. > > It works, although a kd-tree-based search would be far more efficient > than a full sweep over the DM. > We should not need to do that. LocatePoints() does not sweep the mesh. It just does grid hashing. kd is a little better with really irregular distributions, but hashing should be fine. Thanks, Matt > Thanks > -- > jeremy > > > > > Thanks, > > > > Vaclav > > > > > On 16 Sep 2020, at 01:44, Matthew Knepley > > > wrote: > > > > > > On Tue, Sep 15, 2020 at 6:18 PM Jeremy Theler > > > wrote: > > > > On Mon, 2020-09-14 at 20:28 -0400, Matthew Knepley wrote: > > > > > On Mon, Sep 14, 2020 at 6:15 PM Jeremy Theler < > > > > jeremy at seamplex.com> > > > > > wrote: > > > > > > Hello all > > > > > > > > > > > > Say I have a fully-interpolated 3D DMPlex and a point with > > > > > > arbitrary > > > > > > coordinates x,y,z. What's the most efficient way to know > > > > which cell > > > > > > this point belongs to in parallel? Cells can be either tets > > > > or > > > > > > hexes. > > > > > > > > > > I should make a tutorial on this, but have not had time so far. > > > > > > > > Thank you very much for this mini-tutorial. > > > > > > > > > > > > > > The intention is that you use > > > > > > > > > > > > > > > > > > > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DM/DMLocatePoints.html > > > > > > > > > > This will just brute force search unless you also give > > > > > > > > > > -dm_plex_hash_location > > > > > > > > Well, for a 3D DMplex PETSc (and git blame) tells me that you > > > > "have > > > > only coded this for 2D." :-) > > > > > > > > > > Crap. I need to do 3D. It's not hard, just work. > > > > > > > > which builds a grid hash to accelerate it. I should probably > > > > expose > > > > > > > > > > DMPlexLocatePoint_Internal() > > > > > > > > > > which handles the single cell queries. If you just had one > > > > point, > > > > > that might make it simpler, > > > > > although you would still write your own loop. > > > > > > > > I see that DMLocatePoints() loops over all the cells until it > > > > finds the > > > > right one. I was thinking about finding first the nearest vertex > > > > to the > > > > point and then sweeping over all the cells that share this vertex > > > > testing for DMPlexLocatePoint_Internal(). 
The nearest node ought > > > > to be > > > > found using an octree or similar. Any direction regarding this > > > > idea? > > > > > > > > > > So you can imagine both a topological search and a geometric > > > search. Generally, people want geometric. > > > The geometric hash we use is just to bin elements on a regular > > > grid. > > > > > > > > If your intention is to interpolate a field at these > > > > > locations, I created > > > > > > > > > > > > > > > > > > > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/DMInterpolationCreate.html > > > > > > > > > > which no one but me uses so far, but I think it is convenient. > > > > > > > > Any other example apart from src/snes/tutorials/ex63.c? > > > > > > > > > > That is the only one in PETSc. The PyLith code uses this to > > > interpolate to seismic stations. > > > > > > Thanks, > > > > > > Matt > > > > > > > Thank you. > > > > > > > > > > > > > > Thanks, > > > > > > > > > > Matt > > > > > > > > > > > Regards > > > > > > -- > > > > > > jeremy theler > > > > > > www.seamplex.com > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > > > experiments is infinitely more interesting than any results to > > > which their experiments lead. > > > -- Norbert Wiener > > > > > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From C.Klaij at marin.nl Fri Sep 25 01:35:39 2020 From: C.Klaij at marin.nl (Klaij, Christiaan) Date: Fri, 25 Sep 2020 06:35:39 +0000 Subject: [petsc-users] something wrong with digest? Message-ID: <7e6cbcb43d4640479c569517b2e92e1a@MAR190n2.marin.local> Today I got more than 20 petsc-user Digest emails in my inbox (Vol 141, Issue 78 to 101), many containing only one message and being sent within a few minutes of each other (10:39, 10:41 10:44, ...). Is this really how Digest is supposed to work? Chris dr. ir. Christiaan Klaij | Senior Researcher | Research & Development MARIN | T +31 317 49 33 44 | mailto:C.Klaij at marin.nl | http://www.marin.nl From balay at mcs.anl.gov Fri Sep 25 02:02:30 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 25 Sep 2020 02:02:30 -0500 (CDT) Subject: [petsc-users] something wrong with digest? In-Reply-To: <7e6cbcb43d4640479c569517b2e92e1a@MAR190n2.marin.local> References: <7e6cbcb43d4640479c569517b2e92e1a@MAR190n2.marin.local> Message-ID: I see one of the mailman settings is: How big in Kb should a digest be before it gets sent out? 0 implies no maximum size. (Edit digest_size_threshhold) 40 I don't think we ever changed this value [so likely a default] And I see a bunch of e-mails on the list exceeding this number. Perhaps this is the reason for this many digests [There is no archive of the digests (that I can find) - so can't verify] If this is the case - it must happen every time there is an e-mail with attachments to the list.. Satish On Fri, 25 Sep 2020, Klaij, Christiaan wrote: > > Today I got more than 20 petsc-user Digest emails in my inbox (Vol 141, Issue 78 to 101), many containing only one message and being sent within a few minutes of each other (10:39, 10:41 10:44, ...). Is this really how Digest is supposed to work? > > Chris > > > dr. ir. 
Christiaan Klaij | Senior Researcher | Research & Development > MARIN | T +31 317 49 33 44 | mailto:C.Klaij at marin.nl | http://www.marin.nl > > From C.Klaij at marin.nl Fri Sep 25 02:16:19 2020 From: C.Klaij at marin.nl (Klaij, Christiaan) Date: Fri, 25 Sep 2020 07:16:19 +0000 Subject: [petsc-users] something wrong with digest? In-Reply-To: References: <7e6cbcb43d4640479c569517b2e92e1a@MAR190n2.marin.local>, Message-ID: <1601018178791.66693@marin.nl> That could be the reason, there were some lengthy emails indeed. But the attachments are removed in Digest. Chris dr. ir. Christiaan Klaij | Senior Researcher | Research & Development MARIN | T +31 317 49 33 44 | mailto:C.Klaij at marin.nl | http://www.marin.nl ________________________________________ From: Satish Balay Sent: Friday, September 25, 2020 9:02 AM To: Klaij, Christiaan Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] something wrong with digest? I see one of the mailman settings is: How big in Kb should a digest be before it gets sent out? 0 implies no maximum size. (Edit digest_size_threshhold) 40 I don't think we ever changed this value [so likely a default] And I see a bunch of e-mails on the list exceeding this number. Perhaps this is the reason for this many digests [There is no archive of the digests (that I can find) - so can't verify] If this is the case - it must happen every time there is an e-mail with attachments to the list.. Satish On Fri, 25 Sep 2020, Klaij, Christiaan wrote: > > Today I got more than 20 petsc-user Digest emails in my inbox (Vol 141, Issue 78 to 101), many containing only one message and being sent within a few minutes of each other (10:39, 10:41 10:44, ...). Is this really how Digest is supposed to work? > > Chris > > > dr. ir. Christiaan Klaij | Senior Researcher | Research & Development > MARIN | T +31 317 49 33 44 | mailto:C.Klaij at marin.nl | http://www.marin.nl > > From olivier.jamond at cea.fr Fri Sep 25 02:31:10 2020 From: olivier.jamond at cea.fr (Olivier Jamond) Date: Fri, 25 Sep 2020 09:31:10 +0200 Subject: [petsc-users] Compute C*Ct using MatCreateTranspose for Ct In-Reply-To: References: <4c071605-83fe-cb4d-866e-0a4fd1a6c542@cea.fr> Message-ID: @Mark: Sorry, I totally misunderstood your answer, so my answer to your answer about petsc versions is non-sense... @Hong: MatTransposeMatMult gives (Ct*C), no? And I feel that MatCreateTranspose actually works for mpiaij. On 24/09/2020 16:16, Zhang, Hong via petsc-users wrote: > Olivier and Matt, > > MatPtAP with A=I gives Pt*P, not P*Pt. We have sequential MatRARt > and?MatMatTransposeMult(), but no support for mpiaij matrices. The > problem is that we do not have a way to implement C*Ct without > explicitly transpose C in parallel. > > > We support?MatTransposeMatMult (A*Bt) for mpiaij. Can you use this > instead? > > > Hong > > > ------------------------------------------------------------------------ > *From:* petsc-users on behalf of > Zhang, Hong via petsc-users > *Sent:* Thursday, September 24, 2020 8:56 AM > *To:* Matthew Knepley ; Mark Adams > *Cc:* PETSc > *Subject:* Re: [petsc-users] Compute C*Ct using MatCreateTranspose for Ct > Indeed, we do not have MatCreateTranspose for mpaij matrix. > I can adding such support. How soon do you need it? 
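Until that support exists, a fallback that does work for mpiaij today is the explicit route the thread is trying to avoid; a minimal sketch (C is the existing mpiaij matrix, names illustrative):

  Mat Ct, CCt;
  MatTranspose(C, MAT_INITIAL_MATRIX, &Ct);
  MatMatMult(C, Ct, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &CCt);
  MatDestroy(&Ct);

This forms the transpose explicitly, with the extra memory and communication that implies, but it avoids waiting on new library support.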
> Hong > > ------------------------------------------------------------------------ > *From:* petsc-users on behalf of > Matthew Knepley > *Sent:* Thursday, September 24, 2020 6:12 AM > *To:* Mark Adams > *Cc:* PETSc > *Subject:* Re: [petsc-users] Compute C*Ct using MatCreateTranspose for Ct > On Thu, Sep 24, 2020 at 6:48 AM Mark Adams > wrote: > > > Is there a way to avoid the explicit transposition of the matrix? > > > It does not look like we have A*B^T for mpiaij as the?error > message says. I am not finding it in the code. > > Note, MatMatMult with a transpose shell matrix, I suspect that it > does an explicit transpose internally, or it could?notice that you > have C^T*C and we might have that implemented in-place (I doubt > it, but it would be legal and fine to do). > > > We definitely have > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatPtAP.html > > For now, you can put the identity in for A. It would be nice it we > assumed that when A = NULL. > > Patrick, the implementation strategy is broken for the MatProduct > mechanism that was just introduced, so > we cannot see which things are implemented in the documentation. How > would I go about fixing it? > > ? Thanks, > > ? ? ?Matt > > Many thanks, > Olivier Jamond > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeremy at seamplex.com Fri Sep 25 06:44:11 2020 From: jeremy at seamplex.com (Jeremy Theler) Date: Fri, 25 Sep 2020 08:44:11 -0300 Subject: [petsc-users] Finding which cell an arbitrary point belongs to in DMPlex In-Reply-To: References: <70fd5e6fb338d6fc4b2471ec17d7dc7ddaaa9599.camel@seamplex.com> <929B80C1-9066-441C-B8E5-49413C36C640@erdw.ethz.ch> Message-ID: On Thu, 2020-09-24 at 18:30 -0400, Matthew Knepley wrote: > > There is also DMPlexFindVertices() which finds the nearest vertex > > to > > > the given coords in the given radius. > > > > At first I had understood that this function performed a nearest- > > neighbor search but after a closer look it sweeps the local DM and > > marks whether the sought points coincide with a mesh node within > > eps or > > not. Neat. This DMPlexFindVertices() sweeps over DMGetCoordinatesLocal() which returns both the local and ghost coordinates, so at the end of the day I might get more than one process claiming to have found the same node. How can I ignore ghost points so each vertex actually belongs to the process that found it? > > > I wrote it some time ago mainly for debug purposes. It uses just > > > brute force. I'm not sure it deserves to exist :-) Maybe we > > should > > > somehow merge these functionalities. > > > > It works, although a kd-tree-based search would be far more > > efficient > > than a full sweep over the DM. > > We should not need to do that. LocatePoints() does not sweep the > mesh. > It just does grid hashing. kd is a little better with really > irregular distributions, > but hashing should be fine. Yes, it seems to be pretty efficent (although there is no support for 3D so far). Two more things about plexgeometry.c: 1. shouldn't line 224 in DMPlexLocatePoint_Simplex_3D_Internal() compare against -eps instead of against zero as donde in line 145 in DMPlexLocatePoint_Simplex_2D_Internal()? 2. 
wouldn't it be better to replace DMGetDimension() by DMGetCoordinateDim() in line 45 inside DMPlexFindVertices? I have a 2D mesh with 3D coordinates and PetscUnlikely() is triggered. Thanks -- jeremy From knepley at gmail.com Fri Sep 25 07:08:49 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 25 Sep 2020 08:08:49 -0400 Subject: [petsc-users] Finding which cell an arbitrary point belongs to in DMPlex In-Reply-To: References: <70fd5e6fb338d6fc4b2471ec17d7dc7ddaaa9599.camel@seamplex.com> <929B80C1-9066-441C-B8E5-49413C36C640@erdw.ethz.ch> Message-ID: On Fri, Sep 25, 2020 at 7:44 AM Jeremy Theler wrote: > On Thu, 2020-09-24 at 18:30 -0400, Matthew Knepley wrote: > > > There is also DMPlexFindVertices() which finds the nearest vertex > > > to > > > > the given coords in the given radius. > > > > > > At first I had understood that this function performed a nearest- > > > neighbor search but after a closer look it sweeps the local DM and > > > marks whether the sought points coincide with a mesh node within > > > eps or > > > not. Neat. > > This DMPlexFindVertices() sweeps over DMGetCoordinatesLocal() which > returns both the local and ghost coordinates, so at the end of the day > I might get more than one process claiming to have found the same node. > > How can I ignore ghost points so each vertex actually belongs to the > process that found it? > This is just a debugging function from Vaclav. It is not intended to be used for production code. If you wanted to change it, you can just have it search DMGetCoordinates(). It is consistent with Plex since mesh points exist in the local space, and you only have a notion of global with the SF. I think the right modification would be to optionally pass the pointSF and not return points contained in it. > > > > I wrote it some time ago mainly for debug purposes. It uses just > > > > brute force. I'm not sure it deserves to exist :-) Maybe we > > > should > > > > somehow merge these functionalities. > > > > > > It works, although a kd-tree-based search would be far more > > > efficient > > > than a full sweep over the DM. > > > > We should not need to do that. LocatePoints() does not sweep the > > mesh. > > It just does grid hashing. kd is a little better with really > > irregular distributions, > > but hashing should be fine. > > Yes, it seems to be pretty efficent (although there is no support for > 3D so far). > > Two more things about plexgeometry.c: > > 1. shouldn't line 224 in DMPlexLocatePoint_Simplex_3D_Internal() > compare against -eps instead of against zero as donde in line 145 in > DMPlexLocatePoint_Simplex_2D_Internal()? > I hate to do it, but I made the change. It is practical. The Right Thing to do is use the robust predicates from Shewchuk and Bailey. That is on my list for someday. > 2. wouldn't it be better to replace DMGetDimension() by > DMGetCoordinateDim() in line 45 inside DMPlexFindVertices? I have a 2D > mesh with 3D coordinates and PetscUnlikely() is triggered. > You are right. I fixed it. I will put this branch in today hopefully. Thanks, Matt > Thanks > -- > jeremy > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
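For anyone landing on this thread looking for the basic call sequence, a minimal point-location sketch looks roughly like the following (dm is the DMPlex, the point coordinates are hard-coded for illustration, and exact signatures may vary slightly between PETSc versions):

  Vec                coords;
  PetscSF            cellSF = NULL;
  PetscScalar       *a;
  const PetscSFNode *cells;
  PetscInt           nFound;
  const PetscInt    *found;

  VecCreateSeq(PETSC_COMM_SELF, 3, &coords);   /* one 3D point */
  VecSetBlockSize(coords, 3);                  /* block size must equal the coordinate dimension */
  VecGetArray(coords, &a);
  a[0] = 0.5; a[1] = 0.5; a[2] = 0.5;          /* the sought point */
  VecRestoreArray(coords, &a);
  DMLocatePoints(dm, coords, DM_POINTLOCATION_NONE, &cellSF);
  PetscSFGetGraph(cellSF, NULL, &nFound, &found, &cells);
  /* cells[0].index is the local cell containing the point, or negative if this rank did not find it */
  PetscSFDestroy(&cellSF);
  VecDestroy(&coords);

Adding -dm_plex_hash_location then switches the search from brute force to the grid hash mentioned above (currently 2D only).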
URL: From mark at resfrac.com Fri Sep 25 12:02:35 2020 From: mark at resfrac.com (Mark McClure) Date: Fri, 25 Sep 2020 13:02:35 -0400 Subject: [petsc-users] Tough to reproduce petsctablefind error In-Reply-To: References: <0AC37384-BC37-4A6C-857D-41CD507F84C2@petsc.dev> <8952CCCF-14F2-4102-91B4-921A54689813@petsc.dev> <1A56D208-6BF6-49BF-B653-FD9D15BB3BDD@petsc.dev> <4E113614-C8C0-44F5-AD95-DC37A9D4B5F5@petsc.dev> Message-ID: Hello, I work with Chris, and have a few comments, hopefully helpful. Thank you all, for your help. Our program is unfortunately behaving a little bit nondeterministically. I am not sure why because for the OpenMP parts, I test it for race conditions using Intel Inspector and see none. We are using Metis, not Parmetis. Do Petsc or Metis have any behavior that is nondeterministic? We will continue to investigate the cause of the nondeterminism. But that is a challenge for us reproducing this problem because we can run the same simulation 10 times, and we do not get precisely the same result each time. In fact, for this bug, Chris did run it 10 times and did not reproduce. Every month, 1000s of hours of this simulator are being run on the Azure cluster. This crash has been occurring for months, but at infrequent intervals, and have never been able to reproduce it. As such, we can't generate an error dump, unless we gave all users a Petsc build with no optimization and waited weeks until a problem cropped up. Here's what we'll do - we'll internally make a build with debug and then run a large number of simulations until the problem reproduces and we get the error dump. That will take us some time, but we'll do it. Here is a bit more background that might be helpful. At first, we used OpenMPI 2.1.1-8 with Petsc 3.13.2. With that combo, we observed a memory leak, and simulations failed when the node ran out of RAM. Then we switched to MPICH3.3a2 and Petsc 3.13.3. The memory leak went away. That's when we started seeing this bug "greater than largest key allowed". It was unreproducible, but happening relatively more often than it is now (I think) - I was getting a couple user reports a week. Then, we updated to MPICH 3.3.2 and the same Petsc version (3.13.3). The problem seems to be less common - hadn't happened for the past month. But then it happened four times this week. Other background to note - our linear system very frequently changes size/shape. There will be 10,000s of Petsc solves with different matrices (different positions of nonzero values) over the course of a simulation. As far as we can tell, the crash only occurs only after the simulator has run for a long time, 10+ hours. Having said that, it does not appear to be a memory leak - at least, the node has plenty of remaining RAM when these crashes are occurring. Mark On Thu, Sep 24, 2020 at 4:41 PM Mark Adams wrote: > You might add code here like: > > if (ierr) { > for (; iB->rmap->n; i++) { > for ( jilen[i]; j++) { > PetscInt data,gid1 = aj[B->i[i] + j] + 1; // don't use ierr > print rank, gid1; > } > CHKERRQ(ierr); > > I am guessing that somehow you have a table that is bad and too small. It > failed on an index not much bigger than the largest key allowed. Maybe just > compute the max and see if it goes much larger than the largest key allowed. > > If your mesh just changed to you know if it got bigger or smaller... 
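The comparison operators in the snippet above were eaten by the HTML-to-text conversion (the same thing happened to the loop from mpiaij.c quoted further down): the loops are meant to read i < aij->B->rmap->n and j < B->ilen[i]. A cleaned-up version of the suggested diagnostic, which simply reports the largest 1-based global column id seen before the table lookup, would look roughly like this (placed inside MatSetUpMultiply_MPIAIJ purely for debugging):

  PetscMPIInt rank;
  PetscInt    maxgid1 = 0;
  ierr = MPI_Comm_rank(PetscObjectComm((PetscObject)mat), &rank);CHKERRQ(ierr);
  for (i = 0; i < aij->B->rmap->n; i++) {
    for (j = 0; j < B->ilen[i]; j++) {
      PetscInt gid1 = aj[B->i[i] + j] + 1;   /* 1-based global column id */
      if (gid1 > maxgid1) maxgid1 = gid1;
    }
  }
  ierr = PetscPrintf(PETSC_COMM_SELF, "[%d] largest gid1 %D, mat->cmap->N %D\n", rank, maxgid1, mat->cmap->N);CHKERRQ(ierr);

If the printed maximum exceeds mat->cmap->N, the column indices themselves are corrupt; if not, the table size is the more likely suspect.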
> > Anyway just some thoughts, > Mark > > > > On Thu, Sep 24, 2020 at 4:18 PM Barry Smith wrote: > >> >> >> On Sep 24, 2020, at 2:47 PM, Matthew Knepley wrote: >> >> On Thu, Sep 24, 2020 at 3:42 PM Barry Smith wrote: >> >>> >>> The stack is listed below. It crashes inside MatPtAP(). >>> >> >> What about just checking that the column indices that PtAP receives are >> valid? Are we not doing that? >> >> >> The code that checks for column too large in MatSetValues_MPIXXAIJ() is >> turned off for optimized builds, I am making a MR to always have it on. But >> I doubt this is the problem, other, more harsh crashes, are likely if the >> column index is too large. >> >> This is difficult to debug because all we get is a stack trace. It >> would be good if we produce some information about the current state of the >> objects when the error is detected. We should think about what light-weight >> stuff we could report when errors are detected. >> >> >> Barry >> >> >> Matt >> >> >>> It is possible there is some subtle bug in the rather complex PETSc >>> code for MatPtAP() but I am included to blame MPI again. >>> >>> I think we should add some simple low-overhead always on communication >>> error detecting code to PetscSF where some check sums are also communicated >>> at the highest level of PetscSF(). >>> >>> I don't know how but perhaps when the data is packed per destination >>> rank a checksum is computed and when unpacked the checksum is compared >>> using extra space at the end of the communicated packed array to store and >>> send the checksum. Yes, it is kind of odd for a high level library like >>> PETSc to not trust the communication channel but MPI implementations have >>> proven themselves to not be trustworthy and adding this to PetscSF is not >>> intrusive to the PETSc API or user. Plus it gives a definitive yes or no as >>> to the problem being from an error in the communication. >>> >>> Barry >>> >>> On Sep 24, 2020, at 12:35 PM, Matthew Knepley wrote: >>> >>> On Thu, Sep 24, 2020 at 1:22 PM Chris Hewson wrote: >>> >>>> Hi Guys, >>>> >>>> Thanks for all of the prompt responses, very helpful and appreciated. >>>> >>>> By "when debugging", did you mean when configure >>>> petsc --with-debugging=1 COPTFLAGS=-O0 -g etc or when you attached a >>>> debugger? >>>> - Both, I have run with a debugger attached and detached, all compiled >>>> with the following flags "--with-debugging=1 COPTFLAGS=-O0 -g" >>>> >>>> 1) Try OpenMPI (probably won't help, but worth trying) >>>> - Worth a try for sure >>>> >>>> 2) Find which part of the simulation makes it non-deterministic. Is it >>>> the mesh partitioning (parmetis)? Then try to make it deterministic. >>>> - Good tip, it is the mesh partitioning and along the lines of a >>>> question from Barry, the matrix size is changing. I will make this >>>> deterministic and give it a try >>>> >>>> 3) Dump matrices, vectors, etc and see when it fails, you can quickly >>>> reproduce the error by reading in the intermediate data. >>>> - Also a great suggestion, will give it a try >>>> >>>> The full stack would be really useful here. I am guessing this happens >>>> on MatMult(), but I do not know. >>>> - Agreed, I am currently running it so that the full stack will be >>>> produced, but waiting for it to fail, had compiled with all available >>>> optimizations on, but downside is of course if there is a failure. >>>> As a general question, roughly what's the performance impact on petsc >>>> with -o1/-o2/-o0 as opposed to -o3? 
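Suggestion 3 above, dumping the matrices and vectors so a failing step can be replayed in isolation, can be done with PETSc's binary viewer. A rough sketch, with a made-up file name and assuming A, b and ierr are in scope:

    /* at the suspect step: write the operator and right-hand side */
    PetscViewer viewer;
    ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "failing_step.bin", FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
    ierr = MatView(A, viewer);CHKERRQ(ierr);
    ierr = VecView(b, viewer);CHKERRQ(ierr);
    ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

    /* in a small standalone driver: read them back and rerun just that solve */
    ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "failing_step.bin", FILE_MODE_READ, &viewer);CHKERRQ(ierr);
    ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
    ierr = MatLoad(A, viewer);CHKERRQ(ierr);
    ierr = VecCreate(PETSC_COMM_WORLD, &b);CHKERRQ(ierr);
    ierr = VecLoad(b, viewer);CHKERRQ(ierr);
    ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

Since the failure here takes 10+ hours to show up, writing the dump only when an error code is first detected keeps the overhead negligible.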
Performance impact of --with-debugging >>>> = 1? >>>> Obviously problem/machine dependant, wondering on guidance more for >>>> this than anything >>>> >>>> Is the nonzero structure of your matrices changing or is it fixed for >>>> the entire simulation? >>>> The non-zero structure is changing, although the structures are >>>> reformed when this happens and this happens thousands of time before the >>>> failure has occured. >>>> >>> >>> Okay, this is the most likely spot for a bug. How are you changing the >>> matrix? It should be impossible to put in an invalid column index when >>> using MatSetValues() >>> because we check all the inputs. However, I do not think we check when >>> you just yank out the arrays. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Does this particular run always crash at the same place? Similar place? >>>> Doesn't always crash? >>>> Doesn't always crash, but other similar runs have crashed in different >>>> spots, which makes it difficult to resolve. I am going to try out a few of >>>> the strategies suggested above and will let you know what comes of that. >>>> >>>> *Chris Hewson* >>>> Senior Reservoir Simulation Engineer >>>> ResFrac >>>> +1.587.575.9792 >>>> >>>> >>>> On Thu, Sep 24, 2020 at 11:05 AM Barry Smith wrote: >>>> >>>>> Chris, >>>>> >>>>> We realize how frustrating this type of problem is to deal with. >>>>> >>>>> Here is the code: >>>>> >>>>> ierr = >>>>> PetscTableCreate(aij->B->rmap->n,mat->cmap->N+1,&gid1_lid1);CHKERRQ(ierr); >>>>> for (i=0; iB->rmap->n; i++) { >>>>> for (j=0; jilen[i]; j++) { >>>>> PetscInt data,gid1 = aj[B->i[i] + j] + 1; >>>>> ierr = PetscTableFind(gid1_lid1,gid1,&data);CHKERRQ(ierr); >>>>> if (!data) { >>>>> /* one based table */ >>>>> ierr = >>>>> PetscTableAdd(gid1_lid1,gid1,++ec,INSERT_VALUES);CHKERRQ(ierr); >>>>> } >>>>> } >>>>> } >>>>> >>>>> It is simply looping over the rows of the sparse matrix putting the >>>>> columns it finds into the hash table. >>>>> >>>>> aj[B->i[i] + j] are the column entries, the number of columns in >>>>> the matrix is mat->cmap->N so the column entries should always be >>>>> less than the number of columns. The code is crashing when column >>>>> entry 24443 which is larger than the number of columns 23988. >>>>> >>>>> So either the aj[B->i[i] + j] + 1 are incorrect or the mat->cmap->N is >>>>> incorrect. >>>>> >>>>> 640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in >>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>> >>>>>>>>>> >>>>> if (!mat->was_assembled && mode == MAT_FINAL_ASSEMBLY) { >>>>> ierr = MatSetUpMultiply_MPIAIJ(mat);CHKERRQ(ierr); >>>>> } >>>>> >>>>> Seems to indicate it is setting up a new multiple because it is either >>>>> the first time into the algorithm or the nonzero structure changed on some >>>>> rank requiring a new assembly process. >>>>> >>>>> Is the nonzero structure of your matrices changing or is it fixed >>>>> for the entire simulation? >>>>> >>>>> Since the code has been running for a very long time already I have to >>>>> conclude that this is not the first time through and so something has >>>>> changed in the matrix? >>>>> >>>>> I think we have to put more diagnostics into the library to provide >>>>> more information before or at the time of the error detection. >>>>> >>>>> Does this particular run always crash at the same place? Similar >>>>> place? Doesn't always crash? 
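For reference, the loop Barry quotes above also lost its "<" comparisons in the plain-text conversion. Restoring them, and assuming B is the Mat_SeqAIJ data of the off-diagonal block aij->B as in MatSetUpMultiply_MPIAIJ(), it reads:

    ierr = PetscTableCreate(aij->B->rmap->n, mat->cmap->N+1, &gid1_lid1);CHKERRQ(ierr);
    for (i=0; i<aij->B->rmap->n; i++) {
      for (j=0; j<B->ilen[i]; j++) {
        PetscInt data, gid1 = aj[B->i[i] + j] + 1;   /* one-based global column id */
        ierr = PetscTableFind(gid1_lid1, gid1, &data);CHKERRQ(ierr);
        if (!data) {
          /* one based table */
          ierr = PetscTableAdd(gid1_lid1, gid1, ++ec, INSERT_VALUES);CHKERRQ(ierr);
        }
      }
    }

which is why the suspects are either the aj[B->i[i] + j] entries themselves or mat->cmap->N.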
>>>>> >>>>> Barry >>>>> >>>>> >>>>> >>>>> >>>>> On Sep 24, 2020, at 8:46 AM, Chris Hewson wrote: >>>>> >>>>> After about a month of not having this issue pop up, it has come up >>>>> again >>>>> >>>>> We have been struggling with a similar PETSc Error for awhile now, the >>>>> error message is as follows: >>>>> >>>>> [7]PETSC ERROR: PetscTableFind() line 132 in >>>>> /home/chewson/petsc-3.13.3/include/petscctable.h key 24443 is greater than >>>>> largest key allowed 23988 >>>>> >>>>> It is a particularly nasty bug as it doesn't reproduce itself when >>>>> debugging and doesn't happen all the time with the same inputs either. The >>>>> problem occurs after a long runtime of the code (12-40 hours) and we are >>>>> using a ksp solve with KSPBCGS. >>>>> >>>>> The PETSc compilation options that are used are: >>>>> >>>>> --download-metis >>>>> --download-mpich >>>>> --download-mumps >>>>> --download-parmetis >>>>> --download-ptscotch >>>>> --download-scalapack >>>>> --download-suitesparse >>>>> --prefix=/opt/anl/petsc-3.13.3 >>>>> --with-debugging=0 >>>>> --with-mpi=1 >>>>> COPTFLAGS=-O3 -march=haswell -mtune=haswell >>>>> CXXOPTFLAGS=-O3 -march=haswell -mtune=haswell >>>>> FOPTFLAGS=-O3 -march=haswell -mtune=haswell >>>>> >>>>> This is being run across 8 processes and is failing consistently on >>>>> the rank 7 process. We also use openmp outside of PETSC and the linear >>>>> solve portion of the code. The rank 0 process is always using compute, >>>>> during this the slave processes use an MPI_Wait call to wait for the >>>>> collective parts of the code to be called again. All PETSC calls are done >>>>> across all of the processes. >>>>> >>>>> We are using mpich version 3.3.2, downloaded with the petsc 3.13.3 >>>>> package. >>>>> >>>>> At every PETSC call we are checking the error return from the function >>>>> collectively to ensure that no errors have been returned from PETSC. >>>>> >>>>> Some possible causes that I can think of are as follows: >>>>> 1. Memory leak causing a corruption either in our program or in petsc >>>>> or with one of the petsc objects. This seems unlikely as we have checked >>>>> runs with the option -malloc_dump for PETSc and using valgrind. >>>>> >>>>> 2. Optimization flags set for petsc compilation are causing variables >>>>> that go out of scope to be optimized out. >>>>> >>>>> 3. We are giving the wrong number of elements for a process or the >>>>> value is changing during the simulation. This seems unlikely as there is >>>>> nothing overly unique about these simulations and it's not reproducing >>>>> itself. >>>>> >>>>> 4. An MPI channel or socket error causing an error in the collective >>>>> values for PETSc. >>>>> >>>>> Any input on this issue would be greatly appreciated. >>>>> >>>>> *Chris Hewson* >>>>> Senior Reservoir Simulation Engineer >>>>> ResFrac >>>>> +1.587.575.9792 >>>>> >>>>> >>>>> On Thu, Aug 13, 2020 at 4:21 PM Junchao Zhang >>>>> wrote: >>>>> >>>>>> That is a great idea. I'll figure it out. >>>>>> --Junchao Zhang >>>>>> >>>>>> >>>>>> On Thu, Aug 13, 2020 at 4:31 PM Barry Smith wrote: >>>>>> >>>>>>> >>>>>>> Junchao, >>>>>>> >>>>>>> Any way in the PETSc configure to warn that MPICH version is >>>>>>> "bad" or "untrustworthy" or even the vague "we have suspicians about this >>>>>>> version and recommend avoiding it"? A lot of time could be saved if others >>>>>>> don't deal with the same problem. 
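The report above says every PETSc return code is checked collectively across the ranks. One hedged sketch of how that can be done, so that every rank learns about a failure before anyone enters the next collective call (the variable names and the choice of KSPSolve are illustrative only, not the actual code from this simulator):

    PetscErrorCode ierr_local = KSPSolve(ksp, b, x);   /* any PETSc call */
    int            bad = (ierr_local != 0), anybad = 0;
    MPI_Allreduce(&bad, &anybad, 1, MPI_INT, MPI_LOR, PETSC_COMM_WORLD);
    if (anybad) {
      /* all ranks see the failure together: dump inputs here, then stop cleanly */
      SETERRQ(PETSC_COMM_WORLD, PETSC_ERR_LIB, "a PETSc call failed on at least one rank");
    }

This is also a natural place to hook in the matrix/vector dump mentioned earlier, since both the rank that trips the table error and the ranks that do not would reach this point.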
>>>>>>> >>>>>>> Maybe add arrays of suspect_versions for OpenMPI, MPICH, etc and >>>>>>> always check against that list and print a boxed warning at configure time? >>>>>>> Better you could somehow generalize it and put it in package.py for use by >>>>>>> all packages, then any package can included lists of "suspect" versions. >>>>>>> (There are definitely HDF5 versions that should be avoided :-)). >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> >>>>>>> On Aug 13, 2020, at 12:14 PM, Junchao Zhang >>>>>>> wrote: >>>>>>> >>>>>>> Thanks for the update. Let's assume it is a bug in MPI :) >>>>>>> --Junchao Zhang >>>>>>> >>>>>>> >>>>>>> On Thu, Aug 13, 2020 at 11:15 AM Chris Hewson >>>>>>> wrote: >>>>>>> >>>>>>>> Just as an update to this, I can confirm that using the mpich >>>>>>>> version (3.3.2) downloaded with the petsc download solved this issue on my >>>>>>>> end. >>>>>>>> >>>>>>>> *Chris Hewson* >>>>>>>> Senior Reservoir Simulation Engineer >>>>>>>> ResFrac >>>>>>>> +1.587.575.9792 >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Jul 23, 2020 at 5:58 PM Junchao Zhang < >>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>> >>>>>>>>> On Mon, Jul 20, 2020 at 7:05 AM Barry Smith >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Is there a comprehensive MPI test suite (perhaps from >>>>>>>>>> MPICH)? Is there any way to run this full test suite under the problematic >>>>>>>>>> MPI and see if it detects any problems? >>>>>>>>>> >>>>>>>>>> Is so, could someone add it to the FAQ in the debugging >>>>>>>>>> section? >>>>>>>>>> >>>>>>>>> MPICH does have a test suite. It is at the subdir test/mpi of >>>>>>>>> downloaded mpich >>>>>>>>> . >>>>>>>>> It annoyed me since it is not user-friendly. It might be helpful in >>>>>>>>> catching bugs at very small scale. But say if I want to test allreduce on >>>>>>>>> 1024 ranks on 100 doubles, I have to hack the test suite. >>>>>>>>> Anyway, the instructions are here. >>>>>>>>> >>>>>>>>> For the purpose of petsc, under test/mpi one can configure it with >>>>>>>>> $./configure CC=mpicc CXX=mpicxx FC=mpifort --enable-strictmpi >>>>>>>>> --enable-threads=funneled --enable-fortran=f77,f90 --enable-fast >>>>>>>>> --disable-spawn --disable-cxx --disable-ft-tests // It is weird I disabled >>>>>>>>> cxx but I had to set CXX! >>>>>>>>> $make -k -j8 // -k is to keep going and ignore compilation >>>>>>>>> errors, e.g., when building tests for MPICH extensions not in MPI standard, >>>>>>>>> but your MPI is OpenMPI. >>>>>>>>> $ // edit testlist, remove lines mpi_t, rma, f77, impls. Those are >>>>>>>>> sub-dirs containing tests for MPI routines Petsc does not rely on. >>>>>>>>> $ make testings or directly './runtests -tests=testlist' >>>>>>>>> >>>>>>>>> On a batch system, >>>>>>>>> $export MPITEST_BATCHDIR=`pwd`/btest // specify a batch dir, >>>>>>>>> say btest, >>>>>>>>> $./runtests -batch -mpiexec=mpirun -np=1024 -tests=testlist // >>>>>>>>> Use 1024 ranks if a test does no specify the number of processes. >>>>>>>>> $ // It copies test binaries to the batch dir and generates a >>>>>>>>> script runtests.batch there. Edit the script to fit your batch system and >>>>>>>>> then submit a job and wait for its finish. >>>>>>>>> $ cd btest && ../checktests --ignorebogus >>>>>>>>> >>>>>>>>> >>>>>>>>> PS: Fande, changing an MPI fixed your problem does not >>>>>>>>> necessarily mean the old MPI has bugs. It is complicated. It could be a >>>>>>>>> petsc bug. You need to provide us a code to reproduce your error. It does >>>>>>>>> not matter if the code is big. 
>>>>>>>>> >>>>>>>>> >>>>>>>>>> Thanks >>>>>>>>>> >>>>>>>>>> Barry >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Jul 20, 2020, at 12:16 AM, Fande Kong >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Trace could look like this: >>>>>>>>>> >>>>>>>>>> [640]PETSC ERROR: --------------------- Error Message >>>>>>>>>> -------------------------------------------------------------- >>>>>>>>>> [640]PETSC ERROR: Argument out of range >>>>>>>>>> [640]PETSC ERROR: key 45226154 is greater than largest key >>>>>>>>>> allowed 740521 >>>>>>>>>> [640]PETSC ERROR: See >>>>>>>>>> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>>>>>>>> shooting. >>>>>>>>>> [640]PETSC ERROR: Petsc Release Version 3.13.3, unknown >>>>>>>>>> [640]PETSC ERROR: ../../griffin-opt on a arch-moose named r6i5n18 >>>>>>>>>> by wangy2 Sun Jul 19 17:14:28 2020 >>>>>>>>>> [640]PETSC ERROR: Configure options --download-hypre=1 >>>>>>>>>> --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1 >>>>>>>>>> --download-metis=1 --download-ptscotch=1 --download-parmetis=1 >>>>>>>>>> --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 >>>>>>>>>> --download-slepc=1 --with-mpi=1 --with-cxx-dialect=C++11 >>>>>>>>>> --with-fortran-bindings=0 --with-sowing=0 --with-64-bit-indices >>>>>>>>>> --download-mumps=0 >>>>>>>>>> [640]PETSC ERROR: #1 PetscTableFind() line 132 in >>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/include/petscctable.h >>>>>>>>>> [640]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() line 33 in >>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mmaij.c >>>>>>>>>> [640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in >>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>> [640]PETSC ERROR: #4 MatAssemblyEnd() line 5347 in >>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>>>>>> [640]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIXAIJ_allatonce() >>>>>>>>>> line 901 in >>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiptap.c >>>>>>>>>> [640]PETSC ERROR: #6 MatPtAPNumeric_MPIAIJ_MPIMAIJ_allatonce() >>>>>>>>>> line 3180 in >>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/maij/maij.c >>>>>>>>>> [640]PETSC ERROR: #7 MatProductNumeric_PtAP() line 704 in >>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>>>>>>>> [640]PETSC ERROR: #8 MatProductNumeric() line 759 in >>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>>>>>>>> [640]PETSC ERROR: #9 MatPtAP() line 9199 in >>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>>>>>> [640]PETSC ERROR: #10 MatGalerkin() line 10236 in >>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>>>>>> [640]PETSC ERROR: #11 PCSetUp_MG() line 745 in >>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/mg/mg.c >>>>>>>>>> [640]PETSC ERROR: #12 PCSetUp_HMG() line 220 in >>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/hmg/hmg.c >>>>>>>>>> [640]PETSC ERROR: #13 PCSetUp() line 898 in >>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/interface/precon.c >>>>>>>>>> [640]PETSC ERROR: #14 KSPSetUp() line 376 in >>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>>>> [640]PETSC ERROR: #15 KSPSolve_Private() line 633 in >>>>>>>>>> 
/home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>>>> [640]PETSC ERROR: #16 KSPSolve() line 853 in >>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>>>> [640]PETSC ERROR: #17 SNESSolve_NEWTONLS() line 225 in >>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/impls/ls/ls.c >>>>>>>>>> [640]PETSC ERROR: #18 SNESSolve() line 4519 in >>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/interface/snes.c >>>>>>>>>> >>>>>>>>>> On Sun, Jul 19, 2020 at 11:13 PM Fande Kong >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> I am not entirely sure what is happening, but we encountered >>>>>>>>>>> similar issues recently. It was not reproducible. It might occur at >>>>>>>>>>> different stages, and errors could be weird other than "ctable stuff." Our >>>>>>>>>>> code was Valgrind clean since every PR in moose needs to go through >>>>>>>>>>> rigorous Valgrind checks before it reaches the devel branch. The errors >>>>>>>>>>> happened when we used mvapich. >>>>>>>>>>> >>>>>>>>>>> We changed to use HPE-MPT (a vendor stalled MPI), then >>>>>>>>>>> everything was smooth. May you try a different MPI? It is better to try a >>>>>>>>>>> system carried one. >>>>>>>>>>> >>>>>>>>>>> We did not get the bottom of this problem yet, but we at least >>>>>>>>>>> know this is kind of MPI-related. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> Fande, >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Sun, Jul 19, 2020 at 3:28 PM Chris Hewson >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> I am having a bug that is occurring in PETSC with the return >>>>>>>>>>>> string: >>>>>>>>>>>> >>>>>>>>>>>> [7]PETSC ERROR: PetscTableFind() line 132 in >>>>>>>>>>>> /home/chewson/petsc-3.13.2/include/petscctable.h key 7556 is greater than >>>>>>>>>>>> largest key allowed 5693 >>>>>>>>>>>> >>>>>>>>>>>> This is using petsc-3.13.2, compiled and running using mpich >>>>>>>>>>>> with -O3 and debugging turned off tuned to the haswell architecture and >>>>>>>>>>>> occurring either before or during a KSPBCGS solve/setup or during a MUMPS >>>>>>>>>>>> factorization solve (I haven't been able to replicate this issue with the >>>>>>>>>>>> same set of instructions etc.). >>>>>>>>>>>> >>>>>>>>>>>> This is a terrible way to ask a question, I know, and not very >>>>>>>>>>>> helpful from your side, but this is what I have from a user's run and can't >>>>>>>>>>>> reproduce on my end (either with the optimization compilation or with >>>>>>>>>>>> debugging turned on). This happens when the code has run for quite some >>>>>>>>>>>> time and is happening somewhat rarely. >>>>>>>>>>>> >>>>>>>>>>>> More than likely I am using a static variable (code is written >>>>>>>>>>>> in c++) that I'm not updating when the matrix size is changing or something >>>>>>>>>>>> silly like that, but any help or guidance on this would be appreciated. >>>>>>>>>>>> >>>>>>>>>>>> *Chris Hewson* >>>>>>>>>>>> Senior Reservoir Simulation Engineer >>>>>>>>>>>> ResFrac >>>>>>>>>>>> +1.587.575.9792 >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>> >>>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. 
>>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Sep 25 12:37:58 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 25 Sep 2020 13:37:58 -0400 Subject: [petsc-users] Tough to reproduce petsctablefind error In-Reply-To: References: <0AC37384-BC37-4A6C-857D-41CD507F84C2@petsc.dev> <8952CCCF-14F2-4102-91B4-921A54689813@petsc.dev> <1A56D208-6BF6-49BF-B653-FD9D15BB3BDD@petsc.dev> <4E113614-C8C0-44F5-AD95-DC37A9D4B5F5@petsc.dev> Message-ID: On Fri, Sep 25, 2020 at 1:29 PM Mark McClure wrote: > Hello, > > I work with Chris, and have a few comments, hopefully helpful. Thank you > all, for your help. > > Our program is unfortunately behaving a little bit nondeterministically. I > am not sure why because for the OpenMP parts, I test it for race conditions > using Intel Inspector and see none. We are using Metis, not Parmetis. Do > Petsc or Metis have any behavior that is nondeterministic? We will continue > to investigate the cause of the nondeterminism. But that is a challenge for > us reproducing this problem because we can run the same simulation 10 > times, and we do not get precisely the same result each time. In fact, for > this bug, Chris did run it 10 times and did not reproduce. > > Every month, 1000s of hours of this simulator are being run on the > Azure cluster. This crash has been occurring for months, but at infrequent > intervals, and have never been able to reproduce it. As such, we can't > generate an error dump, unless we gave all users a Petsc build with no > optimization and waited weeks until a problem cropped up. > > Here's what we'll do - we'll internally make a build with debug and then > run a large number of simulations until the problem reproduces and we get > the error dump. That will take us some time, but we'll do it. > > Here is a bit more background that might be helpful. At first, we used > OpenMPI 2.1.1-8 with Petsc 3.13.2. With that combo, we observed a memory > leak, and simulations failed when the node ran out of RAM. Then we switched > to MPICH3.3a2 and Petsc 3.13.3. The memory leak went away. That's when we > started seeing this bug "greater than largest key allowed". It was > unreproducible, but happening relatively more often than it is now (I > think) - I was getting a couple user reports a week. Then, we updated to > MPICH 3.3.2 and the same Petsc version (3.13.3). The problem seems to be > less common - hadn't happened for the past month. But then it happened four > times this week. > > Other background to note - our linear system very frequently changes > size/shape. There will be 10,000s of Petsc solves with different matrices > (different positions of nonzero values) over the course of a simulation. As > far as we can tell, the crash only occurs only after the simulator has run > for a long time, 10+ hours. Having said that, it does not appear to be a > memory leak - at least, the node has plenty of remaining RAM when these > crashes are occurring. > If it helps, I believe you can configure an optimized build to add -g, COPTFLAGS="-g" CPPOPTFLAGS="-g" so that symbols are kept in the optimized libraries. 
That way we should be able to get a stack on a run that performs well. Thanks, Matt > Mark > > > On Thu, Sep 24, 2020 at 4:41 PM Mark Adams wrote: > >> You might add code here like: >> >> if (ierr) { >> for (; iB->rmap->n; i++) { >> for ( jilen[i]; j++) { >> PetscInt data,gid1 = aj[B->i[i] + j] + 1; // don't use ierr >> print rank, gid1; >> } >> CHKERRQ(ierr); >> >> I am guessing that somehow you have a table that is bad and too small. It >> failed on an index not much bigger than the largest key allowed. Maybe just >> compute the max and see if it goes much larger than the largest key allowed. >> >> If your mesh just changed to you know if it got bigger or smaller... >> >> Anyway just some thoughts, >> Mark >> >> >> >> On Thu, Sep 24, 2020 at 4:18 PM Barry Smith wrote: >> >>> >>> >>> On Sep 24, 2020, at 2:47 PM, Matthew Knepley wrote: >>> >>> On Thu, Sep 24, 2020 at 3:42 PM Barry Smith wrote: >>> >>>> >>>> The stack is listed below. It crashes inside MatPtAP(). >>>> >>> >>> What about just checking that the column indices that PtAP receives are >>> valid? Are we not doing that? >>> >>> >>> The code that checks for column too large in MatSetValues_MPIXXAIJ() >>> is turned off for optimized builds, I am making a MR to always have it on. >>> But I doubt this is the problem, other, more harsh crashes, are likely if >>> the column index is too large. >>> >>> This is difficult to debug because all we get is a stack trace. It >>> would be good if we produce some information about the current state of the >>> objects when the error is detected. We should think about what light-weight >>> stuff we could report when errors are detected. >>> >>> >>> Barry >>> >>> >>> Matt >>> >>> >>>> It is possible there is some subtle bug in the rather complex PETSc >>>> code for MatPtAP() but I am included to blame MPI again. >>>> >>>> I think we should add some simple low-overhead always on >>>> communication error detecting code to PetscSF where some check sums are >>>> also communicated at the highest level of PetscSF(). >>>> >>>> I don't know how but perhaps when the data is packed per destination >>>> rank a checksum is computed and when unpacked the checksum is compared >>>> using extra space at the end of the communicated packed array to store and >>>> send the checksum. Yes, it is kind of odd for a high level library like >>>> PETSc to not trust the communication channel but MPI implementations have >>>> proven themselves to not be trustworthy and adding this to PetscSF is not >>>> intrusive to the PETSc API or user. Plus it gives a definitive yes or no as >>>> to the problem being from an error in the communication. >>>> >>>> Barry >>>> >>>> On Sep 24, 2020, at 12:35 PM, Matthew Knepley >>>> wrote: >>>> >>>> On Thu, Sep 24, 2020 at 1:22 PM Chris Hewson wrote: >>>> >>>>> Hi Guys, >>>>> >>>>> Thanks for all of the prompt responses, very helpful and appreciated. >>>>> >>>>> By "when debugging", did you mean when configure >>>>> petsc --with-debugging=1 COPTFLAGS=-O0 -g etc or when you attached a >>>>> debugger? >>>>> - Both, I have run with a debugger attached and detached, all compiled >>>>> with the following flags "--with-debugging=1 COPTFLAGS=-O0 -g" >>>>> >>>>> 1) Try OpenMPI (probably won't help, but worth trying) >>>>> - Worth a try for sure >>>>> >>>>> 2) Find which part of the simulation makes it non-deterministic. Is it >>>>> the mesh partitioning (parmetis)? Then try to make it deterministic. 
>>>>> - Good tip, it is the mesh partitioning and along the lines of a >>>>> question from Barry, the matrix size is changing. I will make this >>>>> deterministic and give it a try >>>>> >>>>> 3) Dump matrices, vectors, etc and see when it fails, you can quickly >>>>> reproduce the error by reading in the intermediate data. >>>>> - Also a great suggestion, will give it a try >>>>> >>>>> The full stack would be really useful here. I am guessing this happens >>>>> on MatMult(), but I do not know. >>>>> - Agreed, I am currently running it so that the full stack will be >>>>> produced, but waiting for it to fail, had compiled with all available >>>>> optimizations on, but downside is of course if there is a failure. >>>>> As a general question, roughly what's the performance impact on petsc >>>>> with -o1/-o2/-o0 as opposed to -o3? Performance impact of --with-debugging >>>>> = 1? >>>>> Obviously problem/machine dependant, wondering on guidance more for >>>>> this than anything >>>>> >>>>> Is the nonzero structure of your matrices changing or is it fixed for >>>>> the entire simulation? >>>>> The non-zero structure is changing, although the structures are >>>>> reformed when this happens and this happens thousands of time before the >>>>> failure has occured. >>>>> >>>> >>>> Okay, this is the most likely spot for a bug. How are you changing the >>>> matrix? It should be impossible to put in an invalid column index when >>>> using MatSetValues() >>>> because we check all the inputs. However, I do not think we check when >>>> you just yank out the arrays. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Does this particular run always crash at the same place? Similar >>>>> place? Doesn't always crash? >>>>> Doesn't always crash, but other similar runs have crashed in different >>>>> spots, which makes it difficult to resolve. I am going to try out a few of >>>>> the strategies suggested above and will let you know what comes of that. >>>>> >>>>> *Chris Hewson* >>>>> Senior Reservoir Simulation Engineer >>>>> ResFrac >>>>> +1.587.575.9792 >>>>> >>>>> >>>>> On Thu, Sep 24, 2020 at 11:05 AM Barry Smith wrote: >>>>> >>>>>> Chris, >>>>>> >>>>>> We realize how frustrating this type of problem is to deal with. >>>>>> >>>>>> Here is the code: >>>>>> >>>>>> ierr = >>>>>> PetscTableCreate(aij->B->rmap->n,mat->cmap->N+1,&gid1_lid1);CHKERRQ(ierr); >>>>>> for (i=0; iB->rmap->n; i++) { >>>>>> for (j=0; jilen[i]; j++) { >>>>>> PetscInt data,gid1 = aj[B->i[i] + j] + 1; >>>>>> ierr = PetscTableFind(gid1_lid1,gid1,&data);CHKERRQ(ierr); >>>>>> if (!data) { >>>>>> /* one based table */ >>>>>> ierr = >>>>>> PetscTableAdd(gid1_lid1,gid1,++ec,INSERT_VALUES);CHKERRQ(ierr); >>>>>> } >>>>>> } >>>>>> } >>>>>> >>>>>> It is simply looping over the rows of the sparse matrix putting >>>>>> the columns it finds into the hash table. >>>>>> >>>>>> aj[B->i[i] + j] are the column entries, the number of columns in >>>>>> the matrix is mat->cmap->N so the column entries should always be >>>>>> less than the number of columns. The code is crashing when column >>>>>> entry 24443 which is larger than the number of columns 23988. >>>>>> >>>>>> So either the aj[B->i[i] + j] + 1 are incorrect or the mat->cmap->N >>>>>> is incorrect. 
>>>>>> >>>>>> 640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>> >>>>>>>>>>> >>>>>> if (!mat->was_assembled && mode == MAT_FINAL_ASSEMBLY) { >>>>>> ierr = MatSetUpMultiply_MPIAIJ(mat);CHKERRQ(ierr); >>>>>> } >>>>>> >>>>>> Seems to indicate it is setting up a new multiple because it is >>>>>> either the first time into the algorithm or the nonzero structure changed >>>>>> on some rank requiring a new assembly process. >>>>>> >>>>>> Is the nonzero structure of your matrices changing or is it fixed >>>>>> for the entire simulation? >>>>>> >>>>>> Since the code has been running for a very long time already I have >>>>>> to conclude that this is not the first time through and so something has >>>>>> changed in the matrix? >>>>>> >>>>>> I think we have to put more diagnostics into the library to provide >>>>>> more information before or at the time of the error detection. >>>>>> >>>>>> Does this particular run always crash at the same place? Similar >>>>>> place? Doesn't always crash? >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Sep 24, 2020, at 8:46 AM, Chris Hewson wrote: >>>>>> >>>>>> After about a month of not having this issue pop up, it has come up >>>>>> again >>>>>> >>>>>> We have been struggling with a similar PETSc Error for awhile now, >>>>>> the error message is as follows: >>>>>> >>>>>> [7]PETSC ERROR: PetscTableFind() line 132 in >>>>>> /home/chewson/petsc-3.13.3/include/petscctable.h key 24443 is greater than >>>>>> largest key allowed 23988 >>>>>> >>>>>> It is a particularly nasty bug as it doesn't reproduce itself when >>>>>> debugging and doesn't happen all the time with the same inputs either. The >>>>>> problem occurs after a long runtime of the code (12-40 hours) and we are >>>>>> using a ksp solve with KSPBCGS. >>>>>> >>>>>> The PETSc compilation options that are used are: >>>>>> >>>>>> --download-metis >>>>>> --download-mpich >>>>>> --download-mumps >>>>>> --download-parmetis >>>>>> --download-ptscotch >>>>>> --download-scalapack >>>>>> --download-suitesparse >>>>>> --prefix=/opt/anl/petsc-3.13.3 >>>>>> --with-debugging=0 >>>>>> --with-mpi=1 >>>>>> COPTFLAGS=-O3 -march=haswell -mtune=haswell >>>>>> CXXOPTFLAGS=-O3 -march=haswell -mtune=haswell >>>>>> FOPTFLAGS=-O3 -march=haswell -mtune=haswell >>>>>> >>>>>> This is being run across 8 processes and is failing consistently on >>>>>> the rank 7 process. We also use openmp outside of PETSC and the linear >>>>>> solve portion of the code. The rank 0 process is always using compute, >>>>>> during this the slave processes use an MPI_Wait call to wait for the >>>>>> collective parts of the code to be called again. All PETSC calls are done >>>>>> across all of the processes. >>>>>> >>>>>> We are using mpich version 3.3.2, downloaded with the petsc 3.13.3 >>>>>> package. >>>>>> >>>>>> At every PETSC call we are checking the error return from the >>>>>> function collectively to ensure that no errors have been returned from >>>>>> PETSC. >>>>>> >>>>>> Some possible causes that I can think of are as follows: >>>>>> 1. Memory leak causing a corruption either in our program or in petsc >>>>>> or with one of the petsc objects. This seems unlikely as we have checked >>>>>> runs with the option -malloc_dump for PETSc and using valgrind. >>>>>> >>>>>> 2. Optimization flags set for petsc compilation are causing variables >>>>>> that go out of scope to be optimized out. >>>>>> >>>>>> 3. 
We are giving the wrong number of elements for a process or the >>>>>> value is changing during the simulation. This seems unlikely as there is >>>>>> nothing overly unique about these simulations and it's not reproducing >>>>>> itself. >>>>>> >>>>>> 4. An MPI channel or socket error causing an error in the collective >>>>>> values for PETSc. >>>>>> >>>>>> Any input on this issue would be greatly appreciated. >>>>>> >>>>>> *Chris Hewson* >>>>>> Senior Reservoir Simulation Engineer >>>>>> ResFrac >>>>>> +1.587.575.9792 >>>>>> >>>>>> >>>>>> On Thu, Aug 13, 2020 at 4:21 PM Junchao Zhang < >>>>>> junchao.zhang at gmail.com> wrote: >>>>>> >>>>>>> That is a great idea. I'll figure it out. >>>>>>> --Junchao Zhang >>>>>>> >>>>>>> >>>>>>> On Thu, Aug 13, 2020 at 4:31 PM Barry Smith >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> Junchao, >>>>>>>> >>>>>>>> Any way in the PETSc configure to warn that MPICH version is >>>>>>>> "bad" or "untrustworthy" or even the vague "we have suspicians about this >>>>>>>> version and recommend avoiding it"? A lot of time could be saved if others >>>>>>>> don't deal with the same problem. >>>>>>>> >>>>>>>> Maybe add arrays of suspect_versions for OpenMPI, MPICH, etc >>>>>>>> and always check against that list and print a boxed warning at configure >>>>>>>> time? Better you could somehow generalize it and put it in package.py for >>>>>>>> use by all packages, then any package can included lists of "suspect" >>>>>>>> versions. (There are definitely HDF5 versions that should be avoided :-)). >>>>>>>> >>>>>>>> Barry >>>>>>>> >>>>>>>> >>>>>>>> On Aug 13, 2020, at 12:14 PM, Junchao Zhang < >>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>> >>>>>>>> Thanks for the update. Let's assume it is a bug in MPI :) >>>>>>>> --Junchao Zhang >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Aug 13, 2020 at 11:15 AM Chris Hewson >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Just as an update to this, I can confirm that using the mpich >>>>>>>>> version (3.3.2) downloaded with the petsc download solved this issue on my >>>>>>>>> end. >>>>>>>>> >>>>>>>>> *Chris Hewson* >>>>>>>>> Senior Reservoir Simulation Engineer >>>>>>>>> ResFrac >>>>>>>>> +1.587.575.9792 >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Jul 23, 2020 at 5:58 PM Junchao Zhang < >>>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> On Mon, Jul 20, 2020 at 7:05 AM Barry Smith >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Is there a comprehensive MPI test suite (perhaps from >>>>>>>>>>> MPICH)? Is there any way to run this full test suite under the problematic >>>>>>>>>>> MPI and see if it detects any problems? >>>>>>>>>>> >>>>>>>>>>> Is so, could someone add it to the FAQ in the debugging >>>>>>>>>>> section? >>>>>>>>>>> >>>>>>>>>> MPICH does have a test suite. It is at the subdir test/mpi of >>>>>>>>>> downloaded mpich >>>>>>>>>> . >>>>>>>>>> It annoyed me since it is not user-friendly. It might be helpful in >>>>>>>>>> catching bugs at very small scale. But say if I want to test allreduce on >>>>>>>>>> 1024 ranks on 100 doubles, I have to hack the test suite. >>>>>>>>>> Anyway, the instructions are here. >>>>>>>>>> >>>>>>>>>> For the purpose of petsc, under test/mpi one can configure it with >>>>>>>>>> $./configure CC=mpicc CXX=mpicxx FC=mpifort --enable-strictmpi >>>>>>>>>> --enable-threads=funneled --enable-fortran=f77,f90 --enable-fast >>>>>>>>>> --disable-spawn --disable-cxx --disable-ft-tests // It is weird I disabled >>>>>>>>>> cxx but I had to set CXX! 
>>>>>>>>>> $make -k -j8 // -k is to keep going and ignore compilation >>>>>>>>>> errors, e.g., when building tests for MPICH extensions not in MPI standard, >>>>>>>>>> but your MPI is OpenMPI. >>>>>>>>>> $ // edit testlist, remove lines mpi_t, rma, f77, impls. Those >>>>>>>>>> are sub-dirs containing tests for MPI routines Petsc does not rely on. >>>>>>>>>> $ make testings or directly './runtests -tests=testlist' >>>>>>>>>> >>>>>>>>>> On a batch system, >>>>>>>>>> $export MPITEST_BATCHDIR=`pwd`/btest // specify a batch >>>>>>>>>> dir, say btest, >>>>>>>>>> $./runtests -batch -mpiexec=mpirun -np=1024 -tests=testlist // >>>>>>>>>> Use 1024 ranks if a test does no specify the number of processes. >>>>>>>>>> $ // It copies test binaries to the batch dir and generates a >>>>>>>>>> script runtests.batch there. Edit the script to fit your batch system and >>>>>>>>>> then submit a job and wait for its finish. >>>>>>>>>> $ cd btest && ../checktests --ignorebogus >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> PS: Fande, changing an MPI fixed your problem does not >>>>>>>>>> necessarily mean the old MPI has bugs. It is complicated. It could be a >>>>>>>>>> petsc bug. You need to provide us a code to reproduce your error. It does >>>>>>>>>> not matter if the code is big. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Thanks >>>>>>>>>>> >>>>>>>>>>> Barry >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Jul 20, 2020, at 12:16 AM, Fande Kong >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Trace could look like this: >>>>>>>>>>> >>>>>>>>>>> [640]PETSC ERROR: --------------------- Error Message >>>>>>>>>>> -------------------------------------------------------------- >>>>>>>>>>> [640]PETSC ERROR: Argument out of range >>>>>>>>>>> [640]PETSC ERROR: key 45226154 is greater than largest key >>>>>>>>>>> allowed 740521 >>>>>>>>>>> [640]PETSC ERROR: See >>>>>>>>>>> https://www.mcs.anl.gov/petsc/documentation/faq.html for >>>>>>>>>>> trouble shooting. 
>>>>>>>>>>> [640]PETSC ERROR: Petsc Release Version 3.13.3, unknown >>>>>>>>>>> [640]PETSC ERROR: ../../griffin-opt on a arch-moose named >>>>>>>>>>> r6i5n18 by wangy2 Sun Jul 19 17:14:28 2020 >>>>>>>>>>> [640]PETSC ERROR: Configure options --download-hypre=1 >>>>>>>>>>> --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1 >>>>>>>>>>> --download-metis=1 --download-ptscotch=1 --download-parmetis=1 >>>>>>>>>>> --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 >>>>>>>>>>> --download-slepc=1 --with-mpi=1 --with-cxx-dialect=C++11 >>>>>>>>>>> --with-fortran-bindings=0 --with-sowing=0 --with-64-bit-indices >>>>>>>>>>> --download-mumps=0 >>>>>>>>>>> [640]PETSC ERROR: #1 PetscTableFind() line 132 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/include/petscctable.h >>>>>>>>>>> [640]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() line 33 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mmaij.c >>>>>>>>>>> [640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>> [640]PETSC ERROR: #4 MatAssemblyEnd() line 5347 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>>>>>>> [640]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIXAIJ_allatonce() >>>>>>>>>>> line 901 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiptap.c >>>>>>>>>>> [640]PETSC ERROR: #6 MatPtAPNumeric_MPIAIJ_MPIMAIJ_allatonce() >>>>>>>>>>> line 3180 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/maij/maij.c >>>>>>>>>>> [640]PETSC ERROR: #7 MatProductNumeric_PtAP() line 704 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>>>>>>>>> [640]PETSC ERROR: #8 MatProductNumeric() line 759 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>>>>>>>>> [640]PETSC ERROR: #9 MatPtAP() line 9199 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>>>>>>> [640]PETSC ERROR: #10 MatGalerkin() line 10236 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>>>>>>> [640]PETSC ERROR: #11 PCSetUp_MG() line 745 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/mg/mg.c >>>>>>>>>>> [640]PETSC ERROR: #12 PCSetUp_HMG() line 220 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/hmg/hmg.c >>>>>>>>>>> [640]PETSC ERROR: #13 PCSetUp() line 898 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/interface/precon.c >>>>>>>>>>> [640]PETSC ERROR: #14 KSPSetUp() line 376 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>>>>> [640]PETSC ERROR: #15 KSPSolve_Private() line 633 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>>>>> [640]PETSC ERROR: #16 KSPSolve() line 853 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>>>>> [640]PETSC ERROR: #17 SNESSolve_NEWTONLS() line 225 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/impls/ls/ls.c >>>>>>>>>>> [640]PETSC ERROR: #18 SNESSolve() line 4519 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/interface/snes.c >>>>>>>>>>> >>>>>>>>>>> On Sun, Jul 19, 2020 at 11:13 PM Fande Kong >>>>>>>>>>> wrote: 
>>>>>>>>>>> >>>>>>>>>>>> I am not entirely sure what is happening, but we encountered >>>>>>>>>>>> similar issues recently. It was not reproducible. It might occur at >>>>>>>>>>>> different stages, and errors could be weird other than "ctable stuff." Our >>>>>>>>>>>> code was Valgrind clean since every PR in moose needs to go through >>>>>>>>>>>> rigorous Valgrind checks before it reaches the devel branch. The errors >>>>>>>>>>>> happened when we used mvapich. >>>>>>>>>>>> >>>>>>>>>>>> We changed to use HPE-MPT (a vendor stalled MPI), then >>>>>>>>>>>> everything was smooth. May you try a different MPI? It is better to try a >>>>>>>>>>>> system carried one. >>>>>>>>>>>> >>>>>>>>>>>> We did not get the bottom of this problem yet, but we at least >>>>>>>>>>>> know this is kind of MPI-related. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> >>>>>>>>>>>> Fande, >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Sun, Jul 19, 2020 at 3:28 PM Chris Hewson >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> I am having a bug that is occurring in PETSC with the return >>>>>>>>>>>>> string: >>>>>>>>>>>>> >>>>>>>>>>>>> [7]PETSC ERROR: PetscTableFind() line 132 in >>>>>>>>>>>>> /home/chewson/petsc-3.13.2/include/petscctable.h key 7556 is greater than >>>>>>>>>>>>> largest key allowed 5693 >>>>>>>>>>>>> >>>>>>>>>>>>> This is using petsc-3.13.2, compiled and running using mpich >>>>>>>>>>>>> with -O3 and debugging turned off tuned to the haswell architecture and >>>>>>>>>>>>> occurring either before or during a KSPBCGS solve/setup or during a MUMPS >>>>>>>>>>>>> factorization solve (I haven't been able to replicate this issue with the >>>>>>>>>>>>> same set of instructions etc.). >>>>>>>>>>>>> >>>>>>>>>>>>> This is a terrible way to ask a question, I know, and not very >>>>>>>>>>>>> helpful from your side, but this is what I have from a user's run and can't >>>>>>>>>>>>> reproduce on my end (either with the optimization compilation or with >>>>>>>>>>>>> debugging turned on). This happens when the code has run for quite some >>>>>>>>>>>>> time and is happening somewhat rarely. >>>>>>>>>>>>> >>>>>>>>>>>>> More than likely I am using a static variable (code is written >>>>>>>>>>>>> in c++) that I'm not updating when the matrix size is changing or something >>>>>>>>>>>>> silly like that, but any help or guidance on this would be appreciated. >>>>>>>>>>>>> >>>>>>>>>>>>> *Chris Hewson* >>>>>>>>>>>>> Senior Reservoir Simulation Engineer >>>>>>>>>>>>> ResFrac >>>>>>>>>>>>> +1.587.575.9792 >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>> >>>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> >>> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Fri Sep 25 13:08:36 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 25 Sep 2020 13:08:36 -0500 Subject: [petsc-users] Tough to reproduce petsctablefind error In-Reply-To: References: <0AC37384-BC37-4A6C-857D-41CD507F84C2@petsc.dev> <8952CCCF-14F2-4102-91B4-921A54689813@petsc.dev> <1A56D208-6BF6-49BF-B653-FD9D15BB3BDD@petsc.dev> <4E113614-C8C0-44F5-AD95-DC37A9D4B5F5@petsc.dev> Message-ID: <7DBD79E0-B593-4364-84E2-6204BECC168E@petsc.dev> I May 2019 Lisandro changed the versions of Metis and ParMetis that PETSc uses to use a portable machine independent random number generator so if you are having PETSc install Metis then its random number generator should generate the exact same random numbers on repeated identical runs on the same machine or different machines. If this does not appear to be the case please let us know. PETSc doesn't use random numbers much but when it does they are all portable and machine independent and produce identical results for identical runs. Due to the non-commutativity of floating point arithmetic with parallelism (numbers arrive in different orders on identical runs) the numerical results will differ and hence decisions (iteration convergence etc) will differ so the "results" can vary a great deal in different identical runs. We did have a user getting "random" errors only after very long runs that were due to "hardware errors" perhaps due to overheating so it is also possible your problems are not due directly to memory corruption, PETSc or MPI or even OS software. This may be too complicated for your work flow but perhaps if you restricted your jobs to shorter times with a restart mechanism these problems would not appear. Barry Truly random "hardware" errors often produce just wildly crazy numbers, in your report the numbers are off but not absurdly large which points to possible software errors. > On Sep 25, 2020, at 12:02 PM, Mark McClure wrote: > > Hello, > > I work with Chris, and have a few comments, hopefully helpful. Thank you all, for your help. > > Our program is unfortunately behaving a little bit nondeterministically. I am not sure why because for the OpenMP parts, I test it for race conditions using Intel Inspector and see none. We are using Metis, not Parmetis. Do Petsc or Metis have any behavior that is nondeterministic? We will continue to investigate the cause of the nondeterminism. But that is a challenge for us reproducing this problem because we can run the same simulation 10 times, and we do not get precisely the same result each time. In fact, for this bug, Chris did run it 10 times and did not reproduce. > > Every month, 1000s of hours of this simulator are being run on the Azure cluster. This crash has been occurring for months, but at infrequent intervals, and have never been able to reproduce it. As such, we can't generate an error dump, unless we gave all users a Petsc build with no optimization and waited weeks until a problem cropped up. > > Here's what we'll do - we'll internally make a build with debug and then run a large number of simulations until the problem reproduces and we get the error dump. That will take us some time, but we'll do it. > > Here is a bit more background that might be helpful. At first, we used OpenMPI 2.1.1-8 with Petsc 3.13.2. With that combo, we observed a memory leak, and simulations failed when the node ran out of RAM. Then we switched to MPICH3.3a2 and Petsc 3.13.3. The memory leak went away. 
That's when we started seeing this bug "greater than largest key allowed". It was unreproducible, but happening relatively more often than it is now (I think) - I was getting a couple user reports a week. Then, we updated to MPICH 3.3.2 and the same Petsc version (3.13.3). The problem seems to be less common - hadn't happened for the past month. But then it happened four times this week. > > Other background to note - our linear system very frequently changes size/shape. There will be 10,000s of Petsc solves with different matrices (different positions of nonzero values) over the course of a simulation. As far as we can tell, the crash only occurs only after the simulator has run for a long time, 10+ hours. Having said that, it does not appear to be a memory leak - at least, the node has plenty of remaining RAM when these crashes are occurring. > > Mark > > > On Thu, Sep 24, 2020 at 4:41 PM Mark Adams > wrote: > You might add code here like: > > if (ierr) { > for (; iB->rmap->n; i++) { > for ( jilen[i]; j++) { > PetscInt data,gid1 = aj[B->i[i] + j] + 1; // don't use ierr > print rank, gid1; > } > CHKERRQ(ierr); > > I am guessing that somehow you have a table that is bad and too small. It failed on an index not much bigger than the largest key allowed. Maybe just compute the max and see if it goes much larger than the largest key allowed. > > If your mesh just changed to you know if it got bigger or smaller... > > Anyway just some thoughts, > Mark > > > > On Thu, Sep 24, 2020 at 4:18 PM Barry Smith > wrote: > > >> On Sep 24, 2020, at 2:47 PM, Matthew Knepley > wrote: >> >> On Thu, Sep 24, 2020 at 3:42 PM Barry Smith > wrote: >> >> The stack is listed below. It crashes inside MatPtAP(). >> >> What about just checking that the column indices that PtAP receives are valid? Are we not doing that? > > The code that checks for column too large in MatSetValues_MPIXXAIJ() is turned off for optimized builds, I am making a MR to always have it on. But I doubt this is the problem, other, more harsh crashes, are likely if the column index is too large. > > This is difficult to debug because all we get is a stack trace. It would be good if we produce some information about the current state of the objects when the error is detected. We should think about what light-weight stuff we could report when errors are detected. > > > Barry > >> >> Matt >> >> It is possible there is some subtle bug in the rather complex PETSc code for MatPtAP() but I am included to blame MPI again. >> >> I think we should add some simple low-overhead always on communication error detecting code to PetscSF where some check sums are also communicated at the highest level of PetscSF(). >> >> I don't know how but perhaps when the data is packed per destination rank a checksum is computed and when unpacked the checksum is compared using extra space at the end of the communicated packed array to store and send the checksum. Yes, it is kind of odd for a high level library like PETSc to not trust the communication channel but MPI implementations have proven themselves to not be trustworthy and adding this to PetscSF is not intrusive to the PETSc API or user. Plus it gives a definitive yes or no as to the problem being from an error in the communication. >> >> Barry >> >>> On Sep 24, 2020, at 12:35 PM, Matthew Knepley > wrote: >>> >>> On Thu, Sep 24, 2020 at 1:22 PM Chris Hewson > wrote: >>> Hi Guys, >>> >>> Thanks for all of the prompt responses, very helpful and appreciated. 
>>> >>> By "when debugging", did you mean when configure petsc --with-debugging=1 COPTFLAGS=-O0 -g etc or when you attached a debugger? >>> - Both, I have run with a debugger attached and detached, all compiled with the following flags "--with-debugging=1 COPTFLAGS=-O0 -g" >>> >>> 1) Try OpenMPI (probably won't help, but worth trying) >>> - Worth a try for sure >>> >>> 2) Find which part of the simulation makes it non-deterministic. Is it the mesh partitioning (parmetis)? Then try to make it deterministic. >>> - Good tip, it is the mesh partitioning and along the lines of a question from Barry, the matrix size is changing. I will make this deterministic and give it a try >>> >>> 3) Dump matrices, vectors, etc and see when it fails, you can quickly reproduce the error by reading in the intermediate data. >>> - Also a great suggestion, will give it a try >>> >>> The full stack would be really useful here. I am guessing this happens on MatMult(), but I do not know. >>> - Agreed, I am currently running it so that the full stack will be produced, but waiting for it to fail, had compiled with all available optimizations on, but downside is of course if there is a failure. >>> As a general question, roughly what's the performance impact on petsc with -o1/-o2/-o0 as opposed to -o3? Performance impact of --with-debugging = 1? >>> Obviously problem/machine dependant, wondering on guidance more for this than anything >>> >>> Is the nonzero structure of your matrices changing or is it fixed for the entire simulation? >>> The non-zero structure is changing, although the structures are reformed when this happens and this happens thousands of time before the failure has occured. >>> >>> Okay, this is the most likely spot for a bug. How are you changing the matrix? It should be impossible to put in an invalid column index when using MatSetValues() >>> because we check all the inputs. However, I do not think we check when you just yank out the arrays. >>> >>> Thanks, >>> >>> Matt >>> >>> Does this particular run always crash at the same place? Similar place? Doesn't always crash? >>> Doesn't always crash, but other similar runs have crashed in different spots, which makes it difficult to resolve. I am going to try out a few of the strategies suggested above and will let you know what comes of that. >>> >>> Chris Hewson >>> Senior Reservoir Simulation Engineer >>> ResFrac >>> +1.587.575.9792 >>> >>> >>> On Thu, Sep 24, 2020 at 11:05 AM Barry Smith > wrote: >>> Chris, >>> >>> We realize how frustrating this type of problem is to deal with. >>> >>> Here is the code: >>> >>> ierr = PetscTableCreate(aij->B->rmap->n,mat->cmap->N+1,&gid1_lid1);CHKERRQ(ierr); >>> for (i=0; iB->rmap->n; i++) { >>> for (j=0; jilen[i]; j++) { >>> PetscInt data,gid1 = aj[B->i[i] + j] + 1; >>> ierr = PetscTableFind(gid1_lid1,gid1,&data);CHKERRQ(ierr); >>> if (!data) { >>> /* one based table */ >>> ierr = PetscTableAdd(gid1_lid1,gid1,++ec,INSERT_VALUES);CHKERRQ(ierr); >>> } >>> } >>> } >>> >>> It is simply looping over the rows of the sparse matrix putting the columns it finds into the hash table. >>> >>> aj[B->i[i] + j] are the column entries, the number of columns in the matrix is mat->cmap->N so the column entries should always be >>> less than the number of columns. The code is crashing when column entry 24443 which is larger than the number of columns 23988. >>> >>> So either the aj[B->i[i] + j] + 1 are incorrect or the mat->cmap->N is incorrect. 
>>> >>>>>> 640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >>> >>> if (!mat->was_assembled && mode == MAT_FINAL_ASSEMBLY) { >>> ierr = MatSetUpMultiply_MPIAIJ(mat);CHKERRQ(ierr); >>> } >>> >>> Seems to indicate it is setting up a new multiple because it is either the first time into the algorithm or the nonzero structure changed on some rank requiring a new assembly process. >>> >>> Is the nonzero structure of your matrices changing or is it fixed for the entire simulation? >>> >>> Since the code has been running for a very long time already I have to conclude that this is not the first time through and so something has changed in the matrix? >>> >>> I think we have to put more diagnostics into the library to provide more information before or at the time of the error detection. >>> >>> Does this particular run always crash at the same place? Similar place? Doesn't always crash? >>> >>> Barry >>> >>> >>> >>> >>>> On Sep 24, 2020, at 8:46 AM, Chris Hewson > wrote: >>>> >>>> After about a month of not having this issue pop up, it has come up again >>>> >>>> We have been struggling with a similar PETSc Error for awhile now, the error message is as follows: >>>> >>>> [7]PETSC ERROR: PetscTableFind() line 132 in /home/chewson/petsc-3.13.3/include/petscctable.h key 24443 is greater than largest key allowed 23988 >>>> >>>> It is a particularly nasty bug as it doesn't reproduce itself when debugging and doesn't happen all the time with the same inputs either. The problem occurs after a long runtime of the code (12-40 hours) and we are using a ksp solve with KSPBCGS. >>>> >>>> The PETSc compilation options that are used are: >>>> >>>> --download-metis >>>> --download-mpich >>>> --download-mumps >>>> --download-parmetis >>>> --download-ptscotch >>>> --download-scalapack >>>> --download-suitesparse >>>> --prefix=/opt/anl/petsc-3.13.3 >>>> --with-debugging=0 >>>> --with-mpi=1 >>>> COPTFLAGS=-O3 -march=haswell -mtune=haswell >>>> CXXOPTFLAGS=-O3 -march=haswell -mtune=haswell >>>> FOPTFLAGS=-O3 -march=haswell -mtune=haswell >>>> >>>> This is being run across 8 processes and is failing consistently on the rank 7 process. We also use openmp outside of PETSC and the linear solve portion of the code. The rank 0 process is always using compute, during this the slave processes use an MPI_Wait call to wait for the collective parts of the code to be called again. All PETSC calls are done across all of the processes. >>>> >>>> We are using mpich version 3.3.2, downloaded with the petsc 3.13.3 package. >>>> >>>> At every PETSC call we are checking the error return from the function collectively to ensure that no errors have been returned from PETSC. >>>> >>>> Some possible causes that I can think of are as follows: >>>> 1. Memory leak causing a corruption either in our program or in petsc or with one of the petsc objects. This seems unlikely as we have checked runs with the option -malloc_dump for PETSc and using valgrind. >>>> >>>> 2. Optimization flags set for petsc compilation are causing variables that go out of scope to be optimized out. >>>> >>>> 3. We are giving the wrong number of elements for a process or the value is changing during the simulation. This seems unlikely as there is nothing overly unique about these simulations and it's not reproducing itself. >>>> >>>> 4. An MPI channel or socket error causing an error in the collective values for PETSc. >>>> >>>> Any input on this issue would be greatly appreciated. 
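As an aside on "checking the error return from the function collectively": a minimal sketch of what such a check can look like is below. This is an assumption about the application-side pattern, not code from Chris's simulator; it simply reduces the local PETSc return code with MPI_MAX so that every rank agrees whether anything failed. Note that if a rank fails inside a collective PETSc call, the other ranks may never reach the reduction, so this only helps for errors that are actually returned rather than hung on.

    /* Application-side sketch (hypothetical helper, not part of PETSc) */
    static PetscErrorCode CheckPetscErrCollectively(MPI_Comm comm, PetscErrorCode ierr)
    {
      int localerr = (int)ierr, globalerr = 0;
      /* MPI_MAX: if any rank saw a nonzero PETSc error code, every rank learns about it */
      MPI_Allreduce(&localerr, &globalerr, 1, MPI_INT, MPI_MAX, comm);
      return (PetscErrorCode)globalerr;
    }

    /* usage:  ierr = KSPSolve(ksp, b, x);
               ierr = CheckPetscErrCollectively(PETSC_COMM_WORLD, ierr);
               if (ierr) { ... all ranks take the fallback path together ... } */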
>>>> >>>> Chris Hewson >>>> Senior Reservoir Simulation Engineer >>>> ResFrac >>>> +1.587.575.9792 >>>> >>>> >>>> On Thu, Aug 13, 2020 at 4:21 PM Junchao Zhang > wrote: >>>> That is a great idea. I'll figure it out. >>>> --Junchao Zhang >>>> >>>> >>>> On Thu, Aug 13, 2020 at 4:31 PM Barry Smith > wrote: >>>> >>>> Junchao, >>>> >>>> Any way in the PETSc configure to warn that MPICH version is "bad" or "untrustworthy" or even the vague "we have suspicians about this version and recommend avoiding it"? A lot of time could be saved if others don't deal with the same problem. >>>> >>>> Maybe add arrays of suspect_versions for OpenMPI, MPICH, etc and always check against that list and print a boxed warning at configure time? Better you could somehow generalize it and put it in package.py for use by all packages, then any package can included lists of "suspect" versions. (There are definitely HDF5 versions that should be avoided :-)). >>>> >>>> Barry >>>> >>>> >>>>> On Aug 13, 2020, at 12:14 PM, Junchao Zhang > wrote: >>>>> >>>>> Thanks for the update. Let's assume it is a bug in MPI :) >>>>> --Junchao Zhang >>>>> >>>>> >>>>> On Thu, Aug 13, 2020 at 11:15 AM Chris Hewson > wrote: >>>>> Just as an update to this, I can confirm that using the mpich version (3.3.2) downloaded with the petsc download solved this issue on my end. >>>>> >>>>> Chris Hewson >>>>> Senior Reservoir Simulation Engineer >>>>> ResFrac >>>>> +1.587.575.9792 >>>>> >>>>> >>>>> On Thu, Jul 23, 2020 at 5:58 PM Junchao Zhang > wrote: >>>>> On Mon, Jul 20, 2020 at 7:05 AM Barry Smith > wrote: >>>>> >>>>> Is there a comprehensive MPI test suite (perhaps from MPICH)? Is there any way to run this full test suite under the problematic MPI and see if it detects any problems? >>>>> >>>>> Is so, could someone add it to the FAQ in the debugging section? >>>>> MPICH does have a test suite. It is at the subdir test/mpi of downloaded mpich . It annoyed me since it is not user-friendly. It might be helpful in catching bugs at very small scale. But say if I want to test allreduce on 1024 ranks on 100 doubles, I have to hack the test suite. >>>>> Anyway, the instructions are here. >>>>> For the purpose of petsc, under test/mpi one can configure it with >>>>> $./configure CC=mpicc CXX=mpicxx FC=mpifort --enable-strictmpi --enable-threads=funneled --enable-fortran=f77,f90 --enable-fast --disable-spawn --disable-cxx --disable-ft-tests // It is weird I disabled cxx but I had to set CXX! >>>>> $make -k -j8 // -k is to keep going and ignore compilation errors, e.g., when building tests for MPICH extensions not in MPI standard, but your MPI is OpenMPI. >>>>> $ // edit testlist, remove lines mpi_t, rma, f77, impls. Those are sub-dirs containing tests for MPI routines Petsc does not rely on. >>>>> $ make testings or directly './runtests -tests=testlist' >>>>> >>>>> On a batch system, >>>>> $export MPITEST_BATCHDIR=`pwd`/btest // specify a batch dir, say btest, >>>>> $./runtests -batch -mpiexec=mpirun -np=1024 -tests=testlist // Use 1024 ranks if a test does no specify the number of processes. >>>>> $ // It copies test binaries to the batch dir and generates a script runtests.batch there. Edit the script to fit your batch system and then submit a job and wait for its finish. >>>>> $ cd btest && ../checktests --ignorebogus >>>>> >>>>> PS: Fande, changing an MPI fixed your problem does not necessarily mean the old MPI has bugs. It is complicated. It could be a petsc bug. You need to provide us a code to reproduce your error. 
It does not matter if the code is big. >>>>> >>>>> >>>>> Thanks >>>>> >>>>> Barry >>>>> >>>>> >>>>>> On Jul 20, 2020, at 12:16 AM, Fande Kong > wrote: >>>>>> >>>>>> Trace could look like this: >>>>>> >>>>>> [640]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>>>>> [640]PETSC ERROR: Argument out of range >>>>>> [640]PETSC ERROR: key 45226154 is greater than largest key allowed 740521 >>>>>> [640]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >>>>>> [640]PETSC ERROR: Petsc Release Version 3.13.3, unknown >>>>>> [640]PETSC ERROR: ../../griffin-opt on a arch-moose named r6i5n18 by wangy2 Sun Jul 19 17:14:28 2020 >>>>>> [640]PETSC ERROR: Configure options --download-hypre=1 --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1 --download-metis=1 --download-ptscotch=1 --download-parmetis=1 --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 --download-slepc=1 --with-mpi=1 --with-cxx-dialect=C++11 --with-fortran-bindings=0 --with-sowing=0 --with-64-bit-indices --download-mumps=0 >>>>>> [640]PETSC ERROR: #1 PetscTableFind() line 132 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/include/petscctable.h >>>>>> [640]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() line 33 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mmaij.c >>>>>> [640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>>> [640]PETSC ERROR: #4 MatAssemblyEnd() line 5347 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>> [640]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIXAIJ_allatonce() line 901 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiptap.c >>>>>> [640]PETSC ERROR: #6 MatPtAPNumeric_MPIAIJ_MPIMAIJ_allatonce() line 3180 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/maij/maij.c >>>>>> [640]PETSC ERROR: #7 MatProductNumeric_PtAP() line 704 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>>>> [640]PETSC ERROR: #8 MatProductNumeric() line 759 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>>>> [640]PETSC ERROR: #9 MatPtAP() line 9199 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>> [640]PETSC ERROR: #10 MatGalerkin() line 10236 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>> [640]PETSC ERROR: #11 PCSetUp_MG() line 745 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/mg/mg.c >>>>>> [640]PETSC ERROR: #12 PCSetUp_HMG() line 220 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/hmg/hmg.c >>>>>> [640]PETSC ERROR: #13 PCSetUp() line 898 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/interface/precon.c >>>>>> [640]PETSC ERROR: #14 KSPSetUp() line 376 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>> [640]PETSC ERROR: #15 KSPSolve_Private() line 633 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>> [640]PETSC ERROR: #16 KSPSolve() line 853 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>> [640]PETSC ERROR: #17 SNESSolve_NEWTONLS() line 225 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/impls/ls/ls.c >>>>>> [640]PETSC ERROR: #18 SNESSolve() line 4519 in 
/home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/interface/snes.c >>>>>> >>>>>> On Sun, Jul 19, 2020 at 11:13 PM Fande Kong > wrote: >>>>>> I am not entirely sure what is happening, but we encountered similar issues recently. It was not reproducible. It might occur at different stages, and errors could be weird other than "ctable stuff." Our code was Valgrind clean since every PR in moose needs to go through rigorous Valgrind checks before it reaches the devel branch. The errors happened when we used mvapich. >>>>>> >>>>>> We changed to use HPE-MPT (a vendor stalled MPI), then everything was smooth. May you try a different MPI? It is better to try a system carried one. >>>>>> >>>>>> We did not get the bottom of this problem yet, but we at least know this is kind of MPI-related. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Fande, >>>>>> >>>>>> >>>>>> On Sun, Jul 19, 2020 at 3:28 PM Chris Hewson > wrote: >>>>>> Hi, >>>>>> >>>>>> I am having a bug that is occurring in PETSC with the return string: >>>>>> >>>>>> [7]PETSC ERROR: PetscTableFind() line 132 in /home/chewson/petsc-3.13.2/include/petscctable.h key 7556 is greater than largest key allowed 5693 >>>>>> >>>>>> This is using petsc-3.13.2, compiled and running using mpich with -O3 and debugging turned off tuned to the haswell architecture and occurring either before or during a KSPBCGS solve/setup or during a MUMPS factorization solve (I haven't been able to replicate this issue with the same set of instructions etc.). >>>>>> >>>>>> This is a terrible way to ask a question, I know, and not very helpful from your side, but this is what I have from a user's run and can't reproduce on my end (either with the optimization compilation or with debugging turned on). This happens when the code has run for quite some time and is happening somewhat rarely. >>>>>> >>>>>> More than likely I am using a static variable (code is written in c++) that I'm not updating when the matrix size is changing or something silly like that, but any help or guidance on this would be appreciated. >>>>>> >>>>>> Chris Hewson >>>>>> Senior Reservoir Simulation Engineer >>>>>> ResFrac >>>>>> +1.587.575.9792 >>>>> >>>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at resfrac.com Sat Sep 26 10:17:14 2020 From: mark at resfrac.com (Mark McClure) Date: Sat, 26 Sep 2020 11:17:14 -0400 Subject: [petsc-users] Tough to reproduce petsctablefind error In-Reply-To: <7DBD79E0-B593-4364-84E2-6204BECC168E@petsc.dev> References: <0AC37384-BC37-4A6C-857D-41CD507F84C2@petsc.dev> <8952CCCF-14F2-4102-91B4-921A54689813@petsc.dev> <1A56D208-6BF6-49BF-B653-FD9D15BB3BDD@petsc.dev> <4E113614-C8C0-44F5-AD95-DC37A9D4B5F5@petsc.dev> <7DBD79E0-B593-4364-84E2-6204BECC168E@petsc.dev> Message-ID: Thank you, all for the explanations. Following Matt's suggestion, we'll use -g (and not use -with-debugging=0) all future compiles to all users, so in future, we can provide better information. 
Second, Chris is going to boil our function down to minimum stub and share in case there is some subtle issue with the way the functions are being called. Third, I have question/request - Petsc is, in fact, detecting an error. As far as I can tell, this is not an uncontrolled 'seg fault'. It seems to me that maybe Petsc could choose to return out from the function when it detects this error, returning an error code, rather than dumping the core and terminating the program. If Petsc simply returned out with an error message, this would resolve the problem for us. After the Petsc call, we check for Petsc error messages. If Petsc returns an error - that's fine - we use a direct solver as a backup, and the simulation continues. So - I am not sure whether this is feasible - but if Petsc could return out with an error message - rather than dumping the core and terminating the program - then that would effectively resolve the issue for us. Would this change be possible? Mark On Fri, Sep 25, 2020 at 2:08 PM Barry Smith wrote: > > I May 2019 Lisandro changed the versions of Metis and ParMetis that > PETSc uses to use a portable machine independent random number generator so > if you are having PETSc install Metis then its random number generator > should generate the exact same random numbers on repeated identical runs on > the same machine or different machines. If this does not appear to be the > case please let us know. PETSc doesn't use random numbers much but when it > does they are all portable and machine independent and produce identical > results for identical runs. > > Due to the non-commutativity of floating point arithmetic with > parallelism (numbers arrive in different orders on identical runs) the > numerical results will differ and hence decisions (iteration convergence > etc) will differ so the "results" can vary a great deal in different > identical runs. > > We did have a user getting "random" errors only after very long runs > that were due to "hardware errors" perhaps due to overheating so it is also > possible your problems are not due directly to memory corruption, PETSc or > MPI or even OS software. This may be too complicated for your work flow but > perhaps if you restricted your jobs to shorter times with a restart > mechanism these problems would not appear. > > Barry > > Truly random "hardware" errors often produce just wildly crazy numbers, > in your report the numbers are off but not absurdly large which points to > possible software errors. > > > > > On Sep 25, 2020, at 12:02 PM, Mark McClure wrote: > > Hello, > > I work with Chris, and have a few comments, hopefully helpful. Thank you > all, for your help. > > Our program is unfortunately behaving a little bit nondeterministically. I > am not sure why because for the OpenMP parts, I test it for race conditions > using Intel Inspector and see none. We are using Metis, not Parmetis. Do > Petsc or Metis have any behavior that is nondeterministic? We will continue > to investigate the cause of the nondeterminism. But that is a challenge for > us reproducing this problem because we can run the same simulation 10 > times, and we do not get precisely the same result each time. In fact, for > this bug, Chris did run it 10 times and did not reproduce. > > Every month, 1000s of hours of this simulator are being run on the > Azure cluster. This crash has been occurring for months, but at infrequent > intervals, and have never been able to reproduce it. 
As such, we can't > generate an error dump, unless we gave all users a Petsc build with no > optimization and waited weeks until a problem cropped up. > > Here's what we'll do - we'll internally make a build with debug and then > run a large number of simulations until the problem reproduces and we get > the error dump. That will take us some time, but we'll do it. > > Here is a bit more background that might be helpful. At first, we used > OpenMPI 2.1.1-8 with Petsc 3.13.2. With that combo, we observed a memory > leak, and simulations failed when the node ran out of RAM. Then we switched > to MPICH3.3a2 and Petsc 3.13.3. The memory leak went away. That's when we > started seeing this bug "greater than largest key allowed". It was > unreproducible, but happening relatively more often than it is now (I > think) - I was getting a couple user reports a week. Then, we updated to > MPICH 3.3.2 and the same Petsc version (3.13.3). The problem seems to be > less common - hadn't happened for the past month. But then it happened four > times this week. > > Other background to note - our linear system very frequently changes > size/shape. There will be 10,000s of Petsc solves with different matrices > (different positions of nonzero values) over the course of a simulation. As > far as we can tell, the crash only occurs only after the simulator has run > for a long time, 10+ hours. Having said that, it does not appear to be a > memory leak - at least, the node has plenty of remaining RAM when these > crashes are occurring. > > Mark > > > On Thu, Sep 24, 2020 at 4:41 PM Mark Adams wrote: > >> You might add code here like: >> >> if (ierr) { >> for (; iB->rmap->n; i++) { >> for ( jilen[i]; j++) { >> PetscInt data,gid1 = aj[B->i[i] + j] + 1; // don't use ierr >> print rank, gid1; >> } >> CHKERRQ(ierr); >> >> I am guessing that somehow you have a table that is bad and too small. It >> failed on an index not much bigger than the largest key allowed. Maybe just >> compute the max and see if it goes much larger than the largest key allowed. >> >> If your mesh just changed to you know if it got bigger or smaller... >> >> Anyway just some thoughts, >> Mark >> >> >> >> On Thu, Sep 24, 2020 at 4:18 PM Barry Smith wrote: >> >>> >>> >>> On Sep 24, 2020, at 2:47 PM, Matthew Knepley wrote: >>> >>> On Thu, Sep 24, 2020 at 3:42 PM Barry Smith wrote: >>> >>>> >>>> The stack is listed below. It crashes inside MatPtAP(). >>>> >>> >>> What about just checking that the column indices that PtAP receives are >>> valid? Are we not doing that? >>> >>> >>> The code that checks for column too large in MatSetValues_MPIXXAIJ() >>> is turned off for optimized builds, I am making a MR to always have it on. >>> But I doubt this is the problem, other, more harsh crashes, are likely if >>> the column index is too large. >>> >>> This is difficult to debug because all we get is a stack trace. It >>> would be good if we produce some information about the current state of the >>> objects when the error is detected. We should think about what light-weight >>> stuff we could report when errors are detected. >>> >>> >>> Barry >>> >>> >>> Matt >>> >>> >>>> It is possible there is some subtle bug in the rather complex PETSc >>>> code for MatPtAP() but I am included to blame MPI again. >>>> >>>> I think we should add some simple low-overhead always on >>>> communication error detecting code to PetscSF where some check sums are >>>> also communicated at the highest level of PetscSF(). 
>>>> >>>> I don't know how but perhaps when the data is packed per destination >>>> rank a checksum is computed and when unpacked the checksum is compared >>>> using extra space at the end of the communicated packed array to store and >>>> send the checksum. Yes, it is kind of odd for a high level library like >>>> PETSc to not trust the communication channel but MPI implementations have >>>> proven themselves to not be trustworthy and adding this to PetscSF is not >>>> intrusive to the PETSc API or user. Plus it gives a definitive yes or no as >>>> to the problem being from an error in the communication. >>>> >>>> Barry >>>> >>>> On Sep 24, 2020, at 12:35 PM, Matthew Knepley >>>> wrote: >>>> >>>> On Thu, Sep 24, 2020 at 1:22 PM Chris Hewson wrote: >>>> >>>>> Hi Guys, >>>>> >>>>> Thanks for all of the prompt responses, very helpful and appreciated. >>>>> >>>>> By "when debugging", did you mean when configure >>>>> petsc --with-debugging=1 COPTFLAGS=-O0 -g etc or when you attached a >>>>> debugger? >>>>> - Both, I have run with a debugger attached and detached, all compiled >>>>> with the following flags "--with-debugging=1 COPTFLAGS=-O0 -g" >>>>> >>>>> 1) Try OpenMPI (probably won't help, but worth trying) >>>>> - Worth a try for sure >>>>> >>>>> 2) Find which part of the simulation makes it non-deterministic. Is it >>>>> the mesh partitioning (parmetis)? Then try to make it deterministic. >>>>> - Good tip, it is the mesh partitioning and along the lines of a >>>>> question from Barry, the matrix size is changing. I will make this >>>>> deterministic and give it a try >>>>> >>>>> 3) Dump matrices, vectors, etc and see when it fails, you can quickly >>>>> reproduce the error by reading in the intermediate data. >>>>> - Also a great suggestion, will give it a try >>>>> >>>>> The full stack would be really useful here. I am guessing this happens >>>>> on MatMult(), but I do not know. >>>>> - Agreed, I am currently running it so that the full stack will be >>>>> produced, but waiting for it to fail, had compiled with all available >>>>> optimizations on, but downside is of course if there is a failure. >>>>> As a general question, roughly what's the performance impact on petsc >>>>> with -o1/-o2/-o0 as opposed to -o3? Performance impact of --with-debugging >>>>> = 1? >>>>> Obviously problem/machine dependant, wondering on guidance more for >>>>> this than anything >>>>> >>>>> Is the nonzero structure of your matrices changing or is it fixed for >>>>> the entire simulation? >>>>> The non-zero structure is changing, although the structures are >>>>> reformed when this happens and this happens thousands of time before the >>>>> failure has occured. >>>>> >>>> >>>> Okay, this is the most likely spot for a bug. How are you changing the >>>> matrix? It should be impossible to put in an invalid column index when >>>> using MatSetValues() >>>> because we check all the inputs. However, I do not think we check when >>>> you just yank out the arrays. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Does this particular run always crash at the same place? Similar >>>>> place? Doesn't always crash? >>>>> Doesn't always crash, but other similar runs have crashed in different >>>>> spots, which makes it difficult to resolve. I am going to try out a few of >>>>> the strategies suggested above and will let you know what comes of that. 
>>>>> >>>>> *Chris Hewson* >>>>> Senior Reservoir Simulation Engineer >>>>> ResFrac >>>>> +1.587.575.9792 >>>>> >>>>> >>>>> On Thu, Sep 24, 2020 at 11:05 AM Barry Smith wrote: >>>>> >>>>>> Chris, >>>>>> >>>>>> We realize how frustrating this type of problem is to deal with. >>>>>> >>>>>> Here is the code: >>>>>> >>>>>> ierr = >>>>>> PetscTableCreate(aij->B->rmap->n,mat->cmap->N+1,&gid1_lid1);CHKERRQ(ierr); >>>>>> for (i=0; iB->rmap->n; i++) { >>>>>> for (j=0; jilen[i]; j++) { >>>>>> PetscInt data,gid1 = aj[B->i[i] + j] + 1; >>>>>> ierr = PetscTableFind(gid1_lid1,gid1,&data);CHKERRQ(ierr); >>>>>> if (!data) { >>>>>> /* one based table */ >>>>>> ierr = >>>>>> PetscTableAdd(gid1_lid1,gid1,++ec,INSERT_VALUES);CHKERRQ(ierr); >>>>>> } >>>>>> } >>>>>> } >>>>>> >>>>>> It is simply looping over the rows of the sparse matrix putting >>>>>> the columns it finds into the hash table. >>>>>> >>>>>> aj[B->i[i] + j] are the column entries, the number of columns in >>>>>> the matrix is mat->cmap->N so the column entries should always be >>>>>> less than the number of columns. The code is crashing when column >>>>>> entry 24443 which is larger than the number of columns 23988. >>>>>> >>>>>> So either the aj[B->i[i] + j] + 1 are incorrect or the mat->cmap->N >>>>>> is incorrect. >>>>>> >>>>>> 640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>> >>>>>>>>>>> >>>>>> if (!mat->was_assembled && mode == MAT_FINAL_ASSEMBLY) { >>>>>> ierr = MatSetUpMultiply_MPIAIJ(mat);CHKERRQ(ierr); >>>>>> } >>>>>> >>>>>> Seems to indicate it is setting up a new multiple because it is >>>>>> either the first time into the algorithm or the nonzero structure changed >>>>>> on some rank requiring a new assembly process. >>>>>> >>>>>> Is the nonzero structure of your matrices changing or is it fixed >>>>>> for the entire simulation? >>>>>> >>>>>> Since the code has been running for a very long time already I have >>>>>> to conclude that this is not the first time through and so something has >>>>>> changed in the matrix? >>>>>> >>>>>> I think we have to put more diagnostics into the library to provide >>>>>> more information before or at the time of the error detection. >>>>>> >>>>>> Does this particular run always crash at the same place? Similar >>>>>> place? Doesn't always crash? >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Sep 24, 2020, at 8:46 AM, Chris Hewson wrote: >>>>>> >>>>>> After about a month of not having this issue pop up, it has come up >>>>>> again >>>>>> >>>>>> We have been struggling with a similar PETSc Error for awhile now, >>>>>> the error message is as follows: >>>>>> >>>>>> [7]PETSC ERROR: PetscTableFind() line 132 in >>>>>> /home/chewson/petsc-3.13.3/include/petscctable.h key 24443 is greater than >>>>>> largest key allowed 23988 >>>>>> >>>>>> It is a particularly nasty bug as it doesn't reproduce itself when >>>>>> debugging and doesn't happen all the time with the same inputs either. The >>>>>> problem occurs after a long runtime of the code (12-40 hours) and we are >>>>>> using a ksp solve with KSPBCGS. 
>>>>>> >>>>>> The PETSc compilation options that are used are: >>>>>> >>>>>> --download-metis >>>>>> --download-mpich >>>>>> --download-mumps >>>>>> --download-parmetis >>>>>> --download-ptscotch >>>>>> --download-scalapack >>>>>> --download-suitesparse >>>>>> --prefix=/opt/anl/petsc-3.13.3 >>>>>> --with-debugging=0 >>>>>> --with-mpi=1 >>>>>> COPTFLAGS=-O3 -march=haswell -mtune=haswell >>>>>> CXXOPTFLAGS=-O3 -march=haswell -mtune=haswell >>>>>> FOPTFLAGS=-O3 -march=haswell -mtune=haswell >>>>>> >>>>>> This is being run across 8 processes and is failing consistently on >>>>>> the rank 7 process. We also use openmp outside of PETSC and the linear >>>>>> solve portion of the code. The rank 0 process is always using compute, >>>>>> during this the slave processes use an MPI_Wait call to wait for the >>>>>> collective parts of the code to be called again. All PETSC calls are done >>>>>> across all of the processes. >>>>>> >>>>>> We are using mpich version 3.3.2, downloaded with the petsc 3.13.3 >>>>>> package. >>>>>> >>>>>> At every PETSC call we are checking the error return from the >>>>>> function collectively to ensure that no errors have been returned from >>>>>> PETSC. >>>>>> >>>>>> Some possible causes that I can think of are as follows: >>>>>> 1. Memory leak causing a corruption either in our program or in petsc >>>>>> or with one of the petsc objects. This seems unlikely as we have checked >>>>>> runs with the option -malloc_dump for PETSc and using valgrind. >>>>>> >>>>>> 2. Optimization flags set for petsc compilation are causing variables >>>>>> that go out of scope to be optimized out. >>>>>> >>>>>> 3. We are giving the wrong number of elements for a process or the >>>>>> value is changing during the simulation. This seems unlikely as there is >>>>>> nothing overly unique about these simulations and it's not reproducing >>>>>> itself. >>>>>> >>>>>> 4. An MPI channel or socket error causing an error in the collective >>>>>> values for PETSc. >>>>>> >>>>>> Any input on this issue would be greatly appreciated. >>>>>> >>>>>> *Chris Hewson* >>>>>> Senior Reservoir Simulation Engineer >>>>>> ResFrac >>>>>> +1.587.575.9792 >>>>>> >>>>>> >>>>>> On Thu, Aug 13, 2020 at 4:21 PM Junchao Zhang < >>>>>> junchao.zhang at gmail.com> wrote: >>>>>> >>>>>>> That is a great idea. I'll figure it out. >>>>>>> --Junchao Zhang >>>>>>> >>>>>>> >>>>>>> On Thu, Aug 13, 2020 at 4:31 PM Barry Smith >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> Junchao, >>>>>>>> >>>>>>>> Any way in the PETSc configure to warn that MPICH version is >>>>>>>> "bad" or "untrustworthy" or even the vague "we have suspicians about this >>>>>>>> version and recommend avoiding it"? A lot of time could be saved if others >>>>>>>> don't deal with the same problem. >>>>>>>> >>>>>>>> Maybe add arrays of suspect_versions for OpenMPI, MPICH, etc >>>>>>>> and always check against that list and print a boxed warning at configure >>>>>>>> time? Better you could somehow generalize it and put it in package.py for >>>>>>>> use by all packages, then any package can included lists of "suspect" >>>>>>>> versions. (There are definitely HDF5 versions that should be avoided :-)). >>>>>>>> >>>>>>>> Barry >>>>>>>> >>>>>>>> >>>>>>>> On Aug 13, 2020, at 12:14 PM, Junchao Zhang < >>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>> >>>>>>>> Thanks for the update. 
Let's assume it is a bug in MPI :) >>>>>>>> --Junchao Zhang >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Aug 13, 2020 at 11:15 AM Chris Hewson >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Just as an update to this, I can confirm that using the mpich >>>>>>>>> version (3.3.2) downloaded with the petsc download solved this issue on my >>>>>>>>> end. >>>>>>>>> >>>>>>>>> *Chris Hewson* >>>>>>>>> Senior Reservoir Simulation Engineer >>>>>>>>> ResFrac >>>>>>>>> +1.587.575.9792 >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Jul 23, 2020 at 5:58 PM Junchao Zhang < >>>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> On Mon, Jul 20, 2020 at 7:05 AM Barry Smith >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Is there a comprehensive MPI test suite (perhaps from >>>>>>>>>>> MPICH)? Is there any way to run this full test suite under the problematic >>>>>>>>>>> MPI and see if it detects any problems? >>>>>>>>>>> >>>>>>>>>>> Is so, could someone add it to the FAQ in the debugging >>>>>>>>>>> section? >>>>>>>>>>> >>>>>>>>>> MPICH does have a test suite. It is at the subdir test/mpi of >>>>>>>>>> downloaded mpich >>>>>>>>>> . >>>>>>>>>> It annoyed me since it is not user-friendly. It might be helpful in >>>>>>>>>> catching bugs at very small scale. But say if I want to test allreduce on >>>>>>>>>> 1024 ranks on 100 doubles, I have to hack the test suite. >>>>>>>>>> Anyway, the instructions are here. >>>>>>>>>> >>>>>>>>>> For the purpose of petsc, under test/mpi one can configure it with >>>>>>>>>> $./configure CC=mpicc CXX=mpicxx FC=mpifort --enable-strictmpi >>>>>>>>>> --enable-threads=funneled --enable-fortran=f77,f90 --enable-fast >>>>>>>>>> --disable-spawn --disable-cxx --disable-ft-tests // It is weird I disabled >>>>>>>>>> cxx but I had to set CXX! >>>>>>>>>> $make -k -j8 // -k is to keep going and ignore compilation >>>>>>>>>> errors, e.g., when building tests for MPICH extensions not in MPI standard, >>>>>>>>>> but your MPI is OpenMPI. >>>>>>>>>> $ // edit testlist, remove lines mpi_t, rma, f77, impls. Those >>>>>>>>>> are sub-dirs containing tests for MPI routines Petsc does not rely on. >>>>>>>>>> $ make testings or directly './runtests -tests=testlist' >>>>>>>>>> >>>>>>>>>> On a batch system, >>>>>>>>>> $export MPITEST_BATCHDIR=`pwd`/btest // specify a batch >>>>>>>>>> dir, say btest, >>>>>>>>>> $./runtests -batch -mpiexec=mpirun -np=1024 -tests=testlist // >>>>>>>>>> Use 1024 ranks if a test does no specify the number of processes. >>>>>>>>>> $ // It copies test binaries to the batch dir and generates a >>>>>>>>>> script runtests.batch there. Edit the script to fit your batch system and >>>>>>>>>> then submit a job and wait for its finish. >>>>>>>>>> $ cd btest && ../checktests --ignorebogus >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> PS: Fande, changing an MPI fixed your problem does not >>>>>>>>>> necessarily mean the old MPI has bugs. It is complicated. It could be a >>>>>>>>>> petsc bug. You need to provide us a code to reproduce your error. It does >>>>>>>>>> not matter if the code is big. 
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Thanks >>>>>>>>>>> >>>>>>>>>>> Barry >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Jul 20, 2020, at 12:16 AM, Fande Kong >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Trace could look like this: >>>>>>>>>>> >>>>>>>>>>> [640]PETSC ERROR: --------------------- Error Message >>>>>>>>>>> -------------------------------------------------------------- >>>>>>>>>>> [640]PETSC ERROR: Argument out of range >>>>>>>>>>> [640]PETSC ERROR: key 45226154 is greater than largest key >>>>>>>>>>> allowed 740521 >>>>>>>>>>> [640]PETSC ERROR: See >>>>>>>>>>> https://www.mcs.anl.gov/petsc/documentation/faq.html for >>>>>>>>>>> trouble shooting. >>>>>>>>>>> [640]PETSC ERROR: Petsc Release Version 3.13.3, unknown >>>>>>>>>>> [640]PETSC ERROR: ../../griffin-opt on a arch-moose named >>>>>>>>>>> r6i5n18 by wangy2 Sun Jul 19 17:14:28 2020 >>>>>>>>>>> [640]PETSC ERROR: Configure options --download-hypre=1 >>>>>>>>>>> --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1 >>>>>>>>>>> --download-metis=1 --download-ptscotch=1 --download-parmetis=1 >>>>>>>>>>> --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 >>>>>>>>>>> --download-slepc=1 --with-mpi=1 --with-cxx-dialect=C++11 >>>>>>>>>>> --with-fortran-bindings=0 --with-sowing=0 --with-64-bit-indices >>>>>>>>>>> --download-mumps=0 >>>>>>>>>>> [640]PETSC ERROR: #1 PetscTableFind() line 132 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/include/petscctable.h >>>>>>>>>>> [640]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() line 33 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mmaij.c >>>>>>>>>>> [640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>> [640]PETSC ERROR: #4 MatAssemblyEnd() line 5347 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>>>>>>> [640]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIXAIJ_allatonce() >>>>>>>>>>> line 901 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiptap.c >>>>>>>>>>> [640]PETSC ERROR: #6 MatPtAPNumeric_MPIAIJ_MPIMAIJ_allatonce() >>>>>>>>>>> line 3180 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/maij/maij.c >>>>>>>>>>> [640]PETSC ERROR: #7 MatProductNumeric_PtAP() line 704 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>>>>>>>>> [640]PETSC ERROR: #8 MatProductNumeric() line 759 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>>>>>>>>> [640]PETSC ERROR: #9 MatPtAP() line 9199 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>>>>>>> [640]PETSC ERROR: #10 MatGalerkin() line 10236 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>>>>>>> [640]PETSC ERROR: #11 PCSetUp_MG() line 745 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/mg/mg.c >>>>>>>>>>> [640]PETSC ERROR: #12 PCSetUp_HMG() line 220 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/hmg/hmg.c >>>>>>>>>>> [640]PETSC ERROR: #13 PCSetUp() line 898 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/interface/precon.c >>>>>>>>>>> [640]PETSC ERROR: #14 KSPSetUp() line 376 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>>>>> [640]PETSC ERROR: #15 
KSPSolve_Private() line 633 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>>>>> [640]PETSC ERROR: #16 KSPSolve() line 853 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>>>>> [640]PETSC ERROR: #17 SNESSolve_NEWTONLS() line 225 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/impls/ls/ls.c >>>>>>>>>>> [640]PETSC ERROR: #18 SNESSolve() line 4519 in >>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/interface/snes.c >>>>>>>>>>> >>>>>>>>>>> On Sun, Jul 19, 2020 at 11:13 PM Fande Kong >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> I am not entirely sure what is happening, but we encountered >>>>>>>>>>>> similar issues recently. It was not reproducible. It might occur at >>>>>>>>>>>> different stages, and errors could be weird other than "ctable stuff." Our >>>>>>>>>>>> code was Valgrind clean since every PR in moose needs to go through >>>>>>>>>>>> rigorous Valgrind checks before it reaches the devel branch. The errors >>>>>>>>>>>> happened when we used mvapich. >>>>>>>>>>>> >>>>>>>>>>>> We changed to use HPE-MPT (a vendor stalled MPI), then >>>>>>>>>>>> everything was smooth. May you try a different MPI? It is better to try a >>>>>>>>>>>> system carried one. >>>>>>>>>>>> >>>>>>>>>>>> We did not get the bottom of this problem yet, but we at least >>>>>>>>>>>> know this is kind of MPI-related. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> >>>>>>>>>>>> Fande, >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Sun, Jul 19, 2020 at 3:28 PM Chris Hewson >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> I am having a bug that is occurring in PETSC with the return >>>>>>>>>>>>> string: >>>>>>>>>>>>> >>>>>>>>>>>>> [7]PETSC ERROR: PetscTableFind() line 132 in >>>>>>>>>>>>> /home/chewson/petsc-3.13.2/include/petscctable.h key 7556 is greater than >>>>>>>>>>>>> largest key allowed 5693 >>>>>>>>>>>>> >>>>>>>>>>>>> This is using petsc-3.13.2, compiled and running using mpich >>>>>>>>>>>>> with -O3 and debugging turned off tuned to the haswell architecture and >>>>>>>>>>>>> occurring either before or during a KSPBCGS solve/setup or during a MUMPS >>>>>>>>>>>>> factorization solve (I haven't been able to replicate this issue with the >>>>>>>>>>>>> same set of instructions etc.). >>>>>>>>>>>>> >>>>>>>>>>>>> This is a terrible way to ask a question, I know, and not very >>>>>>>>>>>>> helpful from your side, but this is what I have from a user's run and can't >>>>>>>>>>>>> reproduce on my end (either with the optimization compilation or with >>>>>>>>>>>>> debugging turned on). This happens when the code has run for quite some >>>>>>>>>>>>> time and is happening somewhat rarely. >>>>>>>>>>>>> >>>>>>>>>>>>> More than likely I am using a static variable (code is written >>>>>>>>>>>>> in c++) that I'm not updating when the matrix size is changing or something >>>>>>>>>>>>> silly like that, but any help or guidance on this would be appreciated. >>>>>>>>>>>>> >>>>>>>>>>>>> *Chris Hewson* >>>>>>>>>>>>> Senior Reservoir Simulation Engineer >>>>>>>>>>>>> ResFrac >>>>>>>>>>>>> +1.587.575.9792 >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>> >>>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. 
>>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Sep 26 12:07:26 2020 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 26 Sep 2020 13:07:26 -0400 Subject: [petsc-users] Tough to reproduce petsctablefind error In-Reply-To: References: <0AC37384-BC37-4A6C-857D-41CD507F84C2@petsc.dev> <8952CCCF-14F2-4102-91B4-921A54689813@petsc.dev> <1A56D208-6BF6-49BF-B653-FD9D15BB3BDD@petsc.dev> <4E113614-C8C0-44F5-AD95-DC37A9D4B5F5@petsc.dev> <7DBD79E0-B593-4364-84E2-6204BECC168E@petsc.dev> Message-ID: On Sat, Sep 26, 2020 at 11:17 AM Mark McClure wrote: > Thank you, all for the explanations. > > Following Matt's suggestion, we'll use -g (and not use -with-debugging=0) > all future compiles to all users, so in future, we can provide better > information. > > Second, Chris is going to boil our function down to minimum stub and share > in case there is some subtle issue with the way the functions are being > called. > > Third, I have question/request - Petsc is, in fact, detecting an error. As > far as I can tell, this is not an uncontrolled 'seg fault'. It seems to me > that maybe Petsc could choose to return out from the function when it > detects this error, returning an error code, rather than dumping the core > and terminating the program. If Petsc simply returned out with an error > message, this would resolve the problem for us. After the Petsc call, we > check for Petsc error messages. If Petsc returns an error - that's fine - > we use a direct solver as a backup, and the simulation continues. So - I am > not sure whether this is feasible - but if Petsc could return out with an > error message - rather than dumping the core and terminating the program - > then that would effectively resolve the issue for us. Would this change be > possible? > At some level, I think it is currently doing what you want. CHKERRQ() simply returns an error code from that function call, printing an error message. Suppressing the message is harder I think, but for now, if you know what function call is causing the error, you can just catch the (ierr != 0) yourself instead of using CHKERRQ. The drawback here is that we might not have cleaned up all the state so that restarting makes sense. It should be possible to just kill the solve, reset the solver, and retry, although it is not clear to me at first glance if MPI will be in an okay state. Thanks, Matt > Mark > > On Fri, Sep 25, 2020 at 2:08 PM Barry Smith wrote: > >> >> I May 2019 Lisandro changed the versions of Metis and ParMetis that >> PETSc uses to use a portable machine independent random number generator so >> if you are having PETSc install Metis then its random number generator >> should generate the exact same random numbers on repeated identical runs on >> the same machine or different machines. If this does not appear to be the >> case please let us know. PETSc doesn't use random numbers much but when it >> does they are all portable and machine independent and produce identical >> results for identical runs. 
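To make Matt's suggestion above concrete, catching the nonzero return yourself instead of letting CHKERRQ propagate might look roughly like the sketch below. This is an assumption about application-side code, not a PETSc-provided recovery path; per Matt's caveat, solver-internal and MPI state may not be fully cleaned up, so it is best-effort only.

    ierr = KSPSolve(ksp, b, x);          /* deliberately not wrapped in CHKERRQ */
    if (ierr) {
      /* PETSc has already printed its error message for this rank.
         Reset the solver and let the application fall back, e.g. to its
         direct-solver backup, then continue the simulation. */
      ierr = KSPReset(ksp);
      /* application-specific fallback goes here */
    }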
>> >> Due to the non-commutativity of floating point arithmetic with >> parallelism (numbers arrive in different orders on identical runs) the >> numerical results will differ and hence decisions (iteration convergence >> etc) will differ so the "results" can vary a great deal in different >> identical runs. >> >> We did have a user getting "random" errors only after very long runs >> that were due to "hardware errors" perhaps due to overheating so it is also >> possible your problems are not due directly to memory corruption, PETSc or >> MPI or even OS software. This may be too complicated for your work flow but >> perhaps if you restricted your jobs to shorter times with a restart >> mechanism these problems would not appear. >> >> Barry >> >> Truly random "hardware" errors often produce just wildly crazy numbers, >> in your report the numbers are off but not absurdly large which points to >> possible software errors. >> >> >> >> >> On Sep 25, 2020, at 12:02 PM, Mark McClure wrote: >> >> Hello, >> >> I work with Chris, and have a few comments, hopefully helpful. Thank you >> all, for your help. >> >> Our program is unfortunately behaving a little bit nondeterministically. >> I am not sure why because for the OpenMP parts, I test it for race >> conditions using Intel Inspector and see none. We are using Metis, >> not Parmetis. Do Petsc or Metis have any behavior that is nondeterministic? >> We will continue to investigate the cause of the nondeterminism. But that >> is a challenge for us reproducing this problem because we can run the same >> simulation 10 times, and we do not get precisely the same result each time. >> In fact, for this bug, Chris did run it 10 times and did not reproduce. >> >> Every month, 1000s of hours of this simulator are being run on the >> Azure cluster. This crash has been occurring for months, but at infrequent >> intervals, and have never been able to reproduce it. As such, we can't >> generate an error dump, unless we gave all users a Petsc build with no >> optimization and waited weeks until a problem cropped up. >> >> Here's what we'll do - we'll internally make a build with debug and then >> run a large number of simulations until the problem reproduces and we get >> the error dump. That will take us some time, but we'll do it. >> >> Here is a bit more background that might be helpful. At first, we used >> OpenMPI 2.1.1-8 with Petsc 3.13.2. With that combo, we observed a memory >> leak, and simulations failed when the node ran out of RAM. Then we switched >> to MPICH3.3a2 and Petsc 3.13.3. The memory leak went away. That's when we >> started seeing this bug "greater than largest key allowed". It was >> unreproducible, but happening relatively more often than it is now (I >> think) - I was getting a couple user reports a week. Then, we updated to >> MPICH 3.3.2 and the same Petsc version (3.13.3). The problem seems to be >> less common - hadn't happened for the past month. But then it happened four >> times this week. >> >> Other background to note - our linear system very frequently changes >> size/shape. There will be 10,000s of Petsc solves with different matrices >> (different positions of nonzero values) over the course of a simulation. As >> far as we can tell, the crash only occurs only after the simulator has run >> for a long time, 10+ hours. Having said that, it does not appear to be a >> memory leak - at least, the node has plenty of remaining RAM when these >> crashes are occurring. 
>> >> Mark >> >> >> On Thu, Sep 24, 2020 at 4:41 PM Mark Adams wrote: >> >>> You might add code here like: >>> >>> if (ierr) { >>> for (; iB->rmap->n; i++) { >>> for ( jilen[i]; j++) { >>> PetscInt data,gid1 = aj[B->i[i] + j] + 1; // don't use ierr >>> print rank, gid1; >>> } >>> CHKERRQ(ierr); >>> >>> I am guessing that somehow you have a table that is bad and too small. >>> It failed on an index not much bigger than the largest key allowed. Maybe >>> just compute the max and see if it goes much larger than the largest key >>> allowed. >>> >>> If your mesh just changed to you know if it got bigger or smaller... >>> >>> Anyway just some thoughts, >>> Mark >>> >>> >>> >>> On Thu, Sep 24, 2020 at 4:18 PM Barry Smith wrote: >>> >>>> >>>> >>>> On Sep 24, 2020, at 2:47 PM, Matthew Knepley wrote: >>>> >>>> On Thu, Sep 24, 2020 at 3:42 PM Barry Smith wrote: >>>> >>>>> >>>>> The stack is listed below. It crashes inside MatPtAP(). >>>>> >>>> >>>> What about just checking that the column indices that PtAP receives are >>>> valid? Are we not doing that? >>>> >>>> >>>> The code that checks for column too large in MatSetValues_MPIXXAIJ() >>>> is turned off for optimized builds, I am making a MR to always have it on. >>>> But I doubt this is the problem, other, more harsh crashes, are likely if >>>> the column index is too large. >>>> >>>> This is difficult to debug because all we get is a stack trace. It >>>> would be good if we produce some information about the current state of the >>>> objects when the error is detected. We should think about what light-weight >>>> stuff we could report when errors are detected. >>>> >>>> >>>> Barry >>>> >>>> >>>> Matt >>>> >>>> >>>>> It is possible there is some subtle bug in the rather complex PETSc >>>>> code for MatPtAP() but I am included to blame MPI again. >>>>> >>>>> I think we should add some simple low-overhead always on >>>>> communication error detecting code to PetscSF where some check sums are >>>>> also communicated at the highest level of PetscSF(). >>>>> >>>>> I don't know how but perhaps when the data is packed per >>>>> destination rank a checksum is computed and when unpacked the checksum is >>>>> compared using extra space at the end of the communicated packed array to >>>>> store and send the checksum. Yes, it is kind of odd for a high level >>>>> library like PETSc to not trust the communication channel but MPI >>>>> implementations have proven themselves to not be trustworthy and adding >>>>> this to PetscSF is not intrusive to the PETSc API or user. Plus it gives a >>>>> definitive yes or no as to the problem being from an error in the >>>>> communication. >>>>> >>>>> Barry >>>>> >>>>> On Sep 24, 2020, at 12:35 PM, Matthew Knepley >>>>> wrote: >>>>> >>>>> On Thu, Sep 24, 2020 at 1:22 PM Chris Hewson >>>>> wrote: >>>>> >>>>>> Hi Guys, >>>>>> >>>>>> Thanks for all of the prompt responses, very helpful and appreciated. >>>>>> >>>>>> By "when debugging", did you mean when configure >>>>>> petsc --with-debugging=1 COPTFLAGS=-O0 -g etc or when you attached a >>>>>> debugger? >>>>>> - Both, I have run with a debugger attached and detached, all >>>>>> compiled with the following flags "--with-debugging=1 COPTFLAGS=-O0 -g" >>>>>> >>>>>> 1) Try OpenMPI (probably won't help, but worth trying) >>>>>> - Worth a try for sure >>>>>> >>>>>> 2) Find which part of the simulation makes it non-deterministic. Is >>>>>> it the mesh partitioning (parmetis)? Then try to make it deterministic. 
>>>>>> - Good tip, it is the mesh partitioning and along the lines of a >>>>>> question from Barry, the matrix size is changing. I will make this >>>>>> deterministic and give it a try >>>>>> >>>>>> 3) Dump matrices, vectors, etc and see when it fails, you can quickly >>>>>> reproduce the error by reading in the intermediate data. >>>>>> - Also a great suggestion, will give it a try >>>>>> >>>>>> The full stack would be really useful here. I am guessing this >>>>>> happens on MatMult(), but I do not know. >>>>>> - Agreed, I am currently running it so that the full stack will be >>>>>> produced, but waiting for it to fail, had compiled with all available >>>>>> optimizations on, but downside is of course if there is a failure. >>>>>> As a general question, roughly what's the performance impact on petsc >>>>>> with -o1/-o2/-o0 as opposed to -o3? Performance impact of --with-debugging >>>>>> = 1? >>>>>> Obviously problem/machine dependant, wondering on guidance more for >>>>>> this than anything >>>>>> >>>>>> Is the nonzero structure of your matrices changing or is it fixed for >>>>>> the entire simulation? >>>>>> The non-zero structure is changing, although the structures are >>>>>> reformed when this happens and this happens thousands of time before the >>>>>> failure has occured. >>>>>> >>>>> >>>>> Okay, this is the most likely spot for a bug. How are you changing the >>>>> matrix? It should be impossible to put in an invalid column index when >>>>> using MatSetValues() >>>>> because we check all the inputs. However, I do not think we check when >>>>> you just yank out the arrays. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> Does this particular run always crash at the same place? Similar >>>>>> place? Doesn't always crash? >>>>>> Doesn't always crash, but other similar runs have crashed in >>>>>> different spots, which makes it difficult to resolve. I am going to try out >>>>>> a few of the strategies suggested above and will let you know what comes of >>>>>> that. >>>>>> >>>>>> *Chris Hewson* >>>>>> Senior Reservoir Simulation Engineer >>>>>> ResFrac >>>>>> +1.587.575.9792 >>>>>> >>>>>> >>>>>> On Thu, Sep 24, 2020 at 11:05 AM Barry Smith >>>>>> wrote: >>>>>> >>>>>>> Chris, >>>>>>> >>>>>>> We realize how frustrating this type of problem is to deal with. >>>>>>> >>>>>>> Here is the code: >>>>>>> >>>>>>> ierr = >>>>>>> PetscTableCreate(aij->B->rmap->n,mat->cmap->N+1,&gid1_lid1);CHKERRQ(ierr); >>>>>>> for (i=0; iB->rmap->n; i++) { >>>>>>> for (j=0; jilen[i]; j++) { >>>>>>> PetscInt data,gid1 = aj[B->i[i] + j] + 1; >>>>>>> ierr = PetscTableFind(gid1_lid1,gid1,&data);CHKERRQ(ierr); >>>>>>> if (!data) { >>>>>>> /* one based table */ >>>>>>> ierr = >>>>>>> PetscTableAdd(gid1_lid1,gid1,++ec,INSERT_VALUES);CHKERRQ(ierr); >>>>>>> } >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> It is simply looping over the rows of the sparse matrix putting >>>>>>> the columns it finds into the hash table. >>>>>>> >>>>>>> aj[B->i[i] + j] are the column entries, the number of columns in >>>>>>> the matrix is mat->cmap->N so the column entries should always be >>>>>>> less than the number of columns. The code is crashing when column >>>>>>> entry 24443 which is larger than the number of columns 23988. >>>>>>> >>>>>>> So either the aj[B->i[i] + j] + 1 are incorrect or the mat->cmap->N >>>>>>> is incorrect. 
>>>>>>> >>>>>>> 640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in >>>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>> if (!mat->was_assembled && mode == MAT_FINAL_ASSEMBLY) { >>>>>>> ierr = MatSetUpMultiply_MPIAIJ(mat);CHKERRQ(ierr); >>>>>>> } >>>>>>> >>>>>>> Seems to indicate it is setting up a new multiple because it is >>>>>>> either the first time into the algorithm or the nonzero structure changed >>>>>>> on some rank requiring a new assembly process. >>>>>>> >>>>>>> Is the nonzero structure of your matrices changing or is it >>>>>>> fixed for the entire simulation? >>>>>>> >>>>>>> Since the code has been running for a very long time already I have >>>>>>> to conclude that this is not the first time through and so something has >>>>>>> changed in the matrix? >>>>>>> >>>>>>> I think we have to put more diagnostics into the library to provide >>>>>>> more information before or at the time of the error detection. >>>>>>> >>>>>>> Does this particular run always crash at the same place? Similar >>>>>>> place? Doesn't always crash? >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Sep 24, 2020, at 8:46 AM, Chris Hewson wrote: >>>>>>> >>>>>>> After about a month of not having this issue pop up, it has come up >>>>>>> again >>>>>>> >>>>>>> We have been struggling with a similar PETSc Error for awhile now, >>>>>>> the error message is as follows: >>>>>>> >>>>>>> [7]PETSC ERROR: PetscTableFind() line 132 in >>>>>>> /home/chewson/petsc-3.13.3/include/petscctable.h key 24443 is greater than >>>>>>> largest key allowed 23988 >>>>>>> >>>>>>> It is a particularly nasty bug as it doesn't reproduce itself when >>>>>>> debugging and doesn't happen all the time with the same inputs either. The >>>>>>> problem occurs after a long runtime of the code (12-40 hours) and we are >>>>>>> using a ksp solve with KSPBCGS. >>>>>>> >>>>>>> The PETSc compilation options that are used are: >>>>>>> >>>>>>> --download-metis >>>>>>> --download-mpich >>>>>>> --download-mumps >>>>>>> --download-parmetis >>>>>>> --download-ptscotch >>>>>>> --download-scalapack >>>>>>> --download-suitesparse >>>>>>> --prefix=/opt/anl/petsc-3.13.3 >>>>>>> --with-debugging=0 >>>>>>> --with-mpi=1 >>>>>>> COPTFLAGS=-O3 -march=haswell -mtune=haswell >>>>>>> CXXOPTFLAGS=-O3 -march=haswell -mtune=haswell >>>>>>> FOPTFLAGS=-O3 -march=haswell -mtune=haswell >>>>>>> >>>>>>> This is being run across 8 processes and is failing consistently on >>>>>>> the rank 7 process. We also use openmp outside of PETSC and the linear >>>>>>> solve portion of the code. The rank 0 process is always using compute, >>>>>>> during this the slave processes use an MPI_Wait call to wait for the >>>>>>> collective parts of the code to be called again. All PETSC calls are done >>>>>>> across all of the processes. >>>>>>> >>>>>>> We are using mpich version 3.3.2, downloaded with the petsc 3.13.3 >>>>>>> package. >>>>>>> >>>>>>> At every PETSC call we are checking the error return from the >>>>>>> function collectively to ensure that no errors have been returned from >>>>>>> PETSC. >>>>>>> >>>>>>> Some possible causes that I can think of are as follows: >>>>>>> 1. Memory leak causing a corruption either in our program or in >>>>>>> petsc or with one of the petsc objects. This seems unlikely as we have >>>>>>> checked runs with the option -malloc_dump for PETSc and using valgrind. >>>>>>> >>>>>>> 2. 
Optimization flags set for petsc compilation are causing >>>>>>> variables that go out of scope to be optimized out. >>>>>>> >>>>>>> 3. We are giving the wrong number of elements for a process or the >>>>>>> value is changing during the simulation. This seems unlikely as there is >>>>>>> nothing overly unique about these simulations and it's not reproducing >>>>>>> itself. >>>>>>> >>>>>>> 4. An MPI channel or socket error causing an error in the collective >>>>>>> values for PETSc. >>>>>>> >>>>>>> Any input on this issue would be greatly appreciated. >>>>>>> >>>>>>> *Chris Hewson* >>>>>>> Senior Reservoir Simulation Engineer >>>>>>> ResFrac >>>>>>> +1.587.575.9792 >>>>>>> >>>>>>> >>>>>>> On Thu, Aug 13, 2020 at 4:21 PM Junchao Zhang < >>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>> >>>>>>>> That is a great idea. I'll figure it out. >>>>>>>> --Junchao Zhang >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Aug 13, 2020 at 4:31 PM Barry Smith >>>>>>>> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> Junchao, >>>>>>>>> >>>>>>>>> Any way in the PETSc configure to warn that MPICH version is >>>>>>>>> "bad" or "untrustworthy" or even the vague "we have suspicians about this >>>>>>>>> version and recommend avoiding it"? A lot of time could be saved if others >>>>>>>>> don't deal with the same problem. >>>>>>>>> >>>>>>>>> Maybe add arrays of suspect_versions for OpenMPI, MPICH, etc >>>>>>>>> and always check against that list and print a boxed warning at configure >>>>>>>>> time? Better you could somehow generalize it and put it in package.py for >>>>>>>>> use by all packages, then any package can included lists of "suspect" >>>>>>>>> versions. (There are definitely HDF5 versions that should be avoided :-)). >>>>>>>>> >>>>>>>>> Barry >>>>>>>>> >>>>>>>>> >>>>>>>>> On Aug 13, 2020, at 12:14 PM, Junchao Zhang < >>>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>>> >>>>>>>>> Thanks for the update. Let's assume it is a bug in MPI :) >>>>>>>>> --Junchao Zhang >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Aug 13, 2020 at 11:15 AM Chris Hewson >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Just as an update to this, I can confirm that using the mpich >>>>>>>>>> version (3.3.2) downloaded with the petsc download solved this issue on my >>>>>>>>>> end. >>>>>>>>>> >>>>>>>>>> *Chris Hewson* >>>>>>>>>> Senior Reservoir Simulation Engineer >>>>>>>>>> ResFrac >>>>>>>>>> +1.587.575.9792 >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Thu, Jul 23, 2020 at 5:58 PM Junchao Zhang < >>>>>>>>>> junchao.zhang at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> On Mon, Jul 20, 2020 at 7:05 AM Barry Smith >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Is there a comprehensive MPI test suite (perhaps from >>>>>>>>>>>> MPICH)? Is there any way to run this full test suite under the problematic >>>>>>>>>>>> MPI and see if it detects any problems? >>>>>>>>>>>> >>>>>>>>>>>> Is so, could someone add it to the FAQ in the debugging >>>>>>>>>>>> section? >>>>>>>>>>>> >>>>>>>>>>> MPICH does have a test suite. It is at the subdir test/mpi of >>>>>>>>>>> downloaded mpich >>>>>>>>>>> . >>>>>>>>>>> It annoyed me since it is not user-friendly. It might be helpful in >>>>>>>>>>> catching bugs at very small scale. But say if I want to test allreduce on >>>>>>>>>>> 1024 ranks on 100 doubles, I have to hack the test suite. >>>>>>>>>>> Anyway, the instructions are here. 
>>>>>>>>>>> >>>>>>>>>>> For the purpose of petsc, under test/mpi one can configure it >>>>>>>>>>> with >>>>>>>>>>> $./configure CC=mpicc CXX=mpicxx FC=mpifort --enable-strictmpi >>>>>>>>>>> --enable-threads=funneled --enable-fortran=f77,f90 --enable-fast >>>>>>>>>>> --disable-spawn --disable-cxx --disable-ft-tests // It is weird I disabled >>>>>>>>>>> cxx but I had to set CXX! >>>>>>>>>>> $make -k -j8 // -k is to keep going and ignore compilation >>>>>>>>>>> errors, e.g., when building tests for MPICH extensions not in MPI standard, >>>>>>>>>>> but your MPI is OpenMPI. >>>>>>>>>>> $ // edit testlist, remove lines mpi_t, rma, f77, impls. Those >>>>>>>>>>> are sub-dirs containing tests for MPI routines Petsc does not rely on. >>>>>>>>>>> $ make testings or directly './runtests -tests=testlist' >>>>>>>>>>> >>>>>>>>>>> On a batch system, >>>>>>>>>>> $export MPITEST_BATCHDIR=`pwd`/btest // specify a batch >>>>>>>>>>> dir, say btest, >>>>>>>>>>> $./runtests -batch -mpiexec=mpirun -np=1024 -tests=testlist // >>>>>>>>>>> Use 1024 ranks if a test does no specify the number of processes. >>>>>>>>>>> $ // It copies test binaries to the batch dir and generates a >>>>>>>>>>> script runtests.batch there. Edit the script to fit your batch system and >>>>>>>>>>> then submit a job and wait for its finish. >>>>>>>>>>> $ cd btest && ../checktests --ignorebogus >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> PS: Fande, changing an MPI fixed your problem does not >>>>>>>>>>> necessarily mean the old MPI has bugs. It is complicated. It could be a >>>>>>>>>>> petsc bug. You need to provide us a code to reproduce your error. It does >>>>>>>>>>> not matter if the code is big. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Thanks >>>>>>>>>>>> >>>>>>>>>>>> Barry >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Jul 20, 2020, at 12:16 AM, Fande Kong >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Trace could look like this: >>>>>>>>>>>> >>>>>>>>>>>> [640]PETSC ERROR: --------------------- Error Message >>>>>>>>>>>> -------------------------------------------------------------- >>>>>>>>>>>> [640]PETSC ERROR: Argument out of range >>>>>>>>>>>> [640]PETSC ERROR: key 45226154 is greater than largest key >>>>>>>>>>>> allowed 740521 >>>>>>>>>>>> [640]PETSC ERROR: See >>>>>>>>>>>> https://www.mcs.anl.gov/petsc/documentation/faq.html for >>>>>>>>>>>> trouble shooting. 
>>>>>>>>>>>> [640]PETSC ERROR: Petsc Release Version 3.13.3, unknown >>>>>>>>>>>> [640]PETSC ERROR: ../../griffin-opt on a arch-moose named >>>>>>>>>>>> r6i5n18 by wangy2 Sun Jul 19 17:14:28 2020 >>>>>>>>>>>> [640]PETSC ERROR: Configure options --download-hypre=1 >>>>>>>>>>>> --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1 >>>>>>>>>>>> --download-metis=1 --download-ptscotch=1 --download-parmetis=1 >>>>>>>>>>>> --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 >>>>>>>>>>>> --download-slepc=1 --with-mpi=1 --with-cxx-dialect=C++11 >>>>>>>>>>>> --with-fortran-bindings=0 --with-sowing=0 --with-64-bit-indices >>>>>>>>>>>> --download-mumps=0 >>>>>>>>>>>> [640]PETSC ERROR: #1 PetscTableFind() line 132 in >>>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/include/petscctable.h >>>>>>>>>>>> [640]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() line 33 in >>>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mmaij.c >>>>>>>>>>>> [640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in >>>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>>> [640]PETSC ERROR: #4 MatAssemblyEnd() line 5347 in >>>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>>>>>>>> [640]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIXAIJ_allatonce() >>>>>>>>>>>> line 901 in >>>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiptap.c >>>>>>>>>>>> [640]PETSC ERROR: #6 MatPtAPNumeric_MPIAIJ_MPIMAIJ_allatonce() >>>>>>>>>>>> line 3180 in >>>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/maij/maij.c >>>>>>>>>>>> [640]PETSC ERROR: #7 MatProductNumeric_PtAP() line 704 in >>>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>>>>>>>>>> [640]PETSC ERROR: #8 MatProductNumeric() line 759 in >>>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>>>>>>>>>> [640]PETSC ERROR: #9 MatPtAP() line 9199 in >>>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>>>>>>>> [640]PETSC ERROR: #10 MatGalerkin() line 10236 in >>>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>>>>>>>>>> [640]PETSC ERROR: #11 PCSetUp_MG() line 745 in >>>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/mg/mg.c >>>>>>>>>>>> [640]PETSC ERROR: #12 PCSetUp_HMG() line 220 in >>>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/hmg/hmg.c >>>>>>>>>>>> [640]PETSC ERROR: #13 PCSetUp() line 898 in >>>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/interface/precon.c >>>>>>>>>>>> [640]PETSC ERROR: #14 KSPSetUp() line 376 in >>>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>>>>>> [640]PETSC ERROR: #15 KSPSolve_Private() line 633 in >>>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>>>>>> [640]PETSC ERROR: #16 KSPSolve() line 853 in >>>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>>>>>> [640]PETSC ERROR: #17 SNESSolve_NEWTONLS() line 225 in >>>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/impls/ls/ls.c >>>>>>>>>>>> [640]PETSC ERROR: #18 SNESSolve() line 4519 in >>>>>>>>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/interface/snes.c >>>>>>>>>>>> >>>>>>>>>>>> On Sun, Jul 19, 2020 
at 11:13 PM Fande Kong < >>>>>>>>>>>> fdkong.jd at gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> I am not entirely sure what is happening, but we encountered >>>>>>>>>>>>> similar issues recently. It was not reproducible. It might occur at >>>>>>>>>>>>> different stages, and errors could be weird other than "ctable stuff." Our >>>>>>>>>>>>> code was Valgrind clean since every PR in moose needs to go through >>>>>>>>>>>>> rigorous Valgrind checks before it reaches the devel branch. The errors >>>>>>>>>>>>> happened when we used mvapich. >>>>>>>>>>>>> >>>>>>>>>>>>> We changed to use HPE-MPT (a vendor stalled MPI), then >>>>>>>>>>>>> everything was smooth. May you try a different MPI? It is better to try a >>>>>>>>>>>>> system carried one. >>>>>>>>>>>>> >>>>>>>>>>>>> We did not get the bottom of this problem yet, but we at least >>>>>>>>>>>>> know this is kind of MPI-related. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> >>>>>>>>>>>>> Fande, >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Sun, Jul 19, 2020 at 3:28 PM Chris Hewson < >>>>>>>>>>>>> chris at resfrac.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I am having a bug that is occurring in PETSC with the return >>>>>>>>>>>>>> string: >>>>>>>>>>>>>> >>>>>>>>>>>>>> [7]PETSC ERROR: PetscTableFind() line 132 in >>>>>>>>>>>>>> /home/chewson/petsc-3.13.2/include/petscctable.h key 7556 is greater than >>>>>>>>>>>>>> largest key allowed 5693 >>>>>>>>>>>>>> >>>>>>>>>>>>>> This is using petsc-3.13.2, compiled and running using mpich >>>>>>>>>>>>>> with -O3 and debugging turned off tuned to the haswell architecture and >>>>>>>>>>>>>> occurring either before or during a KSPBCGS solve/setup or during a MUMPS >>>>>>>>>>>>>> factorization solve (I haven't been able to replicate this issue with the >>>>>>>>>>>>>> same set of instructions etc.). >>>>>>>>>>>>>> >>>>>>>>>>>>>> This is a terrible way to ask a question, I know, and not >>>>>>>>>>>>>> very helpful from your side, but this is what I have from a user's run and >>>>>>>>>>>>>> can't reproduce on my end (either with the optimization compilation or with >>>>>>>>>>>>>> debugging turned on). This happens when the code has run for quite some >>>>>>>>>>>>>> time and is happening somewhat rarely. >>>>>>>>>>>>>> >>>>>>>>>>>>>> More than likely I am using a static variable (code is >>>>>>>>>>>>>> written in c++) that I'm not updating when the matrix size is changing or >>>>>>>>>>>>>> something silly like that, but any help or guidance on this would be >>>>>>>>>>>>>> appreciated. >>>>>>>>>>>>>> >>>>>>>>>>>>>> *Chris Hewson* >>>>>>>>>>>>>> Senior Reservoir Simulation Engineer >>>>>>>>>>>>>> ResFrac >>>>>>>>>>>>>> +1.587.575.9792 >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>> >>>>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> >>>> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From gautham3 at illinois.edu Sat Sep 26 15:13:19 2020 From: gautham3 at illinois.edu (Krishnan, Gautham) Date: Sat, 26 Sep 2020 20:13:19 +0000 Subject: [petsc-users] Regarding help with MatGetSubMatrix parallel use In-Reply-To: References: , , Message-ID: In parallel, I have passed the global indices that I want on the PE. However, the result I obtain seems to change based on whether I divide the initial domain between PEs along the x axis or the y axis. This is what I am trying to resolve- for example: the FORTRAN code I use is basically: PetscInt :: idx(ngx*ngy*ngz) Mat :: D2_t, lapl, , lapl_t IS :: irow, irow1, icol1 call DMCreateMatrix(grid_pressure, lapl, ierr) call MatSetOption(lapl_t, MAT_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) call MatSetOption(lapl_t, MAT_IGNORE_ZERO_ENTRIES,PETSC_TRUE,ierr) call DMCreateMatrix(grid_pressure, lapl_t, ierr) call MatSetOption(lapl_t, MAT_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) call MatSetOption(lapl_t, MAT_IGNORE_ZERO_ENTRIES,PETSC_TRUE,ierr) . . . !(Assembling matrix lapl) call VecCreate(petsc_xyzcom, vec_b, ierr) call VecSetSizes(vec_b, PETSC_DECIDE, ngx*ngy*ngz-1, ierr) call VecSetFromOptions(vec_b, ierr) call VecSetUp(vec_b, ierr) call VecGetLocalSize(vec_b,vecsx,ierr) call VecGetOwnershipRange(vec_b,veclo,vechi,ierr) call ISCreateStride(petsc_xyzcom, vecsx,veclo,1,icol1,ierr) idx = (/ (j, j=veclo,vechi-1)/) call ISCreateGeneral(petsc_xyzcom, vecsx,idx, PETSC_COPY_VALUES,irow,ierr) call MatTranspose(lapl, MAT_INPLACE_MATRIX, lapl_t,ierr) !transpose the laplacian call MatGetSubMatrix(lapl_t, irow, icol1, MAT_INITIAL_MATRIX, D2_t,ierr) call MatView(lapl_t, PETSC_VIEWER_STDOUT_WORLD, ierr) call MatView(D2_t, PETSC_VIEWER_STDOUT_WORLD, ierr) Output: ngx=ngy=4; ngz=1; such that n=4*4*1=16 lapl_t: row 0: (0, -3.) row 1: (0, 3.) (1, -3.) (5, -9.) row 2: (2, -3.) (3, -3.) (6, -9.) row 3: (3, 3.) row 4: (4, -3.) (5, -9.) row 5: (1, 3.) (4, 3.) (5, 36.) (6, -9.) (9, -9.) row 6: (2, 3.) (5, -9.) (6, 36.) (7, -3.) (10, -9.) row 7: (6, -9.) (7, 3.) row 8: (8, -3.) (9, -9.) row 9: (5, -9.) (8, 3.) (9, 36.) (10, -9.) (13, -3.) row 10: (6, -9.) (9, -9.) (10, 36.) (11, -3.) (14, -3.) row 11: (10, -9.) (11, 3.) row 12: (12, -3.) row 13: (9, -9.) (12, 3.) (13, 3.) row 14: (10, -9.) (14, 3.) (15, -3.) row 15: (15, 3.) Case 1: nprocx =1; nprocy=2; ! number of processors in x and y among which to divide the domain Here, the (n-1 x n-1) submatrix is extracted correctly as D2_t: row 0: (0, -3.) row 1: (0, 3.) (1, -3.) (5, -9.) row 2: (2, -3.) (3, -3.) (6, -9.) row 3: (3, 3.) row 4: (4, -3.) (5, -9.) row 5: (1, 3.) (4, 3.) (5, 36.) (6, -9.) (9, -9.) row 6: (2, 3.) (5, -9.) (6, 36.) (7, -3.) (10, -9.) row 7: (6, -9.) (7, 3.) row 8: (8, -3.) (9, -9.) row 9: (5, -9.) (8, 3.) (9, 36.) (10, -9.) (13, -3.) row 10: (6, -9.) (9, -9.) (10, 36.) (11, -3.) (14, -3.) row 11: (10, -9.) (11, 3.) row 12: (12, -3.) row 13: (9, -9.) (12, 3.) (13, 3.) row 14: (10, -9.) (14, 3.) However, for Case 2: nprocx =2; nprocy=1; lapl_t is correctly assembled and transposed but the (n-1 x n-1) submatrix is extracted incorrectly as D2_t: row 0: (0, -3.) row 1: (0, 3.) (1, -3.) (3, -9.) row 2: (2, -3.) (3, -9.) row 3: (1, 3.) (2, 3.) (3, 36.) (5, -9.) (10, -9.) row 4: (4, -3.) (5, -9.) row 5: (3, -9.) (4, 3.) (5, 36.) (7, -3.) (12, -9.) row 6: (6, -3.) row 7: (5, -9.) (6, 3.) (7, 3.) row 8: (8, -3.) (9, -3.) (10, -9.) row 9: (9, 3.) row 10: (3, -9.) (8, 3.) (10, 36.) 
(11, -3.) (12, -9.) row 11: (10, -9.) (11, 3.) row 12: (5, -9.) (10, -9.) (12, 36.) (13, -3.) (14, -3.) row 13: (12, -9.) (13, 3.) row 14: (12, -9.) (14, 3.) I am unable to understand why the extracted submatrix is incorrect when nprocx>1 but works when nprocx=1 and nprocy>=1. P.S. the parallel IS in cases 1 and 2 are the same and are as follows: irow: [0] Number of indices in set 8 [0] 0 0 [0] 1 1 [0] 2 2 [0] 3 3 [0] 4 4 [0] 5 5 [0] 6 6 [0] 7 7 [1] Number of indices in set 7 [1] 0 8 [1] 1 9 [1] 2 10 [1] 3 11 [1] 4 12 [1] 5 13 [1] 6 14 icol1: [0] Index set is permutation [0] Number of indices in (stride) set 8 [0] 0 0 [0] 1 1 [0] 2 2 [0] 3 3 [0] 4 4 [0] 5 5 [0] 6 6 [0] 7 7 [1] Number of indices in (stride) set 7 [1] 0 8 [1] 1 9 [1] 2 10 [1] 3 11 [1] 4 12 [1] 5 13 [1] 6 14 Could you please help me find out what is going wrong here? Regards, Gautham ________________________________ From: Krishnan, Gautham Sent: Saturday, September 26, 2020 2:01 PM To: Matthew Knepley Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Regarding help with MatGetSubMatrix parallel use In parallel I have passed the global indices that I want on the PE. However, the result I obtain seems to change based on whether I divide the initial domain between PEs along the x axis or the y axis. This is what I am trying to resolve- for example: the FORTRAN code I use is basically: PetscInt :: idx(ngx*ngy*ngz) Mat :: D2_t, lapl, , lapl_t IS :: irow, irow1, icol1 call DMCreateMatrix(grid_pressure, lapl, ierr) call MatSetOption(lapl_t, MAT_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) call MatSetOption(lapl_t, MAT_IGNORE_ZERO_ENTRIES,PETSC_TRUE,ierr) call DMCreateMatrix(grid_pressure, lapl_t, ierr) call MatSetOption(lapl_t, MAT_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) call MatSetOption(lapl_t, MAT_IGNORE_ZERO_ENTRIES,PETSC_TRUE,ierr) . . . !(Assembling matrix lapl) call VecCreate(petsc_xyzcom, vec_b, ierr) call VecSetSizes(vec_b, PETSC_DECIDE, ngx*ngy*ngz-1, ierr) call VecSetFromOptions(vec_b, ierr) call VecSetUp(vec_b, ierr) call VecGetLocalSize(vec_b,vecsx,ierr) call VecGetOwnershipRange(vec_b,veclo,vechi,ierr) call ISCreateStride(petsc_xyzcom, vecsx,veclo,1,icol1,ierr) idx = (/ (j, j=veclo,vechi-1)/) call ISCreateGeneral(petsc_xyzcom, vecsx,idx, PETSC_COPY_VALUES,irow,ierr) call MatTranspose(lapl, MAT_INPLACE_MATRIX, lapl_t,ierr) !transpose the laplacian call MatGetSubMatrix(lapl_t, irow, icol1, MAT_INITIAL_MATRIX, D2_t,ierr) call MatView(lapl_t, PETSC_VIEWER_STDOUT_WORLD, ierr) call MatView(D2_t, PETSC_VIEWER_STDOUT_WORLD, ierr) Output: ngx=ngy=4; ngz=1; such that n=4*4*1=16 lapl_t: row 0: (0, -3.) row 1: (0, 3.) (1, -3.) (5, -9.) row 2: (2, -3.) (3, -3.) (6, -9.) row 3: (3, 3.) row 4: (4, -3.) (5, -9.) row 5: (1, 3.) (4, 3.) (5, 36.) (6, -9.) (9, -9.) row 6: (2, 3.) (5, -9.) (6, 36.) (7, -3.) (10, -9.) row 7: (6, -9.) (7, 3.) row 8: (8, -3.) (9, -9.) row 9: (5, -9.) (8, 3.) (9, 36.) (10, -9.) (13, -3.) row 10: (6, -9.) (9, -9.) (10, 36.) (11, -3.) (14, -3.) row 11: (10, -9.) (11, 3.) row 12: (12, -3.) row 13: (9, -9.) (12, 3.) (13, 3.) row 14: (10, -9.) (14, 3.) (15, -3.) row 15: (15, 3.) Case 1: nprocx =1; nprocy=2; ! number of processors in x and y among which to divide the domain Here, the (n-1 x n-1) submatrix is extracted correctly as D2_t: row 0: (0, -3.) row 1: (0, 3.) (1, -3.) (5, -9.) row 2: (2, -3.) (3, -3.) (6, -9.) row 3: (3, 3.) row 4: (4, -3.) (5, -9.) row 5: (1, 3.) (4, 3.) (5, 36.) (6, -9.) (9, -9.) row 6: (2, 3.) (5, -9.) (6, 36.) (7, -3.) (10, -9.) row 7: (6, -9.) (7, 3.) row 8: (8, -3.) (9, -9.) 
row 9: (5, -9.) (8, 3.) (9, 36.) (10, -9.) (13, -3.) row 10: (6, -9.) (9, -9.) (10, 36.) (11, -3.) (14, -3.) row 11: (10, -9.) (11, 3.) row 12: (12, -3.) row 13: (9, -9.) (12, 3.) (13, 3.) row 14: (10, -9.) (14, 3.) However, for Case 2: nprocx =2; nprocy=1; lapl_t is correctly assembled and transposed but the (n-1 x n-1) submatrix is extracted incorrectly as D2_t: row 0: (0, -3.) row 1: (0, 3.) (1, -3.) (3, -9.) row 2: (2, -3.) (3, -9.) row 3: (1, 3.) (2, 3.) (3, 36.) (5, -9.) (10, -9.) row 4: (4, -3.) (5, -9.) row 5: (3, -9.) (4, 3.) (5, 36.) (7, -3.) (12, -9.) row 6: (6, -3.) row 7: (5, -9.) (6, 3.) (7, 3.) row 8: (8, -3.) (9, -3.) (10, -9.) row 9: (9, 3.) row 10: (3, -9.) (8, 3.) (10, 36.) (11, -3.) (12, -9.) row 11: (10, -9.) (11, 3.) row 12: (5, -9.) (10, -9.) (12, 36.) (13, -3.) (14, -3.) row 13: (12, -9.) (13, 3.) row 14: (12, -9.) (14, 3.) I am unable to understand why the extracted submatrix is incorrect when nprocx>1 but works when nprocx=1 and nprocy>=1. P.S. the parallel IS in cases 1 and 2 are the same and are as follows: irow: [0] Number of indices in set 8 [0] 0 0 [0] 1 1 [0] 2 2 [0] 3 3 [0] 4 4 [0] 5 5 [0] 6 6 [0] 7 7 [1] Number of indices in set 7 [1] 0 8 [1] 1 9 [1] 2 10 [1] 3 11 [1] 4 12 [1] 5 13 [1] 6 14 icol1: [0] Index set is permutation [0] Number of indices in (stride) set 8 [0] 0 0 [0] 1 1 [0] 2 2 [0] 3 3 [0] 4 4 [0] 5 5 [0] 6 6 [0] 7 7 [1] Number of indices in (stride) set 7 [1] 0 8 [1] 1 9 [1] 2 10 [1] 3 11 [1] 4 12 [1] 5 13 [1] 6 14 Could you please help me find out what is going wrong here? Regards, Gautham ________________________________ From: Matthew Knepley Sent: Wednesday, September 23, 2020 3:55 PM To: Krishnan, Gautham Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Regarding help with MatGetSubMatrix parallel use On Wed, Sep 23, 2020 at 4:12 PM Krishnan, Gautham > wrote: Hello, For a CFD code being developed with FORTRAN and MPI, I am using PETSC matrices and for a particular operation, I require to extract a submatrix(n-1 x n-1) of a matrix created (n x n). However using the petsc MatGetSubMatrix works for serial runs but fails when the domain is split up over PEs- I suspect the indexing changed for parallel runs and hence the global indexing that worked for serial case just shuffles around matrix entries in parallel undesirably. I would like to ask whether anybody could offer some guidance regarding this. I would like to note that the 2D domain is split along both x and y axes for parallel runs on multiple PEs. In parallel, you pass MatGetSubmatrix() the global indices that you want on your process. Thanks, Matt Regards, Gautham Krishnan, -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Sat Sep 26 17:43:48 2020 From: mfadams at lbl.gov (Mark Adams) Date: Sat, 26 Sep 2020 18:43:48 -0400 Subject: [petsc-users] Tough to reproduce petsctablefind error In-Reply-To: References: <0AC37384-BC37-4A6C-857D-41CD507F84C2@petsc.dev> <8952CCCF-14F2-4102-91B4-921A54689813@petsc.dev> <1A56D208-6BF6-49BF-B653-FD9D15BB3BDD@petsc.dev> <4E113614-C8C0-44F5-AD95-DC37A9D4B5F5@petsc.dev> <7DBD79E0-B593-4364-84E2-6204BECC168E@petsc.dev> Message-ID: On Sat, Sep 26, 2020 at 1:07 PM Matthew Knepley wrote: > On Sat, Sep 26, 2020 at 11:17 AM Mark McClure wrote: > >> Thank you, all for the explanations. 
>> >> Following Matt's suggestion, we'll use -g (and not use -with-debugging=0) >> all future compiles to all users, so in future, we can provide better >> information. >> >> Second, Chris is going to boil our function down to minimum stub and >> share in case there is some subtle issue with the way the functions are >> being called. >> >> Third, I have question/request - Petsc is, in fact, detecting an error. >> As far as I can tell, this is not an uncontrolled 'seg fault'. It seems to >> me that maybe Petsc could choose to return out from the function when it >> detects this error, returning an error code, rather than dumping the core >> and terminating the program. If Petsc simply returned out with an error >> message, this would resolve the problem for us. After the Petsc call, we >> check for Petsc error messages. If Petsc returns an error - that's fine - >> we use a direct solver as a backup, and the simulation continues. So - I am >> not sure whether this is feasible - but if Petsc could return out with an >> error message - rather than dumping the core and terminating the program - >> then that would effectively resolve the issue for us. Would this change be >> possible? >> > > At some level, I think it is currently doing what you want. CHKERRQ() > simply returns an error code from that function call, printing an error > message. Suppressing the message is harder I think, > He does not need this. > but for now, if you know what function call is causing the error, you can > just catch the (ierr != 0) yourself instead of using CHKERRQ. > This is what I suggested earlier but maybe I was not clear enough. Your code calls something like ierr = SNESSolve(....); CHKERRQ(ierr); You can replace this with: ierr = SNESSolve(....); if (ierr) { .... } I suggested something earlier to do here. Maybe call KSPView. You could even destroy the solver and start the solver from scratch and see if that works. Mark > The drawback here is that we might not have cleaned up > all the state so that restarting makes sense. It should be possible to > just kill the solve, reset the solver, and retry, although it is not clear > to me at first glance if MPI will be in an okay state. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Sat Sep 26 17:58:28 2020 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Sat, 26 Sep 2020 17:58:28 -0500 Subject: [petsc-users] Tough to reproduce petsctablefind error In-Reply-To: References: <0AC37384-BC37-4A6C-857D-41CD507F84C2@petsc.dev> <8952CCCF-14F2-4102-91B4-921A54689813@petsc.dev> <1A56D208-6BF6-49BF-B653-FD9D15BB3BDD@petsc.dev> <4E113614-C8C0-44F5-AD95-DC37A9D4B5F5@petsc.dev> <7DBD79E0-B593-4364-84E2-6204BECC168E@petsc.dev> Message-ID: On Sat, Sep 26, 2020 at 5:44 PM Mark Adams wrote: > > > On Sat, Sep 26, 2020 at 1:07 PM Matthew Knepley wrote: > >> On Sat, Sep 26, 2020 at 11:17 AM Mark McClure wrote: >> >>> Thank you, all for the explanations. >>> >>> Following Matt's suggestion, we'll use -g (and not use >>> -with-debugging=0) all future compiles to all users, so in future, we can >>> provide better information. >>> >>> Second, Chris is going to boil our function down to minimum stub and >>> share in case there is some subtle issue with the way the functions are >>> being called. >>> >>> Third, I have question/request - Petsc is, in fact, detecting an error. >>> As far as I can tell, this is not an uncontrolled 'seg fault'. 
It seems to >>> me that maybe Petsc could choose to return out from the function when it >>> detects this error, returning an error code, rather than dumping the core >>> and terminating the program. If Petsc simply returned out with an error >>> message, this would resolve the problem for us. After the Petsc call, we >>> check for Petsc error messages. If Petsc returns an error - that's fine - >>> we use a direct solver as a backup, and the simulation continues. So - I am >>> not sure whether this is feasible - but if Petsc could return out with an >>> error message - rather than dumping the core and terminating the program - >>> then that would effectively resolve the issue for us. Would this change be >>> possible? >>> >> >> At some level, I think it is currently doing what you want. CHKERRQ() >> simply returns an error code from that function call, printing an error >> message. Suppressing the message is harder I think, >> > > He does not need this. > > >> but for now, if you know what function call is causing the error, you can >> just catch the (ierr != 0) yourself instead of using CHKERRQ. >> > > This is what I suggested earlier but maybe I was not clear enough. > > Your code calls something like > > ierr = SNESSolve(....); CHKERRQ(ierr); > > You can replace this with: > > ierr = SNESSolve(....); > if (ierr) { > How to deal with CHKERRQ(ierr); inside SNESSolve()? > .... > } > > I suggested something earlier to do here. Maybe call KSPView. You could > even destroy the solver and start the solver from scratch and see if that > works. > > Mark > > >> The drawback here is that we might not have cleaned up >> all the state so that restarting makes sense. It should be possible to >> just kill the solve, reset the solver, and retry, although it is not clear >> to me at first glance if MPI will be in an okay state. >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sat Sep 26 20:51:25 2020 From: bsmith at petsc.dev (Barry Smith) Date: Sat, 26 Sep 2020 20:51:25 -0500 Subject: [petsc-users] Tough to reproduce petsctablefind error In-Reply-To: References: <0AC37384-BC37-4A6C-857D-41CD507F84C2@petsc.dev> <8952CCCF-14F2-4102-91B4-921A54689813@petsc.dev> <1A56D208-6BF6-49BF-B653-FD9D15BB3BDD@petsc.dev> <4E113614-C8C0-44F5-AD95-DC37A9D4B5F5@petsc.dev> <7DBD79E0-B593-4364-84E2-6204BECC168E@petsc.dev> Message-ID: <0F608F55-D14A-4DC3-BAA6-34BA85C2A858@petsc.dev> > On Sep 26, 2020, at 5:58 PM, Junchao Zhang wrote: > > > > On Sat, Sep 26, 2020 at 5:44 PM Mark Adams > wrote: > > > On Sat, Sep 26, 2020 at 1:07 PM Matthew Knepley > wrote: > On Sat, Sep 26, 2020 at 11:17 AM Mark McClure > wrote: > Thank you, all for the explanations. > > Following Matt's suggestion, we'll use -g (and not use -with-debugging=0) all future compiles to all users, so in future, we can provide better information. > > Second, Chris is going to boil our function down to minimum stub and share in case there is some subtle issue with the way the functions are being called. > > Third, I have question/request - Petsc is, in fact, detecting an error. As far as I can tell, this is not an uncontrolled 'seg fault'. It seems to me that maybe Petsc could choose to return out from the function when it detects this error, returning an error code, rather than dumping the core and terminating the program. If Petsc simply returned out with an error message, this would resolve the problem for us. After the Petsc call, we check for Petsc error messages. 
If Petsc returns an error - that's fine - we use a direct solver as a backup, and the simulation continues. So - I am not sure whether this is feasible - but if Petsc could return out with an error message - rather than dumping the core and terminating the program - then that would effectively resolve the issue for us. Would this change be possible? > > At some level, I think it is currently doing what you want. CHKERRQ() simply returns an error code from that function call, printing an error message. Suppressing the message is harder I think, > > He does not need this. > > but for now, if you know what function call is causing the error, you can just catch the (ierr != 0) yourself instead of using CHKERRQ. > > This is what I suggested earlier but maybe I was not clear enough. > > Your code calls something like > > ierr = SNESSolve(....); CHKERRQ(ierr); > > You can replace this with: > > ierr = SNESSolve(....); > if (ierr) { > How to deal with CHKERRQ(ierr); inside SNESSolve()? PetscPushErrorHandler(PetscIgnoreErrorHandler,NULL); But the problem in this code's runs appear to be due to corrupt data, why and how it gets corrupted is not known. Continuing with an alternative solver because a solver failed for numerical or algorithmic reasons is generally fine but continuing when there is corrupted data is always iffy because one doesn't know how far the corruption has spread. SNESDestroy(&snes); SNESCreate(&snes); may likely clean out any potentially corrupted data but if the corruption got into the mesh data structures it will still be there. A very sophisticated library code would, when it detects this type of corruption, sweep through all the data structures looking for any indications of corruption, to help determine the cause and/or even fix the problem. We don't have this code in place, though we could add some, because generally we relay on valgrind or -malloc_debug to detect such corruption, the problem is valgrind and -malloc_debug don't fit well in a production environment. Handling corruption that comes up in production but not testing is difficult. Barry > .... > } > > I suggested something earlier to do here. Maybe call KSPView. You could even destroy the solver and start the solver from scratch and see if that works. > > Mark > > The drawback here is that we might not have cleaned up > all the state so that restarting makes sense. It should be possible to just kill the solve, reset the solver, and retry, although it is not clear to me at first glance if MPI will be in an okay state. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at resfrac.com Sat Sep 26 21:53:12 2020 From: mark at resfrac.com (Mark McClure) Date: Sat, 26 Sep 2020 22:53:12 -0400 Subject: [petsc-users] Tough to reproduce petsctablefind error In-Reply-To: <0F608F55-D14A-4DC3-BAA6-34BA85C2A858@petsc.dev> References: <0AC37384-BC37-4A6C-857D-41CD507F84C2@petsc.dev> <8952CCCF-14F2-4102-91B4-921A54689813@petsc.dev> <1A56D208-6BF6-49BF-B653-FD9D15BB3BDD@petsc.dev> <4E113614-C8C0-44F5-AD95-DC37A9D4B5F5@petsc.dev> <7DBD79E0-B593-4364-84E2-6204BECC168E@petsc.dev> <0F608F55-D14A-4DC3-BAA6-34BA85C2A858@petsc.dev> Message-ID: Ok, I think we've made some progress. We were already calling the function like this: ierr = PetscCall(); if (ierr != 0) {do something to handle error}. We actually are doing that on every single call made to Petsc, just to be careful. This is what was confusing to me. Why was the program terminating from within Petsc and not returning out with an error? 
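Pulling together the suggestions above (PetscPushErrorHandler() plus destroy-and-recreate), the recover-and-retry idea looks roughly like the following sketch. PetscPushErrorHandler(), PetscIgnoreErrorHandler, PetscPopErrorHandler() and the KSP calls are real PETSc API; the function name, the A/b/x arguments and the rebuild steps are placeholders for the application's own solver setup:

#include <petscksp.h>

PetscErrorCode TrySolveWithFallback(KSP *ksp, Mat A, Vec b, Vec x, PetscBool *failed)
{
  PetscErrorCode ierr, solveErr;

  PetscFunctionBegin;
  /* Return error codes to the caller instead of printing/aborting during this solve */
  ierr = PetscPushErrorHandler(PetscIgnoreErrorHandler, NULL);CHKERRQ(ierr);
  solveErr = KSPSolve(*ksp, b, x);
  ierr = PetscPopErrorHandler();CHKERRQ(ierr);

  *failed = solveErr ? PETSC_TRUE : PETSC_FALSE;
  if (solveErr) {
    /* Discard the (possibly corrupted) solver state and rebuild it so the
       caller can retry, e.g. with a smaller dt or a direct solver. */
    ierr = KSPDestroy(ksp);CHKERRQ(ierr);
    ierr = KSPCreate(PETSC_COMM_WORLD, ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(*ksp, A, A);CHKERRQ(ierr);
    ierr = KSPSetFromOptions(*ksp);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}

As noted above, this only recovers cleanly if the corruption has not spread beyond the solver objects themselves.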
We'd written our code so that if Petsc did return with an error, we'd discard the full timestep, destroy all the Petsc data structures and redo everything with a smaller dt. So if Petsc did hit this error very rarely, we might be able to recovery gracefully. It does not appear to be seg faulting. So it seemed that the program was being terminated intentionally from within Petsc, which was puzzling, and why I was asking about that in my previous email. So - Chris made a great find. Turns out that right after PetscInitialize in our main.cpp, we had the line: PetscPushErrorHandler(PetscAbortErrorHandler, NULL); Which was telling Petsc to call MPI_Abort if there was an error. I probably put that line into the code years ago and forgot it was there. So, as Barry said, if we change the PetscErrorHandler option to ignore, then at least we can avoid the program aborting on the error, and hopefully be able to recover with our existing code logic. Also, may have found a clue on the root cause of the error. I had thought that were were checking all of our inputs to Petsc for issues such as out of range index values. But I went back and see that due to a versioning mistake, there is one particular error check on our inputs that was being removed from production builds by a preprocessor definition. Which means that it wouldn't be caught in our production builds, which means that it is possible that bad inputs could have been passed into Petsc. I don't know for sure - but is plausible. The missing error check was doing the following: checking to see if the fourth entry to MatSetValues ("n", for the number of nonzero values in the row) ( https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatSetValues.html) was equal to the sum of the number of diagonal and off diagonal values that we had specified in our previous call to MatMPIAIJSetPreallocation. So that is at least a theory for what was happening. The theory would be: very rarely, due to a bug in our code, we were running MatSetValues with "n" set to a value not equal to the number of nonzero values promised in the call to MatMPIAIJSetPreallocation. Maybe this led to the Petsc "key 7556 is greater than largest key allowed 5693" error message, and then our setting of 'PetscAbortErrorHandler' was causing the program to abort. Mark On Sat, Sep 26, 2020 at 9:51 PM Barry Smith wrote: > > > On Sep 26, 2020, at 5:58 PM, Junchao Zhang > wrote: > > > > On Sat, Sep 26, 2020 at 5:44 PM Mark Adams wrote: > >> >> >> On Sat, Sep 26, 2020 at 1:07 PM Matthew Knepley >> wrote: >> >>> On Sat, Sep 26, 2020 at 11:17 AM Mark McClure wrote: >>> >>>> Thank you, all for the explanations. >>>> >>>> Following Matt's suggestion, we'll use -g (and not use >>>> -with-debugging=0) all future compiles to all users, so in future, we can >>>> provide better information. >>>> >>>> Second, Chris is going to boil our function down to minimum stub and >>>> share in case there is some subtle issue with the way the functions are >>>> being called. >>>> >>>> Third, I have question/request - Petsc is, in fact, detecting an error. >>>> As far as I can tell, this is not an uncontrolled 'seg fault'. It seems to >>>> me that maybe Petsc could choose to return out from the function when it >>>> detects this error, returning an error code, rather than dumping the core >>>> and terminating the program. If Petsc simply returned out with an error >>>> message, this would resolve the problem for us. After the Petsc call, we >>>> check for Petsc error messages. 
If Petsc returns an error - that's fine - >>>> we use a direct solver as a backup, and the simulation continues. So - I am >>>> not sure whether this is feasible - but if Petsc could return out with an >>>> error message - rather than dumping the core and terminating the program - >>>> then that would effectively resolve the issue for us. Would this change be >>>> possible? >>>> >>> >>> At some level, I think it is currently doing what you want. CHKERRQ() >>> simply returns an error code from that function call, printing an error >>> message. Suppressing the message is harder I think, >>> >> >> He does not need this. >> >> >>> but for now, if you know what function call is causing the error, you >>> can just catch the (ierr != 0) yourself instead of using CHKERRQ. >>> >> >> This is what I suggested earlier but maybe I was not clear enough. >> >> Your code calls something like >> >> ierr = SNESSolve(....); CHKERRQ(ierr); >> >> You can replace this with: >> >> ierr = SNESSolve(....); >> if (ierr) { >> > How to deal with CHKERRQ(ierr); inside SNESSolve()? > > > > PetscPushErrorHandler(PetscIgnoreErrorHandler,NULL); > > But the problem in this code's runs appear to be due to corrupt data, > why and how it gets corrupted is not known. Continuing with an alternative > solver because a solver failed for numerical or algorithmic reasons is > generally fine but continuing when there is corrupted data is always iffy > because one doesn't know how far the corruption has spread. > SNESDestroy(&snes); SNESCreate(&snes); may likely clean out any potentially > corrupted data but if the corruption got into the mesh data structures it > will still be there. > > A very sophisticated library code would, when it detects this type of > corruption, sweep through all the data structures looking for any > indications of corruption, to help determine the cause and/or even fix the > problem. We don't have this code in place, though we could add some, > because generally we relay on valgrind or -malloc_debug to detect such > corruption, the problem is valgrind and -malloc_debug don't fit well in a > production environment. Handling corruption that comes up in production but > not testing is difficult. > > Barry > > > > > .... >> } >> >> I suggested something earlier to do here. Maybe call KSPView. You could >> even destroy the solver and start the solver from scratch and see if that >> works. >> >> Mark >> >> >>> The drawback here is that we might not have cleaned up >>> all the state so that restarting makes sense. It should be possible to >>> just kill the solve, reset the solver, and retry, although it is not clear >>> to me at first glance if MPI will be in an okay state. >>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hu.ds.abel at icloud.com Sun Sep 27 05:13:46 2020 From: hu.ds.abel at icloud.com (huabel) Date: Sun, 27 Sep 2020 18:13:46 +0800 Subject: [petsc-users] mpi with different OS(CentOS and macOS) Message-ID: Hi all, I have two computer, one is CentOS another is macOS connected with local area network, Can I run PETSc examples on that two computers in parallel with mpi? And how? 
Many thanks, Abel From jed at jedbrown.org Sun Sep 27 08:33:53 2020 From: jed at jedbrown.org (Jed Brown) Date: Sun, 27 Sep 2020 07:33:53 -0600 Subject: [petsc-users] mpi with different OS(CentOS and macOS) In-Reply-To: References: Message-ID: <87d027ny2m.fsf@jedbrown.org> Probably, but it's fragile in that you need to use separate executables and you'd launch something like mpiexec -n 1 ./app-linux : -n 1 ./app-osx provided you have identical MPI versions and the same app configuration compiled on each platform. huabel via petsc-users writes: > Hi all, > I have two computer, one is CentOS another is macOS connected with local area network, > Can I run PETSc examples on that two computers in parallel with mpi? And how? > > Many thanks, > Abel From gautham3 at illinois.edu Sat Sep 26 14:01:46 2020 From: gautham3 at illinois.edu (Krishnan, Gautham) Date: Sat, 26 Sep 2020 19:01:46 +0000 Subject: [petsc-users] Regarding help with MatGetSubMatrix parallel use In-Reply-To: References: , Message-ID: In parallel I have passed the global indices that I want on the PE. However, the result I obtain seems to change based on whether I divide the initial domain between PEs along the x axis or the y axis. This is what I am trying to resolve- for example: the FORTRAN code I use is basically: PetscInt :: idx(ngx*ngy*ngz) Mat :: D2_t, lapl, , lapl_t IS :: irow, irow1, icol1 call DMCreateMatrix(grid_pressure, lapl, ierr) call MatSetOption(lapl_t, MAT_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) call MatSetOption(lapl_t, MAT_IGNORE_ZERO_ENTRIES,PETSC_TRUE,ierr) call DMCreateMatrix(grid_pressure, lapl_t, ierr) call MatSetOption(lapl_t, MAT_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) call MatSetOption(lapl_t, MAT_IGNORE_ZERO_ENTRIES,PETSC_TRUE,ierr) . . . !(Assembling matrix lapl) call VecCreate(petsc_xyzcom, vec_b, ierr) call VecSetSizes(vec_b, PETSC_DECIDE, ngx*ngy*ngz-1, ierr) call VecSetFromOptions(vec_b, ierr) call VecSetUp(vec_b, ierr) call VecGetLocalSize(vec_b,vecsx,ierr) call VecGetOwnershipRange(vec_b,veclo,vechi,ierr) call ISCreateStride(petsc_xyzcom, vecsx,veclo,1,icol1,ierr) idx = (/ (j, j=veclo,vechi-1)/) call ISCreateGeneral(petsc_xyzcom, vecsx,idx, PETSC_COPY_VALUES,irow,ierr) call MatTranspose(lapl, MAT_INPLACE_MATRIX, lapl_t,ierr) !transpose the laplacian call MatGetSubMatrix(lapl_t, irow, icol1, MAT_INITIAL_MATRIX, D2_t,ierr) call MatView(lapl_t, PETSC_VIEWER_STDOUT_WORLD, ierr) call MatView(D2_t, PETSC_VIEWER_STDOUT_WORLD, ierr) Output: ngx=ngy=4; ngz=1; such that n=4*4*1=16 lapl_t: row 0: (0, -3.) row 1: (0, 3.) (1, -3.) (5, -9.) row 2: (2, -3.) (3, -3.) (6, -9.) row 3: (3, 3.) row 4: (4, -3.) (5, -9.) row 5: (1, 3.) (4, 3.) (5, 36.) (6, -9.) (9, -9.) row 6: (2, 3.) (5, -9.) (6, 36.) (7, -3.) (10, -9.) row 7: (6, -9.) (7, 3.) row 8: (8, -3.) (9, -9.) row 9: (5, -9.) (8, 3.) (9, 36.) (10, -9.) (13, -3.) row 10: (6, -9.) (9, -9.) (10, 36.) (11, -3.) (14, -3.) row 11: (10, -9.) (11, 3.) row 12: (12, -3.) row 13: (9, -9.) (12, 3.) (13, 3.) row 14: (10, -9.) (14, 3.) (15, -3.) row 15: (15, 3.) Case 1: nprocx =1; nprocy=2; ! number of processors in x and y among which to divide the domain Here, the (n-1 x n-1) submatrix is extracted correctly as D2_t: row 0: (0, -3.) row 1: (0, 3.) (1, -3.) (5, -9.) row 2: (2, -3.) (3, -3.) (6, -9.) row 3: (3, 3.) row 4: (4, -3.) (5, -9.) row 5: (1, 3.) (4, 3.) (5, 36.) (6, -9.) (9, -9.) row 6: (2, 3.) (5, -9.) (6, 36.) (7, -3.) (10, -9.) row 7: (6, -9.) (7, 3.) row 8: (8, -3.) (9, -9.) row 9: (5, -9.) (8, 3.) (9, 36.) (10, -9.) (13, -3.) row 10: (6, -9.) (9, -9.) (10, 36.) 
(11, -3.) (14, -3.) row 11: (10, -9.) (11, 3.) row 12: (12, -3.) row 13: (9, -9.) (12, 3.) (13, 3.) row 14: (10, -9.) (14, 3.) However, for Case 2: nprocx =2; nprocy=1; lapl_t is correctly assembled and transposed but the (n-1 x n-1) submatrix is extracted incorrectly as D2_t: row 0: (0, -3.) row 1: (0, 3.) (1, -3.) (3, -9.) row 2: (2, -3.) (3, -9.) row 3: (1, 3.) (2, 3.) (3, 36.) (5, -9.) (10, -9.) row 4: (4, -3.) (5, -9.) row 5: (3, -9.) (4, 3.) (5, 36.) (7, -3.) (12, -9.) row 6: (6, -3.) row 7: (5, -9.) (6, 3.) (7, 3.) row 8: (8, -3.) (9, -3.) (10, -9.) row 9: (9, 3.) row 10: (3, -9.) (8, 3.) (10, 36.) (11, -3.) (12, -9.) row 11: (10, -9.) (11, 3.) row 12: (5, -9.) (10, -9.) (12, 36.) (13, -3.) (14, -3.) row 13: (12, -9.) (13, 3.) row 14: (12, -9.) (14, 3.) I am unable to understand why the extracted submatrix is incorrect when nprocx>1 but works when nprocx=1 and nprocy>=1. P.S. the parallel IS in cases 1 and 2 are the same and are as follows: irow: [0] Number of indices in set 8 [0] 0 0 [0] 1 1 [0] 2 2 [0] 3 3 [0] 4 4 [0] 5 5 [0] 6 6 [0] 7 7 [1] Number of indices in set 7 [1] 0 8 [1] 1 9 [1] 2 10 [1] 3 11 [1] 4 12 [1] 5 13 [1] 6 14 icol1: [0] Index set is permutation [0] Number of indices in (stride) set 8 [0] 0 0 [0] 1 1 [0] 2 2 [0] 3 3 [0] 4 4 [0] 5 5 [0] 6 6 [0] 7 7 [1] Number of indices in (stride) set 7 [1] 0 8 [1] 1 9 [1] 2 10 [1] 3 11 [1] 4 12 [1] 5 13 [1] 6 14 Could you please help me find out what is going wrong here? Regards, Gautham ________________________________ From: Matthew Knepley Sent: Wednesday, September 23, 2020 3:55 PM To: Krishnan, Gautham Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Regarding help with MatGetSubMatrix parallel use On Wed, Sep 23, 2020 at 4:12 PM Krishnan, Gautham > wrote: Hello, For a CFD code being developed with FORTRAN and MPI, I am using PETSC matrices and for a particular operation, I require to extract a submatrix(n-1 x n-1) of a matrix created (n x n). However using the petsc MatGetSubMatrix works for serial runs but fails when the domain is split up over PEs- I suspect the indexing changed for parallel runs and hence the global indexing that worked for serial case just shuffles around matrix entries in parallel undesirably. I would like to ask whether anybody could offer some guidance regarding this. I would like to note that the 2D domain is split along both x and y axes for parallel runs on multiple PEs. In parallel, you pass MatGetSubmatrix() the global indices that you want on your process. Thanks, Matt Regards, Gautham Krishnan, -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Sun Sep 27 20:16:09 2020 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Mon, 28 Sep 2020 01:16:09 +0000 Subject: [petsc-users] Compute C*Ct using MatCreateTranspose for Ct In-Reply-To: References: <4c071605-83fe-cb4d-866e-0a4fd1a6c542@cea.fr> , Message-ID: ________________________________ From: Olivier Jamond Sent: Friday, September 25, 2020 2:31 AM To: Zhang, Hong ; Matthew Knepley ; Mark Adams Cc: PETSc Subject: Re: [petsc-users] Compute C*Ct using MatCreateTranspose for Ct @Mark: Sorry, I totally misunderstood your answer, so my answer to your answer about petsc versions is non-sense... @Hong: MatTransposeMatMult gives (Ct*C), no? 
And I feel that MatCreateTranspose actually works for mpiaij. Yes, MatTransposeMatMult gives (Ct*C). MatCreateTranspose(C,&N) creates a new matrix N which shares the data structure of C. When user calls y = N*x, PETSc actually does MatMultTranspose with C. I do not think you can uses N to do N*C, which is not supported. -- Hong On 24/09/2020 16:16, Zhang, Hong via petsc-users wrote: Olivier and Matt, MatPtAP with A=I gives Pt*P, not P*Pt. We have sequential MatRARt and MatMatTransposeMult(), but no support for mpiaij matrices. The problem is that we do not have a way to implement C*Ct without explicitly transpose C in parallel. We support MatTransposeMatMult (A*Bt) for mpiaij. Can you use this instead? Hong ________________________________ From: petsc-users on behalf of Zhang, Hong via petsc-users Sent: Thursday, September 24, 2020 8:56 AM To: Matthew Knepley ; Mark Adams Cc: PETSc Subject: Re: [petsc-users] Compute C*Ct using MatCreateTranspose for Ct Indeed, we do not have MatCreateTranspose for mpaij matrix. I can adding such support. How soon do you need it? Hong ________________________________ From: petsc-users on behalf of Matthew Knepley Sent: Thursday, September 24, 2020 6:12 AM To: Mark Adams Cc: PETSc Subject: Re: [petsc-users] Compute C*Ct using MatCreateTranspose for Ct On Thu, Sep 24, 2020 at 6:48 AM Mark Adams > wrote: Is there a way to avoid the explicit transposition of the matrix? It does not look like we have A*B^T for mpiaij as the error message says. I am not finding it in the code. Note, MatMatMult with a transpose shell matrix, I suspect that it does an explicit transpose internally, or it could notice that you have C^T*C and we might have that implemented in-place (I doubt it, but it would be legal and fine to do). We definitely have https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatPtAP.html For now, you can put the identity in for A. It would be nice it we assumed that when A = NULL. Patrick, the implementation strategy is broken for the MatProduct mechanism that was just introduced, so we cannot see which things are implemented in the documentation. How would I go about fixing it? Thanks, Matt Many thanks, Olivier Jamond -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Sep 28 08:43:39 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 28 Sep 2020 09:43:39 -0400 Subject: [petsc-users] Regarding help with MatGetSubMatrix parallel use In-Reply-To: References: Message-ID: On Sat, Sep 26, 2020 at 4:13 PM Krishnan, Gautham wrote: > In parallel, I have passed the global indices that I want on the PE. > However, the result I obtain seems to change based on whether I divide the > initial domain between PEs along the x axis or the y axis. This is what I > am trying to resolve- for example: > You mean that you have a DMDA, and you change the pattern of division across processes? This will change the global numbers of given vertices. Remember that these are not "natural numbers", meaning a lexicographic ordering of vertices, but rather "global numbers", meaning the PETSc ordering of unknowns which is contiguous across processes. There is a discussion of this in the manual. 
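One concrete way to produce those global numbers is to start from natural (lexicographic) indices and map them through the DMDA's application ordering before building the IS. A minimal C sketch along those lines, where grid_pressure stands for the DMDA used above and the helper name is illustrative; DMDAGetAO(), AOApplicationToPetsc() and ISCreateGeneral() are the actual calls, and the Fortran interface mirrors them:

#include <petscdmda.h>
#include <petscao.h>

/* Sketch: convert natural (lexicographic) indices into the DMDA's PETSc
   global ordering, then build an IS from the converted indices. */
PetscErrorCode NaturalToGlobalIS(DM grid_pressure, PetscInt n, PetscInt idx[], IS *is)
{
  PetscErrorCode ierr;
  AO             ao;

  PetscFunctionBegin;
  ierr = DMDAGetAO(grid_pressure, &ao);CHKERRQ(ierr);    /* AO is owned by the DMDA; do not destroy it */
  ierr = AOApplicationToPetsc(ao, n, idx);CHKERRQ(ierr); /* idx is overwritten in place with global numbers */
  ierr = ISCreateGeneral(PetscObjectComm((PetscObject)grid_pressure), n, idx,
                         PETSC_COPY_VALUES, is);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

An IS built this way matches the row/column numbering of the matrix obtained from DMCreateMatrix() for that DMDA, independent of how the domain is split among processes.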
Thanks, Matt > the *FORTRAN* code I use is basically: > PetscInt :: idx(ngx*ngy*ngz) > Mat :: D2_t, lapl, , lapl_t > IS :: irow, irow1, icol1 > > call DMCreateMatrix(grid_pressure, lapl, ierr) > call MatSetOption(lapl_t, MAT_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) > call MatSetOption(lapl_t, MAT_IGNORE_ZERO_ENTRIES,PETSC_TRUE,ierr) > call DMCreateMatrix(grid_pressure, lapl_t, ierr) > call MatSetOption(lapl_t, MAT_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) > call MatSetOption(lapl_t, MAT_IGNORE_ZERO_ENTRIES,PETSC_TRUE,ierr) > . > . > . !(Assembling matrix lapl) > > call VecCreate(petsc_xyzcom, vec_b, ierr) > call VecSetSizes(vec_b, PETSC_DECIDE, ngx*ngy*ngz-1, ierr) > call VecSetFromOptions(vec_b, ierr) > call VecSetUp(vec_b, ierr) > call VecGetLocalSize(vec_b,vecsx,ierr) > call VecGetOwnershipRange(vec_b,veclo,vechi,ierr) > call ISCreateStride(petsc_xyzcom, vecsx,veclo,1,icol1,ierr) > idx = (/ (j, j=veclo,vechi-1)/) > call ISCreateGeneral(petsc_xyzcom, vecsx,idx, PETSC_COPY_VALUES,irow,ierr) > call MatTranspose(lapl, MAT_INPLACE_MATRIX, lapl_t,ierr) !transpose the > laplacian > call MatGetSubMatrix(lapl_t, irow, icol1, MAT_INITIAL_MATRIX, D2_t,ierr) > call MatView(lapl_t, PETSC_VIEWER_STDOUT_WORLD, ierr) > call MatView(D2_t, PETSC_VIEWER_STDOUT_WORLD, ierr) > > *Output*: > > ngx=ngy=4; ngz=1; such that n=4*4*1=16 > > lapl_t: > row 0: (0, -3.) > row 1: (0, 3.) (1, -3.) (5, -9.) > row 2: (2, -3.) (3, -3.) (6, -9.) > row 3: (3, 3.) > row 4: (4, -3.) (5, -9.) > row 5: (1, 3.) (4, 3.) (5, 36.) (6, -9.) (9, -9.) > row 6: (2, 3.) (5, -9.) (6, 36.) (7, -3.) (10, -9.) > row 7: (6, -9.) (7, 3.) > row 8: (8, -3.) (9, -9.) > row 9: (5, -9.) (8, 3.) (9, 36.) (10, -9.) (13, -3.) > row 10: (6, -9.) (9, -9.) (10, 36.) (11, -3.) (14, -3.) > row 11: (10, -9.) (11, 3.) > row 12: (12, -3.) > row 13: (9, -9.) (12, 3.) (13, 3.) > row 14: (10, -9.) (14, 3.) (15, -3.) > row 15: (15, 3.) > > > *Case 1:* > nprocx =1; nprocy=2; ! number of processors in x and y among which to > divide the domain > > Here, the (n-1 x n-1) submatrix is extracted correctly as > D2_t: > row 0: (0, -3.) > row 1: (0, 3.) (1, -3.) (5, -9.) > row 2: (2, -3.) (3, -3.) (6, -9.) > row 3: (3, 3.) > row 4: (4, -3.) (5, -9.) > row 5: (1, 3.) (4, 3.) (5, 36.) (6, -9.) (9, -9.) > row 6: (2, 3.) (5, -9.) (6, 36.) (7, -3.) (10, -9.) > row 7: (6, -9.) (7, 3.) > row 8: (8, -3.) (9, -9.) > row 9: (5, -9.) (8, 3.) (9, 36.) (10, -9.) (13, -3.) > row 10: (6, -9.) (9, -9.) (10, 36.) (11, -3.) (14, -3.) > row 11: (10, -9.) (11, 3.) > row 12: (12, -3.) > row 13: (9, -9.) (12, 3.) (13, 3.) > row 14: (10, -9.) (14, 3.) > > However, for > *Case 2:* > nprocx =2; nprocy=1; > > lapl_t is correctly assembled and transposed but the (n-1 x n-1) > submatrix is extracted incorrectly as > D2_t: > row 0: (0, -3.) > row 1: (0, 3.) (1, -3.) (3, -9.) > row 2: (2, -3.) (3, -9.) > row 3: (1, 3.) (2, 3.) (3, 36.) (5, -9.) (10, -9.) > row 4: (4, -3.) (5, -9.) > row 5: (3, -9.) (4, 3.) (5, 36.) (7, -3.) (12, -9.) > row 6: (6, -3.) > row 7: (5, -9.) (6, 3.) (7, 3.) > row 8: (8, -3.) (9, -3.) (10, -9.) > row 9: (9, 3.) > row 10: (3, -9.) (8, 3.) (10, 36.) (11, -3.) (12, -9.) > row 11: (10, -9.) (11, 3.) > row 12: (5, -9.) (10, -9.) (12, 36.) (13, -3.) (14, -3.) > row 13: (12, -9.) (13, 3.) > row 14: (12, -9.) (14, 3.) > > I am unable to understand why the extracted submatrix is incorrect when > nprocx>1 but works when nprocx=1 and nprocy>=1. > > P.S. 
the parallel IS in cases 1 and 2 are the same and are as follows: > irow: > [0] Number of indices in set 8 > [0] 0 0 > [0] 1 1 > [0] 2 2 > [0] 3 3 > [0] 4 4 > [0] 5 5 > [0] 6 6 > [0] 7 7 > [1] Number of indices in set 7 > [1] 0 8 > [1] 1 9 > [1] 2 10 > [1] 3 11 > [1] 4 12 > [1] 5 13 > [1] 6 14 > > icol1: > [0] Index set is permutation > [0] Number of indices in (stride) set 8 > [0] 0 0 > [0] 1 1 > [0] 2 2 > [0] 3 3 > [0] 4 4 > [0] 5 5 > [0] 6 6 > [0] 7 7 > [1] Number of indices in (stride) set 7 > [1] 0 8 > [1] 1 9 > [1] 2 10 > [1] 3 11 > [1] 4 12 > [1] 5 13 > [1] 6 14 > > Could you please help me find out what is going wrong here? > > Regards, > Gautham > > ------------------------------ > *From:* Krishnan, Gautham > *Sent:* Saturday, September 26, 2020 2:01 PM > *To:* Matthew Knepley > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Regarding help with MatGetSubMatrix parallel > use > > In parallel I have passed the global indices that I want on the PE. > However, the result I obtain seems to change based on whether I divide the > initial domain between PEs along the x axis or the y axis. This is what I > am trying to resolve- for example: > > the *FORTRAN* code I use is basically: > PetscInt :: idx(ngx*ngy*ngz) > Mat :: D2_t, lapl, , lapl_t > IS :: irow, irow1, icol1 > > call DMCreateMatrix(grid_pressure, lapl, ierr) > call MatSetOption(lapl_t, MAT_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) > call MatSetOption(lapl_t, MAT_IGNORE_ZERO_ENTRIES,PETSC_TRUE,ierr) > call DMCreateMatrix(grid_pressure, lapl_t, ierr) > call MatSetOption(lapl_t, MAT_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) > call MatSetOption(lapl_t, MAT_IGNORE_ZERO_ENTRIES,PETSC_TRUE,ierr) > . > . > . !(Assembling matrix lapl) > > call VecCreate(petsc_xyzcom, vec_b, ierr) > call VecSetSizes(vec_b, PETSC_DECIDE, ngx*ngy*ngz-1, ierr) > call VecSetFromOptions(vec_b, ierr) > call VecSetUp(vec_b, ierr) > call VecGetLocalSize(vec_b,vecsx,ierr) > call VecGetOwnershipRange(vec_b,veclo,vechi,ierr) > call ISCreateStride(petsc_xyzcom, vecsx,veclo,1,icol1,ierr) > idx = (/ (j, j=veclo,vechi-1)/) > call ISCreateGeneral(petsc_xyzcom, vecsx,idx, PETSC_COPY_VALUES,irow,ierr) > call MatTranspose(lapl, MAT_INPLACE_MATRIX, lapl_t,ierr) !transpose the > laplacian > call MatGetSubMatrix(lapl_t, irow, icol1, MAT_INITIAL_MATRIX, D2_t,ierr) > call MatView(lapl_t, PETSC_VIEWER_STDOUT_WORLD, ierr) > call MatView(D2_t, PETSC_VIEWER_STDOUT_WORLD, ierr) > > *Output*: > > ngx=ngy=4; ngz=1; such that n=4*4*1=16 > > lapl_t: > row 0: (0, -3.) > row 1: (0, 3.) (1, -3.) (5, -9.) > row 2: (2, -3.) (3, -3.) (6, -9.) > row 3: (3, 3.) > row 4: (4, -3.) (5, -9.) > row 5: (1, 3.) (4, 3.) (5, 36.) (6, -9.) (9, -9.) > row 6: (2, 3.) (5, -9.) (6, 36.) (7, -3.) (10, -9.) > row 7: (6, -9.) (7, 3.) > row 8: (8, -3.) (9, -9.) > row 9: (5, -9.) (8, 3.) (9, 36.) (10, -9.) (13, -3.) > row 10: (6, -9.) (9, -9.) (10, 36.) (11, -3.) (14, -3.) > row 11: (10, -9.) (11, 3.) > row 12: (12, -3.) > row 13: (9, -9.) (12, 3.) (13, 3.) > row 14: (10, -9.) (14, 3.) (15, -3.) > row 15: (15, 3.) > > > *Case 1:* > nprocx =1; nprocy=2; ! number of processors in x and y among which to > divide the domain > > Here, the (n-1 x n-1) submatrix is extracted correctly as > D2_t: > row 0: (0, -3.) > row 1: (0, 3.) (1, -3.) (5, -9.) > row 2: (2, -3.) (3, -3.) (6, -9.) > row 3: (3, 3.) > row 4: (4, -3.) (5, -9.) > row 5: (1, 3.) (4, 3.) (5, 36.) (6, -9.) (9, -9.) > row 6: (2, 3.) (5, -9.) (6, 36.) (7, -3.) (10, -9.) > row 7: (6, -9.) (7, 3.) > row 8: (8, -3.) (9, -9.) > row 9: (5, -9.) (8, 3.) 
(9, 36.) (10, -9.) (13, -3.) > row 10: (6, -9.) (9, -9.) (10, 36.) (11, -3.) (14, -3.) > row 11: (10, -9.) (11, 3.) > row 12: (12, -3.) > row 13: (9, -9.) (12, 3.) (13, 3.) > row 14: (10, -9.) (14, 3.) > > However, for > *Case 2:* > nprocx =2; nprocy=1; > > lapl_t is correctly assembled and transposed but the (n-1 x n-1) > submatrix is extracted incorrectly as > D2_t: > row 0: (0, -3.) > row 1: (0, 3.) (1, -3.) (3, -9.) > row 2: (2, -3.) (3, -9.) > row 3: (1, 3.) (2, 3.) (3, 36.) (5, -9.) (10, -9.) > row 4: (4, -3.) (5, -9.) > row 5: (3, -9.) (4, 3.) (5, 36.) (7, -3.) (12, -9.) > row 6: (6, -3.) > row 7: (5, -9.) (6, 3.) (7, 3.) > row 8: (8, -3.) (9, -3.) (10, -9.) > row 9: (9, 3.) > row 10: (3, -9.) (8, 3.) (10, 36.) (11, -3.) (12, -9.) > row 11: (10, -9.) (11, 3.) > row 12: (5, -9.) (10, -9.) (12, 36.) (13, -3.) (14, -3.) > row 13: (12, -9.) (13, 3.) > row 14: (12, -9.) (14, 3.) > > I am unable to understand why the extracted submatrix is incorrect when > nprocx>1 but works when nprocx=1 and nprocy>=1. > > P.S. the parallel IS in cases 1 and 2 are the same and are as follows: > irow: > [0] Number of indices in set 8 > [0] 0 0 > [0] 1 1 > [0] 2 2 > [0] 3 3 > [0] 4 4 > [0] 5 5 > [0] 6 6 > [0] 7 7 > [1] Number of indices in set 7 > [1] 0 8 > [1] 1 9 > [1] 2 10 > [1] 3 11 > [1] 4 12 > [1] 5 13 > [1] 6 14 > > icol1: > [0] Index set is permutation > [0] Number of indices in (stride) set 8 > [0] 0 0 > [0] 1 1 > [0] 2 2 > [0] 3 3 > [0] 4 4 > [0] 5 5 > [0] 6 6 > [0] 7 7 > [1] Number of indices in (stride) set 7 > [1] 0 8 > [1] 1 9 > [1] 2 10 > [1] 3 11 > [1] 4 12 > [1] 5 13 > [1] 6 14 > > Could you please help me find out what is going wrong here? > > Regards, > Gautham > ------------------------------ > *From:* Matthew Knepley > *Sent:* Wednesday, September 23, 2020 3:55 PM > *To:* Krishnan, Gautham > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Regarding help with MatGetSubMatrix parallel > use > > On Wed, Sep 23, 2020 at 4:12 PM Krishnan, Gautham > wrote: > > Hello, > > For a CFD code being developed with FORTRAN and MPI, I am using PETSC > matrices and for a particular operation, I require to extract a > submatrix(n-1 x n-1) of a matrix created (n x n). However using the petsc > MatGetSubMatrix works for serial runs but fails when the domain is split up > over PEs- I suspect the indexing changed for parallel runs and hence the > global indexing that worked for serial case just shuffles around matrix > entries in parallel undesirably. I would like to ask whether anybody could > offer some guidance regarding this. I would like to note that the 2D > domain is split along both x and y axes for parallel runs on multiple PEs. > > > In parallel, you pass MatGetSubmatrix() the global indices that you want > on your process. > > Thanks, > > Matt > > > Regards, > Gautham Krishnan, > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
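A possible fix, consistent with the explanation in this thread (the follow-up below confirms the index sets have to use the DMDA's PETSc global ordering, not the natural lexicographic ordering), can be sketched as follows. On a 4x4 grid split between two processes along x, for example, the grid point at (i=2, j=0) has natural index 2 but, if I read the DMDA numbering right, PETSc global index 8, so passing the raw integers 0..N-2 selects different points (and a different row order) depending on the decomposition. The sketch below is in C rather than Fortran, is untested, omits error checking, and uses a made-up helper name; it translates the natural indices through the DMDA's application ordering (AO) before building the IS. Depending on the PETSc version, the index sets may additionally need to be sorted with PetscSortInt() before the extraction call.

#include <petscdmda.h>

/* Hypothetical helper: extract the (N-1) x (N-1) submatrix of a DMDA-based
   matrix corresponding to the natural (lexicographic) points 0..N-2,
   independently of the parallel decomposition.  Error checking omitted. */
PetscErrorCode GetSubmatrixDropLastNaturalPoint(DM da, Mat A, Mat *Asub)
{
  AO       ao;
  IS       is;
  Vec      v;
  PetscInt N, lo, hi, nloc, i, *idx;

  MatGetSize(A, &N, NULL);

  /* Template vector of size N-1, used only to get a balanced ownership split */
  VecCreate(PetscObjectComm((PetscObject)A), &v);
  VecSetSizes(v, PETSC_DECIDE, N - 1);
  VecSetFromOptions(v);
  VecGetOwnershipRange(v, &lo, &hi);
  nloc = hi - lo;

  /* The indices we actually want are NATURAL indices lo..hi-1 ... */
  PetscMalloc1(nloc, &idx);
  for (i = 0; i < nloc; i++) idx[i] = lo + i;

  /* ... but the submatrix routine expects PETSc global indices, so translate
     them through the DMDA's application ordering */
  DMDAGetAO(da, &ao);
  AOApplicationToPetsc(ao, nloc, idx);

  ISCreateGeneral(PetscObjectComm((PetscObject)A), nloc, idx, PETSC_OWN_POINTER, &is);
  MatCreateSubMatrix(A, is, is, MAT_INITIAL_MATRIX, Asub); /* MatGetSubMatrix() in older PETSc */

  ISDestroy(&is);
  VecDestroy(&v);
  return 0;
}

With this mapping the extracted rows refer to the same grid points whichever way the domain is divided, which appears to be what "provide the ordering as in the DMDA" refers to in the reply below.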
URL: From gautham3 at illinois.edu Mon Sep 28 10:05:07 2020 From: gautham3 at illinois.edu (Krishnan, Gautham) Date: Mon, 28 Sep 2020 15:05:07 +0000 Subject: [petsc-users] Regarding help with MatGetSubMatrix parallel use In-Reply-To: References: , Message-ID: Thank you, the issue was that I did not provide the ordering as in the DMDA. Regards, Gautham ________________________________ From: Matthew Knepley Sent: Monday, September 28, 2020 8:43 AM To: Krishnan, Gautham Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Regarding help with MatGetSubMatrix parallel use On Sat, Sep 26, 2020 at 4:13 PM Krishnan, Gautham > wrote: In parallel, I have passed the global indices that I want on the PE. However, the result I obtain seems to change based on whether I divide the initial domain between PEs along the x axis or the y axis. This is what I am trying to resolve- for example: You mean that you have a DMDA, and you change the pattern of division across processes? This will change the global numbers of given vertices. Remember that these are not "natural numbers", meaning a lexicographic ordering of vertices, but rather "global numbers", meaning the PETSc ordering of unknowns which is contiguous across processes. There is a discussion of this in the manual. Thanks, Matt the FORTRAN code I use is basically: PetscInt :: idx(ngx*ngy*ngz) Mat :: D2_t, lapl, , lapl_t IS :: irow, irow1, icol1 call DMCreateMatrix(grid_pressure, lapl, ierr) call MatSetOption(lapl_t, MAT_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) call MatSetOption(lapl_t, MAT_IGNORE_ZERO_ENTRIES,PETSC_TRUE,ierr) call DMCreateMatrix(grid_pressure, lapl_t, ierr) call MatSetOption(lapl_t, MAT_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) call MatSetOption(lapl_t, MAT_IGNORE_ZERO_ENTRIES,PETSC_TRUE,ierr) . . . !(Assembling matrix lapl) call VecCreate(petsc_xyzcom, vec_b, ierr) call VecSetSizes(vec_b, PETSC_DECIDE, ngx*ngy*ngz-1, ierr) call VecSetFromOptions(vec_b, ierr) call VecSetUp(vec_b, ierr) call VecGetLocalSize(vec_b,vecsx,ierr) call VecGetOwnershipRange(vec_b,veclo,vechi,ierr) call ISCreateStride(petsc_xyzcom, vecsx,veclo,1,icol1,ierr) idx = (/ (j, j=veclo,vechi-1)/) call ISCreateGeneral(petsc_xyzcom, vecsx,idx, PETSC_COPY_VALUES,irow,ierr) call MatTranspose(lapl, MAT_INPLACE_MATRIX, lapl_t,ierr) !transpose the laplacian call MatGetSubMatrix(lapl_t, irow, icol1, MAT_INITIAL_MATRIX, D2_t,ierr) call MatView(lapl_t, PETSC_VIEWER_STDOUT_WORLD, ierr) call MatView(D2_t, PETSC_VIEWER_STDOUT_WORLD, ierr) Output: ngx=ngy=4; ngz=1; such that n=4*4*1=16 lapl_t: row 0: (0, -3.) row 1: (0, 3.) (1, -3.) (5, -9.) row 2: (2, -3.) (3, -3.) (6, -9.) row 3: (3, 3.) row 4: (4, -3.) (5, -9.) row 5: (1, 3.) (4, 3.) (5, 36.) (6, -9.) (9, -9.) row 6: (2, 3.) (5, -9.) (6, 36.) (7, -3.) (10, -9.) row 7: (6, -9.) (7, 3.) row 8: (8, -3.) (9, -9.) row 9: (5, -9.) (8, 3.) (9, 36.) (10, -9.) (13, -3.) row 10: (6, -9.) (9, -9.) (10, 36.) (11, -3.) (14, -3.) row 11: (10, -9.) (11, 3.) row 12: (12, -3.) row 13: (9, -9.) (12, 3.) (13, 3.) row 14: (10, -9.) (14, 3.) (15, -3.) row 15: (15, 3.) Case 1: nprocx =1; nprocy=2; ! number of processors in x and y among which to divide the domain Here, the (n-1 x n-1) submatrix is extracted correctly as D2_t: row 0: (0, -3.) row 1: (0, 3.) (1, -3.) (5, -9.) row 2: (2, -3.) (3, -3.) (6, -9.) row 3: (3, 3.) row 4: (4, -3.) (5, -9.) row 5: (1, 3.) (4, 3.) (5, 36.) (6, -9.) (9, -9.) row 6: (2, 3.) (5, -9.) (6, 36.) (7, -3.) (10, -9.) row 7: (6, -9.) (7, 3.) row 8: (8, -3.) (9, -9.) row 9: (5, -9.) (8, 3.) (9, 36.) (10, -9.) (13, -3.) 
row 10: (6, -9.) (9, -9.) (10, 36.) (11, -3.) (14, -3.) row 11: (10, -9.) (11, 3.) row 12: (12, -3.) row 13: (9, -9.) (12, 3.) (13, 3.) row 14: (10, -9.) (14, 3.) However, for Case 2: nprocx =2; nprocy=1; lapl_t is correctly assembled and transposed but the (n-1 x n-1) submatrix is extracted incorrectly as D2_t: row 0: (0, -3.) row 1: (0, 3.) (1, -3.) (3, -9.) row 2: (2, -3.) (3, -9.) row 3: (1, 3.) (2, 3.) (3, 36.) (5, -9.) (10, -9.) row 4: (4, -3.) (5, -9.) row 5: (3, -9.) (4, 3.) (5, 36.) (7, -3.) (12, -9.) row 6: (6, -3.) row 7: (5, -9.) (6, 3.) (7, 3.) row 8: (8, -3.) (9, -3.) (10, -9.) row 9: (9, 3.) row 10: (3, -9.) (8, 3.) (10, 36.) (11, -3.) (12, -9.) row 11: (10, -9.) (11, 3.) row 12: (5, -9.) (10, -9.) (12, 36.) (13, -3.) (14, -3.) row 13: (12, -9.) (13, 3.) row 14: (12, -9.) (14, 3.) I am unable to understand why the extracted submatrix is incorrect when nprocx>1 but works when nprocx=1 and nprocy>=1. P.S. the parallel IS in cases 1 and 2 are the same and are as follows: irow: [0] Number of indices in set 8 [0] 0 0 [0] 1 1 [0] 2 2 [0] 3 3 [0] 4 4 [0] 5 5 [0] 6 6 [0] 7 7 [1] Number of indices in set 7 [1] 0 8 [1] 1 9 [1] 2 10 [1] 3 11 [1] 4 12 [1] 5 13 [1] 6 14 icol1: [0] Index set is permutation [0] Number of indices in (stride) set 8 [0] 0 0 [0] 1 1 [0] 2 2 [0] 3 3 [0] 4 4 [0] 5 5 [0] 6 6 [0] 7 7 [1] Number of indices in (stride) set 7 [1] 0 8 [1] 1 9 [1] 2 10 [1] 3 11 [1] 4 12 [1] 5 13 [1] 6 14 Could you please help me find out what is going wrong here? Regards, Gautham ________________________________ From: Krishnan, Gautham > Sent: Saturday, September 26, 2020 2:01 PM To: Matthew Knepley > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Regarding help with MatGetSubMatrix parallel use In parallel I have passed the global indices that I want on the PE. However, the result I obtain seems to change based on whether I divide the initial domain between PEs along the x axis or the y axis. This is what I am trying to resolve- for example: the FORTRAN code I use is basically: PetscInt :: idx(ngx*ngy*ngz) Mat :: D2_t, lapl, , lapl_t IS :: irow, irow1, icol1 call DMCreateMatrix(grid_pressure, lapl, ierr) call MatSetOption(lapl_t, MAT_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) call MatSetOption(lapl_t, MAT_IGNORE_ZERO_ENTRIES,PETSC_TRUE,ierr) call DMCreateMatrix(grid_pressure, lapl_t, ierr) call MatSetOption(lapl_t, MAT_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) call MatSetOption(lapl_t, MAT_IGNORE_ZERO_ENTRIES,PETSC_TRUE,ierr) . . . !(Assembling matrix lapl) call VecCreate(petsc_xyzcom, vec_b, ierr) call VecSetSizes(vec_b, PETSC_DECIDE, ngx*ngy*ngz-1, ierr) call VecSetFromOptions(vec_b, ierr) call VecSetUp(vec_b, ierr) call VecGetLocalSize(vec_b,vecsx,ierr) call VecGetOwnershipRange(vec_b,veclo,vechi,ierr) call ISCreateStride(petsc_xyzcom, vecsx,veclo,1,icol1,ierr) idx = (/ (j, j=veclo,vechi-1)/) call ISCreateGeneral(petsc_xyzcom, vecsx,idx, PETSC_COPY_VALUES,irow,ierr) call MatTranspose(lapl, MAT_INPLACE_MATRIX, lapl_t,ierr) !transpose the laplacian call MatGetSubMatrix(lapl_t, irow, icol1, MAT_INITIAL_MATRIX, D2_t,ierr) call MatView(lapl_t, PETSC_VIEWER_STDOUT_WORLD, ierr) call MatView(D2_t, PETSC_VIEWER_STDOUT_WORLD, ierr) Output: ngx=ngy=4; ngz=1; such that n=4*4*1=16 lapl_t: row 0: (0, -3.) row 1: (0, 3.) (1, -3.) (5, -9.) row 2: (2, -3.) (3, -3.) (6, -9.) row 3: (3, 3.) row 4: (4, -3.) (5, -9.) row 5: (1, 3.) (4, 3.) (5, 36.) (6, -9.) (9, -9.) row 6: (2, 3.) (5, -9.) (6, 36.) (7, -3.) (10, -9.) row 7: (6, -9.) (7, 3.) row 8: (8, -3.) (9, -9.) row 9: (5, -9.) (8, 3.) 
(9, 36.) (10, -9.) (13, -3.) row 10: (6, -9.) (9, -9.) (10, 36.) (11, -3.) (14, -3.) row 11: (10, -9.) (11, 3.) row 12: (12, -3.) row 13: (9, -9.) (12, 3.) (13, 3.) row 14: (10, -9.) (14, 3.) (15, -3.) row 15: (15, 3.) Case 1: nprocx =1; nprocy=2; ! number of processors in x and y among which to divide the domain Here, the (n-1 x n-1) submatrix is extracted correctly as D2_t: row 0: (0, -3.) row 1: (0, 3.) (1, -3.) (5, -9.) row 2: (2, -3.) (3, -3.) (6, -9.) row 3: (3, 3.) row 4: (4, -3.) (5, -9.) row 5: (1, 3.) (4, 3.) (5, 36.) (6, -9.) (9, -9.) row 6: (2, 3.) (5, -9.) (6, 36.) (7, -3.) (10, -9.) row 7: (6, -9.) (7, 3.) row 8: (8, -3.) (9, -9.) row 9: (5, -9.) (8, 3.) (9, 36.) (10, -9.) (13, -3.) row 10: (6, -9.) (9, -9.) (10, 36.) (11, -3.) (14, -3.) row 11: (10, -9.) (11, 3.) row 12: (12, -3.) row 13: (9, -9.) (12, 3.) (13, 3.) row 14: (10, -9.) (14, 3.) However, for Case 2: nprocx =2; nprocy=1; lapl_t is correctly assembled and transposed but the (n-1 x n-1) submatrix is extracted incorrectly as D2_t: row 0: (0, -3.) row 1: (0, 3.) (1, -3.) (3, -9.) row 2: (2, -3.) (3, -9.) row 3: (1, 3.) (2, 3.) (3, 36.) (5, -9.) (10, -9.) row 4: (4, -3.) (5, -9.) row 5: (3, -9.) (4, 3.) (5, 36.) (7, -3.) (12, -9.) row 6: (6, -3.) row 7: (5, -9.) (6, 3.) (7, 3.) row 8: (8, -3.) (9, -3.) (10, -9.) row 9: (9, 3.) row 10: (3, -9.) (8, 3.) (10, 36.) (11, -3.) (12, -9.) row 11: (10, -9.) (11, 3.) row 12: (5, -9.) (10, -9.) (12, 36.) (13, -3.) (14, -3.) row 13: (12, -9.) (13, 3.) row 14: (12, -9.) (14, 3.) I am unable to understand why the extracted submatrix is incorrect when nprocx>1 but works when nprocx=1 and nprocy>=1. P.S. the parallel IS in cases 1 and 2 are the same and are as follows: irow: [0] Number of indices in set 8 [0] 0 0 [0] 1 1 [0] 2 2 [0] 3 3 [0] 4 4 [0] 5 5 [0] 6 6 [0] 7 7 [1] Number of indices in set 7 [1] 0 8 [1] 1 9 [1] 2 10 [1] 3 11 [1] 4 12 [1] 5 13 [1] 6 14 icol1: [0] Index set is permutation [0] Number of indices in (stride) set 8 [0] 0 0 [0] 1 1 [0] 2 2 [0] 3 3 [0] 4 4 [0] 5 5 [0] 6 6 [0] 7 7 [1] Number of indices in (stride) set 7 [1] 0 8 [1] 1 9 [1] 2 10 [1] 3 11 [1] 4 12 [1] 5 13 [1] 6 14 Could you please help me find out what is going wrong here? Regards, Gautham ________________________________ From: Matthew Knepley > Sent: Wednesday, September 23, 2020 3:55 PM To: Krishnan, Gautham > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Regarding help with MatGetSubMatrix parallel use On Wed, Sep 23, 2020 at 4:12 PM Krishnan, Gautham > wrote: Hello, For a CFD code being developed with FORTRAN and MPI, I am using PETSC matrices and for a particular operation, I require to extract a submatrix(n-1 x n-1) of a matrix created (n x n). However using the petsc MatGetSubMatrix works for serial runs but fails when the domain is split up over PEs- I suspect the indexing changed for parallel runs and hence the global indexing that worked for serial case just shuffles around matrix entries in parallel undesirably. I would like to ask whether anybody could offer some guidance regarding this. I would like to note that the 2D domain is split along both x and y axes for parallel runs on multiple PEs. In parallel, you pass MatGetSubmatrix() the global indices that you want on your process. Thanks, Matt Regards, Gautham Krishnan, -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Mon Sep 28 14:42:30 2020 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 28 Sep 2020 12:42:30 -0700 Subject: [petsc-users] using real and complex together In-Reply-To: References: <32DEC92E-93A5-4EE8-BF92-2468C1FE31B9@gmail.com> Message-ID: Hi Stefano and PETSc dev team, I want to try your suggestion to always load complex version of PETSc but if my input matrix A is real, I want to create shell matrix to matrix-vector and factorization using real only. I still need to understand how MatRealPart works. Does it just zero out the image numerical values or does it delete the image memory? If my input matrix A is real, how do I create a shell matrix to matrix -vector multiplication y=A*x where A is real, PestcScalar = complex, x and y are Vec? I notice there is a VecRealPart but it seems it just zeros the image numerical values. It seems I still have to create a PetscReal pointer to copy the real part of PetacScalar pointers like following. Can you comment on it? Thanks, Sam PetscScalar *px = nullptr; VecGetArrayRead(x, &px); PetscScalar *py = nullptr; VecGetArray(y, &py); int localSize = 0; VecGetLocalSize(x, &localSize); std::vector realX(localSize); // I am using c++ to call PETSc //retrieve real part for(int i = 0; i < localSize; i++) realX[i] = PetscRealPart(px[i]); // do real matrix-vector multiplication // realY=A*realX // here where realY is std::vector //put real part back to py for(int i = 0; i < localSize; i++) pv[i] = realY[i]; VecRestoreArray(y,&py); On Tue, May 26, 2020 at 1:49 PM Sam Guo wrote: > Thanks > > On Tuesday, May 26, 2020, Stefano Zampini > wrote: > >> All the solvers/matrices/vectors works for PetscScalar types (i.e. in >> your case complex) >> If you need to solve for the real part only, you can duplicate the matrix >> and call MatRealPart to zero out the imaginary part. But the solve will >> always run in the complex space >> You should not be worried about doubling the memory for a matrix (i.e. >> real and imaginary part) >> >> >> On May 26, 2020, at 11:28 PM, Sam Guo wrote: >> >> complex version is needed since matrix sometimes is real and sometimes is >> complex. I want to solve real matrix without allocating memory for >> imaginary part((except eigen pairs). >> >> On Tuesday, May 26, 2020, Zhang, Hong wrote: >> >>> You can build PETSc with complex version, and declare some variables as >>> 'PETSC_REAL'. >>> Hong >>> >>> ------------------------------ >>> *From:* petsc-users on behalf of Sam >>> Guo >>> *Sent:* Tuesday, May 26, 2020 1:00 PM >>> *To:* PETSc >>> *Subject:* [petsc-users] using real and complex together >>> >>> Dear PETSc dev team, >>> Can I use both real and complex versions together? >>> >>> Thanks, >>> Sam >>> >> >> -------------- next part -------------- An HTML attachment was scrubbed... 
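For reference, a self-contained sketch of the shell-matrix multiply described in the message above, written here in C (the snippet above is C++): the final loop should write to py rather than pv, the read-only array should be restored, and the imaginary part of x has to be propagated as well, since the vectors a solver hands to the shell matrix will in general have nonzero imaginary parts even when A is real. ApplyRealOperator is a stand-in for the user's own real matrix-vector product; this is only the approach sketched above made explicit, not what the replies that follow end up recommending.

#include <petscmat.h>

/* Hypothetical user routine applying the real operator to a real array */
extern PetscErrorCode ApplyRealOperator(void *ctx, PetscInt n, const PetscReal *in, PetscReal *out);

/* MATSHELL multiply y = A*x in a complex-scalar PETSc build where A has
   purely real entries: Re(y) = A*Re(x) and Im(y) = A*Im(x).
   Error checking omitted for brevity. */
static PetscErrorCode MatMult_RealShell(Mat A, Vec x, Vec y)
{
  void              *ctx;
  const PetscScalar *px;
  PetscScalar       *py;
  PetscReal         *xr, *xi, *yr, *yi;
  PetscInt           i, n;

  MatShellGetContext(A, &ctx);
  VecGetLocalSize(x, &n);
  PetscMalloc4(n, &xr, n, &xi, n, &yr, n, &yi);

  VecGetArrayRead(x, &px);
  for (i = 0; i < n; i++) {
    xr[i] = PetscRealPart(px[i]);
    xi[i] = PetscImaginaryPart(px[i]);
  }
  VecRestoreArrayRead(x, &px);

  ApplyRealOperator(ctx, n, xr, yr);  /* Re(y) = A * Re(x) */
  ApplyRealOperator(ctx, n, xi, yi);  /* Im(y) = A * Im(x) */

  VecGetArray(y, &py);
  for (i = 0; i < n; i++) py[i] = yr[i] + yi[i] * PETSC_i;
  VecRestoreArray(y, &py);

  PetscFree4(xr, xi, yr, yi);
  return 0;
}

Such a callback would be attached to the shell matrix with MatCreateShell() and MatShellSetOperation(A, MATOP_MULT, (void (*)(void))MatMult_RealShell).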
URL: From knepley at gmail.com Mon Sep 28 14:52:04 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 28 Sep 2020 15:52:04 -0400 Subject: [petsc-users] using real and complex together In-Reply-To: References: <32DEC92E-93A5-4EE8-BF92-2468C1FE31B9@gmail.com> Message-ID: On Mon, Sep 28, 2020 at 3:43 PM Sam Guo wrote: > Hi Stefano and PETSc dev team, > I want to try your suggestion to always load complex version of PETSc > but if my input matrix A is real, I want to create shell matrix to > matrix-vector and factorization using real only. > I do not think that will work as you expect. I will try to explain below. > I still need to understand how MatRealPart works. Does it just zero out > the image numerical values or does it delete the image memory? > When we have complex values, we use the "complex" type to allocate and store them. Thus you cannot talk about just the memory to store imaginary parts. MatRealPart sets the imaginary parts of all the matrix elements to zero. > If my input matrix A is real, how do I create a shell matrix to matrix > -vector multiplication y=A*x where A is real, PestcScalar = complex, x and > y are Vec? I notice there is a VecRealPart but it seems it just zeros the > image numerical values. It seems I still have to create a PetscReal > pointer to copy the real part of PetacScalar pointers like following. Can > you comment on it? > What you suggest would mean rewriting the matrix multiplication algorithm by hand after extracting the values. I am not sure if this is really what you want to do. Is the matrix memory really your limiting factor? Even if you tried to do this with templates, the memory from temporaries would be very hard to control. Thanks, Matt > Thanks, > Sam > > PetscScalar *px = nullptr; > VecGetArrayRead(x, &px); > PetscScalar *py = nullptr; > VecGetArray(y, &py); > int localSize = 0; > VecGetLocalSize(x, &localSize); > std::vector realX(localSize); // I am using c++ to call PETSc > > //retrieve real part > for(int i = 0; i < localSize; i++) realX[i] = PetscRealPart(px[i]); > > // do real matrix-vector multiplication > // realY=A*realX > // here where realY is std::vector > > //put real part back to py > for(int i = 0; i < localSize; i++) pv[i] = realY[i]; > VecRestoreArray(y,&py); > > On Tue, May 26, 2020 at 1:49 PM Sam Guo wrote: > >> Thanks >> >> On Tuesday, May 26, 2020, Stefano Zampini >> wrote: >> >>> All the solvers/matrices/vectors works for PetscScalar types (i.e. in >>> your case complex) >>> If you need to solve for the real part only, you can duplicate the >>> matrix and call MatRealPart to zero out the imaginary part. But the solve >>> will always run in the complex space >>> You should not be worried about doubling the memory for a matrix (i.e. >>> real and imaginary part) >>> >>> >>> On May 26, 2020, at 11:28 PM, Sam Guo wrote: >>> >>> complex version is needed since matrix sometimes is real and sometimes >>> is complex. I want to solve real matrix without allocating memory for >>> imaginary part((except eigen pairs). >>> >>> On Tuesday, May 26, 2020, Zhang, Hong wrote: >>> >>>> You can build PETSc with complex version, and declare some variables >>>> as 'PETSC_REAL'. >>>> Hong >>>> >>>> ------------------------------ >>>> *From:* petsc-users on behalf of Sam >>>> Guo >>>> *Sent:* Tuesday, May 26, 2020 1:00 PM >>>> *To:* PETSc >>>> *Subject:* [petsc-users] using real and complex together >>>> >>>> Dear PETSc dev team, >>>> Can I use both real and complex versions together? 
>>>> >>>> Thanks, >>>> Sam >>>> >>> >>> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Mon Sep 28 16:01:44 2020 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 28 Sep 2020 14:01:44 -0700 Subject: [petsc-users] using real and complex together In-Reply-To: References: <32DEC92E-93A5-4EE8-BF92-2468C1FE31B9@gmail.com> Message-ID: Hi Matt, Since I use MUMPS as preconditioner, complex uses too much memory if my input matrix is real. Ideally if I can compile real and complex into different symbols (like MUMPS) , I can load both version without conflict. Thanks, Sam On Mon, Sep 28, 2020 at 12:52 PM Matthew Knepley wrote: > On Mon, Sep 28, 2020 at 3:43 PM Sam Guo wrote: > >> Hi Stefano and PETSc dev team, >> I want to try your suggestion to always load complex version of PETSc >> but if my input matrix A is real, I want to create shell matrix to >> matrix-vector and factorization using real only. >> > > I do not think that will work as you expect. I will try to explain below. > > >> I still need to understand how MatRealPart works. Does it just zero >> out the image numerical values or does it delete the image memory? >> > > When we have complex values, we use the "complex" type to allocate and > store them. Thus you cannot talk about just the memory to store imaginary > parts. > MatRealPart sets the imaginary parts of all the matrix elements to zero. > > >> If my input matrix A is real, how do I create a shell matrix to matrix >> -vector multiplication y=A*x where A is real, PestcScalar = complex, x and >> y are Vec? I notice there is a VecRealPart but it seems it just zeros the >> image numerical values. It seems I still have to create a PetscReal >> pointer to copy the real part of PetacScalar pointers like following. Can >> you comment on it? >> > > What you suggest would mean rewriting the matrix multiplication algorithm > by hand after extracting the values. I am not sure if this > is really what you want to do. Is the matrix memory really your limiting > factor? Even if you tried to do this with templates, the memory > from temporaries would be very hard to control. > > Thanks, > > Matt > > >> Thanks, >> Sam >> >> PetscScalar *px = nullptr; >> VecGetArrayRead(x, &px); >> PetscScalar *py = nullptr; >> VecGetArray(y, &py); >> int localSize = 0; >> VecGetLocalSize(x, &localSize); >> std::vector realX(localSize); // I am using c++ to call PETSc >> >> //retrieve real part >> for(int i = 0; i < localSize; i++) realX[i] = PetscRealPart(px[i]); >> >> // do real matrix-vector multiplication >> // realY=A*realX >> // here where realY is std::vector >> >> //put real part back to py >> for(int i = 0; i < localSize; i++) pv[i] = realY[i]; >> VecRestoreArray(y,&py); >> >> On Tue, May 26, 2020 at 1:49 PM Sam Guo wrote: >> >>> Thanks >>> >>> On Tuesday, May 26, 2020, Stefano Zampini >>> wrote: >>> >>>> All the solvers/matrices/vectors works for PetscScalar types (i.e. in >>>> your case complex) >>>> If you need to solve for the real part only, you can duplicate the >>>> matrix and call MatRealPart to zero out the imaginary part. But the solve >>>> will always run in the complex space >>>> You should not be worried about doubling the memory for a matrix (i.e. 
>>>> real and imaginary part) >>>> >>>> >>>> On May 26, 2020, at 11:28 PM, Sam Guo wrote: >>>> >>>> complex version is needed since matrix sometimes is real and sometimes >>>> is complex. I want to solve real matrix without allocating memory for >>>> imaginary part((except eigen pairs). >>>> >>>> On Tuesday, May 26, 2020, Zhang, Hong wrote: >>>> >>>>> You can build PETSc with complex version, and declare some variables >>>>> as 'PETSC_REAL'. >>>>> Hong >>>>> >>>>> ------------------------------ >>>>> *From:* petsc-users on behalf of >>>>> Sam Guo >>>>> *Sent:* Tuesday, May 26, 2020 1:00 PM >>>>> *To:* PETSc >>>>> *Subject:* [petsc-users] using real and complex together >>>>> >>>>> Dear PETSc dev team, >>>>> Can I use both real and complex versions together? >>>>> >>>>> Thanks, >>>>> Sam >>>>> >>>> >>>> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Sep 28 17:05:19 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 28 Sep 2020 18:05:19 -0400 Subject: [petsc-users] using real and complex together In-Reply-To: References: <32DEC92E-93A5-4EE8-BF92-2468C1FE31B9@gmail.com> Message-ID: On Mon, Sep 28, 2020 at 5:01 PM Sam Guo wrote: > Hi Matt, > Since I use MUMPS as preconditioner, complex uses too much memory if my > input matrix is real. Ideally if I can compile real and complex into > different symbols (like MUMPS) , I can load both version without conflict. > What I mean to say is that it would be great if it were as simple as using two different symbols, but unfortunately the problem is more difficult. I was trying to use the example of templates. This would be a very intrusive change no matter what technology you are using. So your main memory usage is from the MUMPS factorization, and you cannot afford to double that usage? You could consider writing a version of AIJ that stores real entries, but allows complex vector values. It would promote to complex for the row dot product. However, you would also have to do the same work for all the solves you do with MUMPS. I think it would be much easier to just decompose your complex work into real and imaginary parts and use PETSc with real scalars to compute them separately. Since you know your matrices have 0 imaginary part, this becomes very straightforward. Thanks, Matt > Thanks, > Sam > > On Mon, Sep 28, 2020 at 12:52 PM Matthew Knepley > wrote: > >> On Mon, Sep 28, 2020 at 3:43 PM Sam Guo wrote: >> >>> Hi Stefano and PETSc dev team, >>> I want to try your suggestion to always load complex version of PETSc >>> but if my input matrix A is real, I want to create shell matrix to >>> matrix-vector and factorization using real only. >>> >> >> I do not think that will work as you expect. I will try to explain below. >> >> >>> I still need to understand how MatRealPart works. Does it just zero >>> out the image numerical values or does it delete the image memory? >>> >> >> When we have complex values, we use the "complex" type to allocate and >> store them. Thus you cannot talk about just the memory to store imaginary >> parts. >> MatRealPart sets the imaginary parts of all the matrix elements to zero. 
>> >> >>> If my input matrix A is real, how do I create a shell matrix to matrix >>> -vector multiplication y=A*x where A is real, PestcScalar = complex, x and >>> y are Vec? I notice there is a VecRealPart but it seems it just zeros the >>> image numerical values. It seems I still have to create a PetscReal >>> pointer to copy the real part of PetacScalar pointers like following. Can >>> you comment on it? >>> >> >> What you suggest would mean rewriting the matrix multiplication algorithm >> by hand after extracting the values. I am not sure if this >> is really what you want to do. Is the matrix memory really your limiting >> factor? Even if you tried to do this with templates, the memory >> from temporaries would be very hard to control. >> >> Thanks, >> >> Matt >> >> >>> Thanks, >>> Sam >>> >>> PetscScalar *px = nullptr; >>> VecGetArrayRead(x, &px); >>> PetscScalar *py = nullptr; >>> VecGetArray(y, &py); >>> int localSize = 0; >>> VecGetLocalSize(x, &localSize); >>> std::vector realX(localSize); // I am using c++ to call PETSc >>> >>> //retrieve real part >>> for(int i = 0; i < localSize; i++) realX[i] = PetscRealPart(px[i]); >>> >>> // do real matrix-vector multiplication >>> // realY=A*realX >>> // here where realY is std::vector >>> >>> //put real part back to py >>> for(int i = 0; i < localSize; i++) pv[i] = realY[i]; >>> VecRestoreArray(y,&py); >>> >>> On Tue, May 26, 2020 at 1:49 PM Sam Guo wrote: >>> >>>> Thanks >>>> >>>> On Tuesday, May 26, 2020, Stefano Zampini >>>> wrote: >>>> >>>>> All the solvers/matrices/vectors works for PetscScalar types (i.e. in >>>>> your case complex) >>>>> If you need to solve for the real part only, you can duplicate the >>>>> matrix and call MatRealPart to zero out the imaginary part. But the solve >>>>> will always run in the complex space >>>>> You should not be worried about doubling the memory for a matrix (i.e. >>>>> real and imaginary part) >>>>> >>>>> >>>>> On May 26, 2020, at 11:28 PM, Sam Guo wrote: >>>>> >>>>> complex version is needed since matrix sometimes is real and sometimes >>>>> is complex. I want to solve real matrix without allocating memory for >>>>> imaginary part((except eigen pairs). >>>>> >>>>> On Tuesday, May 26, 2020, Zhang, Hong wrote: >>>>> >>>>>> You can build PETSc with complex version, and declare some variables >>>>>> as 'PETSC_REAL'. >>>>>> Hong >>>>>> >>>>>> ------------------------------ >>>>>> *From:* petsc-users on behalf of >>>>>> Sam Guo >>>>>> *Sent:* Tuesday, May 26, 2020 1:00 PM >>>>>> *To:* PETSc >>>>>> *Subject:* [petsc-users] using real and complex together >>>>>> >>>>>> Dear PETSc dev team, >>>>>> Can I use both real and complex versions together? >>>>>> >>>>>> Thanks, >>>>>> Sam >>>>>> >>>>> >>>>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
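One way to read the first suggestion above (store real entries but allow complex vector values, promoting to complex in the row dot product) is a shell matrix in the complex build whose context keeps the entries as PetscReal, so the matrix data costs half the memory of a PetscScalar AIJ. A purely local, sequential sketch, with made-up names and no error checking; a parallel version would still need the usual off-process column handling, and, as noted above, a MUMPS factorization would remain complex:

#include <petscmat.h>

/* Context for a hypothetical shell matrix storing only real CSR entries
   while PETSc is built with complex scalars */
typedef struct {
  PetscInt   n;       /* number of local rows                */
  PetscInt  *rowptr;  /* CSR row pointers, length n+1        */
  PetscInt  *col;     /* CSR column indices                  */
  PetscReal *val;     /* real entries (half of PetscScalar)  */
} RealCSR;

/* y = A*x with real a_ij and complex x_j: each product promotes to complex */
static PetscErrorCode MatMult_RealCSR(Mat A, Vec x, Vec y)
{
  RealCSR           *csr;
  const PetscScalar *px;
  PetscScalar       *py;
  PetscInt           i, k;

  MatShellGetContext(A, &csr);
  VecGetArrayRead(x, &px);
  VecGetArray(y, &py);
  for (i = 0; i < csr->n; i++) {
    PetscScalar sum = 0.0;
    for (k = csr->rowptr[i]; k < csr->rowptr[i + 1]; k++)
      sum += csr->val[k] * px[csr->col[k]];  /* real * complex -> complex */
    py[i] = sum;
  }
  VecRestoreArrayRead(x, &px);
  VecRestoreArray(y, &py);
  return 0;
}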
URL: From sam.guo at cd-adapco.com Mon Sep 28 17:29:05 2020 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 28 Sep 2020 15:29:05 -0700 Subject: [petsc-users] using real and complex together In-Reply-To: References: <32DEC92E-93A5-4EE8-BF92-2468C1FE31B9@gmail.com> Message-ID: ? I think it would be much easier to just decompose your complex work into real and imaginary parts and use PETSc with real scalars to compute them separately. Since you know your matrices have 0 imaginary part, this becomes very straightforward.? I think this is what I am trying to do. But since I have to provide matrix-vector operator(since I am using shell matrix), the Vec I receive is complex. I need to convert complex vec to real one and then convert it back(that?s the code I shown before). On Monday, September 28, 2020, Matthew Knepley wrote: > On Mon, Sep 28, 2020 at 5:01 PM Sam Guo wrote: > >> Hi Matt, >> Since I use MUMPS as preconditioner, complex uses too much memory if >> my input matrix is real. Ideally if I can compile real and complex into >> different symbols (like MUMPS) , I can load both version without conflict. >> > > What I mean to say is that it would be great if it were as simple as using > two different symbols, but unfortunately the problem is more difficult. I > was trying to use > the example of templates. This would be a very intrusive change no matter > what technology you are using. > > So your main memory usage is from the MUMPS factorization, and you cannot > afford to double that usage? > > You could consider writing a version of AIJ that stores real entries, but > allows complex vector values. It would promote to complex for the row dot > product. > However, you would also have to do the same work for all the solves you do > with MUMPS. > > I think it would be much easier to just decompose your complex work into > real and imaginary parts and use PETSc with real scalars to compute them > separately. > Since you know your matrices have 0 imaginary part, this becomes very > straightforward. > > Thanks, > > Matt > > >> Thanks, >> Sam >> >> On Mon, Sep 28, 2020 at 12:52 PM Matthew Knepley >> wrote: >> >>> On Mon, Sep 28, 2020 at 3:43 PM Sam Guo wrote: >>> >>>> Hi Stefano and PETSc dev team, >>>> I want to try your suggestion to always load complex version of >>>> PETSc but if my input matrix A is real, I want to create shell matrix to >>>> matrix-vector and factorization using real only. >>>> >>> >>> I do not think that will work as you expect. I will try to explain below. >>> >>> >>>> I still need to understand how MatRealPart works. Does it just zero >>>> out the image numerical values or does it delete the image memory? >>>> >>> >>> When we have complex values, we use the "complex" type to allocate and >>> store them. Thus you cannot talk about just the memory to store imaginary >>> parts. >>> MatRealPart sets the imaginary parts of all the matrix elements to zero. >>> >>> >>>> If my input matrix A is real, how do I create a shell matrix to matrix >>>> -vector multiplication y=A*x where A is real, PestcScalar = complex, x and >>>> y are Vec? I notice there is a VecRealPart but it seems it just zeros the >>>> image numerical values. It seems I still have to create a PetscReal >>>> pointer to copy the real part of PetacScalar pointers like following. Can >>>> you comment on it? >>>> >>> >>> What you suggest would mean rewriting the matrix multiplication >>> algorithm by hand after extracting the values. I am not sure if this >>> is really what you want to do. 
Is the matrix memory really your limiting >>> factor? Even if you tried to do this with templates, the memory >>> from temporaries would be very hard to control. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thanks, >>>> Sam >>>> >>>> PetscScalar *px = nullptr; >>>> VecGetArrayRead(x, &px); >>>> PetscScalar *py = nullptr; >>>> VecGetArray(y, &py); >>>> int localSize = 0; >>>> VecGetLocalSize(x, &localSize); >>>> std::vector realX(localSize); // I am using c++ to call PETSc >>>> >>>> //retrieve real part >>>> for(int i = 0; i < localSize; i++) realX[i] = PetscRealPart(px[i]); >>>> >>>> // do real matrix-vector multiplication >>>> // realY=A*realX >>>> // here where realY is std::vector >>>> >>>> //put real part back to py >>>> for(int i = 0; i < localSize; i++) pv[i] = realY[i]; >>>> VecRestoreArray(y,&py); >>>> >>>> On Tue, May 26, 2020 at 1:49 PM Sam Guo wrote: >>>> >>>>> Thanks >>>>> >>>>> On Tuesday, May 26, 2020, Stefano Zampini >>>>> wrote: >>>>> >>>>>> All the solvers/matrices/vectors works for PetscScalar types (i.e. in >>>>>> your case complex) >>>>>> If you need to solve for the real part only, you can duplicate the >>>>>> matrix and call MatRealPart to zero out the imaginary part. But the solve >>>>>> will always run in the complex space >>>>>> You should not be worried about doubling the memory for a matrix >>>>>> (i.e. real and imaginary part) >>>>>> >>>>>> >>>>>> On May 26, 2020, at 11:28 PM, Sam Guo wrote: >>>>>> >>>>>> complex version is needed since matrix sometimes is real and >>>>>> sometimes is complex. I want to solve real matrix without allocating memory >>>>>> for imaginary part((except eigen pairs). >>>>>> >>>>>> On Tuesday, May 26, 2020, Zhang, Hong wrote: >>>>>> >>>>>>> You can build PETSc with complex version, and declare some >>>>>>> variables as 'PETSC_REAL'. >>>>>>> Hong >>>>>>> >>>>>>> ------------------------------ >>>>>>> *From:* petsc-users on behalf of >>>>>>> Sam Guo >>>>>>> *Sent:* Tuesday, May 26, 2020 1:00 PM >>>>>>> *To:* PETSc >>>>>>> *Subject:* [petsc-users] using real and complex together >>>>>>> >>>>>>> Dear PETSc dev team, >>>>>>> Can I use both real and complex versions together? >>>>>>> >>>>>>> Thanks, >>>>>>> Sam >>>>>>> >>>>>> >>>>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Mon Sep 28 17:32:11 2020 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 28 Sep 2020 15:32:11 -0700 Subject: [petsc-users] using real and complex together In-Reply-To: References: <32DEC92E-93A5-4EE8-BF92-2468C1FE31B9@gmail.com> Message-ID: Another idea I am pursuing is to prefix real and complex using some compiler flags. Didn?t figure out how to do this cross platform for shared libs yet. On Monday, September 28, 2020, Matthew Knepley wrote: > On Mon, Sep 28, 2020 at 5:01 PM Sam Guo wrote: > >> Hi Matt, >> Since I use MUMPS as preconditioner, complex uses too much memory if >> my input matrix is real. 
Ideally if I can compile real and complex into >> different symbols (like MUMPS) , I can load both version without conflict. >> > > What I mean to say is that it would be great if it were as simple as using > two different symbols, but unfortunately the problem is more difficult. I > was trying to use > the example of templates. This would be a very intrusive change no matter > what technology you are using. > > So your main memory usage is from the MUMPS factorization, and you cannot > afford to double that usage? > > You could consider writing a version of AIJ that stores real entries, but > allows complex vector values. It would promote to complex for the row dot > product. > However, you would also have to do the same work for all the solves you do > with MUMPS. > > I think it would be much easier to just decompose your complex work into > real and imaginary parts and use PETSc with real scalars to compute them > separately. > Since you know your matrices have 0 imaginary part, this becomes very > straightforward. > > Thanks, > > Matt > > >> Thanks, >> Sam >> >> On Mon, Sep 28, 2020 at 12:52 PM Matthew Knepley >> wrote: >> >>> On Mon, Sep 28, 2020 at 3:43 PM Sam Guo wrote: >>> >>>> Hi Stefano and PETSc dev team, >>>> I want to try your suggestion to always load complex version of >>>> PETSc but if my input matrix A is real, I want to create shell matrix to >>>> matrix-vector and factorization using real only. >>>> >>> >>> I do not think that will work as you expect. I will try to explain below. >>> >>> >>>> I still need to understand how MatRealPart works. Does it just zero >>>> out the image numerical values or does it delete the image memory? >>>> >>> >>> When we have complex values, we use the "complex" type to allocate and >>> store them. Thus you cannot talk about just the memory to store imaginary >>> parts. >>> MatRealPart sets the imaginary parts of all the matrix elements to zero. >>> >>> >>>> If my input matrix A is real, how do I create a shell matrix to matrix >>>> -vector multiplication y=A*x where A is real, PestcScalar = complex, x and >>>> y are Vec? I notice there is a VecRealPart but it seems it just zeros the >>>> image numerical values. It seems I still have to create a PetscReal >>>> pointer to copy the real part of PetacScalar pointers like following. Can >>>> you comment on it? >>>> >>> >>> What you suggest would mean rewriting the matrix multiplication >>> algorithm by hand after extracting the values. I am not sure if this >>> is really what you want to do. Is the matrix memory really your limiting >>> factor? Even if you tried to do this with templates, the memory >>> from temporaries would be very hard to control. 
>>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thanks, >>>> Sam >>>> >>>> PetscScalar *px = nullptr; >>>> VecGetArrayRead(x, &px); >>>> PetscScalar *py = nullptr; >>>> VecGetArray(y, &py); >>>> int localSize = 0; >>>> VecGetLocalSize(x, &localSize); >>>> std::vector realX(localSize); // I am using c++ to call PETSc >>>> >>>> //retrieve real part >>>> for(int i = 0; i < localSize; i++) realX[i] = PetscRealPart(px[i]); >>>> >>>> // do real matrix-vector multiplication >>>> // realY=A*realX >>>> // here where realY is std::vector >>>> >>>> //put real part back to py >>>> for(int i = 0; i < localSize; i++) pv[i] = realY[i]; >>>> VecRestoreArray(y,&py); >>>> >>>> On Tue, May 26, 2020 at 1:49 PM Sam Guo wrote: >>>> >>>>> Thanks >>>>> >>>>> On Tuesday, May 26, 2020, Stefano Zampini >>>>> wrote: >>>>> >>>>>> All the solvers/matrices/vectors works for PetscScalar types (i.e. in >>>>>> your case complex) >>>>>> If you need to solve for the real part only, you can duplicate the >>>>>> matrix and call MatRealPart to zero out the imaginary part. But the solve >>>>>> will always run in the complex space >>>>>> You should not be worried about doubling the memory for a matrix >>>>>> (i.e. real and imaginary part) >>>>>> >>>>>> >>>>>> On May 26, 2020, at 11:28 PM, Sam Guo wrote: >>>>>> >>>>>> complex version is needed since matrix sometimes is real and >>>>>> sometimes is complex. I want to solve real matrix without allocating memory >>>>>> for imaginary part((except eigen pairs). >>>>>> >>>>>> On Tuesday, May 26, 2020, Zhang, Hong wrote: >>>>>> >>>>>>> You can build PETSc with complex version, and declare some >>>>>>> variables as 'PETSC_REAL'. >>>>>>> Hong >>>>>>> >>>>>>> ------------------------------ >>>>>>> *From:* petsc-users on behalf of >>>>>>> Sam Guo >>>>>>> *Sent:* Tuesday, May 26, 2020 1:00 PM >>>>>>> *To:* PETSc >>>>>>> *Subject:* [petsc-users] using real and complex together >>>>>>> >>>>>>> Dear PETSc dev team, >>>>>>> Can I use both real and complex versions together? >>>>>>> >>>>>>> Thanks, >>>>>>> Sam >>>>>>> >>>>>> >>>>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Sep 28 18:03:29 2020 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 28 Sep 2020 18:03:29 -0500 Subject: [petsc-users] using real and complex together In-Reply-To: References: <32DEC92E-93A5-4EE8-BF92-2468C1FE31B9@gmail.com> Message-ID: <3AA42B55-9B2C-430E-9EFB-4806F226F332@petsc.dev> > On Sep 28, 2020, at 5:32 PM, Sam Guo wrote: > > Another idea I am pursuing is to prefix real and complex using some compiler flags. Didn?t figure out how to do this cross platform for shared libs yet. This is very hairy because there are some symbols that can be common and some that are not common. I could never figure out a rational general purpose approach. 
This is definitely something we need to support in any future refactorization https://gitlab.com/petsc/petsc/-/issues/643 > > On Monday, September 28, 2020, Matthew Knepley > wrote: > On Mon, Sep 28, 2020 at 5:01 PM Sam Guo > wrote: > Hi Matt, > Since I use MUMPS as preconditioner, complex uses too much memory if my input matrix is real. Ideally if I can compile real and complex into different symbols (like MUMPS) , I can load both version without conflict. > > What I mean to say is that it would be great if it were as simple as using two different symbols, but unfortunately the problem is more difficult. I was trying to use > the example of templates. This would be a very intrusive change no matter what technology you are using. > > So your main memory usage is from the MUMPS factorization, and you cannot afford to double that usage? > > You could consider writing a version of AIJ that stores real entries, but allows complex vector values. It would promote to complex for the row dot product. > However, you would also have to do the same work for all the solves you do with MUMPS. > > I think it would be much easier to just decompose your complex work into real and imaginary parts and use PETSc with real scalars to compute them separately. > Since you know your matrices have 0 imaginary part, this becomes very straightforward. > > Thanks, > > Matt > > Thanks, > Sam > > On Mon, Sep 28, 2020 at 12:52 PM Matthew Knepley > wrote: > On Mon, Sep 28, 2020 at 3:43 PM Sam Guo > wrote: > Hi Stefano and PETSc dev team, > I want to try your suggestion to always load complex version of PETSc but if my input matrix A is real, I want to create shell matrix to matrix-vector and factorization using real only. > > I do not think that will work as you expect. I will try to explain below. > > I still need to understand how MatRealPart works. Does it just zero out the image numerical values or does it delete the image memory? > > When we have complex values, we use the "complex" type to allocate and store them. Thus you cannot talk about just the memory to store imaginary parts. > MatRealPart sets the imaginary parts of all the matrix elements to zero. > > If my input matrix A is real, how do I create a shell matrix to matrix -vector multiplication y=A*x where A is real, PestcScalar = complex, x and y are Vec? I notice there is a VecRealPart but it seems it just zeros the image numerical values. It seems I still have to create a PetscReal pointer to copy the real part of PetacScalar pointers like following. Can you comment on it? > > What you suggest would mean rewriting the matrix multiplication algorithm by hand after extracting the values. I am not sure if this > is really what you want to do. Is the matrix memory really your limiting factor? Even if you tried to do this with templates, the memory > from temporaries would be very hard to control. 
> > Thanks, > > Matt > > Thanks, > Sam > > PetscScalar *px = nullptr; > VecGetArrayRead(x, &px); > PetscScalar *py = nullptr; > VecGetArray(y, &py); > int localSize = 0; > VecGetLocalSize(x, &localSize); > std::vector realX(localSize); // I am using c++ to call PETSc > > //retrieve real part > for(int i = 0; i < localSize; i++) realX[i] = PetscRealPart(px[i]); > > // do real matrix-vector multiplication > // realY=A*realX > // here where realY is std::vector > > //put real part back to py > for(int i = 0; i < localSize; i++) pv[i] = realY[i]; > VecRestoreArray(y,&py); > > On Tue, May 26, 2020 at 1:49 PM Sam Guo > wrote: > Thanks > > On Tuesday, May 26, 2020, Stefano Zampini > wrote: > All the solvers/matrices/vectors works for PetscScalar types (i.e. in your case complex) > If you need to solve for the real part only, you can duplicate the matrix and call MatRealPart to zero out the imaginary part. But the solve will always run in the complex space > You should not be worried about doubling the memory for a matrix (i.e. real and imaginary part) > > >> On May 26, 2020, at 11:28 PM, Sam Guo > wrote: >> >> complex version is needed since matrix sometimes is real and sometimes is complex. I want to solve real matrix without allocating memory for imaginary part((except eigen pairs). >> >> On Tuesday, May 26, 2020, Zhang, Hong > wrote: >> You can build PETSc with complex version, and declare some variables as 'PETSC_REAL'. >> Hong >> >> From: petsc-users > on behalf of Sam Guo > >> Sent: Tuesday, May 26, 2020 1:00 PM >> To: PETSc > >> Subject: [petsc-users] using real and complex together >> >> Dear PETSc dev team, >> Can I use both real and complex versions together? >> >> Thanks, >> Sam > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Sep 28 18:07:41 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 28 Sep 2020 19:07:41 -0400 Subject: [petsc-users] using real and complex together In-Reply-To: References: <32DEC92E-93A5-4EE8-BF92-2468C1FE31B9@gmail.com> Message-ID: On Mon, Sep 28, 2020 at 6:29 PM Sam Guo wrote: > ? I think it would be much easier to just decompose your complex work > into real and imaginary parts and use PETSc with real scalars to compute > them separately. > Since you know your matrices have 0 imaginary part, this becomes very > straightforward.? > > I think this is what I am trying to do. But since I have to provide > matrix-vector operator(since I am using shell matrix), the Vec I receive is > complex. I need to convert complex vec to real one and then convert it > back(that?s the code I shown before). > No, this is not what I am advocating. Keep your vectors real, just keep one for the real part and one for the imaginary part. Then you can just call MatMult() twice with your matrix. Thanks, Matt > On Monday, September 28, 2020, Matthew Knepley wrote: > >> On Mon, Sep 28, 2020 at 5:01 PM Sam Guo wrote: >> >>> Hi Matt, >>> Since I use MUMPS as preconditioner, complex uses too much memory if >>> my input matrix is real. 
Ideally if I can compile real and complex into >>> different symbols (like MUMPS) , I can load both version without conflict. >>> >> >> What I mean to say is that it would be great if it were as simple as >> using two different symbols, but unfortunately the problem is more >> difficult. I was trying to use >> the example of templates. This would be a very intrusive change no matter >> what technology you are using. >> >> So your main memory usage is from the MUMPS factorization, and you cannot >> afford to double that usage? >> >> You could consider writing a version of AIJ that stores real entries, but >> allows complex vector values. It would promote to complex for the row dot >> product. >> However, you would also have to do the same work for all the solves you >> do with MUMPS. >> >> I think it would be much easier to just decompose your complex work into >> real and imaginary parts and use PETSc with real scalars to compute them >> separately. >> Since you know your matrices have 0 imaginary part, this becomes very >> straightforward. >> >> Thanks, >> >> Matt >> >> >>> Thanks, >>> Sam >>> >>> On Mon, Sep 28, 2020 at 12:52 PM Matthew Knepley >>> wrote: >>> >>>> On Mon, Sep 28, 2020 at 3:43 PM Sam Guo wrote: >>>> >>>>> Hi Stefano and PETSc dev team, >>>>> I want to try your suggestion to always load complex version of >>>>> PETSc but if my input matrix A is real, I want to create shell matrix to >>>>> matrix-vector and factorization using real only. >>>>> >>>> >>>> I do not think that will work as you expect. I will try to explain >>>> below. >>>> >>>> >>>>> I still need to understand how MatRealPart works. Does it just zero >>>>> out the image numerical values or does it delete the image memory? >>>>> >>>> >>>> When we have complex values, we use the "complex" type to allocate and >>>> store them. Thus you cannot talk about just the memory to store imaginary >>>> parts. >>>> MatRealPart sets the imaginary parts of all the matrix elements to zero. >>>> >>>> >>>>> If my input matrix A is real, how do I create a shell matrix to matrix >>>>> -vector multiplication y=A*x where A is real, PestcScalar = complex, x and >>>>> y are Vec? I notice there is a VecRealPart but it seems it just zeros the >>>>> image numerical values. It seems I still have to create a PetscReal >>>>> pointer to copy the real part of PetacScalar pointers like following. Can >>>>> you comment on it? >>>>> >>>> >>>> What you suggest would mean rewriting the matrix multiplication >>>> algorithm by hand after extracting the values. I am not sure if this >>>> is really what you want to do. Is the matrix memory really your >>>> limiting factor? Even if you tried to do this with templates, the memory >>>> from temporaries would be very hard to control. 
>>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Thanks, >>>>> Sam >>>>> >>>>> PetscScalar *px = nullptr; >>>>> VecGetArrayRead(x, &px); >>>>> PetscScalar *py = nullptr; >>>>> VecGetArray(y, &py); >>>>> int localSize = 0; >>>>> VecGetLocalSize(x, &localSize); >>>>> std::vector realX(localSize); // I am using c++ to call >>>>> PETSc >>>>> >>>>> //retrieve real part >>>>> for(int i = 0; i < localSize; i++) realX[i] = PetscRealPart(px[i]); >>>>> >>>>> // do real matrix-vector multiplication >>>>> // realY=A*realX >>>>> // here where realY is std::vector >>>>> >>>>> //put real part back to py >>>>> for(int i = 0; i < localSize; i++) pv[i] = realY[i]; >>>>> VecRestoreArray(y,&py); >>>>> >>>>> On Tue, May 26, 2020 at 1:49 PM Sam Guo wrote: >>>>> >>>>>> Thanks >>>>>> >>>>>> On Tuesday, May 26, 2020, Stefano Zampini >>>>>> wrote: >>>>>> >>>>>>> All the solvers/matrices/vectors works for PetscScalar types (i.e. >>>>>>> in your case complex) >>>>>>> If you need to solve for the real part only, you can duplicate the >>>>>>> matrix and call MatRealPart to zero out the imaginary part. But the solve >>>>>>> will always run in the complex space >>>>>>> You should not be worried about doubling the memory for a matrix >>>>>>> (i.e. real and imaginary part) >>>>>>> >>>>>>> >>>>>>> On May 26, 2020, at 11:28 PM, Sam Guo wrote: >>>>>>> >>>>>>> complex version is needed since matrix sometimes is real and >>>>>>> sometimes is complex. I want to solve real matrix without allocating memory >>>>>>> for imaginary part((except eigen pairs). >>>>>>> >>>>>>> On Tuesday, May 26, 2020, Zhang, Hong wrote: >>>>>>> >>>>>>>> You can build PETSc with complex version, and declare some >>>>>>>> variables as 'PETSC_REAL'. >>>>>>>> Hong >>>>>>>> >>>>>>>> ------------------------------ >>>>>>>> *From:* petsc-users on behalf of >>>>>>>> Sam Guo >>>>>>>> *Sent:* Tuesday, May 26, 2020 1:00 PM >>>>>>>> *To:* PETSc >>>>>>>> *Subject:* [petsc-users] using real and complex together >>>>>>>> >>>>>>>> Dear PETSc dev team, >>>>>>>> Can I use both real and complex versions together? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Sam >>>>>>>> >>>>>>> >>>>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Mon Sep 28 18:44:34 2020 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 28 Sep 2020 16:44:34 -0700 Subject: [petsc-users] using real and complex together In-Reply-To: References: <32DEC92E-93A5-4EE8-BF92-2468C1FE31B9@gmail.com> Message-ID: I want to make sure I understand you. I load real version of PETSc. If my input matrix is complex, just get real and imagine parts of PETSc Vec and do the matrix vector multiplication. Right? On Monday, September 28, 2020, Matthew Knepley wrote: > On Mon, Sep 28, 2020 at 6:29 PM Sam Guo wrote: > >> ? 
I think it would be much easier to just decompose your complex work >> into real and imaginary parts and use PETSc with real scalars to compute >> them separately. >> Since you know your matrices have 0 imaginary part, this becomes very >> straightforward.? >> >> I think this is what I am trying to do. But since I have to provide >> matrix-vector operator(since I am using shell matrix), the Vec I receive is >> complex. I need to convert complex vec to real one and then convert it >> back(that?s the code I shown before). >> > > No, this is not what I am advocating. Keep your vectors real, just keep > one for the real part and one for the imaginary part. Then you can just > call MatMult() twice with your matrix. > > Thanks, > > Matt > > >> On Monday, September 28, 2020, Matthew Knepley wrote: >> >>> On Mon, Sep 28, 2020 at 5:01 PM Sam Guo wrote: >>> >>>> Hi Matt, >>>> Since I use MUMPS as preconditioner, complex uses too much memory if >>>> my input matrix is real. Ideally if I can compile real and complex into >>>> different symbols (like MUMPS) , I can load both version without conflict. >>>> >>> >>> What I mean to say is that it would be great if it were as simple as >>> using two different symbols, but unfortunately the problem is more >>> difficult. I was trying to use >>> the example of templates. This would be a very intrusive change no >>> matter what technology you are using. >>> >>> So your main memory usage is from the MUMPS factorization, and you >>> cannot afford to double that usage? >>> >>> You could consider writing a version of AIJ that stores real entries, >>> but allows complex vector values. It would promote to complex for the row >>> dot product. >>> However, you would also have to do the same work for all the solves you >>> do with MUMPS. >>> >>> I think it would be much easier to just decompose your complex work into >>> real and imaginary parts and use PETSc with real scalars to compute them >>> separately. >>> Since you know your matrices have 0 imaginary part, this becomes very >>> straightforward. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thanks, >>>> Sam >>>> >>>> On Mon, Sep 28, 2020 at 12:52 PM Matthew Knepley >>>> wrote: >>>> >>>>> On Mon, Sep 28, 2020 at 3:43 PM Sam Guo wrote: >>>>> >>>>>> Hi Stefano and PETSc dev team, >>>>>> I want to try your suggestion to always load complex version of >>>>>> PETSc but if my input matrix A is real, I want to create shell matrix to >>>>>> matrix-vector and factorization using real only. >>>>>> >>>>> >>>>> I do not think that will work as you expect. I will try to explain >>>>> below. >>>>> >>>>> >>>>>> I still need to understand how MatRealPart works. Does it just >>>>>> zero out the image numerical values or does it delete the image memory? >>>>>> >>>>> >>>>> When we have complex values, we use the "complex" type to allocate and >>>>> store them. Thus you cannot talk about just the memory to store imaginary >>>>> parts. >>>>> MatRealPart sets the imaginary parts of all the matrix elements to >>>>> zero. >>>>> >>>>> >>>>>> If my input matrix A is real, how do I create a shell matrix to >>>>>> matrix -vector multiplication y=A*x where A is real, PestcScalar = complex, >>>>>> x and y are Vec? I notice there is a VecRealPart but it seems it just zeros >>>>>> the image numerical values. It seems I still have to create a PetscReal >>>>>> pointer to copy the real part of PetacScalar pointers like following. Can >>>>>> you comment on it? 
>>>>>> >>>>> >>>>> What you suggest would mean rewriting the matrix multiplication >>>>> algorithm by hand after extracting the values. I am not sure if this >>>>> is really what you want to do. Is the matrix memory really your >>>>> limiting factor? Even if you tried to do this with templates, the memory >>>>> from temporaries would be very hard to control. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> Thanks, >>>>>> Sam >>>>>> >>>>>> PetscScalar *px = nullptr; >>>>>> VecGetArrayRead(x, &px); >>>>>> PetscScalar *py = nullptr; >>>>>> VecGetArray(y, &py); >>>>>> int localSize = 0; >>>>>> VecGetLocalSize(x, &localSize); >>>>>> std::vector realX(localSize); // I am using c++ to call >>>>>> PETSc >>>>>> >>>>>> //retrieve real part >>>>>> for(int i = 0; i < localSize; i++) realX[i] = PetscRealPart(px[i]); >>>>>> >>>>>> // do real matrix-vector multiplication >>>>>> // realY=A*realX >>>>>> // here where realY is std::vector >>>>>> >>>>>> //put real part back to py >>>>>> for(int i = 0; i < localSize; i++) pv[i] = realY[i]; >>>>>> VecRestoreArray(y,&py); >>>>>> >>>>>> On Tue, May 26, 2020 at 1:49 PM Sam Guo >>>>>> wrote: >>>>>> >>>>>>> Thanks >>>>>>> >>>>>>> On Tuesday, May 26, 2020, Stefano Zampini >>>>>>> wrote: >>>>>>> >>>>>>>> All the solvers/matrices/vectors works for PetscScalar types (i.e. >>>>>>>> in your case complex) >>>>>>>> If you need to solve for the real part only, you can duplicate the >>>>>>>> matrix and call MatRealPart to zero out the imaginary part. But the solve >>>>>>>> will always run in the complex space >>>>>>>> You should not be worried about doubling the memory for a matrix >>>>>>>> (i.e. real and imaginary part) >>>>>>>> >>>>>>>> >>>>>>>> On May 26, 2020, at 11:28 PM, Sam Guo >>>>>>>> wrote: >>>>>>>> >>>>>>>> complex version is needed since matrix sometimes is real and >>>>>>>> sometimes is complex. I want to solve real matrix without allocating memory >>>>>>>> for imaginary part((except eigen pairs). >>>>>>>> >>>>>>>> On Tuesday, May 26, 2020, Zhang, Hong wrote: >>>>>>>> >>>>>>>>> You can build PETSc with complex version, and declare some >>>>>>>>> variables as 'PETSC_REAL'. >>>>>>>>> Hong >>>>>>>>> >>>>>>>>> ------------------------------ >>>>>>>>> *From:* petsc-users on behalf >>>>>>>>> of Sam Guo >>>>>>>>> *Sent:* Tuesday, May 26, 2020 1:00 PM >>>>>>>>> *To:* PETSc >>>>>>>>> *Subject:* [petsc-users] using real and complex together >>>>>>>>> >>>>>>>>> Dear PETSc dev team, >>>>>>>>> Can I use both real and complex versions together? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Sam >>>>>>>>> >>>>>>>> >>>>>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... 
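[Editorial sketch, not part of the original thread] The approach Matt describes above, keeping the real and imaginary parts in two separate real Vecs and applying the real matrix to each with MatMult(), avoids copying between complex and real arrays entirely. Below is a minimal, self-contained sketch assuming a PETSc build configured with real scalars; the matrix and vector names (A, xr, xi, yr, yi) are illustrative and not taken from the thread. (As an aside, the snippet quoted above writes to pv where py is intended, and the std::vector template argument, presumably PetscReal, appears to have been stripped by the HTML-to-text conversion.)

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A;
  Vec            xr, xi, yr, yi;
  PetscInt       i, n = 4;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  /* Toy real matrix (diagonal) just so the sketch is self-contained */
  ierr = MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n, 1, NULL, 0, NULL, &A);CHKERRQ(ierr);
  for (i = 0; i < n; i++) { ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr); }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* x = xr + i*xi is carried as two REAL Vecs; no complex scalars anywhere */
  ierr = MatCreateVecs(A, &xr, &yr);CHKERRQ(ierr);
  ierr = VecDuplicate(xr, &xi);CHKERRQ(ierr);
  ierr = VecDuplicate(yr, &yi);CHKERRQ(ierr);
  ierr = VecSet(xr, 1.0);CHKERRQ(ierr);   /* real part of x      */
  ierr = VecSet(xi, 0.5);CHKERRQ(ierr);   /* imaginary part of x */

  /* Since A is real, y = A*x splits into yr = A*xr and yi = A*xi:
     two real MatMults replace one complex MatMult */
  ierr = MatMult(A, xr, yr);CHKERRQ(ierr);
  ierr = MatMult(A, xi, yi);CHKERRQ(ierr);

  ierr = VecDestroy(&xr);CHKERRQ(ierr); ierr = VecDestroy(&xi);CHKERRQ(ierr);
  ierr = VecDestroy(&yr);CHKERRQ(ierr); ierr = VecDestroy(&yi);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

The same splitting carries over to the solve: since A has zero imaginary part, x = A^{-1}*br + i*A^{-1}*bi, so one real factorization (e.g. with MUMPS) serves both right-hand sides, which is what keeps the memory footprint at the real-matrix level.
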
URL: From knepley at gmail.com Mon Sep 28 19:01:17 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 28 Sep 2020 20:01:17 -0400 Subject: [petsc-users] using real and complex together In-Reply-To: References: <32DEC92E-93A5-4EE8-BF92-2468C1FE31B9@gmail.com> Message-ID: On Mon, Sep 28, 2020 at 7:44 PM Sam Guo wrote: > I want to make sure I understand you. I load real version of PETSc. If my > input matrix is complex, You said that your matrix was real. > just get real and imagine parts of PETSc Vec No, the PETSc Vec would be real. You would have two vectors. Thanks, Matt > and do the matrix vector multiplication. Right? > > On Monday, September 28, 2020, Matthew Knepley wrote: > >> On Mon, Sep 28, 2020 at 6:29 PM Sam Guo wrote: >> >>> ? I think it would be much easier to just decompose your complex work >>> into real and imaginary parts and use PETSc with real scalars to compute >>> them separately. >>> Since you know your matrices have 0 imaginary part, this becomes very >>> straightforward.? >>> >>> I think this is what I am trying to do. But since I have to provide >>> matrix-vector operator(since I am using shell matrix), the Vec I receive is >>> complex. I need to convert complex vec to real one and then convert it >>> back(that?s the code I shown before). >>> >> >> No, this is not what I am advocating. Keep your vectors real, just keep >> one for the real part and one for the imaginary part. Then you can just >> call MatMult() twice with your matrix. >> >> Thanks, >> >> Matt >> >> >>> On Monday, September 28, 2020, Matthew Knepley >>> wrote: >>> >>>> On Mon, Sep 28, 2020 at 5:01 PM Sam Guo wrote: >>>> >>>>> Hi Matt, >>>>> Since I use MUMPS as preconditioner, complex uses too much memory >>>>> if my input matrix is real. Ideally if I can compile real and complex into >>>>> different symbols (like MUMPS) , I can load both version without conflict. >>>>> >>>> >>>> What I mean to say is that it would be great if it were as simple as >>>> using two different symbols, but unfortunately the problem is more >>>> difficult. I was trying to use >>>> the example of templates. This would be a very intrusive change no >>>> matter what technology you are using. >>>> >>>> So your main memory usage is from the MUMPS factorization, and you >>>> cannot afford to double that usage? >>>> >>>> You could consider writing a version of AIJ that stores real entries, >>>> but allows complex vector values. It would promote to complex for the row >>>> dot product. >>>> However, you would also have to do the same work for all the solves you >>>> do with MUMPS. >>>> >>>> I think it would be much easier to just decompose your complex work >>>> into real and imaginary parts and use PETSc with real scalars to compute >>>> them separately. >>>> Since you know your matrices have 0 imaginary part, this becomes very >>>> straightforward. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Thanks, >>>>> Sam >>>>> >>>>> On Mon, Sep 28, 2020 at 12:52 PM Matthew Knepley >>>>> wrote: >>>>> >>>>>> On Mon, Sep 28, 2020 at 3:43 PM Sam Guo >>>>>> wrote: >>>>>> >>>>>>> Hi Stefano and PETSc dev team, >>>>>>> I want to try your suggestion to always load complex version of >>>>>>> PETSc but if my input matrix A is real, I want to create shell matrix to >>>>>>> matrix-vector and factorization using real only. >>>>>>> >>>>>> >>>>>> I do not think that will work as you expect. I will try to explain >>>>>> below. >>>>>> >>>>>> >>>>>>> I still need to understand how MatRealPart works. 
Does it just >>>>>>> zero out the image numerical values or does it delete the image memory? >>>>>>> >>>>>> >>>>>> When we have complex values, we use the "complex" type to allocate >>>>>> and store them. Thus you cannot talk about just the memory to store >>>>>> imaginary parts. >>>>>> MatRealPart sets the imaginary parts of all the matrix elements to >>>>>> zero. >>>>>> >>>>>> >>>>>>> If my input matrix A is real, how do I create a shell matrix to >>>>>>> matrix -vector multiplication y=A*x where A is real, PestcScalar = complex, >>>>>>> x and y are Vec? I notice there is a VecRealPart but it seems it just zeros >>>>>>> the image numerical values. It seems I still have to create a PetscReal >>>>>>> pointer to copy the real part of PetacScalar pointers like following. Can >>>>>>> you comment on it? >>>>>>> >>>>>> >>>>>> What you suggest would mean rewriting the matrix multiplication >>>>>> algorithm by hand after extracting the values. I am not sure if this >>>>>> is really what you want to do. Is the matrix memory really your >>>>>> limiting factor? Even if you tried to do this with templates, the memory >>>>>> from temporaries would be very hard to control. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> Thanks, >>>>>>> Sam >>>>>>> >>>>>>> PetscScalar *px = nullptr; >>>>>>> VecGetArrayRead(x, &px); >>>>>>> PetscScalar *py = nullptr; >>>>>>> VecGetArray(y, &py); >>>>>>> int localSize = 0; >>>>>>> VecGetLocalSize(x, &localSize); >>>>>>> std::vector realX(localSize); // I am using c++ to call >>>>>>> PETSc >>>>>>> >>>>>>> //retrieve real part >>>>>>> for(int i = 0; i < localSize; i++) realX[i] = PetscRealPart(px[i]); >>>>>>> >>>>>>> // do real matrix-vector multiplication >>>>>>> // realY=A*realX >>>>>>> // here where realY is std::vector >>>>>>> >>>>>>> //put real part back to py >>>>>>> for(int i = 0; i < localSize; i++) pv[i] = realY[i]; >>>>>>> VecRestoreArray(y,&py); >>>>>>> >>>>>>> On Tue, May 26, 2020 at 1:49 PM Sam Guo >>>>>>> wrote: >>>>>>> >>>>>>>> Thanks >>>>>>>> >>>>>>>> On Tuesday, May 26, 2020, Stefano Zampini < >>>>>>>> stefano.zampini at gmail.com> wrote: >>>>>>>> >>>>>>>>> All the solvers/matrices/vectors works for PetscScalar types (i.e. >>>>>>>>> in your case complex) >>>>>>>>> If you need to solve for the real part only, you can duplicate the >>>>>>>>> matrix and call MatRealPart to zero out the imaginary part. But the solve >>>>>>>>> will always run in the complex space >>>>>>>>> You should not be worried about doubling the memory for a matrix >>>>>>>>> (i.e. real and imaginary part) >>>>>>>>> >>>>>>>>> >>>>>>>>> On May 26, 2020, at 11:28 PM, Sam Guo >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> complex version is needed since matrix sometimes is real and >>>>>>>>> sometimes is complex. I want to solve real matrix without allocating memory >>>>>>>>> for imaginary part((except eigen pairs). >>>>>>>>> >>>>>>>>> On Tuesday, May 26, 2020, Zhang, Hong wrote: >>>>>>>>> >>>>>>>>>> You can build PETSc with complex version, and declare some >>>>>>>>>> variables as 'PETSC_REAL'. >>>>>>>>>> Hong >>>>>>>>>> >>>>>>>>>> ------------------------------ >>>>>>>>>> *From:* petsc-users on behalf >>>>>>>>>> of Sam Guo >>>>>>>>>> *Sent:* Tuesday, May 26, 2020 1:00 PM >>>>>>>>>> *To:* PETSc >>>>>>>>>> *Subject:* [petsc-users] using real and complex together >>>>>>>>>> >>>>>>>>>> Dear PETSc dev team, >>>>>>>>>> Can I use both real and complex versions together? 
>>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Sam >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Mon Sep 28 19:06:22 2020 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 28 Sep 2020 17:06:22 -0700 Subject: [petsc-users] using real and complex together In-Reply-To: References: <32DEC92E-93A5-4EE8-BF92-2468C1FE31B9@gmail.com> Message-ID: If I load complex version of PETSc, how can Vec be real? On Monday, September 28, 2020, Matthew Knepley wrote: > On Mon, Sep 28, 2020 at 7:44 PM Sam Guo wrote: > >> I want to make sure I understand you. I load real version of PETSc. If >> my input matrix is complex, > > > You said that your matrix was real. > > >> just get real and imagine parts of PETSc Vec > > > No, the PETSc Vec would be real. You would have two vectors. > > Thanks, > > Matt > > >> and do the matrix vector multiplication. Right? >> >> On Monday, September 28, 2020, Matthew Knepley wrote: >> >>> On Mon, Sep 28, 2020 at 6:29 PM Sam Guo wrote: >>> >>>> ? I think it would be much easier to just decompose your complex work >>>> into real and imaginary parts and use PETSc with real scalars to compute >>>> them separately. >>>> Since you know your matrices have 0 imaginary part, this becomes very >>>> straightforward.? >>>> >>>> I think this is what I am trying to do. But since I have to provide >>>> matrix-vector operator(since I am using shell matrix), the Vec I receive is >>>> complex. I need to convert complex vec to real one and then convert it >>>> back(that?s the code I shown before). >>>> >>> >>> No, this is not what I am advocating. Keep your vectors real, just keep >>> one for the real part and one for the imaginary part. Then you can just >>> call MatMult() twice with your matrix. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> On Monday, September 28, 2020, Matthew Knepley >>>> wrote: >>>> >>>>> On Mon, Sep 28, 2020 at 5:01 PM Sam Guo wrote: >>>>> >>>>>> Hi Matt, >>>>>> Since I use MUMPS as preconditioner, complex uses too much memory >>>>>> if my input matrix is real. Ideally if I can compile real and complex into >>>>>> different symbols (like MUMPS) , I can load both version without conflict. >>>>>> >>>>> >>>>> What I mean to say is that it would be great if it were as simple as >>>>> using two different symbols, but unfortunately the problem is more >>>>> difficult. I was trying to use >>>>> the example of templates. 
This would be a very intrusive change no >>>>> matter what technology you are using. >>>>> >>>>> So your main memory usage is from the MUMPS factorization, and you >>>>> cannot afford to double that usage? >>>>> >>>>> You could consider writing a version of AIJ that stores real entries, >>>>> but allows complex vector values. It would promote to complex for the row >>>>> dot product. >>>>> However, you would also have to do the same work for all the solves >>>>> you do with MUMPS. >>>>> >>>>> I think it would be much easier to just decompose your complex work >>>>> into real and imaginary parts and use PETSc with real scalars to compute >>>>> them separately. >>>>> Since you know your matrices have 0 imaginary part, this becomes very >>>>> straightforward. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> Thanks, >>>>>> Sam >>>>>> >>>>>> On Mon, Sep 28, 2020 at 12:52 PM Matthew Knepley >>>>>> wrote: >>>>>> >>>>>>> On Mon, Sep 28, 2020 at 3:43 PM Sam Guo >>>>>>> wrote: >>>>>>> >>>>>>>> Hi Stefano and PETSc dev team, >>>>>>>> I want to try your suggestion to always load complex version of >>>>>>>> PETSc but if my input matrix A is real, I want to create shell matrix to >>>>>>>> matrix-vector and factorization using real only. >>>>>>>> >>>>>>> >>>>>>> I do not think that will work as you expect. I will try to explain >>>>>>> below. >>>>>>> >>>>>>> >>>>>>>> I still need to understand how MatRealPart works. Does it just >>>>>>>> zero out the image numerical values or does it delete the image memory? >>>>>>>> >>>>>>> >>>>>>> When we have complex values, we use the "complex" type to allocate >>>>>>> and store them. Thus you cannot talk about just the memory to store >>>>>>> imaginary parts. >>>>>>> MatRealPart sets the imaginary parts of all the matrix elements to >>>>>>> zero. >>>>>>> >>>>>>> >>>>>>>> If my input matrix A is real, how do I create a shell matrix to >>>>>>>> matrix -vector multiplication y=A*x where A is real, PestcScalar = complex, >>>>>>>> x and y are Vec? I notice there is a VecRealPart but it seems it just zeros >>>>>>>> the image numerical values. It seems I still have to create a PetscReal >>>>>>>> pointer to copy the real part of PetacScalar pointers like following. Can >>>>>>>> you comment on it? >>>>>>>> >>>>>>> >>>>>>> What you suggest would mean rewriting the matrix multiplication >>>>>>> algorithm by hand after extracting the values. I am not sure if this >>>>>>> is really what you want to do. Is the matrix memory really your >>>>>>> limiting factor? Even if you tried to do this with templates, the memory >>>>>>> from temporaries would be very hard to control. 
>>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Matt >>>>>>> >>>>>>> >>>>>>>> Thanks, >>>>>>>> Sam >>>>>>>> >>>>>>>> PetscScalar *px = nullptr; >>>>>>>> VecGetArrayRead(x, &px); >>>>>>>> PetscScalar *py = nullptr; >>>>>>>> VecGetArray(y, &py); >>>>>>>> int localSize = 0; >>>>>>>> VecGetLocalSize(x, &localSize); >>>>>>>> std::vector realX(localSize); // I am using c++ to call >>>>>>>> PETSc >>>>>>>> >>>>>>>> //retrieve real part >>>>>>>> for(int i = 0; i < localSize; i++) realX[i] = PetscRealPart(px[i]); >>>>>>>> >>>>>>>> // do real matrix-vector multiplication >>>>>>>> // realY=A*realX >>>>>>>> // here where realY is std::vector >>>>>>>> >>>>>>>> //put real part back to py >>>>>>>> for(int i = 0; i < localSize; i++) pv[i] = realY[i]; >>>>>>>> VecRestoreArray(y,&py); >>>>>>>> >>>>>>>> On Tue, May 26, 2020 at 1:49 PM Sam Guo >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Thanks >>>>>>>>> >>>>>>>>> On Tuesday, May 26, 2020, Stefano Zampini < >>>>>>>>> stefano.zampini at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> All the solvers/matrices/vectors works for PetscScalar types >>>>>>>>>> (i.e. in your case complex) >>>>>>>>>> If you need to solve for the real part only, you can duplicate >>>>>>>>>> the matrix and call MatRealPart to zero out the imaginary part. But the >>>>>>>>>> solve will always run in the complex space >>>>>>>>>> You should not be worried about doubling the memory for a matrix >>>>>>>>>> (i.e. real and imaginary part) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On May 26, 2020, at 11:28 PM, Sam Guo >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> complex version is needed since matrix sometimes is real and >>>>>>>>>> sometimes is complex. I want to solve real matrix without allocating memory >>>>>>>>>> for imaginary part((except eigen pairs). >>>>>>>>>> >>>>>>>>>> On Tuesday, May 26, 2020, Zhang, Hong wrote: >>>>>>>>>> >>>>>>>>>>> You can build PETSc with complex version, and declare some >>>>>>>>>>> variables as 'PETSC_REAL'. >>>>>>>>>>> Hong >>>>>>>>>>> >>>>>>>>>>> ------------------------------ >>>>>>>>>>> *From:* petsc-users on behalf >>>>>>>>>>> of Sam Guo >>>>>>>>>>> *Sent:* Tuesday, May 26, 2020 1:00 PM >>>>>>>>>>> *To:* PETSc >>>>>>>>>>> *Subject:* [petsc-users] using real and complex together >>>>>>>>>>> >>>>>>>>>>> Dear PETSc dev team, >>>>>>>>>>> Can I use both real and complex versions together? >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Sam >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their >>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>> experiments lead. >>>>>>> -- Norbert Wiener >>>>>>> >>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
> -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Sep 28 19:10:20 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 28 Sep 2020 20:10:20 -0400 Subject: [petsc-users] using real and complex together In-Reply-To: References: <32DEC92E-93A5-4EE8-BF92-2468C1FE31B9@gmail.com> Message-ID: On Mon, Sep 28, 2020 at 8:06 PM Sam Guo wrote: > If I load complex version of PETSc, how can Vec be real? > I was saying to "use PETSc with real scalars" Thanks, Matt > On Monday, September 28, 2020, Matthew Knepley wrote: > >> On Mon, Sep 28, 2020 at 7:44 PM Sam Guo wrote: >> >>> I want to make sure I understand you. I load real version of PETSc. If >>> my input matrix is complex, >> >> >> You said that your matrix was real. >> >> >>> just get real and imagine parts of PETSc Vec >> >> >> No, the PETSc Vec would be real. You would have two vectors. >> >> Thanks, >> >> Matt >> >> >>> and do the matrix vector multiplication. Right? >>> >>> On Monday, September 28, 2020, Matthew Knepley >>> wrote: >>> >>>> On Mon, Sep 28, 2020 at 6:29 PM Sam Guo wrote: >>>> >>>>> ? I think it would be much easier to just decompose your complex work >>>>> into real and imaginary parts and use PETSc with real scalars to compute >>>>> them separately. >>>>> Since you know your matrices have 0 imaginary part, this becomes very >>>>> straightforward.? >>>>> >>>>> I think this is what I am trying to do. But since I have to provide >>>>> matrix-vector operator(since I am using shell matrix), the Vec I receive is >>>>> complex. I need to convert complex vec to real one and then convert it >>>>> back(that?s the code I shown before). >>>>> >>>> >>>> No, this is not what I am advocating. Keep your vectors real, just keep >>>> one for the real part and one for the imaginary part. Then you can just >>>> call MatMult() twice with your matrix. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> On Monday, September 28, 2020, Matthew Knepley >>>>> wrote: >>>>> >>>>>> On Mon, Sep 28, 2020 at 5:01 PM Sam Guo >>>>>> wrote: >>>>>> >>>>>>> Hi Matt, >>>>>>> Since I use MUMPS as preconditioner, complex uses too much memory >>>>>>> if my input matrix is real. Ideally if I can compile real and complex into >>>>>>> different symbols (like MUMPS) , I can load both version without conflict. >>>>>>> >>>>>> >>>>>> What I mean to say is that it would be great if it were as simple as >>>>>> using two different symbols, but unfortunately the problem is more >>>>>> difficult. I was trying to use >>>>>> the example of templates. This would be a very intrusive change no >>>>>> matter what technology you are using. >>>>>> >>>>>> So your main memory usage is from the MUMPS factorization, and you >>>>>> cannot afford to double that usage? >>>>>> >>>>>> You could consider writing a version of AIJ that stores real entries, >>>>>> but allows complex vector values. It would promote to complex for the row >>>>>> dot product. >>>>>> However, you would also have to do the same work for all the solves >>>>>> you do with MUMPS. >>>>>> >>>>>> I think it would be much easier to just decompose your complex work >>>>>> into real and imaginary parts and use PETSc with real scalars to compute >>>>>> them separately. >>>>>> Since you know your matrices have 0 imaginary part, this becomes very >>>>>> straightforward. 
>>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> Thanks, >>>>>>> Sam >>>>>>> >>>>>>> On Mon, Sep 28, 2020 at 12:52 PM Matthew Knepley >>>>>>> wrote: >>>>>>> >>>>>>>> On Mon, Sep 28, 2020 at 3:43 PM Sam Guo >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hi Stefano and PETSc dev team, >>>>>>>>> I want to try your suggestion to always load complex version of >>>>>>>>> PETSc but if my input matrix A is real, I want to create shell matrix to >>>>>>>>> matrix-vector and factorization using real only. >>>>>>>>> >>>>>>>> >>>>>>>> I do not think that will work as you expect. I will try to explain >>>>>>>> below. >>>>>>>> >>>>>>>> >>>>>>>>> I still need to understand how MatRealPart works. Does it just >>>>>>>>> zero out the image numerical values or does it delete the image memory? >>>>>>>>> >>>>>>>> >>>>>>>> When we have complex values, we use the "complex" type to allocate >>>>>>>> and store them. Thus you cannot talk about just the memory to store >>>>>>>> imaginary parts. >>>>>>>> MatRealPart sets the imaginary parts of all the matrix elements to >>>>>>>> zero. >>>>>>>> >>>>>>>> >>>>>>>>> If my input matrix A is real, how do I create a shell matrix to >>>>>>>>> matrix -vector multiplication y=A*x where A is real, PestcScalar = complex, >>>>>>>>> x and y are Vec? I notice there is a VecRealPart but it seems it just zeros >>>>>>>>> the image numerical values. It seems I still have to create a PetscReal >>>>>>>>> pointer to copy the real part of PetacScalar pointers like following. Can >>>>>>>>> you comment on it? >>>>>>>>> >>>>>>>> >>>>>>>> What you suggest would mean rewriting the matrix multiplication >>>>>>>> algorithm by hand after extracting the values. I am not sure if this >>>>>>>> is really what you want to do. Is the matrix memory really your >>>>>>>> limiting factor? Even if you tried to do this with templates, the memory >>>>>>>> from temporaries would be very hard to control. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Matt >>>>>>>> >>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Sam >>>>>>>>> >>>>>>>>> PetscScalar *px = nullptr; >>>>>>>>> VecGetArrayRead(x, &px); >>>>>>>>> PetscScalar *py = nullptr; >>>>>>>>> VecGetArray(y, &py); >>>>>>>>> int localSize = 0; >>>>>>>>> VecGetLocalSize(x, &localSize); >>>>>>>>> std::vector realX(localSize); // I am using c++ to call >>>>>>>>> PETSc >>>>>>>>> >>>>>>>>> //retrieve real part >>>>>>>>> for(int i = 0; i < localSize; i++) realX[i] = >>>>>>>>> PetscRealPart(px[i]); >>>>>>>>> >>>>>>>>> // do real matrix-vector multiplication >>>>>>>>> // realY=A*realX >>>>>>>>> // here where realY is std::vector >>>>>>>>> >>>>>>>>> //put real part back to py >>>>>>>>> for(int i = 0; i < localSize; i++) pv[i] = realY[i]; >>>>>>>>> VecRestoreArray(y,&py); >>>>>>>>> >>>>>>>>> On Tue, May 26, 2020 at 1:49 PM Sam Guo >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Thanks >>>>>>>>>> >>>>>>>>>> On Tuesday, May 26, 2020, Stefano Zampini < >>>>>>>>>> stefano.zampini at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> All the solvers/matrices/vectors works for PetscScalar types >>>>>>>>>>> (i.e. in your case complex) >>>>>>>>>>> If you need to solve for the real part only, you can duplicate >>>>>>>>>>> the matrix and call MatRealPart to zero out the imaginary part. But the >>>>>>>>>>> solve will always run in the complex space >>>>>>>>>>> You should not be worried about doubling the memory for a matrix >>>>>>>>>>> (i.e. 
real and imaginary part) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On May 26, 2020, at 11:28 PM, Sam Guo >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> complex version is needed since matrix sometimes is real and >>>>>>>>>>> sometimes is complex. I want to solve real matrix without allocating memory >>>>>>>>>>> for imaginary part((except eigen pairs). >>>>>>>>>>> >>>>>>>>>>> On Tuesday, May 26, 2020, Zhang, Hong >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> You can build PETSc with complex version, and declare some >>>>>>>>>>>> variables as 'PETSC_REAL'. >>>>>>>>>>>> Hong >>>>>>>>>>>> >>>>>>>>>>>> ------------------------------ >>>>>>>>>>>> *From:* petsc-users on >>>>>>>>>>>> behalf of Sam Guo >>>>>>>>>>>> *Sent:* Tuesday, May 26, 2020 1:00 PM >>>>>>>>>>>> *To:* PETSc >>>>>>>>>>>> *Subject:* [petsc-users] using real and complex together >>>>>>>>>>>> >>>>>>>>>>>> Dear PETSc dev team, >>>>>>>>>>>> Can I use both real and complex versions together? >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Sam >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> What most experimenters take for granted before they begin their >>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>> experiments lead. >>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Mon Sep 28 19:12:47 2020 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 28 Sep 2020 17:12:47 -0700 Subject: [petsc-users] using real and complex together In-Reply-To: References: <32DEC92E-93A5-4EE8-BF92-2468C1FE31B9@gmail.com> Message-ID: All I want is to solve both real and complex matrix using either real or complex version of PETSc. If I load complex version of PETSc, I waste memory solving real matrix. If I can solve complex matrix using real version of PETSc, I will accept it. On Monday, September 28, 2020, Sam Guo wrote: > If I load complex version of PETSc, how can Vec be real? > > On Monday, September 28, 2020, Matthew Knepley wrote: > >> On Mon, Sep 28, 2020 at 7:44 PM Sam Guo wrote: >> >>> I want to make sure I understand you. I load real version of PETSc. If >>> my input matrix is complex, >> >> >> You said that your matrix was real. >> >> >>> just get real and imagine parts of PETSc Vec >> >> >> No, the PETSc Vec would be real. You would have two vectors. >> >> Thanks, >> >> Matt >> >> >>> and do the matrix vector multiplication. 
Right? >>> >>> On Monday, September 28, 2020, Matthew Knepley >>> wrote: >>> >>>> On Mon, Sep 28, 2020 at 6:29 PM Sam Guo wrote: >>>> >>>>> ? I think it would be much easier to just decompose your complex work >>>>> into real and imaginary parts and use PETSc with real scalars to compute >>>>> them separately. >>>>> Since you know your matrices have 0 imaginary part, this becomes very >>>>> straightforward.? >>>>> >>>>> I think this is what I am trying to do. But since I have to provide >>>>> matrix-vector operator(since I am using shell matrix), the Vec I receive is >>>>> complex. I need to convert complex vec to real one and then convert it >>>>> back(that?s the code I shown before). >>>>> >>>> >>>> No, this is not what I am advocating. Keep your vectors real, just keep >>>> one for the real part and one for the imaginary part. Then you can just >>>> call MatMult() twice with your matrix. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> On Monday, September 28, 2020, Matthew Knepley >>>>> wrote: >>>>> >>>>>> On Mon, Sep 28, 2020 at 5:01 PM Sam Guo >>>>>> wrote: >>>>>> >>>>>>> Hi Matt, >>>>>>> Since I use MUMPS as preconditioner, complex uses too much memory >>>>>>> if my input matrix is real. Ideally if I can compile real and complex into >>>>>>> different symbols (like MUMPS) , I can load both version without conflict. >>>>>>> >>>>>> >>>>>> What I mean to say is that it would be great if it were as simple as >>>>>> using two different symbols, but unfortunately the problem is more >>>>>> difficult. I was trying to use >>>>>> the example of templates. This would be a very intrusive change no >>>>>> matter what technology you are using. >>>>>> >>>>>> So your main memory usage is from the MUMPS factorization, and you >>>>>> cannot afford to double that usage? >>>>>> >>>>>> You could consider writing a version of AIJ that stores real entries, >>>>>> but allows complex vector values. It would promote to complex for the row >>>>>> dot product. >>>>>> However, you would also have to do the same work for all the solves >>>>>> you do with MUMPS. >>>>>> >>>>>> I think it would be much easier to just decompose your complex work >>>>>> into real and imaginary parts and use PETSc with real scalars to compute >>>>>> them separately. >>>>>> Since you know your matrices have 0 imaginary part, this becomes very >>>>>> straightforward. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> Thanks, >>>>>>> Sam >>>>>>> >>>>>>> On Mon, Sep 28, 2020 at 12:52 PM Matthew Knepley >>>>>>> wrote: >>>>>>> >>>>>>>> On Mon, Sep 28, 2020 at 3:43 PM Sam Guo >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hi Stefano and PETSc dev team, >>>>>>>>> I want to try your suggestion to always load complex version of >>>>>>>>> PETSc but if my input matrix A is real, I want to create shell matrix to >>>>>>>>> matrix-vector and factorization using real only. >>>>>>>>> >>>>>>>> >>>>>>>> I do not think that will work as you expect. I will try to explain >>>>>>>> below. >>>>>>>> >>>>>>>> >>>>>>>>> I still need to understand how MatRealPart works. Does it just >>>>>>>>> zero out the image numerical values or does it delete the image memory? >>>>>>>>> >>>>>>>> >>>>>>>> When we have complex values, we use the "complex" type to allocate >>>>>>>> and store them. Thus you cannot talk about just the memory to store >>>>>>>> imaginary parts. >>>>>>>> MatRealPart sets the imaginary parts of all the matrix elements to >>>>>>>> zero. 
>>>>>>>> >>>>>>>> >>>>>>>>> If my input matrix A is real, how do I create a shell matrix to >>>>>>>>> matrix -vector multiplication y=A*x where A is real, PestcScalar = complex, >>>>>>>>> x and y are Vec? I notice there is a VecRealPart but it seems it just zeros >>>>>>>>> the image numerical values. It seems I still have to create a PetscReal >>>>>>>>> pointer to copy the real part of PetacScalar pointers like following. Can >>>>>>>>> you comment on it? >>>>>>>>> >>>>>>>> >>>>>>>> What you suggest would mean rewriting the matrix multiplication >>>>>>>> algorithm by hand after extracting the values. I am not sure if this >>>>>>>> is really what you want to do. Is the matrix memory really your >>>>>>>> limiting factor? Even if you tried to do this with templates, the memory >>>>>>>> from temporaries would be very hard to control. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Matt >>>>>>>> >>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Sam >>>>>>>>> >>>>>>>>> PetscScalar *px = nullptr; >>>>>>>>> VecGetArrayRead(x, &px); >>>>>>>>> PetscScalar *py = nullptr; >>>>>>>>> VecGetArray(y, &py); >>>>>>>>> int localSize = 0; >>>>>>>>> VecGetLocalSize(x, &localSize); >>>>>>>>> std::vector realX(localSize); // I am using c++ to call >>>>>>>>> PETSc >>>>>>>>> >>>>>>>>> //retrieve real part >>>>>>>>> for(int i = 0; i < localSize; i++) realX[i] = >>>>>>>>> PetscRealPart(px[i]); >>>>>>>>> >>>>>>>>> // do real matrix-vector multiplication >>>>>>>>> // realY=A*realX >>>>>>>>> // here where realY is std::vector >>>>>>>>> >>>>>>>>> //put real part back to py >>>>>>>>> for(int i = 0; i < localSize; i++) pv[i] = realY[i]; >>>>>>>>> VecRestoreArray(y,&py); >>>>>>>>> >>>>>>>>> On Tue, May 26, 2020 at 1:49 PM Sam Guo >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Thanks >>>>>>>>>> >>>>>>>>>> On Tuesday, May 26, 2020, Stefano Zampini < >>>>>>>>>> stefano.zampini at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> All the solvers/matrices/vectors works for PetscScalar types >>>>>>>>>>> (i.e. in your case complex) >>>>>>>>>>> If you need to solve for the real part only, you can duplicate >>>>>>>>>>> the matrix and call MatRealPart to zero out the imaginary part. But the >>>>>>>>>>> solve will always run in the complex space >>>>>>>>>>> You should not be worried about doubling the memory for a matrix >>>>>>>>>>> (i.e. real and imaginary part) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On May 26, 2020, at 11:28 PM, Sam Guo >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> complex version is needed since matrix sometimes is real and >>>>>>>>>>> sometimes is complex. I want to solve real matrix without allocating memory >>>>>>>>>>> for imaginary part((except eigen pairs). >>>>>>>>>>> >>>>>>>>>>> On Tuesday, May 26, 2020, Zhang, Hong >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> You can build PETSc with complex version, and declare some >>>>>>>>>>>> variables as 'PETSC_REAL'. >>>>>>>>>>>> Hong >>>>>>>>>>>> >>>>>>>>>>>> ------------------------------ >>>>>>>>>>>> *From:* petsc-users on >>>>>>>>>>>> behalf of Sam Guo >>>>>>>>>>>> *Sent:* Tuesday, May 26, 2020 1:00 PM >>>>>>>>>>>> *To:* PETSc >>>>>>>>>>>> *Subject:* [petsc-users] using real and complex together >>>>>>>>>>>> >>>>>>>>>>>> Dear PETSc dev team, >>>>>>>>>>>> Can I use both real and complex versions together? >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Sam >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> What most experimenters take for granted before they begin their >>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>> experiments lead. 
>>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rlmackie862 at gmail.com Mon Sep 28 19:15:03 2020 From: rlmackie862 at gmail.com (Randall Mackie) Date: Mon, 28 Sep 2020 17:15:03 -0700 Subject: [petsc-users] using real and complex together In-Reply-To: References: <32DEC92E-93A5-4EE8-BF92-2468C1FE31B9@gmail.com> Message-ID: <394756BB-2F17-4960-8CC7-C525E19BDB1F@gmail.com> Sam, you can solve a complex matrix using a real version of PETSc by doubling the size of your matrix and spitting out real/imaginary parts. See this paper: https://epubs.siam.org/doi/abs/10.1137/S1064827500372262?mobileUi=0 Randy M. > On Sep 28, 2020, at 5:12 PM, Sam Guo wrote: > > All I want is to solve both real and complex matrix using either real or complex version of PETSc. If I load complex version of PETSc, I waste memory solving real matrix. If I can solve complex matrix using real version of PETSc, I will accept it. > > On Monday, September 28, 2020, Sam Guo > wrote: > If I load complex version of PETSc, how can Vec be real? > > On Monday, September 28, 2020, Matthew Knepley > wrote: > On Mon, Sep 28, 2020 at 7:44 PM Sam Guo > wrote: > I want to make sure I understand you. I load real version of PETSc. If my input matrix is complex, > > You said that your matrix was real. > > just get real and imagine parts of PETSc Vec > > No, the PETSc Vec would be real. You would have two vectors. > > Thanks, > > Matt > > and do the matrix vector multiplication. Right? > > On Monday, September 28, 2020, Matthew Knepley > wrote: > On Mon, Sep 28, 2020 at 6:29 PM Sam Guo > wrote: > ? I think it would be much easier to just decompose your complex work into real and imaginary parts and use PETSc with real scalars to compute them separately. > Since you know your matrices have 0 imaginary part, this becomes very straightforward.? > > I think this is what I am trying to do. But since I have to provide matrix-vector operator(since I am using shell matrix), the Vec I receive is complex. I need to convert complex vec to real one and then convert it back(that?s the code I shown before). > > No, this is not what I am advocating. Keep your vectors real, just keep one for the real part and one for the imaginary part. Then you can just call MatMult() twice with your matrix. > > Thanks, > > Matt > > On Monday, September 28, 2020, Matthew Knepley > wrote: > On Mon, Sep 28, 2020 at 5:01 PM Sam Guo > wrote: > Hi Matt, > Since I use MUMPS as preconditioner, complex uses too much memory if my input matrix is real. 
Ideally if I can compile real and complex into different symbols (like MUMPS) , I can load both version without conflict. > > What I mean to say is that it would be great if it were as simple as using two different symbols, but unfortunately the problem is more difficult. I was trying to use > the example of templates. This would be a very intrusive change no matter what technology you are using. > > So your main memory usage is from the MUMPS factorization, and you cannot afford to double that usage? > > You could consider writing a version of AIJ that stores real entries, but allows complex vector values. It would promote to complex for the row dot product. > However, you would also have to do the same work for all the solves you do with MUMPS. > > I think it would be much easier to just decompose your complex work into real and imaginary parts and use PETSc with real scalars to compute them separately. > Since you know your matrices have 0 imaginary part, this becomes very straightforward. > > Thanks, > > Matt > > Thanks, > Sam > > On Mon, Sep 28, 2020 at 12:52 PM Matthew Knepley > wrote: > On Mon, Sep 28, 2020 at 3:43 PM Sam Guo > wrote: > Hi Stefano and PETSc dev team, > I want to try your suggestion to always load complex version of PETSc but if my input matrix A is real, I want to create shell matrix to matrix-vector and factorization using real only. > > I do not think that will work as you expect. I will try to explain below. > > I still need to understand how MatRealPart works. Does it just zero out the image numerical values or does it delete the image memory? > > When we have complex values, we use the "complex" type to allocate and store them. Thus you cannot talk about just the memory to store imaginary parts. > MatRealPart sets the imaginary parts of all the matrix elements to zero. > > If my input matrix A is real, how do I create a shell matrix to matrix -vector multiplication y=A*x where A is real, PestcScalar = complex, x and y are Vec? I notice there is a VecRealPart but it seems it just zeros the image numerical values. It seems I still have to create a PetscReal pointer to copy the real part of PetacScalar pointers like following. Can you comment on it? > > What you suggest would mean rewriting the matrix multiplication algorithm by hand after extracting the values. I am not sure if this > is really what you want to do. Is the matrix memory really your limiting factor? Even if you tried to do this with templates, the memory > from temporaries would be very hard to control. > > Thanks, > > Matt > > Thanks, > Sam > > PetscScalar *px = nullptr; > VecGetArrayRead(x, &px); > PetscScalar *py = nullptr; > VecGetArray(y, &py); > int localSize = 0; > VecGetLocalSize(x, &localSize); > std::vector realX(localSize); // I am using c++ to call PETSc > > //retrieve real part > for(int i = 0; i < localSize; i++) realX[i] = PetscRealPart(px[i]); > > // do real matrix-vector multiplication > // realY=A*realX > // here where realY is std::vector > > //put real part back to py > for(int i = 0; i < localSize; i++) pv[i] = realY[i]; > VecRestoreArray(y,&py); > > On Tue, May 26, 2020 at 1:49 PM Sam Guo > wrote: > Thanks > > On Tuesday, May 26, 2020, Stefano Zampini > wrote: > All the solvers/matrices/vectors works for PetscScalar types (i.e. in your case complex) > If you need to solve for the real part only, you can duplicate the matrix and call MatRealPart to zero out the imaginary part. 
But the solve will always run in the complex space > You should not be worried about doubling the memory for a matrix (i.e. real and imaginary part) > > >> On May 26, 2020, at 11:28 PM, Sam Guo > wrote: >> >> complex version is needed since matrix sometimes is real and sometimes is complex. I want to solve real matrix without allocating memory for imaginary part((except eigen pairs). >> >> On Tuesday, May 26, 2020, Zhang, Hong > wrote: >> You can build PETSc with complex version, and declare some variables as 'PETSC_REAL'. >> Hong >> >> From: petsc-users > on behalf of Sam Guo > >> Sent: Tuesday, May 26, 2020 1:00 PM >> To: PETSc > >> Subject: [petsc-users] using real and complex together >> >> Dear PETSc dev team, >> Can I use both real and complex versions together? >> >> Thanks, >> Sam > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Sep 28 19:17:28 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 28 Sep 2020 20:17:28 -0400 Subject: [petsc-users] using real and complex together In-Reply-To: <394756BB-2F17-4960-8CC7-C525E19BDB1F@gmail.com> References: <32DEC92E-93A5-4EE8-BF92-2468C1FE31B9@gmail.com> <394756BB-2F17-4960-8CC7-C525E19BDB1F@gmail.com> Message-ID: On Mon, Sep 28, 2020 at 8:15 PM Randall Mackie wrote: > Sam, you can solve a complex matrix using a real version of PETSc by > doubling the size of your matrix and spitting out real/imaginary parts. > > See this paper: > > https://epubs.siam.org/doi/abs/10.1137/S1064827500372262?mobileUi=0 > Thanks Randy. Yes, I meant that one. Matt > Randy M. > > > On Sep 28, 2020, at 5:12 PM, Sam Guo wrote: > > All I want is to solve both real and complex matrix using either real or > complex version of PETSc. If I load complex version of PETSc, I waste > memory solving real matrix. If I can solve complex matrix using real > version of PETSc, I will accept it. > > On Monday, September 28, 2020, Sam Guo wrote: > >> If I load complex version of PETSc, how can Vec be real? >> >> On Monday, September 28, 2020, Matthew Knepley wrote: >> >>> On Mon, Sep 28, 2020 at 7:44 PM Sam Guo wrote: >>> >>>> I want to make sure I understand you. I load real version of PETSc. If >>>> my input matrix is complex, >>> >>> >>> You said that your matrix was real. >>> >>> >>>> just get real and imagine parts of PETSc Vec >>> >>> >>> No, the PETSc Vec would be real. You would have two vectors. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> and do the matrix vector multiplication. Right? 
>>>> >>>> On Monday, September 28, 2020, Matthew Knepley >>>> wrote: >>>> >>>>> On Mon, Sep 28, 2020 at 6:29 PM Sam Guo wrote: >>>>> >>>>>> ? I think it would be much easier to just decompose your complex >>>>>> work into real and imaginary parts and use PETSc with real scalars to >>>>>> compute them separately. >>>>>> Since you know your matrices have 0 imaginary part, this becomes very >>>>>> straightforward.? >>>>>> >>>>>> I think this is what I am trying to do. But since I have to provide >>>>>> matrix-vector operator(since I am using shell matrix), the Vec I receive is >>>>>> complex. I need to convert complex vec to real one and then convert it >>>>>> back(that?s the code I shown before). >>>>>> >>>>> >>>>> No, this is not what I am advocating. Keep your vectors real, just >>>>> keep one for the real part and one for the imaginary part. Then you can >>>>> just call MatMult() twice with your matrix. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> On Monday, September 28, 2020, Matthew Knepley >>>>>> wrote: >>>>>> >>>>>>> On Mon, Sep 28, 2020 at 5:01 PM Sam Guo >>>>>>> wrote: >>>>>>> >>>>>>>> Hi Matt, >>>>>>>> Since I use MUMPS as preconditioner, complex uses too much >>>>>>>> memory if my input matrix is real. Ideally if I can compile real and >>>>>>>> complex into different symbols (like MUMPS) , I can load both version >>>>>>>> without conflict. >>>>>>>> >>>>>>> >>>>>>> What I mean to say is that it would be great if it were as simple as >>>>>>> using two different symbols, but unfortunately the problem is more >>>>>>> difficult. I was trying to use >>>>>>> the example of templates. This would be a very intrusive change no >>>>>>> matter what technology you are using. >>>>>>> >>>>>>> So your main memory usage is from the MUMPS factorization, and you >>>>>>> cannot afford to double that usage? >>>>>>> >>>>>>> You could consider writing a version of AIJ that stores real >>>>>>> entries, but allows complex vector values. It would promote to complex for >>>>>>> the row dot product. >>>>>>> However, you would also have to do the same work for all the solves >>>>>>> you do with MUMPS. >>>>>>> >>>>>>> I think it would be much easier to just decompose your complex work >>>>>>> into real and imaginary parts and use PETSc with real scalars to compute >>>>>>> them separately. >>>>>>> Since you know your matrices have 0 imaginary part, this becomes >>>>>>> very straightforward. >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Matt >>>>>>> >>>>>>> >>>>>>>> Thanks, >>>>>>>> Sam >>>>>>>> >>>>>>>> On Mon, Sep 28, 2020 at 12:52 PM Matthew Knepley >>>>>>>> wrote: >>>>>>>> >>>>>>>>> On Mon, Sep 28, 2020 at 3:43 PM Sam Guo >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hi Stefano and PETSc dev team, >>>>>>>>>> I want to try your suggestion to always load complex version >>>>>>>>>> of PETSc but if my input matrix A is real, I want to create shell matrix to >>>>>>>>>> matrix-vector and factorization using real only. >>>>>>>>>> >>>>>>>>> >>>>>>>>> I do not think that will work as you expect. I will try to explain >>>>>>>>> below. >>>>>>>>> >>>>>>>>> >>>>>>>>>> I still need to understand how MatRealPart works. Does it just >>>>>>>>>> zero out the image numerical values or does it delete the image memory? >>>>>>>>>> >>>>>>>>> >>>>>>>>> When we have complex values, we use the "complex" type to allocate >>>>>>>>> and store them. Thus you cannot talk about just the memory to store >>>>>>>>> imaginary parts. >>>>>>>>> MatRealPart sets the imaginary parts of all the matrix elements to >>>>>>>>> zero. 
>>>>>>>>> >>>>>>>>> >>>>>>>>>> If my input matrix A is real, how do I create a shell matrix to >>>>>>>>>> matrix -vector multiplication y=A*x where A is real, PestcScalar = complex, >>>>>>>>>> x and y are Vec? I notice there is a VecRealPart but it seems it just zeros >>>>>>>>>> the image numerical values. It seems I still have to create a PetscReal >>>>>>>>>> pointer to copy the real part of PetacScalar pointers like following. Can >>>>>>>>>> you comment on it? >>>>>>>>>> >>>>>>>>> >>>>>>>>> What you suggest would mean rewriting the matrix multiplication >>>>>>>>> algorithm by hand after extracting the values. I am not sure if this >>>>>>>>> is really what you want to do. Is the matrix memory really your >>>>>>>>> limiting factor? Even if you tried to do this with templates, the memory >>>>>>>>> from temporaries would be very hard to control. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Matt >>>>>>>>> >>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Sam >>>>>>>>>> >>>>>>>>>> PetscScalar *px = nullptr; >>>>>>>>>> VecGetArrayRead(x, &px); >>>>>>>>>> PetscScalar *py = nullptr; >>>>>>>>>> VecGetArray(y, &py); >>>>>>>>>> int localSize = 0; >>>>>>>>>> VecGetLocalSize(x, &localSize); >>>>>>>>>> std::vector realX(localSize); // I am using c++ to >>>>>>>>>> call PETSc >>>>>>>>>> >>>>>>>>>> //retrieve real part >>>>>>>>>> for(int i = 0; i < localSize; i++) realX[i] = >>>>>>>>>> PetscRealPart(px[i]); >>>>>>>>>> >>>>>>>>>> // do real matrix-vector multiplication >>>>>>>>>> // realY=A*realX >>>>>>>>>> // here where realY is std::vector >>>>>>>>>> >>>>>>>>>> //put real part back to py >>>>>>>>>> for(int i = 0; i < localSize; i++) pv[i] = realY[i]; >>>>>>>>>> VecRestoreArray(y,&py); >>>>>>>>>> >>>>>>>>>> On Tue, May 26, 2020 at 1:49 PM Sam Guo >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Thanks >>>>>>>>>>> >>>>>>>>>>> On Tuesday, May 26, 2020, Stefano Zampini < >>>>>>>>>>> stefano.zampini at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> All the solvers/matrices/vectors works for PetscScalar types >>>>>>>>>>>> (i.e. in your case complex) >>>>>>>>>>>> If you need to solve for the real part only, you can duplicate >>>>>>>>>>>> the matrix and call MatRealPart to zero out the imaginary part. But the >>>>>>>>>>>> solve will always run in the complex space >>>>>>>>>>>> You should not be worried about doubling the memory for a >>>>>>>>>>>> matrix (i.e. real and imaginary part) >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On May 26, 2020, at 11:28 PM, Sam Guo >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> complex version is needed since matrix sometimes is real and >>>>>>>>>>>> sometimes is complex. I want to solve real matrix without allocating memory >>>>>>>>>>>> for imaginary part((except eigen pairs). >>>>>>>>>>>> >>>>>>>>>>>> On Tuesday, May 26, 2020, Zhang, Hong >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> You can build PETSc with complex version, and declare some >>>>>>>>>>>>> variables as 'PETSC_REAL'. >>>>>>>>>>>>> Hong >>>>>>>>>>>>> >>>>>>>>>>>>> ------------------------------ >>>>>>>>>>>>> *From:* petsc-users on >>>>>>>>>>>>> behalf of Sam Guo >>>>>>>>>>>>> *Sent:* Tuesday, May 26, 2020 1:00 PM >>>>>>>>>>>>> *To:* PETSc >>>>>>>>>>>>> *Subject:* [petsc-users] using real and complex together >>>>>>>>>>>>> >>>>>>>>>>>>> Dear PETSc dev team, >>>>>>>>>>>>> Can I use both real and complex versions together? 
>>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Sam >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>>> experiments lead. >>>>>>>>> -- Norbert Wiener >>>>>>>>> >>>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their >>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>> experiments lead. >>>>>>> -- Norbert Wiener >>>>>>> >>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Mon Sep 28 19:20:21 2020 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 28 Sep 2020 17:20:21 -0700 Subject: [petsc-users] using real and complex together In-Reply-To: References: <32DEC92E-93A5-4EE8-BF92-2468C1FE31B9@gmail.com> <394756BB-2F17-4960-8CC7-C525E19BDB1F@gmail.com> Message-ID: Hi Randy and Matt, Thanks a lot. I?ll look into it. BR, Sam On Monday, September 28, 2020, Matthew Knepley wrote: > On Mon, Sep 28, 2020 at 8:15 PM Randall Mackie > wrote: > >> Sam, you can solve a complex matrix using a real version of PETSc by >> doubling the size of your matrix and spitting out real/imaginary parts. >> >> See this paper: >> >> https://epubs.siam.org/doi/abs/10.1137/S1064827500372262?mobileUi=0 >> > > Thanks Randy. Yes, I meant that one. > > Matt > > >> Randy M. >> >> >> On Sep 28, 2020, at 5:12 PM, Sam Guo wrote: >> >> All I want is to solve both real and complex matrix using either real or >> complex version of PETSc. If I load complex version of PETSc, I waste >> memory solving real matrix. If I can solve complex matrix using real >> version of PETSc, I will accept it. >> >> On Monday, September 28, 2020, Sam Guo wrote: >> >>> If I load complex version of PETSc, how can Vec be real? >>> >>> On Monday, September 28, 2020, Matthew Knepley >>> wrote: >>> >>>> On Mon, Sep 28, 2020 at 7:44 PM Sam Guo wrote: >>>> >>>>> I want to make sure I understand you. I load real version of PETSc. >>>>> If my input matrix is complex, >>>> >>>> >>>> You said that your matrix was real. >>>> >>>> >>>>> just get real and imagine parts of PETSc Vec >>>> >>>> >>>> No, the PETSc Vec would be real. You would have two vectors. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> and do the matrix vector multiplication. Right? >>>>> >>>>> On Monday, September 28, 2020, Matthew Knepley >>>>> wrote: >>>>> >>>>>> On Mon, Sep 28, 2020 at 6:29 PM Sam Guo >>>>>> wrote: >>>>>> >>>>>>> ? 
I think it would be much easier to just decompose your complex >>>>>>> work into real and imaginary parts and use PETSc with real scalars to >>>>>>> compute them separately. >>>>>>> Since you know your matrices have 0 imaginary part, this becomes >>>>>>> very straightforward.? >>>>>>> >>>>>>> I think this is what I am trying to do. But since I have to provide >>>>>>> matrix-vector operator(since I am using shell matrix), the Vec I receive is >>>>>>> complex. I need to convert complex vec to real one and then convert it >>>>>>> back(that?s the code I shown before). >>>>>>> >>>>>> >>>>>> No, this is not what I am advocating. Keep your vectors real, just >>>>>> keep one for the real part and one for the imaginary part. Then you can >>>>>> just call MatMult() twice with your matrix. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> On Monday, September 28, 2020, Matthew Knepley >>>>>>> wrote: >>>>>>> >>>>>>>> On Mon, Sep 28, 2020 at 5:01 PM Sam Guo >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hi Matt, >>>>>>>>> Since I use MUMPS as preconditioner, complex uses too much >>>>>>>>> memory if my input matrix is real. Ideally if I can compile real and >>>>>>>>> complex into different symbols (like MUMPS) , I can load both version >>>>>>>>> without conflict. >>>>>>>>> >>>>>>>> >>>>>>>> What I mean to say is that it would be great if it were as simple >>>>>>>> as using two different symbols, but unfortunately the problem is more >>>>>>>> difficult. I was trying to use >>>>>>>> the example of templates. This would be a very intrusive change no >>>>>>>> matter what technology you are using. >>>>>>>> >>>>>>>> So your main memory usage is from the MUMPS factorization, and you >>>>>>>> cannot afford to double that usage? >>>>>>>> >>>>>>>> You could consider writing a version of AIJ that stores real >>>>>>>> entries, but allows complex vector values. It would promote to complex for >>>>>>>> the row dot product. >>>>>>>> However, you would also have to do the same work for all the solves >>>>>>>> you do with MUMPS. >>>>>>>> >>>>>>>> I think it would be much easier to just decompose your complex work >>>>>>>> into real and imaginary parts and use PETSc with real scalars to compute >>>>>>>> them separately. >>>>>>>> Since you know your matrices have 0 imaginary part, this becomes >>>>>>>> very straightforward. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Matt >>>>>>>> >>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Sam >>>>>>>>> >>>>>>>>> On Mon, Sep 28, 2020 at 12:52 PM Matthew Knepley < >>>>>>>>> knepley at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> On Mon, Sep 28, 2020 at 3:43 PM Sam Guo >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Stefano and PETSc dev team, >>>>>>>>>>> I want to try your suggestion to always load complex version >>>>>>>>>>> of PETSc but if my input matrix A is real, I want to create shell matrix to >>>>>>>>>>> matrix-vector and factorization using real only. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I do not think that will work as you expect. I will try to >>>>>>>>>> explain below. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> I still need to understand how MatRealPart works. Does it >>>>>>>>>>> just zero out the image numerical values or does it delete the image memory? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> When we have complex values, we use the "complex" type to >>>>>>>>>> allocate and store them. Thus you cannot talk about just the memory to >>>>>>>>>> store imaginary parts. >>>>>>>>>> MatRealPart sets the imaginary parts of all the matrix elements >>>>>>>>>> to zero. 
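A minimal sketch of the suggestion above (keep one real Vec for the real part, one for the imaginary part, and call MatMult() twice), assuming a real PETSc build and a purely real Mat A; the helper name is ours, not PETSc API:

#include <petscmat.h>

/* y = A*x for x = xr + i*xi with a purely real A: two real MatMults,
   no complex storage anywhere. */
static PetscErrorCode RealMatMultComplexVec(Mat A, Vec xr, Vec xi, Vec yr, Vec yi)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatMult(A, xr, yr);CHKERRQ(ierr); /* Re(y) = A * Re(x) */
  ierr = MatMult(A, xi, yi);CHKERRQ(ierr); /* Im(y) = A * Im(x) */
  PetscFunctionReturn(0);
}

The same splitting carries over to solves: one real matrix, and hence one real factorization, serves both parts.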
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> If my input matrix A is real, how do I create a shell matrix to >>>>>>>>>>> matrix -vector multiplication y=A*x where A is real, PestcScalar = complex, >>>>>>>>>>> x and y are Vec? I notice there is a VecRealPart but it seems it just zeros >>>>>>>>>>> the image numerical values. It seems I still have to create a PetscReal >>>>>>>>>>> pointer to copy the real part of PetacScalar pointers like following. Can >>>>>>>>>>> you comment on it? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> What you suggest would mean rewriting the matrix multiplication >>>>>>>>>> algorithm by hand after extracting the values. I am not sure if this >>>>>>>>>> is really what you want to do. Is the matrix memory really your >>>>>>>>>> limiting factor? Even if you tried to do this with templates, the memory >>>>>>>>>> from temporaries would be very hard to control. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Matt >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Sam >>>>>>>>>>> >>>>>>>>>>> PetscScalar *px = nullptr; >>>>>>>>>>> VecGetArrayRead(x, &px); >>>>>>>>>>> PetscScalar *py = nullptr; >>>>>>>>>>> VecGetArray(y, &py); >>>>>>>>>>> int localSize = 0; >>>>>>>>>>> VecGetLocalSize(x, &localSize); >>>>>>>>>>> std::vector realX(localSize); // I am using c++ to >>>>>>>>>>> call PETSc >>>>>>>>>>> >>>>>>>>>>> //retrieve real part >>>>>>>>>>> for(int i = 0; i < localSize; i++) realX[i] = >>>>>>>>>>> PetscRealPart(px[i]); >>>>>>>>>>> >>>>>>>>>>> // do real matrix-vector multiplication >>>>>>>>>>> // realY=A*realX >>>>>>>>>>> // here where realY is std::vector >>>>>>>>>>> >>>>>>>>>>> //put real part back to py >>>>>>>>>>> for(int i = 0; i < localSize; i++) pv[i] = realY[i]; >>>>>>>>>>> VecRestoreArray(y,&py); >>>>>>>>>>> >>>>>>>>>>> On Tue, May 26, 2020 at 1:49 PM Sam Guo >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Thanks >>>>>>>>>>>> >>>>>>>>>>>> On Tuesday, May 26, 2020, Stefano Zampini < >>>>>>>>>>>> stefano.zampini at gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> All the solvers/matrices/vectors works for PetscScalar types >>>>>>>>>>>>> (i.e. in your case complex) >>>>>>>>>>>>> If you need to solve for the real part only, you can duplicate >>>>>>>>>>>>> the matrix and call MatRealPart to zero out the imaginary part. But the >>>>>>>>>>>>> solve will always run in the complex space >>>>>>>>>>>>> You should not be worried about doubling the memory for a >>>>>>>>>>>>> matrix (i.e. real and imaginary part) >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On May 26, 2020, at 11:28 PM, Sam Guo >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> complex version is needed since matrix sometimes is real and >>>>>>>>>>>>> sometimes is complex. I want to solve real matrix without allocating memory >>>>>>>>>>>>> for imaginary part((except eigen pairs). >>>>>>>>>>>>> >>>>>>>>>>>>> On Tuesday, May 26, 2020, Zhang, Hong >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> You can build PETSc with complex version, and declare some >>>>>>>>>>>>>> variables as 'PETSC_REAL'. >>>>>>>>>>>>>> Hong >>>>>>>>>>>>>> >>>>>>>>>>>>>> ------------------------------ >>>>>>>>>>>>>> *From:* petsc-users on >>>>>>>>>>>>>> behalf of Sam Guo >>>>>>>>>>>>>> *Sent:* Tuesday, May 26, 2020 1:00 PM >>>>>>>>>>>>>> *To:* PETSc >>>>>>>>>>>>>> *Subject:* [petsc-users] using real and complex together >>>>>>>>>>>>>> >>>>>>>>>>>>>> Dear PETSc dev team, >>>>>>>>>>>>>> Can I use both real and complex versions together? 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Sam >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>>>> experiments lead. >>>>>>>>>> -- Norbert Wiener >>>>>>>>>> >>>>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> What most experimenters take for granted before they begin their >>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>> experiments lead. >>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Sep 28 19:21:42 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 28 Sep 2020 20:21:42 -0400 Subject: [petsc-users] using real and complex together In-Reply-To: References: <32DEC92E-93A5-4EE8-BF92-2468C1FE31B9@gmail.com> <394756BB-2F17-4960-8CC7-C525E19BDB1F@gmail.com> Message-ID: On Mon, Sep 28, 2020 at 8:20 PM Sam Guo wrote: > Hi Randy and Matt, > Thanks a lot. I?ll look into it. > Another option, with less code for you but some code in PETSc, is to check for a matrix with 0 imaginary part, and then copy it to a real matrix and call real MUMPS. However, in this case, we would have to check for real solves as well. Thanks, Matt > BR, > Sam > > On Monday, September 28, 2020, Matthew Knepley wrote: > >> On Mon, Sep 28, 2020 at 8:15 PM Randall Mackie >> wrote: >> >>> Sam, you can solve a complex matrix using a real version of PETSc by >>> doubling the size of your matrix and spitting out real/imaginary parts. >>> >>> See this paper: >>> >>> https://epubs.siam.org/doi/abs/10.1137/S1064827500372262?mobileUi=0 >>> >> >> Thanks Randy. Yes, I meant that one. >> >> Matt >> >> >>> Randy M. >>> >>> >>> On Sep 28, 2020, at 5:12 PM, Sam Guo wrote: >>> >>> All I want is to solve both real and complex matrix using either real or >>> complex version of PETSc. If I load complex version of PETSc, I waste >>> memory solving real matrix. If I can solve complex matrix using real >>> version of PETSc, I will accept it. >>> >>> On Monday, September 28, 2020, Sam Guo wrote: >>> >>>> If I load complex version of PETSc, how can Vec be real? >>>> >>>> On Monday, September 28, 2020, Matthew Knepley >>>> wrote: >>>> >>>>> On Mon, Sep 28, 2020 at 7:44 PM Sam Guo wrote: >>>>> >>>>>> I want to make sure I understand you. I load real version of PETSc. >>>>>> If my input matrix is complex, >>>>> >>>>> >>>>> You said that your matrix was real. 
>>>>> >>>>> >>>>>> just get real and imagine parts of PETSc Vec >>>>> >>>>> >>>>> No, the PETSc Vec would be real. You would have two vectors. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> and do the matrix vector multiplication. Right? >>>>>> >>>>>> On Monday, September 28, 2020, Matthew Knepley >>>>>> wrote: >>>>>> >>>>>>> On Mon, Sep 28, 2020 at 6:29 PM Sam Guo >>>>>>> wrote: >>>>>>> >>>>>>>> ? I think it would be much easier to just decompose your complex >>>>>>>> work into real and imaginary parts and use PETSc with real scalars to >>>>>>>> compute them separately. >>>>>>>> Since you know your matrices have 0 imaginary part, this becomes >>>>>>>> very straightforward.? >>>>>>>> >>>>>>>> I think this is what I am trying to do. But since I have to provide >>>>>>>> matrix-vector operator(since I am using shell matrix), the Vec I receive is >>>>>>>> complex. I need to convert complex vec to real one and then convert it >>>>>>>> back(that?s the code I shown before). >>>>>>>> >>>>>>> >>>>>>> No, this is not what I am advocating. Keep your vectors real, just >>>>>>> keep one for the real part and one for the imaginary part. Then you can >>>>>>> just call MatMult() twice with your matrix. >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Matt >>>>>>> >>>>>>> >>>>>>>> On Monday, September 28, 2020, Matthew Knepley >>>>>>>> wrote: >>>>>>>> >>>>>>>>> On Mon, Sep 28, 2020 at 5:01 PM Sam Guo >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hi Matt, >>>>>>>>>> Since I use MUMPS as preconditioner, complex uses too much >>>>>>>>>> memory if my input matrix is real. Ideally if I can compile real and >>>>>>>>>> complex into different symbols (like MUMPS) , I can load both version >>>>>>>>>> without conflict. >>>>>>>>>> >>>>>>>>> >>>>>>>>> What I mean to say is that it would be great if it were as simple >>>>>>>>> as using two different symbols, but unfortunately the problem is more >>>>>>>>> difficult. I was trying to use >>>>>>>>> the example of templates. This would be a very intrusive change no >>>>>>>>> matter what technology you are using. >>>>>>>>> >>>>>>>>> So your main memory usage is from the MUMPS factorization, and you >>>>>>>>> cannot afford to double that usage? >>>>>>>>> >>>>>>>>> You could consider writing a version of AIJ that stores real >>>>>>>>> entries, but allows complex vector values. It would promote to complex for >>>>>>>>> the row dot product. >>>>>>>>> However, you would also have to do the same work for all the >>>>>>>>> solves you do with MUMPS. >>>>>>>>> >>>>>>>>> I think it would be much easier to just decompose your complex >>>>>>>>> work into real and imaginary parts and use PETSc with real scalars to >>>>>>>>> compute them separately. >>>>>>>>> Since you know your matrices have 0 imaginary part, this becomes >>>>>>>>> very straightforward. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Matt >>>>>>>>> >>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Sam >>>>>>>>>> >>>>>>>>>> On Mon, Sep 28, 2020 at 12:52 PM Matthew Knepley < >>>>>>>>>> knepley at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> On Mon, Sep 28, 2020 at 3:43 PM Sam Guo >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Stefano and PETSc dev team, >>>>>>>>>>>> I want to try your suggestion to always load complex version >>>>>>>>>>>> of PETSc but if my input matrix A is real, I want to create shell matrix to >>>>>>>>>>>> matrix-vector and factorization using real only. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I do not think that will work as you expect. I will try to >>>>>>>>>>> explain below. 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> I still need to understand how MatRealPart works. Does it >>>>>>>>>>>> just zero out the image numerical values or does it delete the image memory? >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> When we have complex values, we use the "complex" type to >>>>>>>>>>> allocate and store them. Thus you cannot talk about just the memory to >>>>>>>>>>> store imaginary parts. >>>>>>>>>>> MatRealPart sets the imaginary parts of all the matrix elements >>>>>>>>>>> to zero. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> If my input matrix A is real, how do I create a shell matrix to >>>>>>>>>>>> matrix -vector multiplication y=A*x where A is real, PestcScalar = complex, >>>>>>>>>>>> x and y are Vec? I notice there is a VecRealPart but it seems it just zeros >>>>>>>>>>>> the image numerical values. It seems I still have to create a PetscReal >>>>>>>>>>>> pointer to copy the real part of PetacScalar pointers like following. Can >>>>>>>>>>>> you comment on it? >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> What you suggest would mean rewriting the matrix multiplication >>>>>>>>>>> algorithm by hand after extracting the values. I am not sure if this >>>>>>>>>>> is really what you want to do. Is the matrix memory really your >>>>>>>>>>> limiting factor? Even if you tried to do this with templates, the memory >>>>>>>>>>> from temporaries would be very hard to control. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> Matt >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Sam >>>>>>>>>>>> >>>>>>>>>>>> PetscScalar *px = nullptr; >>>>>>>>>>>> VecGetArrayRead(x, &px); >>>>>>>>>>>> PetscScalar *py = nullptr; >>>>>>>>>>>> VecGetArray(y, &py); >>>>>>>>>>>> int localSize = 0; >>>>>>>>>>>> VecGetLocalSize(x, &localSize); >>>>>>>>>>>> std::vector realX(localSize); // I am using c++ to >>>>>>>>>>>> call PETSc >>>>>>>>>>>> >>>>>>>>>>>> //retrieve real part >>>>>>>>>>>> for(int i = 0; i < localSize; i++) realX[i] = >>>>>>>>>>>> PetscRealPart(px[i]); >>>>>>>>>>>> >>>>>>>>>>>> // do real matrix-vector multiplication >>>>>>>>>>>> // realY=A*realX >>>>>>>>>>>> // here where realY is std::vector >>>>>>>>>>>> >>>>>>>>>>>> //put real part back to py >>>>>>>>>>>> for(int i = 0; i < localSize; i++) pv[i] = realY[i]; >>>>>>>>>>>> VecRestoreArray(y,&py); >>>>>>>>>>>> >>>>>>>>>>>> On Tue, May 26, 2020 at 1:49 PM Sam Guo >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Thanks >>>>>>>>>>>>> >>>>>>>>>>>>> On Tuesday, May 26, 2020, Stefano Zampini < >>>>>>>>>>>>> stefano.zampini at gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> All the solvers/matrices/vectors works for PetscScalar types >>>>>>>>>>>>>> (i.e. in your case complex) >>>>>>>>>>>>>> If you need to solve for the real part only, you can >>>>>>>>>>>>>> duplicate the matrix and call MatRealPart to zero out the imaginary part. >>>>>>>>>>>>>> But the solve will always run in the complex space >>>>>>>>>>>>>> You should not be worried about doubling the memory for a >>>>>>>>>>>>>> matrix (i.e. real and imaginary part) >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On May 26, 2020, at 11:28 PM, Sam Guo >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> complex version is needed since matrix sometimes is real and >>>>>>>>>>>>>> sometimes is complex. I want to solve real matrix without allocating memory >>>>>>>>>>>>>> for imaginary part((except eigen pairs). 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> On Tuesday, May 26, 2020, Zhang, Hong >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> You can build PETSc with complex version, and declare some >>>>>>>>>>>>>>> variables as 'PETSC_REAL'. >>>>>>>>>>>>>>> Hong >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ------------------------------ >>>>>>>>>>>>>>> *From:* petsc-users on >>>>>>>>>>>>>>> behalf of Sam Guo >>>>>>>>>>>>>>> *Sent:* Tuesday, May 26, 2020 1:00 PM >>>>>>>>>>>>>>> *To:* PETSc >>>>>>>>>>>>>>> *Subject:* [petsc-users] using real and complex together >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Dear PETSc dev team, >>>>>>>>>>>>>>> Can I use both real and complex versions together? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> Sam >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>>>>> experiments lead. >>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>> >>>>>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>>> experiments lead. >>>>>>>>> -- Norbert Wiener >>>>>>>>> >>>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their >>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>> experiments lead. >>>>>>> -- Norbert Wiener >>>>>>> >>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Mon Sep 28 19:34:09 2020 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 28 Sep 2020 17:34:09 -0700 Subject: [petsc-users] using real and complex together In-Reply-To: References: <32DEC92E-93A5-4EE8-BF92-2468C1FE31B9@gmail.com> <394756BB-2F17-4960-8CC7-C525E19BDB1F@gmail.com> Message-ID: That sounds like an useful feature for future PETSc. On Monday, September 28, 2020, Matthew Knepley wrote: > On Mon, Sep 28, 2020 at 8:20 PM Sam Guo wrote: > >> Hi Randy and Matt, >> Thanks a lot. I?ll look into it. >> > > Another option, with less code for you but some code in PETSc, is to check > for a matrix with 0 imaginary part, and then > copy it to a real matrix and call real MUMPS. However, in this case, we > would have to check for real solves as well. 
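To make the option just quoted a bit more concrete: with a complex build, one way to detect that an assembled matrix is actually real is to scan its stored entries. The routine below is only an illustration of that check, not an existing PETSc function, and the local result would still have to be reduced over all ranks (e.g. with an MPI_LAND reduction):

#include <petscmat.h>

/* Sketch: report whether every locally stored entry of A has zero imaginary part. */
static PetscErrorCode MatHasZeroImaginaryPart_Sketch(Mat A, PetscBool *isreal)
{
  PetscErrorCode    ierr;
  PetscInt          r, rStart, rEnd, ncols, j;
  const PetscScalar *vals;

  PetscFunctionBeginUser;
  *isreal = PETSC_TRUE;
  ierr = MatGetOwnershipRange(A, &rStart, &rEnd);CHKERRQ(ierr);
  for (r = rStart; r < rEnd && *isreal; ++r) {
    ierr = MatGetRow(A, r, &ncols, NULL, &vals);CHKERRQ(ierr);
    for (j = 0; j < ncols; ++j) {
      if (PetscImaginaryPart(vals[j]) != 0.0) { *isreal = PETSC_FALSE; break; }
    }
    ierr = MatRestoreRow(A, r, &ncols, NULL, &vals);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}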
> > Thanks, > > Matt > > >> BR, >> Sam >> >> On Monday, September 28, 2020, Matthew Knepley wrote: >> >>> On Mon, Sep 28, 2020 at 8:15 PM Randall Mackie >>> wrote: >>> >>>> Sam, you can solve a complex matrix using a real version of PETSc by >>>> doubling the size of your matrix and spitting out real/imaginary parts. >>>> >>>> See this paper: >>>> >>>> https://epubs.siam.org/doi/abs/10.1137/S1064827500372262?mobileUi=0 >>>> >>> >>> Thanks Randy. Yes, I meant that one. >>> >>> Matt >>> >>> >>>> Randy M. >>>> >>>> >>>> On Sep 28, 2020, at 5:12 PM, Sam Guo wrote: >>>> >>>> All I want is to solve both real and complex matrix using either real >>>> or complex version of PETSc. If I load complex version of PETSc, I waste >>>> memory solving real matrix. If I can solve complex matrix using real >>>> version of PETSc, I will accept it. >>>> >>>> On Monday, September 28, 2020, Sam Guo wrote: >>>> >>>>> If I load complex version of PETSc, how can Vec be real? >>>>> >>>>> On Monday, September 28, 2020, Matthew Knepley >>>>> wrote: >>>>> >>>>>> On Mon, Sep 28, 2020 at 7:44 PM Sam Guo >>>>>> wrote: >>>>>> >>>>>>> I want to make sure I understand you. I load real version of PETSc. >>>>>>> If my input matrix is complex, >>>>>> >>>>>> >>>>>> You said that your matrix was real. >>>>>> >>>>>> >>>>>>> just get real and imagine parts of PETSc Vec >>>>>> >>>>>> >>>>>> No, the PETSc Vec would be real. You would have two vectors. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> and do the matrix vector multiplication. Right? >>>>>>> >>>>>>> On Monday, September 28, 2020, Matthew Knepley >>>>>>> wrote: >>>>>>> >>>>>>>> On Mon, Sep 28, 2020 at 6:29 PM Sam Guo >>>>>>>> wrote: >>>>>>>> >>>>>>>>> ? I think it would be much easier to just decompose your complex >>>>>>>>> work into real and imaginary parts and use PETSc with real scalars to >>>>>>>>> compute them separately. >>>>>>>>> Since you know your matrices have 0 imaginary part, this becomes >>>>>>>>> very straightforward.? >>>>>>>>> >>>>>>>>> I think this is what I am trying to do. But since I have to >>>>>>>>> provide matrix-vector operator(since I am using shell matrix), the Vec I >>>>>>>>> receive is complex. I need to convert complex vec to real one and then >>>>>>>>> convert it back(that?s the code I shown before). >>>>>>>>> >>>>>>>> >>>>>>>> No, this is not what I am advocating. Keep your vectors real, just >>>>>>>> keep one for the real part and one for the imaginary part. Then you can >>>>>>>> just call MatMult() twice with your matrix. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Matt >>>>>>>> >>>>>>>> >>>>>>>>> On Monday, September 28, 2020, Matthew Knepley >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> On Mon, Sep 28, 2020 at 5:01 PM Sam Guo >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Matt, >>>>>>>>>>> Since I use MUMPS as preconditioner, complex uses too much >>>>>>>>>>> memory if my input matrix is real. Ideally if I can compile real and >>>>>>>>>>> complex into different symbols (like MUMPS) , I can load both version >>>>>>>>>>> without conflict. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> What I mean to say is that it would be great if it were as simple >>>>>>>>>> as using two different symbols, but unfortunately the problem is more >>>>>>>>>> difficult. I was trying to use >>>>>>>>>> the example of templates. This would be a very intrusive change >>>>>>>>>> no matter what technology you are using. >>>>>>>>>> >>>>>>>>>> So your main memory usage is from the MUMPS factorization, and >>>>>>>>>> you cannot afford to double that usage? 
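For reference, the "doubling" Randy describes above has a standard form: writing A = Ar + i*Ai, x = xr + i*xi, b = br + i*bi, the complex system A x = b becomes the real block system

    [ Ar  -Ai ] [ xr ]   [ br ]
    [ Ai   Ar ] [ xi ] = [ bi ]

(one common convention; the cited paper discusses equivalent formulations and their conditioning). When Ai = 0 the two block rows decouple into Ar*xr = br and Ar*xi = bi, so a real matrix with a complex right-hand side needs only one real factorization.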
>>>>>>>>>> >>>>>>>>>> You could consider writing a version of AIJ that stores real >>>>>>>>>> entries, but allows complex vector values. It would promote to complex for >>>>>>>>>> the row dot product. >>>>>>>>>> However, you would also have to do the same work for all the >>>>>>>>>> solves you do with MUMPS. >>>>>>>>>> >>>>>>>>>> I think it would be much easier to just decompose your complex >>>>>>>>>> work into real and imaginary parts and use PETSc with real scalars to >>>>>>>>>> compute them separately. >>>>>>>>>> Since you know your matrices have 0 imaginary part, this becomes >>>>>>>>>> very straightforward. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Matt >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Sam >>>>>>>>>>> >>>>>>>>>>> On Mon, Sep 28, 2020 at 12:52 PM Matthew Knepley < >>>>>>>>>>> knepley at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> On Mon, Sep 28, 2020 at 3:43 PM Sam Guo >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Stefano and PETSc dev team, >>>>>>>>>>>>> I want to try your suggestion to always load complex >>>>>>>>>>>>> version of PETSc but if my input matrix A is real, I want to create shell >>>>>>>>>>>>> matrix to matrix-vector and factorization using real only. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I do not think that will work as you expect. I will try to >>>>>>>>>>>> explain below. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> I still need to understand how MatRealPart works. Does it >>>>>>>>>>>>> just zero out the image numerical values or does it delete the image memory? >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> When we have complex values, we use the "complex" type to >>>>>>>>>>>> allocate and store them. Thus you cannot talk about just the memory to >>>>>>>>>>>> store imaginary parts. >>>>>>>>>>>> MatRealPart sets the imaginary parts of all the matrix elements >>>>>>>>>>>> to zero. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> If my input matrix A is real, how do I create a shell matrix >>>>>>>>>>>>> to matrix -vector multiplication y=A*x where A is real, PestcScalar = >>>>>>>>>>>>> complex, x and y are Vec? I notice there is a VecRealPart but it seems it >>>>>>>>>>>>> just zeros the image numerical values. It seems I still have to create a >>>>>>>>>>>>> PetscReal pointer to copy the real part of PetacScalar pointers like >>>>>>>>>>>>> following. Can you comment on it? >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> What you suggest would mean rewriting the matrix multiplication >>>>>>>>>>>> algorithm by hand after extracting the values. I am not sure if this >>>>>>>>>>>> is really what you want to do. Is the matrix memory really your >>>>>>>>>>>> limiting factor? Even if you tried to do this with templates, the memory >>>>>>>>>>>> from temporaries would be very hard to control. 
>>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> >>>>>>>>>>>> Matt >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Sam >>>>>>>>>>>>> >>>>>>>>>>>>> PetscScalar *px = nullptr; >>>>>>>>>>>>> VecGetArrayRead(x, &px); >>>>>>>>>>>>> PetscScalar *py = nullptr; >>>>>>>>>>>>> VecGetArray(y, &py); >>>>>>>>>>>>> int localSize = 0; >>>>>>>>>>>>> VecGetLocalSize(x, &localSize); >>>>>>>>>>>>> std::vector realX(localSize); // I am using c++ to >>>>>>>>>>>>> call PETSc >>>>>>>>>>>>> >>>>>>>>>>>>> //retrieve real part >>>>>>>>>>>>> for(int i = 0; i < localSize; i++) realX[i] = >>>>>>>>>>>>> PetscRealPart(px[i]); >>>>>>>>>>>>> >>>>>>>>>>>>> // do real matrix-vector multiplication >>>>>>>>>>>>> // realY=A*realX >>>>>>>>>>>>> // here where realY is std::vector >>>>>>>>>>>>> >>>>>>>>>>>>> //put real part back to py >>>>>>>>>>>>> for(int i = 0; i < localSize; i++) pv[i] = realY[i]; >>>>>>>>>>>>> VecRestoreArray(y,&py); >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, May 26, 2020 at 1:49 PM Sam Guo >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Tuesday, May 26, 2020, Stefano Zampini < >>>>>>>>>>>>>> stefano.zampini at gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> All the solvers/matrices/vectors works for PetscScalar types >>>>>>>>>>>>>>> (i.e. in your case complex) >>>>>>>>>>>>>>> If you need to solve for the real part only, you can >>>>>>>>>>>>>>> duplicate the matrix and call MatRealPart to zero out the imaginary part. >>>>>>>>>>>>>>> But the solve will always run in the complex space >>>>>>>>>>>>>>> You should not be worried about doubling the memory for a >>>>>>>>>>>>>>> matrix (i.e. real and imaginary part) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On May 26, 2020, at 11:28 PM, Sam Guo >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> complex version is needed since matrix sometimes is real and >>>>>>>>>>>>>>> sometimes is complex. I want to solve real matrix without allocating memory >>>>>>>>>>>>>>> for imaginary part((except eigen pairs). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Tuesday, May 26, 2020, Zhang, Hong >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> You can build PETSc with complex version, and declare some >>>>>>>>>>>>>>>> variables as 'PETSC_REAL'. >>>>>>>>>>>>>>>> Hong >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ------------------------------ >>>>>>>>>>>>>>>> *From:* petsc-users on >>>>>>>>>>>>>>>> behalf of Sam Guo >>>>>>>>>>>>>>>> *Sent:* Tuesday, May 26, 2020 1:00 PM >>>>>>>>>>>>>>>> *To:* PETSc >>>>>>>>>>>>>>>> *Subject:* [petsc-users] using real and complex together >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Dear PETSc dev team, >>>>>>>>>>>>>>>> Can I use both real and complex versions together? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> Sam >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> What most experimenters take for granted before they begin >>>>>>>>>>>> their experiments is infinitely more interesting than any results to which >>>>>>>>>>>> their experiments lead. >>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>> >>>>>>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>>>> experiments lead. 
>>>>>>>>>> -- Norbert Wiener >>>>>>>>>> >>>>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> What most experimenters take for granted before they begin their >>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>> experiments lead. >>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Mon Sep 28 19:44:52 2020 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Mon, 28 Sep 2020 19:44:52 -0500 Subject: [petsc-users] using real and complex together In-Reply-To: References: <32DEC92E-93A5-4EE8-BF92-2468C1FE31B9@gmail.com> <394756BB-2F17-4960-8CC7-C525E19BDB1F@gmail.com> Message-ID: On Mon, Sep 28, 2020 at 7:21 PM Matthew Knepley wrote: > On Mon, Sep 28, 2020 at 8:20 PM Sam Guo wrote: > >> Hi Randy and Matt, >> Thanks a lot. I?ll look into it. >> > > Another option, with less code for you but some code in PETSc, is to check > for a matrix with 0 imaginary part, and then > copy it to a real matrix and call real MUMPS. However, in this case, we > would have to check for real solves as well. > What do you mean? How to let mumps solve with a real matrix and a complex rhs? > > Thanks, > > Matt > > >> BR, >> Sam >> >> On Monday, September 28, 2020, Matthew Knepley wrote: >> >>> On Mon, Sep 28, 2020 at 8:15 PM Randall Mackie >>> wrote: >>> >>>> Sam, you can solve a complex matrix using a real version of PETSc by >>>> doubling the size of your matrix and spitting out real/imaginary parts. >>>> >>>> See this paper: >>>> >>>> https://epubs.siam.org/doi/abs/10.1137/S1064827500372262?mobileUi=0 >>>> >>> >>> Thanks Randy. Yes, I meant that one. >>> >>> Matt >>> >>> >>>> Randy M. >>>> >>>> >>>> On Sep 28, 2020, at 5:12 PM, Sam Guo wrote: >>>> >>>> All I want is to solve both real and complex matrix using either real >>>> or complex version of PETSc. If I load complex version of PETSc, I waste >>>> memory solving real matrix. If I can solve complex matrix using real >>>> version of PETSc, I will accept it. >>>> >>>> On Monday, September 28, 2020, Sam Guo wrote: >>>> >>>>> If I load complex version of PETSc, how can Vec be real? >>>>> >>>>> On Monday, September 28, 2020, Matthew Knepley >>>>> wrote: >>>>> >>>>>> On Mon, Sep 28, 2020 at 7:44 PM Sam Guo >>>>>> wrote: >>>>>> >>>>>>> I want to make sure I understand you. I load real version of PETSc. >>>>>>> If my input matrix is complex, >>>>>> >>>>>> >>>>>> You said that your matrix was real. >>>>>> >>>>>> >>>>>>> just get real and imagine parts of PETSc Vec >>>>>> >>>>>> >>>>>> No, the PETSc Vec would be real. 
You would have two vectors. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> and do the matrix vector multiplication. Right? >>>>>>> >>>>>>> On Monday, September 28, 2020, Matthew Knepley >>>>>>> wrote: >>>>>>> >>>>>>>> On Mon, Sep 28, 2020 at 6:29 PM Sam Guo >>>>>>>> wrote: >>>>>>>> >>>>>>>>> ? I think it would be much easier to just decompose your complex >>>>>>>>> work into real and imaginary parts and use PETSc with real scalars to >>>>>>>>> compute them separately. >>>>>>>>> Since you know your matrices have 0 imaginary part, this becomes >>>>>>>>> very straightforward.? >>>>>>>>> >>>>>>>>> I think this is what I am trying to do. But since I have to >>>>>>>>> provide matrix-vector operator(since I am using shell matrix), the Vec I >>>>>>>>> receive is complex. I need to convert complex vec to real one and then >>>>>>>>> convert it back(that?s the code I shown before). >>>>>>>>> >>>>>>>> >>>>>>>> No, this is not what I am advocating. Keep your vectors real, just >>>>>>>> keep one for the real part and one for the imaginary part. Then you can >>>>>>>> just call MatMult() twice with your matrix. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Matt >>>>>>>> >>>>>>>> >>>>>>>>> On Monday, September 28, 2020, Matthew Knepley >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> On Mon, Sep 28, 2020 at 5:01 PM Sam Guo >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Matt, >>>>>>>>>>> Since I use MUMPS as preconditioner, complex uses too much >>>>>>>>>>> memory if my input matrix is real. Ideally if I can compile real and >>>>>>>>>>> complex into different symbols (like MUMPS) , I can load both version >>>>>>>>>>> without conflict. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> What I mean to say is that it would be great if it were as simple >>>>>>>>>> as using two different symbols, but unfortunately the problem is more >>>>>>>>>> difficult. I was trying to use >>>>>>>>>> the example of templates. This would be a very intrusive change >>>>>>>>>> no matter what technology you are using. >>>>>>>>>> >>>>>>>>>> So your main memory usage is from the MUMPS factorization, and >>>>>>>>>> you cannot afford to double that usage? >>>>>>>>>> >>>>>>>>>> You could consider writing a version of AIJ that stores real >>>>>>>>>> entries, but allows complex vector values. It would promote to complex for >>>>>>>>>> the row dot product. >>>>>>>>>> However, you would also have to do the same work for all the >>>>>>>>>> solves you do with MUMPS. >>>>>>>>>> >>>>>>>>>> I think it would be much easier to just decompose your complex >>>>>>>>>> work into real and imaginary parts and use PETSc with real scalars to >>>>>>>>>> compute them separately. >>>>>>>>>> Since you know your matrices have 0 imaginary part, this becomes >>>>>>>>>> very straightforward. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Matt >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Sam >>>>>>>>>>> >>>>>>>>>>> On Mon, Sep 28, 2020 at 12:52 PM Matthew Knepley < >>>>>>>>>>> knepley at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> On Mon, Sep 28, 2020 at 3:43 PM Sam Guo >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Stefano and PETSc dev team, >>>>>>>>>>>>> I want to try your suggestion to always load complex >>>>>>>>>>>>> version of PETSc but if my input matrix A is real, I want to create shell >>>>>>>>>>>>> matrix to matrix-vector and factorization using real only. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I do not think that will work as you expect. I will try to >>>>>>>>>>>> explain below. 
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> I still need to understand how MatRealPart works. Does it >>>>>>>>>>>>> just zero out the image numerical values or does it delete the image memory? >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> When we have complex values, we use the "complex" type to >>>>>>>>>>>> allocate and store them. Thus you cannot talk about just the memory to >>>>>>>>>>>> store imaginary parts. >>>>>>>>>>>> MatRealPart sets the imaginary parts of all the matrix elements >>>>>>>>>>>> to zero. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> If my input matrix A is real, how do I create a shell matrix >>>>>>>>>>>>> to matrix -vector multiplication y=A*x where A is real, PestcScalar = >>>>>>>>>>>>> complex, x and y are Vec? I notice there is a VecRealPart but it seems it >>>>>>>>>>>>> just zeros the image numerical values. It seems I still have to create a >>>>>>>>>>>>> PetscReal pointer to copy the real part of PetacScalar pointers like >>>>>>>>>>>>> following. Can you comment on it? >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> What you suggest would mean rewriting the matrix multiplication >>>>>>>>>>>> algorithm by hand after extracting the values. I am not sure if this >>>>>>>>>>>> is really what you want to do. Is the matrix memory really your >>>>>>>>>>>> limiting factor? Even if you tried to do this with templates, the memory >>>>>>>>>>>> from temporaries would be very hard to control. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> >>>>>>>>>>>> Matt >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Sam >>>>>>>>>>>>> >>>>>>>>>>>>> PetscScalar *px = nullptr; >>>>>>>>>>>>> VecGetArrayRead(x, &px); >>>>>>>>>>>>> PetscScalar *py = nullptr; >>>>>>>>>>>>> VecGetArray(y, &py); >>>>>>>>>>>>> int localSize = 0; >>>>>>>>>>>>> VecGetLocalSize(x, &localSize); >>>>>>>>>>>>> std::vector realX(localSize); // I am using c++ to >>>>>>>>>>>>> call PETSc >>>>>>>>>>>>> >>>>>>>>>>>>> //retrieve real part >>>>>>>>>>>>> for(int i = 0; i < localSize; i++) realX[i] = >>>>>>>>>>>>> PetscRealPart(px[i]); >>>>>>>>>>>>> >>>>>>>>>>>>> // do real matrix-vector multiplication >>>>>>>>>>>>> // realY=A*realX >>>>>>>>>>>>> // here where realY is std::vector >>>>>>>>>>>>> >>>>>>>>>>>>> //put real part back to py >>>>>>>>>>>>> for(int i = 0; i < localSize; i++) pv[i] = realY[i]; >>>>>>>>>>>>> VecRestoreArray(y,&py); >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, May 26, 2020 at 1:49 PM Sam Guo >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Tuesday, May 26, 2020, Stefano Zampini < >>>>>>>>>>>>>> stefano.zampini at gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> All the solvers/matrices/vectors works for PetscScalar types >>>>>>>>>>>>>>> (i.e. in your case complex) >>>>>>>>>>>>>>> If you need to solve for the real part only, you can >>>>>>>>>>>>>>> duplicate the matrix and call MatRealPart to zero out the imaginary part. >>>>>>>>>>>>>>> But the solve will always run in the complex space >>>>>>>>>>>>>>> You should not be worried about doubling the memory for a >>>>>>>>>>>>>>> matrix (i.e. real and imaginary part) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On May 26, 2020, at 11:28 PM, Sam Guo >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> complex version is needed since matrix sometimes is real and >>>>>>>>>>>>>>> sometimes is complex. I want to solve real matrix without allocating memory >>>>>>>>>>>>>>> for imaginary part((except eigen pairs). 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Tuesday, May 26, 2020, Zhang, Hong >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> You can build PETSc with complex version, and declare some >>>>>>>>>>>>>>>> variables as 'PETSC_REAL'. >>>>>>>>>>>>>>>> Hong >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ------------------------------ >>>>>>>>>>>>>>>> *From:* petsc-users on >>>>>>>>>>>>>>>> behalf of Sam Guo >>>>>>>>>>>>>>>> *Sent:* Tuesday, May 26, 2020 1:00 PM >>>>>>>>>>>>>>>> *To:* PETSc >>>>>>>>>>>>>>>> *Subject:* [petsc-users] using real and complex together >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Dear PETSc dev team, >>>>>>>>>>>>>>>> Can I use both real and complex versions together? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> Sam >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> What most experimenters take for granted before they begin >>>>>>>>>>>> their experiments is infinitely more interesting than any results to which >>>>>>>>>>>> their experiments lead. >>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>> >>>>>>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>>>> experiments lead. >>>>>>>>>> -- Norbert Wiener >>>>>>>>>> >>>>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> What most experimenters take for granted before they begin their >>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>> experiments lead. >>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Sep 28 20:12:51 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 28 Sep 2020 21:12:51 -0400 Subject: [petsc-users] using real and complex together In-Reply-To: References: <32DEC92E-93A5-4EE8-BF92-2468C1FE31B9@gmail.com> <394756BB-2F17-4960-8CC7-C525E19BDB1F@gmail.com> Message-ID: On Mon, Sep 28, 2020 at 8:45 PM Junchao Zhang wrote: > On Mon, Sep 28, 2020 at 7:21 PM Matthew Knepley wrote: > >> On Mon, Sep 28, 2020 at 8:20 PM Sam Guo wrote: >> >>> Hi Randy and Matt, >>> Thanks a lot. I?ll look into it. >>> >> >> Another option, with less code for you but some code in PETSc, is to >> check for a matrix with 0 imaginary part, and then >> copy it to a real matrix and call real MUMPS. However, in this case, we >> would have to check for real solves as well. >> > What do you mean? 
How to let mumps solve with a real matrix and a complex > rhs? > Two solves. Matt > Thanks, >> >> Matt >> >> >>> BR, >>> Sam >>> >>> On Monday, September 28, 2020, Matthew Knepley >>> wrote: >>> >>>> On Mon, Sep 28, 2020 at 8:15 PM Randall Mackie >>>> wrote: >>>> >>>>> Sam, you can solve a complex matrix using a real version of PETSc by >>>>> doubling the size of your matrix and spitting out real/imaginary parts. >>>>> >>>>> See this paper: >>>>> >>>>> https://epubs.siam.org/doi/abs/10.1137/S1064827500372262?mobileUi=0 >>>>> >>>> >>>> Thanks Randy. Yes, I meant that one. >>>> >>>> Matt >>>> >>>> >>>>> Randy M. >>>>> >>>>> >>>>> On Sep 28, 2020, at 5:12 PM, Sam Guo wrote: >>>>> >>>>> All I want is to solve both real and complex matrix using either real >>>>> or complex version of PETSc. If I load complex version of PETSc, I waste >>>>> memory solving real matrix. If I can solve complex matrix using real >>>>> version of PETSc, I will accept it. >>>>> >>>>> On Monday, September 28, 2020, Sam Guo wrote: >>>>> >>>>>> If I load complex version of PETSc, how can Vec be real? >>>>>> >>>>>> On Monday, September 28, 2020, Matthew Knepley >>>>>> wrote: >>>>>> >>>>>>> On Mon, Sep 28, 2020 at 7:44 PM Sam Guo >>>>>>> wrote: >>>>>>> >>>>>>>> I want to make sure I understand you. I load real version of >>>>>>>> PETSc. If my input matrix is complex, >>>>>>> >>>>>>> >>>>>>> You said that your matrix was real. >>>>>>> >>>>>>> >>>>>>>> just get real and imagine parts of PETSc Vec >>>>>>> >>>>>>> >>>>>>> No, the PETSc Vec would be real. You would have two vectors. >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Matt >>>>>>> >>>>>>> >>>>>>>> and do the matrix vector multiplication. Right? >>>>>>>> >>>>>>>> On Monday, September 28, 2020, Matthew Knepley >>>>>>>> wrote: >>>>>>>> >>>>>>>>> On Mon, Sep 28, 2020 at 6:29 PM Sam Guo >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> ? I think it would be much easier to just decompose your complex >>>>>>>>>> work into real and imaginary parts and use PETSc with real scalars to >>>>>>>>>> compute them separately. >>>>>>>>>> Since you know your matrices have 0 imaginary part, this becomes >>>>>>>>>> very straightforward.? >>>>>>>>>> >>>>>>>>>> I think this is what I am trying to do. But since I have to >>>>>>>>>> provide matrix-vector operator(since I am using shell matrix), the Vec I >>>>>>>>>> receive is complex. I need to convert complex vec to real one and then >>>>>>>>>> convert it back(that?s the code I shown before). >>>>>>>>>> >>>>>>>>> >>>>>>>>> No, this is not what I am advocating. Keep your vectors real, just >>>>>>>>> keep one for the real part and one for the imaginary part. Then you can >>>>>>>>> just call MatMult() twice with your matrix. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Matt >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Monday, September 28, 2020, Matthew Knepley >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> On Mon, Sep 28, 2020 at 5:01 PM Sam Guo >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Matt, >>>>>>>>>>>> Since I use MUMPS as preconditioner, complex uses too much >>>>>>>>>>>> memory if my input matrix is real. Ideally if I can compile real and >>>>>>>>>>>> complex into different symbols (like MUMPS) , I can load both version >>>>>>>>>>>> without conflict. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> What I mean to say is that it would be great if it were as >>>>>>>>>>> simple as using two different symbols, but unfortunately the problem is >>>>>>>>>>> more difficult. I was trying to use >>>>>>>>>>> the example of templates. 
This would be a very intrusive change >>>>>>>>>>> no matter what technology you are using. >>>>>>>>>>> >>>>>>>>>>> So your main memory usage is from the MUMPS factorization, and >>>>>>>>>>> you cannot afford to double that usage? >>>>>>>>>>> >>>>>>>>>>> You could consider writing a version of AIJ that stores real >>>>>>>>>>> entries, but allows complex vector values. It would promote to complex for >>>>>>>>>>> the row dot product. >>>>>>>>>>> However, you would also have to do the same work for all the >>>>>>>>>>> solves you do with MUMPS. >>>>>>>>>>> >>>>>>>>>>> I think it would be much easier to just decompose your complex >>>>>>>>>>> work into real and imaginary parts and use PETSc with real scalars to >>>>>>>>>>> compute them separately. >>>>>>>>>>> Since you know your matrices have 0 imaginary part, this becomes >>>>>>>>>>> very straightforward. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> Matt >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Sam >>>>>>>>>>>> >>>>>>>>>>>> On Mon, Sep 28, 2020 at 12:52 PM Matthew Knepley < >>>>>>>>>>>> knepley at gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> On Mon, Sep 28, 2020 at 3:43 PM Sam Guo >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Stefano and PETSc dev team, >>>>>>>>>>>>>> I want to try your suggestion to always load complex >>>>>>>>>>>>>> version of PETSc but if my input matrix A is real, I want to create shell >>>>>>>>>>>>>> matrix to matrix-vector and factorization using real only. >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I do not think that will work as you expect. I will try to >>>>>>>>>>>>> explain below. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> I still need to understand how MatRealPart works. Does it >>>>>>>>>>>>>> just zero out the image numerical values or does it delete the image memory? >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> When we have complex values, we use the "complex" type to >>>>>>>>>>>>> allocate and store them. Thus you cannot talk about just the memory to >>>>>>>>>>>>> store imaginary parts. >>>>>>>>>>>>> MatRealPart sets the imaginary parts of all the matrix >>>>>>>>>>>>> elements to zero. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> If my input matrix A is real, how do I create a shell matrix >>>>>>>>>>>>>> to matrix -vector multiplication y=A*x where A is real, PestcScalar = >>>>>>>>>>>>>> complex, x and y are Vec? I notice there is a VecRealPart but it seems it >>>>>>>>>>>>>> just zeros the image numerical values. It seems I still have to create a >>>>>>>>>>>>>> PetscReal pointer to copy the real part of PetacScalar pointers like >>>>>>>>>>>>>> following. Can you comment on it? >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> What you suggest would mean rewriting the matrix >>>>>>>>>>>>> multiplication algorithm by hand after extracting the values. I am not sure >>>>>>>>>>>>> if this >>>>>>>>>>>>> is really what you want to do. Is the matrix memory really >>>>>>>>>>>>> your limiting factor? Even if you tried to do this with templates, the >>>>>>>>>>>>> memory >>>>>>>>>>>>> from temporaries would be very hard to control. 
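The "two solves" referred to above can be made concrete: factor the real matrix once and back-substitute separately for the real and imaginary parts of the right-hand side. A rough sketch, with vector names of our choosing:

#include <petscksp.h>

/* Solve A*(xr + i*xi) = br + i*bi for a purely real A: one real MUMPS
   factorization, two back-substitutions. */
static PetscErrorCode SolveComplexRHSWithRealFactor(Mat A, Vec br, Vec bi, Vec xr, Vec xi)
{
  PetscErrorCode ierr;
  KSP            ksp;
  PC             pc;

  PetscFunctionBeginUser;
  ierr = KSPCreate(PetscObjectComm((PetscObject)A), &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
  ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
  ierr = PCFactorSetMatSolverType(pc, MATSOLVERMUMPS);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, br, xr);CHKERRQ(ierr); /* factorization happens here  */
  ierr = KSPSolve(ksp, bi, xi);CHKERRQ(ierr); /* reuses the stored factors   */
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The two back-substitutions can also be grouped into one multi-right-hand-side solve (for example MatMatSolve with a dense right-hand-side matrix), which costs about the same as a single solve because back-substitution is dominated by streaming the factors.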
>>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> >>>>>>>>>>>>> Matt >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Sam >>>>>>>>>>>>>> >>>>>>>>>>>>>> PetscScalar *px = nullptr; >>>>>>>>>>>>>> VecGetArrayRead(x, &px); >>>>>>>>>>>>>> PetscScalar *py = nullptr; >>>>>>>>>>>>>> VecGetArray(y, &py); >>>>>>>>>>>>>> int localSize = 0; >>>>>>>>>>>>>> VecGetLocalSize(x, &localSize); >>>>>>>>>>>>>> std::vector realX(localSize); // I am using c++ to >>>>>>>>>>>>>> call PETSc >>>>>>>>>>>>>> >>>>>>>>>>>>>> //retrieve real part >>>>>>>>>>>>>> for(int i = 0; i < localSize; i++) realX[i] = >>>>>>>>>>>>>> PetscRealPart(px[i]); >>>>>>>>>>>>>> >>>>>>>>>>>>>> // do real matrix-vector multiplication >>>>>>>>>>>>>> // realY=A*realX >>>>>>>>>>>>>> // here where realY is std::vector >>>>>>>>>>>>>> >>>>>>>>>>>>>> //put real part back to py >>>>>>>>>>>>>> for(int i = 0; i < localSize; i++) pv[i] = realY[i]; >>>>>>>>>>>>>> VecRestoreArray(y,&py); >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Tue, May 26, 2020 at 1:49 PM Sam Guo < >>>>>>>>>>>>>> sam.guo at cd-adapco.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Tuesday, May 26, 2020, Stefano Zampini < >>>>>>>>>>>>>>> stefano.zampini at gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> All the solvers/matrices/vectors works for PetscScalar >>>>>>>>>>>>>>>> types (i.e. in your case complex) >>>>>>>>>>>>>>>> If you need to solve for the real part only, you can >>>>>>>>>>>>>>>> duplicate the matrix and call MatRealPart to zero out the imaginary part. >>>>>>>>>>>>>>>> But the solve will always run in the complex space >>>>>>>>>>>>>>>> You should not be worried about doubling the memory for a >>>>>>>>>>>>>>>> matrix (i.e. real and imaginary part) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On May 26, 2020, at 11:28 PM, Sam Guo < >>>>>>>>>>>>>>>> sam.guo at cd-adapco.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> complex version is needed since matrix sometimes is real >>>>>>>>>>>>>>>> and sometimes is complex. I want to solve real matrix without allocating >>>>>>>>>>>>>>>> memory for imaginary part((except eigen pairs). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Tuesday, May 26, 2020, Zhang, Hong >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> You can build PETSc with complex version, and declare >>>>>>>>>>>>>>>>> some variables as 'PETSC_REAL'. >>>>>>>>>>>>>>>>> Hong >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ------------------------------ >>>>>>>>>>>>>>>>> *From:* petsc-users on >>>>>>>>>>>>>>>>> behalf of Sam Guo >>>>>>>>>>>>>>>>> *Sent:* Tuesday, May 26, 2020 1:00 PM >>>>>>>>>>>>>>>>> *To:* PETSc >>>>>>>>>>>>>>>>> *Subject:* [petsc-users] using real and complex together >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Dear PETSc dev team, >>>>>>>>>>>>>>>>> Can I use both real and complex versions together? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> Sam >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> What most experimenters take for granted before they begin >>>>>>>>>>>>> their experiments is infinitely more interesting than any results to which >>>>>>>>>>>>> their experiments lead. >>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>> >>>>>>>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>>>>> experiments lead. 
>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>> >>>>>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>>> experiments lead. >>>>>>>>> -- Norbert Wiener >>>>>>>>> >>>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their >>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>> experiments lead. >>>>>>> -- Norbert Wiener >>>>>>> >>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Sep 28 20:46:38 2020 From: jed at jedbrown.org (Jed Brown) Date: Mon, 28 Sep 2020 19:46:38 -0600 Subject: [petsc-users] using real and complex together In-Reply-To: References: <32DEC92E-93A5-4EE8-BF92-2468C1FE31B9@gmail.com> <394756BB-2F17-4960-8CC7-C525E19BDB1F@gmail.com> Message-ID: <87o8lp9wxt.fsf@jedbrown.org> Matthew Knepley writes: >> What do you mean? How to let mumps solve with a real matrix and a complex >> rhs? >> > > Two solves. It can be one solve with two right-hand sides, which should cost about the same as a solve with one right-hand side (because solves are memory bandwidth-limited on the matrix). From gotofd at gmail.com Mon Sep 28 21:09:43 2020 From: gotofd at gmail.com (Ji Zhang) Date: Tue, 29 Sep 2020 10:09:43 +0800 Subject: [petsc-users] how to solve stochastic differential equations using PETSc Message-ID: Dear all, I'm studying physics. Now I have a group of ordinary differential equations (kinetic equations) and I solve them using the Runge-Kutta method implied in PETSc. My question is, is there any solver or efficient method in PETSc that can solve stochastic differential equations (i.e. Langevin equation)? Thanks a lot. Best, Regards, Zhang Ji, PhD student Beijing Computational Science Research Center Zhongguancun Software Park II, No. 10 Dongbeiwang West Road, Haidian District, Beijing 100193, China -------------- next part -------------- An HTML attachment was scrubbed... URL: From olz at freenet.de Tue Sep 29 04:25:26 2020 From: olz at freenet.de (Oliver Zacharias) Date: Tue, 29 Sep 2020 11:25:26 +0200 Subject: [petsc-users] Information on SNES Message-ID: Dear Sir or Madam, regarding your SNES-Tool one can specifiy at least three types of accuracy which force the code to terminate. I would like know how you have defined (and implemented) the following quantities: a) "rtol" b) "abstol" The definition of the quantity "stol" is already described on one of your manual pages. 
Would you also be able to point me to where those definitions can be found?
Thanks in advance,
Oliver Zacharias

From Pierre.Seize at onera.fr Tue Sep 29 10:34:05 2020
From: Pierre.Seize at onera.fr (Pierre Seize)
Date: Tue, 29 Sep 2020 17:34:05 +0200
Subject: [petsc-users] Cell numbering in DMPlex
Message-ID: <97e77532-4407-bd60-0c01-1c325565ff6d@onera.fr>

Hello!

I have a parallel DMPlex, and I would like to loop on every "real" cell. It seems that the indexing is as follows:

[cStart (always 0 I think), XXX [ -> actual cells
[XXX, cStartGhost [ -> parallel cells, as I have overlap = 1
[cStartGhost, cEndGhost = cEnd [ -> my finite volume boundary cells.

I can get cStart and cEnd with DMPlexGetHeightStratum, and cStartGhost and cEndGhost with DMPlexGetGhostCellStratum. What I want is the bound XXX. Right now, I loop from cStart to cEnd, and when I find a cell that gives me DMGetLabelValue(dm, "ghost", c, &value) with a positive value I break my loop and take the current cell number as the wanted bound. I am not unsatisfied with this, but I wonder if there is a more straightforward way to get what I want.

Thank you.

Pierre

From mfadams at lbl.gov Tue Sep 29 10:58:49 2020
From: mfadams at lbl.gov (Mark Adams)
Date: Tue, 29 Sep 2020 11:58:49 -0400
Subject: [petsc-users] Cell numbering in DMPlex
In-Reply-To: <97e77532-4407-bd60-0c01-1c325565ff6d@onera.fr>
References: <97e77532-4407-bd60-0c01-1c325565ff6d@onera.fr>
Message-ID: 

I believe it is:

ierr = DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd);CHKERRQ(ierr);
ierr = DMPlexGetGhostCellStratum(dm, &cEndInterior, NULL);CHKERRQ(ierr);

On Tue, Sep 29, 2020 at 11:34 AM Pierre Seize wrote:
> Hello!
>
> I have a parallel DMPlex, and I would like to loop on every "real" cell.
> It seems that the indexing is as follows:
>
> [cStart (always 0 I think), XXX [ -> actual cells
>
> [XXX, cStartGhost [ -> parallel cells, as I have overlap = 1
>
> [cStartGhost, cEndGhost = cEnd [ -> my finite volume boundary cells.
>
> I can get cStart and cEnd with DMPlexGetHeightStratum, and cStartGhost
> and cEndGhost with DMPlexGetGhostCellStratum.
>
> What I want is the bound XXX. Right now, I loop from cStart to cEnd,
> and when I find a cell that gives me DMGetLabelValue(dm, "ghost", c,
> &value) with a positive value I break my loop and take the current cell
> number as the wanted bound. I am not unsatisfied with this, but I wonder
> if there is a more straightforward way to get what I want.
>
> Thank you.
>
>
> Pierre
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From pierre.seize at onera.fr Tue Sep 29 12:19:45 2020
From: pierre.seize at onera.fr (Pierre Seize)
Date: Tue, 29 Sep 2020 19:19:45 +0200
Subject: [petsc-users] Cell numbering in DMPlex
In-Reply-To: 
References: <97e77532-4407-bd60-0c01-1c325565ff6d@onera.fr>
Message-ID: <2376e2abb551aad1d9cde497a1a87329@onera.fr>

Thank you for the answer. As I said, DMPlexGetGhostCellStratum gives me the bounds of the domain boundary cells (actual domain boundaries), that's what I called cStartGhost in my first message. What I want is the bounds of the actual cells, or the bounds of the overlapping cells. Maybe this will help clarify my question: if I have a 1D mesh with two cells I have this numbering on two procs with overlap = 1:

[0] 2-|-0-|-1-|
[1] |-1-|-0-|-2

The first proc has one "real" cell (0), one "overlap" cell (1) and one boundary cell (2). Here, DMPlexGetGhostCellStratum would give me cEndInterior = 2, where I want 1.
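(A minimal illustrative sketch, not taken from this thread: one way to write the loop being asked for, combining the calls Mark gives above with the point-SF ownership test Matt Knepley suggests further down. The helper name LoopOwnedCells is made up for the example, and it assumes the SF leaf array ilocal is non-NULL and sorted, which PetscFindInt requires.)

#include <petscdmplex.h>

/* Sketch: visit cells that are neither FV boundary ghosts nor overlap cells
   owned by another rank. Assumes ilocal is non-NULL and sorted. */
static PetscErrorCode LoopOwnedCells(DM dm)
{
  PetscSF         sf;
  const PetscInt *ilocal;
  PetscInt        cStart, cEnd, cEndInterior, nleaves, c, idx;
  PetscErrorCode  ierr;

  PetscFunctionBeginUser;
  ierr = DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd);CHKERRQ(ierr);      /* all height-0 points       */
  ierr = DMPlexGetGhostCellStratum(dm, &cEndInterior, NULL);CHKERRQ(ierr); /* FV ghost cells start here */
  if (cEndInterior < 0) cEndInterior = cEnd;                               /* no FV ghost cells present */
  ierr = DMGetPointSF(dm, &sf);CHKERRQ(ierr);
  ierr = PetscSFGetGraph(sf, NULL, &nleaves, &ilocal, NULL);CHKERRQ(ierr);
  for (c = cStart; c < cEndInterior; ++c) {
    if (nleaves > 0 && ilocal) {
      ierr = PetscFindInt(c, nleaves, ilocal, &idx);CHKERRQ(ierr);
      if (idx >= 0) continue; /* leaf of the point SF: overlap cell owned by another rank */
    }
    /* ... work on the locally owned cell c ... */
  }
  PetscFunctionReturn(0);
}

(With overlap = 1 this visits only the locally owned cells, which is what the bound XXX was meant to delimit; the owned cells need not be contiguous in general, which is why an SF check is used here instead of a single bound.)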
Pierre Le 2020-09-29 17:58, Mark Adams a ?crit : > I believe it is: > > ierr = DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd);CHKERRQ(ierr); > ierr = DMPlexGetGhostCellStratum(dm, &cEndInterior, > NULL);CHKERRQ(ierr); > > On Tue, Sep 29, 2020 at 11:34 AM Pierre Seize > wrote: > >> Hello! >> >> I have a parallel DMPlex, and I would like to loop on every "real" >> cell. >> It seems that the indexing is as such: >> >> [cStart (always 0 I think), XXX [ -> actual cell >> >> [XXX, cStartGhost [ -> parallel cells, as I have overlap = 1 >> >> [cStartGhost, cEndGhost= cEnd[ -> my finite volume boundaries cells. >> >> I can get cStart and cEnd with DMPlexGetHeightStratum, and cStartGhost >> and cEndGhost with DMPlexGetGhostCellStratum. >> >> What I want is the bound XXX. Right now, I do loop from cStart to >> cEnd, >> and when I find a cell that gives me DMGetLabelValue(dm, "ghost", c, >> &value) with a positive value I break my loop and take the current >> cell >> number as the wanted bound. I am not unsatisfied with this but I >> wonder >> if there is a more straightforward way to get what I want. >> >> Thank you. >> >> Pierre -- Pierre Seize From knepley at gmail.com Tue Sep 29 12:41:07 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 29 Sep 2020 13:41:07 -0400 Subject: [petsc-users] Information on SNES In-Reply-To: References: Message-ID: On Tue, Sep 29, 2020 at 9:21 AM Oliver Zacharias wrote: > Dear Sir or Madam, > > regarding your SNES-Tool one can specifiy at least three types of > accuracy which force the code to terminate. I would like know how you > have defined (and implemented) the following quantities: > > a) "rtol" > This means a relative tolerance, or bound on ||F(x_n)||/||F(x_0)|| > b) "abstol" > This mean an absolute tolerance, or a bound on ||F(x_n)|| Thanks, Matt > The definition of the quantity "stol" is already described on one of > your manual pages. Would you also like to present a cite where to find > those definitions? > Thanks in advance, > Oliver Zacharias > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Sep 29 12:44:04 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 29 Sep 2020 13:44:04 -0400 Subject: [petsc-users] how to solve stochastic differential equations using PETSc In-Reply-To: References: Message-ID: On Mon, Sep 28, 2020 at 10:10 PM Ji Zhang wrote: > Dear all, > > I'm studying physics. Now I have a group of ordinary differential > equations (kinetic equations) and I solve them using the Runge-Kutta method > implied in PETSc. > > My question is, is there any solver or efficient method in PETSc that can > solve stochastic differential equations (i.e. Langevin equation)? > We do not currently have anything that will integrate stochastic differential equations directly. I think we have pieces you would need to build the method. Thanks, Matt > Thanks a lot. > > Best, > Regards, > Zhang Ji, PhD student > Beijing Computational Science Research Center > Zhongguancun Software Park II, No. 10 Dongbeiwang West Road, Haidian > District, Beijing 100193, China > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Sep 29 12:56:14 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 29 Sep 2020 13:56:14 -0400 Subject: [petsc-users] Cell numbering in DMPlex In-Reply-To: <97e77532-4407-bd60-0c01-1c325565ff6d@onera.fr> References: <97e77532-4407-bd60-0c01-1c325565ff6d@onera.fr> Message-ID: On Tue, Sep 29, 2020 at 11:34 AM Pierre Seize wrote: > Hello! > > I have a parallel DMPlex, and I would like to loop on every "real" cell. > It seems that the indexing is as such: > > [cStart (always 0 I think), XXX [ -> actual cell > > [XXX, cStartGhost [ -> parallel cells, as I have overlap = 1 > > [cStartGhost, cEndGhost= cEnd[ -> my finite volume boundaries cells. > > I can get cStart and cEnd with DMPlexGetHeightStratum, and cStartGhost > and cEndGhost with DMPlexGetGhostCellStratum. > > What I want is the bound XXX. Right now, I do loop from cStart to cEnd, > and when I find a cell that gives me DMGetLabelValue(dm, "ghost", c, > &value) with a positive value I break my loop and take the current cell > number as the wanted bound. I am not unsatisfied with this but I wonder > if there is a more straightforward way to get what I want. > The documentation needs to be improved I see. Okay, first [cStart, cEnd) is the range for all cells, meaning codimension 0 mesh points [cStartGhost, cEndGhost) is the range for FV ghost cells. These are _not_ parallel ghosts. They are phantom cells outside boundary faces, used to enforce boundary conditions. Thus, [cStart, cStartGhost) are the cells in the local Plex However, if you have nonzero overlap, then cells can be shared. Cells that are not owned by this process are listed in the pointSF, DMGetPointSF(dm, &sf) and you can check for them using PetscSFGetGraph(sf, &nroots, &nleaves, &leaves, &remoteLeaves); PetscFindInt(cell, nleaves, leaves, &idx); if (idx >= 0) { } You could, of course, put the leaves in a DMLabel if you think it is easier than the SF check. Thanks, Matt > Thank you. > > > Pierre > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Sep 29 13:04:05 2020 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 29 Sep 2020 13:04:05 -0500 Subject: [petsc-users] Information on SNES In-Reply-To: References: Message-ID: <8C0A08CB-059D-42E7-9A58-2DD06EBD7ECA@petsc.dev> It is documented in the manual page for SNESConvergedDefault > On Sep 29, 2020, at 12:41 PM, Matthew Knepley wrote: > > On Tue, Sep 29, 2020 at 9:21 AM Oliver Zacharias > wrote: > Dear Sir or Madam, > > regarding your SNES-Tool one can specifiy at least three types of > accuracy which force the code to terminate. I would like know how you > have defined (and implemented) the following quantities: > > a) "rtol" > > This means a relative tolerance, or bound on ||F(x_n)||/||F(x_0)|| > > b) "abstol" > > This mean an absolute tolerance, or a bound on ||F(x_n)|| > > Thanks, > > Matt > > The definition of the quantity "stol" is already described on one of > your manual pages. Would you also like to present a cite where to find > those definitions? 
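(For reference, a small illustrative sketch, not from this thread, of where these tolerances are set in user code; the helper name ConfigureStopping and the values are made up. With the default convergence test the iteration stops when ||F(x_n)|| < abstol, when ||F(x_n)|| < rtol*||F(x_0)||, or when the Newton update satisfies ||dx|| < stol*||x||, whichever comes first; the command-line options -snes_atol, -snes_rtol and -snes_stol set the same quantities.)

#include <petscsnes.h>

/* Sketch: set the SNES stopping tolerances discussed above. */
static PetscErrorCode ConfigureStopping(SNES snes)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  /*                             abstol  rtol  stol  maxit  maxf  */
  ierr = SNESSetTolerances(snes, 1e-50,  1e-8, 1e-8, 50,    10000);CHKERRQ(ierr);
  ierr = SNESSetFromOptions(snes);CHKERRQ(ierr); /* lets -snes_atol/-snes_rtol/-snes_stol override */
  PetscFunctionReturn(0);
}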
> Thanks in advance, > Oliver Zacharias > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bui at calcreek.com Tue Sep 29 15:50:35 2020 From: bui at calcreek.com (Thuc Bui) Date: Tue, 29 Sep 2020 13:50:35 -0700 Subject: [petsc-users] Failed to build Petsc statically linked to MKL for Windows Message-ID: <002e01d696a2$2eaaeb70$8c00c250$@calcreek.com> Dear Petsc Users, I have successfully built Petsc v3.13.5 for Windows 10 VS2015 without Fortran compiler and without MPI. These successful builds are either linked statically with Blas-Lapack or dynamically with Intel MKL. However, when linking dynamically to MKL, I would have to include 4 more Intel DLLs for building applications. I would like to have Petsc linking statically to MKL to avoid having these extra DLLs, but have not been successful. Here is how I run configure for static link ./configure --with-cc='win32fe cl' --with-cxx='win32fe cl' --with-fc=0 --with-openmp --with-debugging=0 --with-mpi=0 --with-blas-lapack-lib="-Wl,--start-group /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_lp64.li b /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.lib /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread. lib -Wl,--end-group" I got this configuration error, ============================================================================ === TESTING: checkLib from config.packages.BlasLapack(config/BuildSystem/config/pack******************* ************************************************************ UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): ---------------------------------------------------------------------------- --- You set a value for --with-blaslapack-lib=, but ['-Wl,--start-group', '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_lp64.l ib', '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.lib', '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread .lib', '-Wl,--end-group'] cannot be used **************************************************************************** *** Can someone point me to a proper syntax to link to MKL statically? Attached is the configure.log if you need it. I have also tried unsuccessfuly, --with-blas-lapack-lib="/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/m kl/lib/mkl_intel_lp64.lib /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.lib /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread. lib" or --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows /mkl/lib -lmkl_intel_lp64.lib -lmkl_core.lib -lmkl_intel_thread.lib", or --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows /mkl/lib mkl_intel_lp64.lib mkl_core.lib mkl_intel_thread.lib" and all gave the similar error message. Many thanks in advance for your help, Thuc Bui Senior R&D Engineer Calabazas Creek Research, Inc. (650) 948-5361 (Office) (650) 948-7562 (Fax) -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: configure.log Type: application/octet-stream Size: 3744180 bytes Desc: not available URL: From bsmith at petsc.dev Tue Sep 29 16:33:21 2020 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 29 Sep 2020 16:33:21 -0500 Subject: [petsc-users] Failed to build Petsc statically linked to MKL for Windows In-Reply-To: <002e01d696a2$2eaaeb70$8c00c250$@calcreek.com> References: <002e01d696a2$2eaaeb70$8c00c250$@calcreek.com> Message-ID: <6A966758-58B5-4526-A59A-10B9E6AA53D5@petsc.dev> > On Sep 29, 2020, at 3:50 PM, Thuc Bui wrote: > > Dear Petsc Users, > > I have successfully built Petsc v3.13.5 for Windows 10 VS2015 without Fortran compiler and without MPI. These successful builds are either linked statically with Blas-Lapack or dynamically with Intel MKL. However, when linking dynamically to MKL, I would have to include 4 more Intel DLLs for building applications. I would like to have Petsc linking statically to MKL to avoid having these extra DLLs, but have not been successful. Here is how I run configure for static link > > ./configure --with-cc='win32fe cl' --with-cxx='win32fe cl' --with-fc=0 --with-openmp --with-debugging=0 --with-mpi=0 --with-blas-lapack-lib="-Wl,--start-group /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_lp64.lib /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.lib /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread.lib -Wl,--end-group" > > I got this configuration error, > > =============================================================================== TESTING: checkLib from config.packages.BlasLapack(config/BuildSystem/config/pack******************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): > ------------------------------------------------------------------------------- > You set a value for --with-blaslapack-lib=, but ['-Wl,--start-group', '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_lp64.lib', '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.lib', '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread.lib', '-Wl,--end-group'] cannot be used > ******************************************************************************* > > Can someone point me to a proper syntax to link to MKL statically? Attached is the configure.log if you need it. > > I have also tried unsuccessfuly, > --with-blas-lapack-lib="/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_lp64.lib /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.lib /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread.lib" Please send configure.log for the above attempt. Barry > or > --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib -lmkl_intel_lp64.lib -lmkl_core.lib -lmkl_intel_thread.lib", > or > --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib mkl_intel_lp64.lib mkl_core.lib mkl_intel_thread.lib" > and all gave the similar error message. > > Many thanks in advance for your help, > Thuc Bui > Senior R&D Engineer > Calabazas Creek Research, Inc. > (650) 948-5361 (Office) > (650) 948-7562 (Fax) > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bui at calcreek.com Tue Sep 29 17:39:16 2020 From: bui at calcreek.com (Thuc Bui) Date: Tue, 29 Sep 2020 15:39:16 -0700 Subject: [petsc-users] Failed to build Petsc statically linked to MKL for Windows In-Reply-To: <6A966758-58B5-4526-A59A-10B9E6AA53D5@petsc.dev> References: <002e01d696a2$2eaaeb70$8c00c250$@calcreek.com> <6A966758-58B5-4526-A59A-10B9E6AA53D5@petsc.dev> Message-ID: <001201d696b1$5d21c3c0$17654b40$@calcreek.com> Dear Barry, Thank you very much for looking into this. I really appreciate it. Attached is the configure log generated by the first configuration listed below. Best regards, Thuc From: Barry Smith [mailto:bsmith at petsc.dev] Sent: Tuesday, September 29, 2020 2:33 PM To: Thuc Bui Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Failed to build Petsc statically linked to MKL for Windows On Sep 29, 2020, at 3:50 PM, Thuc Bui wrote: Dear Petsc Users, I have successfully built Petsc v3.13.5 for Windows 10 VS2015 without Fortran compiler and without MPI. These successful builds are either linked statically with Blas-Lapack or dynamically with Intel MKL. However, when linking dynamically to MKL, I would have to include 4 more Intel DLLs for building applications. I would like to have Petsc linking statically to MKL to avoid having these extra DLLs, but have not been successful. Here is how I run configure for static link ./configure --with-cc='win32fe cl' --with-cxx='win32fe cl' --with-fc=0 --with-openmp --with-debugging=0 --with-mpi=0 --with-blas-lapack-lib="-Wl,--start-group /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_lp64.li b /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.lib /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread. lib -Wl,--end-group" I got this configuration error, ============================================================================ === TESTING: checkLib from config.packages.BlasLapack(config/BuildSystem/config/pack******************* ************************************************************ UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): ---------------------------------------------------------------------------- --- You set a value for --with-blaslapack-lib=, but ['-Wl,--start-group', '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_lp64.l ib', '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.lib', '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread .lib', '-Wl,--end-group'] cannot be used **************************************************************************** *** Can someone point me to a proper syntax to link to MKL statically? Attached is the configure.log if you need it. I have also tried unsuccessfuly, --with-blas-lapack-lib="/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/m kl/lib/mkl_intel_lp64.lib /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.lib /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread. lib" Please send configure.log for the above attempt. Barry or --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows /mkl/lib -lmkl_intel_lp64.lib -lmkl_core.lib -lmkl_intel_thread.lib", or --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows /mkl/lib mkl_intel_lp64.lib mkl_core.lib mkl_intel_thread.lib" and all gave the similar error message. Many thanks in advance for your help, Thuc Bui Senior R&D Engineer Calabazas Creek Research, Inc. 
(650) 948-5361 (Office) (650) 948-7562 (Fax) -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 3744180 bytes Desc: not available URL: From bsmith at petsc.dev Tue Sep 29 18:01:17 2020 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 29 Sep 2020 18:01:17 -0500 Subject: [petsc-users] Failed to build Petsc statically linked to MKL for Windows In-Reply-To: <001201d696b1$5d21c3c0$17654b40$@calcreek.com> References: <002e01d696a2$2eaaeb70$8c00c250$@calcreek.com> <6A966758-58B5-4526-A59A-10B9E6AA53D5@petsc.dev> <001201d696b1$5d21c3c0$17654b40$@calcreek.com> Message-ID: <2A59933A-BE32-4008-87E9-B40286B4B13C@petsc.dev> Sorry, it appears that configure.log file is for a build with -Wl,--start-group in the listing. Please send the configure.log for the case were you just listed the libraries with the -Wl stuff. Barry > On Sep 29, 2020, at 5:39 PM, Thuc Bui wrote: > > Dear Barry, > > Thank you very much for looking into this. I really appreciate it. Attached is the configure log generated by the first configuration listed below. > > Best regards, > Thuc > > From: Barry Smith [mailto:bsmith at petsc.dev ] > Sent: Tuesday, September 29, 2020 2:33 PM > To: Thuc Bui > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Failed to build Petsc statically linked to MKL for Windows > > > > > On Sep 29, 2020, at 3:50 PM, Thuc Bui > wrote: > > Dear Petsc Users, > > I have successfully built Petsc v3.13.5 for Windows 10 VS2015 without Fortran compiler and without MPI. These successful builds are either linked statically with Blas-Lapack or dynamically with Intel MKL. However, when linking dynamically to MKL, I would have to include 4 more Intel DLLs for building applications. I would like to have Petsc linking statically to MKL to avoid having these extra DLLs, but have not been successful. Here is how I run configure for static link > > ./configure --with-cc='win32fe cl' --with-cxx='win32fe cl' --with-fc=0 --with-openmp --with-debugging=0 --with-mpi=0 --with-blas-lapack-lib="-Wl,--start-group /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_lp64.lib /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.lib /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread.lib -Wl,--end-group" > > I got this configuration error, > > =============================================================================== TESTING: checkLib from config.packages.BlasLapack(config/BuildSystem/config/pack******************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): > ------------------------------------------------------------------------------- > You set a value for --with-blaslapack-lib=, but ['-Wl,--start-group', '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_lp64.lib', '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.lib', '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread.lib', '-Wl,--end-group'] cannot be used > ******************************************************************************* > > Can someone point me to a proper syntax to link to MKL statically? Attached is the configure.log if you need it. 
> > I have also tried unsuccessfuly, > --with-blas-lapack-lib="/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_lp64.lib /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.lib /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread.lib" > > Please send configure.log for the above attempt. > > Barry > > > or > --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib -lmkl_intel_lp64.lib -lmkl_core.lib -lmkl_intel_thread.lib", > or > --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib mkl_intel_lp64.lib mkl_core.lib mkl_intel_thread.lib" > and all gave the similar error message. > > Many thanks in advance for your help, > Thuc Bui > Senior R&D Engineer > Calabazas Creek Research, Inc. > (650) 948-5361 (Office) > (650) 948-7562 (Fax) > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Tue Sep 29 18:12:00 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 29 Sep 2020 18:12:00 -0500 (CDT) Subject: [petsc-users] Failed to build Petsc statically linked to MKL for Windows In-Reply-To: <002e01d696a2$2eaaeb70$8c00c250$@calcreek.com> References: <002e01d696a2$2eaaeb70$8c00c250$@calcreek.com> Message-ID: https://software.intel.com/content/www/us/en/develop/articles/intel-mkl-link-line-advisor.html Checking for MS compilers - [MKL2018+] it gives: mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib i.e no -Wl,--start-group [MS compiler most likely does not use this option] So you can try: --with-blaslapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/ mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib" Satish On Tue, 29 Sep 2020, Thuc Bui wrote: > Dear Petsc Users, > > > > I have successfully built Petsc v3.13.5 for Windows 10 VS2015 without > Fortran compiler and without MPI. These successful builds are either linked > statically with Blas-Lapack or dynamically with Intel MKL. However, when > linking dynamically to MKL, I would have to include 4 more Intel DLLs for > building applications. I would like to have Petsc linking statically to MKL > to avoid having these extra DLLs, but have not been successful. Here is how > I run configure for static link > > > > ./configure --with-cc='win32fe cl' --with-cxx='win32fe cl' --with-fc=0 > --with-openmp --with-debugging=0 --with-mpi=0 > --with-blas-lapack-lib="-Wl,--start-group > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_lp64.li > b /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.lib > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread. 
> lib -Wl,--end-group" > > > > I got this configuration error, > > > > ============================================================================ > === TESTING: checkLib from > config.packages.BlasLapack(config/BuildSystem/config/pack******************* > ************************************************************ > > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for > details): > > ---------------------------------------------------------------------------- > --- > > You set a value for --with-blaslapack-lib=, but ['-Wl,--start-group', > '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_lp64.l > ib', > '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.lib', > '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread > .lib', '-Wl,--end-group'] cannot be used > > **************************************************************************** > *** > > > > Can someone point me to a proper syntax to link to MKL statically? Attached > is the configure.log if you need it. > > > > I have also tried unsuccessfuly, > > --with-blas-lapack-lib="/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/m > kl/lib/mkl_intel_lp64.lib > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.lib > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread. > lib" > > or > > --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows > /mkl/lib -lmkl_intel_lp64.lib -lmkl_core.lib -lmkl_intel_thread.lib", > > or > > --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows > /mkl/lib mkl_intel_lp64.lib mkl_core.lib mkl_intel_thread.lib" > > and all gave the similar error message. > > > > Many thanks in advance for your help, > > Thuc Bui > > Senior R&D Engineer > > Calabazas Creek Research, Inc. > > (650) 948-5361 (Office) > > (650) 948-7562 (Fax) > > > > > > From bui at calcreek.com Tue Sep 29 19:16:31 2020 From: bui at calcreek.com (Thuc Bui) Date: Tue, 29 Sep 2020 17:16:31 -0700 Subject: [petsc-users] Failed to build Petsc statically linked to MKL for Windows In-Reply-To: References: <002e01d696a2$2eaaeb70$8c00c250$@calcreek.com> Message-ID: <001e01d696be$f37cbfc0$da763f40$@calcreek.com> Hi Barry & Satish, I really appreciate you are getting back to me. Because libiomp5md.lib is not in the mkl/lib directory, I just copy it to the same directory and per your suggestion use the followings ./configure --with-cc='win32fe cl' --with-cxx='win32fe cl' --with-fc=0 --with-openmp --with-debugging=0 --with-mpi=0 --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows /mkl/lib mkl_intel_lp64.lib mkl_core.lib mkl_intel_thread.lib libiomp5md.lib" Unfortunately, I got the same error message as below. I am attaching configure log, and hopefully you can see what I am doing icorrectly. 
Many thanks, Thuc ---------------------------------------------------------------------------- --- You set a value for --with-blaslapack-lib=, but ['-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib', 'mkl_intel_lp64.lib', 'mkl_core.lib', 'mkl_intel_thread.lib', 'libiomp5md.lib'] cannot be used **************************************************************************** *** -----Original Message----- From: Satish Balay [mailto:balay at mcs.anl.gov] Sent: Tuesday, September 29, 2020 4:12 PM To: Thuc Bui Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Failed to build Petsc statically linked to MKL for Windows https://software.intel.com/content/www/us/en/develop/articles/intel-mkl-link -line-advisor.html Checking for MS compilers - [MKL2018+] it gives: mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib i.e no -Wl,--start-group [MS compiler most likely does not use this option] So you can try: --with-blaslapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/ mkl/lib/ mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib" Satish On Tue, 29 Sep 2020, Thuc Bui wrote: > Dear Petsc Users, > > > > I have successfully built Petsc v3.13.5 for Windows 10 VS2015 without > Fortran compiler and without MPI. These successful builds are either linked > statically with Blas-Lapack or dynamically with Intel MKL. However, when > linking dynamically to MKL, I would have to include 4 more Intel DLLs for > building applications. I would like to have Petsc linking statically to MKL > to avoid having these extra DLLs, but have not been successful. Here is how > I run configure for static link > > > > ./configure --with-cc='win32fe cl' --with-cxx='win32fe cl' --with-fc=0 > --with-openmp --with-debugging=0 --with-mpi=0 > --with-blas-lapack-lib="-Wl,--start-group > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_lp64.li > b /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.lib > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread. > lib -Wl,--end-group" > > > > I got this configuration error, > > > > ============================================================================ > === TESTING: checkLib from > config.packages.BlasLapack(config/BuildSystem/config/pack******************* > ************************************************************ > > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for > details): > > ---------------------------------------------------------------------------- > --- > > You set a value for --with-blaslapack-lib=, but ['-Wl,--start-group', > '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_lp64.l > ib', > '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.lib', > '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread > .lib', '-Wl,--end-group'] cannot be used > > **************************************************************************** > *** > > > > Can someone point me to a proper syntax to link to MKL statically? Attached > is the configure.log if you need it. > > > > I have also tried unsuccessfuly, > > --with-blas-lapack-lib="/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/m > kl/lib/mkl_intel_lp64.lib > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.lib > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread. 
> lib" > > or > > --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows > /mkl/lib -lmkl_intel_lp64.lib -lmkl_core.lib -lmkl_intel_thread.lib", > > or > > --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows > /mkl/lib mkl_intel_lp64.lib mkl_core.lib mkl_intel_thread.lib" > > and all gave the similar error message. > > > > Many thanks in advance for your help, > > Thuc Bui > > Senior R&D Engineer > > Calabazas Creek Research, Inc. > > (650) 948-5361 (Office) > > (650) 948-7562 (Fax) > > > > > > -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 3744039 bytes Desc: not available URL: From balay at mcs.anl.gov Tue Sep 29 19:27:11 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 29 Sep 2020 19:27:11 -0500 (CDT) Subject: [petsc-users] Failed to build Petsc statically linked to MKL for Windows In-Reply-To: <001e01d696be$f37cbfc0$da763f40$@calcreek.com> References: <002e01d696a2$2eaaeb70$8c00c250$@calcreek.com> <001e01d696be$f37cbfc0$da763f40$@calcreek.com> Message-ID: Likely the libraries are in /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/intel64 and not /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib Please verify: ls /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/intel64 Then you would need: --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/intel64 mkl_intel_lp64.lib mkl_core.lib mkl_intel_thread.lib libiomp5md.lib" Satish On Tue, 29 Sep 2020, Thuc Bui wrote: > Hi Barry & Satish, > > I really appreciate you are getting back to me. Because libiomp5md.lib is > not in the mkl/lib directory, I just copy it to the same directory and per > your suggestion use the followings > > ./configure --with-cc='win32fe cl' --with-cxx='win32fe cl' --with-fc=0 > --with-openmp --with-debugging=0 --with-mpi=0 > --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows > /mkl/lib mkl_intel_lp64.lib mkl_core.lib mkl_intel_thread.lib > libiomp5md.lib" > > Unfortunately, I got the same error message as below. I am attaching > configure log, and hopefully you can see what I am doing icorrectly. 
> > Many thanks, > Thuc > > ---------------------------------------------------------------------------- > --- > You set a value for --with-blaslapack-lib=, but > ['-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib', > 'mkl_intel_lp64.lib', 'mkl_core.lib', 'mkl_intel_thread.lib', > 'libiomp5md.lib'] cannot be used > **************************************************************************** > *** > > -----Original Message----- > From: Satish Balay [mailto:balay at mcs.anl.gov] > Sent: Tuesday, September 29, 2020 4:12 PM > To: Thuc Bui > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Failed to build Petsc statically linked to MKL > for Windows > > https://software.intel.com/content/www/us/en/develop/articles/intel-mkl-link > -line-advisor.html > > Checking for MS compilers - [MKL2018+] it gives: > > mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib > > i.e no -Wl,--start-group [MS compiler most likely does not use this option] > > So you can try: > > --with-blaslapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/ > mkl/lib/ mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib > libiomp5md.lib" > > Satish > > > On Tue, 29 Sep 2020, Thuc Bui wrote: > > > Dear Petsc Users, > > > > > > > > I have successfully built Petsc v3.13.5 for Windows 10 VS2015 without > > Fortran compiler and without MPI. These successful builds are either > linked > > statically with Blas-Lapack or dynamically with Intel MKL. However, when > > linking dynamically to MKL, I would have to include 4 more Intel DLLs for > > building applications. I would like to have Petsc linking statically to > MKL > > to avoid having these extra DLLs, but have not been successful. Here is > how > > I run configure for static link > > > > > > > > ./configure --with-cc='win32fe cl' --with-cxx='win32fe cl' --with-fc=0 > > --with-openmp --with-debugging=0 --with-mpi=0 > > --with-blas-lapack-lib="-Wl,--start-group > > > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_lp64.li > > b /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.lib > > > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread. > > lib -Wl,--end-group" > > > > > > > > I got this configuration error, > > > > > > > > > ============================================================================ > > === TESTING: checkLib from > > > config.packages.BlasLapack(config/BuildSystem/config/pack******************* > > ************************************************************ > > > > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for > > details): > > > > > ---------------------------------------------------------------------------- > > --- > > > > You set a value for --with-blaslapack-lib=, but ['-Wl,--start-group', > > > '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_lp64.l > > ib', > > '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.lib', > > > '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread > > .lib', '-Wl,--end-group'] cannot be used > > > > > **************************************************************************** > > *** > > > > > > > > Can someone point me to a proper syntax to link to MKL statically? > Attached > > is the configure.log if you need it. 
> > > > > > > > I have also tried unsuccessfuly, > > > > > --with-blas-lapack-lib="/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/m > > kl/lib/mkl_intel_lp64.lib > > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.lib > > > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread. > > lib" > > > > or > > > > > --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows > > /mkl/lib -lmkl_intel_lp64.lib -lmkl_core.lib -lmkl_intel_thread.lib", > > > > or > > > > > --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows > > /mkl/lib mkl_intel_lp64.lib mkl_core.lib mkl_intel_thread.lib" > > > > and all gave the similar error message. > > > > > > > > Many thanks in advance for your help, > > > > Thuc Bui > > > > Senior R&D Engineer > > > > Calabazas Creek Research, Inc. > > > > (650) 948-5361 (Office) > > > > (650) 948-7562 (Fax) > > > > > > > > > > > > > From bui at calcreek.com Tue Sep 29 19:29:36 2020 From: bui at calcreek.com (Thuc Bui) Date: Tue, 29 Sep 2020 17:29:36 -0700 Subject: [petsc-users] Failed to build Petsc statically linked to MKL for Windows References: <002e01d696a2$2eaaeb70$8c00c250$@calcreek.com> Message-ID: <002501d696c0$c74d56b0$55e80410$@calcreek.com> Hi Barry & Satish, I also did a configuration to have a full path for each library as follows ./configure --with-cc='win32fe cl' --with-cxx='win32fe cl' --with-fc=0 --with-openmp --with-debugging=0 --with-mpi=0 --with-blas-lapack-lib="/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/m kl/lib/mkl_intel_lp64.lib /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.lib /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread. lib /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/libiomp5md.lib" and also got the same error message. Attached is the configure.log. UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): ---------------------------------------------------------------------------- --- You set a value for --with-blaslapack-lib=, but ['/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_lp64. lib', '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.lib', '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread .lib', '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/libiomp5md.lib'] cannot be used **************************************************************************** *** Thanks, Thuc -----Original Message----- From: Thuc Bui [mailto:bui at calcreek.com] Sent: Tuesday, September 29, 2020 5:17 PM To: petsc-users Cc: Barry Smith Subject: RE: [petsc-users] Failed to build Petsc statically linked to MKL for Windows Hi Barry & Satish, I really appreciate you are getting back to me. Because libiomp5md.lib is not in the mkl/lib directory, I just copy it to the same directory and per your suggestion use the followings ./configure --with-cc='win32fe cl' --with-cxx='win32fe cl' --with-fc=0 --with-openmp --with-debugging=0 --with-mpi=0 --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows /mkl/lib mkl_intel_lp64.lib mkl_core.lib mkl_intel_thread.lib libiomp5md.lib" Unfortunately, I got the same error message as below. I am attaching configure log, and hopefully you can see what I am doing icorrectly. 
Many thanks, Thuc ---------------------------------------------------------------------------- --- You set a value for --with-blaslapack-lib=, but ['-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib', 'mkl_intel_lp64.lib', 'mkl_core.lib', 'mkl_intel_thread.lib', 'libiomp5md.lib'] cannot be used **************************************************************************** *** -----Original Message----- From: Satish Balay [mailto:balay at mcs.anl.gov] Sent: Tuesday, September 29, 2020 4:12 PM To: Thuc Bui Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Failed to build Petsc statically linked to MKL for Windows https://software.intel.com/content/www/us/en/develop/articles/intel-mkl-link -line-advisor.html Checking for MS compilers - [MKL2018+] it gives: mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib i.e no -Wl,--start-group [MS compiler most likely does not use this option] So you can try: --with-blaslapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/ mkl/lib/ mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib" Satish On Tue, 29 Sep 2020, Thuc Bui wrote: > Dear Petsc Users, > > > > I have successfully built Petsc v3.13.5 for Windows 10 VS2015 without > Fortran compiler and without MPI. These successful builds are either > linked statically with Blas-Lapack or dynamically with Intel MKL. > However, when linking dynamically to MKL, I would have to include 4 > more Intel DLLs for building applications. I would like to have Petsc > linking statically to MKL to avoid having these extra DLLs, but have > not been successful. Here is how I run configure for static link > > > > ./configure --with-cc='win32fe cl' --with-cxx='win32fe cl' --with-fc=0 > --with-openmp --with-debugging=0 --with-mpi=0 > --with-blas-lapack-lib="-Wl,--start-group > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_l > p64.li b > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.li > b > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread. > lib -Wl,--end-group" > > > > I got this configuration error, > > > > ====================================================================== > ====== > === TESTING: checkLib from > config.packages.BlasLapack(config/BuildSystem/config/pack************* > ****** > ************************************************************ > > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for > details): > > ---------------------------------------------------------------------- > ------ > --- > > You set a value for --with-blaslapack-lib=, but > ['-Wl,--start-group', > '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_ > lp64.l > ib', > '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.l > ib', > '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_ > thread .lib', '-Wl,--end-group'] cannot be used > > ********************************************************************** > ****** > *** > > > > Can someone point me to a proper syntax to link to MKL statically? > Attached is the configure.log if you need it. > > > > I have also tried unsuccessfuly, > > --with-blas-lapack-lib="/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/win > dows/m > kl/lib/mkl_intel_lp64.lib > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.li > b > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread. 
> lib" > > or > > --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/w > indows /mkl/lib -lmkl_intel_lp64.lib -lmkl_core.lib > -lmkl_intel_thread.lib", > > or > > --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/w > indows /mkl/lib mkl_intel_lp64.lib mkl_core.lib mkl_intel_thread.lib" > > and all gave the similar error message. > > > > Many thanks in advance for your help, > > Thuc Bui > > Senior R&D Engineer > > Calabazas Creek Research, Inc. > > (650) 948-5361 (Office) > > (650) 948-7562 (Fax) > > > > > > -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 3748700 bytes Desc: not available URL: From balay at mcs.anl.gov Tue Sep 29 19:38:30 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 29 Sep 2020 19:38:30 -0500 (CDT) Subject: [petsc-users] Failed to build Petsc statically linked to MKL for Windows In-Reply-To: <002501d696c0$c74d56b0$55e80410$@calcreek.com> References: <002e01d696a2$2eaaeb70$8c00c250$@calcreek.com> <002501d696c0$c74d56b0$55e80410$@calcreek.com> Message-ID: > LINK : fatal error LNK1181: cannot open input file 'C:\PROGRA~2\INTELS~1\COMPIL~1.279\windows\mkl\lib\mkl_intel_lp64.lib' please verify if the file exists. What do you have for: ls /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_lp64.lib Satish On Tue, 29 Sep 2020, Thuc Bui wrote: > Hi Barry & Satish, > > I also did a configuration to have a full path for each library as follows > > ./configure --with-cc='win32fe cl' --with-cxx='win32fe cl' --with-fc=0 > --with-openmp --with-debugging=0 --with-mpi=0 > --with-blas-lapack-lib="/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/m > kl/lib/mkl_intel_lp64.lib > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.lib > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread. > lib > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/libiomp5md.lib" > > and also got the same error message. Attached is the configure.log. > > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for > details): > ---------------------------------------------------------------------------- > --- > You set a value for --with-blaslapack-lib=, but > ['/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_lp64. > lib', > '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.lib', > '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread > .lib', > '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/libiomp5md.lib'] > cannot be used > **************************************************************************** > *** > > Thanks, > Thuc > > -----Original Message----- > From: Thuc Bui [mailto:bui at calcreek.com] > Sent: Tuesday, September 29, 2020 5:17 PM > To: petsc-users > Cc: Barry Smith > Subject: RE: [petsc-users] Failed to build Petsc statically linked to MKL > for Windows > > Hi Barry & Satish, > > I really appreciate you are getting back to me. Because libiomp5md.lib is > not in the mkl/lib directory, I just copy it to the same directory and per > your suggestion use the followings > > ./configure --with-cc='win32fe cl' --with-cxx='win32fe cl' --with-fc=0 > --with-openmp --with-debugging=0 --with-mpi=0 > --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows > /mkl/lib mkl_intel_lp64.lib mkl_core.lib mkl_intel_thread.lib > libiomp5md.lib" > > Unfortunately, I got the same error message as below. 
I am attaching > configure log, and hopefully you can see what I am doing icorrectly. > > Many thanks, > Thuc > > ---------------------------------------------------------------------------- > --- > You set a value for --with-blaslapack-lib=, but > ['-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib', > 'mkl_intel_lp64.lib', 'mkl_core.lib', 'mkl_intel_thread.lib', > 'libiomp5md.lib'] cannot be used > **************************************************************************** > *** > > -----Original Message----- > From: Satish Balay [mailto:balay at mcs.anl.gov] > Sent: Tuesday, September 29, 2020 4:12 PM > To: Thuc Bui > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Failed to build Petsc statically linked to MKL > for Windows > > https://software.intel.com/content/www/us/en/develop/articles/intel-mkl-link > -line-advisor.html > > Checking for MS compilers - [MKL2018+] it gives: > > mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib > > i.e no -Wl,--start-group [MS compiler most likely does not use this option] > > So you can try: > > --with-blaslapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/ > mkl/lib/ mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib > libiomp5md.lib" > > Satish > > > On Tue, 29 Sep 2020, Thuc Bui wrote: > > > Dear Petsc Users, > > > > > > > > I have successfully built Petsc v3.13.5 for Windows 10 VS2015 without > > Fortran compiler and without MPI. These successful builds are either > > linked statically with Blas-Lapack or dynamically with Intel MKL. > > However, when linking dynamically to MKL, I would have to include 4 > > more Intel DLLs for building applications. I would like to have Petsc > > linking statically to MKL to avoid having these extra DLLs, but have > > not been successful. Here is how I run configure for static link > > > > > > > > ./configure --with-cc='win32fe cl' --with-cxx='win32fe cl' --with-fc=0 > > --with-openmp --with-debugging=0 --with-mpi=0 > > --with-blas-lapack-lib="-Wl,--start-group > > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_l > > p64.li b > > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.li > > b > > > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread. > > lib -Wl,--end-group" > > > > > > > > I got this configuration error, > > > > > > > > ====================================================================== > > ====== > > === TESTING: checkLib from > > config.packages.BlasLapack(config/BuildSystem/config/pack************* > > ****** > > ************************************************************ > > > > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for > > details): > > > > ---------------------------------------------------------------------- > > ------ > > --- > > > > You set a value for --with-blaslapack-lib=, but > > ['-Wl,--start-group', > > '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_ > > lp64.l > > ib', > > '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.l > > ib', > > '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_ > > thread .lib', '-Wl,--end-group'] cannot be used > > > > ********************************************************************** > > ****** > > *** > > > > > > > > Can someone point me to a proper syntax to link to MKL statically? > > Attached is the configure.log if you need it. 
> > > > > > > > I have also tried unsuccessfuly, > > > > --with-blas-lapack-lib="/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/win > > dows/m > > kl/lib/mkl_intel_lp64.lib > > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.li > > b > > > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread. > > lib" > > > > or > > > > --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/w > > indows /mkl/lib -lmkl_intel_lp64.lib -lmkl_core.lib > > -lmkl_intel_thread.lib", > > > > or > > > > --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/w > > indows /mkl/lib mkl_intel_lp64.lib mkl_core.lib mkl_intel_thread.lib" > > > > and all gave the similar error message. > > > > > > > > Many thanks in advance for your help, > > > > Thuc Bui > > > > Senior R&D Engineer > > > > Calabazas Creek Research, Inc. > > > > (650) 948-5361 (Office) > > > > (650) 948-7562 (Fax) > > > > > > > > > > > > > From bui at calcreek.com Tue Sep 29 20:58:57 2020 From: bui at calcreek.com (Thuc Bui) Date: Tue, 29 Sep 2020 18:58:57 -0700 Subject: [petsc-users] Failed to build Petsc statically linked to MKL for Windows In-Reply-To: References: <002e01d696a2$2eaaeb70$8c00c250$@calcreek.com> <002501d696c0$c74d56b0$55e80410$@calcreek.com> Message-ID: <004701d696cd$429e48e0$c7dadaa0$@calcreek.com> Thank you Satish very much. You are so right in catching this error. My bad! Here is the configuration that works for Windows 10 VS 2015 x64 Release build, no Fortran compiler, no MPI and links statically to Intel MKL ./configure --with-cc='win32fe cl' --with-cxx='win32fe cl' --with-fc=0 --with-openmp --with-debugging=0 --with-mpi=0 --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows /mkl/lib/intel64_win mkl_intel_lp64.lib mkl_core.lib mkl_intel_thread.lib libiomp5md.lib" Many thanks again for your help, Thuc -----Original Message----- From: Satish Balay [mailto:balay at mcs.anl.gov] Sent: Tuesday, September 29, 2020 5:39 PM To: Thuc Bui Cc: 'petsc-users' Subject: Re: [petsc-users] Failed to build Petsc statically linked to MKL for Windows > LINK : fatal error LNK1181: cannot open input file 'C:\PROGRA~2\INTELS~1\COMPIL~1.279\windows\mkl\lib\mkl_intel_lp64.lib' please verify if the file exists. What do you have for: ls /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_lp64.li b Satish On Tue, 29 Sep 2020, Thuc Bui wrote: > Hi Barry & Satish, > > I also did a configuration to have a full path for each library as follows > > ./configure --with-cc='win32fe cl' --with-cxx='win32fe cl' --with-fc=0 > --with-openmp --with-debugging=0 --with-mpi=0 > --with-blas-lapack-lib="/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/m > kl/lib/mkl_intel_lp64.lib > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.lib > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread. > lib > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/libiomp5md.lib" > > and also got the same error message. Attached is the configure.log. > > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for > details): > ---------------------------------------------------------------------------- > --- > You set a value for --with-blaslapack-lib=, but > ['/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_lp64. 
> lib', > '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.lib', > '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread > .lib', > '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/libiomp5md.lib'] > cannot be used > **************************************************************************** > *** > > Thanks, > Thuc > > -----Original Message----- > From: Thuc Bui [mailto:bui at calcreek.com] > Sent: Tuesday, September 29, 2020 5:17 PM > To: petsc-users > Cc: Barry Smith > Subject: RE: [petsc-users] Failed to build Petsc statically linked to MKL > for Windows > > Hi Barry & Satish, > > I really appreciate you are getting back to me. Because libiomp5md.lib is > not in the mkl/lib directory, I just copy it to the same directory and per > your suggestion use the followings > > ./configure --with-cc='win32fe cl' --with-cxx='win32fe cl' --with-fc=0 > --with-openmp --with-debugging=0 --with-mpi=0 > --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows > /mkl/lib mkl_intel_lp64.lib mkl_core.lib mkl_intel_thread.lib > libiomp5md.lib" > > Unfortunately, I got the same error message as below. I am attaching > configure log, and hopefully you can see what I am doing icorrectly. > > Many thanks, > Thuc > > ---------------------------------------------------------------------------- > --- > You set a value for --with-blaslapack-lib=, but > ['-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib', > 'mkl_intel_lp64.lib', 'mkl_core.lib', 'mkl_intel_thread.lib', > 'libiomp5md.lib'] cannot be used > **************************************************************************** > *** > > -----Original Message----- > From: Satish Balay [mailto:balay at mcs.anl.gov] > Sent: Tuesday, September 29, 2020 4:12 PM > To: Thuc Bui > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Failed to build Petsc statically linked to MKL > for Windows > > https://software.intel.com/content/www/us/en/develop/articles/intel-mkl-link > -line-advisor.html > > Checking for MS compilers - [MKL2018+] it gives: > > mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib > > i.e no -Wl,--start-group [MS compiler most likely does not use this option] > > So you can try: > > --with-blaslapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/ > mkl/lib/ mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib > libiomp5md.lib" > > Satish > > > On Tue, 29 Sep 2020, Thuc Bui wrote: > > > Dear Petsc Users, > > > > > > > > I have successfully built Petsc v3.13.5 for Windows 10 VS2015 without > > Fortran compiler and without MPI. These successful builds are either > > linked statically with Blas-Lapack or dynamically with Intel MKL. > > However, when linking dynamically to MKL, I would have to include 4 > > more Intel DLLs for building applications. I would like to have Petsc > > linking statically to MKL to avoid having these extra DLLs, but have > > not been successful. Here is how I run configure for static link > > > > > > > > ./configure --with-cc='win32fe cl' --with-cxx='win32fe cl' --with-fc=0 > > --with-openmp --with-debugging=0 --with-mpi=0 > > --with-blas-lapack-lib="-Wl,--start-group > > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_l > > p64.li b > > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.li > > b > > > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread. 
> > lib -Wl,--end-group" > > > > > > > > I got this configuration error, > > > > > > > > ====================================================================== > > ====== > > === TESTING: checkLib from > > config.packages.BlasLapack(config/BuildSystem/config/pack************* > > ****** > > ************************************************************ > > > > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for > > details): > > > > ---------------------------------------------------------------------- > > ------ > > --- > > > > You set a value for --with-blaslapack-lib=, but > > ['-Wl,--start-group', > > '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_ > > lp64.l > > ib', > > '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.l > > ib', > > '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_ > > thread .lib', '-Wl,--end-group'] cannot be used > > > > ********************************************************************** > > ****** > > *** > > > > > > > > Can someone point me to a proper syntax to link to MKL statically? > > Attached is the configure.log if you need it. > > > > > > > > I have also tried unsuccessfuly, > > > > --with-blas-lapack-lib="/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/win > > dows/m > > kl/lib/mkl_intel_lp64.lib > > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.li > > b > > > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread. > > lib" > > > > or > > > > --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/w > > indows /mkl/lib -lmkl_intel_lp64.lib -lmkl_core.lib > > -lmkl_intel_thread.lib", > > > > or > > > > --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/w > > indows /mkl/lib mkl_intel_lp64.lib mkl_core.lib mkl_intel_thread.lib" > > > > and all gave the similar error message. > > > > > > > > Many thanks in advance for your help, > > > > Thuc Bui > > > > Senior R&D Engineer > > > > Calabazas Creek Research, Inc. > > > > (650) 948-5361 (Office) > > > > (650) 948-7562 (Fax) > > > > > > > > > > > > > From balay at mcs.anl.gov Tue Sep 29 21:02:03 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 29 Sep 2020 21:02:03 -0500 (CDT) Subject: [petsc-users] Failed to build Petsc statically linked to MKL for Windows In-Reply-To: <004701d696cd$429e48e0$c7dadaa0$@calcreek.com> References: <002e01d696a2$2eaaeb70$8c00c250$@calcreek.com> <002501d696c0$c74d56b0$55e80410$@calcreek.com> <004701d696cd$429e48e0$c7dadaa0$@calcreek.com> Message-ID: Glad it works! Thanks for the update. Satish On Tue, 29 Sep 2020, Thuc Bui wrote: > Thank you Satish very much. You are so right in catching this error. My bad! 
> Here is the configuration that works for Windows 10 VS 2015 x64 Release > build, no Fortran compiler, no MPI and links statically to Intel MKL > > ./configure --with-cc='win32fe cl' --with-cxx='win32fe cl' --with-fc=0 > --with-openmp --with-debugging=0 --with-mpi=0 > --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows > /mkl/lib/intel64_win mkl_intel_lp64.lib mkl_core.lib mkl_intel_thread.lib > libiomp5md.lib" > > Many thanks again for your help, > Thuc > > -----Original Message----- > From: Satish Balay [mailto:balay at mcs.anl.gov] > Sent: Tuesday, September 29, 2020 5:39 PM > To: Thuc Bui > Cc: 'petsc-users' > Subject: Re: [petsc-users] Failed to build Petsc statically linked to MKL > for Windows > > > LINK : fatal error LNK1181: cannot open input file > 'C:\PROGRA~2\INTELS~1\COMPIL~1.279\windows\mkl\lib\mkl_intel_lp64.lib' > > please verify if the file exists. What do you have for: > > ls > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_lp64.li > b > > Satish > > On Tue, 29 Sep 2020, Thuc Bui wrote: > > > Hi Barry & Satish, > > > > I also did a configuration to have a full path for each library as follows > > > > ./configure --with-cc='win32fe cl' --with-cxx='win32fe cl' --with-fc=0 > > --with-openmp --with-debugging=0 --with-mpi=0 > > > --with-blas-lapack-lib="/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/m > > kl/lib/mkl_intel_lp64.lib > > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.lib > > > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread. > > lib > > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/libiomp5md.lib" > > > > and also got the same error message. Attached is the configure.log. > > > > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for > > details): > > > ---------------------------------------------------------------------------- > > --- > > You set a value for --with-blaslapack-lib=, but > > > ['/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_lp64. > > lib', > > '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.lib', > > > '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread > > .lib', > > > '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/libiomp5md.lib'] > > cannot be used > > > **************************************************************************** > > *** > > > > Thanks, > > Thuc > > > > -----Original Message----- > > From: Thuc Bui [mailto:bui at calcreek.com] > > Sent: Tuesday, September 29, 2020 5:17 PM > > To: petsc-users > > Cc: Barry Smith > > Subject: RE: [petsc-users] Failed to build Petsc statically linked to MKL > > for Windows > > > > Hi Barry & Satish, > > > > I really appreciate you are getting back to me. Because libiomp5md.lib is > > not in the mkl/lib directory, I just copy it to the same directory and per > > your suggestion use the followings > > > > ./configure --with-cc='win32fe cl' --with-cxx='win32fe cl' --with-fc=0 > > --with-openmp --with-debugging=0 --with-mpi=0 > > > --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows > > /mkl/lib mkl_intel_lp64.lib mkl_core.lib mkl_intel_thread.lib > > libiomp5md.lib" > > > > Unfortunately, I got the same error message as below. I am attaching > > configure log, and hopefully you can see what I am doing icorrectly. 
> > > > Many thanks, > > Thuc > > > > > ---------------------------------------------------------------------------- > > --- > > You set a value for --with-blaslapack-lib=, but > > ['-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib', > > 'mkl_intel_lp64.lib', 'mkl_core.lib', 'mkl_intel_thread.lib', > > 'libiomp5md.lib'] cannot be used > > > **************************************************************************** > > *** > > > > -----Original Message----- > > From: Satish Balay [mailto:balay at mcs.anl.gov] > > Sent: Tuesday, September 29, 2020 4:12 PM > > To: Thuc Bui > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Failed to build Petsc statically linked to MKL > > for Windows > > > > > https://software.intel.com/content/www/us/en/develop/articles/intel-mkl-link > > -line-advisor.html > > > > Checking for MS compilers - [MKL2018+] it gives: > > > > mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib > > > > i.e no -Wl,--start-group [MS compiler most likely does not use this > option] > > > > So you can try: > > > > > --with-blaslapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/ > > mkl/lib/ mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib > > libiomp5md.lib" > > > > Satish > > > > > > On Tue, 29 Sep 2020, Thuc Bui wrote: > > > > > Dear Petsc Users, > > > > > > > > > > > > I have successfully built Petsc v3.13.5 for Windows 10 VS2015 without > > > Fortran compiler and without MPI. These successful builds are either > > > linked statically with Blas-Lapack or dynamically with Intel MKL. > > > However, when linking dynamically to MKL, I would have to include 4 > > > more Intel DLLs for building applications. I would like to have Petsc > > > linking statically to MKL to avoid having these extra DLLs, but have > > > not been successful. Here is how I run configure for static link > > > > > > > > > > > > ./configure --with-cc='win32fe cl' --with-cxx='win32fe cl' --with-fc=0 > > > --with-openmp --with-debugging=0 --with-mpi=0 > > > --with-blas-lapack-lib="-Wl,--start-group > > > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_l > > > p64.li b > > > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.li > > > b > > > > > > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread. > > > lib -Wl,--end-group" > > > > > > > > > > > > I got this configuration error, > > > > > > > > > > > > ====================================================================== > > > ====== > > > === TESTING: checkLib from > > > config.packages.BlasLapack(config/BuildSystem/config/pack************* > > > ****** > > > ************************************************************ > > > > > > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log > for > > > details): > > > > > > ---------------------------------------------------------------------- > > > ------ > > > --- > > > > > > You set a value for --with-blaslapack-lib=, but > > > ['-Wl,--start-group', > > > '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_ > > > lp64.l > > > ib', > > > '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.l > > > ib', > > > '/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_ > > > thread .lib', '-Wl,--end-group'] cannot be used > > > > > > ********************************************************************** > > > ****** > > > *** > > > > > > > > > > > > Can someone point me to a proper syntax to link to MKL statically? 
> > > Attached is the configure.log if you need it. > > > > > > > > > > > > I have also tried unsuccessfuly, > > > > > > --with-blas-lapack-lib="/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/win > > > dows/m > > > kl/lib/mkl_intel_lp64.lib > > > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_core.li > > > b > > > > > > /cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/windows/mkl/lib/mkl_intel_thread. > > > lib" > > > > > > or > > > > > > --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/w > > > indows /mkl/lib -lmkl_intel_lp64.lib -lmkl_core.lib > > > -lmkl_intel_thread.lib", > > > > > > or > > > > > > --with-blas-lapack-lib="-L/cygdrive/c/PROGRA~2/INTELS~1/COMPIL~1.279/w > > > indows /mkl/lib mkl_intel_lp64.lib mkl_core.lib mkl_intel_thread.lib" > > > > > > and all gave the similar error message. > > > > > > > > > > > > Many thanks in advance for your help, > > > > > > Thuc Bui > > > > > > Senior R&D Engineer > > > > > > Calabazas Creek Research, Inc. > > > > > > (650) 948-5361 (Office) > > > > > > (650) 948-7562 (Fax) > > > > > > > > > > > > > > > > > > > > > From bsmith at petsc.dev Tue Sep 29 22:35:54 2020 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 29 Sep 2020 22:35:54 -0500 Subject: [petsc-users] PETSc 3.14 release available References: Message-ID: We are pleased to announce the release of PETSc version 3.14 at http://www.mcs.anl.gov/petsc/download For those who use the git maint branch this branch is now named release to reflect that it contains the current release. You can switch to the new release with git fetch git checkout release A list of the major changes and updates can be found at http://www.mcs.anl.gov/petsc/documentation/changes/314.html The final update to petsc-3.13 i.e petsc-3.13.6 is also available We recommend upgrading to PETSc 3.14 soon. As always, please report problems to petsc-maint at mcs.anl.gov and ask questions at petsc-users at mcs.anl.gov Thanks again to Satish Balay for managing the PETSc merge request process and managing the entire release process and to Patrick Sanan for reorganizing the users manual to use Sphinx which should lead to much better documentation in the future. This release includes contributions from Alejandro Otero Alexei Colin Alex Lindsay Alp Dener Aron Ahmadia Arthur Soprano Asbj?rn Nilsen Riseth Barry Smith Ben Mather Brandon Whitchurch Chris Eldred Chris Hennekinne David A Ham David Ham Dmitry Karpeev Ed Bueler Fande Kong Florian Wechsung Francesco Ballarin Fr?d?ric Simonis Garth N. Wells Getnet Betrie Gianfranco Costamagna gouarin Hendrik Ranocha Hong Zhang Ivan Yashchuk Jacob Faibussowitsch Jan Blechta Jean-Yves LExcellent Jed Brown Jonathan Guyer Jorge Ca?ardo Alastuey Jose Roman Junchao Zhang Justin Chang Karl Rupp Kaushik Kulkarni Lawrence Mitchell Lisandro Dalcin Lo?c Gouarin Manasi Mark Adams Martin Diehl Matthew Knepley Michael Lange Miguel Arriaga Miklos Homolya Moritz Huck Mos? Giordano Mr. 
Hong Zhang Mukkund Sunjii Murat Keceli Nathan Collier Nicolas Barral Patrick Farrell Patrick Sanan Pierre Jolivet resundermann Reuben Hill Richard Tran Mills Rodolfo Oliveira Romain Beucher Sajid Ali Satish Balay Scott Kruger Shri Abhyankar Simon Funke Stefano Zampini Tadeu Manoel Thomas Hisch Tianjiao Sun Toby Isaac Tristan Konolige Vaclav Hapla V?clav Hapla Vishwas Rao Zongze Yang and bug reports/patches/proposed improvements received from Adolfo Rodriguez Aidan Hamilton Alfredo Jaramillo Barry Smith Sophie Blondel Boris Kaus Chonglin Zhang Chris Reidy Eda Oktay Ed Bueler Emil Constantinescu Eugene Y Yan Patrick Timothy Greene huabel Jeffrey De'Haven Hyman Jacob Faibussowitsch Jan Busa Sr. Jeremy Theler Jin Chen Jose E. Roman Karl Yang Kenneth L Meyer Kyle B. Thompson Lawrence Mitchell Lisandro Dalcin Magne Rudshaug Maria Zhukova Mark Adams Mark Lohry Mark Olesen Matthew Knepley Matt Otten Miguel Salazar Todd Munson Nick Papior Nidish Oliver Maclaren Patrick Sanan Pierre Jolivet Pierre Seize Robert Nr Nourgaliev Rodolfo Oliveira Robert C Rutherford Sajid Ali Sanjay Govindjee Satish Balay Sergi Molins Rafa Stanimire Tomov Stefano Zampini Tejaswini Gautham Thibaut Appel Thomas Vasileiou Vaclav Hapla Vincent Darrigrand" Xiaoye S. Li Yang Liu Yuke Li Yu, Xiangyu Zhuo Chen Zongze Yang Zulkifli Halim As always, thanks for your support, Barry -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed Sep 30 09:14:41 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 30 Sep 2020 09:14:41 -0500 (CDT) Subject: [petsc-users] reset release branch Message-ID: All, I had to force fix the release branch due to a bad merge. If you've pulled on the release branch after the bad merge (before this fix) - and now have the commit 25cac2be9df307cc6f0df502d8399122c3a2b6a3 in it - i.e check with: git branch --contains 25cac2be9df307cc6f0df502d8399122c3a2b6a3 please do: git checkout master git branch -D release git fetch -p git checkout release [Note: As the petsc-3.14 release announcement e-mail indicated - we switched from using 'maint' branch to 'release' branch or release fixes] Satish From Pierre.Seize at onera.fr Wed Sep 30 09:52:42 2020 From: Pierre.Seize at onera.fr (Pierre Seize) Date: Wed, 30 Sep 2020 16:52:42 +0200 Subject: [petsc-users] Memory violation in PetscFVLeastSquaresPseudoInverseSVD_Static Message-ID: Hi, In PetscFVLeastSquaresPseudoInverseSVD_Static, there is ? Brhs = work; ? maxmn = PetscMax(m,n); ? for (j=0; jwork. The size of the work array is computed in PetscFVLeastSquaresSetMaxFaces_LS as: ? ls->maxFaces = maxFaces; ? m?????? = ls->maxFaces; ? n?????? = dim; ? nrhs??? = ls->maxFaces; ? minwork = 3*PetscMin(m,n) + PetscMax(2*PetscMin(m,n), PetscMax(PetscMax(m,n), nrhs)); /* required by LAPACK */ ? ls->workSize = 5*minwork; /* We can afford to be extra generous */ In my example, the used size (maxmn * maxmn) is 81, and the actual size (ls->workSize) is 75, and therefore valgrind complains. Is it because I am missing something, or is it a bug ? Thanks Pierre Seize From hecbarcab at gmail.com Wed Sep 30 09:49:21 2020 From: hecbarcab at gmail.com (=?UTF-8?Q?H=C3=A9ctor_Barreiro_Cabrera?=) Date: Wed, 30 Sep 2020 16:49:21 +0200 Subject: [petsc-users] Assembling MatSeqAICUSPARSE from kernel Message-ID: Greetings fellow PETSc users! I have recently started working on a project that requires assembling and solving a large sparse linear system directly on the GPU using CUDA. 
I have just discovered that PETSc has support for building and solving such a system through cuSparse (MatSeqAICUSPARSE), which is quite cool because this opens up the possibility of using PETSc's awesome functionalities. However, I have not been able to find information about how (or even if it is possible) to assemble the matrix from data stored in device memory. From the documentation, it seems that vectors (VecSeqCUDA) expose the raw device pointer through VecCUDAGetArrayRead and VecCUDAGetArrayWrite functions which I could use to update the systems' RHS vector, but I couldn't find anything alike for matrices. Is there anything equivalent? Ideally I would like to generate all the assembly information from a kernel, avoiding any data synchronization with the host (except maybe updating the pointers). Is this even possible? Thank you very much for your help! Greetings, Hector Barreiro -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Sep 30 10:55:22 2020 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 30 Sep 2020 10:55:22 -0500 Subject: [petsc-users] Assembling MatSeqAICUSPARSE from kernel In-Reply-To: References: Message-ID: <7601C57A-1E93-4C9E-8694-ED2E3D1806F9@petsc.dev> Hector, You asked at just the right time. Mark Adams has started to implement some code to do this https://gitlab.com/petsc/petsc/-/merge_requests/3137 it is now in the PETSc master branch. It is preliminary and not yet well documented but hopefully if you read through the MR (above) it will give you the main ideas. Please feel free to ask questions on this topic at GitLab or on petsc-dev at mcs.anl.gov we are eager to have users for this material and improve it to make it practical and straightforward. Barry > On Sep 30, 2020, at 9:49 AM, H?ctor Barreiro Cabrera wrote: > > Greetings fellow PETSc users! > > I have recently started working on a project that requires assembling and solving a large sparse linear system directly on the GPU using CUDA. I have just discovered that PETSc has support for building and solving such a system through cuSparse (MatSeqAICUSPARSE), which is quite cool because this opens up the possibility of using PETSc's awesome functionalities. > > However, I have not been able to find information about how (or even if it is possible) to assemble the matrix from data stored in device memory. From the documentation, it seems that vectors (VecSeqCUDA) expose the raw device pointer through VecCUDAGetArrayRead and VecCUDAGetArrayWrite functions which I could use to update the systems' RHS vector, but I couldn't find anything alike for matrices. Is there anything equivalent? > > Ideally I would like to generate all the assembly information from a kernel, avoiding any data synchronization with the host (except maybe updating the pointers). Is this even possible? > > Thank you very much for your help! > > Greetings, > Hector Barreiro -------------- next part -------------- An HTML attachment was scrubbed... URL: From salazardetro1 at llnl.gov Wed Sep 30 11:58:05 2020 From: salazardetro1 at llnl.gov (Salazar De Troya, Miguel) Date: Wed, 30 Sep 2020 16:58:05 +0000 Subject: [petsc-users] Adaptive time stepping for implicit solvers Message-ID: <57C06406-134B-4BCD-80B3-13EA97B6610A@llnl.gov> Hello, I have a heat equation with a right-hand side coefficient that changes sign when a certain condition is met (solution of heat equation equal to a given parameter). 
I am thinking of modeling the sign change with a smoothed-out Heaviside approximation and let the solver adjust the time step to capture the sudden transition. Given that it is a heat equation, I am interested in using an implicit solver. Which ones in the TS suite also have adaptive capabilities? Thanks Miguel Miguel A. Salazar de Troya Postdoctoral Researcher, Lawrence Livermore National Laboratory B141 Rm: 1085-5 Ph: 1(925) 422-6411 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Sep 30 12:19:18 2020 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 30 Sep 2020 12:19:18 -0500 Subject: [petsc-users] Adaptive time stepping for implicit solvers In-Reply-To: <57C06406-134B-4BCD-80B3-13EA97B6610A@llnl.gov> References: <57C06406-134B-4BCD-80B3-13EA97B6610A@llnl.gov> Message-ID: You might consider using the Event feature of TS to manage this. It detects when your "indicator function" changes sign and stops the integration right at that point allowing you to change your ODE or do what ever you need. It allows managing discontinuous changes in the ODE without needing smoothing (that messes up the solution). You would write an indicator function that depends on "solution of heat equation equal to a given parameter" I am including Shri and Emil on this message because they understand the event handling the best (Shri wrote it). Barry > On Sep 30, 2020, at 11:58 AM, Salazar De Troya, Miguel via petsc-users wrote: > > Hello, > > I have a heat equation with a right-hand side coefficient that changes sign when a certain condition is met (solution of heat equation equal to a given parameter). I am thinking of modeling the sign change with a smoothed-out Heaviside approximation and let the solver adjust the time step to capture the sudden transition. Given that it is a heat equation, I am interested in using an implicit solver. Which ones in the TS suite also have adaptive capabilities? > > Thanks > Miguel > > Miguel A. Salazar de Troya > Postdoctoral Researcher, Lawrence Livermore National Laboratory > B141 > Rm: 1085-5 > Ph: 1(925) 422-6411 -------------- next part -------------- An HTML attachment was scrubbed... URL: From hecbarcab at gmail.com Wed Sep 30 13:31:46 2020 From: hecbarcab at gmail.com (=?UTF-8?Q?H=C3=A9ctor_Barreiro_Cabrera?=) Date: Wed, 30 Sep 2020 20:31:46 +0200 Subject: [petsc-users] Assembling MatSeqAICUSPARSE from kernel In-Reply-To: <7601C57A-1E93-4C9E-8694-ED2E3D1806F9@petsc.dev> References: <7601C57A-1E93-4C9E-8694-ED2E3D1806F9@petsc.dev> Message-ID: This is awesome news! I will go over it and see how can I fit this with my code. Thank you! El mi?., 30 sept. 2020 17:55, Barry Smith escribi?: > > Hector, > > You asked at just the right time. Mark Adams has started to implement > some code to do this https://gitlab.com/petsc/petsc/-/merge_requests/3137 it > is now in the PETSc master branch. It is preliminary and not yet well > documented but hopefully if you read through the MR (above) it will give > you the main ideas. > > Please feel free to ask questions on this topic at GitLab or on > petsc-dev at mcs.anl.gov we are eager to have users for this material and > improve it to make it practical and straightforward. > > > Barry > > > On Sep 30, 2020, at 9:49 AM, H?ctor Barreiro Cabrera > wrote: > > Greetings fellow PETSc users! > > I have recently started working on a project that requires assembling and > solving a large sparse linear system directly on the GPU using CUDA. 
I have > just discovered that PETSc has support for building and solving such a > system through cuSparse (MatSeqAICUSPARSE), which is quite cool because > this opens up the possibility of using PETSc's awesome functionalities. > > However, I have not been able to find information about how (or even if it > is possible) to assemble the matrix from data stored in device memory. From > the documentation, it seems that vectors (VecSeqCUDA) expose the raw device > pointer through VecCUDAGetArrayRead and VecCUDAGetArrayWrite functions > which I could use to update the systems' RHS vector, but I couldn't find > anything alike for matrices. Is there anything equivalent? > > Ideally I would like to generate all the assembly information from a > kernel, avoiding any data synchronization with the host (except maybe > updating the pointers). Is this even possible? > > Thank you very much for your help! > > Greetings, > Hector Barreiro > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Sep 30 13:38:18 2020 From: jed at jedbrown.org (Jed Brown) Date: Wed, 30 Sep 2020 12:38:18 -0600 Subject: [petsc-users] Memory violation in PetscFVLeastSquaresPseudoInverseSVD_Static In-Reply-To: References: Message-ID: <875z7vp0th.fsf@jedbrown.org> Pierre Seize writes: > Hi, > > In PetscFVLeastSquaresPseudoInverseSVD_Static, there is > ? Brhs = work; > ? maxmn = PetscMax(m,n); > ? for (j=0; j ??? for (i=0; i ? } > where on the calling function, PetscFVComputeGradient_LeastSquares, we > set the arguments m <= numFaces, n <= dim and work <= ls->work. The size > of the work array is computed in PetscFVLeastSquaresSetMaxFaces_LS as: > ? ls->maxFaces = maxFaces; > ? m?????? = ls->maxFaces; > ? n?????? = dim; > ? nrhs??? = ls->maxFaces; > ? minwork = 3*PetscMin(m,n) + PetscMax(2*PetscMin(m,n), > PetscMax(PetscMax(m,n), nrhs)); /* required by LAPACK */ It's totally buggy because this formula is for the argument to dgelss, but the array is being used for a different purpose (to place Brhs). WORK WORK is DOUBLE PRECISION array, dimension (MAX(1,LWORK)) On exit, if INFO = 0, WORK(1) returns the optimal LWORK. LWORK LWORK is INTEGER The dimension of the array WORK. LWORK >= 1, and also: LWORK >= 3*min(M,N) + max( 2*min(M,N), max(M,N), NRHS ) For good performance, LWORK should generally be larger. If LWORK = -1, then a workspace query is assumed; the routine only calculates the optimal size of the WORK array, returns this value as the first entry of the WORK array, and no error message related to LWORK is issued by XERBLA. There should be a separate allocation for Brhs and the work argument should be passed through to dgelss. The current code passes tmpwork = Ainv; along to dgelss, but we don't know that it's the right size either. Would you be willing to submit a merge request with your best attempt at fixing this. I can help review and we'll get it into the 3.14.1 release? > ? ls->workSize = 5*minwork; /* We can afford to be extra generous */ > > In my example, the used size (maxmn * maxmn) is 81, and the actual size > (ls->workSize) is 75, and therefore valgrind complains. > Is it because I am missing something, or is it a bug ? 
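To make the sizing point concrete, one way the two buffers could be kept apart is sketched below. This is not the actual patch that will land in PETSc, the helper name is invented, and the LWORK formula is simply the xGELSS requirement quoted above; it only illustrates giving Brhs its own allocation instead of carving it out of the LAPACK workspace.

#include <petscsys.h>

/* Sketch only: size the right-hand-side block and the LAPACK workspace
   independently, so Brhs never aliases work. */
static PetscErrorCode FVLeastSquaresGetWorkspaces(PetscInt m, PetscInt n, PetscInt nrhs,
                                                  PetscScalar **Brhs, PetscScalar **work, PetscInt *lwork)
{
  PetscInt       minmn = PetscMin(m, n), maxmn = PetscMax(m, n);
  PetscErrorCode ierr;

  PetscFunctionBegin;
  /* LAPACK xGELSS requirement: LWORK >= 3*min(M,N) + max(2*min(M,N), max(M,N), NRHS) */
  *lwork = 3*minmn + PetscMax(2*minmn, PetscMax(maxmn, nrhs));
  /* The RHS block (maxmn-by-maxmn here, since it holds the padded identity for the
     pseudoinverse) is a separate array, not part of the LAPACK workspace. */
  ierr = PetscMalloc2(maxmn*maxmn, Brhs, *lwork, work);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The caller would then fill Brhs, pass work/lwork straight through to LAPACKgelss_(), and release both with PetscFree2(); on complex builds gelss additionally needs an rwork array, which this sketch ignores.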
> > Thanks > > Pierre Seize From salazardetro1 at llnl.gov Wed Sep 30 14:05:13 2020 From: salazardetro1 at llnl.gov (Salazar De Troya, Miguel) Date: Wed, 30 Sep 2020 19:05:13 +0000 Subject: [petsc-users] Adaptive time stepping for implicit solvers In-Reply-To: References: <57C06406-134B-4BCD-80B3-13EA97B6610A@llnl.gov> Message-ID: <5937B524-9CAE-4199-A01A-58D5F9667452@llnl.gov> Thanks. I see the example in petsc (https://www.mcs.anl.gov/petsc/petsc-current/src/ts/tutorials/ex40.c.html) modifies the solution vector in the PostEventFunction(), but there is no straightforward way to modify the RHSFunction(). Can one call TSSetRHSFunction() inside PostEventFunction() and set it for another routine? Maybe one can just change a variable in the context which is also used in the original RHSFunction(). Even if I could, such a sudden change in the RHS can make the current solution oscillate heavily. Will the TS solver adapt the time step to accommodate for this drastic change? Thanks Miguel From: Barry Smith Date: Wednesday, September 30, 2020 at 10:19 AM To: "Salazar De Troya, Miguel" , "Abhyankar, Shrirang G" , Emil Constantinescu Cc: Satish Balay via petsc-users Subject: Re: [petsc-users] Adaptive time stepping for implicit solvers You might consider using the Event feature of TS to manage this. It detects when your "indicator function" changes sign and stops the integration right at that point allowing you to change your ODE or do what ever you need. It allows managing discontinuous changes in the ODE without needing smoothing (that messes up the solution). You would write an indicator function that depends on "solution of heat equation equal to a given parameter" I am including Shri and Emil on this message because they understand the event handling the best (Shri wrote it). Barry On Sep 30, 2020, at 11:58 AM, Salazar De Troya, Miguel via petsc-users > wrote: Hello, I have a heat equation with a right-hand side coefficient that changes sign when a certain condition is met (solution of heat equation equal to a given parameter). I am thinking of modeling the sign change with a smoothed-out Heaviside approximation and let the solver adjust the time step to capture the sudden transition. Given that it is a heat equation, I am interested in using an implicit solver. Which ones in the TS suite also have adaptive capabilities? Thanks Miguel Miguel A. Salazar de Troya Postdoctoral Researcher, Lawrence Livermore National Laboratory B141 Rm: 1085-5 Ph: 1(925) 422-6411 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Sep 30 14:27:32 2020 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 30 Sep 2020 14:27:32 -0500 Subject: [petsc-users] Adaptive time stepping for implicit solvers In-Reply-To: <5937B524-9CAE-4199-A01A-58D5F9667452@llnl.gov> References: <57C06406-134B-4BCD-80B3-13EA97B6610A@llnl.gov> <5937B524-9CAE-4199-A01A-58D5F9667452@llnl.gov> Message-ID: > On Sep 30, 2020, at 2:05 PM, Salazar De Troya, Miguel wrote: > > Thanks. I see the example in petsc (https://www.mcs.anl.gov/petsc/petsc-current/src/ts/tutorials/ex40.c.html ) modifies the solution vector in the PostEventFunction(), but there is no straightforward way to modify the RHSFunction(). Can one call TSSetRHSFunction() inside PostEventFunction() and set it for another routine? Maybe one can just change a variable in the context which is also used in the original RHSFunction(). Even if I could, such a sudden change in the RHS can make the current solution oscillate heavily. 
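That context-variable approach can be written directly against TSSetEventHandler(). The sketch below is only illustrative: AppCtx, u_star and imon are invented names, the monitored dof is assumed to be local, and the restarted step size is arbitrary. It shows the indicator plus post-event pair that flips the sign read by the RHSFunction, with no Heaviside smoothing.

#include <petscts.h>

typedef struct {
  PetscReal u_star; /* threshold at which the RHS coefficient changes sign */
  PetscInt  imon;   /* local index of the monitored dof                    */
  PetscReal sign;   /* +1 or -1, read by the RHSFunction                   */
} AppCtx;

static PetscErrorCode Indicator(TS ts, PetscReal t, Vec U, PetscScalar *fvalue, void *ctx)
{
  AppCtx            *app = (AppCtx*)ctx;
  const PetscScalar *u;
  PetscErrorCode     ierr;

  PetscFunctionBegin;
  ierr      = VecGetArrayRead(U, &u);CHKERRQ(ierr);
  fvalue[0] = u[app->imon] - app->u_star; /* zero crossing marks the event */
  ierr      = VecRestoreArrayRead(U, &u);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

static PetscErrorCode PostEvent(TS ts, PetscInt nev, PetscInt evlist[], PetscReal t, Vec U, PetscBool fwd, void *ctx)
{
  AppCtx        *app = (AppCtx*)ctx;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  app->sign = -app->sign;                         /* the RHSFunction sees the new sign from now on */
  ierr = TSSetTimeStep(ts, 1.0e-4);CHKERRQ(ierr); /* optionally restart with a small step */
  PetscFunctionReturn(0);
}

/* registration:
     PetscInt  direction = 0;            detect crossings in both directions
     PetscBool terminate = PETSC_FALSE;  keep integrating after the event
     TSSetEventHandler(ts, 1, &direction, &terminate, Indicator, PostEvent, &app);  */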
Will the TS solver adapt the time step to accommodate for this drastic change? You should be able to do any of those things to change the function. Since PETSc uses multi-stage methods after the event is detected the new ODE is now solved with an initial time of the located event time so it will just start solving the new ODE "from scratch" and hence won't see the old ODE so I don't think the solution should oscillate (unless your new ODE has large oscillations and needs a much smaller time step), if you need a smaller time step after the event then you can just set the new smaller tilmestep in your post event function. Barry > > Thanks > Miguel > > From: Barry Smith > > Date: Wednesday, September 30, 2020 at 10:19 AM > To: "Salazar De Troya, Miguel" >, "Abhyankar, Shrirang G" >, Emil Constantinescu > > Cc: Satish Balay via petsc-users > > Subject: Re: [petsc-users] Adaptive time stepping for implicit solvers > > > You might consider using the Event feature of TS to manage this. It detects when your "indicator function" changes sign and stops the integration right at that point allowing you to change your ODE or do what ever you need. It allows managing discontinuous changes in the ODE without needing smoothing (that messes up the solution). > > You would write an indicator function that depends on "solution of heat equation equal to a given parameter" > > I am including Shri and Emil on this message because they understand the event handling the best (Shri wrote it). > > Barry > > > >> On Sep 30, 2020, at 11:58 AM, Salazar De Troya, Miguel via petsc-users > wrote: >> >> Hello, >> >> I have a heat equation with a right-hand side coefficient that changes sign when a certain condition is met (solution of heat equation equal to a given parameter). I am thinking of modeling the sign change with a smoothed-out Heaviside approximation and let the solver adjust the time step to capture the sudden transition. Given that it is a heat equation, I am interested in using an implicit solver. Which ones in the TS suite also have adaptive capabilities? >> >> Thanks >> Miguel >> >> Miguel A. Salazar de Troya >> Postdoctoral Researcher, Lawrence Livermore National Laboratory >> B141 >> Rm: 1085-5 >> Ph: 1(925) 422-6411 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Sep 30 17:16:12 2020 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 30 Sep 2020 18:16:12 -0400 Subject: [petsc-users] Assembling MatSeqAICUSPARSE from kernel In-Reply-To: References: <7601C57A-1E93-4C9E-8694-ED2E3D1806F9@petsc.dev> Message-ID: Hector, It is pretty simple and a bit cumbersome, but see src/mat/tutorials/ex5cu.cu for the example. Mark On Wed, Sep 30, 2020 at 2:32 PM H?ctor Barreiro Cabrera wrote: > This is awesome news! I will go over it and see how can I fit this with my > code. > > Thank you! > > El mi?., 30 sept. 2020 17:55, Barry Smith escribi?: > >> >> Hector, >> >> You asked at just the right time. Mark Adams has started to implement >> some code to do this https://gitlab.com/petsc/petsc/-/merge_requests/3137 it >> is now in the PETSc master branch. It is preliminary and not yet well >> documented but hopefully if you read through the MR (above) it will give >> you the main ideas. >> >> Please feel free to ask questions on this topic at GitLab or on >> petsc-dev at mcs.anl.gov we are eager to have users for this material and >> improve it to make it practical and straightforward. 
>> >> >> Barry >> >> >> On Sep 30, 2020, at 9:49 AM, H?ctor Barreiro Cabrera >> wrote: >> >> Greetings fellow PETSc users! >> >> I have recently started working on a project that requires assembling and >> solving a large sparse linear system directly on the GPU using CUDA. I have >> just discovered that PETSc has support for building and solving such a >> system through cuSparse (MatSeqAICUSPARSE), which is quite cool because >> this opens up the possibility of using PETSc's awesome functionalities. >> >> However, I have not been able to find information about how (or even if >> it is possible) to assemble the matrix from data stored in device memory. >> From the documentation, it seems that vectors (VecSeqCUDA) expose the raw >> device pointer through VecCUDAGetArrayRead and VecCUDAGetArrayWrite >> functions which I could use to update the systems' RHS vector, but I >> couldn't find anything alike for matrices. Is there anything equivalent? >> >> Ideally I would like to generate all the assembly information from a >> kernel, avoiding any data synchronization with the host (except maybe >> updating the pointers). Is this even possible? >> >> Thank you very much for your help! >> >> Greetings, >> Hector Barreiro >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL:
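To make the pointers in this thread concrete, here is a rough sketch of the vector half of that workflow, assuming a real-scalar, CUDA-enabled PETSc build compiled with nvcc. FillRhs and AssembleRhsOnDevice are invented names, and the matrix half is deliberately left to src/mat/tutorials/ex5cu.cu, whose entry points (a device-side handle obtained from the MATAIJCUSPARSE matrix plus MatSetValuesDevice() called inside the kernel) should be taken from that tutorial rather than from this sketch.

#include <petscvec.h>

__global__ void FillRhs(PetscInt n, PetscScalar *b)
{
  PetscInt i = blockIdx.x*blockDim.x + threadIdx.x;
  if (i < n) b[i] = (PetscScalar)i; /* placeholder source term, computed on the device */
}

PetscErrorCode AssembleRhsOnDevice(Vec b) /* b created as VECSEQCUDA or VECMPICUDA */
{
  PetscScalar   *d_b;
  PetscInt       n;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = VecGetLocalSize(b, &n);CHKERRQ(ierr);
  ierr = VecCUDAGetArrayWrite(b, &d_b);CHKERRQ(ierr);     /* raw device pointer, no host copy */
  FillRhs<<<(n + 255)/256, 256>>>(n, d_b);
  ierr = VecCUDARestoreArrayWrite(b, &d_b);CHKERRQ(ierr); /* marks the device copy as current */
  PetscFunctionReturn(0);
}

The same get/restore discipline is what keeps PETSc's host/device coherence tracking correct, so the kernel's writes are visible to the solver without an explicit synchronization back to the host.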